Event Log Workflow

Why event logs are used by data systems and accountants

Mature data systems use an event log mechanism, which means they track every change.

Let’s consider the canonical example of a bank with a database of its customers’ financial records. A simple implementation of this database would be a row for each customer account and columns for the account id and the account balance. Whenever a transaction occurs, the account balance is adjusted. Each transaction information, once it’s been accounted for, is promptly discarded to save space.

This implementation is problematic for several reasons:

  • What if the customer wants to verify the accuracy of the account balance? The simplest way to do this would be to provide a record of each transaction since the inception of the account.

  • What if the credit card is stolen three days ago and the bank wants to undo those recent transactions? Since none of the transaction information was stored permanently.

  • How do you handle concurrent updates to the same account? If two threads both see $100 as the existing balance and they each try to update the balance to a new number, will you end up losing one of the updates?

Event log is a simple but powerful concept where you store each event permanently and then later generate views based on those events. Much like how an accountant never “erases” a record when a debt is paid off, an accountant only adds a new entry to the general ledger.

The only two downsides are the storage cost of keeping each event and computation cost of generating each view. In real-world systems, you may end up compacting old events into a snapshot. Likewise, an accountant might calculate the current balance by looking at last year’s audited account balance and then tallying all the transactions in the current year. In theory, one could generate it by tallying all the transactions since the inception of the audited firm, but that would be excessive work with negligible benefit.

Using event log to “Get Things Done” (GTD)

We can use the same event log methodology and use it to keep track of everyday activities to maximize productivity and ensure correctness.

The actual category names aren't important but I'll tell you mine to make this description more concrete.

The first group is called in progress and these are the tasks that I’m working on right now. Typically I'm only going to have one task that I’m working on at any given point. Sometimes, I might be waiting for someone else before I finish a task (e.g. getting a code review) and in that case it's okay to have two tasks in progress. The goal is to limit the cognitive load by focusing your attention on one thing and doing one thing well: you might recognize this as the Unix philosophy.

The second group is called upcoming -  this is the work I haven't done yet. Each of these tasks should be discretely defined with a clear definition of what “done” means. And if I can't precisely define the task yet, that’s the first sub-task that I will do for that task. The goal is to minimize the number of large hairy tasks that I might feel inertia to starting and instead have many small tasks that I can accomplish in a day or so.

The last group is called completed and these are all the tasks that I’ve finished previously.  

What's important to note here is that I never delete a task. The lifecycle of a task is that it starts in the upcoming group and then it goes to the in progress group and then finally it lands in the completed group.

If one of the tasks that I'm working on isn't needed anymore, I won’t actually delete it. Instead I’ll move it to the completed group and mark it with the keyword “skip”. If I find out a month later that I actually do need to do the task, I haven’t lost any information.

Lastly, I like doing this in a Google doc, it keeps things really simple because there's no fancy UI to distract me. Furthermore, everything is tracked in the revision history.