From Board Games To Hybrid Logical Clocks

この記事の日本語版はこちらです。

Modern software often requires that data exists simultaneously across multiple frontend and backend platforms. Common use cases include collaborative document editing and offline-first app support. Let’s call this the problem of data synchronization.

Solving Data Synchronization

Recently, I decided an app I’m working on does require offline access to its data. There are several third-party synchronization APIs like Firebase and MongoDB Atlas, but their support for my framework of choice is either lacking, or the service itself is absurdly expensive for non-trivial amounts of data. It’s time to solve this one on my own.

In researching a homebrew implementation, I stumbled upon conflict-free replicated data types (CRDTs). A CRDT is a data structure with a built-in heuristic for merging unordered changes to its data, crucially without further input from the user. It does this by storing extra metadata for each piece of data in the CRDT.

For example, a CRDT might store the timestamp at which the latest update occurred. Future updates are applied if and only if the timestamp advances. Updates that arrive too late may be dropped, but that’s the “conflict-free” aspect in play.

This applies more generally, too. A CRDT receives events out-of-order, then proceeds to order them using the event’s metadata, storing only the “greater” event.

But ordering has an edge case. What happens when the ordering for two events is identical?

A Healthy Dose of Déjà Vu

Actually, we’ve all seen this problem before.

Let’s say you’re playing a board game that specifies who goes first using the following rules:

  1. The youngest player goes first.
  2. If the same age, the player who has played the highest number of times goes first.
  3. If played the same number of times, the player whose name is greater goes first.

If we assume players have unique names, then the last rule guarantees a starting player. (Assuming unique names is safe because you can always just assign everyone a random UUID “name”.)

When you play a board game, the game’s state is often represented by cards or tokens on the table. But before the game starts, there is no physical representation of state yet, so the game depends on information stored on the players themselves.

My observation is that this hypothetical board game is a simple CRDT that stores one piece of information - the player that starts the game. Instead of a board or cards, it stores the necessary ordering metadata in players' heads because the game hasn’t started yet.

The first rule orders by birth date, which is literally a timestamp. But the game doesn’t fail to start just because twins are playing - it has more rules, until a final rule is guaranteed to succeed, albeit arbitrarily.

A CRDT can do the exact same thing as a board game - it just stores information on top of a simple timestamp, a sort of “enhanced timestamp”.

Hybrid Logical Clocks

There are a couple variants to these enhanced timestamps, and a common one is the hybrid logical clock (HLC).

HLCs store three pieces of information: a timestamp, a version number, and a unique node name. And in a miraculous twist of fate, an HLC is defined to sort events using rules eerily similar to our board game:

  1. The later event is selected.
  2. If at the same time, select the event with the highest version.
  3. If the versions are the same, select the event whose node name is greater.

Just like a board game, by storing multiple pieces of information and employing fallback comparison rules, a CRDT resolves the case where events occur simultaneously.

Conclusions

There are actual, production systems out there that store HLCs in order to implement distributed CRDTs. By understanding the individual components of such a system, we engineers can make better tradeoffs, like whether or not to use a third-party service or to build just the aspect we need for a particular application.

Note that there are some implementation details about how and when to update an HLC in order to enable data synchronization, but that’s not the goal of today’s blog post. If you’re interested in the details, I published a Dart HLC for use in my app. I also recommend reading this Martin Fowler article for an in-depth explanation.