CRDT - Conflict-free replicated data type

CRDT stands for "Conflict-free Replicated Data Types". It is a data structure that can be replicated and synchronized between multiple devices or nodes without the need for complex conflict resolution. CRDTs are designed to address difficult problems in distributed systems, especially in cases where network partitioning or latency may occur.

Let's imagine a common problem: multiple people collaborating online to edit the same document (such as Google Docs). Each person can make edits on their own device, and these changes need to be synchronized to all other devices. But what if two people make edits at almost the same time in the same location? How should we handle this conflict? This is a typical conflict problem.

Traditional solutions may involve using some form of locking mechanism or conflict resolution strategy, but these methods often make the system complex and perform poorly in cases of network latency or disconnection.

CRDTs provide a more elegant solution. The design idea behind CRDTs is that if we can create a data structure where all possible operations on it are commutative (meaning the order of operations doesn't matter and the result is the same), then we can avoid conflicts. This way, each device can freely and independently perform operations without waiting or locking, and then synchronize these operations to other devices. As long as all operations eventually reach all devices, the data state on all devices will be consistent.

CRDTs are a type of data structure used to simplify distributed data storage systems and multi-user applications. In many systems, copies of certain data need to be stored on multiple computers (i.e., replicas). Examples of such systems include:

Mobile applications that store data locally and need to synchronize that data to other devices of the same user (e.g., calendars, notes, contacts, or reminders).
Distributed databases that maintain multiple replicas of data in the same data center or different locations, so that the system can continue to function properly when some replicas are offline.
Online collaborative software, such as Google Docs, Trello, Figma, etc., where multiple users can simultaneously make changes to the same file or data.
Large-scale data storage and processing systems that replicate data for scalability on a global scale.

All these systems need to handle the situation where data may be modified simultaneously on different replicas. Broadly speaking, there are two ways to handle such data modifications:

Strong consistency replication: Replicas coordinate with each other to determine when and how modifications should be applied. This approach supports strong consistency models such as serializability and linearizability. However, waiting for this coordination can reduce the performance of these systems. Additionally, the CAP theorem tells us that when a replica becomes disconnected from the rest of the system (e.g., due to network partitioning or being a mobile device with intermittent connectivity), no data changes can be made to that replica.
Optimistic replication: Users can independently modify data on any replica, even if the replica is offline or disconnected from other replicas. This approach achieves maximum performance and availability, but conflicts may arise when multiple clients or users simultaneously modify the same data. These conflicts then need to be resolved when communicating between replicas.

CRDTs are used in optimistic replication systems to handle conflicts. CRDTs ensure that no matter what data modifications are made on different replicas, the data can always be merged into a consistent state. This merging is automatically performed by CRDTs without the need for any special conflict resolution code or user intervention.

Furthermore, an important feature of CRDTs is their support for decentralized operation: they do not assume the use of a single server and can be used in peer-to-peer networks and other decentralized environments. In this regard, CRDTs differ from the algorithms used by Google Docs, Trello, and Figma, which require all communication between users to go through servers.

To achieve this, CRDTs need to handle modifications to data in a special way. For example, for a CRDT set, adding and removing elements may require remembering some additional information so that conflicts can be handled correctly when synchronizing with other devices. One common type is the "Grow-only Set". In this type of CRDT, you can add elements to the set but cannot remove them. The benefit of this design is that the addition operations are commutative, meaning that the final result is the same regardless of the order in which elements are added.

However, in real-life scenarios, we often need to remove elements from a set. To address this issue, another type of CRDT called the "Two-phase Set" is used. In this type of CRDT, each element has two states: added and removed. The initial state is added, and when you remove an element, its state changes to removed. This means that even if other devices simultaneously add the same element, as long as one device marks it as removed, the element will be considered removed. This also means that removal operations have priority in this set. This design allows conflicts to be handled correctly when synchronizing between different devices.

In general, the design of CRDTs handles data modifications in a way that allows operations to be performed in parallel on different devices and ensures data consistency during merging. This makes CRDTs well-suited for distributed systems and multi-user applications, as they can handle complex problems arising from parallel operations and network latency.

CRDTs come in two types, both of which provide strong eventual consistency: operation-based CRDTs and state-based CRDTs.

State-based CRDTs are usually easier to design and implement. They only require some form of gossip protocol for communication at the underlying level. The drawback is that the entire state of each CRDT must eventually be transmitted to every other replica, which can be costly. In contrast, operation-based CRDTs only transmit update operations, which are typically small. However, operation-based CRDTs require guarantees from the communication middleware that operations will not be dropped or duplicated when transmitted to other replicas, and they are transmitted in causal order.

CmRDT
Operation-based CRDTs are also known as commutative replicated data types (CmRDTs). CmRDT replicas propagate their state by transmitting update operations. For example, a CmRDT for a single integer can broadcast operations like (+10) or (-20). Replicas receive updates and apply them locally. These operations are commutative. However, they are not necessarily idempotent. Therefore, the communication infrastructure must ensure that all operations on one replica are delivered to other replicas without duplication, but in any order.
CvRDT
State-based CRDTs are known as convergent replicated data types (CvRDTs). Unlike CmRDTs, CvRDTs send their complete local state to other replicas, where the states must be merged using commutative, associative, and idempotent functions. The merge function provides a join for the states of any pair of replicas, so that the collection of all states forms a lattice. The update function must monotonically increase the internal state according to the same partial order rules as the lattice.

In conclusion, CRDTs are designed to handle data modifications by using special handling that allows operations to be performed in parallel on different devices and ensures data consistency during merging. They provide an elegant solution for distributed systems and multi-user applications, as they can handle parallel operations and complex issues caused by network latency.