CRDTs and other matters of commutativity

kim · February 20, 2020, 9:46am

Continuing the discussion from Radicle Collaboration Tools:

Creating a new topic for elaboration on matters of commutativity

I guess what it boils down to, at least in my head, is that a git history (a hash-linked DAG) is already a state-based CRDT for most practical purposes: there is always at least one topological order, and using timestamps as a secondary ordering works in non-pathological cases. The “C” in CRDT stands for “convergent”, though, not for “commutative”.

If we do not utilise this “native timestamping”, then the data itself needs to carry as much information as is necessary to yield the desired semantics. Maybe we can get away with simple Lamport clocks, but intuitively this would limit us to only a few primitives. Although that might be enough, idk.

alexgood · February 20, 2020, 2:09pm

This paper might be relevant: Merkle-CRDTs

The “C” seems to have a lot of different meanings depending on who is writing the paper, here it is “conflict free”,I’ve also seen papers use “commutative” and “convergent”.

fintohaps · February 20, 2020, 3:15pm

I just wanna clarify that what you’re saying is that we could use git’s commit history as the backing method for converging state?

And also to make sure I’m following your thought-process, would comments on an issue be a series of commits? I guess my problem is that I’m unsure of how we’re imagining this data model to look like I might not be reading between the lines enough

And if all of the above is the case are we tying ourselves to a git implementation then or do you think we can make this malleable?

kim · February 20, 2020, 4:26pm

Yes

Yes. I think that, if a “comment” is just a blob, every comment should go into its own commit. If it is instead an operation (like in git-bug), multiple such operations can be batched together in a single commit – the order of the operations within the batch is determined by the data structure that models them, and the order with respect to other people’s comments (on the same issue) is determined by the git history.

Well, yeah, we’re debating essentially if the information necessary to establish ordering should be stored inside the datastructure (that’s usually how CRDTs are described), or externally (ie. by piggybacking on the VCS).

Additionally, @mmassi is challenging that there needs to be any order at all

I don’t think so, but I haven’t tried to generalise beyond what I know about pijul. There, patches have a notion of dependencies, ie. they can reference other patches which need to come first (I’m unsure about the theory behind this, tbh). The way “tags” work in pijul currently is that they are simply empty patches which contain as dependencies all the other patches seen so far.

So, it seems to me that establishing a partial ordering external to the datastructures we want to use could be left as an implementation detail of the underlying VCS.

I have only skimmed the paper @alexgood posted, but it does seem to formalise what I’m handwaving about.