Implementing collaboration tools (issue tracker, code reviews, and more) in Radicle
Collaboration data representation: the merge problem
By “collaboration tools” we initially mean an issue tracker and a code review system.
Over time more tools could be added (like a wiki, a more complex CMS, a project tracking system…).
The implementation strategy is to build these tools on top of git.
This means handling git as a database which the tools use as backing storage.
Since the tools are collaborative, one of the main issues is how to avoid (or automatically resolve) conflicts at the git level when multiple users modify the data.
One way of doing this would be to use CRDTs to encode the tools' data.
This is doable, and it is the approach taken by git-bug.
However, this has the downside that the storage format is not “natural”: the git objects need to be operations, and the semantic content of each object must be reconstructed by applying them sequentially.
This makes the implementation complex, and it also makes ergonomic CLI UIs harder to build.
A different approach
For this reason we are exploring another approach: using barebones git plumbing commands as a data storage and transmission layer, and resolving any conflicts at the application level.
To understand this better, let’s consider this scenario: project P has two users, U1 and U2, each with their own view of the project (P1 and P2).
At some point user U1 creates an issue (I) with a comment C1. The issue is therefore committed by U1 in P1, so we can call it P1/I.
User U2 tracks P, therefore they will get a copy of P1, which means they will see P1/I. They will then “merge” I into P2, including its comment C1.
Then U2 adds another comment, C2, to I and pushes it into P2.
Eventually, thanks to replication, user U1 will also receive P2 and therefore P2/I, and they will be able to merge C2 into P1/I (no conflict here).
Now, imagine that U2 tried to modify comment C1. This operation would be illegal because C1 was authored by U1. In principle the Radicle UI should not allow U2 to do this, but let’s say they did it anyway. At this point U1 will see that P2/I has C1 in an invalid state (edited by U2) and will refuse the merge.
The general observation is the following: in git the stored content is supposed to be source code, which in principle can be modified by anyone and is merged with a line-oriented algorithm.
However, when implementing a collaboration tool (like an issue tracker) there are “business logic” rules about modifying data that should be implemented and enforced. Therefore it makes sense not to use the git merge algorithm, and instead to implement an ad hoc semantic merge that abides by those business rules and ensures that each participant observes them.
When handling the branches that store “collaboration data”, Radicle should git fetch them but never git merge them: the logical merge should happen at the application layer, and the resulting data should then be committed to the local repo.
The intention is to have a merge algorithm that never generates conflicts.
This is easy to achieve if every individually editable item is never merged “partially” (like source code, line by line).
Instead, the semantics must be that the newer revision wins over the previous one and replaces it entirely.
This works well if the items that can be modified are kept small, like individual comments inside an issue.
Note that in an issue, comments are generally appended, with no conflict at all; conflicts are only possible when a comment is modified.
Moreover, in most cases each “editable item” should have a single author who can modify it, making the risk of conflicts even smaller. For instance, in an issue tracker the only “shared editable” items are an issue description, its title, and its set of labels (where each label counts as an individual item).
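A minimal Python sketch of the “newer revision wins and replaces it entirely” rule, applied per editable item. The text does not say how “newer” is determined; here we assume, hypothetically, a per-item edit counter, with the editor's identity (and then the content) breaking ties so the choice is always deterministic. `Revision`, `newer` and `merge_items` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Revision:
    counter: int  # hypothetical per-item edit counter, bumped on every edit
    author: str   # identity of the editor, compared only when counters tie
    content: str  # the whole file: it is replaced, never merged line by line


def newer(a: Revision, b: Revision) -> Revision:
    """Pick the winning revision of a single item (fields compare in order)."""
    return max(a, b)


def merge_items(local: dict, remote: dict) -> dict:
    """Merge two views of an issue, item by item, with 'newer revision wins' semantics."""
    merged = dict(local)
    for path, theirs in remote.items():
        merged[path] = newer(merged[path], theirs) if path in merged else theirs
    return merged


local = {
    "title": Revision(2, "U1", "Crash on startup"),
    "comments/1": Revision(1, "U1", "Segfault when opening a repo"),
}
remote = {
    "title": Revision(2, "U2", "Crash on startup (Linux only)"),
    "comments/1": Revision(1, "U1", "Segfault when opening a repo"),
    "comments/2": Revision(1, "U2", "Reproduced on Linux"),
}

merged = merge_items(local, remote)
print(merged["title"].content)  # Crash on startup (Linux only)
```

The shared title is the only item edited on both sides, and it is resolved without a conflict marker: one revision wins whole.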
Data propagation in the P2P mesh
The idea is that inside a Radicle project, collaboration data is a sort of “shared CMS”, viewable and editable by every Radicle user (although project members could of course have different permissions on specific items).
While merge operations are explicit for source code, for this CMS they should be implicit, happening as soon as a git fetch operation receives new data from a peer, provided that the received data abides by the business rules.
Radicle's P2P gossip replication (based on git fetches) will disseminate the data to all repositories. Since the merge algorithm will be deterministic and conflict-free, and its result will not depend on the order in which updates arrive, the system will be eventually consistent: it should naturally converge to a state in which every user has the same data, including every update from every other user.
In practice these “automatic merges” will move data from the remote branches into the local “master CMS” branch, which will in turn be mirrored into the remote branches of other peers.
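Convergence holds when the merge gives the same result regardless of the order in which replicas are fetched and folded in. The Python sketch below checks this for three hypothetical replicas by applying them in every possible order; it assumes each item is a `(counter, author, content)` tuple (an illustrative encoding, not Radicle's) and reuses the “newest revision wins” rule, where `max()` over tuples gives a total, deterministic order.

```python
from functools import reduce
from itertools import permutations

# Each replica maps an item path to a (counter, author, content) revision.
# Tuples compare lexicographically, so max() is a total, deterministic choice.


def merge(local: dict, remote: dict) -> dict:
    """Fold a fetched replica into the local one, keeping the newest revision per item."""
    merged = dict(local)
    for path, rev in remote.items():
        merged[path] = max(merged[path], rev) if path in merged else rev
    return merged


u1 = {"title": (1, "U1", "Crash"), "comments/1": (1, "U1", "Crash on startup")}
u2 = {"title": (2, "U2", "Crash on startup"), "comments/2": (1, "U2", "Reproduced")}
u3 = {"labels/bug": (1, "U3", ""), "title": (2, "U3", "Startup crash")}

# Apply the three replicas in every possible fetch order.
final_states = [reduce(merge, order, {}) for order in permutations([u1, u2, u3])]

assert all(state == final_states[0] for state in final_states)
print(final_states[0]["title"])  # (2, 'U3', 'Startup crash')
```

A per-item `max` is commutative, associative and idempotent, which is what lets every peer reach the same state no matter how gossip orders the fetches; a check over a few orders like this one illustrates the property but does not prove it.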
Actual primitives
As a sort of guideline, we could use the following building blocks to represent collaboration data:
- As leaves, files containing a combination of YAML, TOML, JSON and Markdown text (whatever we find most ergonomic for representing the needed data). These files could represent, for instance, issue comments or descriptions. Ideally each individual file should be an atomically editable entity (as described above, when modified it is not merged but fully replaced). Importantly, each of these items could be signed to prove its author (see the discussion about identity for the reasoning behind signatures).
- To contain leaves, “tree objects” (directory trees) with a specific structure. For instance, the comments of an issue could be files with numbers as names, all collected in the same directory, and the issue could be a directory with a file named “description” and a directory named “comments”.
- Groups of objects that should generally be downloaded together should be collected in a tree pointed to by a specific branch. For instance, each issue should have its own branch that represents the evolution of the issue, and to download the issue it would be enough to fetch that branch.
- A “section” of this “shared CMS” would then be a group of branches, easy to organize because branches can be namespaced; for instance, the issue tracker could be implemented by branches with names like radicle-cms/issues/issue-xyz.
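To make the layout above concrete, the Python sketch below computes the object ids git would assign (in a SHA-1 repository) to an issue stored as a directory with a “description” file and a “comments” directory of numbered files, following git's blob and tree object formats. The file contents are invented placeholders, and `issue_branch` is a hypothetical helper that only mirrors the radicle-cms/issues/issue-xyz naming example.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 object id as `git hash-object -t <kind>` computes it (SHA-1 repositories)."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def tree_id(tree: dict) -> str:
    """Id of a tree given as {name: bytes (file) | dict (subdirectory)}."""
    entries = []
    for name, value in tree.items():
        raw_name = name.encode()
        if isinstance(value, dict):
            # Directories use mode 40000 and sort as if their name ended in "/".
            mode, oid, sort_key = b"40000", tree_id(value), raw_name + b"/"
        else:
            mode, oid, sort_key = b"100644", object_id("blob", value), raw_name
        entries.append((sort_key, mode + b" " + raw_name + b"\0" + bytes.fromhex(oid)))
    body = b"".join(entry for _, entry in sorted(entries))
    return object_id("tree", body)


def issue_branch(issue_id: str) -> str:
    """Hypothetical naming scheme following the radicle-cms/issues/issue-xyz example."""
    return f"radicle-cms/issues/{issue_id}"


# An issue as a directory: a description file plus numbered comment files.
issue = {
    "description": b"title: Crash on startup\n\nSegfault when opening a repo.\n",
    "comments": {
        "1": b"author: U1\n\nI can reproduce this.\n",
        "2": b"author: U2\n\nFixed in the next release.\n",
    },
}

print(issue_branch("issue-xyz"), tree_id(issue))
```

In an actual repository the same objects would be written with git plumbing (`git hash-object -w`, `git mktree`, `git commit-tree`) and the branch moved with `git update-ref`; the pure Python version only shows that each editable file maps to one blob, so replacing a comment changes exactly one leaf of the tree.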