By “collaboration tools” we initially mean an issue tracker and a code review system.
Over time more tools could be added (like a wiki, a more complex CMS, a project tracking system…).
The implementation strategy is to build these tools on top of git.
This means handling git as a database which the tools use as backing storage.
Since the tools are collaborative, one of the main issues is how to avoid (or automatically resolve) conflicts at the git level when multiple users modify the data.
One way of doing it would be to use CRDTs to encode the tools’ data.
This is doable, and is the approach taken by git-bug.
However, this has the downside that the storage format is not “natural”: the git objects need to be operations and the semantic object content needs to be reconstructed by applying them sequentially.
This makes the implementation complex; also making ergonomic CLI UIs becomes harder.
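To make the cost of an operation-based format concrete, here is a minimal sketch of what "reconstructing by applying operations sequentially" involves. The operation names and fields (`create`, `add_comment`, `edit_comment`) are hypothetical and are not taken from git-bug or any other tool; they only illustrate the shape of the problem.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    title: str = ""
    comments: dict = field(default_factory=dict)  # comment id -> text


def apply_op(issue, op):
    """Apply one stored operation to the issue state, in place."""
    kind = op["op"]
    if kind == "create":
        issue.title = op["title"]
    elif kind in ("add_comment", "edit_comment"):
        issue.comments[op["id"]] = op["text"]
    else:
        raise ValueError(f"unknown operation: {kind}")


def reconstruct(ops):
    """Rebuild the current issue by replaying every operation in order."""
    issue = Issue()
    for op in ops:
        apply_op(issue, op)
    return issue


# Hypothetical operation log: what would be stored is the log, not the issue.
ops = [
    {"op": "create", "title": "Crash on startup"},
    {"op": "add_comment", "id": "c1", "text": "Happens on Linux"},
    {"op": "edit_comment", "id": "c1", "text": "Happens on Linux and macOS"},
]
issue = reconstruct(ops)
```

In this style the current text of a comment is never stored anywhere as such: any tool that wants to display it has to replay the log first.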
For this reason we are exploring another approach: using barebones git plumbing commands as a data storage and transmission layer and resolving any conflicts that arise at the application level.
To understand this better, let’s consider this scenario: project P has two users, U1 and U2, each with their own view of the project (P1 and P2).
At some point user U1 creates an issue (I) with a comment C1.
The issue is therefore committed by U1 in P1, so we can call it P1/I.
U2 also tracks P, therefore they will get a copy of P1, which means they will see P1/I.
They will then “merge” P1/I into P2, including its comment C1.
Then U2 adds another comment to I, C2, and pushes it into P2.
Eventually, thanks to replication, user U1 will also receive P2 and therefore P2/I, and they will be able to merge P2/I into P1/I (no conflict here).
Now, imagine that U2 tried to modify comment C1.
This operation would be illegal because C1 has been authored by U1.
In principle the Radicle UI should not allow U2 to do this, but let’s say they did it anyway.
At this point U1 will see that C1 is in an invalid state (edited by U2) and will refuse the merge.
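The scenario can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `Comment` fields, the `InvalidUpdate` exception and the `merge_issue` function are hypothetical names, authorship is represented by plain strings rather than signatures, and ordering between revisions is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    comment_id: str
    author: str     # user who created the comment
    edited_by: str  # user who produced this revision
    text: str


class InvalidUpdate(Exception):
    """Raised when incoming data breaks a business rule."""


def merge_issue(local, remote):
    """Merge two {comment_id: Comment} views, refusing illegal edits."""
    merged = dict(local)
    for cid, incoming in remote.items():
        # Rule: only the author of a comment may edit it.
        if incoming.edited_by != incoming.author:
            raise InvalidUpdate(f"{cid} edited by {incoming.edited_by}")
        known = local.get(cid)
        # Rule: the recorded author of an existing comment cannot change.
        if known is not None and known.author != incoming.author:
            raise InvalidUpdate(f"{cid} changed author")
        merged[cid] = incoming
    return merged


c1 = Comment("C1", author="U1", edited_by="U1", text="first comment")
c2 = Comment("C2", author="U2", edited_by="U2", text="second comment")

p1_issue = {"C1": c1}            # P1/I as seen by U1
p2_issue = {"C1": c1, "C2": c2}  # P2/I after U2 added C2

merged = merge_issue(p1_issue, p2_issue)  # legal: C2 is simply added

# U2 rewrites C1: U1's merge must refuse it.
tampered = {"C1": Comment("C1", author="U1", edited_by="U2", text="changed")}
try:
    merge_issue(p1_issue, tampered)
    refused = False
except InvalidUpdate:
    refused = True
```

The important point is that the check runs on the receiving side, so it holds even if the sender's UI was bypassed.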
The general observation is the following: in git the storage content is supposed to be source code, which can in principle be modified by anyone and is merged with a line-oriented algorithm.
However, when implementing a collaboration tool (like an issue tracker) there are “business logic” rules about modifying data that should be implemented and enforced. Therefore it makes sense not to use the git merge algorithm and instead implement an ad hoc semantic merge that will abide by those business rules and ensure that each participant observes them.
When handling the branches that store “collaboration data” Radicle should git fetch them but never git merge them: the logical merge should happen at the application layer and the resulting data should then be committed to the local repo.
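A rough sketch of this "fetch, merge in the application, commit" flow using git plumbing commands follows. The helper names, the ref layout under `refs/remotes/<remote>/` and the single-file tree are assumptions made for illustration; the git subcommands themselves (`fetch`, `cat-file`, `hash-object`, `mktree`, `commit-tree`, `rev-parse`, `update-ref`) are standard.

```python
import subprocess


def git(*args, stdin=None):
    """Run git with the given arguments and return its trimmed stdout."""
    done = subprocess.run(
        ["git", *args], input=stdin, capture_output=True, text=True, check=True
    )
    return done.stdout.strip()


def fetch_refspec(remote, branch):
    """Refspec that stores a peer's branch under refs/remotes/<remote>/."""
    return f"+refs/heads/{branch}:refs/remotes/{remote}/{branch}"


def fetch_peer(remote, branch):
    """Download the peer's branch; git itself merges nothing here."""
    git("fetch", remote, fetch_refspec(remote, branch))


def read_item(ref, name):
    """Read one top-level file from the tree that ref points to."""
    return git("cat-file", "-p", f"{ref}:{name}")


def commit_item(branch, name, content, message):
    """Commit a tree holding a single top-level file onto a local branch.

    Simplification: the new tree contains only this one file.
    """
    ref = f"refs/heads/{branch}"
    blob = git("hash-object", "-w", "--stdin", stdin=content)
    tree = git("mktree", stdin=f"100644 blob {blob}\t{name}\n")
    args = ["commit-tree", tree]
    try:
        parent = git("rev-parse", "--verify", "--quiet", ref)
        args += ["-p", parent]
    except subprocess.CalledProcessError:
        pass  # first commit on this branch: no parent
    args += ["-m", message]
    commit = git(*args)
    git("update-ref", ref, commit)
    return commit
```

An application-level merge would sit between `read_item` on the fetched ref and `commit_item` on the local branch; `git merge` is never invoked.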
The intention is to have a merge algorithm that never generates conflicts.
This is easy to achieve if no individually editable item is ever merged “partially” (the way source code is, line by line).
Instead the semantics must be that the new revision wins over the previous one and replaces it entirely.
This works well if the items that can be modified are kept small, like individual comments inside an issue.
And note that in an issue comments are generally appended, with no conflict at all; potential conflicts arise only if a comment is modified.
Moreover, in most cases each “editable item” should have a single author who can modify it, making the risk of conflicts even smaller. For instance, in an issue tracker the only “shared editable” items are an issue description, its title, and its set of labels (where each label counts as an individual item).
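The "new revision replaces the old one entirely" rule can be sketched as follows. How "newer" is decided is not specified above, so this example assumes a per-item revision counter, with a content hash as a deterministic tie-break; both are assumptions, as are the `Item` and `merge` names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    revision: int  # assumed per-item revision counter
    content: str


def _rank(item):
    """Order items by revision, then by content hash to break ties."""
    digest = hashlib.sha256(item.content.encode("utf-8")).hexdigest()
    return (item.revision, digest)


def merge(local, remote):
    """Merge two {item_key: Item} maps; items are replaced, never blended."""
    merged = dict(local)
    for key, incoming in remote.items():
        current = merged.get(key)
        if current is None or _rank(incoming) > _rank(current):
            merged[key] = incoming  # whole-item replacement
    return merged


a = {
    "title": Item(1, "Crash"),
    "comments/1": Item(1, "first comment"),
}
b = {
    "title": Item(2, "Crash on startup"),
    "comments/2": Item(1, "second comment"),
}

ab = merge(a, b)
ba = merge(b, a)
```

Because each key independently keeps the highest-ranked item, the result does not depend on the order in which peers are merged, and merging the same data twice changes nothing. Those two properties are what convergence relies on.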
The idea is that inside a Radicle project collaboration data is a sort of “shared CMS”, viewable and editable by each Radicle user (but of course project members could have different permissions on specific items).
While for source code merge operations are explicit, for this CMS they should be implicit and happen as soon as a git fetch operation receives new data from a peer, provided that the received data abides by the business rules.
The Radicle P2P gossip replication (based on git fetches) will disseminate the data to all repositories. Since the merge algorithm will be deterministic and conflict-free, the system will be eventually consistent: it should naturally converge to a state in which each user has the same data (including every update from every other user).
In practice these “automatic merges” will move data from the remote branches into the local “master CMS” branch, which will in turn be mirrored into the remote branches of other peers.
As a sort of guideline, we could use the following building blocks to represent collaboration data:
As leaves, files containing a combination of YAML, TOML, JSON and Markdown text (whatever we find more ergonomic to represent the needed data). These files could represent, for instance, issue comments or descriptions. Ideally each individual file should be an atomically editable entity (as described above, when modified they will not be merged but fully replaced). Importantly, each of these items could be signed to prove its author (see the discussion about identity for the reasoning behind signatures).
To contain leaves, “tree objects” (directory trees) with a specific structure. For instance, the comments of an issue could be files with numbers as names, all collected in the same directory, and the issue could be a directory containing a file named “description” and a subdirectory holding those comment files.
Groups of objects that should be generally downloaded together should be collected in a tree pointed to by a specific branch. For instance, each issue should have its own branch that represents the evolution of the issue, and to download the issue it would be enough to fetch that branch.
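As an illustration of the directory layout described for an issue, the following sketch maps an issue to the set of files its tree would contain. The function name and the `comments` directory name are assumptions chosen for the example; only the `description` file and the numbered comment files come from the text.

```python
def issue_files(description, comments):
    """Return {path: content} for one issue tree.

    Each path is one atomically editable leaf file.
    """
    files = {"description": description}
    # Comments are numbered files collected in one directory
    # ("comments" is a hypothetical name).
    for number, text in enumerate(comments, start=1):
        files[f"comments/{number}"] = text
    return files


layout = issue_files(
    "The application crashes on startup.",
    ["Happens on Linux.", "Also on macOS."],
)
```

Appending a comment only adds a new path, which is why appends never conflict with each other at the file level as long as the numbers differ.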
A “section” of this “shared CMS” would then be a group of branches, easy to organize because branches can be namespaced; for instance, the issue tracker could be implemented by branches with the name