RFC: CoCo Event Feed Storage

mmassi · July 2, 2020, 10:37am

RFC: Event Storage

Author: @massimiliano-mantione
Date: 2020-07-02
Status: draft

Motivation

To exchange code collaboration information in Radicle, we are defining the notion of “event feeds”.

Each Radicle device exposes one or more of these feeds so that other devices can track them and fetch events from them.

The main idea is that each of these feeds is a git commit chain (git branch), with one commit for each event.

Each event (commit) would store the event payload and would also have one link to every other event that it depends upon.

These dependencies can be of different kinds:

simple temporal ordering with no implied causality (like a comment stating that it was written after another comment)
causal ordering with a strict semantic meaning, like:
- one comment is the reply to another comment
- an entity is signed using a key belonging to another entity
direct content dependencies, like:
- a comment quoting another one or reusing an attachment introduced by another one
- a comment referring to a specific commit in a code review (we should discuss how this would work)

In all of these cases the idea is that the event “needs” the referred events, and must not be stored without them.

The goal of this RFC is to specify the details of how these event chains are stored.

Common Event Metadata

To allow authorship attribution and signature verification every event must carry the following pieces of metadata:

The cryptographic signature of the event hash
The public portion of the signing key
The URN and revision of the entity (user) that is the event author and signer
Maybe the event timestamp, with these points open for discussion:
- is it needed?
- should it be used to check signatures against key revocation?
The feed URN
The event hash (to give the event a unique “id” inside the feed)
The hash of the previous event in the feed (to logically build the event chain)
The “type” of event (open for discussion and not interpreted by radicle-link except for entities, see below)
For each “dependency” (event this one depends on), the dependency
- URN
- Hash
- Timestamp (redundant, open for discussion, but it would make soundness checks easier)
- Git OID of the referred event commit (redundant, open for discussion, but it would make checks and retrieval faster)

The event hash must be computed against:

the event payload, including any attachments
the event metadata excluding the signature

The signature signs the event hash.

To verify a signature the following steps are necessary:

the hash must be recomputed and checked
the signature must be checked
the signing key ownership must be checked (as of the time the signature was made)
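
The hashing rule and the three verification steps can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 and the `name: value` byte encoding are placeholders (this RFC does not fix a hash algorithm or a canonical encoding), the `hash` field is left out of its own input along with the signature, attachments are assumed to be covered through their metadata entries, and `check_signature` and `key_belongs_to` are hypothetical callbacks standing in for real signature and key-ownership checks. None of these names are radicle-link API.

```python
import hashlib
from typing import Callable, List, Tuple

# Hypothetical in-memory shape: ordered (name, value) metadata pairs.
Metadata = List[Tuple[str, str]]

# Assumption: fields that cannot be part of their own hash input.
UNHASHED_FIELDS = {"signature", "hash"}


def compute_event_hash(metadata: Metadata, payload: bytes) -> str:
    """Hash the metadata (minus signature and hash) followed by the payload."""
    hasher = hashlib.sha256()  # assumption: the algorithm is not fixed by the RFC
    for name, value in metadata:
        if name in UNHASHED_FIELDS:
            continue
        hasher.update(f"{name}: {value}\n".encode("utf-8"))
    hasher.update(b"\n")  # assumed separator between metadata and payload
    hasher.update(payload)
    return hasher.hexdigest()


def field(metadata: Metadata, name: str) -> str:
    """Return the first value stored under name, or raise KeyError."""
    for key, value in metadata:
        if key == name:
            return value
    raise KeyError(name)


def verify_event(
    metadata: Metadata,
    payload: bytes,
    check_signature: Callable[[str, str, str], bool],
    key_belongs_to: Callable[[str, str, str], bool],
) -> bool:
    """Run the three checks in order and stop at the first failure."""
    event_hash = field(metadata, "hash")
    signing_key = field(metadata, "signing-key")
    # Step 1: the hash must be recomputed and checked.
    if compute_event_hash(metadata, payload) != event_hash:
        return False
    # Step 2: the signature must be checked against the event hash.
    if not check_signature(signing_key, field(metadata, "signature"), event_hash):
        return False
    # Step 3: the key must belong to the author at signing time.
    return key_belongs_to(
        signing_key, field(metadata, "author"), field(metadata, "timestamp")
    )
```

Passing the two checks in as callbacks keeps the sketch independent of any particular signature scheme or identity lookup, both of which are outside the scope of this section.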

The described metadata must be stored in a uniform format for every event feed so that the same software library (radicle-link) can be used to handle, verify and transfer every feed.

It could be stored in the event commit message (maybe as commit trailers), or in a blob with a well defined name inside the tree pointed to by the event commit.

Event Payload

Each event payload is application-specific, and radicle-link neither needs nor tries to interpret it (except for entity revisions).

Therefore the payload must be stored as a blob in a way that allows radicle-link to check its hash but to otherwise ignore it.

There are two main ways in which this could be done using git:

storing it in a blob with a well defined name inside the tree pointed to by the event commit
storing it in the git commit message itself, likely together with the event metadata

Parent Commits

For each dependency described in the metadata, the event commit must have a parent link that points to the commit containing the referred event.

This turns each event feed into a full Merkle DAG and guarantees that every needed dependency is fetched and stored locally, even if it belongs to a different feed.

In practice the device git monorepo becomes a database that stores all these DAGs in an optimal way, storing each event only once even if it is referred to by multiple feeds.
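
A toy model may help show why parent links give both properties: everything reachable from a feed tip is what a fetch has to bring along, and content addressing stores a shared event once. `EventCommit` and `ObjectStore` are hypothetical names for this sketch, and the SHA-1 over a made-up encoding only imitates how git derives object ids; real git commits are hashed differently.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class EventCommit:
    message: str
    # Ids of the previous event in the feed and of every dependency.
    parents: Tuple[str, ...]


class ObjectStore:
    """Toy content-addressed store standing in for the git object database."""

    def __init__(self) -> None:
        self.objects: Dict[str, EventCommit] = {}

    def put(self, commit: EventCommit) -> str:
        # Illustrative encoding only: not git's real commit object format.
        data = "\n".join(commit.parents) + "\n\n" + commit.message
        oid = hashlib.sha1(data.encode("utf-8")).hexdigest()
        self.objects[oid] = commit  # identical content maps to the same id
        return oid

    def closure(self, tip: str) -> Set[str]:
        """Return every commit id reachable from tip through parent links."""
        seen: Set[str] = set()
        stack = [tip]
        while stack:
            oid = stack.pop()
            if oid in seen:
                continue
            seen.add(oid)
            stack.extend(self.objects[oid].parents)
        return seen
```

For example, if a comment in one feed is stored with `put`, and events in two other feeds list its id among their `parents`, then `closure` on either of those events includes the comment, while `objects` still holds it only once.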

Storage Strategies Considerations

Whether to store the event payload and/or metadata in git trees or directly inside the commit message is an implementation detail: the system would work correctly either way.

As with every implementation detail, to make a choice we should look at material implementation and runtime tradeoffs.

Storing as much as possible in the commit message would have these advantages (in no particular order):

Less clutter in the git object DB: the commit object must be stored anyway but we would avoid one needless tree and blob object per event in the common case
Thanks to this, faster object retrieval and transfer (fewer object links to follow)
No need to come up with canonical blob names inside the event tree object
Tooling to operate on or inspect event feeds would be simpler to implement (even a plain git log would work and provide meaningful output)
Also CLI tools to commit events would be simpler (most events would technically be plain empty commits)

The downside that I can see is that we would need to define how to separate the metadata from the actual payload.

I would propose a sort of “header-based” system, where a set of headers provides the metadata and all the bytes after that are the payload.
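
A minimal sketch of how such a message could be split, assuming the header block ends at the first blank line (as in RFC 822 messages; the separator is not decided in this RFC) and that each header is a single `name: value` line. `split_event_message` is a hypothetical helper name.

```python
from typing import List, Tuple


def split_event_message(message: bytes) -> Tuple[List[Tuple[str, str]], bytes]:
    """Split a commit message into (headers, payload) at the first blank line."""
    head, separator, payload = message.partition(b"\n\n")
    if not separator:
        raise ValueError("missing blank line after the headers")
    headers: List[Tuple[str, str]] = []
    for line in head.decode("utf-8").split("\n"):
        # Split on the first colon only, so values such as URNs may contain colons.
        name, colon, value = line.partition(":")
        if not colon or not name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name.strip(), value.strip()))
    # Everything after the separator is opaque payload, returned untouched.
    return headers, payload
```

Headers are returned as an ordered list of pairs, not a dictionary, because some names (such as `dependency`) are expected to repeat.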

If an application needs to store blobs of binary data inside events, those events would contain a tree object, where each blob referred to by the tree would be an attachment, and the “root payload” would be free to refer to attachments in an application-specific way.

I would still propose to list attachments in the metadata because they should be hashed and signed, therefore radicle-link should be aware of them.
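
As an illustration, the attachment entries could be derived from the tree contents like this. The sketch assumes SHA-256 (no hash algorithm is chosen in this RFC) and uses the `attachment: <PATH-IN-TREE> <HASH>` line shape from the metadata listing further down; `attachment_headers` is a hypothetical helper.

```python
import hashlib
from typing import Dict, List


def attachment_headers(attachments: Dict[str, bytes]) -> List[str]:
    """Build one 'attachment:' metadata line per blob, in sorted path order."""
    lines: List[str] = []
    for path in sorted(attachments):
        # Assumption: SHA-256 over the raw blob bytes.
        digest = hashlib.sha256(attachments[path]).hexdigest()
        lines.append(f"attachment: {path} {digest}")
    return lines
```

Sorting the paths makes the output deterministic, which matters if these lines are part of the signed hash input. Paths containing spaces would need an escaping rule that this sketch does not define.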

One possible downside is that the metadata would not be specified as git commit trailers: placing it in headers (before the payload instead of after) makes it unavailable as trailers.

Another possible problem is that these “fat” commit messages might not survive being transferred by email.

However I would consider Radicle as an alternative to using email for code collaboration and I would not consider trying to transfer radicle event feeds by email, one commit at a time, as a valid use case.

The point is: commit messages could contain long text lines, and that should not be a problem.
If an application needs binary blobs it can use attachments anyway, but having the possibility to skip tree blobs completely for simple events could have many positive effects and no critical downside.

Actual Event Metadata Storage

Regardless of the above considerations I would format event metadata in a way similar to RFC 822 e-mail headers, just like git commit trailers.

If we store the event payload inside a blob these will be actual commit trailers; otherwise they will be like content headers.

Metadata properties would look like this (using angle brackets for variable content; details should be discussed):


signature: <SIGNATURE>
signing-key: <PUBLIC-KEY>
author: <EVENT-AUTHOR-AND-SIGNER-URN> <REVISION> <HASH>
timestamp: <MILLIS-FROM-UNIX-EPOCH>
feed: <FEED-URN>
hash: <EVENT-HASH>
previous: <PREVIOUS-EVENT-HASH>
type: <NAMESPACED-EVENT-TYPE>
dependency: <URN-1> <HASH-1> <TIMESTAMP-1> <OID-1>
dependency: <URN-2> <HASH-2> <TIMESTAMP-2> <OID-2>
dependency: <URN-3> <HASH-3> <TIMESTAMP-3> <OID-3>
attachment: <PATH-IN-TREE-1> <HASH-1>
attachment: <PATH-IN-TREE-2> <HASH-2>
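
To make the layout concrete, here is a sketch that renders this block from a typed record. The class and attribute names are hypothetical, as are the space-separated values and the choice of one line per repeated property; the details are still open for discussion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dependency:
    urn: str
    hash: str
    timestamp: int  # milliseconds from the Unix epoch
    oid: str


@dataclass
class Attachment:
    path: str
    hash: str


@dataclass
class EventMetadata:
    signature: str
    signing_key: str
    author_urn: str
    author_revision: str
    author_hash: str
    timestamp: int  # milliseconds from the Unix epoch
    feed: str
    hash: str
    previous: str
    type: str
    dependencies: List[Dependency] = field(default_factory=list)
    attachments: List[Attachment] = field(default_factory=list)

    def render(self) -> str:
        """Render the properties in the order used in the listing above."""
        lines = [
            f"signature: {self.signature}",
            f"signing-key: {self.signing_key}",
            f"author: {self.author_urn} {self.author_revision} {self.author_hash}",
            f"timestamp: {self.timestamp}",
            f"feed: {self.feed}",
            f"hash: {self.hash}",
            f"previous: {self.previous}",
            f"type: {self.type}",
        ]
        # Repeated properties get one line each, in list order.
        for dep in self.dependencies:
            lines.append(f"dependency: {dep.urn} {dep.hash} {dep.timestamp} {dep.oid}")
        for att in self.attachments:
            lines.append(f"attachment: {att.path} {att.hash}")
        return "\n".join(lines) + "\n"
```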

We should decide whether the signature would require some kind of ASCII armouring and shorter lines or not.

But, again, see the above comment about transferring radicle event feeds by email: this should not be a supported scenario.

Radicle event feeds are expected to be transferred using git fetch operations between Radicle devices.