Software Releases on Radicle

kim · April 4, 2022, 10:18am

Git treats tags as being globally unique, ie. refs/tags is a common namespace regardless of where the tag came from (unlike branches, which are namespaced as refs/remotes/[the origin]/*). This has to do with the traditional way of verifying them (which btw includes the tag’s name, which is part of the object headers).

link allows tag refs to be replicated in a namespaced manner, that is, they end up under refs/remotes/xyz/tags. That, however, means that clients need to be careful not to confuse git when creating project checkouts: if the tags stay namespaced, git will not consider them to be tags (eg. for git-tag -l, or when decorating git-log output). If they do not stay namespaced, the workflow described above is unlikely to work, because the maintainer tags will conflict.
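To make the conflict concrete, here is a minimal sketch, assuming namespaced tag refs have the shape refs/remotes/<peer>/tags/<name> as described above. The function names are made up for illustration and are not part of link or git. It flattens namespaced tags into the global refs/tags namespace and reports names that different peers point at different objects.

```rust
use std::collections::BTreeMap;

/// Split a namespaced tag ref such as `refs/remotes/xyz/tags/v1.0`
/// into `(peer, tag_name)`. Returns `None` for any other ref.
fn split_namespaced_tag(refname: &str) -> Option<(&str, &str)> {
    let rest = refname.strip_prefix("refs/remotes/")?;
    let (peer, rest) = rest.split_once('/')?;
    let tag = rest.strip_prefix("tags/")?;
    if peer.is_empty() || tag.is_empty() {
        return None;
    }
    Some((peer, tag))
}

/// Group namespaced tags by the global name they would get under
/// `refs/tags/`. Each entry lists the `(peer, target)` pairs claiming it.
/// Input items are `(refname, target object id)`.
fn flatten_tags<'a>(
    refs: &[(&'a str, &'a str)],
) -> BTreeMap<String, Vec<(&'a str, &'a str)>> {
    let mut out: BTreeMap<String, Vec<(&'a str, &'a str)>> = BTreeMap::new();
    for &(refname, target) in refs {
        if let Some((peer, tag)) = split_namespaced_tag(refname) {
            out.entry(format!("refs/tags/{}", tag))
                .or_default()
                .push((peer, target));
        }
    }
    out
}

/// A global tag name conflicts when its claims do not all share one target.
fn conflicts<'a>(flat: &'a BTreeMap<String, Vec<(&str, &str)>>) -> Vec<&'a str> {
    flat.iter()
        .filter(|(_, claims)| claims.windows(2).any(|pair| pair[0].1 != pair[1].1))
        .map(|(name, _)| name.as_str())
        .collect()
}
```

With two peers both publishing a v1.0 tag that points at different objects, conflicts would report refs/tags/v1.0, which is exactly the case a checkout cannot resolve on its own.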

Also consider that the project may be published to some traditional mirror (eg. Jit-Hub), so at the end of the day there must be a single release tag (under refs/tags).

So that’s a multisig problem, which is fairly awkward to solve when you don’t have a timestamping service. Essentially, a proposal would need to be published, which then is signed by the eligible keys, and ultimately “finalized” into the published release tag (which in turn contains a proof of those signatures).

This kind of thing can be implemented using collaborative objects, from which a tag can be synthesized (although the tag itself will still be signed by only a single key).

yorgos · April 6, 2022, 5:39pm

Fantastic, thanks so much for this @kim !

One clarification if you wouldn’t mind, please:

Do I understand correctly that this “proposal” (regardless if that is what it is eventually called) would be the collaborative object?

How could different projects then select different policies around e.g. “how many signatures are needed / which signatures are optional / etc.” and when the proposal can be “finalized” ? Would all this be captured in the collaborative object schema?

kim · April 7, 2022, 7:07am

Those are good questions!

The current model of anchoring the cob history is refs/cobs/<type>/<id>, which would suggest that each “release” is its own object. That could make sense, but has one drawback: the ordering of the releases would depend on some convention (e.g. a version number) found inside the object (the id above does not yield an ordering).

Thus, I think it might be preferable to only have a single history of “releases” per project.

I wonder what kind of processes you’re seeking to capture.

Traditionally, maintainership of even the largest-scale free software projects very much equates to the authority to declare what a release is. That is, in a DVCS like git anyone can tag a particular tree as a release – and sometimes this is very useful, think of “distributions” maintaining their own patchsets, or “long term stable” releases maintained by different people. Yet the momentum comes from accepting a single person, or a small group of persons, as authoritative.

Unlike traditional git, link expresses this relationship explicitly: by carrying the identity document of the “upstream” project, it is implied that some random tree considers itself within the lineage of that project.

Surely some projects may prefer to mirror corporate structures: some release team is responsible for conducting all the preparatory work, and finally gets to sign a release tag (probably with a shared “release key”). I’m not personally interested in this model, as it’s as broken as the corporate structures themselves: the “release managers’” performance indicator is to ship non-broken releases, but it’s not their responsibility (nor even competence) to make it so the code actually works. That’s the same problem as with dedicated QA teams; a waste of time and resources.

That being said: link’s identities are modelled after TUF, with the express intention of enabling delegation (by the root keys to some other set of keys which have narrower privileges). That has some security implications, so the process would need to be modelled carefully.

As a middle ground between these two, I could imagine a workflow where releases require signoff by both the maintainer keys as well as keys held by other entities. For example, CI systems may indicate that the build passed from the proposed tip, or packagers may confirm that their pipelines work.

For this kind of thing, I would recommend that the policy is expressed in the release object itself. Ie. when I propose a new release, I also specify which additional keys or identity chains are expected to sign off. This has the advantage of not requiring additional revocation mechanisms – the statement is only valid for this specific release, which can easily be amended by the next one.

yorgos · April 8, 2022, 9:46am

+1

I agree on both parts: I also think it is broken and that some teams will want to follow it regardless. Hopefully - because radicle is not so much solving corporate problems - this will be a limited use case.

I think this here is generic enough to cover a range of different workflows / policies, without us necessarily caring about the meaning of what each signature means in the context of the release. That can probably be captured elsewhere.

I think this is fine. I do see a tradeoff in that it is not so easy to say “this is what our releases look like” (considering that two consecutive objects may be radically different). However, I also see ways around that: for example, teams could document “this is what our releases look like” in some README, etc.

On this point, I do understand that IDs do not help with an ordering, but I don’t yet understand how the history of “releases” will be made possible… Would this fall on the “reader” to read all objects, find some version number within and make sense of the ordering that way ?

kim · April 8, 2022, 10:33am

I’m not sure what you mean by “look like” – the schema describes what a collaborative object looks like. The schema itself can change, but it wouldn’t for what I proposed: there would simply be a list-shaped element which describes what keys are expected to sign, and the signatures.

If “releases” is only one collaborative object, then the CRDT properties yield an ordering.

yorgos · April 8, 2022, 12:21pm

For example, when a new maintainer is added/removed, the list of signatures would change, right? Or if some team decides at some point that a new “release manager” needs to sign releases (instead/as well). Those aren’t necessarily schema changes (if I understand correctly), but the data changes - because the policy around “what a release looks like” changed.

ah, ok, I understand now. I thought each release would be a new collaborative object (conforming to a “release” schema), but only one instance of the collaborative object that incorporates all information around the releases does make sense.

With that, I think I’m good with all clarifications for now (and thank you for those!) and I guess we’d need to start making this proposal more concrete in order to invite broader feedback, etc. ? What do you think would be good next steps for this discussion ?

kim · April 11, 2022, 7:55am

Well I think you might want to start designing the schema, which will help answering the remaining questions.

I like to do this in some kind of typed pseudo-code, so as to more concisely capture the desired semantics. Here’s the simplest thing I could come up with:

struct Releases {
    /// The project we're talking about
    urn: Urn,
    /// Releases are an ordered list
    releases: Vec<Release>,
}

/// A point in the git history which shall
/// be tagged as a release after it was 
/// approved by some number of collaborators.
struct Release {
    /// The commit hash to be released
    commit: Oid,
    /// Name of this release, eg. a version number
    name: String,
    /// Some arbitrary blurb, eg. release notes
    description: String,
    /// The signing obligations to render this 
    /// release valid
    valid_when: Set<Either<PublicKey, Urn>>,
    /// The actual signoffs, initially empty
    signed_off_by: Set<Sob>,
}

struct Sob {
    /// URN of the person signing, optional
    urn: Option<Urn>,
    /// The actual key used
    key: PublicKey,
    /// Signature over the `commit` hash of 
    /// the `Release`
    sig: Signature
}

I’m not sure if it is clear, so I’m just going to reiterate: a cob / CRDT is just a datastructure. Its properties give us an ordering (of edits), and we can guarantee that it conforms to the schema. The rest is up to an application written to interpret this data.

From the above, I think it’s easy to see that we can synthesize a git tag which includes the description as well as the set of sobs. Since git does not have a native way to express “multisigs”, this tag would be signed by whoever creates it, and some custom tooling would be necessary to allow verifying the sobs knowing only the git history. I would suggest just encoding the sobs as git trailers, and verifying the signatures element-wise.
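As a rough sketch of the trailer idea, the following renders a tag message from a release name, description and sobs, and parses the sobs back out. Everything here is an assumption for illustration: the trailer token Release-Signoff, the hex encoding, the "key sig [urn]" value layout, and the use of raw bytes in place of real key and signature types. Actual signature verification is left out, since it needs a crypto library.

```rust
/// A signoff as in the `Sob` pseudo-code, with raw bytes standing in
/// for real key and signature types (an assumption of this sketch).
#[derive(Debug, PartialEq)]
struct Sob {
    urn: Option<String>,
    key: Vec<u8>,
    sig: Vec<u8>,
}

/// Hypothetical trailer token; not an established convention.
const TRAILER: &str = "Release-Signoff";

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn unhex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Render the tag message: subject, body, then one trailer line per sob.
fn tag_message(name: &str, description: &str, sobs: &[Sob]) -> String {
    let mut msg = format!("{}\n\n{}\n\n", name, description.trim_end());
    for sob in sobs {
        msg.push_str(&format!("{}: {} {}", TRAILER, hex(&sob.key), hex(&sob.sig)));
        if let Some(urn) = &sob.urn {
            msg.push(' ');
            msg.push_str(urn);
        }
        msg.push('\n');
    }
    msg
}

/// Read the sobs back from the last paragraph of a tag message.
/// Lines that are not well-formed signoff trailers are skipped.
fn parse_trailers(msg: &str) -> Vec<Sob> {
    let last = msg.trim_end().rsplit("\n\n").next().unwrap_or("");
    last.lines()
        .filter_map(|line| {
            let value = line.strip_prefix(TRAILER)?.strip_prefix(": ")?;
            let mut parts = value.split(' ');
            let key = unhex(parts.next()?)?;
            let sig = unhex(parts.next()?)?;
            let urn = parts.next().map(str::to_owned);
            Some(Sob { urn, key, sig })
        })
        .collect()
}
```

A verifier would then check each parsed sob's signature against the tagged commit hash, one element at a time, independently of the signature on the tag itself.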

A few things of the above could be refined, or require further consideration, eg.

Whenever a URN is mentioned, should it also refer to its revision at the time of creation? This can serve as an optimization, but also protect against rewrite attacks.
The valid_when set can obviously be modified after creation. This is either a case for @alexgood's ACL language, or the application needs to commit to a semantics (eg. first-writer-wins).
How to express that the release has been “finalized”? Does the tag refer to the cob, or vice versa, or both?
When there is more than one signature, there are various ways in which such a release object could be in some kind of partially-valid state. This is an opportunity to come up with an improved verification UI (which shouldn’t be too hard if git / GPG is the benchmark).

yorgos · April 26, 2022, 7:20am

thanks for putting the draft together @kim !

I wonder if this should be an ordered list. (e.g. what if 1.2.0 has been released, then 1.3.0 and then we need to ship 1.2.1 with a hotfix). Could this be an unordered list and a date be added to the Release struct perhaps? (ordering could also be an A-Z / Z-A ordering of the release name as well)

I am not sure I fully understand the rewrite attack here? Perhaps an example would help?

I think first-writer wins makes sense here and the application should enforce it (i.e. ignore changes made to valid_when).
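For illustration, a minimal sketch of what "the application enforces first-writer-wins" could look like, assuming the cob hands the application an already ordered sequence of edits. The Edit enum and its variants are hypothetical and do not reflect an actual cob API; signers are plain strings here.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
struct Release {
    commit: String,
    valid_when: BTreeSet<String>,
    signed_off_by: BTreeSet<String>,
}

/// Hypothetical edits, in the order the CRDT yields them.
enum Edit {
    Propose { name: String, commit: String, valid_when: BTreeSet<String> },
    SetValidWhen { name: String, valid_when: BTreeSet<String> },
    SignOff { name: String, key: String },
}

/// Small helper to build a set of signer names.
fn keys(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|n| n.to_string()).collect()
}

/// Fold the edits into the application's view of the releases.
fn apply(edits: impl IntoIterator<Item = Edit>) -> BTreeMap<String, Release> {
    let mut releases: BTreeMap<String, Release> = BTreeMap::new();
    for edit in edits {
        match edit {
            Edit::Propose { name, commit, valid_when } => {
                // First writer wins: a later proposal under the same name
                // does not replace the existing one.
                releases.entry(name).or_insert(Release {
                    commit,
                    valid_when,
                    signed_off_by: BTreeSet::new(),
                });
            }
            // First writer wins: later changes to `valid_when` are ignored.
            Edit::SetValidWhen { .. } => {}
            Edit::SignOff { name, key } => {
                // Signoffs for unknown releases are dropped.
                if let Some(release) = releases.get_mut(&name) {
                    release.signed_off_by.insert(key);
                }
            }
        }
    }
    releases
}
```

Note that in this sketch the data structure still records the ignored edits; only the interpretation discards them, which matches the point that the rest is up to the application.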

It seems to me that the multi-sig part is how a release is “finalized” (the humans say so). Unless I am misunderstanding the question, it seems to me the cob referring to the “tag” makes sense. (well, s/tag/commitid/ because git tags can be moved to a different commit hash and I’m not sure we want that, right?)

Sounds like you’re referring to the application that displays / lists the releases, right? If so, I would also expect some kind of status for each Release object it displays that explains whether it satisfies the valid_when constraints.
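Such a per-release status could be computed roughly as follows. This is a sketch under several assumptions: keys, URNs and signatures are plain strings, signature verification is passed in as a closure (real code would call a crypto library), and a sob's claimed URN is taken at face value rather than checked against the identity document.

```rust
use std::collections::BTreeSet;

/// Stand-in for `Either<PublicKey, Urn>` from the pseudo-code.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Signer {
    Key(String),
    Urn(String),
}

struct Sob {
    urn: Option<String>,
    key: String,
    sig: String,
}

struct Release {
    commit: String,
    valid_when: BTreeSet<Signer>,
    signed_off_by: Vec<Sob>,
}

#[derive(Debug, PartialEq)]
enum Status {
    /// Every obligation in `valid_when` has a verified signoff.
    Satisfied,
    /// Some obligations are still open.
    Pending { missing: BTreeSet<Signer> },
}

/// `verify(key, commit, sig)` must return true only for a valid
/// signature by `key` over `commit`.
fn status(release: &Release, verify: impl Fn(&str, &str, &str) -> bool) -> Status {
    let mut missing = release.valid_when.clone();
    for sob in &release.signed_off_by {
        // Signoffs with invalid signatures do not count.
        if !verify(sob.key.as_str(), release.commit.as_str(), sob.sig.as_str()) {
            continue;
        }
        missing.remove(&Signer::Key(sob.key.clone()));
        // Assumption: the claimed URN is trusted here. A real check must
        // confirm that the key belongs to that identity.
        if let Some(urn) = &sob.urn {
            missing.remove(&Signer::Urn(urn.clone()));
        }
    }
    if missing.is_empty() {
        Status::Satisfied
    } else {
        Status::Pending { missing }
    }
}
```

Returning the set of missing signers, rather than a plain boolean, gives a UI something concrete to display for a partially signed release.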

kim · April 27, 2022, 7:37am

Maybe Map<String, Release>.
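A small sketch of how a reader could derive an ordering from such a map, assuming release names are dotted numeric versions like 1.2.1. The helper names are hypothetical. It also shows why plain string ordering of the keys is not enough: "1.10.0" sorts before "1.2.0" lexicographically.

```rust
use std::collections::BTreeMap;

/// Parse a dotted numeric name such as "1.2.1" into comparable parts.
/// Returns `None` if any component is not a number.
fn parse_version(name: &str) -> Option<Vec<u64>> {
    name.split('.').map(|part| part.parse().ok()).collect()
}

/// Release names ordered by version. Names that do not parse as a
/// version come last, in lexicographic order.
fn by_version<R>(releases: &BTreeMap<String, R>) -> Vec<&str> {
    let mut names: Vec<&str> = releases.keys().map(String::as_str).collect();
    names.sort_by_key(|name| match parse_version(name) {
        Some(version) => (0, version, *name),
        None => (1, Vec::new(), *name),
    });
    names
}
```

With this, a hotfix 1.2.1 proposed after 1.3.0 still lists between 1.2.0 and 1.3.0, regardless of the order in which the entries were added to the map.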

If we refer to a signer by URN, there is no causal relationship to their “sigchain”. So, we could be presented with a signing key which is only in a forked history of that identity. Or, the key was revoked at some point, but we don’t know if that was before or after the release was signed with it. We can greatly simplify these validation obligations by referring to (Urn, Revision) instead.
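To illustrate why pinning helps, here is a minimal sketch, assuming the verifier has some view of each identity's history as a map from revision to the keys delegated at that revision. The types and names are hypothetical and strings stand in for URNs, revisions and keys. With a pinned (Urn, Revision), the check becomes a plain lookup instead of a question about which history is current.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical view of one identity: the keys delegated at each revision.
struct IdentityHistory {
    revisions: BTreeMap<String, BTreeSet<String>>,
}

/// A signer pinned to a specific revision of its identity.
struct PinnedSigner {
    urn: String,
    revision: String,
}

/// True if `key` is delegated by the identity at exactly the pinned
/// revision. Unknown identities or revisions (for example, a revision
/// that exists only in a fork we have not seen) are rejected.
fn key_is_authorized(
    identities: &BTreeMap<String, IdentityHistory>,
    signer: &PinnedSigner,
    key: &str,
) -> bool {
    identities
        .get(&signer.urn)
        .and_then(|identity| identity.revisions.get(&signer.revision))
        .map_or(false, |keys| keys.contains(key))
}
```

A key that was revoked in a later revision still validates for a release pinned to the earlier one, and a key that appears only in some other revision does not, so the before-or-after question no longer arises.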

For the project itself there is a connection made every time a release is finalized, which is why it would be useful to include the hash of the release object in the tag.

(Obviously, we cannot have the tag hash in the release object then without modifying it. So it would not be possible to know if a release was finalized by looking only at the release object. I guess that’s fine)