Thanks @mmassi, many good points!
Hm, is it? As long as the security of the system rests on a single, eternally
valid keypair, I have difficulty seeing how anything is solved. Key theft, in
particular, is not addressed at all.
There are a few different considerations which led me to the conclusion that
both inlining and indirection should be supported:
- **Pseudonymity**: It is a desirable property of the system that the user can
  choose in which context they want to reveal which information about
  themselves. Using an identity for only one, a subset, or all projects the
  user is participating in should be the user’s choice.
- **Interoperability and Adoption**: We have multiple avenues of
  interoperability with other systems, including plain old git storage
  hosting. It is desirable to be able to try out the system without forcing a
  concerted migration effort. Inlining removes the requirement to be able to
  resolve a URI-based indirection.
- **Consistency and Availability**: There are good reasons to optimise for
  availability of all information within the context of one project, and also
  to optimise for cross-project consistency at the expense of availability. I
  think the tradeoff should be made at a higher level.
- **Uniformity**: At its core, `radicle-link` is simply a fancy way to
  distribute and discover version control repositories on a network. That is,
  the only way data can exist is in the context of a repository. The concept
  of a “project” provides a way to address and correlate repositories. I am
  not at this point convinced that introducing another primitive reduces the
  overall complexity or indirection.
I agree, however, that there is no reason an identity statement needs to bind
itself to any repository (or project) in particular; we only require the other
direction: a project binds one or more identities. Since we can index into a
repository via branches, the identity statement may in fact exist under
multiple project namespaces.
My first idea was also to use a public key. I discarded this because of the
following reasoning: what is this keypair used for?
- **Nothing but establishing the user ID.** This would mean we can throw away
  the secret key after generation. We could just as well use a random sequence
  of bytes, or a UUID, in this case.
- **Signing updates to the user profile (including other keys).** This is
  basically how GPG works, if used properly. The key now has to be kept in
  secure offline storage, which is terrible because people don’t do that, for
  convenience reasons.
- **Signing code.** That’s the worst option, because a) it encourages key
  transfer, b) the key is used all the time, increasing vulnerability, c)
  revocation changes the identity, and d) key rotation requires a
  web-of-trust, or otherwise an external PKI which we’d either be tightly
  coupled to or have to replicate within our system.
In summary, I think the only property we need for a “user ID” is that it is some
kind of stable identifier, paired with some rules for how to resolve it. What it
resolves to should indeed be some kind of document which contains claims of
properties of the described subject. Inlining proofs of those properties
should not generally be necessary, but possible for certain application-level
needs, e.g. a two-way attestation of the on-chain identity or of a GPG key also
present on an external PKI. Note that this is essentially a convenience: we
could also require an online challenge against those external systems to be
performed in order to obtain ownership proofs.
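For concreteness, here is a minimal sketch of what such an identity document
could contain; the type and field names are assumptions for illustration, not
an actual radicle-link schema:

```rust
/// A minimal sketch of the identity document (illustrative only).
#[derive(Debug)]
struct IdentityDoc {
    /// Claims about the subject (display name, email, links to external
    /// profiles, ...).
    claims: Vec<Claim>,
    /// Public keys of the devices allowed to sign revisions of this document
    /// (hex-encoded in this sketch).
    device_keys: Vec<String>,
    /// Optional inlined proofs, e.g. a two-way attestation of an on-chain
    /// identity or of a GPG key present on an external PKI.
    inline_proofs: Vec<Proof>,
}

#[derive(Debug)]
struct Claim {
    property: String,
    value: String,
}

#[derive(Debug)]
struct Proof {
    /// The external system the proof refers to, e.g. "gpg".
    system: String,
    /// Opaque proof material as produced by that system.
    payload: String,
}
```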
What we do need, however, is ownership proofs of those claims themselves (and by
extension the user ID, if we think of it as a content address of the document).
This is where the device keys come into play.
We can go into more detail as to why they exist, but for now I would only like
to point out the following: TLS ensures that a remote machine is operated by a
certain legal entity, and that the data it serves is not altered while in
transit between the remote machine and ours. It does not prove anything about
the data itself: if we put a piece of data on the server and later request it
back, we would like to get the same sequence of bytes, but for that we either
have to trust the remote machine and its operator, or put our own integrity
protection in place. The latter is exactly what we need in a peer-to-peer system,
because we’re talking to intermediaries most of the time. Thus, we repurpose the
origin certificate of TLS to also prove authorship, couple it with an integrity
proof, and require all intermediaries to present this along with the data. This
is why we require a radicle peer to provide a signature over `refs/heads`.
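As a rough illustration of that last point, the sketch below signs a canonical
serialisation of a peer’s `refs/heads` advertisement with a device key
(assuming the `ed25519-dalek` 1.x and `rand` 0.7 crates; the serialisation
format is made up for the example, not the actual wire format):

```rust
use ed25519_dalek::{Keypair, Signature, Signer, Verifier};
use rand::rngs::OsRng;

/// Produce the byte sequence to sign: one "<oid> <refname>\n" line per
/// advertised ref (illustrative format only).
fn canonical_refs(refs: &[(&str, &str)]) -> Vec<u8> {
    refs.iter()
        .map(|(name, oid)| format!("{} {}\n", oid, name))
        .collect::<String>()
        .into_bytes()
}

fn main() {
    // The device key lives on this machine only and is never transferred.
    let mut csprng = OsRng {};
    let device_key = Keypair::generate(&mut csprng);

    // Placeholder refs and object IDs for the example.
    let refs = [
        ("refs/heads/master", "0123456789abcdef0123456789abcdef01234567"),
        ("refs/heads/dev", "89abcdef0123456789abcdef0123456789abcdef"),
    ];
    let payload = canonical_refs(&refs);

    // A peer presents this signature alongside the refs it advertises, so
    // intermediaries can relay the data without being trusted.
    let sig: Signature = device_key.sign(&payload);
    assert!(device_key.public.verify(&payload, &sig).is_ok());
}
```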
Now, bear with me, we’re conflating the concept of a “server” and that of a
“user” because, simplified, git commits don’t commute (i.e. order matters). We
thus need to defer conflict resolution to either a central timestamping service
(which we don’t want), or to humans on the read end (which is actually a normal
thing to do when working with version control). Hence we keep all data, and
decide what to do with it at the very end.
Alright. So if a user and a server are basically the same thing, and a user
also wants to claim certain properties about themselves, it seems plausible to
use those
device keys to sign them. We get for “free” that the security of these claims
increases with the number of devices a user owns (in the sense that it becomes
more difficult to take over the identity, which is an effective countermeasure
against key theft), a revocation mechanism, and in some sense a web-of-trust via
third-party attestation in the context of projects.
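To make the “more devices, harder takeover” point a bit more concrete, here is
a minimal sketch of a quorum check over device-key signatures (using
`ed25519-dalek` 1.x; the majority rule and function shape are assumptions, not
radicle-link’s actual policy):

```rust
use ed25519_dalek::{PublicKey, Signature, Verifier};

/// A claim (or a new revision of the identity document) counts as valid if a
/// majority of the user's device keys produced a valid signature over it.
/// The more devices a user owns, the more keys an attacker must steal.
fn quorum_signed(
    payload: &[u8],
    device_keys: &[PublicKey],
    signatures: &[Signature],
) -> bool {
    let valid = device_keys
        .iter()
        .filter(|key| signatures.iter().any(|sig| key.verify(payload, sig).is_ok()))
        .count();
    valid > device_keys.len() / 2
}
```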
Note that this does not protect any other keys you might be using against
compromise. I’d argue that this is perfectly fine – if you rely on other
cryptosystems, you also need to use their validation methods. We don’t have to
support them within `radicle-link`. What we could do, however, is to teach git
to use radicle device keys for code signing, which is not only easy (cf.
`gpg.program` in `git-config(1)`), but incidentally also how one would properly
use GPG (I mean, using device-local keys only), yet without the terrible UX.
The drawback would be that the project would again have to approve all devices
of a user, a requirement we just got rid of.
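For illustration only, the wiring could be as small as a git configuration like
the following, where `rad-sign` is a hypothetical wrapper that speaks git’s
expected GPG interface and signs with the local device key:

```
# ~/.gitconfig — hypothetical; "rad-sign" is an assumed wrapper, not a real tool
[gpg]
    program = rad-sign
```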
Are you still with me? Cool. So, I think: yes @fintohaps, DID is the right
approach, if not in letter, then in spirit. We would simply extend it with a
“method” which applies only to our keys.
The question remains, how do we construct the “user ID”?
The obvious choice would be a hash over the initial revision of the identity
document, although this creates the nuisance that the document cannot refer to
itself. I’m not sure how much of a problem this is in practice, though.
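A minimal sketch of that derivation, assuming a SHA-256 content hash over the
bytes of the initial revision (a git object ID would serve equally well, and
the resulting string could double as the method-specific part of a DID):

```rust
use sha2::{Digest, Sha256};

/// Derive the stable user ID as a content hash over the bytes of the initial
/// revision of the identity document. The hex encoding is an illustrative
/// choice; a multihash or git object ID would work just as well.
fn user_id(initial_revision: &[u8]) -> String {
    let digest = Sha256::digest(initial_revision);
    digest.iter().map(|b| format!("{:02x}", b)).collect()
}
```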
Now, we need to be able to refer to this ID, and resolve it on the network, so
let’s work backwards from this. We could (each option is rendered concretely in
the sketch after this list):
- Use this ID as the repository name, and invent a naming convention to
  distinguish it from projects, e.g. `<subject>.rid.git`. `subject` in this
  case would simply be the commit hash of the initial revision.
- Bind it to a specific project ID, e.g. `<project>.git/<subject>.rid`. Here,
  using a content (blob) hash seems wiser, as copying the file to a different
  project would otherwise alter its identity. To preserve
  content-addressability, we would need to encode this hash in the branch name
  for git, e.g. `refs/heads/.rad/<subject>.rid`.
- Don’t mention the repository at all, and just say `<subject>.rid`. This
  would also suggest a content hash, or something else entirely (see below).
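To make the three layouts concrete, here is how each might render as a name;
all formats are illustrative assumptions, not decided conventions:

```rust
/// Option 1: a dedicated repository per identity, named after the commit
/// hash of the initial revision.
fn option_one(subject_commit: &str) -> String {
    format!("{}.rid.git", subject_commit)
}

/// Option 2: the identity document lives inside a project, addressed by its
/// blob hash and encoded into a branch name to preserve content-addressability.
fn option_two(project: &str, subject_blob: &str) -> String {
    format!("{}.git/refs/heads/.rad/{}.rid", project, subject_blob)
}

/// Option 3: a repository-agnostic identifier.
fn option_three(subject_blob: &str) -> String {
    format!("{}.rid", subject_blob)
}
```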
The first option is straightforward to resolve on the network, but creates two
problems: it precludes using this repository for anything other than storing
the identity document (or else the semantics of projects vs. identities become
unclear), and we need to maintain and inspect remote-tracking branches for
every device which signs the document.
The second option is also straightforward, but since the identity document does
not say anything about the project, the subject may appear under more than one
project. In addition, every device must be tracked.
The third option abstracts away the repository backend, but then we need a
custom wire format to transmit the history, and in addition provide a mapping to
materialise that into a repository format for the owner to edit. Or invent a
custom editable, transmittable, and history-preserving format altogether.
Of these, I’d still favor the second option, because it does not change
anything about the discovery and replication protocol we already employ. Also,
project-relative and absolute references are both possible (well, if we allow
`$self` in a project metadata document). Since the identifier is stable, it is
trivial to maintain a persistent index locally in order to determine the most
recently seen revision across all locally-tracked projects.
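Such an index could be as simple as the following sketch; the types are
hypothetical, and persistence to disk as well as the actual ancestry check over
the document’s history are omitted:

```rust
use std::collections::HashMap;

/// The most recent revision of an identity document we have seen, and the
/// project under which we observed it (names are illustrative).
#[derive(Clone, Debug)]
struct SeenRevision {
    commit: String,
    project: String,
}

/// Maps the stable user ID to the latest revision seen across all
/// locally-tracked projects.
#[derive(Default)]
struct IdentityIndex {
    latest: HashMap<String, SeenRevision>,
}

impl IdentityIndex {
    /// Record an observed revision. A real implementation would first check
    /// that `rev` descends from the currently recorded one; here we simply
    /// overwrite.
    fn observe(&mut self, user_id: &str, rev: SeenRevision) {
        self.latest.insert(user_id.to_string(), rev);
    }

    fn latest_for(&self, user_id: &str) -> Option<&SeenRevision> {
        self.latest.get(user_id)
    }
}
```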