After thinking about this for a while, I’m no longer sure I understood it correctly. I can see two distinct cases:
Eventual Consistency
The key is revoked outside the project I’m currently looking at, and I haven’t received the revocation yet. I’ll happily accept all signatures as valid, even if they were made after the revocation timestamp. If the key was stolen, the attacker could have made signatures past the revocation point. The key owner could have too, but has no reason to do so.
Intentional Backdated Revocation
I somehow have reason to believe my key was stolen, and also have an idea about the timeframe when this may have happened. So I fabricate a revocation in which I state that it should be valid from some point in the past. I, as well as the attacker, could have created signatures past that point.
In both cases, we would need to re-validate the history of a project from the revocation point onwards.
Now, what do we actually do if we find signatures made by the revoked key?
Artifacts signed by the key might have been signed already by other, valid keys, so simply hard-resetting the git history to the revocation point is not an option. Since the device key is tied to the PeerId, which is how we address remotes, the network should stop replicating the respective repo, though, and remove it from the list of remotes entirely (which may remove some of the data from the object database, but not all).
Assuming the revoker is actually using multiple keys on multiple hardware devices, it should be very hard for an attacker to gain control of all of them, and issue a revocation themselves. So, the damage is limited – provided one or more of the other keys was also used regularly: anything which was not also signed by others will be lost, and the victim will essentially need to rewrite all history with valid signatures. I don’t think there’s anything we can do about this – shit happens.
For artifacts which ended up in other peers’ histories (and are thus implicitly or explicitly signed by them), there is also not much we can do except surface the offending commits or blobs to the user, so they can decide whether or not the data is actually compromised and take appropriate steps.
In conclusion, I think the proper measure against case 1 (eventual consistency) is to inline the identity documents within the project: there is simply no way signatures can appear after the revocation point, as the remote will disappear afterwards.
Defending against key compromise which goes undetected for some period of time is hard, and I am not aware of a system which actually solves this. I guess the best approximation is to bake in expiry into the keys, so they need to be renewed (rotated) after a while. Backdating a revocation seems safe under the assumption that it is hard to compromise the revocation itself, and otherwise a catastrophic design flaw.
I don’t see these two cases as separated, but as the same thing from two different points of view: the key owner and other users receiving artifacts signed with a key of his.
To reason about it, let’s build a timeline:
User U1 has keys K1, K2 and K3.
On day 1 at noon he signs artifact A1 with key K1.
At midnight his private key K1 is stolen.
On day 2 in the morning he signs artifact A2 with the compromised key K1.
Then, at noon, he recognizes that the key has been (or could have been) compromised during the night; he chooses midnight as the “turning point” and revokes K1 from that moment on.
He then needs to re-sign A2 using one of his other valid keys (either K2 or K3); if A2 is part of a history, that history will need a partial rewrite (starting from midnight).
Note that here I talk about signatures that are in some way embedded inside the history of a project.
These could be git commit signatures, or “collaboration artifact” signatures embedded inside the artifact itself.
Radicle device keys can be used for this purpose, as signatures that represent the user identity, but they do not need to: a user can use any key stored in his user.json file to sign his artifacts.
Device keys must be used to sign the “contents” of a device (the git branches stored inside its git repository), because this makes the P2P transmission reliable: I can get the contents of a device D1 through a second device D2 and I can still see and check the original D1 signature that proves that those artifacts come from D1.
But this “origin proof” certified by device keys is at the “transport” level, while, I repeat, a user can use any key stored in his user.json file to sign the artifacts themselves.
When a device key is compromised the sensible action is to remove the compromised device from the network, possibly replacing it with a new one identified by a different key (maybe on the same hardware, but logically a new device).
The effect is that eventually every P2P device will purge the branches of the compromised device from its storage, and it will cease to exist.
But the goal here is to avoid full history rewrites of the signed artifacts.
So, if in the example above key K1 was a device key, for every other user the remote branches referring to K1 will be removed, but artifact A1 should be retained (through other devices) because its signature is valid (it happened before the revocation).
On the other hand, artifact A2 signed by K1 cannot be trusted, which means that it should not be merged into other “official” branches.
These branches should, over time, accept the new A2 signed with another key, and if they had accepted revoked K1 signatures their histories should be partially rewritten.
This poses timing issues, in the sense that revocations can be retroactive but they should not point too far back in time otherwise the resulting history rewrites might not be practical.
I cannot grasp this, and my intuition is that two distinct signature kinds are conflated while they should be kept distinct:
“transport level” signatures, done with device keys automatically and working as a certification that data comes from a given device.
“artifact level” signatures, embedded in artifacts or in the git history as commit signatures, used as certification that data was authored by a given user.
Both kinds of signatures move from device to device, but at different levels.
When a device key is revoked the device ceases to exist (with all the remote branches that refer to it) but “artifact” signatures done with that key before the revocation should be retained inside every history that carries them.
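To make the distinction concrete, here is a minimal sketch of what decommissioning a device could mean for local storage. The `Storage` class and its layout (a map from device key to the artifact ids reachable from that device's branches) are assumptions made for illustration, not how radicle actually stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    # Hypothetical layout: device key id -> ids of the artifacts reachable
    # from the remote branches that track this device.
    remotes: dict = field(default_factory=dict)

    def decommission(self, device_key):
        """Drop every branch of a revoked device.

        Returns the artifact ids that are no longer reachable through any
        other device. Everything else stays in the object database.
        """
        dropped = self.remotes.pop(device_key, set())
        still_reachable = set().union(*self.remotes.values())
        return dropped - still_reachable


storage = Storage({"K1": {"A1", "A2"}, "K2": {"A1"}})
lost = storage.decommission("K1")
```

In this example A1 survives because it is also reachable through the device identified by K2, while A2 was only ever published by the revoked device and disappears with it. Whether a surviving artifact-level signature is still trusted is a separate question, which depends on the revocation point.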
They are distinct in the sense that they pose different threats: if an attacker can create a backdated revocation, they can invalidate the entire history of the victim. Otherwise, only the signatures between the time the key was compromised and the revocation are affected.
The point here is that it is not only the history of U1 which needs to be rewritten, but potentially other users’ histories as well, as they might have included the signed artifact(s) in their histories and attested them using their own keys. In that sense, re-signing is mostly meaningless, as we cannot have a proof that the victim has diligently gone through all the artifacts and confirmed that they were not created by the attacker – that’s why I conclude that the post-mortem procedures are ultimately up to the humans, we can only provide clues as to which artifacts might be compromised (which is already better than what you can do today).
Sure… but again, it violates the first rule of SecOps if secret key material leaves the device it was created on. If you use a “global” signing key, the consequences of key compromise will be more disastrous than using device-local keys. If the signature keys are tied to the network identity, there is a chance that revocation will retract compromised data from the network before it gets attested by others. If it already is in others’ histories, the crisis team must be called in.
Yeah true, we can prevent merge operations – if we have control over them (which is not true for source code branches, unless we also intercept the normal git CLI).
Well, I’m not sure they should be kept distinct, for the reasons already outlined: if a key can travel from device to device, then if it is compromised, all devices are effectively compromised.
In order for an attacker to create a revocation they need to have compromised a majority of the user keys.
If they did then we are already in the “game over” scenario in which the whole identity has been compromised and there’s no way at all to distinguish actions performed by an attacker from actions performed by the legitimate user.
The problem here is not a malicious revocation but something much bigger.
I would leave a description of how to handle this extreme scenario to a different thread.
For now I’d stick to the regular case of “just one key has been compromised”.
In this case the user creates the revocation, which is simply a new user.json revision that:
Excludes the revoked key.
Is signed by all the remaining keys (which of course are still valid).
Could even add new keys, still abiding by the quorum rules.
Specifies the point in time from which the revocation occurs.
The attacker that has stolen the key cannot do this.
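A rough sketch of the check a peer could run on such a revision follows. The field names are hypothetical, signature verification itself is not shown (`signed_by` is assumed to contain only key ids whose signatures already verified), and the quorum rule is assumed to be a simple majority of the previous key set.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class IdentityRevision:
    keys: frozenset                      # key ids advertised in this user.json revision
    signed_by: frozenset = frozenset()   # key ids with a verified signature over the revision
    revoked: frozenset = frozenset()     # key ids this revision revokes
    revoked_from: Optional[datetime] = None  # point in time from which the revocation applies


def accepts_revocation(previous, proposed):
    """Decide whether `proposed` is an acceptable revocation on top of `previous`."""
    if not proposed.revoked or proposed.revoked_from is None:
        return False
    # Only keys of the previous revision can be revoked, and they must be gone.
    if not proposed.revoked <= previous.keys or proposed.revoked & proposed.keys:
        return False
    # Signatures made with a key that is being revoked do not count.
    signers = (proposed.signed_by & previous.keys) - proposed.revoked
    # Assumed quorum rule: strict majority of the previous key set.
    return 2 * len(signers) > len(previous.keys)


previous = IdentityRevision(keys=frozenset({"K1", "K2", "K3"}))
by_owner = IdentityRevision(
    keys=frozenset({"K2", "K3", "K4"}),
    signed_by=frozenset({"K2", "K3"}),
    revoked=frozenset({"K1"}),
    revoked_from=datetime(2020, 1, 2),
)
by_attacker = IdentityRevision(
    keys=frozenset({"K1"}),
    signed_by=frozenset({"K1"}),
    revoked=frozenset({"K2", "K3"}),
    revoked_from=datetime(2020, 1, 1),
)
```

With three keys, the owner's revision (signed by K2 and K3) passes, while an attacker holding only K1 cannot reach the majority.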
There’s a difference between:
1. a fully compromised device,
2. a device that carries artifacts with invalid signatures that can be detected,
3. a device that carries artifacts signed by a malicious actor that cannot be detected.
If a device key is compromised that device falls into case 1.
It must be decommissioned and every remote branch that tracks it must be removed entirely.
When removing those branches nobody should look at their histories, they must be fully erased so that the compromised device ceases to exist as a device.
However, artifacts signed with that device key will still exist in the project history.
Some of those artifacts are still valid because they occurred before the key compromise.
Other artifacts should, ideally, move from case 3 to case 2, and then they should be removed by means of a partial history rewrite.
Before getting to the history rewrite let’s see why I am describing this passage from state 3 to state 2.
The key here is the moment in time when the revocation is created (T-CRE) and propagated vs the moment in time from which the revocation applies (T-APP, which should be the same time as the compromise, or a bit earlier for safety).
In the timeline above T-APP is midnight and T-CRE is noon of day 2.
In the time interval from T-APP to T-CRE some artifacts are in state 3: they are potentially compromised.
They could have been signed by the attacker, using the stolen key, or by the legitimate user, using the same key because the compromise was still undetected.
And these artifacts can spread and be merged into other histories: state 3 is just a logical state in this description but by definition it cannot be detected in practice.
At time T-CRE the revocation is created and the P2P gossip starts spreading it.
This transitions those artifacts into state 2 above: they are signed using an invalid (revoked) key.
This is a situation that can be detected automatically (after T-CRE, of course!).
We should provide tooling so that each history can be walked from time T-APP to T-CRE looking for invalid artifacts so that the user can decide what to do.
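Such a walk could look roughly like the sketch below, which replays the timeline from earlier in the thread. The `Artifact` and `Revocation` types and the concrete dates are made up for the example, and the signing times are self-reported, so the result is only a list of candidates for a human to review.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Artifact:
    name: str
    key_id: str
    signed_at: datetime  # self-reported by the signer, so only a hint


@dataclass(frozen=True)
class Revocation:
    key_id: str
    applies_from: datetime  # T-APP
    created_at: datetime    # T-CRE


def suspects(history, revocation):
    """Artifacts signed with the revoked key at or after T-APP.

    Anything signed with that key after T-CRE is flagged as well, since the
    key is no longer valid by then.
    """
    return [
        a for a in history
        if a.key_id == revocation.key_id and a.signed_at >= revocation.applies_from
    ]


history = [
    Artifact("A1", "K1", datetime(2020, 1, 1, 12)),  # day 1, noon
    Artifact("A2", "K1", datetime(2020, 1, 2, 9)),   # day 2, morning
]
revocation = Revocation(
    key_id="K1",
    applies_from=datetime(2020, 1, 2, 0),   # midnight
    created_at=datetime(2020, 1, 2, 12),    # day 2, noon
)
flagged = [a.name for a in suspects(history, revocation)]
```

Here only A2 is flagged. Before the revocation is known to a peer there is nothing to scan for, which is exactly why state 3 cannot be detected in practice.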
Probably the ideal thing would be that the owner of the compromised key uses these tools themselves, on their own branches, just after creating the revocation.
For each artifact they signed with the compromised key they should validate it and re-sign it with a valid key.
This is a history rewrite, and it will be propagated by the P2P network as every other history rewrite.
The fact that it happened because of a key revocation is likely irrelevant.
What happens is that the owner of the compromised key “changed their mind” and partially rewrote the history.
I repeat, the fact that the rewrite consists of changed signatures is incidental.
Every device that tracks that history will see the rewrite (through valid devices: the invalid one has been removed entirely) and import it into its remote tracking branches as is.
Then each tracking user will merge it into their own master branch as they see fit.
As @kim said perhaps this is not ideal but it seems vastly better than the status quo.
I agree with that SecOps principle, but I would like radicle to encourage best practices and implement them by default, without strictly mandating them.
Here I am reasoning in the general case because we should be able to handle the general case.
And I still see a difference between device keys and “other” user keys.
Let’s imagine a user with radicle devices (each with its own key) on their notebook, phone, tablet, home desktop, office desktop, and home file server.
Let’s say that each of his devices is active on the radicle P2P network and hosts at least the “user project” (the one storing the user.json file and defining the user identity).
I could also imagine this user having a personal key on another device which is a pure key storage (like a YubiKey or something similar) and therefore cannot be a “radicle device”.
IMHO this user, when working on a device, should be free to sign their artifacts either with the device key or with the YubiKey personal key, at their discretion.
In principle they could use any key they want, however unreasonable that might look to us.
As long as the key is advertised in their user.json, radicle should work properly.
The difference between a device key and another user key is that device keys are automatically used by radicle itself to sign branch heads so that they can be transferred from device to device and still carry a proof that they originate from a given device.
I see this automatic signature performed by a radicle device as the equivalent of the signature performed by an HTTPS server at the transport level.
These automatic “transport level” signatures are different from the “artifact level” signatures performed by the user and stored inside the artifacts themselves.
The “transport” signatures disappear when a device is decommissioned.
On the other hand the “artifact” signatures are embedded in the artifacts and the only way to remove them is through (partial) history rewrites.
We can make it so that radicle, by default, uses device keys to perform artifact level signatures, because this is the most sensible default.
But I still see a clear distinction between the two kinds of signatures.
We can, btw, also choose to not solve this problem for artifact signatures — which goes full circle to my original thought process.
So:
TUF secures devices.
Devices have metadata, which indicates something about their owners (they are single-user devices).
That something includes one or more public keys the user intends to use for signing things.
Verifying the validity of these keys is up to the application layer: we simply provide hints as to how a specific key might be validated (eg PGP, radicle-registry, bitcoin, …).
A key for which we do not have this information is invalid by default.
We derive some stable identifiers for keys (eg the hash of the public part), such that we can maintain an index into the verification info.
Signatures we control include the key id, so we can find the public key to verify against.
The way git validates commit signatures is by using GIT_AUTHOR_EMAIL and GIT_COMMITTER_EMAIL to reduce the search space in the local GPG keyring. We can do the same.
We can choose to provide a key management / signing tool with less arcane UX than GPG for people to opt out of GPG.
In this scenario, radicle-registry is one (of several possible) PKIs.
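As a small illustration of the key id and index idea from the list above: the sketch below derives the id as the SHA-256 of the public key bytes and treats a key without verification info as invalid by default. The `KeyIndex` class, the choice of SHA-256 and the free-form hint string are assumptions for the example only.

```python
import hashlib


def key_id(public_key: bytes) -> str:
    """Stable identifier for a key: here, the SHA-256 of its public part."""
    return hashlib.sha256(public_key).hexdigest()


class KeyIndex:
    """Hypothetical index from key id to (public key, verification hint)."""

    def __init__(self):
        self._entries = {}

    def advertise(self, public_key, verification_hint=None):
        # `verification_hint` stands for whatever tells the application layer
        # how to validate the key (for example "pgp" or "radicle-registry").
        kid = key_id(public_key)
        self._entries[kid] = (public_key, verification_hint)
        return kid

    def lookup(self, kid):
        """Return the public key to verify against, or None.

        Unknown keys and keys without verification info are both treated as
        invalid by default.
        """
        entry = self._entries.get(kid)
        if entry is None or entry[1] is None:
            return None
        return entry[0]


index = KeyIndex()
with_hint = index.advertise(b"public-key-bytes-1", "pgp")
without_hint = index.advertise(b"public-key-bytes-2")
```

A signature that carries `with_hint` as its key id resolves to the public key, while one that carries `without_hint` resolves to nothing and is rejected.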
Haha, ok, I think we are mostly on the same page, and are mainly debating what should be enforced on the protocol level.
What I’m not getting is how you imagine a history rewrite to happen in practice – say we’re talking about code signatures. I have signed commit C, and it got included into master of a quorum of the maintainers. Now, I revoke my key and … what now? Create C' on my own version of master, and then apply everything that came after this on top? Now the maintainers will see a diverging master, would have to audit potentially a lot of commits (which are now all attributed to me, btw), and “force-push” that new master.
I’m not sure this is practical, but I’m also lacking a solution as there is no obvious way to “multisig” past events in a git history.
The problem with an eventually consistent key revocation system is that you can’t establish a total order of events and therefore you don’t know when the revocation was issued. If you trust the timestamp, you’re effectively saying it’s at the discretion of the user or attacker to revoke whatever they want, whenever they want, and change the past.
This seems too powerful and also a little bit wrong: it allows past messages to be invalidated even in the case where there is no compromise (by the key owner), and it allows a successful attacker to not only force the user to create a new identity (in the worst case), but to invalidate everything that user has ever done.
The thing is, unless you are submitting your revocation to radicle-registry, there is no way anyone can validate the time at which the revocation was issued.
You have to submit your repo to radicle-registry, too, in order to establish a correlation of “time”. Iow, relying on wall-clock time to determine a revocation point is just not going to work, but causal history does not require a blockchain.
If you checkpoint your repo too, then even better, you can determine an order between the two. If you don’t checkpoint your repo, you would have to compare timestamps.
Hmm causal history is not enough though in my mind, because it just says “X happened after Y”, but without a blockchain, you get to choose Y however far back in the past as you want.
Well, define “blockchain” in this context. If you have a hash-linked history, you can very well establish a happened-before relationship, the problem of being able to rewrite this history is orthogonal.
I must insist also that timestamps, let alone freely chosen by the revoker, don’t seem like a good idea. Even if done by the legitimate key owner, being able to state “oh btw, everything I said since Jan 1st, 1970 was a lie. Have fun dealing with that, kthxbye!” is just not going to work.
I don’t think it is possible to solve the problem of providing a proof of the point in time a key might have been compromised, by definition. I’m also not aware of any cryptosystem that does this. The only thing you can do is to say “from now on, this key shall not be valid anymore” – but since time is not absolute, you have to either explicitly put it in the context of a local history, agree on a global history, or employ online verification akin to OCSP.
It is still unclear what the system should do after now happened. It should obviously not automatically change the past, nor do I think any serious project would do this to their master branch. If the network removes devices for which the keys have been revoked, no more harm can be done after that point in time (which is likely after now, but that’s ok). Recovering from an attack which used a compromised key to sign artifacts is, at the end of the day, in the social domain.
Actually… it is relatively cheap to synchronise disjoint (git) histories with an external timestamping service (a blockchain), by just including the most recent timestamp (block hash) you’re aware of in the local history. This way, Y can still be logically in the past, but only up to the previous acknowledged timestamp.
Intuitively, this is quite useful for forensics in case of a (suspected) breach. It still holds, though, that automated action on already confirmed artifacts is unsound. An application must also show a revoked key as such for artifacts before Y — like “key later revoked at height Y” or similar.
I am analyzing the key revocation problem with two assumptions in mind:
We operate without the blockchain side of the project.
User identities are managed in git repositories separated from code project repositories.
These assumptions are hard to deal with, but they also capture an important use case for radicle so we should have some kind of support for it.
This support does not need to be “perfect”, just “good enough”.
Users that want stronger guarantees should use the registry (blockchain), both for defining their set of trusted keys over time and eventually also for certifying code commits.
Then we would have to define what to do if their registry identity is compromised, but let’s leave this exercise to another thread.
In the same way, managing user identities and public keys in the same branch as the code history would give us causal relations between code commits and key creation and revocation, but this has two problems on its own:
It is not practical: a single user could be active on hundreds of projects, and in this scenario they would have to “mirror” (or at least “reference”) the maintenance of their user.json file on each branch of each project they are active on.
It still does not solve the issue of what users can do when they discover a key compromise after “some time”, and some artifacts have been maliciously signed using the compromised key during that time.
So, to be clear, I understand this about causal history:
My problem with causal history is that it is achievable only through a sequence of events inside the same git branch, and under these assumptions I don’t think we can have them.
This is true, but we should at least provide guidelines on how to handle that case.
Some kind of protocol that each user should follow, that we know is the “best reasonable thing to do”, and that makes each view of the project converge to a “reasonable” state in which compromised artifacts have been identified and purged or “sanitized”.
And if we could provide tooling to assist users in this, even better.
Remember that we are under the assumptions outlined above (no blockchain and user identities managed outside of code histories).
Under those assumptions radicle, as a code collaboration tool, cannot offer guarantees as strong as a blockchain, but it should still be a useful tool.
We should define what it can guarantee, what it can not, and describe how to use it and what to expect from it.
While this is true, we should remember that a user can rewrite his own git history anyway.
We are building a collaboration tool.
With this tool people collaborate publishing git histories, and merging work from other users into their own histories.
With the identity metadata we are defining, what we can guarantee is that an attacker who gets control of one key cannot publish artifacts (or a rewritten history) and get away with it indefinitely.
Eventually a key revocation should be issued and the situation should be corrected.
The compromised device (if any) will be decommissioned, and the remote branches that track it will be removed, but we still have the problem of how to deal with explicitly signed artifacts that have already been merged into other branches.
I think that a sensible way of handling this could be through a history rewrite performed by the legitimate artifact author.
I mean that the user that had their key stolen should:
Revoke the stolen key.
If there are branches that he owns that carry artifacts signed by the revoked key, rewrite them removing the malicious artifacts or re-signing them with a valid key.
Each downstream user that merged those branches should handle the history rewrite just like any history rewrite.
We know that history rewrites of published branches should never happen because they are difficult to handle, but I see them as a good way to handle this particular problem.
Maybe we could use some conventional “marker” empty commit that represents the start of the rewrite and its reason, both for clarity and to assist tooling.
To sum it up:
I agree with @cloudhead that changing the past is inherently bad.
But without a blockchain certification of every commit each user can already “change the past” through forced pushes leading to history rewrites.
Doing it as a consequence of a key revocation is not special at all.
At least we could provide clear guidelines on how to handle the issue.
I also agree with @kim that this is a social issue, involving trust, that should not be automated.
But we could provide tooling to assist in handling the sections of “compromised” branch histories.
My claim (which can be wrong!) is that with these tools and guidelines we could use timestamps as a base for correlating artifact signatures and key revocations.
While not as reliable as a blockchain this could be good enough for users to collaborate on code development using radicle, keeping their identity and occasionally revoking keys.
I think this is the culprit: by default, all merges from a remote tracking branch into the local branch are intentional (made by a human). If we provide some automation around this, it MUST prevent automatic merging after the local radicle tooling has learned about a key revocation. It MUST NOT, however, rewrite the local branch without user interaction.
Perhaps we should spell out the situations in which the radicle tooling would perform automatic merging, to understand better when it would bail out.
For example: assume device and artifact signing keys are distinct. Now, an artifact signing key is revoked. The client updates local projects from the network. Then, I want to pull the updates into my working copy of one project: intuitively, the tooling should now verify that none of the branches I’m about to pull contains a signature by the revoked key, and otherwise prevent me from pulling in the updates, as well as give me an informative error message about what just happened, and instruct me how to proceed with verifying the branches I already have in my working tree.
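A minimal sketch of that check, assuming the tooling already knows the set of revoked key ids and can list the incoming commits together with their signer. The function name, the exception type and the error wording are made up for the example.

```python
class RevokedSignatureError(Exception):
    """Raised when an update contains signatures made with a revoked key."""


def check_pull(incoming, revoked_keys):
    """Refuse to pull if any incoming commit is signed with a revoked key.

    `incoming` is an iterable of (commit_id, signer_key_id) pairs. Nothing is
    rewritten or merged here: the function only passes or raises.
    """
    offending = [commit for commit, signer in incoming if signer in revoked_keys]
    if offending:
        raise RevokedSignatureError(
            "refusing to pull: commits signed with a revoked key: "
            + ", ".join(offending)
            + ". Verify the branches already in your working tree before proceeding."
        )


incoming = [("c1", "K2"), ("c2", "K1"), ("c3", "K1")]
try:
    check_pull(incoming, revoked_keys={"K1"})
    message = None
except RevokedSignatureError as err:
    message = str(err)
```

The important property is that the check only blocks the pull and explains why. It never touches the local branch, which stays under the user's control.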
That’s an interesting idea. I agree that we can do a lot better than answering yes-no for tooling which verifies artifact signatures. Perhaps something for @cloudhead to chime in and work out some flows on what a recovery procedure could look like from a user’s perspective?
Yeah I agree that if this is basically a hint for the recovery procedures and tooling, timestamps are good enough. The “upsell” for radicle-registry would be that, since you can causally relate key revocation events with code snapshots, releases, name ownership changes, etc. on a global timeline, the disaster recovery procedures for widely used projects could be much more effective.
It seems we’re getting there!
My feeling is that we are agreeing on everything.
Branches representing local copies of remote devices are always fetched unconditionally, either fast forward or rewriting histories, because they are simply mirroring those devices.
What is interesting is transferring commits from those branches into locally owned ones.
For code branches I would never do automatic pulls or merges.
The act of pulling is an explicit user action (it is a git pull).
Of course it could pull commits signed with a compromised key before the revocation has been issued, but this is unavoidable.
The tooling should simply check all signatures according to the currently known valid key set, and refuse to pull or merge invalid signatures.
In case of retroactive key revocations there are two ways in which tooling can be of help.
The first would be to scan the section of the local history that starts from the revocation moment (which could be in the past), and detect signatures that are invalid because of the revocation.
This would be useful but IMHO without knowing the intention of the original author there’s not much that can be done, except simply assisting the user in removing those commits.
Another useful tool could be an assisted merge of a rewritten history.
To see why I am always talking about “history rewrites”, let’s follow this scenario.
User U1 has three devices with keys K1, K2 and K3.
At some point K1 is compromised and an attacker publishes commits on the device, signing them with K1.
User U1 stays offline for a while and sees nothing, meanwhile user U2, tracking the three devices, sees the malicious work on K1 and merges it into their own tree.
Then U2 goes offline for a while.
Then user U1 notices the attack, decommissions K1, revokes its key, and publishes something else on K2 (signing it with K2).
When U2 comes back online their perception of the work by U1 is exactly a history rewrite: the malicious commits from K1 are gone, and have been replaced by other commits from K2.
This is why I am so convinced that tooling that helps in reconciling history rewrites would also help in recovering from a key compromise situation: the end result is the same.
It is just like if user U1 published something, then changed their mind and published something else.
We should explore the idea of “marker” empty commits stating the intention behind the rewrite (to assist the tooling).
About collaboration tools
All of the above applies to artifact signatures on code branches (essentially signed commits).
However in radicle we’ll have another category of signed artifacts: individual contributions to the shared data of collaborative tools (issue tracker, code reviews…).
Maybe in this case automated merges can be desirable (because the underlying CRDTs will be eventually consistent).
The general process should be the same, but perhaps in this case the recovery from a key compromise could be more automatic (because the merge rules are strict and deterministic).
I think we’re more or less all in agreement, but there’s a few things I still am not sure I got across or understood, so let me give it another try and maybe you can explain where the holes in my thinking are:
There is a difference in meaning between key revocation and history rewrites due to key revocations being global across projects. If we wanted to use causal history here, we would have to point to all git heads we want to prove happened before. This would then allow on a per-project basis to know which commits are authored with the compromised key.
Both timestamps and causal order are easy to cheat without the radicle-registry:
In the case of timestamps, I can create a revocation with a timestamp both in the past, and the future. So this is useless.
In the case of causal ordering, I can set my “parent” object to anything in the past (but not the future) – basically I can pick an old object as my parent, and there will be no way to tell that I didn’t pick the latest available object.
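The causal ordering case can be illustrated with a toy hash-linked history. This is not git's object model, just a sketch with made-up names: each entry id is the SHA-256 of its parent id and payload, and `happened_before` walks the parent links.

```python
import hashlib


def entry_id(parent, payload):
    """Id of an entry: hash over its parent id (if any) and its payload."""
    return hashlib.sha256(((parent or "") + "\n" + payload).encode()).hexdigest()


class History:
    def __init__(self):
        self.parents = {}  # entry id -> parent id (None for the root)

    def append(self, parent, payload):
        eid = entry_id(parent, payload)
        self.parents[eid] = parent
        return eid

    def happened_before(self, ancestor, descendant):
        """True if `ancestor` is reachable from `descendant` via parent links."""
        current = self.parents.get(descendant)
        while current is not None:
            if current == ancestor:
                return True
            current = self.parents.get(current)
        return False


h = History()
c1 = h.append(None, "commit 1")
c2 = h.append(c1, "commit 2")
c3 = h.append(c2, "commit 3")
# The revoker already knows about c3 but chooses c1 as the parent.
revocation = h.append(c1, "revoke K1")
```

The links prove that `c1` happened before the revocation, but nothing relates the revocation to `c2` or `c3`, so a verifier cannot tell that the revoker already knew about them.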
Hence, in both scenarios, I am able today, to back-date a key revocation, invalidating arbitrary histories. This is the part I think we all agree with. What I was trying to say about blockchains, and in particular radicle-registry, is that if today, I decide to revoke my key and post that on-chain, I am neither able to back-date it, nor forward-date it. The only choice I have, is to date it to today. This is what makes it effectively a timestamping service, as pointed out.
Now, the advantage of a timestamp we can trust, is that we can go through all git histories, and compare the commit timestamps with the trusted timestamp, to determine order of events, and which commits are compromised. Unless I’m missing something, this cannot be done with causal history, due to the ability to back-date, and requires all parent objects to be included in the revocation.
But what about the git timestamps you ask, can they be trusted? Well, no. But if you wanted to rewrite your git history to add 30 days to the commit time of all commits, effectively back-dating your revocation, you would have to get all other clones of your project to do the same, and reset to your tampered history. Though this is already unlikely, you could as mentioned also anchor the commit histories on chain, giving you causal and timestamped revocations.
Just a nitpick: the blockchain does exactly provide causal history, but a global one without a spof
The takeaway for future product considerations is: what if we’d consider an advisory timestamp in the past for some forensics / breach recovery tooling? As in, I know my key could not have been stolen before T1, but I revoked at T2, and because I’m trustworthy, the incident response team can start sifting through the 10k commits after T1 first.