Discussion: Package Signing & Security

christroutner · December 13, 2022, 8:17pm

Hello Radicle Community!

Radicle agreed to fund a grant researching package releases. This research focuses on packaging Radicle git repos into a software library, and then using a git tag to signal a ‘release’. Some of those ideas come from this previous thread. The terminology and workflows are a little different for each programming language, so I’m starting with existing, well-established workflows from the JavaScript, GitHub, and npm ecosystem.

Introduction

I’m at the point in the research where I’m investigating package signing, security, and supply chain attacks. I’ve been trying to understand the security model at the center of the Heartwood protocol and Radicle Identity. I’ve also been comparing that with the npm package signing workflow.

The deeper I go down this rabbit hole of security and package signing, the more confused I get and the more hopeless it seems. There are many attack vectors, and closing one of them does little to secure the entire supply chain. So I thought it would be a good idea to turn to the Radicle community to see if we can collectively find some answers.

Background

For reference, here are the first two parts of the research that I’ve already completed:

Part 1 focused on how to generate an npm package from a Radicle git repository, and then how to include that package as an npm dependency in another JavaScript program.
Part 2 focused on taking the output of Part 1 and integrating it with Verdaccio, which is an npm package cache. It will cache npm packages locally and serve them. This allows the opt-in use of Radicle as a package delivery system without changing any existing workflows (which is good UX).

So Part 1 showed that it’s possible to generate a package, and Part 2 showed that it’s possible to deliver and cache the package. Parts 1 and 2 proved what is possible; now it’s time to review those results through the lens of what is prudent. It’s time to look at the workflows adversarially and consider how a malicious actor could inject malware or other malicious code into a package.

Workflows

From my perspective, there are two workflows to choose from here:

Adopt an existing package signing workflow.
Leverage the existing identity workflow built into Radicle.

Existing Package Signing Workflows

I’ve studied the npm package signing workflow more than any other, but at a glance, it seems like there are few standards that are shared across package managers.

The npm package signing workflow gives a brief description of how the signing workflow would be supported by a third-party registry (like Radicle). Dev work would be required to integrate the same method into radicle-cli. In theory, the end result would be that packages generated from Radicle repositories could be passed around, and delivered over separate mediums (like Filecoin or Arweave). The signing process alerts the user if the package has been tampered with, but there is a lot of nuance to that statement. It’s unclear to me if the end result is worth the cost of the effort.

So my first question to the community: Is working hard on a package signing scheme even worth it? It seems like a very hard problem to get right, and it does not appear that there are many industry examples to point to of what getting it right looks like.

If anyone would like to propose a package signing workflow, please outline the following parts of the workflow (at a high, non-technical level):

How does the package creator sign the package?
How does the package consumer verify the signature?
How would we integrate the workflow into Radicle?
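As a rough illustration of the first two questions, here is a minimal sketch assuming the package is a single file and the creator holds an Ed25519 key (the scheme Radicle keys use). `signPackage` and `verifyPackage` are hypothetical helper names; only Node’s built-in `crypto` module is used. It deliberately leaves open the hard part: how the consumer learns which public key to trust.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";
import { Buffer } from "node:buffer";

// Creator side: sign the raw bytes of the package file (e.g. a .tgz).
// Ed25519 hashes internally, so no digest algorithm is passed (null).
function signPackage(pkg: Uint8Array, privateKey: KeyObject): Buffer {
  return sign(null, pkg, privateKey);
}

// Consumer side: check the signature with the creator's public key,
// which must itself come from a source the consumer already trusts.
function verifyPackage(pkg: Uint8Array, signature: Uint8Array, publicKey: KeyObject): boolean {
  return verify(null, pkg, publicKey, signature);
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const tarball = Buffer.from("stand-in for the bytes of my-lib-1.0.0.tgz");
const sig = signPackage(tarball, privateKey);

const tampered = Buffer.from(tarball); // copy, then flip one byte
tampered[0] ^= 0xff;

console.log(verifyPackage(tarball, sig, publicKey));  // true
console.log(verifyPackage(tampered, sig, publicKey)); // false
```

The signature only detects tampering relative to a key; it says nothing about whether that key belongs to the real maintainer.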

Leveraging the Existing Identity Workflow

Radicle’s identity model works pretty well as a security model. For example, if a package consumer is getting the package directly from a Radicle repository, they are implicitly trusting the repository owner. That’s a pretty simple, clear chain of custody.

That model breaks down, however, if the package does not come directly from a Radicle repository: for example, if the package is uploaded to Filecoin or Arweave and then consumed from there. Without package signing, there is no secure link back to the repository that the package consumer can verify.

There is a trade-off there: getting a package directly from a Radicle Seed node is more secure, but how reliable is it? Being able to put packages in different locations (Filecoin, Arweave, Seed nodes, npm) improves reliability and availability, but it reduces security.

Research Implications

In Part 1 and Part 2 of my research, I was careful to point out when a package was coming directly from a Radicle Seed node or not, because of the trade-offs described above.

To summarize the parts of that research that are salient for this conversation:

In Part 1 I showed that the ‘git+https’ format allows a JavaScript application to specify a package dependency that can be downloaded directly from a Radicle Seed node, which is a secure way of retrieving a package. But in Part 2, I showed that Verdaccio will not cache that format, so if the Seed node is unreachable, the dependency cannot be installed (bad for availability).
In Part 2 I showed that the ‘file-proxy’ app can download a specific package from a Radicle Seed node in a way that Verdaccio can cache. This is the best solution I’ve found in terms of security and availability.

Questions for the Radicle Community

So with this background, let’s open it up to discussions. I feel the use of ‘file-proxy’ (from Part 2) strikes the best trade-off between security and availability. But perhaps there is a better way? Or perhaps there are security considerations I haven’t even considered yet?

I’m sure there are people in the Radicle community that are more knowledgeable about package signing methods and their trade-offs than I am. Please use this opportunity to educate us!

cloudhead · December 13, 2022, 10:31pm

Based on Part 1/2, it’s not clear to me what is verifiable by the user. If npm fetches a .git package from some source, what is being verified locally to ensure the right package was downloaded? To properly assess any of this from a security standpoint, we’d have to know exactly what the verification process looks like.

It seems like there are two possible situations, though: (1) In the first situation, the user somehow knows the package checksum. This means the user obtained the checksum from a trusted source. (2) In the second situation, the user knows the package id, which never changes, but not the current checksum.

(1) is a solved problem. Given the checksum, you can download the package from an untrusted source and verify that it matches the checksum. This is not interesting, as any old file server will do.
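Situation (1) boils down to a hash comparison. A minimal sketch using SHA-256 from Node’s built-in `crypto` module; the helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";

// Situation (1): the expected checksum was obtained from a trusted source,
// so the package bytes can come from any untrusted file server.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

function matchesChecksum(data: Uint8Array, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.trim().toLowerCase();
}

const download = Buffer.from("abc"); // stand-in for the downloaded package bytes
console.log(sha256Hex(download));
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```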

(2) is the interesting problem, and is what radicle solves, using self-certifying repositories. This requires the user to verify the repository or package within the repository, instead of just a checksum. But it means that there’s no need for a centralized registry which keeps track of package checksums.
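To make situation (2) concrete, here is a minimal sketch of the self-certifying idea, assuming a simplified identity document (name, delegate keys, threshold) whose SHA-256 over a canonical JSON encoding serves as the id. Radicle’s actual document format, encoding and hashing are not reproduced here; the point is only that the id itself certifies the document, so no registry is needed. A fuller model would also have to verify later, delegate-signed updates to the document.

```typescript
import { createHash } from "node:crypto";

// Hypothetical, simplified identity document; NOT Radicle's real format.
interface IdentityDoc {
  name: string;
  delegates: string[]; // public keys of the maintainers
  threshold: number;   // signatures needed to accept changes
}

// Serialize deterministically so equal documents always hash the same way.
function canonicalize(doc: IdentityDoc): string {
  return JSON.stringify([doc.name, [...doc.delegates].sort(), doc.threshold]);
}

// Self-certifying id: derived from the content of the initial document itself.
function deriveId(doc: IdentityDoc): string {
  return createHash("sha256").update(canonicalize(doc)).digest("hex");
}

// The user knows only the id; a document fetched from anyone is accepted
// only if it hashes back to that id.
function isGenuine(knownId: string, fetched: IdentityDoc): boolean {
  return deriveId(fetched) === knownId;
}

const doc: IdentityDoc = { name: "my-lib", delegates: ["key-alice", "key-bob"], threshold: 1 };
const id = deriveId(doc);
console.log(isGenuine(id, { ...doc, delegates: ["key-bob", "key-alice"] })); // true
console.log(isGenuine(id, { ...doc, delegates: ["key-mallory"] }));          // false
```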

I’m not sure where the ‘file-proxy’ stands, as there seems to be no verification at all mentioned in the document, which means that you could be downloading malware without knowing it. So without understanding the verification process, it’s not possible to say much more.

christroutner · December 14, 2022, 1:15am

Thank you @cloudhead for that thoughtful response. It’s good food for discussion! I like how you broke down two distinct situations.

(1) is a solved problem. Given the checksum, you can download the package from an untrusted source and verify that it matches the checksum. This is not interesting, as any old file server will do.

My understanding is that at one point, Radicle considered an ‘anchoring’ method by writing a checksum (or some equivalent) to the ETH blockchain? ETH is prohibitively expensive for this use-case, but I’m an active developer in the BCH and XEC blockchains, which have sub-cent transaction fees.

If it’s of interest to anyone else in Radicle, I’d love to explore the idea of anchoring these checksums into a blockchain, and tracking them with an indexer. These are activities that I am very familiar with, and have collections of code that can be leveraged. This would nicely provide Radicle with a decentralized registry of checksums.

(2) is the interesting problem, and is what radicle solves, using self-certifying repositories. This requires the user to verify the repository or package within the repository, instead of just a checksum. But it means that there’s no need for a centralized registry which keeps track of package checksums.

I had thought to leverage the self-certifying nature of Radicle repositories, but the ‘how’ is not obvious to me.

The file-proxy app is just a proof-of-concept. It has only been developed to the point of showing that it’s possible to build an app that can retrieve a file from a Radicle repository and serve it in a way that Verdaccio will cache it. From a security standpoint, it’s very naive. The intention is to pair this method of fetching files with something that is more secure.

I wonder if there is some additional logic that could be added to file-proxy to verify the repository or package somehow?

Or perhaps the functionality that file-proxy provides could be integrated into a Radicle Seed node, and leverage the existing verification that way?

At this point, I’m really looking at the ‘lego bricks’ on the table, and trying to figure out the best way to put them together.

christroutner · December 14, 2022, 5:05pm

I wanted to expand a bit on the idea of using a blockchain as a decentralized checksum registry.

Part of my understanding comes from my experience working with Bitcoin on a low level, and so I’m trying to be aware that most people aren’t going to have that knowledge.

There is also an open question of how much of this would be integrated into radicle-cli versus being built as a separate tool set.

But in my mind, there would be a rad-pkg command or something like that, built into radicle-cli. This command would do the following:

It would package the contents of the repository into a package file. In the case of an npm package, this would be a .tgz file.
It would compute the checksum for the package, and write that checksum to the XEC blockchain.
It would generate a git tag to signal a release, and include the checksum.
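Since rad-pkg is only a proposal, here is a hypothetical sketch of steps 2 and 3: hash the tarball, build an OP_RETURN output script carrying the checksum, and prepare arguments for an annotated `git tag`. The `RPKG` prefix, the payload layout and the tag message format are all assumptions; building, signing and broadcasting the actual XEC transaction is left out.

```typescript
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";

const OP_RETURN = 0x6a;
const OP_PUSHDATA1 = 0x4c;
// Hypothetical 4-byte protocol prefix so an indexer can recognise these outputs.
const PREFIX = Buffer.from("RPKG", "ascii");

// Minimal data push: a single length byte up to 75 bytes, OP_PUSHDATA1 up to 255.
function pushData(data: Buffer): Buffer {
  if (data.length <= 75) return Buffer.concat([Buffer.from([data.length]), data]);
  if (data.length <= 255) return Buffer.concat([Buffer.from([OP_PUSHDATA1, data.length]), data]);
  throw new Error("payload too large for this sketch");
}

interface ReleasePlan {
  checksum: string;       // hex SHA-256 of the package file
  opReturnScript: Buffer; // output script that would anchor the checksum on-chain
  gitTagArgs: string[];   // arguments for an annotated `git tag`
}

// Step 1 (packing, e.g. `npm pack`) is assumed to have produced `tarball` already.
function planRelease(tarball: Buffer, version: string): ReleasePlan {
  // Step 2: checksum of the package and the OP_RETURN script carrying it.
  const digest = createHash("sha256").update(tarball).digest();
  const opReturnScript = Buffer.concat([Buffer.from([OP_RETURN]), pushData(PREFIX), pushData(digest)]);
  // Step 3: an annotated tag whose message records the same checksum.
  const checksum = digest.toString("hex");
  const gitTagArgs = ["tag", "-a", `v${version}`, "-m", `Release v${version}\n\nsha256: ${checksum}`];
  return { checksum, opReturnScript, gitTagArgs };
}

const plan = planRelease(Buffer.from("stand-in for my-lib-1.0.0.tgz"), "1.0.0");
console.log(plan.opReturnScript.length);            // 39: OP_RETURN + (1 + 4) + (1 + 32)
console.log(plan.gitTagArgs.slice(0, 3).join(" ")); // tag -a v1.0.0
```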

The indexer service that tracks all those checksums on the blockchain could be run on an individual basis, and an instance could also be set up as a web service to provide a nice UX. Either way, the checksums can be collected and independently verified.
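A minimal sketch of the indexer’s core loop, assuming the hypothetical 39-byte payload (OP_RETURN, a push of the 4-byte `RPKG` prefix, a push of a 32-byte SHA-256) and that outputs arrive in chain order. Fetching transactions from a node is out of scope; `TxOutput` is an illustrative type.

```typescript
import { Buffer } from "node:buffer";

// Assumes the hypothetical payload layout: OP_RETURN, push of "RPKG",
// push of a 32-byte SHA-256 digest (39 bytes in total).
const OP_RETURN = 0x6a;
const PREFIX = "RPKG";

interface TxOutput {
  txid: string;   // transaction the output belongs to
  script: Buffer; // the output's locking script
}

// Returns the anchored checksum as hex, or null if the script is not an anchor.
function decodeAnchor(script: Buffer): string | null {
  if (script.length !== 39 || script[0] !== OP_RETURN) return null;
  if (script[1] !== 4 || script.subarray(2, 6).toString("ascii") !== PREFIX) return null;
  if (script[6] !== 32) return null;
  return script.subarray(7).toString("hex");
}

// Outputs are assumed to be in chain order; the first anchor of a checksum wins.
function indexAnchors(outputs: TxOutput[]): Map<string, string> {
  const index = new Map<string, string>(); // checksum -> txid
  for (const out of outputs) {
    const checksum = decodeAnchor(out.script);
    if (checksum !== null && !index.has(checksum)) index.set(checksum, out.txid);
  }
  return index;
}

const anchor = Buffer.concat([
  Buffer.from([OP_RETURN, 4]), Buffer.from(PREFIX, "ascii"), Buffer.from([32]), Buffer.alloc(32, 0xab),
]);
const index = indexAnchors([
  { txid: "tx1", script: anchor },
  { txid: "tx2", script: anchor },                    // later duplicate, ignored
  { txid: "tx3", script: Buffer.from([0x76, 0xa9]) }, // ordinary output, not an anchor
]);
console.log(index.get("ab".repeat(32))); // tx1
```

Because anyone can rebuild this map from the chain, it can be checked independently. Note that an anchor only shows that a checksum was published, not who published it; tying the anchoring transaction to the maintainer’s key would be a separate problem.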

With that checksum scheme, the actual package can then be safely distributed over any medium.

As an added bonus, we could follow the npm standard for package signing, and make the radicle workflow for package releases compatible with that. So when someone runs npm audit signatures the packages pass the check.
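For `npm audit signatures` to pass, a registry would need to sign each version the way the npm registry does. As we understand npm’s registry-signature documentation, the signed message is `${name}@${version}:${integrity}`, where `integrity` is the tarball’s Subresource Integrity string (`sha512-` plus base64); the signature is ECDSA on P-256 over SHA-256, stored base64-encoded in `dist.signatures`; and third-party registries publish their public keys at a `/-/npm/v1/keys` endpoint. Treat these details as assumptions to check against npm’s docs; the key id and package name below are made up.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";
import { Buffer } from "node:buffer";

// Subresource Integrity string, the form npm stores in `dist.integrity`.
function integrity(tarball: Buffer): string {
  return "sha512-" + createHash("sha512").update(tarball).digest("base64");
}

// The message the npm registry signs for each published version (per npm's docs).
function signedMessage(name: string, version: string, integ: string): string {
  return `${name}@${version}:${integ}`;
}

// Hypothetical registry key pair (ECDSA on P-256). A real registry would keep the
// private key secret and publish the public key for `npm audit signatures`.
const { publicKey, privateKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });

const tarball = Buffer.from("stand-in for my-lib-1.0.0.tgz");
const integ = integrity(tarball);
const msg = Buffer.from(signedMessage("my-lib", "1.0.0", integ), "utf8");
const sig = sign("sha256", msg, privateKey).toString("base64"); // DER-encoded by default

// Roughly the shape of the `dist` fields for this version.
const dist = { integrity: integ, signatures: [{ keyid: "SHA256:hypothetical-key-id", sig }] };

// What a client would do: rebuild the message and check it against the public key.
const ok = verify("sha256", msg, publicKey, Buffer.from(dist.signatures[0].sig, "base64"));
console.log(ok); // true
```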

yorgos · December 16, 2022, 2:10pm

Hello Chris!

Thank you for offering to explore this area and getting the discussion going. I very much appreciate your efforts in this area!

There is a lot in this post - I will focus my comments on what I think are the most interesting/valuable points to explore further from my own, non-radicle-core-team, point of view:

In general, I think considering NPM packages (or any package that consists of a simple packaging of the source code, as is common in interpreted languages) is perhaps not the best example to help the reader understand which aspects of the term “package” are discussed here. Depending on context, “package” could mean:

the library itself,
a binary file that is attached to a specific release of the library,
the source code that makes up a specific release of the library.

I think it is important to clarify what we want to focus on, so I would recommend we consider the case of a compiled language (java, golang, rust, etc. etc.) which will help clarify that point further. For example, in one of these languages, a specific release of the library might have several different packages attached to it (i.e. different binary files - e.g. for different platforms).

As a second point, I am not sure how the blockchain approach improves upon the proposed solution based on Radicle’s Collaborative Objects in the other thread. I mean… sure, it does offer a solution to the problem, but it introduces a new (massive) dependency on a blockchain that Radicle doesn’t have today. Collaborative Objects on the other hand will be the basis of other entities like e.g. Issues and Patch Proposals, and Releases (which would be linked to a set of (signed) binary packages) seem to be a good fit.

I see the 2 threads as very closely linked, and I think some of what is being discussed here has already been addressed there. Would it make sense to base the discussion here on top of that, and look for specific points where that proposal falls short, or points it doesn’t cover, related to signing/security?

christroutner · December 17, 2022, 3:05pm

Great feedback, @yorgos. Thank you.

I think it is important to clarify what we want to focus on, so I would recommend we consider the case of a compiled language (java, golang, rust, etc. etc.) which will help clarify that point further. For example, in one of these languages, a specific release of the library might have several different packages attached to it (i.e. different binary files - e.g. for different platforms).

I think we could define a package as “one or more binary files”. That definition nicely spans both the npm use-case as well as the compiled language use-cases. Even in the broadest sense of the term, where it indicates the source code, that source code would probably be distributed as a binary file (like zip or tgz).

Collaborative Objects on the other hand will be the basis of other entities like e.g. Issues and Patch Proposals, and Releases (which would be linked to a set of (signed) binary packages) seem to be a good fit.

I need to understand the Collaborative Object in more detail. The concepts in the referenced thread do not give a clear picture to me as to how it solves the issues of signed packages and avoiding supply chain attacks. From reading that previous thread, I don’t understand:

How does a package consumer verify the signature? (or verify that the package has not been tampered with)
How would the package signing be integrated into a Radicle workflow?

Is the Collaborative Object an idea that has been worked on, but there just isn’t any documentation for it yet? How can I learn more about this idea? How much of it has already been implemented?

As a new data point for this discussion, @cloudhead dropped this link in the heartwood Discord channel. This appears to be a very handy TypeScript library for signing and verifying npm packages. It also seems to use Ed25519 signatures, the same scheme Radicle uses:

GitHub - 47ng/sceau: Code signing for NPM packages

The problem remains: where does the checksum live? But this library would be a handy tool for generating the checksums.

cloudhead · December 19, 2022, 10:28pm

I don’t have time to respond to everything above, but I’ll just put down a few thoughts:

The natural place for the checksum to live is in the radicle repository of the project/package/binary; reason being that the rules and security model around a certain project are self-contained, within that repo.
Since everything published is always signed, the question becomes: by who?
You may consider a release/pkg trustworthy if it is signed by a certain key. Usually the trusted keys are the ones controlled by maintainers/authors of the project. These are modeled as “delegates” in radicle.
There isn’t much sense in separating checksum from package; since radicle repositories are the ideal storage for both, and one without the other is useless anyway.
Collaborative objects can add functionality, but the essentials needed here are really just storing a file in the repo and creating a git ref, that’s all. CRDTs (which is what collaborative objects offer) are probably overkill for now.
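A minimal sketch of that trust rule, assuming a hypothetical release record (version plus tarball SHA-256) stored as a small file in the repo, delegate public keys already known from the repository’s identity, and Ed25519 signatures. The record encoding and the threshold parameter are illustrative; this is not heartwood’s code.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical release record, e.g. a small file in the repo pointed to by a git ref.
interface ReleaseRecord { version: string; sha256: string; }
interface DelegateSignature { delegate: string; sig: Buffer; }

// Deterministic bytes that every delegate signs.
function encode(r: ReleaseRecord): Buffer {
  return Buffer.from(`release ${r.version}\nsha256 ${r.sha256}\n`, "utf8");
}

// Trusted only if at least `threshold` distinct delegates produced a valid signature.
function isTrusted(
  record: ReleaseRecord,
  sigs: DelegateSignature[],
  delegates: Map<string, KeyObject>,
  threshold: number,
): boolean {
  const msg = encode(record);
  const valid = new Set<string>();
  for (const { delegate, sig } of sigs) {
    const key = delegates.get(delegate);
    // Signatures from keys that are not delegates are simply ignored.
    if (key !== undefined && verify(null, msg, key, sig)) valid.add(delegate);
  }
  return valid.size >= threshold;
}

// Usage: three delegates, two signatures required.
const names = ["alice", "bob", "carol"];
const keys = new Map(names.map((n) => [n, generateKeyPairSync("ed25519")] as const));
const delegates = new Map(names.map((n) => [n, keys.get(n)!.publicKey] as const));
const record: ReleaseRecord = { version: "1.0.0", sha256: "ab".repeat(32) };
const signBy = (n: string): DelegateSignature =>
  ({ delegate: n, sig: sign(null, encode(record), keys.get(n)!.privateKey) });

const oneSig = isTrusted(record, [signBy("alice")], delegates, 2);                     // false
const twoSigs = isTrusted(record, [signBy("alice"), signBy("bob")], delegates, 2);     // true
const duplicate = isTrusted(record, [signBy("alice"), signBy("alice")], delegates, 2); // false
```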

yorgos · December 20, 2022, 6:54pm

+1

I think that is what the entries valid_when and signed_off_by try to capture in the data structure proposed here.

hmmm… what if the binary is a few hundred MB? What if it is a container image of a few GB? Would we still want to store these in Git (or did I misunderstand what you mean here)?

I was thinking that while the checksum (and probably a descriptor / reference to the binary package) should live inside the radicle repository, we could decouple storage/location of that binary, so that it could live… anywhere. All we need on our side is to be able to reliably point to that location and leave the actual storage / content delivery to existing, well-established 3rd party services.
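A minimal sketch of that decoupling, assuming a hypothetical descriptor format: the repository stores only the version, the commit the binaries were built from, and per-platform URL, size and SHA-256 entries (covering the several-binaries-per-release case raised earlier in the thread), while the bytes themselves can be fetched from any CDN or mirror. Field names and URLs are illustrative, not part of any existing proposal.

```typescript
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical descriptor kept in the Radicle repo; the binaries live elsewhere.
interface Artifact {
  platform: string; // e.g. "linux-x64"
  url: string;      // any CDN, mirror or storage network
  size: number;     // bytes
  sha256: string;   // hex digest of the file
}
interface ReleaseDescriptor {
  version: string;
  sourceCommit: string; // commit the binaries were built from
  artifacts: Artifact[];
}

function selectArtifact(desc: ReleaseDescriptor, platform: string): Artifact {
  const found = desc.artifacts.find((a) => a.platform === platform);
  if (found === undefined) throw new Error(`no ${platform} artifact in ${desc.version}`);
  return found;
}

// Where the bytes came from does not matter; only the descriptor has to be trusted.
function checkDownload(artifact: Artifact, bytes: Buffer): void {
  if (bytes.length !== artifact.size) throw new Error("size mismatch");
  const actual = createHash("sha256").update(bytes).digest("hex");
  if (actual !== artifact.sha256) throw new Error("checksum mismatch");
}

const desc: ReleaseDescriptor = {
  version: "1.0.0",
  sourceCommit: "<commit id>",
  artifacts: [
    {
      platform: "linux-x64",
      url: "https://cdn.example.com/my-lib-1.0.0-linux-x64.tar.gz",
      size: 3,
      sha256: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    },
  ],
};

// Stand-in for bytes fetched from the artifact's URL; passes silently.
checkDownload(selectArtifact(desc, "linux-x64"), Buffer.from("abc"));
```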

For the above two reasons, I would like to see us start with a Conflict-Free Replicated Data Type (CRDT) that we can build upon, rather than some other more basic structure that will soon limit what is possible with Radicle Releases.
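To illustrate why a CRDT helps here, a minimal state-based sketch, assuming a release’s signoffs (in the spirit of signed_off_by) and its artifact entries are modelled as grow-only sets merged by union. This is a textbook G-Set, not Radicle’s Collaborative Objects implementation, and valid_when is not modelled; conflicting checksums for the same platform are reported rather than resolved.

```typescript
// Hypothetical state-based CRDT for a release record. Each field is a grow-only
// set (G-Set) and merge is set union, which is commutative, associative and
// idempotent, so replicas converge whatever order they exchange updates in.
interface ReleaseState {
  signedOffBy: Set<string>; // delegates who approved the release
  artifacts: Set<string>;   // "platform:sha256" entries
}

function union(a: Set<string>, b: Set<string>): Set<string> {
  return new Set(Array.from(a).concat(Array.from(b)));
}

function merge(a: ReleaseState, b: ReleaseState): ReleaseState {
  return {
    signedOffBy: union(a.signedOffBy, b.signedOffBy),
    artifacts: union(a.artifacts, b.artifacts),
  };
}

function sameState(a: ReleaseState, b: ReleaseState): boolean {
  const eq = (x: Set<string>, y: Set<string>) =>
    x.size === y.size && Array.from(x).every((v) => y.has(v));
  return eq(a.signedOffBy, b.signedOffBy) && eq(a.artifacts, b.artifacts);
}

// Two different checksums for one platform mean replicas disagree; report that
// instead of silently picking a winner.
function conflictingPlatforms(s: ReleaseState): string[] {
  const counts = new Map<string, number>();
  s.artifacts.forEach((entry) => {
    const platform = entry.slice(0, entry.indexOf(":"));
    counts.set(platform, (counts.get(platform) ?? 0) + 1);
  });
  return Array.from(counts.entries()).filter(([, n]) => n > 1).map(([p]) => p).sort();
}

const replicaA: ReleaseState = { signedOffBy: new Set(["alice"]), artifacts: new Set(["linux-x64:aa11"]) };
const replicaB: ReleaseState = {
  signedOffBy: new Set(["bob"]),
  artifacts: new Set(["linux-x64:aa11", "darwin-arm64:bb22"]),
};
const merged = merge(replicaA, replicaB);
console.log(Array.from(merged.signedOffBy).sort()); // [ 'alice', 'bob' ]
console.log(conflictingPlatforms(merged));           // []
```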

Looking forward to your thoughts (whenever you have some time).

(disclaimer: I am perhaps a little sensitive / overly focused in this area as my day job is largely around release engineering).

christroutner · January 2, 2023, 6:03pm

Thank you everyone for contributing to this discussion. Let’s please continue to engage with this topic, but enough time has passed for Radicle members to contribute that I think I can summarize and conclude the discussion so far.

For me, the goal of this discussion was to wrap up my research for parts 2 and 3 of Milestone 1 of this research grant. With the contributions to this thread, I think I can do that.

Here are my takeaways from this discussion:

A package is defined here as any arbitrary binary file.
The best place to store packages is in the Radicle repository. This keeps the security model simple.
- This scheme is problematic if the package is very large (100 MB or larger).
- If there is no supply chain (i.e. no package storage and delivery, since users are retrieving it directly from the repository) a checksum is optional. But should a checksum be used, that can also be stored in the Radicle repository.
There is no consensus on how Radicle should decouple large packages from the repositories.
It’s unclear if Collaborative Objects can provide a solution to decoupling packages from repositories or for tracking checksums of packages.

yorgos · January 3, 2023, 9:31am

Thanks for the summary @christroutner !

I would only contest this point:

Storing packages inside git itself is not a popular pattern, as far as I’m aware, and I’d add that (on top of the file size concerns raised above) it also rather confuses things: e.g. you need an extra commit to add the binary package, which was built from a different commit, making it rather unclear which commit is actually used for the “released package”.

I do think that it’s fair to say it’s one of the options on the table - I’m just not convinced about the “best” qualification yet.

cloudhead · January 3, 2023, 4:29pm

Git isn’t bad at storing large binaries, it’s just not always what you want. It’s also not true that if we store binaries in Git, that users have to download them when contributing to projects. It’s possible to fetch only the source code and not the binaries.

However, it doesn’t mean you always want to store binaries directly in the repo. You can imagine something like git-annex, where the checksums and tree are stored in the repo, but the actual file is stored somewhere else. This model would work if you prefer to use a CDN or specialized server for serving binaries.
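A minimal sketch of the git-annex-style model, assuming a pointer record is committed to the repo in place of the binary. The key format is loosely modelled on git-annex’s `SHA256E` keys (size, SHA-256, extension); the `Pointer` structure and URLs are hypothetical and are not git-annex’s actual on-disk format.

```typescript
import { createHash } from "node:crypto";
import { Buffer } from "node:buffer";
import { extname } from "node:path";

// Key loosely modelled on git-annex's SHA256E backend: size, SHA-256 and file extension.
function annexKey(fileName: string, bytes: Buffer): string {
  const hash = createHash("sha256").update(bytes).digest("hex");
  return `SHA256E-s${bytes.length}--${hash}${extname(fileName)}`;
}

// Hypothetical pointer committed to the repo instead of the binary itself.
interface Pointer {
  path: string;        // where the binary would live in the working tree
  key: string;         // content-derived key, so the pointer pins exact bytes
  locations: string[]; // CDN or specialised servers that may hold a copy
}

function makePointer(path: string, bytes: Buffer, locations: string[]): Pointer {
  return { path, key: annexKey(path, bytes), locations };
}

// Whoever downloads from any location re-derives the key and compares.
function verifyFetched(pointer: Pointer, bytes: Buffer): boolean {
  return annexKey(pointer.path, bytes) === pointer.key;
}

const pointer = makePointer("dist/my-lib-1.0.0.tgz", Buffer.from("abc"), [
  "https://cdn.example.com/my-lib-1.0.0.tgz",
]);
console.log(pointer.key);
// SHA256E-s3--ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad.tgz
```

Since the pointer and its checksum live in the repo, they inherit the repo’s signatures, while the bytes can be served from wherever is convenient.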