Radicle for Collaborative Writing

Yarrow · February 7, 2021, 11:11am

Hi, I’m curious what this community thinks about the capability of P2P git for document collaboration other than software. Do you think non-programmers, writers and editors can learn a workflow to be able to collaborate on markdown files? If so, then tools like Radicle could end up being a game-changer for how people cooperate on expressing themselves in more fully matured thinking.

We need to grow a culture of writing and thinking critically in the open. I love making software, but I have a strong feeling that the principles of open-source collaboration need to be applied to intellectual creativity in general.

What do you think? Is it hopeless to try to get non-programmers using git?

–Yarrow

alexgood · February 7, 2021, 1:30pm

I’ve thought a little bit about this. I think non-technical users can learn to use git for collaborating on markdown documents, but it’s a terrible user experience. Git has a famously poor UI even if you’re very familiar with version control. I spend a substantial portion of my life wrangling git repos and I don’t think Git is a good experience for individual documents.

I’ve spent some time thinking about this because I’m looking to build a collaborative markdown editing system using automerge. I think editing structured text is a good candidate for building a local-first application because the current user experiences are horribly broken. I like to talk about Google Docs here because it’s so popular and yet so limited. You can only use it in a web browser and you can only use it when you’re online (unless you’re using specific browsers and have prepared ahead of time). You’re also completely at the whim of Google: if they decide to stop supporting Google Docs, or make it inaccessible to you in some manner, then you’re stuck.

The dream for me is to be able to work on the document in whatever editor the user wants. Imagine a workflow like this. I start editing a collaborative document in Vim, and everything I write is written to a file locally. Once I have a first draft, I upload the file to some kind of file hosting (Dropbox, Google Docs, S3, whatever really, it’s just files) and share it with friends. One of those friends downloads the document and edits it in a VS Code plugin. They make some changes and add a few comments, then upload it to their file hosting system of choice and tell me where to get it, and I can then pull those changes into my version of the document. Another friend downloads the document, adds some comments, and then emails me the new file. I can merge both of these with my document because I’m using automerge, which is a CRDT.

This has a bunch of nice properties, but the key one is that what we all agree on is now a document format (in this case an automerge CRDT) rather than a network location (e.g. docs.google.com). Anyone who understands that document format can write tools that work with it.
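
To make the "agree on a format, not a location" idea concrete, here is a minimal sketch of a character-level text CRDT in Python. It is a toy written for this thread's example, not Automerge's API or algorithm: the `Doc` class, its methods, and the actor names are all hypothetical, and it assumes every actor name is unique. Each character gets a unique id and remembers which character it was typed after, so merging is just a set union and gives the same text regardless of the order in which files arrive.

```python
from dataclasses import dataclass, field

ROOT = (0, "")  # sentinel id: the position before the first character


@dataclass
class Doc:
    actor: str                                  # hypothetical, assumed unique per author
    elems: dict = field(default_factory=dict)   # id -> (parent id, character)
    deleted: set = field(default_factory=set)   # tombstones: ids of removed characters

    def _ordered_ids(self):
        # Rebuild document order from the "typed after" tree.
        children = {}
        for elem_id, (parent, _char) in self.elems.items():
            children.setdefault(parent, []).append(elem_id)
        ordered, stack = [], [ROOT]
        while stack:
            node = stack.pop()
            if node != ROOT:
                ordered.append(node)
            # Ascending sort + stack pop: siblings are visited highest id first.
            stack.extend(sorted(children.get(node, [])))
        return ordered

    def _visible_ids(self):
        return [i for i in self._ordered_ids() if i not in self.deleted]

    def text(self):
        return "".join(self.elems[i][1] for i in self._visible_ids())

    def insert(self, index, string):
        visible = self._visible_ids()
        parent = visible[index - 1] if index > 0 else ROOT
        for char in string:
            counter = max((i[0] for i in self.elems), default=0) + 1
            new_id = (counter, self.actor)
            self.elems[new_id] = (parent, char)
            parent = new_id

    def delete(self, index, count=1):
        self.deleted.update(self._visible_ids()[index:index + count])

    def fork(self, actor):
        # Stands in for "download the file and open it in another editor".
        return Doc(actor, dict(self.elems), set(self.deleted))

    def merge(self, other):
        # Union of characters and tombstones: order of merging does not matter.
        self.elems.update(other.elems)
        self.deleted |= other.deleted


mine = Doc("alex")
mine.insert(0, "Hello world")

vscode_friend = mine.fork("bea")   # edits in another editor
email_friend = mine.fork("cy")     # sends a file back by email
vscode_friend.insert(5, ",")
email_friend.insert(11, "!")

mine.merge(vscode_friend)
mine.merge(email_friend)
print(mine.text())  # Hello, world!
```

How the files travel (file hosting, email, a Radicle peer) is irrelevant to `merge`; only the shared data format matters.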

This is long because I’ve recently spent a lot of time thinking about this and designing a solution. I hope these thoughts are interesting.

kim · February 8, 2021, 1:25pm

Hey @alexgood, do you have some more details on the design you’re thinking of? The obvious question for me would be how you envision mapping markdown onto automerge – while CRDTs have very clear semantics, markdown doesn’t. Even after CommonMark, there is quite a bit of parsing ambiguity, and I can’t imagine a merge algorithm which could do better than a specialised patience diff. But perhaps I’m just not imaginative enough.

andrewjskatz · February 8, 2021, 1:42pm

This is extremely interesting. We’re looking at starting a project based on Atom (see git.law) which uses a superset of Markdown to facilitate editing legal documents like contracts or licences, and which introduces a number of tags to support document processing that would be helpful in the legal realm, such as understanding definitions, introducing cross-references and so on. CommonForm has already done some work on this (Common Form · GitHub). Integrating this with Radicle would be really powerful.

alexgood · February 8, 2021, 2:34pm

@kim I feel like Markdown parsing ambiguity is a slightly orthogonal problem. The nice thing about using a CRDT over plain text (as opposed to over some more structured kind of representation such as hierarchical maps) is that we don’t have to make that decision for users, and rendering tools can be composed on top of it. In fact, there’s no particular reason to use Markdown at all; any kind of formatting language can be used (I definitely want this for LaTeX as well). The commenting engine would be the same and then you would combine it with a different rendering engine.

kim · February 8, 2021, 4:33pm

Well, how is that? Unless at least some of the semantics of the document are captured in the data structure, we’re back to textual diff. I’m not sure I understand how CommonForm works, but it seems to me that, if we favour “lightweight” markup, we need to do even more work transforming that into a structurally sensible form – and the more ambiguity in the source, the more ways this can go wrong. Or am I misunderstanding something?

alexgood · February 8, 2021, 7:05pm

Automerge (and most text CRDTs) preserve editing intent much more effectively than git because changes are recorded on a character-by-character basis rather than line by line. My thinking is that because the formatting information is recorded in the text, and the CRDT preserves intent reasonably well, most merges will result in the formatting being merged as users would expect.

On the occasions when that isn’t the case, fixing things should be reasonably trivial because the formatting information is in the text, so users can see it and figure out how to fix it. Additionally, I’ve yet to see a more complex model that works in the presence of commutative merge operations. My thinking therefore is that it makes sense to implement the plain text version first and, if that provides a poor UX, then do something more complicated.
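
The granularity point can be illustrated without a CRDT at all. The sketch below is a deliberately simple diff3-style three-way merge built on Python's `difflib`, run once with lines as the unit and once with characters as the unit. It is not how git or Automerge work internally, and the function names are made up for this example; it only shows that two edits to the same line collide at line granularity but not at character granularity.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    pass


def edits(base, changed):
    # Non-equal opcodes as (start, end, replacement) against the base sequence.
    matcher = SequenceMatcher(None, base, changed, autojunk=False)
    return [
        (i1, i2, tuple(changed[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def three_way(base, ours, theirs):
    a, b = edits(base, ours), edits(base, theirs)
    for ea in a:
        for eb in b:
            touching = ea[0] <= eb[1] and eb[0] <= ea[1]
            if touching and ea != eb:
                raise MergeConflict(f"both sides changed units {ea[0]}..{ea[1]}")
    merged, pos = [], 0
    for start, end, replacement in sorted(set(a) | set(b)):
        merged.extend(base[pos:start])
        merged.extend(replacement)
        pos = end
    merged.extend(base[pos:])
    return merged


def merge_lines(base, ours, theirs):
    return "\n".join(three_way(base.split("\n"), ours.split("\n"), theirs.split("\n")))


def merge_chars(base, ours, theirs):
    return "".join(three_way(list(base), list(ours), list(theirs)))


base = "The quick brown fox"
ours = "The fast brown fox"           # one author rewords
theirs = "The quick brown fox jumps"  # another author appends

print(merge_chars(base, ours, theirs))  # The fast brown fox jumps
try:
    merge_lines(base, ours, theirs)
except MergeConflict as err:
    print("line-level conflict:", err)
```

Both authors touched the only line in the document, so the line-level merge gives up, while the character-level merge keeps both intentions.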

kim · February 8, 2021, 9:14pm

Oh I see, so you’d envision more like a (more modern) operational transform system.

Follow up questions:

Isn’t this paradigm more useful for realtime collaboration?

Consider that radicle-link is inherently an asynchronous system, and that seems to be what you’re envisioning, too.
If not, how compact can you store edits?

Surely we can just store a history of every edit, but having to replay that will have some scalability limits I suppose. Since git already imposes a partial order, could some of the bookkeeping be omitted maybe? And if we expand the “C” in CRDT to “commutative”, would it be possible to collapse a series of edits into a larger one?
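
One way to picture "collapsing a series of edits into a larger one" is run compression: consecutive single-character inserts by the same author, each typed directly after the previous one, can be stored as one run and expanded again without losing any per-character ids. The sketch below is only an illustration of that idea under assumed data shapes (`InsertOp`, `InsertRun` and the id layout are hypothetical); it is not a description of Automerge's storage format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

OpId = Tuple[int, str]  # (counter, actor), assumed unique per operation


@dataclass(frozen=True)
class InsertOp:
    id: OpId
    parent: Optional[OpId]  # id of the character this one was typed after
    char: str


@dataclass(frozen=True)
class InsertRun:
    first_id: OpId          # id of the first character in the run
    parent: Optional[OpId]  # parent of the first character
    text: str               # remaining ids are implied: counter + offset


def compress(ops: List[InsertOp]) -> List[InsertRun]:
    runs: List[InsertRun] = []
    for op in ops:
        if runs:
            last = runs[-1]
            counter, actor = last.first_id
            last_id = (counter + len(last.text) - 1, actor)
            # Extend the run only if this op directly continues it.
            if op.parent == last_id and op.id == (last_id[0] + 1, actor):
                runs[-1] = InsertRun(last.first_id, last.parent, last.text + op.char)
                continue
        runs.append(InsertRun(op.id, op.parent, op.char))
    return runs


def expand(runs: List[InsertRun]) -> List[InsertOp]:
    ops: List[InsertOp] = []
    for run in runs:
        counter, actor = run.first_id
        parent = run.parent
        for offset, char in enumerate(run.text):
            op_id = (counter + offset, actor)
            ops.append(InsertOp(op_id, parent, char))
            parent = op_id
    return ops


# "hello" typed by one author, then "!" appended by another.
ops = []
parent = None
for n, char in enumerate("hello", start=1):
    ops.append(InsertOp((n, "alex"), parent, char))
    parent = (n, "alex")
ops.append(InsertOp((6, "bea"), parent, "!"))

runs = compress(ops)
print(len(ops), "ops stored as", len(runs), "runs")  # 6 ops stored as 2 runs
```

Because `expand(compress(ops))` returns the original operations, the compact form loses nothing that a later merge would need.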

RMBLRX · February 9, 2021, 9:02am

I use Emacs (more specifically, Spacemacs) and Magit for writing and will definitely use Radicle for most of my projects in the future. I’m only very much an amateur on the programming side of things but developed an appreciation for git early on in my research around various wiki platforms, deciding that git was just a more resilient approach in that department. This is probably why I ultimately picked up markdown for all of my writing, and then it wasn’t long before I happened upon publishing platforms like Leanpub, GitBook, and Atlas, all of which integrate git into the publishing of books and/or documentation.

I would certainly like to see some robust integrations around prose-oriented publishing in Radicle’s interface, primarily in the area of wikis, static sites (blogs, for instance), as well as various other forms of documentation or publication. For instance, the latter of those would benefit greatly from IPFS publishing hooks of some sort; as in, the ability to define a chosen build tool to produce a static site or even epub, pdf, etc. and have it deploy to IPFS or a given IPNS address, which would be doubly handy when Ethereum integration rolls out, given the possibility of updating ENS records with a hook (ideally, being able to post the source repo’s hash from Radicle-Link, as well as an IPFS hash of the site built from source, giving the added benefit of being able to somehow reference either hash with the same domain). I guess what I envision here is something of a decentralized Fleek or Netlify or GitHub Pages alternative. (incidentally, this sort of IPFS hybridization also seems like an appropriate way to handle release assets, as well as large files and binaries of whatever sort)

It seems also that this sort of capability would prove even more integral when issue tracking, discussion, etc. are rolled out, given that a site could actually leverage this sort of communication in a more accessible or familiar route of engagement with a piece of text, such as comment section widgets or even entire web-forums with the radicle-linked git-repo on the backend. In the case of comments for blog posts or other published materials, these communications could occur in-context and even directly inline with the text (more specifically, a fork of the text, using commenting syntax like that used in CriticMarkup, for instance–possibly a good place to implement CRDT… comments as collaborative forking?). Alternatively, I could imagine the forthcoming Akasha World framework (likely using Ethereum and IPFS) as an avenue for integrating comments and issue-tracking. However it’s accomplished, these communications would prove uniquely indispensable for engaging with texts published through Radicle, whether collaboratively or otherwise socially.
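
As a small illustration of inline comments living in a fork of the text, the sketch below pulls CriticMarkup comments (the `{>> ... <<}` form) out of a document and records where each one was attached in the clean text. The function name and return shape are assumptions made for this example, and only the comment syntax is handled, not additions, deletions, substitutions or highlights.

```python
import re

# CriticMarkup comment syntax: {>> comment text <<}
COMMENT = re.compile(r"\{>>(.*?)<<\}", re.DOTALL)


def extract_comments(text):
    """Return (clean_text, [(offset_in_clean_text, comment), ...])."""
    pieces, comments = [], []
    pos = 0
    clean_len = 0
    for match in COMMENT.finditer(text):
        chunk = text[pos:match.start()]
        pieces.append(chunk)
        clean_len += len(chunk)
        comments.append((clean_len, match.group(1).strip()))
        pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces), comments


forked = "Radicle is peer-to-peer.{>> Cite the docs here? <<} It builds on git."
clean, notes = extract_comments(forked)
print(clean)  # Radicle is peer-to-peer. It builds on git.
print(notes)  # [(24, 'Cite the docs here?')]
```

A comment widget on a published page could render `clean` and show each note at its offset, while the fork itself stays a plain text file in the repository.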

kim · February 9, 2021, 1:53pm

Can’t speak for the Radicle Upstream roadmap here (as far as integrated experience is concerned), but these kinds of cross-publishing use cases were considered when we went for git as the backing store. We still have an IPFS git remote helper lying around, which publishes the branch head to IPNS.

Heh, that’s kinda cool, actually. Reminds me a little of Jane Street’s Iron, where they do code reviews as inline comments. With some clever tooling, that seems unreasonably effective.

alexgood · February 9, 2021, 3:36pm

@kim I think CRDTs have broader applicability than real time collaboration. The reason I’m interested in them is that they make it possible to build peer to peer applications without having to also build merge operations into the UI of your application (which is otherwise necessary as there is no privileged peer to decide what order things happened in). I think it’s probably an open question as to whether that approach makes sense in general but I think it’s clearer with text editing on a single document. There are already a number of experiments in this area (e.g. PushPin) which are very promising.

Re. compactness, Martin Kleppmann has spent a lot of time working on this on the performance branch of Automerge, which will be released as a 1.0 soon. You can see a detailed writeup of the results here (although there have been some improvements since then), but it’s something like 3 bytes per change (where a change can include multiple insertions or deletions).

@RMBLRX CriticMarkup is interesting, thanks for the link. Thinking about integrations, there’s no reason Netlify couldn’t integrate with Radicle right? They would just need to track a particular Peer ID? Would be very cool.

kim · February 9, 2021, 6:29pm

Most definitely – I was referring to the particular strategy of tracking single-character edits (and cursor movements!). I feel like this is a very particular and somewhat narrow use-case, and, given the cost, for me personally only convincing in a realtime setting.

Hm, I think this is only true when the set of collaborators is trusted. That’s fine for a lot of use-cases, but until we can make a “drive-by” contribution as frictionless as opening a PR, we are not there yet.

Ah, that’s a good update of what’s been happening over there, thanks! The space + time improvements look very nice. The change from vector clocks to hash-linked causality is interesting, and it is indeed imaginable that the storage format could be adapted onto git’s (omitting some redundant properties in the process). It’s not clear to me, however, where the automerge project is heading – I get the impression that it is converging towards its own storage layer and network protocol.
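
For readers unfamiliar with hash-linked causality, the sketch below shows the general shape: each change names its direct predecessors by hash, much like a git commit names its parents, and "happened before" becomes reachability in that graph instead of a vector clock comparison. The JSON encoding, SHA-256 choice and function names are assumptions for illustration and do not reflect Automerge's actual encoding.

```python
import hashlib
import json


def add_change(store, deps, payload):
    # The hash covers the payload and the hashes of the direct predecessors.
    body = {"deps": sorted(deps), "payload": payload}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
    store[digest] = body
    return digest


def ancestors(store, change_hash):
    seen, stack = set(), list(store[change_hash]["deps"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(store[current]["deps"])
    return seen


def concurrent(store, a, b):
    # Neither change can reach the other: they were made independently.
    return a != b and a not in ancestors(store, b) and b not in ancestors(store, a)


store = {}
root = add_change(store, [], "create document")
alice = add_change(store, [root], "alice: fix typo")
bob = add_change(store, [root], "bob: add paragraph")
merged = add_change(store, [alice, bob], "merge")

print(concurrent(store, alice, bob))    # True
print(concurrent(store, root, merged))  # False
```

The resemblance to a commit graph is what makes mapping such a history onto git's object model plausible.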

Yes, roughly. It wouldn’t even change their value prop + business model.

RMBLRX · February 10, 2021, 6:06am

Yeah, I had also considered that, but if full-fledged git forge decentralization is the goal here, then surely this sort of functionality (in terms of IPFS and ENS deployment, at least) fits the bill. I mean, an individual could roll their own hooks for this, but having that process enmeshed with radicle-link would be quite a boon for Radicle and the decentralized web, generally. I mean, imagine collaborative wikis or even publications where the collaborators collectively hold custody over an ENS name and can deploy to that name based on whatever sort of role or code ownership they hold (like some would be able to post and edit their own articles, for instance, but have no chance otherwise to break the site).

That reminds me of some combination of Magit and Imdone, at least back when the latter used webhooks or whatever to add and edit GitHub issues right from the comments captured in the commits; I don’t know if it still does this but it was neat, and as a major aside, I think it’s a cool trick that could work well for Radicle (I always wished it had included TODO items from orgmode syntax as well as comments from CriticMarkup syntax, personally).

Oh, neat. Yeah, I think that even if publishing the repo is merely a redundancy one could take or leave, it seems like a no-brainer at least to publish assets and builds to IPFS, IPNS, ENS, etc. to readily break them out into the wider distributed web and somehow relay curious folks back to the source on Radicle (the way static sites often do with GitHub).

pepo · February 10, 2021, 12:07pm

Hi there! Someone told me about this conversation, and I just wanted to share a bit of our work with _Prtcl (a protocol) and Intercreativity (a note-taking app powered by _Prtcl).

We live higher in the stack and abstraction level, so “objects” are simply mutable references that live on a given platform (like URLs) and that, when requested from that platform, resolve to the hash of their latest head (like a git branch).

Real-time collaboration is not our priority, nor CRDTs. Instead, controlling when to push and when to merge between coexisting versions of a piece of content controlled by different people/entities is what we want to nail down.

Another thing we are focused on is handling “nested repositories”, so that you are not forced to set up a “walled space” that represents your content (a super repository that is updated every time anything is updated) and can instead have any piece inside your space open to be referenced and forked independently of the others.

Our “documents”, for example, are not markdown or big JSON objects, but are made of many small linked JSON objects, each with its own “URL/branch id” that resolves to its own latest head, and with links to the “URL/branch ids” of its children.
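
A rough sketch of that data model, as described in this post only: a content-addressed object store plus mutable references that resolve to the hash of their latest head, with each node linking to its children by reference id rather than by hash. The `Platform` class, its methods and the node fields are hypothetical names invented for this example, not _Prtcl's API.

```python
import hashlib
import json


class Platform:
    """Toy stand-in for a platform that hosts mutable references."""

    def __init__(self):
        self.objects = {}  # hash -> immutable JSON-like object
        self.heads = {}    # reference id -> hash of its latest head

    def put(self, obj):
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        self.objects[digest] = obj
        return digest

    def update(self, ref, obj):
        self.heads[ref] = self.put(obj)

    def resolve(self, ref):
        return self.objects[self.heads[ref]]

    def fork(self, ref, new_ref):
        # A fork starts at the same head and then moves independently.
        self.heads[new_ref] = self.heads[ref]


def render(platform, ref):
    node = platform.resolve(ref)
    lines = [node["text"]]
    for child_ref in node["children"]:
        lines.extend(render(platform, child_ref))
    return lines


p = Platform()
p.update("intro", {"text": "Intro", "children": []})
p.update("doc", {"text": "Title", "children": ["intro"]})
doc_head_before = p.heads["doc"]

# Updating a child does not require touching the parent.
p.update("intro", {"text": "Better intro", "children": []})
print(render(p, "doc"))                  # ['Title', 'Better intro']
print(p.heads["doc"] == doc_head_before)  # True

# Forking a single piece leaves the original untouched.
p.fork("intro", "intro-fork")
p.update("intro-fork", {"text": "My take on the intro", "children": []})
print(p.resolve("intro")["text"])        # Better intro
```

Because parents link to reference ids, there is no super repository whose head has to change whenever a nested piece changes.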

Our documentation is not up to date, so now is not a good time to try it out, but if you want to see our latest demos, they are here.

We are about to finish a refactor process and hopefully in a few weeks will be able to update the docs and developer tools. This is the only thing that currently works.

Cheers!