Idea for distributed encyclopaedia, possibly using Git over IPFS

Ramesh_Nair · November 17, 2019, 12:35pm

Hi Radicle team,

Great project! I really like it.

Over at the Knowledge Standards Foundation (https://larrysanger.org/2019/10/introducing-the-encyclosphere/) we’re currently discussing the possible architecture for a global, decentralized, uncensorable encyclopaedia. So far we’ve zeroed in on the benefits of a content-addressable system (using IPFS or Torrents) combined with digital signatures and self-sovereign identities.
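For readers less familiar with content addressing, here is a minimal sketch of the idea, assuming a bare SHA-256 hex digest as the address. Real IPFS CIDs wrap the hash in multihash/multibase encodings, and the digital-signature layer is left out because it needs a third-party library (e.g. one providing Ed25519). `ContentStore` is a hypothetical name, not part of any IPFS API.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: a blob's address is its SHA-256 digest.

    Simplification (assumption): real IPFS CIDs encode the digest differently.
    """

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Identical bytes always map to the same address, so duplicates collapse.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on read: content altered by a peer would no longer match its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"Encyclopaedia article, revision 1")
assert store.get(address) == b"Encyclopaedia article, revision 1"
```

Because the address is derived from the content, a reader can verify integrity without trusting the node that served the data; signatures then add *who* published it.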

Being able to collaborate on a piece of content would be great, and Git obviously provides a great foundation for making collaboration a breeze. Since you’ve built a decentralised Git/hub, I’d love to have a chat with you (by email, chat, or call) about how hard or feasible you think it would be to use a similar system (Git over IPFS) for what we’re trying to build.

Our end-user content creators will be non-developers. We want to publish to the encyclopaedia from blogging platforms, for example. So of course we’ll have to build additional UIs for collaboration (branching / merging) purposes. But data integrity and digital signatures are crucial (and Git does have those already).

Look forward to hearing from you.

thanks,

Ram


–
Ramesh Nair

Director

HiddenTao Ltd (UK company no. 6807289)

http://hiddentao.com

https://github.com/hiddentao

https://www.linkedin.com/in/hiddentao

kim · November 21, 2019, 8:54am

Ram

> Thanks for suggesting the Dat protocol. I had come across it before but wasn’t sure; I’ll suggest it to the team for sure.

You should definitely check out the Beaker Browser
(https://beakerbrowser.com)! Seems like it could give you at least a
PoC platform.

> (immutable commits that can then be “merged”, allowing for edit history to be viewed too just like for a normal wiki page)

The question that comes to mind here is whether you want Git's snapshot
model (all historical versions are preserved in their entirety) or
something more like small diffs. I'd say the former can be modelled quite
naturally on IPFS directly, while for the latter any of IPFS, Dat, or
SSB would work (to some degree). When it comes to merging, Git's model
is harder to reason about, because patch identity is tied to the
history, and thus patches don't commute (the same two changes applied
in a different order produce different commits). Since your editing
model may turn out to be less general than code editing, perhaps CRDTs
can capture it quite well.
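To make the commutativity point concrete, here is a minimal sketch of one simple CRDT, a state-based last-writer-wins (LWW) map. It assumes article fields (e.g. sections) are edited as whole values, each tagged with a timestamp and a replica id; `set_field` and `merge` are hypothetical names. Merge keeps the greatest tag per field, so replicas can merge in any order and still converge. Note that LWW silently drops one of two concurrent edits to the same field; free-form article text would likely call for a sequence CRDT instead.

```python
from typing import Dict, Tuple

# field -> (timestamp, replica_id, value). Comparing whole tuples gives a total
# order, so even identical (timestamp, replica_id) tags resolve deterministically.
Entry = Tuple[int, str, str]
LWWMap = Dict[str, Entry]


def merge(a: LWWMap, b: LWWMap) -> LWWMap:
    """Join two replicas: per field, keep the greatest entry.

    Taking a per-key max is commutative, associative and idempotent, which is
    what lets replicas exchange state in any order and converge.
    """
    merged = dict(a)
    for field, entry in b.items():
        merged[field] = max(merged[field], entry) if field in merged else entry
    return merged


def set_field(state: LWWMap, field: str, value: str, timestamp: int, replica_id: str) -> LWWMap:
    """Record a local write by merging a single-field update into the state."""
    return merge(state, {field: (timestamp, replica_id, value)})


base = set_field({}, "intro", "Draft intro", timestamp=1, replica_id="alice")
alice = set_field(base, "intro", "Alice's intro", timestamp=2, replica_id="alice")
bob = set_field(base, "summary", "Bob's summary", timestamp=2, replica_id="bob")
assert merge(alice, bob) == merge(bob, alice)  # order of merging does not matter
```

Contrast with Git: there, merging A into B and B into A yields different commit ids even when the resulting trees are identical.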

> a searchable index (also decentralized)

That's also an interesting problem - keep us posted on your approach!

-K


On Mon, Nov 18, 2019 at 10:09 PM Ramesh Nair <ram@hiddentao.com> wrote:

Ramesh_Nair · November 18, 2019, 9:08pm

Kim,

Thanks for the insight.

Thanks for suggesting the Dat protocol. I had come across it before but wasn’t sure; I’ll suggest it to the team for sure.

We haven’t yet figured out the exact workflow. If we can enable people to collaborate on an article wiki-style, that would obviously be ideal. Yet because we’re digitally signing content and need immutability, such collaboration would likely take on more of a Git-like architecture (immutable commits that can then be “merged”, allowing the edit history to be viewed just like for a normal wiki page). I agree with you, though, that Git itself isn’t necessarily the model, but rather something similar that’s made for the type of content we’re producing.
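A minimal sketch of the Git-like model described above, assuming each commit stores a full snapshot of the article (Git's snapshot model rather than diffs) and is identified by the SHA-256 of a canonical JSON encoding. `Commit` and `Repo` are hypothetical names; in practice each commit id would also be signed with the author's key, which is omitted here. Merge resolution is left to the editor: the merge commit simply records both parents plus the resolved text.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Commit:
    parents: Tuple[str, ...]  # ids of parent commits (two for a merge)
    author: str
    message: str
    text: str  # full snapshot of the article at this point

    def commit_id(self) -> str:
        # Canonical JSON (sorted keys) so every peer derives the same id.
        payload = json.dumps(
            {"parents": list(self.parents), "author": self.author,
             "message": self.message, "text": self.text},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class Repo:
    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}

    def commit(self, parents: Iterable[str], author: str, message: str, text: str) -> str:
        c = Commit(tuple(parents), author, message, text)
        cid = c.commit_id()
        # Same content always yields the same id, so history is never rewritten.
        self.commits[cid] = c
        return cid

    def history(self, head: str) -> List[str]:
        """Messages of every commit reachable from `head`, depth-first, head first."""
        seen, order, stack = set(), [], [head]
        while stack:
            cid = stack.pop()
            if cid in seen:
                continue
            seen.add(cid)
            c = self.commits[cid]
            order.append(c.message)
            stack.extend(c.parents)
        return order


repo = Repo()
root = repo.commit([], "ram", "Create article", "Draft")
a = repo.commit([root], "alice", "Add history section", "Draft + history")
b = repo.commit([root], "bob", "Fix typo", "Draft (fixed)")
merged = repo.commit([a, b], "ram", "Merge edits", "Draft (fixed) + history")
print(repo.history(merged))
```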

I just realized that what I’m describing above is a decentralised wiki with decentralised editing.

Having said that, it doesn’t have to be wiki-style. At the fundamental level we just want to be able to publish articles to a decentralised storage layer, with metadata attached that allows us to build a searchable index (also decentralized). Further layers (such as article rating, keyword-based content search, and analytics) can be built on top of these base layers later on.
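As an illustration of the "metadata plus searchable index" layer, here is a minimal sketch of an inverted index from metadata keywords to content addresses. It assumes each published article carries a metadata record with a hypothetical `keywords` list; `MetadataIndex` and the addresses used below are placeholders. Because each posting is a grow-only set, merging indexes built by different nodes is just a union, which keeps the index easy to decentralise.

```python
from collections import defaultdict
from typing import Any, Dict, Mapping, Set


class MetadataIndex:
    """Toy inverted index: keyword -> set of content addresses."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, address: str, metadata: Mapping[str, Any]) -> None:
        # Assumption: metadata["keywords"] is a list of strings.
        for keyword in metadata.get("keywords", []):
            self._postings[str(keyword).lower()].add(address)

    def search(self, *keywords: str) -> Set[str]:
        """Addresses whose metadata contains all of the given keywords."""
        sets = [self._postings.get(k.lower(), set()) for k in keywords]
        return set.intersection(*sets) if sets else set()

    def merge(self, other: "MetadataIndex") -> None:
        # Postings only grow, so merging another node's index is a set union.
        for keyword, addresses in other._postings.items():
            self._postings[keyword] |= addresses


index = MetadataIndex()
index.add("addr1", {"title": "Photosynthesis", "keywords": ["biology", "plants"]})
index.add("addr2", {"title": "Mitochondria", "keywords": ["biology", "cells"]})
print(index.search("biology", "plants"))
```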

For instance, we want people to be able to publish their blog posts to this decentralized network. It’s unlikely they’ll be collaboratively editing those posts directly; we’ll probably have to build a separate editor for collaboration, as integrating it into an existing blogging engine may end up being too confusing from a UX point of view.

Regarding data persistence, the Knowledge Standards Foundation is likely going to host a node which will permanently pin any data, at least until we figure out something better. Hopefully other orgs and individuals interested in contributing to the network will do so too. Regarding censorship, we’ll have to warn users that they can only guarantee their content won’t be censored if they host their own node, which ensures their data stays accessible. I think that’s fair.

thanks,

Ram


kim · November 18, 2019, 8:57am

Hi Ram,

Interesting project on your end as well, thanks for reaching out!

I'll just drop a few thoughts, mostly from a technical perspective:
our experience combining Git and IPFS has been so-so[0]. This may not
be an issue for you, as I imagine it would be more important to have a
good peer-to-peer distribution mechanism for readers than a
close-to-"native" remote Git collaboration experience. I do wonder,
however, whether you have considered building directly on IPFS (or
Dat[1], or SSB[2] for that matter), because you say:

> Our end-user content creators will be non-developers.

Git isn't the easiest thing to wrap one's head around, especially when
used for anything other than code, yet IPFS would give you similar
primitives (storage model).
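Git's storage model is content-addressed in the same spirit as IPFS. As an illustration (not Radicle code), this is how Git derives a blob's object id in a SHA-1 repository: it hashes a `blob <size>` header, a NUL byte, and the raw content, which is what `git hash-object` computes for a file. IPFS also addresses data by hash, but it chunks files and encodes the hash as a CID, so a file's Git id and its IPFS CID are not interchangeable.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id Git assigns to a blob in a SHA-1 repository."""
    # Header is b"blob <length in bytes>" followed by a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty-blob id)
```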

I'd be interested to hear more about how you envision people
collaborating in this system, i.e. what kinds of workflows you have in
mind. So far we've been exploring the idea of a replicated state
machine, which is more general than what we think we'll need for code
collaboration, but perhaps the ability to capture collaboration
"rules" in user-modifiable code is applicable to what you do.
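A minimal sketch of the replicated-state-machine idea, assuming every replica already agrees on the same ordered log of operations (agreeing on that order is the hard part and is not shown). The collaboration "rule" is a plain function that communities could swap out; `Op`, `State`, `only_editors`, `apply_op` and `replay` are hypothetical names and do not reflect Radicle's actual machine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Op:
    author: str
    kind: str      # "edit" or "add_editor" in this toy example
    target: str    # article title, or the editor being added
    payload: str = ""


@dataclass
class State:
    editors: Set[str] = field(default_factory=set)
    articles: Dict[str, str] = field(default_factory=dict)


# A rule decides whether an op is valid in the current state (user-modifiable).
Rule = Callable[[State, Op], bool]


def only_editors(state: State, op: Op) -> bool:
    """Hypothetical rule: only current editors may act."""
    return op.author in state.editors


def apply_op(state: State, op: Op, rule: Rule) -> None:
    """Deterministic transition: invalid ops are skipped, valid ones mutate state."""
    if not rule(state, op):
        return
    if op.kind == "edit":
        state.articles[op.target] = op.payload
    elif op.kind == "add_editor":
        state.editors.add(op.target)


def replay(log: List[Op], founders: Set[str], rule: Rule) -> State:
    """Replicas replaying the same log under the same rule reach the same state."""
    state = State(editors=set(founders))
    for op in log:
        apply_op(state, op, rule)
    return state


log = [
    Op("ram", "edit", "Encyclopaedia", "v1"),
    Op("mallory", "edit", "Encyclopaedia", "vandalised"),  # rejected by the rule
    Op("ram", "add_editor", "kim"),
    Op("kim", "edit", "Encyclopaedia", "v2"),
]
print(replay(log, {"ram"}, only_editors).articles)
```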

Lastly, one of the trickiest problems to solve in any peer-to-peer
system is how to ensure data actually persists on the network. Disk
space and bandwidth might be cheap, but they're not free, so peers
always have an incentive to be picky about what they replicate. That
works against the censorship resistance you're aiming for, so it
would be very interesting to hear how you're thinking about this.

-Kim

[0]: https://github.com/radicle-dev/radicle-alpha/issues/689 (Roadmap)
[1]: https://www.datprotocol.com/
[2]: https://scuttlebutt.nz/