Surfs Up - The Denotational Wave

fintohaps · December 3, 2019, 4:33pm

In this post I wanted to give people the low-down on radicle-surf and the journey for its first version.

In part it’s also a personal journey for myself and the team, since we are a partly-remote team and I wanted to give an
idea of how we work.

`radicle-surf`

As a part of the greater Radicle ecosystem, we will focus on the radicle-surf component here. The aim of this component is to provide an API for browsing code in a Version Control System (VCS). Think of it as similar to going to GitHub and looking at a specific repository. Some of the common things we want to do are:

Navigate directories
View a file’s contents
See what commit touched a file last
View the commits in some history

Another wish for the project was to have it independent from the VCS it used. We would like to explore having support for Pijul repositories, for example. With a few of these details in mind we moved forward into thinking about the design of radicle-surf.

Denotational Design

Denotational design is a technique for designing your API in such a way that it minimises abstraction leaks and lends itself to getting to the “true meaning” of what you are designing. Conal Elliott is the main (pretty much only) proponent of this technique. I first became interested in it after watching his talk and remarked on how elegant his abstractions turned out to be. On top of that, the process of thinking about what you want to represent and eliminating edge cases in the system is dear to my heart as a functional programmer who likes types and static guarantees.

The reason I mention this is because as I began to think about code browsing my friend Sandy Maguire had mentioned denotational design in his upcoming book, “Design and Interpretation of Haskell Programs”. This re-lit the first in my mind and I decided to explore the technique as a way to provide some upfront work on radicle-surf before diving in and writing code.

First Attempt

It’s rare that we get things right the first time around. It was no different with working denotational design and our initial stab at the problem. We can find the initial design here.

The tactic for designing in this way is to lay out your domain as much as you can up front. So, to start off we list all the objects we think we will require in the system. In this case, we had listed objects such as: Repo, CommitHistory, Commit, and Change. We can note here that these might not have initially existed from the outset, but instead brought into existence after exploring the design further.

Once we had the objects listed, we listed the actions we would like to have working on them. These build up the functionality of our API. For example, in code browsing we want to list a directory, get a particular set of commits, view a file’s contents, etc. With the types of objects and functions in hand we begin to reason about the “meaning” of these objects and functions. The aim of this step is a bit more nuanced and we spend the most of our time figuring this out and refining it. The best guide for this step is to find the type that is used the least in the functions. For example, Repo was only mentioned in two places:

addCommitHistory :: CommitHistory -> Repo -> Repo
getCommitHistories :: Repo -> [CommitHistory]

We have a function that adds a CommitHistory to a Repo, getting back the updated Repo. The other is to retrieve all the CommitHistory objects in a Repo. When we want to assign a meaning to an object, we want to pick the simplest meaning as possible. This means that we do not want to bring any unnecessary meaning with us during the design. We will not want to talk about what time the Repo was created at, so we will not be adding a time type to the meaning. The simplest meaning in our case here is type instance Repo = [CommitHistory]. A Repo is simply the collection of CommitHistory objects. From there we can derive the meaning of our two functions above using the overloaded operator μ (read “as meaning of”).

-- Prepend a 'CommitHistory' to a 'Repo'
addCommitHistory :: CommitHistory -> Repo -> Repo
μ addCommitHistory history = history : μ repo

-- Retrieve a list of the 'CommitHistory's
getCommitHistories :: Repo -> [CommitHistory]
μ getCommitHistories repo = μ repo

The issue was that when starting out I did not describe the domain correctly to myself. When fleshing out the design I got to the point where a Change was the atomic unit of the system. A Commit was made up of a set of Changes and they would build a file. Maybe it was my buzz on just learning about patch theory, Darcs, and Pijul.

In the design we defined Change as:

type Change
-- Constructors of Change - think GADT
AddLineToFile :: FileName -> Location -> FileContents -> Change
RemoveLineFromFile :: FileName -> Location -> Change
MoveFile :: FileName -> FileName -> Change
CreateFile :: FileName -> Change
DeleteFile :: FileName -> Change

Then to build up a file’s contents we added:

applyChange :: Change -> FileContents -> FileContents
μ applyChange change fileContents = case change of
  AddLineToFile _ location fileContents' -> addLine location fileContents' fileContents
  RemoveLineFromFile _ location          -> removeLine location fileContents
  -- These operations don't modify contents per se but rather just modify the file
  MoveFile _ _                           -> fileContents
  CreateFile _                           -> fileContents
  -- Deleting will set FileContents to nothing
  DeleteFile _                           -> emptyFileContents

While this is elegant from the point of view of constructing a file via its history, it became unnecessarily complicated to implement when all we wanted to do is view a file’s state. We would have to implement a patch-based approach for Git Commits. This would be a distraction from what we really wanted to do: view files and directories.

Second Attempt

The good news is that I have a team at Monadic to lean on when the going gets tough! After expressing my frustration with the initial design, Kim and I setup a call where we could collaborate on a refinement of the design. Kim is based in Berlin and I am based in Dublin so we hopped on Google Hangouts and used tmate to start a remote terminal that we could both work in. This is a really cool tool for doing some pair programming remotely, or even in the same room.

Sitting down together we thought about the problem again. We initially tried to think about the initial design and what was the issue, but we switched how we were thinking about things a bit. We had a thought experiment as to what a code browser might look like if we were to build it for a REPL. This lead us to an initial jumping off point where the user would get a repository, pick a history to view, build the directory from that history, and finally browse that directory. This matched up pretty well when thinking about how a user would utilise GitHub. We first go to a URL, pick a branch (default to the master branch), and we have our directories and files to view.

Then things got interesting. We knew we had this concept of building a directory. Using Pijul and Git as examples we thought about how they worked to render a directory from their units of change.

In Git we have the notion of Commit (see Commit Objects), and this will give us the ability to render a Tree. So if we had a list of commits (a Branch can be seen as a list of Commits), to get the directory state we grab the head of the list and render the Tree. On the flip-side we have Pijul which contains a list of Patches. If we are given a collection of Patches we can fold over them, applying each using Pijul’s logic, to get the redered directory state.

So it was important to note that we have two similar functions that go from some VCS artifact to a rendered directory. This led us to the idea of a Snapshot. A Snapshot was simply a function
from [a] -> Directory where a could be a Commit or a Patch. This is when we felt like we were really onto something. The boundary was clearly there separating any Repo logic from the
Directory logic.

We refined this further. Since we must have at least one a to bring a Directory into existence, a Snapshot became NonEmpty a -> Directory. We note that we could have also said that it would be [a] -> Maybe Directory. Instead we wanted to work on more natural semantics, since we can also note that there is a transformation from [a] -> Maybe (NonEmpty a). For a further explanation on this method, we highly recommend reading Parse, don’t validate, by Alexis King.

One issue with this initial meaning for Snapshot though was that head of an empty list is not a total function and so we could not get the head of list of Commits. We would like to avoid these kinds of faulty semantics as much as possible, and thankfully this is simply fixed by using a NonEmpty list, but it clearly illustrates the push and pull of giving meanings to types and functions.

We explored further the meaning of Snapshot and how it would play in the browsing semantics. We noted that in the initial user flow of the REPL, that there was no way of changing the history we wanted to work with. This smelled like we needed some way of viewing and updating state. In Haskell-land, this is achieved simply by using the State monad, where we can call functions such as get, set, and modify. At that moment we realised that our way of browsing code was simply combining a Reader monad (a read only state) over a State monad. The read-only state was the Snapshot function, and the modifiable state was History (i.e. NonEmpty a). From there we could easily define a function that rendered a Directory from its current state and modify state, such as finding a particular commit, or viewing a particular branch.

Once we have a Directory we can do things like finding a particular file or directory, or getting the diff between to two Directory objects. These are outlined in the second design.

Conclusion

The first minimal version of radicle-surf is coming along nicely and we would say this is due to having put some good up-front work before writing any code.

On top of having a reference document for writing the implementation, it also gave other Monadlings the chance to give their input on the design and allowed us to have a common place to discuss the design. It allowed us to not get distracted by nit-picky semantics, but rather focus on what we truly wanted to design and use.

We would love to hear your feedback and if you have had any experience in writing denotational design before

Lots of love,
@fintohaps

cloudhead · December 4, 2019, 11:24am

Great write-up. One question: since in git, the commit links to a tree, as stated, does it mean that rendering a git directory will always involve as input a NonEmpty with only one item in it? Compared to in a patch-based system, where you’d need n patches? I would imagine then that something like type Snapshot a = Commit a | Patches (NonEmpty a) would also be a valid representation?

fintohaps · December 4, 2019, 11:35am

So we actually generalise to:

type History a = NonEmpty a
type Snapshot  a = History a -> Directory

And then we can have:

type GitSnapshot = History Commit
type PijulSnapshot = History Patch

-- Get the latest commit of the history and render the Tree
gitSnapshot :: GitSnapshot
gitSnapshot commitHistory = getTree (head commitHistory) 

-- Use Pijul to apply the patches on an empty directory
pijulSnapshot :: PijulSnapshot
pijulSnapshot patches = foldl applyPijulPatch emptyRepo patches

The reason we picked Snapshot working over History is because then you can do manipulation over that, such as finding a particular Commit or Patch and then re-rendering the Directory.

Hope that answers your question @cloudhead! And thanks for the compliment

cloudhead · December 4, 2019, 12:55pm

Ok I think I understand. The Snapshot is not meant to be an instant in time, but an actual history? Basically I was trying to understand why the git snapshot has the whole history, since as you show there, you only need one commit to get Directory, eg. the head.

fintohaps · December 4, 2019, 2:00pm

I suppose the best way to put it is that a Snapshot is a way to interpret an instant in time into a Directory.

We use something more general than Head a | NonEmpty a for History because NonEmpty subsumes the other case and it can be reused to model things like Branches and Tags.

cloudhead · December 4, 2019, 2:33pm

Makes sense, thanks!

MrChico · December 11, 2019, 10:51am

This is a really interesting way of approaching the design space around this. And one that feels very inviting to community changes at a more fundamental level than “can I have it in blue, please?”.

At the risk of overgeneralizing, (or maybe taking seriously the idea that the meaning of these constructs are given by their structure rather than their names), I always found the obsession with Lines in vcs to be a bit crude. Maybe the construction AddLineToFile could be generalized to AddLeafToTree, where this could be specialized to lines or words in a file, or even nodes in an AST, if there is structural information from which this can be deduced.

fintohaps · December 11, 2019, 11:12am

Thanks @MrChico!

I do like where you’re going with the idea of an AST, maybe that is getting closer to the idea of diff’ing code We’re actually trying to think through this at the moment on this issue: Diff Semantics · Issue #27 · radicle-dev/radicle-surf · GitHub.

I’ll have to take some time and see how it plays out with a tree-like model Or maybe you’ll have more thoughts on it too before I do

MrChico · December 20, 2019, 12:00am

Are you guys familiar with the experimental language unison? They touch on some related ideas, or at least provide an interesting perspective on content-addressable code: https://www.youtube.com/watch?v=IvENPX0MAZ4

cloudhead · December 26, 2019, 5:30pm

Yes, I watched a talk about it some time ago – it’s a really clever idea and a plausible future for programming and collaboration. A few things that make me ponder:

In functional languages, it’s often possible to prove that two definitions are isomorphic – they have the same meaning, though they are formulated differently. In Unison, this would result in different content-ids, but it would be really cool if there was a canonical form. For example, you could apply eta-reduction and other forms of reduction on the function body before hashing it. This would considerably reduce the hash space and allow for less duplication.
I’m not sure if Unison compiles to machine code, but they say you don’t have to build stuff anymore. The content-addressable ASTs is very similar to a reproducible build system, but fine-grained. It would be cool if you could pull down binary code for functions that have already been compiled, and only ever compile functions once, accross the whole internet!

Those are my thoughts…

fintohaps · January 2, 2020, 11:43am

Ya, Unison is a really cool language! Fun fact, Runar got me my first Haskell job and we celebrated with some Green Spot Whiskey

Well if you’re using a functional language, you would have a canonical form, I believe. You could break down the isomorphisms into their sums, products, and exponents and then sub them into the AST.

But this also makes me think, can you retrofit the Unison content-addressable model to other languages, functional or not

I think that’s what the end goal for Unison is, as far as I remember. Because the point is to have a functional, distributed programming language. So you could pull down, or push out code to be computed on any machine. I think that’s the idea, but maybe we should get in touch with the folks there