What's the Diff

fintohaps · February 3, 2020, 4:15pm

With a new development cycle upon us, radicle-surf needs to revisit the task of implementing diff’ing.

We created a couple of issues to track this work initially:

The reason for getting more serious about this topic is that we would like to complete some code browsing features. To name just a few:

Viewing commit changes
Comparing two commits
View branch comparisons (more specifically for viewing proposed changes to be merged)

Work in Progress

Thanks to the work of Andrei Tomashpolskiy we have a start for diff’ing. This initial implementation surfaced a lot of questions that we hadn’t initially considered when thinking about generating diffs. We initially thought that generating a diff from the Directory structure would be quite powerful since it would be implemented in, so-called, “pure” functions. It would not rely on any store or state. In reality though, it would be useful to boot-strap off of the backing VCS’s implementation of diff’ing since it would track a lot of useful information already, such as lines changed, files moved, etc.

Next Steps

Moving forward we would like to try work from the concrete and notice patterns that we could generalise for future development. This process will be different in comparison to Surfs Up - The Denotational Wave. If we were to approach this problem with another denotational design perspective, I would fear that we’d be on the road to re-inventing something like Pijul (but nowhere near as complete and nice).

Instead we will start from the model of libgit2 (more specifically the Rust wrapper git2), mapping out how it models diffs, and try to create a model that would make more sense to us and other implementations, again, such as Pijul. Over the next couple of weeks, starting today, we will dig into diff’ing with one use case at a time.

Commit Change-Set

The first use-case is looking at a single commit change set. In GitHub this would be the same as looking at a single commit, e.g. Add Deref for Label · radicle-dev/radicle-surf@07fce73 · GitHub. What this is doing is taking a single commit and comparing it to its parent. What we get is a series of files that were changed, whether they were added, deleted, moved, or modified, and within modifications what lines were added or removed. Sounds simple right? Well in libgit2 it turns out to be a web of data structures that we’re currently navigating through. Here’s a taste of what we have so far:

To get a diff of two Commits we can get their Trees and call diff_tree_to_tree
A Patch can be retrieved by using Patch::from_diff passing the Diff we just retrieved along with with the file you want to look at (which is given by an index, by the way…)
Within the Patch we have a DiffDelta which tells us what files are being touched
We also have a series of DiffHunks
DiffHunk’s in turn have lines which are captured in the structure: DiffLine. It’s here where we can tell how the content has changed within a file.

Finding the Pattern

While these are just our initial findings with a quick dive into the libgit2 API (via git2), we will have to keep in the back of our mind what patterns we can see here. While these data structures may be useful for the git model, is this type of nested structure helpful for code browsing? Is there a more user friendly structure that we can come up with that allows for the plug’n’play of other VCS libraries? What information makes the most sense for radicle-upstream so they can display diffs in our beautiful application?

If you, dear reader, have any ideas please let us know! In the meantime, we will play around with these git primitives and keep you updated on our progress

Thank you for reading,
Team CoCo (Code Collaboration)

Edit 1: “tells us” instead of “tells use”
Edit 2: Add links to libgit2 and git2-rs

fintohaps · February 4, 2020, 5:09pm

Update 1

So as promised I said I would try and keep you, dear reader, in the loop. I must admit that the previous post was quite cathartic in the sense that I was staring at these types/data structures and trying to picture how they all linked. The process of writing them down freed me up and allowed me to have those details “on paper” v.s. “in my head”. Today I was able to revisit those bullet points and see how I could get some sort of minimal model together.

All the code I will discuss here can be found on this WIP branch: https://github.com/radicle-dev/radicle-surf/blob/783ceb0f1aeeb645f6d404f27319215a3654681d/src/diff/git/mod.rs

No, You the MVP!

In this section I want to give you an insight in to how I work (sometimes). After working with Haskell long enough, I gained an immense affinity to work with types when thinking about a problem. Since Rust has a Good Type System , I still think in this way.

What data do we need to talk when talking about a change in contents? Thankfully the answer is in DiffLine. All we need is the line number, the column offset, and the contents that have changed:

pub struct Content {
    pub line: u32,
    pub offset: i64,
    pub contents: Vec<u8>,
}

Moving on we want to convey what kind of change we’re talking about. In this case I already had an idea, but DiffLine also tells us that there’s an addition, a removal, and contextual lines (to help us, the user of git). Note that there are other types of line diffs, but we’ll ignore them in this MVP. We can model this with the following enum:

pub enum ContentChange {
    Addition(Content),
    Deletion(Content),
    Context(Content),
}

These are fine-and-dandy, but to actually see if we’re on the right track we need to be able to convert DiffLines into ContentChanges. To do this I opted for TryFrom since there is the chance that our data doesn’t look right when converting from DiffLine due to its model being slightly different and us not handling all the cases it does currently. While the following uses a pub enum Error defined in the file, I actually used the () type for Error when I was prototyping.

Wiring it all together we get:

impl<'a> TryFrom<DiffLine<'a>> for ContentChange {
    type Error = Error;

    fn try_from(diff_line: DiffLine) -> Result<Self, Self::Error> {
        match diff_line.origin() {
            '+' => {
                let content = Content {
                    line: diff_line.new_lineno().ok_or(Error::MissingNewLine)?,
                    offset: diff_line.content_offset(),
                    contents: diff_line.content().to_vec(),
                };
                Ok(ContentChange::Addition(content))
            },
            '-' => {
                let content = Content {
                    line: diff_line.old_lineno().ok_or(Error::MissingOldLine)?,
                    offset: diff_line.content_offset(),
                    contents: diff_line.content().to_vec(),
                };
                Ok(ContentChange::Deletion(content))
            },
            ' ' => {
                let content = Content {
                    line: diff_line.old_lineno().ok_or(Error::MissingOldLine)?,
                    offset: diff_line.content_offset(),
                    contents: diff_line.content().to_vec(),
                };
                Ok(ContentChange::Context(content))
            },
            c => Err(Error::UnhandledChange(c)),
        }
    }
}

From here the process was very similar. I wanted to define a data type that captured file changes where the changes included: an added file, a deleted file, a renamed file, a modified file.

pub enum FileChange {
    Addition(path::PathBuf),
    Deletion(path::PathBuf),
    Move {
        old_path: path::PathBuf,
        new_path: path::PathBuf,
    },
    Modification {
        path: path::PathBuf,
        changes: Vec<ContentChange>,
    },
}

Writing an implementation for TryFrom<Patch> for FileChange we now have a way for getting diff changes in a model that, at least for me, makes a bit more sense and has more guarantees as to what data is present. Looking at you old_file and new_file

Failing Test Driven Development

Look, I’ll come out and say REPLs are great. While Haskell’s tooling has something left to be desired, its REPL is sooooo good. It allows for some really nice and quick feedback when prototyping. That being said I have found a way to mess around with test code in Rust. When I want to play with some code I add a test case that always fails, that way any printlns will fire and I can see the results. Something like the following:

[cfg(test)]
mod tests {
    #[test]
    fn playground() {
        println!("Hello, playground!");
        assert!(false);
    }
}

So I did this to test the TryFrom instances above using our golden (platinum) test repo GitHub - radicle-dev/git-platinum: Platinum files for testing radicle-upstream and its first two commits. Running cargo test diff::git on the command line we get the following output:

"src/Eval.hs"
line no: 20
line offset: -1
contents: import           Radicle.Lang.Identifier (Ident(..))

line no: 21
line offset: -1
contents: import           Radicle.Lang.Internal.Orphans ()

line no: 22
line offset: -1
contents: 

----
line no: 23
line offset: 590
contents: -- | The built-in, original, eval, the MVP.

++++
line no: 23
line offset: 590
contents: -- | The built-in, original, eval.

line no: 24
line offset: -1
contents: baseEval :: Monad m => Value -> Lang m Value

line no: 25
line offset: -1
contents: baseEval val = logValPos val $ case val of

line no: 26
line offset: -1
contents:     Atom i -> lookupAtom i

Gorgeous, don’tyathink?

What’s Next?

From here, I think the progress to be made is a two-pronged attack. The first prong will be to tidy this code up and test a lot of use cases, eventually wrapping it into the radicle-surf API so that our application can do a changeset call on a Commit and get the desired output.
The second prong will be to take another look into Pijul and see how close this model might map to its way of representing changes, and evaluating this model based on this research. Maybe this can be more trait-like

It would be cool to hear if you like this type of story and if you find the prose useful around my thought processes, style of development, etc. and I would also value hearing anything that could help me/us

Peace out my fellow Radicles