Built and benchmarked Urn against Git

Implemented init, status, add, commit, log, show, and diff commands. Depends only on OpenBSD base system tools. Didn’t bother with collaborative workflows.

Initial design mirrored the work tree using symlinks. Using the filesystem as a database felt clever, but the directory walk on every command and the inode churn were untenable. Replaced the symlink architecture with a path-sorted index.

The index tracks path, mtime, size, and SHA-1 hashes of staged, committed, and base files. Hashing is skipped when a file's mtime and size match its index entry. The exception: if the file's mtime equals the timestamp of the index itself, the file is rehashed anyway to catch sub-second changes.
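A minimal sketch of that check, not Urn's actual code. The helper name, the entry fields, and the reading of "the index's timestamp" as the index file's own mtime are all assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a work-tree file must be rehashed.
# $entry is the cached index record (hash ref with mtime and size), or
# undef for a path the index has never seen. $index_mtime is assumed to
# be the mtime of the index file itself.
sub needs_rehash {
    my ($entry, $mtime, $size, $index_mtime) = @_;
    return 1 if !defined $entry;              # new path, nothing cached
    return 1 if $size  != $entry->{size};     # size changed
    return 1 if $mtime != $entry->{mtime};    # timestamp changed
    return 1 if $mtime == $index_mtime;       # same second as the index: racy
    return 0;                                 # stat data matches, reuse hash
}
```

The size and mtime would come from `(stat $path)[7, 9]`. Only the last `return 1` costs a hash that stat data alone would have skipped.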

Implemented directory scans as a two-finger walk with the index. Linear index access trades random-access speed for sequential IO and keeps memory footprint low.
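For illustration, a two-finger walk over two path-sorted lists. This is a sketch under assumptions: the sub name is made up, both lists are sorted with Perl's `cmp`, and arrays stand in for what would be a streamed index and a streamed directory scan:

```perl
use strict;
use warnings;

# Walk two sorted path lists in lockstep, advancing whichever "finger"
# points at the smaller path. Returns array refs of paths found only on
# disk, only in the index, and in both.
sub two_finger_walk {
    my ($index, $disk) = @_;
    my (@added, @removed, @common);
    my ($i, $d) = (0, 0);
    while ($i < @$index && $d < @$disk) {
        my $c = $index->[$i] cmp $disk->[$d];
        if ($c < 0) {
            push @removed, $index->[$i++];    # in index, gone from disk
        } elsif ($c > 0) {
            push @added, $disk->[$d++];       # on disk, not yet indexed
        } else {
            push @common, $index->[$i];       # present in both
            $i++;
            $d++;
        }
    }
    # Whatever remains on one side has no counterpart on the other.
    push @removed, @{$index}[$i .. $#{$index}];
    push @added,   @{$disk}[$d .. $#{$disk}];
    return (\@added, \@removed, \@common);
}
```

Each list is read once, front to back, so neither needs to be held in a hash for lookups.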

Commits save staged files, trees, and deltas to the content-addressable object store. Bundled deltas into tarballs to conserve inodes. Gzipped objects larger than 512 bytes. The threshold was arbitrary. Did not tune further.
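A rough sketch of the store-and-compress step, using only core Perl modules. The flat directory layout, the `.gz` suffix, and the sub name are assumptions, not Urn's real on-disk format. Tarball bundling of deltas is left out:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use IO::Compress::Gzip qw(gzip $GzipError);

use constant GZIP_THRESHOLD => 512;    # bytes, the arbitrary cutoff

# Hypothetical sketch: write $data (a byte string) under its SHA-1 in
# $store_dir and return the object's path. Objects above the threshold
# are gzipped; smaller ones are stored raw.
sub store_object {
    my ($store_dir, $data) = @_;
    my $path = "$store_dir/" . sha1_hex($data);
    if (length($data) > GZIP_THRESHOLD) {
        $path .= ".gz";
        return $path if -e $path;      # same content already stored
        gzip(\$data => $path) or die "gzip failed: $GzipError";
    } else {
        return $path if -e $path;
        open my $fh, '>:raw', $path or die "open $path: $!";
        print {$fh} $data;
        close $fh or die "close $path: $!";
    }
    return $path;
}
```

Because the name is derived from the content, storing the same bytes twice is a no-op.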

Deltas are computed with diff and always target the original (base) file, so any later version is reconstructed with a single patch, with no delta chains. When a delta exceeds the rebase threshold, the current file becomes the new base. Diff output is bloated but compresses well, so the rebase threshold is set to 1.4, assuming a 30-40% compression ratio.
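The rebase decision might look like the sketch below. It assumes the threshold is the ratio of raw (uncompressed) delta size to file size, which the entry does not spell out. The sub name is hypothetical:

```perl
use strict;
use warnings;

use constant REBASE_THRESHOLD => 1.4;

# Assumption: rebase when the raw diff is more than 1.4x the size of the
# file it describes. Sizes are in bytes.
sub should_rebase {
    my ($delta_bytes, $file_bytes) = @_;
    return $delta_bytes > REBASE_THRESHOLD * $file_bytes ? 1 : 0;
}
```

Under this reading, a 100-byte file keeps its base until the diff against that base grows past 140 bytes.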

Commands run in memory, using text streams and pipes wherever possible. Left MEM_LIMIT configurable to fall back to disk for large repositories:

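# Spill buffered lines to a temp file, creating it on first use.
my $flush = sub {
    if (!$use_disk) {
        ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
        $tmp_fh->setvbuf(undef, POSIX::_IOFBF(), $chunk_size);
        binmode $tmp_fh, ":raw";
        $use_disk = 1;
    }
    print $tmp_fh @buf;
    @buf = ();          # contents are on disk now; start a fresh chunk
    $buf_size = 0;
};

# Per input line: buffer it and track both running sizes.
push @buf, $line;
$buf_size += length($line);
$tot_size += length($line);

# Spill once the total passes MEM_LIMIT, then once per full chunk.
if ((!$use_disk && $tot_size > MEM_LIMIT) ||
    ($use_disk && $buf_size > $chunk_size)) {
    $flush->();
}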

Benchmarked against Git v2.51.0 on a T490 (i7-10510U, OpenBSD 7.8):

=============================================================
 COMMIT BENCHMARK: 1000 files (100 commits)
 CONDITIONS: Depth=2, Files Mod=0.5%, Line Mod=5%
 INITIAL REPO SIZE: 17332 KB
=============================================================

SNAPSHOT: Commit #20
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.29s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1300 |                 1425
Repo size       |              6836 KB |              8296 KB
-------------------------------------------------------------

SNAPSHOT: Commit #60
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.35s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1381 |                 1706
Repo size       |              7896 KB |             10236 KB
-------------------------------------------------------------

SNAPSHOT: Commit #100
-------------------------------------------------------------
METRIC          | URN                  | GIT                 
----------------+----------------------+---------------------
Time            |                0.35s |                0.03s
Max RSS         |              0.02 MB |              0.01 MB
Page faults     |        Maj:0 / Min:0 |        Maj:0 / Min:0
Inodes          |                 1462 |                 1987
Repo size       |              9020 KB |             12168 KB
-------------------------------------------------------------

AFTER GIT GC
-------------------------------------------------------------
Final Size      |              9020 KB |              3812 KB
Final Inodes    |                 1462 |                   41
-------------------------------------------------------------

TOTAL URN REBASES: 0

Git wins on speed and memory.

On storage, Urn shows promise. Git wrote 12 MB to track a 17 MB repository; Urn wrote 9 MB. Over the 80 commits between snapshots #20 and #100, Git's inode consumption grew by 562 (1,425 to 1,987). Urn's crept up by 162, from 1,300 to 1,462.

Then fell the GC hammer. Inodes: 41. Space recovered: 8.4 MB.

Urn’s sequential IO and reduced write frequency are theoretically gentler on NAND. Git’s dramatic GC pass (12 MB → 3.8 MB) incurs SSD wear Urn likely avoids. Precise impact on TBW and write amplification, however, remains unknown.

Commit: 79d9ec2