I run various homespun personal databases for music notes, journaling, writing projects, graphics, and technical documentation. I am engaged in a circuitous process of exploring some broader questions around them.
After two years of focus on generalized, large-scale tools and their corresponding user interfaces, some insights have begun to emerge:
In 2019, I shifted emphasis from protocols to understanding systems of information and identifying small missing pieces in available toolkits. In a sense, I have come around to applying the Unix Philosophy to this work.
The following is painfully technical.
This all started with some abstract thinking. Over the years, I have applied concepts of functional programming and type systems to large projects, and the approach seems to limit complexity enough to make a major difference in cognitive load.
What other messy areas could I revisit with a slightly different approach? Mature REST APIs show similarly big and hairy tendencies, so what about typed APIs? Is GraphQL’s typing worth the custom query language and server-side baggage?
I decided to build a GraphQL API to feel things out, on top of the reference implementation server. Because this is an exploratory project, I picked an appropriately abstract goal: arbitrary data storage and retrieval via a GraphQL API.
I put together a couple of simple GraphQL schemas for testing CRUD operations. But faced with a need to map requests to data, yet hoping to steer clear of ORMs, I turned to history.
GraphQL was created by an American social media company, which has also published details of TAO, its storage system for graph data. The two ought to be a natural fit, and the specification of TAO’s operations is very minimal, so I implemented it as an interface to a relational data store.
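To make the shape of that interface concrete, here is a minimal sketch of TAO's object and association operations, backed by in-memory maps rather than a relational store. The operation names follow the TAO paper's `obj_*`/`assoc_*` vocabulary, but the class, types, and field layout are my own illustration, not the project's actual code.

```typescript
type ObjectId = number;

interface TaoObject {
  id: ObjectId;
  otype: string;                     // object type, e.g. "user" or "note"
  data: Record<string, unknown>;
}

interface TaoAssoc {
  id1: ObjectId;                     // source object
  atype: string;                     // association type, e.g. "authored"
  id2: ObjectId;                     // target object
  time: number;                      // creation timestamp, used for ordering
}

class TaoStore {
  private nextId: ObjectId = 1;
  private objects = new Map<ObjectId, TaoObject>();
  private assocs: TaoAssoc[] = [];

  objAdd(otype: string, data: Record<string, unknown>): TaoObject {
    const obj = { id: this.nextId++, otype, data };
    this.objects.set(obj.id, obj);
    return obj;
  }

  objGet(id: ObjectId): TaoObject | undefined {
    return this.objects.get(id);
  }

  objUpdate(id: ObjectId, data: Record<string, unknown>): void {
    const obj = this.objects.get(id);
    if (obj) obj.data = { ...obj.data, ...data };
  }

  objDelete(id: ObjectId): void {
    this.objects.delete(id);
  }

  assocAdd(id1: ObjectId, atype: string, id2: ObjectId): void {
    this.assocs.push({ id1, atype, id2, time: Date.now() });
  }

  // assoc_range: associations out of id1 with a given type, newest first
  assocRange(id1: ObjectId, atype: string): TaoAssoc[] {
    return this.assocs
      .filter((a) => a.id1 === id1 && a.atype === atype)
      .sort((a, b) => b.time - a.time);
  }

  assocCount(id1: ObjectId, atype: string): number {
    return this.assocs.filter((a) => a.id1 === id1 && a.atype === atype).length;
  }
}
```

The appeal is how small this surface is: typed nodes, typed edges, and a handful of point and range operations, all of which map cleanly onto a couple of relational tables.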
Only a small amount of glue code was required to translate a GraphQL schema to mutation and query operations, and then to TAO persistence operations. The pieces fell into place and the general purpose server was working.
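A sketch of what that glue code looks like: given a GraphQL type name, derive create and get resolvers that delegate to TAO-style operations. The `Tao` interface and the resolver naming convention here are illustrative stand-ins, not the project's actual API.

```typescript
// Minimal TAO-facing interface the glue layer depends on (hypothetical).
interface Tao {
  objAdd(otype: string, data: Record<string, unknown>): { id: number };
  objGet(id: number): Record<string, unknown> | undefined;
}

// Derive CRUD-style resolvers for one schema type. For a type "Note" this
// produces Mutation.createNote and Query.note, each forwarding to TAO ops.
function crudResolvers(typeName: string, tao: Tao) {
  return {
    Mutation: {
      // e.g. createNote(data: {...}) maps to obj_add("note", {...})
      [`create${typeName}`]: (args: { data: Record<string, unknown> }) =>
        tao.objAdd(typeName.toLowerCase(), args.data),
    },
    Query: {
      // e.g. note(id: 1) maps to obj_get(1)
      [typeName.toLowerCase()]: (args: { id: number }) => tao.objGet(args.id),
    },
  };
}
```

Because the translation is purely mechanical, the same few lines serve every type in the schema; that is the sense in which the pieces "fell into place."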
The one hitch was that search and index operations were lacking, and would have required either data-storage gymnastics or new graph database operations. Instead, I set up a second (very simple) data store for search; saved data is persisted to both locations.
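The dual-write can be sketched in a few lines. Both interfaces below are hypothetical stand-ins for the two stores:

```typescript
interface GraphStore {
  put(id: string, doc: Record<string, unknown>): Promise<void>;
}

interface SearchIndex {
  index(id: string, doc: Record<string, unknown>): Promise<void>;
}

// Every save goes to the graph store and to the search index.
async function save(
  graph: GraphStore,
  search: SearchIndex,
  id: string,
  doc: Record<string, unknown>,
): Promise<void> {
  await graph.put(id, doc);     // primary store, the source of truth
  await search.index(id, doc);  // secondary, queryable copy
  // Note: the two writes are not transactional; if the second write fails,
  // the search copy can lag behind the primary until it is re-indexed.
}
```

Keeping the graph store authoritative means the search side can always be rebuilt by replaying saved data, which makes the lack of a transaction tolerable for a personal-scale system.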
With nothing more than the concept of a type system gluing API and database together so neatly, I wondered if a user interface would follow as easily.
So, I built a web front end for arbitrary GraphQL schemas. (That is, for schemas following my simplified CRUD-only implementation.) This is essentially a graphical “database admin” tool.
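The core trick in such a tool is deriving forms from the schema itself. Here is a sketch of that derivation over a simplified slice of GraphQL's introspection types; the `FormField` shape and the unwrapping logic are my own illustration, and real introspection results nest more deeply (lists, nested non-nulls) than this handles.

```typescript
// A small slice of what introspection returns for each field of a type.
interface IntrospectedField {
  name: string;
  type: {
    kind: string;                    // "SCALAR", "NON_NULL", ...
    name: string | null;
    ofType?: { name: string | null };
  };
}

// What the admin UI needs to render one form input (hypothetical shape).
interface FormField {
  name: string;
  required: boolean;
  scalar: string;
}

function formFields(fields: IntrospectedField[]): FormField[] {
  return fields.map((f) => {
    // NON_NULL is a wrapper type; the real scalar sits in ofType.
    const required = f.type.kind === "NON_NULL";
    const scalar = (required ? f.type.ofType?.name : f.type.name) ?? "String";
    return { name: f.name, required, scalar };
  });
}
```

With only CRUD semantics to worry about, this mapping from introspected fields to form inputs is enough to render an editor for any schema the server accepts.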
To take things further, I used GraphQL’s then-newish and nearly-undocumented schema directives to define form and field options for the client.
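A sketch of what those directives look like in a schema. The `@field` directive and its arguments are my own invention for illustration, not a GraphQL built-in:

```typescript
// Schema definition with UI hints attached as a custom directive.
const typeDefs = /* GraphQL */ `
  directive @field(label: String, widget: String) on FIELD_DEFINITION

  type Note {
    id: ID!
    title: String! @field(label: "Title", widget: "text")
    body: String @field(label: "Body", widget: "textarea")
  }
`;
```

The catch, and likely part of why directives were so thinly documented, is that standard introspection does not expose directive usage on fields, so the client has to obtain the SDL (or a server-side digest of it) to read these hints.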
While tinkering on the system, I had a realization about the underlying design of GraphQL and how it interacts with the data layer: it follows the command-query separation pattern.
This was a perfect opportunity to explore event sourcing, because GraphQL’s mutations can be treated as commands that emit events.
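The pattern can be sketched as follows: the "command" side appends events to a log, and the "query" side derives current state by folding over that log. The event names and log structure here are illustrative, not the project's actual types.

```typescript
// Two example event kinds; a real system would have one per mutation.
type Event =
  | { kind: "NoteCreated"; id: string; title: string }
  | { kind: "NoteRetitled"; id: string; title: string };

const eventLog: Event[] = [];

// Command side: mutation resolvers validate input and emit events.
function createNote(id: string, title: string): void {
  eventLog.push({ kind: "NoteCreated", id, title });
}

function retitleNote(id: string, title: string): void {
  eventLog.push({ kind: "NoteRetitled", id, title });
}

// Query side: current state is a fold (left-to-right replay) over the log.
function currentNotes(): Map<string, string> {
  const notes = new Map<string, string>();
  for (const e of eventLog) {
    notes.set(e.id, e.title); // both event kinds carry the latest title
  }
  return notes;
}
```

Nothing ever overwrites the log, so queries see the latest state while the full change history remains intact.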
I refactored the whole API and storage layer around this pattern.
This update also added a full history of data changes. This history includes a full history of data schemas as well, so schema updates do not make prior history difficult to read.
Now that I was thinking of data as a chain of commits, I started reading more about where we could take that format, and found some inspiring papers along the way.
The first and perhaps most obvious thing to do was use content-addressable storage for events, and group events into commits, to support versioning.
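A minimal sketch of that arrangement: each value is stored under the hash of its bytes, and a commit is just another hashed value that groups event hashes and points at its parent, git-style. The `Commit` shape and JSON serialization are illustrative choices, not the project's actual format.

```typescript
import { createHash } from "node:crypto";

// Content-addressable storage: the key for a value is the hash of the value.
const blobs = new Map<string, string>();

function putBlob(content: string): string {
  const hash = createHash("sha256").update(content).digest("hex");
  blobs.set(hash, content);
  return hash;
}

// A commit groups event hashes and references its parent commit by hash,
// forming a verifiable chain of versions.
interface Commit {
  parent: string | null; // hash of the previous commit, null for the first
  events: string[];      // hashes of the events included in this commit
}

function putCommit(commit: Commit): string {
  return putBlob(JSON.stringify(commit));
}
```

Because identical content always hashes to the same key, storage is deduplicated for free, and any tampering with history changes every descendant commit hash.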
With the introduction of commits, I started thinking about commit signing and encryption. And that led to the big question: what is identity? Who is creating these commits? The GraphQL API has a user system, so I could just attach a user ID and call it done. But the peer-to-peer concepts from these papers I’ve been reading feel like the next obvious step.
So I put together a certificate-based signing system for commits, and started researching distributed identity systems, address book protocols, and web of trust.
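The signing itself is the simple part, and can be sketched with an Ed25519 key pair; Ed25519 is my choice for the sketch, and in a certificate-based system the public key would additionally be wrapped in a certificate tying it to an identity.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Generate an identity key pair (in practice this would be long-lived and
// the public half distributed via a certificate).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Sign a commit hash; anyone holding the public key can verify authorship.
function signCommit(commitHash: string): Buffer {
  return sign(null, Buffer.from(commitHash), privateKey);
}

function verifyCommit(commitHash: string, signature: Buffer): boolean {
  return verify(null, Buffer.from(commitHash), publicKey, signature);
}
```

Signing the commit hash rather than the raw data is enough, since the hash already covers the commit's events and its entire ancestry.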
Thus far, this project has followed a natural flow of ideas building toward something, but I began to lose sight of a coherent outcome. (How does an API/database system mesh with a peer-to-peer system? And how does that help my original charter of managing small scale personal information?)
I decided to table the low-level systems engineering and get back to user interfaces. I prefer my data to live in boring, accessible formats, like plain text files and image files.
Can I apply the lessons from command-query separation, event sourcing, content addressable storage, and peer-to-peer systems, to something much smaller in scale but no less confounding?
Two previous prototypes were almost entirely user interface, with the database assuming the unexpected role of short-term memory and communication tool.