Contrasted with other standards
An overview
To understand where Tenet fits in with existing solutions, let’s compare it with some representative systems.
| Protocol | Types | Functions | Lifecycle |
| --- | --- | --- | --- |
| JSON | Fixed | None | None |
| ProtocolBuffers | User-defined, binary | Interfaces | Precompiled |
| XML | Document object model | XSLT, etc. | Runtime (usually) |
| Native serialization | Native | Native | Runtime |
| Tenet | User-defined, algebraic | User-defined | Precompiled |
A caveat
I haven’t yet solicited feedback from the maintainers of any of the technologies I comment on here. To be explicit: as someone working on a competing technology, I have a great deal of respect for all of these projects. I understand the difficulty of the trade-offs they made, and each represents significant technical innovation in its own right.
Notable standards
JSON
Very simple protocols like JSON make an important trade-off: features in exchange for ubiquity. They have a fixed set of types, simple enough to be represented as native types in many languages.
Since JSON’s specification is so simple, it’s instructive to look at what other standards have added to it and why.
- YAML and TOML add complexity to the parser in exchange for a more human-readable and composable format.
- Ion adds s-expressions and more types to allow the representation of advanced logic within files.
- BSON, CBOR and MessagePack are binary variants designed to reduce transmission overhead.
It’s also instructive to look at the older S-expression format. S-expressions have a similar origin to JSON: where JSON extracted a portion of the JavaScript language, S-expressions were based on Lisp.
All of them, however, are “schemaless” or “self-describing” formats. Self-describing is something of a marketing spin: the data doesn’t describe itself; other tools or processes must be used to assign meaning to the structure and to verify the correctness of the data. They’ve been extremely successful in practice because applications can use ad hoc techniques for simple structures and then add automation as they grow more complex.
When we describe JSON in the overview as having no functions, that simply means there’s no way to derive functions automatically from any particular JSON document; you absolutely can use various tools to query or work with JSON data. Similarly, JSON has no schema as part of the standard, so it has no lifecycle.
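For example, here is the kind of ad hoc handling JSON encourages. The document and the checks are invented for illustration; notice that all of the meaning lives in application code rather than in the data itself.

```python
import json

# JSON's fixed types map directly onto native Python types:
# objects -> dict, arrays -> list, strings -> str, numbers -> int/float,
# true/false -> bool, null -> None.
doc = json.loads('{"name": "widget", "price": 9.99, "tags": ["new", "sale"]}')

# The data does not describe itself: the application has to know what the
# structure means and verify it by hand (or bolt on an external schema tool).
assert isinstance(doc, dict)
assert isinstance(doc.get("name"), str)
assert isinstance(doc.get("price"), (int, float))

print(doc["name"], doc["price"], doc["tags"])
```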
ProtocolBuffers and Thrift
ProtocolBuffers and similar formats like Apache Thrift are more advanced libraries aimed at sophisticated, high-performance applications. They allow user-defined types, but their type systems are built from C-style binary types that map almost directly onto an exact binary layout. They are precompiled for performance into a systems language like C, Go, Rust or Java, but can be used from dynamic languages as well.
The older ASN.1 format is similar in having a precise binary layout, and it is still extensively used in a variety of fields. It has an interesting feature: fields may be tagged “EXPLICIT” and are thus readable without the original schema.
These formats are very effective for systems that have requirements for high availability and consequently demand that engineers plan out APIs thoroughly and well in advance. The entire encrypted Web depends on ASN.1, for instance.
They still have the limitation that logic must be replicated by all consumers of the data types; however, they can describe an interface that all consumers must implement. And they have very flexible user-defined data structures that encompass both product types (messages or structs) and sum types (typically described as oneof or union), as well as container types like lists, maps and sets.
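As a rough sketch of how this works in practice, here is a hypothetical Protocol Buffers schema with a oneof, used from Python. The message, its fields and the generated `search_pb2` module are assumptions for the example; the module name follows protoc’s usual Python naming convention.

```python
# Hypothetical schema, compiled ahead of time with protoc:
#
#   message SearchRequest {
#     string query = 1;
#     oneof refinement {          // a sum type: exactly one member is set
#       int32 page_number = 2;
#       string cursor = 3;
#     }
#     repeated string tags = 4;   // a container type
#   }
from search_pb2 import SearchRequest  # generated module name is an assumption

req = SearchRequest(query="tenet")
req.page_number = 2                      # selects one member of the oneof
data = req.SerializeToString()           # compact, precisely laid-out binary

decoded = SearchRequest()
decoded.ParseFromString(data)
print(decoded.WhichOneof("refinement"))  # -> "page_number"
```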
XML and the Document Object Model
XML is a broader standard that introduces a completely new structure called the DOM. It is a complex standard that requires extensive library support, but it’s also ubiquitous enough that there’s good support in many languages.
XML, and more famously HTML, is described by the Standard Generalized Markup Language; the “generalized” indicates that it can describe many characteristics of markup languages. Neither XML nor HTML is currently a pure SGML specification, but there are some, like DocBook, that are.
SGML was aimed at building languages to describe documents, the premise being that text is marked up with tags of some sort, enabling diverse systems to analyze and extract information from them. XML became popular as a mechanism for serializing data and found heavy use in Java and enterprise applications.
As these standards evolved from the need to publish and catalogue documents and forms, they tend to be oriented towards automating complex bureaucratic processes and are very effective in that space. Some standards built on XML, like SOAP, try to add a data structure that directly mimics the data types used in programming languages, but these were eclipsed by the far simpler JSON family of formats.
As a result of its origins, XML has extensive tooling devoted to templating, transforming and analyzing it, so a significant amount of logic can be removed from the application code. And partly because of this, it’s typically not necessary to use any autogenerated code to manage XML; you can use something like XSLT to process it directly.
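For instance, with nothing more than a standard-library tree API you can pull data out of an XML document directly, with no generated classes involved. The document and tag names below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up fragment of markup: the structure is a tree of tagged elements,
# not a struct, so you navigate it rather than deserialize it into a type.
doc = ET.fromstring("""
<catalogue>
  <book lang="en"><title>Starfish</title></book>
  <book lang="fr"><title>Méduse</title></book>
</catalogue>
""")

# Query the tree directly; XSLT or XPath tooling can take this much further.
for book in doc.findall("book"):
    print(book.get("lang"), book.findtext("title"))
```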
The main downside to XML is that you can have any type scheme you like, so long as you like the DOM. It’s a niche language for a very large niche.
Native language serialization
Many languages have a native serialization standard, such as Java’s java.io.Serializable, Python’s pickle and Ruby’s Marshal.
There are different techniques used to implement these libraries, but roughly speaking, a mini stack machine walks an object graph in the heap and writes out commands to reconstruct it later. It’s a simple and elegant solution, though there are serious security issues arising from having a stack machine executing commands based on untrusted input.
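A minimal illustration with Python’s pickle (the Node class is invented for the example), including the standard caveat about untrusted input:

```python
import pickle

class Node:
    """A small object graph, including a cycle, to show what pickle handles."""
    def __init__(self, name):
        self.name = name
        self.next = None

a, b = Node("a"), Node("b")
a.next, b.next = b, a          # cyclic references are preserved

data = pickle.dumps(a)         # the byte stream is, in effect, a program for
restored = pickle.loads(data)  # pickle's stack machine to rebuild the graph

print(restored.name, restored.next.name, restored.next.next is restored)
# -> a b True
# Never call pickle.loads on untrusted input: the stream can direct the
# machine to construct arbitrary objects and run attacker-chosen code.
```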
The intent for these standards has generally been for use within the language more than as a general purpose serialization standard; just being able to transmit objects through IPC or over the wire is tremendously useful for any kind of distributed processing. And if an enterprise is willing to live with a monoculture, even the one-language restriction becomes less important.
The big appeal of native serialization is that you have the full power of your language’s native type system, distributing logic is as easy as distributing regular code, and there’s no requirement to autogenerate any code, which is a huge advantage over all of the prior serialization systems.
Tenet
Tenet has user-defined algebraic types, user-defined functions, and is precompiled. It aims to deliver consistent semantics on multiple platforms, but rather than the semantics of a DOM, as XML provides, these are the semantics of types built on strings, numbers, named tuples, tagged unions, sets, lists and maps.
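Tenet’s concrete syntax isn’t shown in this section, but as a sketch of what “algebraic” means here, the same shapes can be modelled with Python’s typing tools: dataclasses as product types and a union as the tagged-union, or sum, type. The types below are invented for illustration and are not Tenet definitions.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Card:                 # a product type: every field is present
    number: str
    expiry: str

@dataclass(frozen=True)
class Invoice:              # another product type
    account: str
    due_days: int

Payment = Union[Card, Invoice]   # a sum type: a value is exactly one variant

def describe(p: Payment) -> str:
    # Consumers branch on the variant, as with any tagged union.
    if isinstance(p, Card):
        return f"card ending in {p.number[-4:]}"
    return f"invoice to {p.account}, due in {p.due_days} days"

print(describe(Card(number="4242424242424242", expiry="2027-01")))
```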
If we broadly partition solutions into “simple” vs “complex,” Tenet is one of the complex solutions. It’s distinct from solutions like ProtocolBuffers, though, because its type system is logical rather than binary, and it’s distinct from XML because it doesn’t constrain the user to a DOM.
It’s also relatively novel in that it provides functions and built-in support for versioning, rather than versioning guidelines. Tenet offers a total functional language[^1] and will build functions written in it for all clients. In addition, because it has a built-in language, if a newer version of a client is released, functions to translate from the older format can be released with it to all supporting languages. It’s not magic, but it offers far more power to rename and restructure an API than any other scheme; even native serialization has trouble with this.
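As a sketch of the idea, in Python rather than Tenet and with invented field names: if a v2 schema splits a single name field into first and last, a translation function can ship alongside the new schema so that every client upgrades old records the same way.

```python
from dataclasses import dataclass

@dataclass
class PersonV1:
    name: str           # the old shape of the record

@dataclass
class PersonV2:
    first: str          # the new shape: the field has been split in two
    last: str

def upgrade(old: PersonV1) -> PersonV2:
    # Distributed with the v2 schema, so all consumers translate identically.
    first, _, last = old.name.partition(" ")
    return PersonV2(first=first, last=last)

print(upgrade(PersonV1(name="Ada Lovelace")))
# -> PersonV2(first='Ada', last='Lovelace')
```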
[^1]: Note that a total functional language is not Turing complete because it is restricted to programs that provably terminate.