Tenet’s approach to data

We’re used to several paradigms in representing data structures, and Tenet is a bit different from all of them.

Common paradigms in data types.

So let’s first do a brief survey of different paradigms. It’s worth noting: these aren’t all inclusive, many languages borrow from multiple paradigms. Rather, they represent a way of structuring data. And all of these are listed because they’ve been extremely successful in some way.

C-style data types

Conceptually, the C language has the simplest data types. These are driven by the restriction of working very close to the processor. You typically get integers and floats bound by your machine word, pointers and arrays based on simple offset arithmetic. In addition, unions and structs are shockingly powerful given that all they do is divvy up fixed chunks of memory.

On the plus side, these are absurdly fast and you’re in complete control over the exact bytes you write to memory or disk. On the down side, everyting else, even strings, requires library support.

Concrete data types

This is a neologism; the concrete data types combine a few simple atoms, strings and numbers, possibly dates or times, and some standard collections like lists, sets, and maps. They are common in agile languages, popularized by Perl, Python, Ruby and Javascript, and what’s common to all these languages’ implementation of them is the ability to construct complex nested expressions with no function calls.

To be clear, all those languages have support for classes and such; concrete data types are more narrowly the ability to write something like this:

{
  name: 'Bob',
  age,
  favoriteLetters: ['q', 'y', 's', letter],
  time: now()
}

There is no type declaration needed, which is why this style is very obvious and simple for new developers and how new projects can get started very quickly. Much as goto led to spaghetti code, though, this can lead to a kind of “data pasta” as the semantics of the data distributes over dozens of functions in multiple projects.

The relational model

The relational model is a mathematical model based on set theory and predicate calculus for maintaining a database of true statements about the world. The dominant implementation is the SQL standard which, unfortunately, early on decided that tables, rows and columns should map one-to-one to their structure on disk, losing much of the flexibility of the relational model.

What makes DBMS’s interesting is their common use case of managing the essential information that makes a database work. Rows in a SQL DBMS are typically the authoritative knowledge of contracts between a business and its customers, leading to requirements that demand a very different mindset from the common assumption that developers make that they can blow the data away and start from scratch.

Object-oriented programming

Object-orientation tries to associate code with data and often enforces encapsulation to ensure that any given piece of code is understandable and clear. OOP generally allows polymorphism through inheritance and, often, generics. OOP tends to come in two flavors: the “struct and vtable” flavor of compiled languages, and the “hash and type object” flavor of dynamic languages. These implementation details tend to drive an object-oriented language’s type system.

On the plus side, object-orientation has been hugely successful in allowing teams to collaborate on large projects. On the down side, most developers learning it make repeated missteps trying to decide what should actually be in a class. Inheritance is decidedly counter-intuitive, and it’s entirely mathematically unsound; a trivial example like modelling a square, a rectangle and a parallelogram breaks inheritance quite easily.

Algebraic data types

This is the dominant scheme of functional languages, and Tenet’s as well. An ADT is typically a union of tuples. As many people might not be familiar with this, let’s look at a concrete example in Haskell.

data Pet = Cat { name :: String, disdain :: Integer }
         | Dog { name :: String, lastWalked :: Time }

bob = Cat { name="Bob", disdain=9001 }
susie = Dog { name="Susie", lastWalked=time 14 30 }

In this example, Pet is the type, and Cat and Dog are constructors that construct values of type Pet. A functional language will typically have facilities to make it easier to match values:

report :: Pet -> String
report Cat {name, disdain} = name ++ "'s disdain for you is at level "
                             ++ show disdain ++ "."
report Dog {name, lastWalked} = name ++ " hasn't been walked since "
                                ++ show lastWalked ++ " which is basically forever."

Unlike in an OOP language there are no subclasses at work here, these are simply different values for the same type. And ADTs are typically immutable.

Tenet’s approach

Tenet draws from algebraic data types and concrete data types. Concrete data types are necessary because they describe a majority of data out there, while algebraic data types can describe just about anything else.

The capsule description of types in Tenet:

  • All values are immutable
  • The native style for manipulating data is imperative
  • Literal values are always well typed
  • Functions are first-class
  • Infinite or cyclic structures are impossible in Tenet

Faux-mutability

While even very junior devs can understand mutability perfectly well, very expert devs routinely can’t keep the complexity of even relatively modestly complex data structures in their heads. Immutable structures mean that changes can only occur through a return statement, so side-effects are local and understandable.

Besides side-effects, generics in a mutable language have to deal with covariance and contravariance, see Kotlin’s design notes on the subject to appreciate the complexity of it.

While there has been extensive work done by the functional community to develop good idioms to represent changes to immutable structures, it doesn’t seem like they’ve settled on a standard. This may be, in part, because functional languages are flexible enough that they don’t have to, but regardless, a user has a plethora of idioms to learn and that’s not conducive to Tenet’s design goals. The idioms used in mutable languages, on the other hand, are ubiquitous and well understood.

Concrete types in Tenet

Besides strings and integers and some future rational number type, Tenet will support lists, sets and maps natively. Most host languages support them natively, and being bound by different performance characteristics, Tenet won’t be well-suited to implement low-level data structures.

Many host languages offer different implementations of these concrete types. To start with, we’ll use a single reasonably performant implementation, but it should be possible to transparently select different implementations depending on how variables are used.

One point is that the syntax of tuples is deliberately similar to Python’s dicts, Javascript’s objects and Ruby’s hashes because they frequently serve a similar use case.

Algebraic data types

One innovation in Tenet is that the types aren’t nominal; that is Tenet can completely infer types from these expressions:

bob := cat ~ { name: "Bob", disdain: 9001 }
susie := dog ~ { name: "Susie", lastWalked: time(14, 30) }

Now, they are separate types. Specifically, here are the declarations:

type BobType := Union[cat ~ Tuple[name: String, disdain: Integer]]
type SusieType := Union[dog ~ Tuple[name: String, lastWalked: Time]]

But the rule for Union types is that they can combine. If we declare this function:

def which_pet(pet):
    switch pet
    case "bob":
        return bob
    case "susie":
        return susie
    default:
        return no_such_pet ~ pet

Tenet can thus infer the type as:

type PetPlusError := Union:
    cat ~ Tuple[name: String, disdain: Integer]
    dog ~ Tuple[name: String, lastWalked: Time]
    no_such_pet ~ String

(In case it’s not clear, the square brackets can be replaced with a nested block.)

This is because Union is a sum type, and type inference can combine unions so long as the tags are different or, if they’re the same, the variants are compatible.

We can use this to great effect in return types. Say we have this function:

def ask_about_pet(input_state):
    switch input_state
    case ok ~ pet_name:
        return which_pet(pet_name)
    default:
        return input_state

Here the input_state can indicate a success or an error. In the case of success, we pass it back. In the case of an error, we just pass that through, and can then pass our own error as well.

(Caveat: we’ll need more intelligence on the part of the type system to recognize that input_state can’t be ok in the default clause.)

Cyclic data structures

A cyclic data structure is simply any data structure with a cycle, or a loop in it. As an example:

These are very common in many languages, and they are the reason so much effort is put into elaborate garbage collection schemes. In Tenet, though, cyclic data structures are not possible due to the semantics of assignment. Let’s consider a cyclic list in Java:

class LinkedList {
    int num = 0;
    LinkedList next = null;

    static LinkedList make() {
        node = LinkedList();
        node.num = 6;
        node.next = node;
        return node;
    }
}

Here, LinkedList is constructing a singly linked list, but our make method will assign next back to the original node.

If we do the same in Tenet, though:

def make_cyclic():
    node := { num: 6, next: empty~~ }
    node.next := follow ~ node
    return node

result = { num: 6, next: follow ~ { num: 6, next: empty~~} }

We can see that result.next is simply the original value of node. The reason is that the code is translated into:

def make_cyclic():
    node := { num: 6, next: empty~~ }
    node_2 := !!set_slot(node, next, follow ~ node)
    return node_2

This makes explicit that node.next is really constructing a new value leaving the earlier node unaffected and renaming future references. Inifinite lists in functional languages are also possible, for example, in Haskell:

cycle :: a -> [a]
cycle item =
   let inf = item : inf
   in inf

Tenet prohibits forward or recursive bindings, so the equivalent inf := [item] + inf would fail. These limitations have some benefits:

  • memory management requires only reference counting
  • cycles are not a concern when serializing a data structure
  • a class of errors is eliminated

Tenet will obviously work in a language that offers garbage collection, but only requiring reference counting is very important for being able to generate output for a wide variety of languages like C.

Cyclic types

At present, cyclic types are not possible, but this restriction will be lifted in the future. This is not the same as cylic data structures, though. A cyclic type would be something like:

type BooleanExpr := Union:
    or ~ List[BooleanExpr]
    and ~ List[BooleanExpr]
    true ~~
    false ~~

This is cyclic because the types of the operators or and and both refer back to BooleanExpr, e.g.:

The data itself must be acyclic, consider or ~ [true~~, and ~ [true~~, false~~]]:

The type itself is cyclic because we can keep adding or and and nodes, but an actual value must terminate.

This also means that a cyclic type will be required to have a terminating variant. Thus:

type Unconstructible := Tuple[a: Unconstructible]
type Constructible := Tuple[a: Union[a ~ Constructible, b~~]]

In the case of Constructible, {a: a~{a: b~~}} is a valid value.