Identifiers

An identifier in Tenet is a sequence of ASCII codepoints that:

  1. matches the regular expression /[a-zA-Z][a-zA-Z0-9]*(_[a-zA-Z0-9]+)*_?/
  2. is not a Tenet keyword or reserved word
  3. does not begin with a restricted prefix
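
The three rules above can be sketched as a small check. This is only an illustration: the keyword set and the prefix tuple are hypothetical placeholders, since the actual lists are not given in this section, and rule 3 is treated as a starts-with test, which is an assumption.

```python
import re

# The pattern from rule 1. fullmatch requires the whole string to match.
IDENTIFIER_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9]*(_[a-zA-Z0-9]+)*_?")

# Hypothetical placeholders: the real keyword and prefix lists are not given here.
EXAMPLE_KEYWORDS = frozenset({"example_keyword"})
EXAMPLE_RESTRICTED_PREFIXES = ("example_prefix_",)


def is_identifier(name, keywords=EXAMPLE_KEYWORDS,
                  prefixes=EXAMPLE_RESTRICTED_PREFIXES):
    """Return True if name passes rules 1 to 3 under the assumptions above."""
    if IDENTIFIER_RE.fullmatch(name) is None:
        return False  # rule 1: wrong shape (leading _, doubled _, non-ASCII, ...)
    if name in keywords:
        return False  # rule 2: keyword or reserved word
    return not name.startswith(prefixes)  # rule 3: restricted prefix
```

Under this sketch, foo_bar and foo_ are accepted, while _foo, foo__bar and 9lives are rejected by the pattern alone.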

The case of the first letter must be:

Fixer recommendation:

  • report names that are probably in camel case.
  • suggest a fix that converts them to underscore-separated names.
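
A fixer along these lines might look like the sketch below. The heuristic (a lowercase letter or digit directly followed by an uppercase letter, with no underscores present) and the choice to keep the case of the first letter while lowercasing the rest are assumptions made for illustration, not rules stated by this document.

```python
import re

# A lowercase letter or digit directly followed by an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def looks_like_camel_case(name):
    """Heuristic only: names such as HTTPServer are not flagged."""
    return "_" not in name and _CAMEL_BOUNDARY.search(name) is not None


def suggest_underscored(name):
    """Insert _ at each camel-case boundary and lowercase all but the first letter."""
    separated = _CAMEL_BOUNDARY.sub("_", name)
    # Assumption: the case of the first letter is meaningful, so it is preserved.
    return separated[:1] + separated[1:].lower()
```

For instance, fooBarBaz would be reported and the suggested replacement would be foo_bar_baz.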

Identifier folding

To support converting identifiers to the conventions of multiple languages, Tenet restricts the use of similarly named identifiers.

Identifiers must be unique when folded by:

If two or more identifiers fold to the same value within the same domain, this is an error.

Compilation should continue after such an error is reported, both to assist in fixing it and to identify other errors.
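
The sketch below shows one way to collect every folding conflict in a domain instead of stopping at the first. The fold function here (drop underscores, ignore case) is a stand-in chosen for illustration; the actual folding rules are not reproduced in this section, so treat it as an assumption.

```python
from collections import defaultdict


def fold(name):
    # Assumed folding for illustration only: ignore underscores and case.
    return name.replace("_", "").lower()


def find_fold_conflicts(names):
    """Group the names of one domain by folded value; keep groups with a clash."""
    groups = defaultdict(list)
    for name in names:
        groups[fold(name)].append(name)
    # Every conflicting group is returned so all errors can be reported at once.
    return {folded: group for folded, group in groups.items() if len(group) > 1}
```

Returning all conflicting groups, instead of raising on the first, matches the intent that compilation continues and reports as much as it can.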

Mapping identifiers to host conventions

If a Tenet identifier is treated as a series of _-delimited words, it can be translated to various host language conventions.

The naming translation performs a delimiting transform and a case transform.
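
As a rough sketch, the two transforms can be modelled as a per-word function and a joining string. The particular transforms used in the examples (upper, lower, capitalize; joining with "_", "-" or nothing) are illustrative choices, not the lists defined by this document.

```python
def split_words(identifier):
    # A trailing _ (allowed by the identifier pattern) leaves an empty last
    # item after splitting; it is dropped here.
    return [word for word in identifier.split("_") if word]


def translate(identifier, case_transform, delimiter):
    """Apply case_transform to every word, then join the words with delimiter."""
    return delimiter.join(case_transform(word) for word in split_words(identifier))


# Illustrative host conventions (assumed, not taken from the spec):
#   translate("foo_bar", str.upper, "_")       gives "FOO_BAR"
#   translate("foo_bar", str.capitalize, "")   gives "FooBar"
#   translate("foo_bar", str.lower, "-")       gives "foo-bar"
```

Keeping the two transforms as separate parameters means a new host convention is just a new pair of arguments.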

The case transforms are applied to all words:

The delimiting transforms are:

Resolving conflicts with keywords

A user may add a trailing _ to avoid conflicts with Tenet keywords.

If an identifier conflicts with a host language keyword, the implementation adds a trailing _ unless there is a preferred convention for distinguishing names.
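
For example, taking Python as the host language (an assumption made only for this sketch), the default rule could be implemented with the standard keyword module. The case where a host has its own preferred convention is not handled here.

```python
import keyword


def escape_for_python(name):
    """Append a trailing _ when name collides with a Python keyword."""
    # keyword.iskeyword checks Python's own reserved words.
    return name + "_" if keyword.iskeyword(name) else name
```

With this rule, a Tenet name such as lambda would be emitted as lambda_, while alpha would pass through unchanged.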

Domains

Certain identifiers may be reused in different domains, depending on the host language and the constructs used to implement the API.

The domains are:

A concern here is that if we allow names to be reused between domains, we may later want to write tag accessors and find that they conflict with attribute accessors.

Restricted Prefixes

Most languages, however, have namespaces of one sort or another. For the few languages that have a single global namespace, names could be assigned a custom prefix.

Other potential conflicts

There are other pitfalls. Haskell’s record syntax generates accessor functions. Consider:

data Murmur = Murmur { alpha :: Foo }
data Foobar = Foobar { alpha :: Foo }

According to the Haskell language definition, this declares two accessor functions named alpha in the same scope, which is a conflict. Specific compilers like GHC can work around this with language extensions, but if we don’t rely on extensions, we’d have to put individual types in their own modules.

Ordering

Identifiers are ordered by the collation used at build time. Because symbols are guaranteed to be unique under that collation, this ordering is total and canonical with respect to the build.

Future

Some considerations for the future.

Marking private identifiers

Presently, the spec has explicit exports. An alternative is to export, by default, everything that doesn’t begin with _.

Unicode identifiers

The basic idea is to expand acceptable characters in identifiers to include Unicode via UAX #31.

To support folding and sorting would require collations like DUCET.

Since it would take considerably more work to ensure that names translate properly into the target languages, this should be driven by actual use cases and real code.

Caseless identifiers

Tenet distinguishes types and values through the case of the initial letter, which is a problem for caseless languages.

UAX #31 mentions using _ as a prefix. It identifies a number of other languages using case to distinguish identifiers.

Clean seems to be ASCII only.

GHC treats a leading caseless letter as lowercase, but the Haskell spec hasn’t been updated.

Go specifies that an identifier is exported only when it starts with a character in the Lu class, so identifiers beginning with a caseless character are not exported.

SWI-Prolog appears to use leading _ to work around caseless languages.

Erlang seems to restrict case-sensitive tokens to Latin-1.

Tenet will require that identifiers not begin with a caseless letter, since it is easier to lift that restriction later than to undo a poorly chosen convention. The aim should be a convention that is obvious to native readers, mechanical, and easy to type.
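
If Unicode identifiers were adopted, the restriction could be checked roughly as in the sketch below. It approximates "cased letter" with the general categories Lu, Ll and Lt, which is an assumption; a real implementation might prefer the Unicode Cased property.

```python
import unicodedata

# General categories treated as cased for this sketch (an approximation).
_CASED_CATEGORIES = {"Lu", "Ll", "Lt"}


def starts_with_cased_letter(name):
    """Return True if the first character is an upper, lower or titlecase letter."""
    if not name:
        return False
    return unicodedata.category(name[0]) in _CASED_CATEGORIES
```

Under this check, names beginning with Latin, Greek or Cyrillic letters would be accepted, while names beginning with a letter from a caseless script (category Lo) would be rejected.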