BARE Message Encoding
Binary Application Record Encoding

BARE is a simple binary representation for structured application data.

NOTICE: The BARE encoding is not finalized. Feedback is welcome. In the near future, the specification will be frozen and an RFC will be filed.

BARE at a glance

Here is a sample schema:

type PublicKey data<128>
type Time string # ISO 8601

enum Department {
  ACCOUNTING
  ADMINISTRATION
  CUSTOMER_SERVICE
  DEVELOPMENT

  # Reserved for the CEO
  JSMITH = 99
}

type Customer {
  name: string
  email: string
  address: Address
  orders: []{
    orderId: i64
    quantity: i32
  }
  metadata: map[string]data
}

type Employee {
  name: string
  email: string
  address: Address
  department: Department
  hireDate: Time
  publicKey: optional<PublicKey>
  metadata: map[string]data
}

type Person (Customer | Employee)

type Address {
  address: [4]string
  city: string
  state: string
  country: string
}

Known implementations

Go: go-bare
Rust: serde_bare
Python: bare-py, pybare
JavaScript: bare-js
D: bare
Zig: zig-bare
PHP: bare-mess-php
Common Lisp: cl-bare
OCaml: bare-ocaml

Specification

Binary Application Record Encoding

DRAFT

Binary Application Record Encoding (BARE) is, as the name implies, a simple binary representation for structured application data.

BARE messages omit type information, and are not self-describing. The structure of a message must be established out of band, generally by prior agreement and context. For example, if a BARE message is returned from /api/user/info, it can be inferred from context that the message represents user information, and the structure of such messages is available in the documentation for this API.

Message Format

A BARE message is a single value of a pre-defined type, which may be an aggregate type.

Primitive Types
The following primitive data types are supported:
uint, int
A variable-length integer. Each octet of the encoded value has the most-significant bit set, except for the last octet. The remaining bits are the integer value in 7-bit groups, least-significant first.

Signed integers are mapped to unsigned integers using "zig-zag" encoding: non-negative values x are written as 2*x + 0, and negative values are written as 2*(^x) + 1, where ^x is the bitwise complement of x. That is, negative numbers are complemented, and whether to complement is encoded in bit 0.

The maximum precision of a varint is 64 bits.

u8, u16, u32, u64
An unsigned little-endian integer with a fixed length in bits. The precision is 8, 16, 32, and 64 bits respectively.
i8, i16, i32, i64
A signed two's complement, little-endian integer with a fixed length in bits. The precision is 8, 16, 32, and 64 bits respectively.
f32, f64
A 32- or 64-bit IEEE-754 floating point number, little-endian.
bool
A boolean, either true or false, represented respectively by a one or a zero encoded as an 8-bit unsigned integer. Any non-zero value is interpreted as true.
enum
A value from a set of possible values enumerated in advance, encoded as a uint.
string
A UTF-8 string of text, prefixed by the string's length in bytes as uint.
data<length>
Arbitrary binary data with a fixed "length" in bytes, e.g. data<16>. The binary data is encoded literally. The length must be representable as a u64, but is not encoded into the message.
data
Arbitrary binary data of an undefined length. The length in bytes is encoded as a uint, followed by the binary data encoded literally.
void
A type with zero length. It is useful for defining user-defined types which alias void, providing discrete options in a tagged union which do not have any underlying storage.
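The primitive encodings above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (encode_uint, decode_uint, encode_int, decode_int, encode_string) are hypothetical, and Python's ~ operator plays the role of the ^x complement used in the text. The fixed-width types are shown through the standard struct module with explicit little-endian format codes.

```python
import struct


def encode_uint(value):
    """Encode a uint: 7-bit groups, least-significant group first."""
    if not 0 <= value < 1 << 64:
        raise ValueError("uint must fit in 64 bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more octets follow
        value >>= 7
    out.append(value)  # last octet has the high bit clear
    return bytes(out)


def decode_uint(buf, pos=0):
    """Decode a uint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos]
        pos += 1
        result |= (octet & 0x7F) << shift
        if (octet & 0x80) == 0:
            break
        shift += 7
        if shift > 63:
            raise ValueError("uint is longer than 10 octets")
    if result >= 1 << 64:
        raise ValueError("uint exceeds 64 bits")
    return result, pos


def encode_int(value):
    """Zig-zag map a signed value to unsigned, then encode it as a uint."""
    if not -(1 << 63) <= value < 1 << 63:
        raise ValueError("int must fit in 64 bits")
    zigzag = 2 * value if value >= 0 else 2 * (~value) + 1
    return encode_uint(zigzag)


def decode_int(buf, pos=0):
    """Inverse of encode_int; return (value, next position)."""
    zigzag, pos = decode_uint(buf, pos)
    value = zigzag >> 1
    if zigzag & 1:  # bit 0 records whether the value was complemented
        value = ~value
    return value, pos


def encode_string(text):
    """UTF-8 bytes prefixed by their length in bytes as a uint."""
    raw = text.encode("utf-8")
    return encode_uint(len(raw)) + raw


# Fixed-width types are plain little-endian values.
u16_example = struct.pack("<H", 0x1234)  # u16
i32_example = struct.pack("<i", -2)      # i32, two's complement
f32_example = struct.pack("<f", 1.0)     # f32, IEEE-754
```

Under these assumptions, encode_uint(300) yields the two octets 0xAC 0x02, and encode_int(-1) yields the single octet 0x01.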

Aggregate Types
optional<type>
A value of "type" which may or may not be assigned, e.g. optional<u32>. Represented either as an 8-bit unsigned integer 0, indicating that the value is unset, or as any nonzero 8-bit unsigned integer, indicating that the value is set, followed by the encoded value.
[length]type
An array of values of "type" with a fixed "length", e.g. [8]string. The encoding of this value is the encoded member values concatenated to one another, with no delimiters or length prefix.
[]type
An array of values of "type" with an undefined length, e.g. []string. The length of the array in values is encoded into the message as a uint, followed by the concatenated values.
map[type A]type B
A map of values of type B keyed by values of type A, e.g. map[u32]string. The encoded representation of a map begins with the number of key/value pairs encoded as a uint, followed by the key/value pairs concatenated together. Each key/value pair is encoded as the encoded key and encoded value concatenated. The order of items is undefined, and if a key is repeated, the last key/value pair of that key is considered authoritative.
(type | type | ...)
A tagged union whose value can be one of any type from a set. Each type in the set is assigned a numeric representation, starting at zero and incrementing for each type. The value is encoded as the selected tag as a uint, followed by the value itself encoded as that type.
struct
A set of values of arbitrary types, concatenated together in an order known in advance.
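As a rough illustration of the aggregate encodings, the following Python sketch composes them from per-type encoder functions. All function names are hypothetical, None stands in for an unset optional, and 0x01 is chosen as the "set" marker (the text permits any nonzero value). The encode_order helper mirrors the anonymous {orderId: i64, quantity: i32} struct from the sample schema.

```python
import struct


def encode_uint(value):
    """BARE uint: 7-bit groups, least-significant first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(text):
    raw = text.encode("utf-8")
    return encode_uint(len(raw)) + raw


def encode_optional(value, encode_value):
    # 0x00 means unset; any nonzero octet (0x01 here) means a value follows.
    if value is None:
        return b"\x00"
    return b"\x01" + encode_value(value)


def encode_fixed_array(values, length, encode_value):
    # [length]type: members concatenated, no length prefix.
    if len(values) != length:
        raise ValueError("expected exactly %d values" % length)
    return b"".join(encode_value(v) for v in values)


def encode_array(values, encode_value):
    # []type: member count as a uint, then the members.
    return encode_uint(len(values)) + b"".join(encode_value(v) for v in values)


def encode_map(mapping, encode_key, encode_value):
    # Pair count as a uint, then each key immediately followed by its value.
    out = encode_uint(len(mapping))
    for key, value in mapping.items():
        out += encode_key(key) + encode_value(value)
    return out


def encode_union(tag, encoded_value):
    # Tag as a uint, then the value encoded as the selected type.
    return encode_uint(tag) + encoded_value


def encode_order(order):
    # Struct {orderId: i64, quantity: i32}: fields concatenated in schema order.
    order_id, quantity = order
    return struct.pack("<q", order_id) + struct.pack("<i", quantity)


orders = encode_array([(1, 2), (3, 4)], encode_order)
```

Here orders is 25 octets long: one octet for the count, then two 12-octet structs with no field names, delimiters, or padding.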

User-defined Types

A user-defined type gives a name to a built-in type, or aliases another type. This creates a distinct type, whose underlying storage is equivalent to the type it names.

Invariants

The following invariants must be upheld in a BARE schema:

  1. Any type which is ultimately a void type (either directly or through user-defined types) may not be used as an optional type, struct member, array member, or map key or value. Void types may only be used as members of the set of types in a tagged union.
  2. The lengths of fixed-length arrays and data types must be at least 1.
  3. Structs must have at least one field.
  4. Unions must have at least one type.
  5. Map keys must use a primitive type which is not data or data<length>.
  6. Two or more values in the same enum cannot share the same value.
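
A schema tool might check these invariants along the following lines. This is only a sketch under stated assumptions: types are modelled as nested tuples such as ("array", 4, ("string",)) or ("map", key, value), which is a hypothetical representation and not part of BARE, and user-defined aliases are assumed to be resolved already, so a void hidden behind an alias is not detected here.

```python
# Kinds permitted as map keys: primitive types other than data and void.
KEY_KINDS = {
    "uint", "int", "u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
    "f32", "f64", "bool", "string", "enum",
}


def check(t, errors=None, in_union=False):
    """Return a list of invariant violations found in the type tuple t."""
    errors = [] if errors is None else errors
    kind = t[0]
    if kind == "void":
        if not in_union:
            errors.append("void used outside a tagged union")
    elif kind == "data":
        # ("data",) is variable-length; ("data", n) is data<n>.
        if len(t) > 1 and t[1] < 1:
            errors.append("data<length> must have length >= 1")
    elif kind == "optional":
        check(t[1], errors)
    elif kind == "array":
        # ("array", None, T) is []T; ("array", n, T) is [n]T.
        if t[1] is not None and t[1] < 1:
            errors.append("fixed array length must be >= 1")
        check(t[2], errors)
    elif kind == "map":
        if t[1][0] not in KEY_KINDS:
            errors.append("map key must be a primitive type other than data")
        check(t[2], errors)
    elif kind == "union":
        if not t[1]:
            errors.append("union must have at least one type")
        for member in t[1]:
            check(member, errors, in_union=True)
    elif kind == "struct":
        if not t[1]:
            errors.append("struct must have at least one field")
        for _name, field_type in t[1]:
            check(field_type, errors)
    elif kind == "enum":
        values = list(t[1].values())
        if len(values) != len(set(values)):
            errors.append("enum values must be unique")
    return errors
```

For instance, check(("union", [("void",), ("string",)])) returns an empty list, while the same void used as a struct field is reported.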

Message Schema Language

The use of a schema language is optional. Implementations should support decoding arbitrary BARE messages either without such a document, or with a schema defined using more native tools available from the language or runtime environment.

However, it may be useful to have a schema language, for use with code generation, documentation, or interoperability. A domain-specific language is provided for this purpose.

During lexical analysis, whitespace may be used to separate tokens, and is then discarded. Additionally, "#" is used for comments; if encountered, the "#" character and any subsequent characters are discarded until a LF is found. The syntax of this language is represented by the following ABNF grammar (see RFC 5234):

schema		= 1*user-type

user-type	 = "type" user-type-name non-enum-type
user-type	/= "enum" user-type-name enum-type

type		= non-enum-type / enum-type
non-enum-type	= primitive-type / aggregate-type / user-type-name

user-type-name	= UPPER *(ALPHA / DIGIT) ; First letter is uppercase

primitive-type	 = "int" / "i8"  / "i16" / "i32" / "i64"
primitive-type	/= "uint" / "u8"  / "u16" / "u32" / "u64"
primitive-type	/= "f32" / "f64"
primitive-type	/= "bool"
primitive-type	/= "string"
primitive-type	/= "data" / ("data" "<" integer ">")
primitive-type	/= "void"

enum-type	= "{" enum-values "}"
enum-values	= enum-value / (enum-values enum-value)
enum-value	= enum-value-name / (enum-value-name "=" integer)
enum-value-name = UPPER *(UPPER / DIGIT / "_")

aggregate-type	 = optional-type
aggregate-type	/= array-type
aggregate-type	/= map-type
aggregate-type	/= union-type
aggregate-type	/= struct-type

optional-type	= "optional" "<" type ">"

array-type	= "[" [integer] "]" type
integer		= 1*DIGIT

map-type	= "map" "[" type "]" type

union-type	= "(" union-members ")"
union-members	= union-member / (union-members "|" union-member)
union-member	= type ["=" integer]

struct-type	= "{" fields "}"
fields		= field / (fields field)
field		= 1*ALPHA ":" type

UPPER		= %x41-5A ; uppercase ASCII letters
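
The lexical rules described before the grammar (whitespace separates tokens, "#" starts a comment that runs to the next LF) could be implemented roughly as follows. This sketch covers lexical analysis only, not parsing; the tokenize function and the TOKEN pattern are assumptions made for illustration, with the token classes inferred from the terminals that appear in the grammar.

```python
import re

# Words (keywords, type names, field names), integers, and punctuation.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[<>{}\[\]()|=:]")


def tokenize(source):
    """Split schema text into tokens, discarding whitespace and comments."""
    tokens = []
    for line in source.split("\n"):
        line = line.split("#", 1)[0]  # drop "#" and everything up to the LF
        pos = 0
        while pos < len(line):
            if line[pos].isspace():
                pos += 1
                continue
            match = TOKEN.match(line, pos)
            if match is None:
                raise SyntaxError("unexpected character %r" % line[pos])
            tokens.append(match.group())
            pos = match.end()
    return tokens
```

Applied to the line "type Time string # ISO 8601", this produces the three tokens type, Time, and string; the comment is dropped.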

Here is a simple example schema using this language:

type PublicKey data<128>
type Time string # ISO 8601

enum Department {
  ACCOUNTING
  ADMINISTRATION
  CUSTOMER_SERVICE
  DEVELOPMENT

  # Reserved for the CEO
  JSMITH = 99
}

type Customer {
  name: string
  email: string
  address: Address
  orders: []{
    orderId: i64
    quantity: i32
  }
  metadata: map[string]data
}

type Employee {
  name: string
  email: string
  address: Address
  department: Department
  hireDate: Time
  publicKey: optional<PublicKey>
  metadata: map[string]data
}

type Person (Customer | Employee)

type Address {
  address: [4]string
  city: string
  state: string
  country: string
}

The names of fields and user-defined types are informational: they are not represented in BARE messages, but they may be used for code generation or to provide meaningful names for readers of the schema.

The names of enum values are also informational. Values without an assigned integer are assigned automatically in the order that they appear, starting from zero and incrementing for each subsequent unassigned value. If an enum value is explicitly specified, automatic assignment continues from that value plus one for subsequent enum values.
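
The assignment rule can be sketched as a short function. The name assign_enum_values and the (name, explicit value or None) input shape are hypothetical, chosen only for this illustration; the input below is the Department enum from the example schema.

```python
def assign_enum_values(entries):
    """Map each enum name to its integer value.

    entries is a list of (name, explicit_value_or_None) pairs in schema order.
    """
    values = {}
    next_value = 0
    for name, explicit in entries:
        value = next_value if explicit is None else explicit
        if value in values.values():
            raise ValueError("duplicate enum value %d" % value)
        values[name] = value
        next_value = value + 1  # automatic assignment continues from here
    return values


department = assign_enum_values([
    ("ACCOUNTING", None),
    ("ADMINISTRATION", None),
    ("CUSTOMER_SERVICE", None),
    ("DEVELOPMENT", None),
    ("JSMITH", 99),
])
```

With this input, the first four names receive 0 through 3 and JSMITH receives 99; a value added after JSMITH without an explicit integer would receive 100.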

Union type members are assigned a tag in the order that they appear, starting from zero and incrementing for each subsequent type. If a tag value is explicitly specified, automatic assignment continues from that value plus one for subsequent values.

Compatibility between schema upgrades

This section is informative.

The recommended approach for message versioning is with the use of union types. Adding new types to a union is backwards compatible with previous messages. For example, the following schema provides several versions of a message:

type Message (MessageV1 | MessageV2 | MessageV3)

type MessageV1 {
    ...
}

type MessageV2 {
    ...
}

type MessageV3 {
    ...
}

An updated schema which added a MessageV4 would still be able to decode versions 1, 2, and 3. However, you must make the decision to use versioning in advance. Replacing a struct type with a union type that contains the same struct is NOT backwards compatible.

If you later decide to deprecate MessageV1, you may remove it and specify the initial tag explicitly:

type Message (MessageV2 = 1 | MessageV3)

type MessageV2 {
    ...
}

type MessageV3 {
    ...
}
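
To see why the explicit tag keeps existing messages decodable, the tag assignment for both versions of the schema can be compared. The helper assign_union_tags is a hypothetical name; it applies the tag assignment rule described for union types to a list of (member name, explicit tag or None) pairs.

```python
def assign_union_tags(members):
    """Map each tag to its member type name, in schema order."""
    tags = {}
    next_tag = 0
    for name, explicit in members:
        tag = next_tag if explicit is None else explicit
        tags[tag] = name
        next_tag = tag + 1  # automatic assignment continues from here
    return tags


# type Message (MessageV1 | MessageV2 | MessageV3)
old_tags = assign_union_tags([
    ("MessageV1", None),
    ("MessageV2", None),
    ("MessageV3", None),
])

# type Message (MessageV2 = 1 | MessageV3)
new_tags = assign_union_tags([
    ("MessageV2", 1),
    ("MessageV3", None),
])

# Every tag the new schema knows about means the same type as before.
still_compatible = all(old_tags[tag] == name for tag, name in new_tags.items())
```

MessageV2 and MessageV3 keep tags 1 and 2 in both schemas, so messages written under the old schema with those tags decode unchanged. Tag 0 is simply no longer recognized. Without the explicit "= 1", MessageV2 would have been renumbered to 0 and old messages would be misinterpreted.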

Security Considerations

Implementations must take care when decoding types with an unbounded length (e.g. []int, map, data), as a malicious message can declare an excessive length. A naive implementation that trusts the declared length is exposed to denial-of-service attacks, failed allocations, or other security faults.
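
One defensive pattern is to validate a declared length before acting on it. The sketch below decodes a variable-length data value and rejects lengths that exceed either a configured limit or the octets actually remaining in the input. MAX_LENGTH and the function names are assumptions made for this example; appropriate limits depend on the application.

```python
MAX_LENGTH = 1 << 20  # hypothetical application limit: 1 MiB


def decode_uint(buf, pos=0):
    """Decode a BARE uint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated uint")
        octet = buf[pos]
        pos += 1
        result |= (octet & 0x7F) << shift
        if (octet & 0x80) == 0:
            break
        shift += 7
        if shift > 63:
            raise ValueError("uint is longer than 10 octets")
    if result >= 1 << 64:
        raise ValueError("uint exceeds 64 bits")
    return result, pos


def decode_data(buf, pos=0, max_length=MAX_LENGTH):
    """Decode a length-prefixed data value; return (payload, next position)."""
    length, pos = decode_uint(buf, pos)
    # Check the declared length before slicing or allocating anything.
    if length > max_length:
        raise ValueError("declared length exceeds the configured limit")
    if length > len(buf) - pos:
        raise ValueError("declared length exceeds the remaining input")
    return buf[pos:pos + length], pos + length
```

The same two checks apply to the element count of []type and the pair count of a map, where the count should also be bounded before any container is pre-allocated.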

License

This specification text is licensed under CC-BY-SA.