Overview
The STEF Schema Definition Language (SDL) is used to define schemas for STEF serialization. It provides a simple, type-safe way to describe data structures that can be efficiently serialized and deserialized using the STEF format.
Package Declaration
Every STEF schema file begins with a package declaration:
package com.example.myschema
Package names use dot notation and can have one or more dot-delimited components.
Language-Specific Package Handling
Different target languages handle package names differently when generating code:
- Go: Uses only the last component of the package name. For example, com.example.myschema becomes package myschema in Go.
- Java: Uses the full package name hierarchy. For example, com.example.myschema becomes package com.example.myschema in Java.
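The two mapping rules can be restated as a pair of small Go helpers. The function names goPackageName and javaPackageName are hypothetical. They only illustrate the rule described above and are not part of stefgen.

```go
package main

import (
	"fmt"
	"strings"
)

// goPackageName applies the Go rule: keep only the last dot-delimited
// component of the STEF package name.
func goPackageName(stefPkg string) string {
	parts := strings.Split(stefPkg, ".")
	return parts[len(parts)-1]
}

// javaPackageName applies the Java rule: keep the full dotted hierarchy.
func javaPackageName(stefPkg string) string {
	return stefPkg
}

func main() {
	pkg := "com.example.myschema"
	fmt.Println("Go:  ", goPackageName(pkg))
	fmt.Println("Java:", javaPackageName(pkg))
}
```

A single-component name such as myschema maps to itself under both rules.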
Comments
STEF SDL supports single-line comments that start with // and continue to the end of the line:
// This is a comment
package com.example // Comments can appear at end of lines
Primitive Types
STEF SDL supports the following primitive data types:
- bool - Boolean values (true/false)
- int64 - 64-bit signed integer
- uint64 - 64-bit unsigned integer
- float64 - 64-bit floating point number
- string - UTF-8 encoded string
- bytes - Binary data
Structs
Structs define composite data types with named fields:
struct Person {
Name string
Age uint64
Email string
}
Root Structs
The root attribute marks a struct as the top-level record type in a STEF stream:
struct Record root {
Timestamp uint64
Data Person
}
Multiple structs can be marked as root in a single schema, allowing the STEF stream to contain different types of records:
struct MetricRecord root {
Timestamp uint64
Metric Metric
}
struct TraceRecord root {
Timestamp uint64
Span Span
}
When multiple root structs are defined, each record in the stream will be one of the root types, and the STEF format includes type information to distinguish between them during deserialization.
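A reader of a multi-root stream has to inspect the type information and then dispatch to the matching root type. The Go sketch below models that with an explicit discriminator. The rootType tag, the taggedRecord wrapper and the describe function are hypothetical, and Metric and Span are reduced to a single name field each. The document does not specify how STEF actually encodes the record type.

```go
package main

import "fmt"

// Simplified, hypothetical Go analogues of the two root structs above.
type MetricRecord struct {
	Timestamp  uint64
	MetricName string
}

type TraceRecord struct {
	Timestamp uint64
	SpanName  string
}

// rootType is an assumed discriminator identifying which root a record is.
type rootType uint8

const (
	rootMetric rootType = iota
	rootTrace
)

// taggedRecord pairs a record with its root type; only the field selected
// by Type is meaningful.
type taggedRecord struct {
	Type   rootType
	Metric MetricRecord
	Trace  TraceRecord
}

// describe dispatches on the discriminator, as a deserializer would when
// deciding which root struct to decode.
func describe(r taggedRecord) string {
	switch r.Type {
	case rootMetric:
		return fmt.Sprintf("metric %s @%d", r.Metric.MetricName, r.Metric.Timestamp)
	case rootTrace:
		return fmt.Sprintf("span %s @%d", r.Trace.SpanName, r.Trace.Timestamp)
	default:
		return "unknown"
	}
}

func main() {
	stream := []taggedRecord{
		{Type: rootMetric, Metric: MetricRecord{Timestamp: 1, MetricName: "cpu"}},
		{Type: rootTrace, Trace: TraceRecord{Timestamp: 2, SpanName: "GET /"}},
	}
	for _, r := range stream {
		fmt.Println(describe(r))
	}
}
```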
Dictionary Compression
Fields with frequently repeated values can use dictionary compression via the dict modifier:
struct Event {
EventType string dict(EventTypes)
Message string
}
Structs can also have dictionary compression applied:
struct Resource dict(Resources) {
Name string
Version string
}
Dictionary names allow the same dictionary to be shared across multiple fields, even in different structs, as long as the fields have the same type:
struct MetricEvent {
ServiceName string dict(ServiceNames)
EventType string dict(EventTypes)
}
struct TraceEvent {
ServiceName string dict(ServiceNames) // Same dictionary as above
SpanName string dict(SpanNames)
}
This sharing enables more efficient compression when the same values appear across different record types.
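The idea behind a shared dictionary can be sketched with a generic dictionary encoder: the first time a value appears it is written in full and assigned an index, and every later occurrence, from any field that shares the dictionary, is written as that index. The Dict type and its Encode method are hypothetical, and this is not STEF's actual wire format.

```go
package main

import "fmt"

// Dict is a generic dictionary encoder: the first occurrence of a value is
// emitted literally and assigned the next index; later occurrences are
// emitted as that index only.
type Dict struct {
	index map[string]int
}

// NewDict returns an empty dictionary.
func NewDict() *Dict {
	return &Dict{index: make(map[string]int)}
}

// Encode returns the value's index and whether this is its first occurrence,
// in which case the caller would also write the literal value.
func (d *Dict) Encode(v string) (int, bool) {
	if ref, ok := d.index[v]; ok {
		return ref, false
	}
	ref := len(d.index)
	d.index[v] = ref
	return ref, true
}

func main() {
	// One Dict plays the role of dict(ServiceNames), shared by the
	// ServiceName fields of both MetricEvent and TraceEvent.
	serviceNames := NewDict()
	events := []struct{ Kind, ServiceName string }{
		{"MetricEvent", "checkout"},
		{"TraceEvent", "checkout"},
		{"TraceEvent", "payments"},
		{"MetricEvent", "payments"},
	}
	for _, e := range events {
		ref, isNew := serviceNames.Encode(e.ServiceName)
		if isNew {
			fmt.Printf("%s: literal %q (ref %d)\n", e.Kind, e.ServiceName, ref)
		} else {
			fmt.Printf("%s: ref %d\n", e.Kind, ref)
		}
	}
}
```

Because both event types use the same dictionary, the TraceEvent that repeats "checkout" only needs the index the MetricEvent already established.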
Optional Fields
Fields can be marked as optional, meaning they may not be present in every record:
struct User {
Name string
Email string optional
Phone string optional
}
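One common way to represent optional fields is a presence bitmask, which keeps "absent" distinct from "present but empty". The Go sketch below applies that technique to the User struct above. The present field, the bit constants and the Set/Has methods are hypothetical, and STEF's own encoding of optional fields may differ.

```go
package main

import "fmt"

// User is a hypothetical Go analogue of the schema's User struct. The
// present bitmask records which optional fields have been set.
type User struct {
	Name    string
	Email   string
	Phone   string
	present uint8 // bit 0: Email, bit 1: Phone
}

const (
	emailBit uint8 = 1 << 0
	phoneBit uint8 = 1 << 1
)

// SetEmail stores the value and marks Email as present.
func (u *User) SetEmail(v string) { u.Email = v; u.present |= emailBit }

// SetPhone stores the value and marks Phone as present.
func (u *User) SetPhone(v string) { u.Phone = v; u.present |= phoneBit }

// HasEmail reports whether Email was set.
func (u *User) HasEmail() bool { return u.present&emailBit != 0 }

// HasPhone reports whether Phone was set.
func (u *User) HasPhone() bool { return u.present&phoneBit != 0 }

func main() {
	u := User{Name: "Ada"}
	u.SetEmail("") // present, even though the value is empty
	fmt.Println(u.HasEmail(), u.HasPhone())
}
```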
Arrays
Array types are denoted with square brackets and can contain zero or more elements of the specified type:
struct Container {
Items []string
Numbers []int64
Objects []Person
}
Arrays are variable-length: they can be empty or contain any number of elements.
Oneofs (Union Types)
Oneofs define union types that can hold one of several possible field types:
oneof JsonValue {
String string
Number float64
Bool bool
Array []JsonValue
Object JsonObject
}
A oneof may also be empty, i.e. contain none of the listed values.
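A oneof can be modeled as a discriminator plus one slot per variant, with a dedicated "none" kind for the empty case. The Go sketch below does this for the JsonValue oneof. ValueKind, its constants and the render function are hypothetical, and the Object variant is omitted to keep the sketch short.

```go
package main

import "fmt"

// ValueKind selects which JsonValue variant is set. The zero value,
// KindNone, represents an empty oneof.
type ValueKind uint8

const (
	KindNone ValueKind = iota
	KindString
	KindNumber
	KindBool
	KindArray
)

// JsonValue is a hypothetical Go analogue of the JsonValue oneof; only the
// field selected by Kind is meaningful.
type JsonValue struct {
	Kind   ValueKind
	String string
	Number float64
	Bool   bool
	Array  []JsonValue // recursive, like Array []JsonValue in the schema
}

// render switches on the discriminator and prints a JSON-like form;
// an empty oneof renders as "<empty>".
func render(v JsonValue) string {
	switch v.Kind {
	case KindString:
		return fmt.Sprintf("%q", v.String)
	case KindNumber:
		return fmt.Sprintf("%g", v.Number)
	case KindBool:
		return fmt.Sprintf("%t", v.Bool)
	case KindArray:
		s := "["
		for i, e := range v.Array {
			if i > 0 {
				s += ","
			}
			s += render(e)
		}
		return s + "]"
	default:
		return "<empty>"
	}
}

func main() {
	v := JsonValue{Kind: KindArray, Array: []JsonValue{
		{Kind: KindString, String: "a"},
		{Kind: KindNumber, Number: 1.5},
		{Kind: KindBool, Bool: true},
		{}, // an empty oneof
	}}
	fmt.Println(render(v))
}
```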
Multimaps
Multimaps define key-value collections:
multimap Attributes {
key string
value AnyValue
}
Multimaps can also use dictionary compression:
multimap Labels {
key string dict(LabelKeys)
value string dict(LabelValues)
}
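The sketch below assumes a multimap behaves as its name suggests: an ordered list of key-value pairs in which the same key may appear more than once. The KeyValue and Labels types and the GetAll method are hypothetical; consult the STEF specification for the exact semantics.

```go
package main

import "fmt"

// KeyValue is one entry of a hypothetical Go analogue of the Labels multimap.
type KeyValue struct {
	Key   string
	Value string
}

// Labels keeps entries in insertion order and allows repeated keys
// (an assumption based on the term "multimap").
type Labels []KeyValue

// GetAll returns every value stored under key, in insertion order.
func (l Labels) GetAll(key string) []string {
	var out []string
	for _, kv := range l {
		if kv.Key == key {
			out = append(out, kv.Value)
		}
	}
	return out
}

func main() {
	labels := Labels{
		{"env", "prod"},
		{"region", "us-east"},
		{"env", "canary"},
	}
	fmt.Println(labels.GetAll("env"))
}
```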
Enums
Enums define named constant values:
enum MetricType {
Gauge = 0
Counter = 1
Histogram = 2
Summary = 3
}
Enum values must be explicitly assigned unsigned integer values. Multiple number formats are supported:
- Decimal: MetricType = 42
- Hexadecimal: MetricType = 0x2A or MetricType = 0X2A
- Octal: MetricType = 0o52 or MetricType = 0O52
- Binary: MetricType = 0b101010 or MetricType = 0B101010
For example, the following enum mixes several formats:
enum StatusCode {
OK = 0
NotFound = 0x194 // 404 in hexadecimal
InternalError = 0o764 // 500 in octal
Custom = 0b1111101000 // 1000 in binary
}
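The number formats listed above can be handled by checking for a two-character prefix and then parsing the remaining digits in the matching base. The parseEnumValue function below is a hypothetical sketch of those rules, not the STEF parser itself. A literal without one of the listed prefixes is parsed as decimal, which is an assumption about how a plain leading 0 is treated.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseEnumValue parses an unsigned enum value written in decimal, or in
// hexadecimal, octal or binary with a 0x/0X, 0o/0O or 0b/0B prefix.
func parseEnumValue(lit string) (uint64, error) {
	base := 10
	digits := lit
	if len(lit) > 2 && lit[0] == '0' {
		switch lit[1] {
		case 'x', 'X':
			base = 16
		case 'o', 'O':
			base = 8
		case 'b', 'B':
			base = 2
		}
		if base != 10 {
			digits = lit[2:] // strip the prefix before parsing
		}
	}
	// ParseUint rejects signs, so negative values are reported as errors.
	return strconv.ParseUint(digits, base, 64)
}

func main() {
	lits := []string{"42", "0x2A", "0X2A", "0o52", "0O52", "0b101010", "0x194", "0b1111101000"}
	for _, lit := range lits {
		v, err := parseEnumValue(lit)
		if err != nil {
			fmt.Println(lit, "error:", err)
			continue
		}
		fmt.Println(lit, "=", v)
	}
}
```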
Complete Example
Here's a comprehensive example showing various STEF SDL features:
package com.example.monitoring
// Enum for metric types
enum MetricType {
Gauge = 0
Counter = 1
Histogram = 2
}
// Key-value attributes
multimap Attributes {
key string dict(AttributeKeys)
value AttributeValue
}
// Union type for attribute values
oneof AttributeValue {
StringValue string
IntValue int64
FloatValue float64
BoolValue bool
}
// Resource information with dictionary compression
struct Resource dict(Resources) {
ServiceName string dict(ServiceNames)
ServiceVersion string dict(ServiceVersions)
Attributes Attributes
}
// Metric data point
struct DataPoint {
Timestamp uint64
Value float64
Attributes Attributes
}
// Main metric structure
struct Metric {
Name string dict(MetricNames)
Type MetricType
Unit string dict(Units)
Description string optional
DataPoints []DataPoint
}
// Root record type
struct MetricRecord root {
Resource Resource
Metric Metric
}
Type References
STEF SDL supports forward references: you can reference types before they are defined in the file. The parser resolves all type references after parsing the complete schema.
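Resolving references only after the whole schema is parsed amounts to two passes: first record every declared type name, then check each reference against that set. The decl type and the unresolved function below are hypothetical simplifications (array types, for example, are not handled) meant only to show why declaration order stops mattering.

```go
package main

import (
	"fmt"
	"sort"
)

// decl is a simplified view of one parsed declaration: its name and the
// type names its fields refer to.
type decl struct {
	Name string
	Refs []string
}

// primitives resolve without any declaration.
var primitives = map[string]bool{
	"bool": true, "int64": true, "uint64": true,
	"float64": true, "string": true, "bytes": true,
}

// unresolved returns, sorted, every referenced name that is neither a
// primitive nor declared anywhere in the schema.
func unresolved(decls []decl) []string {
	declared := make(map[string]bool)
	for _, d := range decls { // pass 1: collect all declared names
		declared[d.Name] = true
	}
	missing := make(map[string]bool)
	for _, d := range decls { // pass 2: check every reference
		for _, r := range d.Refs {
			if !primitives[r] && !declared[r] {
				missing[r] = true
			}
		}
	}
	out := make([]string, 0, len(missing))
	for name := range missing {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Attributes refers to AttributeValue before it is declared, as in the
	// complete example; UndefinedType is a deliberately missing name.
	schema := []decl{
		{Name: "Attributes", Refs: []string{"string", "AttributeValue"}},
		{Name: "AttributeValue", Refs: []string{"string", "int64", "float64", "bool"}},
		{Name: "DataPoint", Refs: []string{"uint64", "UndefinedType"}},
	}
	fmt.Println(unresolved(schema))
}
```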
Recursive Type Declarations
STEF SDL allows recursive type declarations, enabling the definition of tree-like data structures.
Self-Referential Types
A type can reference itself, useful for creating tree structures:
// Binary tree node
struct TreeNode {
Value int64
Left TreeNode optional
Right TreeNode optional
}
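In Go, a natural analogue of TreeNode uses nil-able pointers for the optional Left and Right fields. In this sketch it is the optionality that lets the recursion end at the leaves. The TreeNode type and its Sum method are hypothetical and are not what stefgen necessarily generates.

```go
package main

import "fmt"

// TreeNode is a hypothetical Go analogue of the schema's TreeNode; an
// absent optional child is represented by a nil pointer.
type TreeNode struct {
	Value int64
	Left  *TreeNode
	Right *TreeNode
}

// Sum adds up all values in the tree, treating nil children as empty.
func (n *TreeNode) Sum() int64 {
	if n == nil {
		return 0
	}
	return n.Value + n.Left.Sum() + n.Right.Sum()
}

func main() {
	root := &TreeNode{
		Value: 1,
		Left:  &TreeNode{Value: 2},
		Right: &TreeNode{Value: 3, Left: &TreeNode{Value: 4}},
	}
	fmt.Println(root.Sum())
}
```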
Mutually Referential Types
Multiple types can reference each other, creating more complex recursive relationships:
// Expression tree with operators and operands
struct Expression {
Node ExpressionNode
}
oneof ExpressionNode {
Literal LiteralValue
BinaryOp BinaryOperation
UnaryOp UnaryOperation
}
struct LiteralValue {
Value float64
}
struct BinaryOperation {
Operator string
Left Expression // References back to Expression
Right Expression // References back to Expression
}
struct UnaryOperation {
Operator string
Operand Expression // References back to Expression
}
The STEF parser resolves these recursive patterns, so tree-shaped and nested data such as expression trees can be modeled directly in the schema.
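To show how the Expression / ExpressionNode cycle is traversed, the Go sketch below mirrors those types and evaluates an expression recursively. All Go names are hypothetical; ExpressionNode is modeled as a struct in which at most one pointer is set, with all-nil meaning an empty oneof. The operator strings "+", "-", "*" and "/" are assumptions, since the schema only declares Operator as a string.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical Go analogues of the mutually recursive schema types.
type Expression struct{ Node ExpressionNode }

type ExpressionNode struct {
	Literal  *LiteralValue
	BinaryOp *BinaryOperation
	UnaryOp  *UnaryOperation
}

type LiteralValue struct{ Value float64 }

type BinaryOperation struct {
	Operator    string
	Left, Right Expression
}

type UnaryOperation struct {
	Operator string
	Operand  Expression
}

// Eval recurses Expression -> ExpressionNode -> operation -> Expression.
func Eval(e Expression) (float64, error) {
	n := e.Node
	switch {
	case n.Literal != nil:
		return n.Literal.Value, nil
	case n.UnaryOp != nil:
		v, err := Eval(n.UnaryOp.Operand)
		if err != nil {
			return 0, err
		}
		if n.UnaryOp.Operator == "-" {
			return -v, nil
		}
		return 0, fmt.Errorf("unknown unary operator %q", n.UnaryOp.Operator)
	case n.BinaryOp != nil:
		l, err := Eval(n.BinaryOp.Left)
		if err != nil {
			return 0, err
		}
		r, err := Eval(n.BinaryOp.Right)
		if err != nil {
			return 0, err
		}
		switch n.BinaryOp.Operator {
		case "+":
			return l + r, nil
		case "-":
			return l - r, nil
		case "*":
			return l * r, nil
		case "/":
			return l / r, nil
		}
		return 0, fmt.Errorf("unknown binary operator %q", n.BinaryOp.Operator)
	}
	return 0, errors.New("empty expression node")
}

// Small constructors to keep the example readable.
func lit(v float64) Expression { return Expression{ExpressionNode{Literal: &LiteralValue{v}}} }

func bin(op string, l, r Expression) Expression {
	return Expression{ExpressionNode{BinaryOp: &BinaryOperation{op, l, r}}}
}

func neg(x Expression) Expression {
	return Expression{ExpressionNode{UnaryOp: &UnaryOperation{"-", x}}}
}

func main() {
	// (2 + 3) * -4
	e := bin("*", bin("+", lit(2), lit(3)), neg(lit(4)))
	v, err := Eval(e)
	fmt.Println(v, err)
}
```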
Syntax Rules
- Identifiers must start with a letter and can contain letters, digits, and underscores
- Keywords are case-sensitive
- Struct, oneof, multimap, and enum names must be unique within a schema
- Field names must be unique within their containing struct/oneof/multimap
- Enum values must be unique within their enum
- Whitespace and comments are ignored during parsing
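The identifier and field-uniqueness rules can be checked mechanically. The Go sketch below encodes them with a regular expression and a set. The checkFields function is hypothetical, and reading "letter" as ASCII A-Z or a-z is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// identRe: a letter first, then letters, digits or underscores.
var identRe = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9_]*$`)

// checkFields reports invalid identifiers and duplicate field names
// within a single struct, oneof or multimap.
func checkFields(fields []string) []string {
	var problems []string
	seen := make(map[string]bool)
	for _, f := range fields {
		if !identRe.MatchString(f) {
			problems = append(problems, fmt.Sprintf("invalid identifier %q", f))
		}
		if seen[f] {
			problems = append(problems, fmt.Sprintf("duplicate field %q", f))
		}
		seen[f] = true
	}
	return problems
}

func main() {
	fmt.Println(checkFields([]string{"Name", "Age", "Email"}))
	fmt.Println(checkFields([]string{"Name", "2ndName", "Name"}))
}
```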
Generated Code
Use the stefgen tool to generate serialization code from your STEF schema:
stefgen --lang=go myschema.stef
This generates efficient serializers and deserializers in your target language.