omnigraph/docs/user/schema-language.md
aaltshuler 5e7b1aad78 docs(schema): document enum migration (widen/narrow/String↔enum)
Add an "Enum evolution" section to schema-language.md covering the four
supported shapes and their tiers, plus the unsupported cases (non-String
scalar change, interface enums, in-place variant rename). Record the new
ChangeEnumConstraint migration step. Add OG-MF-105 / OG-MF-107 to the
schema-lint code table and clarify OG-MF-106 as a genuine scalar change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 23:56:51 +01:00


Schema Language (.pg)

Pest grammar at crates/omnigraph-compiler/src/schema/schema.pest. AST at schema/ast.rs. Catalog at catalog/mod.rs.

Top-level declarations

  • interface <Name> { property* } — reusable property contracts.
  • node <Name> [implements <Iface>, ...] { property* | constraint* }
  • edge <Name>: <FromType> -> <ToType> [@card(min..max)] { property* | constraint* }
  • Comments: line // and block /* … */.

Property declarations

<ident>: <TypeRef> [annotation*]

Built-in scalar types

| Scalar | Arrow type |
|---|---|
| String | Utf8 |
| Blob | LargeBinary |
| Bool | Boolean |
| I32 / I64 | Int32 / Int64 |
| U32 / U64 | UInt32 / UInt64 |
| F32 / F64 | Float32 / Float64 |
| Date | Date32 |
| DateTime | Date64 |
| Vector(<dim>) | FixedSizeList(Float32, dim), 1 ≤ dim ≤ i32::MAX |
| [<scalar>] | List(scalar) |
| enum(v1, v2, …) | Utf8, with a sorted, deduplicated set of allowed string values |
| <scalar>? | Same as <scalar>, but with nullable: true |
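The table can be mirrored in Rust as a rough illustration. This is a sketch under stated assumptions, not the catalog's code. ScalarType, ArrowType, Column, to_arrow and column are hypothetical names, and the Arrow side is a stand-in enum rather than the arrow crate's DataType. The sketch encodes only the rules in the table: the Vector dimension bounds, list wrapping, enums stored as Utf8 with a sorted, deduplicated value-set, and the nullable flag that a trailing ? sets.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of the .pg scalar types (not the compiler's real AST).
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarType {
    String, Blob, Bool,
    I32, I64, U32, U64, F32, F64,
    Date, DateTime,
    Vector(i64),
    List(Box<ScalarType>),
    Enum(Vec<String>),
}

/// Stand-in for the Arrow physical type (not arrow's DataType).
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Utf8, LargeBinary, Boolean,
    Int32, Int64, UInt32, UInt64, Float32, Float64,
    Date32, Date64,
    /// FixedSizeList(Float32, dim)
    FixedSizeListF32(i32),
    List(Box<ArrowType>),
}

/// Physical column plus the allowed enum values (empty for non-enums).
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub arrow: ArrowType,
    pub nullable: bool,
    pub enum_values: Vec<String>,
}

pub fn to_arrow(t: &ScalarType) -> Result<ArrowType, String> {
    Ok(match t {
        // Enums have no physical type of their own: they are plain Utf8.
        ScalarType::String | ScalarType::Enum(_) => ArrowType::Utf8,
        ScalarType::Blob => ArrowType::LargeBinary,
        ScalarType::Bool => ArrowType::Boolean,
        ScalarType::I32 => ArrowType::Int32,
        ScalarType::I64 => ArrowType::Int64,
        ScalarType::U32 => ArrowType::UInt32,
        ScalarType::U64 => ArrowType::UInt64,
        ScalarType::F32 => ArrowType::Float32,
        ScalarType::F64 => ArrowType::Float64,
        ScalarType::Date => ArrowType::Date32,
        ScalarType::DateTime => ArrowType::Date64,
        ScalarType::Vector(dim) => {
            // 1 <= dim <= i32::MAX, per the table.
            if *dim < 1 || *dim > i64::from(i32::MAX) {
                return Err(format!("Vector dimension {dim} is out of range"));
            }
            ArrowType::FixedSizeListF32(*dim as i32)
        }
        ScalarType::List(inner) => ArrowType::List(Box::new(to_arrow(inner)?)),
    })
}

/// `nullable` is true when the type was written with a trailing `?`.
pub fn column(t: &ScalarType, nullable: bool) -> Result<Column, String> {
    let enum_values = match t {
        // Sort and deduplicate, so enum(b, a, a) and enum(a, b) are the same.
        ScalarType::Enum(values) => values
            .iter()
            .cloned()
            .collect::<BTreeSet<String>>()
            .into_iter()
            .collect(),
        _ => Vec::new(),
    };
    Ok(Column { arrow: to_arrow(t)?, nullable, enum_values })
}
```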

Constraints (body level)

| Constraint | On | Effect |
|---|---|---|
| @key(p, …) | node | Primary key; implies an index on the key columns; key_property() returns the first key property |
| @unique(p, …) | node, edge | Uniqueness across the listed columns |
| @index(p, …) | node, edge | Builds a scalar (BTREE) index on the columns |
| @range(p, min..max) | node | Numeric range validation (open ranges allowed) |
| @check(p, "regex") | node | Regex pattern validation |
| @card(min..max?) | edge | Edge multiplicity; default 0..*, e.g. 0..1, 1..1, 1..* |

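Edge bodies allow only @unique and @index. @card is written on the edge signature (edge <Name>: <FromType> -> <ToType> @card(...)), not inside the body.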
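Here is a minimal sketch of how @card bounds might be parsed and checked. Card, parse_card and allows are hypothetical names. The table gives only the min..max? form and the 0..* default, so two details below are assumptions: an omitted max is treated the same as *, and edges are counted per source node.

```rust
/// Parsed @card(min..max?) bounds; max == None means unbounded (*).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Card {
    pub min: u64,
    pub max: Option<u64>,
}

impl Default for Card {
    /// An edge without @card behaves like 0..*.
    fn default() -> Self {
        Card { min: 0, max: None }
    }
}

/// Parses "min..max", where max may be "*" or (assumed) omitted.
pub fn parse_card(spec: &str) -> Result<Card, String> {
    let (lo, hi) = spec
        .split_once("..")
        .ok_or_else(|| format!("expected min..max, got {spec:?}"))?;
    let min: u64 = lo
        .trim()
        .parse()
        .map_err(|e| format!("bad min in {spec:?}: {e}"))?;
    let max = match hi.trim() {
        "" | "*" => None,
        s => Some(
            s.parse::<u64>()
                .map_err(|e| format!("bad max in {spec:?}: {e}"))?,
        ),
    };
    if matches!(max, Some(m) if m < min) {
        return Err(format!("max is below min in {spec:?}"));
    }
    Ok(Card { min, max })
}

impl Card {
    /// True if `count` edges (assumed: per source node) satisfy the bounds.
    pub fn allows(&self, count: u64) -> bool {
        count >= self.min && self.max.map_or(true, |m| count <= m)
    }
}
```

For example, 1..1 accepts exactly one edge and 1..* rejects zero.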

Annotations

  • @<ident> or @<ident>(<literal>) on any declaration or property.
  • Known annotations:
    • @embed on a Vector property: names the source property whose text is embedded into this vector at ingest (recorded in the embed_sources map on NodeType).
    • @description("…"), @instruction("…") on query declarations (carried through to clients).
  • Custom annotations are accepted by the parser and surfaced in catalog metadata; unrecognized annotations don't fail compilation.
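The annotation handling can be sketched over a simplified AST. Annotation, Property, AnnotationIndex and index_annotations are hypothetical names. The sketch interprets only @embed and keeps every other annotation as opaque metadata, so an unknown annotation never fails compilation. The checks that reject @embed on a non-Vector property, or an @embed naming a missing source, are assumptions about validation and are not documented behavior.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical annotation: @name or @name(literal), literal already unquoted.
#[derive(Debug, Clone)]
pub struct Annotation {
    pub name: String,
    pub arg: Option<String>,
}

/// Hypothetical property: just enough shape for this sketch.
#[derive(Debug, Clone)]
pub struct Property {
    pub name: String,
    pub is_vector: bool,
    pub annotations: Vec<Annotation>,
}

#[derive(Debug, Default)]
pub struct AnnotationIndex {
    /// Vector property -> source property whose text is embedded at ingest.
    pub embed_sources: HashMap<String, String>,
    /// Property -> names of annotations this sketch does not interpret.
    pub custom: BTreeMap<String, Vec<String>>,
}

pub fn index_annotations(props: &[Property]) -> Result<AnnotationIndex, String> {
    let mut index = AnnotationIndex::default();
    for p in props {
        for a in &p.annotations {
            if a.name == "embed" {
                // Assumed validation: target must be a Vector, source must exist.
                if !p.is_vector {
                    return Err(format!("@embed on non-Vector property {}", p.name));
                }
                let src = a
                    .arg
                    .clone()
                    .ok_or_else(|| format!("@embed on {} needs a source property", p.name))?;
                if !props.iter().any(|q| q.name == src) {
                    return Err(format!("@embed source {src} not found"));
                }
                index.embed_sources.insert(p.name.clone(), src);
            } else {
                // Unrecognized annotations are kept as metadata, never an error.
                index
                    .custom
                    .entry(p.name.clone())
                    .or_default()
                    .push(a.name.clone());
            }
        }
    }
    Ok(index)
}
```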

Catalog construction

  • Pass 0: collect interfaces.
  • Pass 1: collect nodes, expand implements, build constraint and @embed mappings, build the Arrow schema for each node table (id: Utf8 plus all properties; blob columns get LargeBinary).
  • Pass 2: collect edges, validate that from_type / to_type exist, normalize edge names case-insensitively for lookup, validate constraints for edges. Edge Arrow schema: id: Utf8, src: Utf8, dst: Utf8 plus edge properties.
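Pass 2 can be sketched roughly as follows. EdgeDecl, Field, EdgeCatalog and edge_schema are hypothetical names, and Arrow types are plain strings here. Two behaviors are assumptions: rejecting two edge names that differ only in case, and marking id/src/dst as non-nullable. The column order follows the text: id, src, dst, then the edge's own properties.

```rust
use std::collections::HashMap;

/// Column: name, Arrow type (as a plain string here), nullability.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub arrow: String,
    pub nullable: bool,
}

/// Hypothetical edge declaration after parsing.
#[derive(Debug, Clone)]
pub struct EdgeDecl {
    pub name: String,
    pub from_type: String,
    pub to_type: String,
    pub props: Vec<Field>,
}

pub struct EdgeCatalog {
    by_lower: HashMap<String, usize>,
    edges: Vec<EdgeDecl>,
}

impl EdgeCatalog {
    pub fn build(node_types: &[&str], edges: Vec<EdgeDecl>) -> Result<Self, String> {
        let mut by_lower = HashMap::new();
        for (i, e) in edges.iter().enumerate() {
            // Both endpoints must name a declared node type.
            for end in [&e.from_type, &e.to_type] {
                if !node_types.contains(&end.as_str()) {
                    return Err(format!("edge {}: unknown type {}", e.name, end));
                }
            }
            // Lookup key is case-insensitive; collisions are rejected (assumption).
            if by_lower.insert(e.name.to_lowercase(), i).is_some() {
                return Err(format!("edge name {} collides case-insensitively", e.name));
            }
        }
        Ok(EdgeCatalog { by_lower, edges })
    }

    pub fn lookup(&self, name: &str) -> Option<&EdgeDecl> {
        self.by_lower.get(&name.to_lowercase()).map(|&i| &self.edges[i])
    }
}

/// Edge table layout: id, src, dst (all Utf8), then the edge's own properties.
pub fn edge_schema(e: &EdgeDecl) -> Vec<Field> {
    let utf8 = |name: &str| Field {
        name: name.to_string(),
        arrow: "Utf8".to_string(),
        nullable: false,
    };
    let mut cols = vec![utf8("id"), utf8("src"), utf8("dst")];
    cols.extend(e.props.iter().cloned());
    cols
}
```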

Schema IR & stable type IDs

  • SCHEMA_IR_VERSION = 1 (catalog/schema_ir.rs).
  • Each interface/node/edge currently gets a stable_type_id from a kind+name hash.
  • Preserving accepted IDs across renames is an architectural invariant, but the current hash-on-name implementation does not yet honor it. This remains a known gap until migration carries IDs across @rename_from.
  • Serialized as JSON for diff/migration plans.
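A kind+name hash in Rust makes the rename gap concrete. FNV-1a serves here only as a small, deterministic stand-in. The document does not say which hash stable_type_id actually uses, and TypeKind / stable_type_id below are hypothetical mirrors.

```rust
/// FNV-1a, 64-bit: chosen only because it is tiny and deterministic.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeKind {
    Interface,
    Node,
    Edge,
}

/// Hypothetical stable_type_id: hash of kind tag, a 0 separator, and the name.
pub fn stable_type_id(kind: TypeKind, name: &str) -> u64 {
    let tag = match kind {
        TypeKind::Interface => "interface",
        TypeKind::Node => "node",
        TypeKind::Edge => "edge",
    };
    let mut input = Vec::with_capacity(tag.len() + 1 + name.len());
    input.extend_from_slice(tag.as_bytes());
    input.push(0);
    input.extend_from_slice(name.as_bytes());
    fnv1a64(&input)
}
```

The name is an input to the hash, so renaming node Person to Human yields a different ID. That is the gap described above.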

Schema migration planning

plan_schema_migration(accepted, desired) -> SchemaMigrationPlan { supported, steps[] } with step types:

  • AddType { type_kind, name }
  • RenameType { type_kind, from, to }
  • AddProperty { type_kind, type_name, property_name, property_type }
  • RenameProperty { type_kind, type_name, from, to }
  • AddConstraint { type_kind, type_name, constraint }
  • UpdateTypeMetadata { … annotations }
  • UpdatePropertyMetadata { … annotations }
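  • ChangeEnumConstraint { type_kind, type_name, property_name, to_property_type, code }: evolves an enum-typed property's value-set (see Enum evolution below)
  • DropProperty { … } / DropType { … }: destructive drops, Soft by default (see Destructive drops below)
  • UnsupportedChange { entity, reason } (forces supported=false)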
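The supported flag can be sketched as a pure function of the step list: a plan is supported unless it contains an UnsupportedChange. This sketch models only a few step variants and simplifies property_type to a String. plan_added_types is a hypothetical toy planner that diffs type names only; the real plan_schema_migration covers far more cases.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeKind {
    Interface,
    Node,
    Edge,
}

/// Subset of the step types listed above; property_type simplified to String.
#[derive(Debug, Clone, PartialEq)]
pub enum MigrationStep {
    AddType { type_kind: TypeKind, name: String },
    RenameType { type_kind: TypeKind, from: String, to: String },
    AddProperty { type_kind: TypeKind, type_name: String, property_name: String, property_type: String },
    UnsupportedChange { entity: String, reason: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct SchemaMigrationPlan {
    pub supported: bool,
    pub steps: Vec<MigrationStep>,
}

impl SchemaMigrationPlan {
    /// Any UnsupportedChange forces supported = false.
    pub fn from_steps(steps: Vec<MigrationStep>) -> Self {
        let supported = !steps
            .iter()
            .any(|s| matches!(s, MigrationStep::UnsupportedChange { .. }));
        SchemaMigrationPlan { supported, steps }
    }
}

/// Toy planner: emits AddType for every desired type missing from accepted.
pub fn plan_added_types(accepted: &[(TypeKind, &str)], desired: &[(TypeKind, &str)]) -> SchemaMigrationPlan {
    let steps = desired
        .iter()
        .filter(|d| !accepted.iter().any(|a| a.0 == d.0 && a.1 == d.1))
        .map(|&(type_kind, name)| MigrationStep::AddType { type_kind, name: name.to_string() })
        .collect();
    SchemaMigrationPlan::from_steps(steps)
}
```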

apply_schema() returns SchemaApplyResult { supported, applied, manifest_version, steps } and is gated by an internal __schema_apply_lock__ system branch, so concurrent schema applies run one at a time.
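An in-process analogy of the lock follows. It assumes branch creation is an atomic create-if-absent: the first apply creates __schema_apply_lock__, any concurrent apply fails to create it, and the branch is deleted when the apply finishes. Branches, ApplyGuard and begin_schema_apply are hypothetical names. This sketch fails fast, whereas the real implementation may wait or retry instead.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

const LOCK_BRANCH: &str = "__schema_apply_lock__";

/// Hypothetical in-memory branch registry; create is atomic create-if-absent.
#[derive(Default)]
pub struct Branches {
    names: Mutex<HashSet<String>>,
}

impl Branches {
    pub fn create(&self, name: &str) -> Result<(), String> {
        let mut names = self.names.lock().unwrap();
        if !names.insert(name.to_string()) {
            return Err(format!("branch {name} already exists"));
        }
        Ok(())
    }

    pub fn delete(&self, name: &str) {
        self.names.lock().unwrap().remove(name);
    }
}

/// Holds the lock branch for the duration of one apply; dropping it releases.
pub struct ApplyGuard<'a> {
    branches: &'a Branches,
}

impl Drop for ApplyGuard<'_> {
    fn drop(&mut self) {
        self.branches.delete(LOCK_BRANCH);
    }
}

/// Fails if another apply currently holds the lock branch.
pub fn begin_schema_apply(branches: &Branches) -> Result<ApplyGuard<'_>, String> {
    branches.create(LOCK_BRANCH)?;
    Ok(ApplyGuard { branches })
}
```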

Enum evolution

Enums are stored physically as Utf8; the allowed value-set lives only in the schema, not in the column. Enum migrations therefore change catalog metadata, never the data: no table is rewritten and the manifest version does not advance. Four shapes are supported on node and edge properties (interface enum changes are not supported in v1):

| Change | Example | Tier | Behavior |
|---|---|---|---|
| Widen (add variants) | enum(open, closed) → enum(open, closed, archived) | Safe | Metadata-only; applies unconditionally. No existing row can be invalid. |
| enum → String (loosen) | enum(open, closed) → String | Safe | Metadata-only; every enum value is a valid String. |
| Narrow (remove variants) | enum(open, closed, archived) → enum(open, closed) | Validated (OG-MF-105) | Apply scans existing rows; if any row holds a removed value, it aborts before publish and names the offending value. No data is dropped: fix or migrate the rows, then re-apply. |
| String → enum (constrain) | String → enum(open, closed) | Validated (OG-MF-107) | Apply scans existing rows and aborts on the first out-of-set value. |
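The Validated tier amounts to a scan before publish. Here is a minimal sketch. validate_enum_values is a hypothetical name, rows are modeled as optional strings, and skipping null values is an assumption of this sketch.

```rust
use std::collections::BTreeSet;

/// Scans stored values and rejects the first one outside `allowed`.
/// Returning Err models "abort before publish"; no row is modified.
pub fn validate_enum_values<'a, I>(rows: I, allowed: &BTreeSet<String>) -> Result<(), String>
where
    I: IntoIterator<Item = Option<&'a str>>,
{
    // Null values are skipped (an assumption of this sketch).
    for v in rows.into_iter().flatten() {
        if !allowed.contains(v) {
            return Err(format!(
                "value {v:?} is not in the target enum; fix or migrate the rows, then re-apply"
            ));
        }
    }
    Ok(())
}
```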

Reordering variants is a no-op: the value-set is sorted and deduplicated, so enum(b, a) and enum(a, b) are identical. Changing an enum to a non-String scalar (e.g. enum(...) → I32), or changing nullability or list-ness alongside the value-set, is a genuine type change and remains UnsupportedChange (OG-MF-106). Renaming a variant in place (a value remap, e.g. closed → done) is not yet supported; model it as add-then-narrow, with a data migration in between.
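The rules above can be condensed into a classifier. PropType, EnumChange, enum_of and classify are hypothetical names, and nullability and list-ness are not modeled. A change that both adds and removes variants (such as the closed → done rename) is classified as Unsupported here. That is an assumption, chosen to match the advice to model renames as add-then-narrow.

```rust
use std::collections::BTreeSet;

/// Hypothetical property type: only what the classifier needs.
#[derive(Debug, Clone, PartialEq)]
pub enum PropType {
    String,
    /// Value-set, already sorted and deduplicated by BTreeSet.
    Enum(BTreeSet<String>),
    /// Any other scalar, e.g. "I32".
    Other(&'static str),
}

#[derive(Debug, PartialEq)]
pub enum EnumChange {
    NoOp,        // same value-set (e.g. reordered variants)
    Widen,       // Safe
    Loosen,      // enum -> String, Safe
    Narrow,      // Validated (OG-MF-105)
    Constrain,   // String -> enum, Validated (OG-MF-107)
    Unsupported, // OG-MF-106, or a mixed add+remove (assumption)
}

pub fn enum_of(values: &[&str]) -> PropType {
    PropType::Enum(values.iter().map(|v| v.to_string()).collect())
}

pub fn classify(from: &PropType, to: &PropType) -> EnumChange {
    use EnumChange::*;
    match (from, to) {
        (PropType::Enum(a), PropType::Enum(b)) if a == b => NoOp,
        (PropType::Enum(a), PropType::Enum(b)) if a.is_subset(b) => Widen,
        (PropType::Enum(a), PropType::Enum(b)) if b.is_subset(a) => Narrow,
        (PropType::Enum(_), PropType::String) => Loosen,
        (PropType::String, PropType::Enum(_)) => Constrain,
        _ => Unsupported,
    }
}
```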

Destructive drops — --allow-data-loss

DropProperty and DropType steps default to Soft mode: the catalog tombstones the entry but the prior column / dataset remains time-travel-reachable via snapshot_at_version(prev) until omnigraph cleanup runs. Soft drops are reversible.

Pass --allow-data-loss (CLI) or allow_data_loss: true (HTTP POST /schema/apply body, SDK SchemaApplyOptions) to promote every drop in the plan to Hard mode. Hard drops run cleanup_old_versions on the affected dataset immediately after the manifest publish, making the prior column / dataset unreachable. Hard drops are irreversible.
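The drop-mode promotion can be sketched as follows. Step, DropMode, resolve_drop_modes and hard_cleanup_targets are hypothetical names, and the SchemaApplyOptions here mirrors only its allow_data_loss field. Mapping each dropped type to a dataset named after it is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DropMode {
    Soft,
    Hard,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    DropProperty { type_name: String, property_name: String, mode: DropMode },
    DropType { type_name: String, mode: DropMode },
    Other(String),
}

/// Mirrors only the allow_data_loss field of SchemaApplyOptions.
#[derive(Debug, Clone, Copy, Default)]
pub struct SchemaApplyOptions {
    pub allow_data_loss: bool,
}

/// Drops are Soft unless allow_data_loss promotes every one of them to Hard.
pub fn resolve_drop_modes(steps: &mut [Step], opts: SchemaApplyOptions) {
    let mode = if opts.allow_data_loss { DropMode::Hard } else { DropMode::Soft };
    for step in steps.iter_mut() {
        match step {
            Step::DropProperty { mode: m, .. } | Step::DropType { mode: m, .. } => *m = mode,
            Step::Other(_) => {}
        }
    }
}

/// Datasets to pass to cleanup_old_versions right after the manifest publish.
pub fn hard_cleanup_targets(steps: &[Step]) -> Vec<&str> {
    steps
        .iter()
        .filter_map(|s| match s {
            Step::DropProperty { type_name, mode: DropMode::Hard, .. }
            | Step::DropType { type_name, mode: DropMode::Hard } => Some(type_name.as_str()),
            _ => None,
        })
        .collect()
}
```

Soft drops produce no cleanup targets, so earlier versions stay reachable until omnigraph cleanup runs.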

The flag is honored uniformly across transports: omnigraph schema apply --allow-data-loss, POST /schema/apply { schema_source, allow_data_loss: true }, and apply_schema_with_options(.., SchemaApplyOptions { allow_data_loss: true }) produce identical plans and identical effects.