omnigraph/docs/user/schema/index.md
Andrew Altshuler 77dffdae92
2026-06-14 14:39:25 +03:00

Schema Language (.pg)

Top-level declarations

  • interface <Name> { property* } — reusable property contracts.
  • node <Name> [implements <Iface>, ...] { property* | constraint* }
  • edge <Name>: <FromType> -> <ToType> [@card(min..max)] { property* | constraint* }
  • Comments: line // and block /* … */.

Property declarations

<ident>: <TypeRef> [annotation*]

Built-in scalar types

| Scalar | Arrow type |
|---|---|
| String | Utf8 |
| Blob | LargeBinary |
| Bool | Boolean |
| I32 / I64 | Int32 / Int64 |
| U32 / U64 | UInt32 / UInt64 |
| F32 / F64 | Float32 / Float64 |
| Date | Date32 |
| DateTime | Date64 |
| Vector(<dim>) | FixedSizeList(Float32, dim), 1 ≤ dim ≤ i32::MAX |
| [<scalar>] | List(scalar) |
| enum(v1, v2, …) | Utf8 with sorted/dedup'd set of allowed string values |
| <scalar>? | Same as scalar but nullable: true |
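
The mapping above can be read as a small lookup plus three wrappers (vector, list, nullable). The following is a minimal Python sketch of that reading, not the compiler's implementation: the function name `resolve` and the string form of the results are hypothetical, and it assumes the `?` suffix may follow any type reference.

```python
import re

# Scalar -> Arrow type names, copied from the table above.
SCALARS = {
    "String": "Utf8",
    "Blob": "LargeBinary",
    "Bool": "Boolean",
    "I32": "Int32",
    "I64": "Int64",
    "U32": "UInt32",
    "U64": "UInt64",
    "F32": "Float32",
    "F64": "Float64",
    "Date": "Date32",
    "DateTime": "Date64",
}

I32_MAX = 2**31 - 1  # upper bound for Vector dimensions


def resolve(type_ref: str) -> tuple[str, bool]:
    """Return (arrow_type_description, nullable) for a type reference."""
    ref = type_ref.strip()
    nullable = ref.endswith("?")  # assumption: '?' may follow any type
    if nullable:
        ref = ref[:-1].strip()

    vector = re.fullmatch(r"Vector\((\d+)\)", ref)
    if vector:
        dim = int(vector.group(1))
        if not 1 <= dim <= I32_MAX:
            raise ValueError(f"Vector dim out of range: {dim}")
        return f"FixedSizeList(Float32, {dim})", nullable

    if ref.startswith("[") and ref.endswith("]"):
        inner, _ = resolve(ref[1:-1])
        return f"List({inner})", nullable

    enum = re.fullmatch(r"enum\((.*)\)", ref)
    if enum:
        # Allowed values are sorted and de-duplicated, as the table states.
        values = sorted({v.strip() for v in enum.group(1).split(",") if v.strip()})
        return f"Utf8 in {values}", nullable

    if ref in SCALARS:
        return SCALARS[ref], nullable
    raise ValueError(f"unknown type: {type_ref}")
```

For example, `resolve("I64?")` yields `("Int64", True)` and `resolve("[F32]")` yields `("List(Float32)", False)`.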

Constraints (body level)

| Constraint | On | Effect |
|---|---|---|
| @key(p, …) | node | Primary key; implies index on key columns; key_property() returns the first key |
| @unique(p, …) | node, edge | Uniqueness across listed columns |
| @index(p, …) | node, edge | Build a scalar (BTREE) index on the columns |
| @range(p, min..max) | node | Numeric range validation (open ranges allowed) |
| @check(p, "regex") | node | Regex pattern validation |
| @card(min..max?) | edge | Edge multiplicity; default 0..*; also 0..1, 1..1, 1..*, etc. |

Edge bodies only allow @unique and @index.
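
To make the `min..max` notation used by @range and @card concrete, here is a minimal Python sketch of how such bounds could be checked. It is illustrative only: the helper names are hypothetical, bounds are assumed to be inclusive, an open side is assumed to be written as an empty string or `*`, and Python's `re` dialect stands in for whatever regex engine @check actually uses.

```python
import re
from typing import Optional


def parse_bounds(spec: str) -> tuple[Optional[float], Optional[float]]:
    """Parse 'min..max'; an empty side or '*' means unbounded on that side."""
    lo, sep, hi = spec.partition("..")
    if not sep:
        raise ValueError(f"expected 'min..max', got {spec!r}")

    def side(text: str) -> Optional[float]:
        text = text.strip()
        return None if text in ("", "*") else float(text)

    return side(lo), side(hi)


def in_bounds(value: float, spec: str) -> bool:
    """Inclusive bounds check (assumption), usable for a property value
    under @range or for an edge count under @card."""
    lo, hi = parse_bounds(spec)
    return (lo is None or value >= lo) and (hi is None or value <= hi)


def check_matches(value: str, pattern: str) -> bool:
    """Assumption: @check requires the whole value to match the pattern."""
    return re.fullmatch(pattern, value) is not None
```

Under these assumptions, `in_bounds(2, "0..1")` is false (a second edge would violate `@card(0..1)`), while `in_bounds(0, "0..*")` is true, matching the default multiplicity.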

Annotations

  • @<ident> or @<ident>(<literal>) on any declaration or property.
  • Known annotations:
    • @embed on a Vector property — names the source property whose text gets embedded into this vector at ingest.
    • @description("…"), @instruction("…") on query declarations (carried through to clients).
  • Custom annotations are accepted by the parser and surfaced in catalog metadata; unrecognized annotations don't fail compilation.

Table layout

  • Each node type compiles to a table with an id: Utf8 column plus all declared properties (blob columns are stored as LargeBinary); implements clauses expand the interface's properties into the node.
  • Each edge type compiles to a table with id: Utf8, src: Utf8, dst: Utf8 plus the edge's own properties. Edge endpoint types (from/to) must exist, and edge names are matched case-insensitively.
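
The layout rules above can be sketched as two small functions. This is a hedged illustration in Python, not the compiler's code: the dict shapes and function names are hypothetical, and the column order (fixed columns, then interface properties, then the type's own properties) is an assumption.

```python
def node_columns(node: dict, interfaces: dict) -> dict:
    """Columns of a node table: id, then interface and own properties."""
    columns = {"id": "Utf8"}
    for iface in node.get("implements", []):
        # An implements clause copies the interface's properties into the node.
        columns.update(interfaces[iface])
    columns.update(node.get("properties", {}))
    return columns


def edge_columns(edge: dict, node_types: set) -> dict:
    """Columns of an edge table: id, src, dst, then the edge's properties."""
    for endpoint in (edge["from"], edge["to"]):
        if endpoint not in node_types:
            raise ValueError(f"unknown endpoint type: {endpoint}")
    columns = {"id": "Utf8", "src": "Utf8", "dst": "Utf8"}
    columns.update(edge.get("properties", {}))
    return columns


# Hypothetical schema: one interface, one node type, one edge type.
interfaces = {"Named": {"name": "Utf8"}}
person = {"implements": ["Named"], "properties": {"avatar": "LargeBinary"}}
knows = {"from": "Person", "to": "Person", "properties": {"since": "Date32"}}

person_table = node_columns(person, interfaces)
knows_table = edge_columns(knows, {"Person"})
```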

Schema migration planning

A migration plan compares the accepted schema against the desired one and reports whether the change is supported plus the ordered steps it requires:

  • Add a type
  • Rename a type
  • Add a property
  • Rename a property
  • Add a constraint
  • Update type or property metadata (annotations)
  • Unsupported change (reports the entity and reason; marks the whole plan as unsupported)

Applying a plan reports whether it was supported, the steps applied, and the resulting manifest version. Concurrent schema applies serialize so they can't interleave.
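
As a rough mental model of planning, the sketch below diffs two schemas into ordered steps and derives the plan-level "supported" flag from them. It is a simplified Python illustration with hypothetical names, covering only added types and added properties. Treating a changed property type as unsupported is an assumption made for the example, not a documented rule, and renames, constraints, metadata updates, and drops are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str         # "add_type", "add_property", or "unsupported"
    entity: str       # the type or Type.property the step refers to
    reason: str = ""  # filled in for unsupported changes


def plan(accepted: dict, desired: dict) -> tuple[bool, list]:
    """Diff two {type_name: {property: type_ref}} schemas into ordered steps."""
    steps = []
    for type_name, props in desired.items():
        if type_name not in accepted:
            steps.append(Step("add_type", type_name))
            continue
        for prop, type_ref in props.items():
            old = accepted[type_name].get(prop)
            if old is None:
                steps.append(Step("add_property", f"{type_name}.{prop}"))
            elif old != type_ref:
                # Assumption for illustration: type changes are unsupported.
                steps.append(Step("unsupported", f"{type_name}.{prop}",
                                  f"type change {old} -> {type_ref}"))
    # A single unsupported step makes the whole plan unsupported.
    supported = all(step.kind != "unsupported" for step in steps)
    return supported, steps
```

The design point worth noting is that support is a property of the whole plan: one unsupported step is enough to reject it, so a plan is never partially applicable.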

Destructive drops — --allow-data-loss

DropProperty and DropType steps default to Soft mode: the catalog tombstones the entry but the prior column / dataset remains time-travel-reachable via snapshot_at_version(prev) until omnigraph cleanup runs. Soft drops are reversible.

Pass --allow-data-loss (CLI) or allow_data_loss: true (HTTP POST /schema/apply body, SDK SchemaApplyOptions) to promote every drop in the plan to Hard mode. Hard drops run cleanup_old_versions on the affected dataset immediately after the manifest publish, making the prior column / dataset unreachable. Hard drops are irreversible.

The flag is honored uniformly across transports — omnigraph schema apply --allow-data-loss, POST /schema/apply { schema_source, allow_data_loss: true }, and apply_schema_with_options(.., SchemaApplyOptions { allow_data_loss: true }) produce identical plans and identical effects.
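
For the HTTP transport, the request could be assembled as in the Python sketch below. Only the path /schema/apply and the body fields schema_source and allow_data_loss come from the text above. The base URL, the JSON content type, the helper name, and the absence of any authentication header are assumptions, and the request is built but not sent.

```python
import json
from urllib.request import Request


def schema_apply_request(base_url: str, schema_source: str,
                         allow_data_loss: bool = False) -> Request:
    """Build (but do not send) a POST /schema/apply request."""
    body = {"schema_source": schema_source, "allow_data_loss": allow_data_loss}
    return Request(
        url=base_url.rstrip("/") + "/schema/apply",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


# Hypothetical server address and schema source, for illustration only.
req = schema_apply_request(
    "http://localhost:8080",
    "node Person { name: String }",
    allow_data_loss=True,
)
```

Defaulting `allow_data_loss` to false mirrors the documented default of Soft drops, so a caller has to opt in explicitly before any drop becomes irreversible.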