omnigraph/docs/schema-language.md

80 lines
3.5 KiB
Markdown
Raw Normal View History

# Schema Language (`.pg`)
Pest grammar at `crates/omnigraph-compiler/src/schema/schema.pest`. AST at `schema/ast.rs`. Catalog at `catalog/mod.rs`.
## Top-level declarations
- `interface <Name> { property* }` — reusable property contracts.
- `node <Name> [implements <Iface>, ...] { property* | constraint* }`
- `edge <Name>: <FromType> -> <ToType> [@card(min..max)] { property* | constraint* }`
- Comments: line `//` and block `/* … */`.
## Property declarations
`<ident>: <TypeRef> [annotation*]`
## Built-in scalar types
| Scalar | Arrow type |
|---|---|
| `String` | Utf8 |
| `Blob` | LargeBinary |
| `Bool` | Boolean |
| `I32` / `I64` | Int32 / Int64 |
| `U32` / `U64` | UInt32 / UInt64 |
| `F32` / `F64` | Float32 / Float64 |
| `Date` | Date32 |
| `DateTime` | Date64 |
| `Vector(<dim>)` | FixedSizeList(Float32, dim), `1 ≤ dim ≤ i32::MAX` |
| `[<scalar>]` | List(scalar) |
| `enum(v1, v2, …)` | Utf8 with sorted/dedup'd set of allowed string values |
| `<scalar>?` | Same as scalar but `nullable: true` |
## Constraints (body level)
| Constraint | On | Effect |
|---|---|---|
| `@key(p, …)` | node | Primary key; implies index on key columns; `key_property()` returns the first key |
| `@unique(p, …)` | node, edge | Uniqueness across listed columns |
| `@index(p, …)` | node, edge | Build a scalar (BTREE) index on the columns |
| `@range(p, min..max)` | node | Numeric range validation (open ranges allowed) |
| `@check(p, "regex")` | node | Regex pattern validation |
| `@card(min..max?)` | edge | Edge multiplicity — default `0..*`; `0..1`, `1..1`, `1..*`, etc. |
Edge bodies only allow `@unique` and `@index`.
## Annotations
- `@<ident>` or `@<ident>(<literal>)` on any declaration or property.
- Known annotations:
- `@embed` on a Vector property — names the *source* property whose text gets embedded into this vector at ingest (`embed_sources` map in NodeType).
- `@description("…")`, `@instruction("…")` on query declarations (carried through to clients).
- Custom annotations are accepted by the parser and surfaced in catalog metadata; unrecognized annotations don't fail compilation.
## Catalog construction
- Pass 0: collect interfaces.
- Pass 1: collect nodes, expand `implements`, build constraint and `@embed` mappings, build the Arrow schema for each node table (`id: Utf8` plus all properties; blob columns get `LargeBinary`).
- Pass 2: collect edges, validate that `from_type` / `to_type` exist, normalize edge names case-insensitively for lookup, validate constraints for edges. Edge Arrow schema: `id: Utf8, src: Utf8, dst: Utf8` plus edge properties.
## Schema IR & stable type IDs
Address reviewer feedback (Cursor + cubic) on PR #60 All eight comments verified against source and applied: - AGENTS.md: pull @docs/{invariants,lance,testing}.md imports out of the markdown blockquote. Claude Code's @-import parser expects @ at column 0; the leading "> " of a blockquote silently broke recognition, so the claimed auto-include did nothing. (Cursor, Medium severity.) - docs/cli-reference.md: command-family count 13 → 17. The current enum Command in crates/omnigraph-cli/src/main.rs has 17 top-level variants. (cubic P2.) - docs/ci.md: Homebrew tap update is a regular `git push`, not a force-push (release.yml:117 is `git push origin HEAD:main`). (cubic P2.) - docs/errors.md: add the Storage variant to the NanoError list — it exists at error.rs:88-89 but the doc enumerated only 10 of 11. (cubic P2.) - docs/storage.md: clarify tombstone semantics. There is no tombstone_version column; state.rs:180 reads the tombstone version from the table_version column on rows where object_type = table_tombstone. (cubic P2.) - docs/branches-commits.md: split the GraphCommit pseudo-struct from the underlying storage. actor_id is joined in-memory from _graph_commit_actors.lance, not a column on _graph_commits.lance. (cubic P2.) - docs/schema-language.md: rename IR_VERSION to SCHEMA_IR_VERSION to match the actual constant name in catalog/schema_ir.rs:11. (cubic P3.) - docs/testing.md: engine integration test count 16 → 15 (matches `ls crates/omnigraph/tests/*.rs`). (cubic P3.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:09:06 +02:00
- `SCHEMA_IR_VERSION = 1` (`catalog/schema_ir.rs`).
- Each interface/node/edge gets a `stable_type_id` (kind+name hashed) so renames can be tracked.
- Serialized as JSON for diff/migration plans.
## Schema migration planning
`plan_schema_migration(accepted, desired) -> SchemaMigrationPlan { supported, steps[] }` with step types:
- `AddType { type_kind, name }`
- `RenameType { type_kind, from, to }`
- `AddProperty { type_kind, type_name, property_name, property_type }`
- `RenameProperty { type_kind, type_name, from, to }`
- `AddConstraint { type_kind, type_name, constraint }`
- `UpdateTypeMetadata { … annotations }`
- `UpdatePropertyMetadata { … annotations }`
- `UnsupportedChange { entity, reason }` (forces `supported=false`)
`apply_schema()` returns `SchemaApplyResult { supported, applied, manifest_version, steps }` and is gated by an internal `__schema_apply_lock__` system branch so concurrent schema applies serialize.