docs: split user and developer docs (#93)

This commit is contained in:
Andrew Altshuler 2026-05-15 03:45:22 +03:00 committed by GitHub
parent e8d49559c4
commit 60eee78465
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
39 changed files with 499 additions and 445 deletions

7
docs/user/audit.md Normal file
View file

@ -0,0 +1,7 @@
# Audit / Actor tracking
- `Omnigraph::audit_actor_id: Option<String>` is the actor in effect.
- `_as` variants of every write API let callers override the actor: `mutate_as`, `ingest_as`, `branch_merge_as`, `apply_schema_as`, etc.
- Actor IDs are persisted on `GraphCommit.actor_id` with split storage in `_graph_commit_actors.lance` (the commit graph is split into `_graph_commits.lance` for the linkage and `_graph_commit_actors.lance` for the actor map).
- HTTP server uses the bearer-token actor automatically; CLI uses the local user / explicit env (no implicit actor).
- Pre-v0.4.0 repos also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0 and reclaimed by MR-770's production sweep.

View file

@ -0,0 +1,63 @@
# Branches, Commits, Snapshots
## L1 — Lance per-dataset branches
Lance supports branching at the dataset level: a branch is a named lineage of versions, and `fork_branch_from_state(source_branch, target_branch, source_version)` creates a copy-on-write fork.
## L2 — Graph-level branches
OmniGraph builds *graph branches* on top by branching every sub-table coherently:
- `branch_create(name)` / `branch_create_from(target, name)` — disallowed name `main`; fails if branch exists; ensures the schema-apply lock is idle.
- `branch_list()` — returns public branches, **filters internal** `__run__…` and `__schema_apply_lock__` prefixes.
- `branch_delete(name)` — refuses if there are descendants or active runs on the branch; cleans up owned per-branch fragments.
- **Lazy forking**: a branch only forks a sub-table when that sub-table is first mutated on it. Pure-read branches share fragments with their source.
- `sync_branch(branch)` — re-binds the in-memory handle to the latest head of the branch.
## L2 — Commit graph (`db/commit_graph.rs`)
In-memory shape of a graph commit:
```
GraphCommit {
graph_commit_id: ULID,
manifest_branch: Option<String>,
manifest_version: u64,
parent_commit_id: Option<String>,
merged_parent_commit_id: Option<String>, // populated for merge commits
actor_id: Option<String>, // joined in-memory from _graph_commit_actors.lance, NOT a column on _graph_commits.lance
created_at: i64 (microseconds since epoch),
}
```
Storage is split across two Lance datasets (both with stable row IDs):
- `_graph_commits.lance` — every column above *except* `actor_id`.
- `_graph_commit_actors.lance` — optional separate `(graph_commit_id, actor_id)` map, created on demand. The `actor_id` field above is populated by joining this dataset in-memory at load time.
Notes:
- Every successful publish (load / change / merge / schema_apply) appends one commit.
- Merge commits have two parents; linear commits have one.
- API: `list_commits(branch)`, `get_commit(id)`, `head_commit_id_for_branch(branch)`.
## L2 — Snapshots & time travel
- `snapshot()` — current snapshot for the bound branch; cached.
- `snapshot_of(target)` — snapshot at a `ReadTarget` (branch | snapshot id).
- `snapshot_at_version(v: u64)` — historical snapshot from any manifest version.
- `entity_at(table_key, id, version)` — single-entity time travel without building a full snapshot.
- A `Snapshot` is a `(version, HashMap<table_key, SubTableEntry>)` — cheap to build, snapshot-isolated cross-table reads.
## L2 — Internal system branches
Filtered from `branch_list()` but visible to internals:
- `__schema_apply_lock__` — serializes schema migrations.
- `__run__<run-id>` — legacy from the pre-v0.4.0 Run state machine (removed in MR-771). The branch-name guard predicate `is_internal_run_branch` is kept as defense-in-depth so users cannot create a branch matching the legacy prefix; the filter will be removed once production legacy branches are swept (MR-770).
## L2 — Recovery audit trail
The four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) protect their multi-table commits with a sidecar at `__recovery/{ulid}.json` written before Phase B and deleted after Phase C. The next `Omnigraph::open` (gated on `OpenMode::ReadWrite`) runs the recovery sweep in `crates/omnigraph/src/db/manifest/recovery.rs`: classify per-table state, decide all-or-nothing per sidecar, roll forward / back, record an audit row.
Audit rows live in `_graph_commit_recoveries.lance` (sibling to `_graph_commits.lance`) and reference the commit graph by `graph_commit_id`. The linked recovery commit is identified by that same `graph_commit_id`, and `actor_id="omnigraph:recovery"` is stored in `_graph_commit_actors.lance` (joined by `graph_commit_id`) — `_graph_commits.lance` itself does not carry the `actor_id` column. To find recoveries for a specific original actor: `omnigraph commit list --filter actor=omnigraph:recovery`, then join to `_graph_commit_recoveries.lance` by `graph_commit_id` to read `recovery_for_actor`. Schema: see `crates/omnigraph/src/db/recovery_audit.rs`.

24
docs/user/changes.md Normal file
View file

@ -0,0 +1,24 @@
# Change Detection / Diff
`changes/mod.rs`. Three-level algorithm:
1. **Manifest diff**: skip sub-tables whose `(table_version, table_branch)` is unchanged.
2. **Lineage check**:
- Same branch lineage → fast path: use the per-row `_row_last_updated_at_version` column to classify Insert/Update/Delete.
- Different lineages → ID-based streaming comparison.
3. **Row-level diff**: streaming, no full materialization.
## Public API
- `diff_between(from: ReadTarget, to: ReadTarget, filter: Option<ChangeFilter>) -> ChangeSet`
- `diff_commits(from_commit_id, to_commit_id, filter)` — cross-branch safe.
## Types
```
ChangeOp: Insert | Update | Delete
EntityKind: Node | Edge
EntityChange { table_key, kind, type_name, id, op, manifest_version, endpoints?: {src, dst} }
ChangeFilter { kinds?, type_names?, ops? }
ChangeSet { from_version, to_version, branch?, changes[], stats }
```

View file

@ -0,0 +1,83 @@
# CLI Reference (`omnigraph`)
A reference for the `omnigraph` binary's command surface and `omnigraph.yaml` schema. For a quick-start guide, see [cli.md](cli.md).
17 top-level command families, 40+ subcommands. All commands accept either a positional `URI`, `--uri`, or a `--target <name>` resolved against `omnigraph.yaml`.
## Top-level commands
| Command | Purpose |
|---|---|
| `init` | `--schema <pg>` → initialize a repo (also scaffolds `omnigraph.yaml` if missing) |
| `load` | bulk load a branch (`--mode overwrite\|append\|merge`) |
| `ingest` | branch-creating transactional load (`--from <base>`) |
| `read` | run named query (params via `--params`, `--params-file`, or alias args) |
| `change` | run mutation query |
| `snapshot` | print current snapshot (per-table version + row count) |
| `export` | dump to JSONL on stdout (`--type T`, `--table K` filters) |
| `branch create \| list \| delete \| merge` | branching ops |
| `commit list \| show` | inspect commit graph |
| `run list \| show \| publish \| abort` | transactional run ops |
| `schema plan \| apply \| show (alias: get)` | migrations |
| `query lint \| check` | offline / repo-backed validation |
| `optimize` | non-destructive Lance compaction |
| `cleanup --keep N --older-than 7d --confirm` | destructive version GC |
| `embed` | offline JSONL embedding pipeline |
| `policy validate \| test \| explain` | Cedar tooling |
| `version` / `-v` | print `omnigraph 0.3.x` |
## `omnigraph.yaml` schema
```yaml
project: { name }
graphs:
<name>:
uri: <local|s3://|http(s)://>
bearer_token_env: <ENV_NAME>
server:
graph: <name>
bind: <ip:port>
cli:
graph: <name>
branch: <name>
output_format: json|jsonl|csv|kv|table
table_max_column_width: 80
table_cell_layout: truncate|wrap
query:
roots: [<dir>, …] # search path for .gq files
auth:
env_file: ./.env.omni
aliases:
<alias>:
command: read|change
query: <path-to-.gq>
name: <query-name>
args: [<positional-name>, …]
graph: <name>
branch: <name>
format: <output-format>
policy:
file: ./policy.yaml
```
## Output formats (read command)
- `json` — pretty-printed object with metadata + rows
- `jsonl` — one metadata line then one JSON object per row
- `csv` — RFC 4180-ish quoting
- `table` — fitted text table, honors `table_max_column_width` + `table_cell_layout`
- `kv` — grouped per-row key/value blocks
## Param resolution
Precedence (high to low): explicit `--params` / `--params-file`, alias positional args, `omnigraph.yaml` defaults. JS-safe-integer handling is built in (`is_js_safe_integer_i64`, `JS_MAX_SAFE_INTEGER_U64`) so 64-bit ids round-trip safely through JSON clients.
## Bearer token resolution (CLI)
1. `graphs.<name>.bearer_token_env`
2. `OMNIGRAPH_BEARER_TOKEN` global env
3. `auth.env_file` referenced `.env`
## Duration parsing (cleanup)
`s | m | h | d | w` units, e.g. `--older-than 7d`.

100
docs/user/cli.md Normal file
View file

@ -0,0 +1,100 @@
# CLI Guide
## Core Repo Flow
```bash
omnigraph init --schema ./schema.pg ./repo.omni
omnigraph load --data ./data.jsonl --mode overwrite ./repo.omni
omnigraph snapshot ./repo.omni --branch main --json
omnigraph read --uri ./repo.omni --query ./queries.gq --name get_person --params '{"name":"Alice"}'
omnigraph change --uri ./repo.omni --query ./queries.gq --name insert_person --params '{"name":"Mina","age":28}'
```
## Branching And Reviewable Data Flows
```bash
omnigraph branch create --uri ./repo.omni --from main feature-x
omnigraph branch list --uri ./repo.omni
omnigraph branch merge --uri ./repo.omni feature-x --into main
omnigraph ingest --data ./batch.jsonl --branch review/import-2026-04-09 ./repo.omni
omnigraph export ./repo.omni --branch main --type Person > people.jsonl
omnigraph commit list ./repo.omni --branch main --json
omnigraph commit show --uri ./repo.omni <commit-id> --json
```
## Remote Server Mode
Serve a repo:
```bash
omnigraph-server ./repo.omni --bind 127.0.0.1:8080
```
Read through the HTTP API:
```bash
omnigraph read \
--target http://127.0.0.1:8080 \
--query ./queries.gq \
--name get_person \
--params '{"name":"Alice"}'
```
If the server requires auth, set `OMNIGRAPH_SERVER_BEARER_TOKEN` on the server
and configure the matching `bearer_token_env` in `omnigraph.yaml`.
## Runs, Policy, And Diagnostics
```bash
omnigraph query lint --query ./queries.gq --schema ./schema.pg --json
omnigraph query check --query ./queries.gq ./repo.omni --json
omnigraph schema plan --schema ./next.pg ./repo.omni --json
omnigraph schema apply --schema ./next.pg ./repo.omni --json
omnigraph policy validate --config ./omnigraph.yaml
omnigraph policy test --config ./omnigraph.yaml
omnigraph policy explain --config ./omnigraph.yaml --actor act-alice --action read --branch main
omnigraph commit list ./repo.omni --json
omnigraph commit show --uri ./repo.omni <commit-id> --json
```
(The legacy `omnigraph run list/show/publish/abort` subcommands were removed in MR-771; mutations and loads publish atomically and the commit graph (`omnigraph commit list`) is the audit surface.)
`query lint` and `query check` are the same command surface. In v1, repo-backed
lint uses local or `s3://` repo URIs; HTTP targets are only supported when you
also pass `--schema`.
## Config
`omnigraph.yaml` lets the CLI and server share named graphs, defaults, and
query roots:
```yaml
graphs:
local:
uri: ./demo.omni
dev:
uri: http://127.0.0.1:8080
bearer_token_env: OMNIGRAPH_BEARER_TOKEN
cli:
graph: local
branch: main
query:
roots:
- queries
- .
```
The config file can also define:
- server bind defaults
- auth env files
- query aliases for common read and change commands
- `policy.file` for Cedar authorization rules
When policy is enabled, `schema apply` is authorized through the
`schema_apply` action and is typically limited to admins on protected `main`.

22
docs/user/constants.md Normal file
View file

@ -0,0 +1,22 @@
# Constants & Tunables (cheat sheet)
| Name | Value | Where |
|---|---|---|
| `MANIFEST_DIR` | `__manifest` | `db/manifest/layout.rs` |
| Commit graph dir | `_graph_commits.lance` | `db/commit_graph.rs` |
| Run registry dir (legacy, removed MR-771) | `_graph_runs.lance` | inert post-v0.4.0; reclaimed by MR-770 |
| Run branch prefix (legacy, removed MR-771) | `__run__` | filtered by `is_internal_run_branch` defense-in-depth |
| Schema apply lock | `__schema_apply_lock__` | `db/mod.rs` |
| Manifest publisher retry budget | `PUBLISHER_RETRY_BUDGET = 5` | `db/manifest/publisher.rs` |
| Internal manifest schema version | `INTERNAL_MANIFEST_SCHEMA_VERSION = 2` | `db/manifest/migrations.rs` |
| Merge stage batch | `MERGE_STAGE_BATCH_ROWS = 8192` | `exec/merge.rs` |
| Maintenance concurrency | `OMNIGRAPH_MAINTENANCE_CONCURRENCY=8` | `db/omnigraph/optimize.rs` |
| Graph index cache size | `8` (LRU) | `runtime_cache.rs` |
| Default body limit | `1 MB` | `omnigraph-server/lib.rs` |
| Ingest body limit | `32 MB` | `omnigraph-server/lib.rs` |
| Engine embed model | `gemini-embedding-2-preview` | `omnigraph/embedding.rs` |
| Compiler embed model | `text-embedding-3-small` | `omnigraph-compiler/embedding.rs` |
| Embed timeout | `30 000 ms` | both clients |
| Embed retries | `4` | both clients |
| Embed retry backoff | `200 ms` | both clients |
| LANCE memory pool default | `1 GB` (raised in v0.3.0) | runtime |

184
docs/user/deployment.md Normal file
View file

@ -0,0 +1,184 @@
# Deployment
This doc describes the public runtime contract for self-hosting Omnigraph. It
does not include environment-specific secrets, private infrastructure, or
internal deploy automation.
## Runtime Modes
Omnigraph supports two broad deployment shapes:
- local directory repos
- `s3://` repos on AWS S3 or S3-compatible object stores
The server binary and container image expose the same HTTP surface.
## Binary Deployment
Build or install:
- `omnigraph`
- `omnigraph-server`
Run against a local repo:
```bash
omnigraph-server ./repo.omni --bind 0.0.0.0:8080
```
Run against an object-store-backed repo:
```bash
OMNIGRAPH_SERVER_BEARER_TOKEN="change-me" \
AWS_REGION="us-east-1" \
omnigraph-server s3://my-bucket/repos/example/releases/2026-04-10-v0.1.0 \
--bind 0.0.0.0:8080
```
## One-Command Local RustFS Bootstrap
The easiest local S3-backed deployment path is:
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/local-rustfs-bootstrap.sh | bash
```
The bootstrap:
- starts a local RustFS-backed object store
- creates a bucket and S3-backed Omnigraph repo
- loads the checked-in context fixture
- starts `omnigraph-server` on `127.0.0.1:8080`
Supported behavior:
- downloads the rolling `edge` binary when one exists for the current platform
- otherwise clones `ModernRelay/omnigraph` and builds from source
- reuses an existing RustFS container if it is already running
Useful overrides:
- `WORKDIR=/path/to/state`
- `BUCKET=omnigraph-local`
- `PREFIX=repos/context`
- `RESET_REPO=1` to delete an existing partially initialized repo prefix before recreating it
- `BIND=127.0.0.1:8080`
- `RUSTFS_CONTAINER_NAME=omnigraph-rustfs-demo`
The bootstrap expects:
- Docker
- `curl`
- either a matching release asset or a local Rust toolchain plus `git`
If `aws` is not installed, the script attempts a user-local AWS CLI install via
`python3 -m pip`. Docker Desktop or another Docker daemon must already be
running.
If a previous bootstrap left objects behind under the selected `PREFIX` but did
not finish initializing the repo, rerun with `RESET_REPO=1` or choose a new
`PREFIX`.
## Container Deployment
Build the image:
```bash
docker build -t omnigraph-server:local .
```
Run against a local repo:
```bash
docker run --rm -p 8080:8080 \
-v "$PWD/repo.omni:/data/repo.omni" \
omnigraph-server:local \
/data/repo.omni --bind 0.0.0.0:8080
```
Run against an S3-backed repo:
```bash
docker run --rm -p 8080:8080 \
-e OMNIGRAPH_SERVER_BEARER_TOKEN="change-me" \
-e AWS_REGION="us-east-1" \
omnigraph-server:local \
s3://my-bucket/repos/example/releases/2026-04-10-v0.1.0 \
--bind 0.0.0.0:8080
```
## Auth
The server can run unauthenticated for local development, but any shared or
internet-facing deployment should set a bearer token source.
### Token sources
The server reads bearer tokens from one of three places, in precedence order:
1. **AWS Secrets Manager** (build with `--features aws`, see below) — set
`OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` to the secret ID or ARN.
2. **JSON file or env** — set one of:
- `OMNIGRAPH_SERVER_BEARER_TOKENS_FILE` — path to a JSON `{"actor": "token", ...}` file.
- `OMNIGRAPH_SERVER_BEARER_TOKENS_JSON` — the JSON literal inline.
3. **Single-token env**`OMNIGRAPH_SERVER_BEARER_TOKEN` (assigns the
implicit actor `default`).
Tokens are hashed with SHA-256 immediately on ingest; plaintext does not
persist in process memory after startup.
The health endpoint `/healthz` remains suitable for load balancer health checks
and is never gated.
## Build Variants
The server binary ships in two flavors:
| Variant | Command | Contents |
|---------|---------|----------|
| **Default** (on-prem / local dev) | `cargo build --release` | Core server, no AWS SDK |
| **AWS** | `cargo build --release --features aws` | Adds AWS Secrets Manager backend for bearer tokens |
Release artifacts are published with matching suffixes —
`omnigraph-server-<version>-<platform>.tar.gz` for the default build and
`omnigraph-server-<version>-<platform>-aws.tar.gz` for the AWS-enabled build.
The AWS build adds ~150 transitive deps and ~30-60s of first-build compile
time. Default builds don't pay that cost.
## AWS Secrets Manager
When the binary is built with `--features aws`, set
`OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` to the ARN or name of a Secrets
Manager secret whose `SecretString` is a JSON object of
`{"actor_id": "token", ...}`:
```bash
omnigraph-server-aws s3://my-bucket/repos/example ...
# Environment:
# OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET=arn:aws:secretsmanager:us-east-1:123456789012:secret:omnigraph-tokens-AbCdEf
```
Credentials are resolved via the AWS default chain (env vars, shared config,
IMDSv2 instance role, ECS task role) — no explicit credential plumbing is
needed when running under an IAM instance role on EC2/ECS/EKS.
The IAM role must permit `secretsmanager:GetSecretValue` on the referenced
secret.
Setting the env var without building with `--features aws` is a hard error
with a rebuild instruction — it does not silently fall back to the env/file
source.
## S3-Compatible Storage
For S3-compatible backends such as RustFS or MinIO, set the usual AWS SDK
environment variables:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_REGION`
- optional `AWS_ENDPOINT_URL`
- optional `AWS_ENDPOINT_URL_S3`
- optional `AWS_ALLOW_HTTP=true`
- optional `AWS_S3_FORCE_PATH_STYLE=true`

31
docs/user/embeddings.md Normal file
View file

@ -0,0 +1,31 @@
# Embeddings
OmniGraph has **two** embedding clients with different defaults and purposes.
## Compiler-side client (`omnigraph-compiler/src/embedding.rs`) — query-time normalization
- Default model: `text-embedding-3-small` (OpenAI-style schema)
- Env: `NANOGRAPH_EMBED_MODEL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL` (default `https://api.openai.com/v1`), `NANOGRAPH_EMBEDDINGS_MOCK`, `NANOGRAPH_EMBED_TIMEOUT_MS=30000`, `NANOGRAPH_EMBED_RETRY_ATTEMPTS=4`, `NANOGRAPH_EMBED_RETRY_BACKOFF_MS=200`
- Methods: `embed_text(input, expected_dim)`, `embed_texts(inputs, expected_dim)`
- Mock mode: deterministic FNV-1a + xorshift64 → L2-normalized vectors
## Engine-side client (`omnigraph/src/embedding.rs`) — runtime ingest
- Model: `gemini-embedding-2-preview`
- Env: `GEMINI_API_KEY`, `OMNIGRAPH_GEMINI_BASE_URL` (default Google generativelanguage v1beta), `OMNIGRAPH_EMBED_TIMEOUT_MS=30000`, `OMNIGRAPH_EMBED_RETRY_ATTEMPTS=4`, `OMNIGRAPH_EMBED_RETRY_BACKOFF_MS=200`, `OMNIGRAPH_EMBEDDINGS_MOCK`
- Two task types: `embed_query_text` (RETRIEVAL_QUERY) and `embed_document_text` (RETRIEVAL_DOCUMENT)
- Exponential backoff with retryable detection (timeouts, 429, 5xx)
## Schema integration
Mark a Vector property with `@embed("source_text_property")`. At ingest, the engine pulls the source text and writes the embedding into the vector column. Stored as L2-normalized FixedSizeList(Float32, dim).
## CLI `omnigraph embed` (offline file pipeline)
Operates on **JSONL files** (not on a repo). Three modes (mutually exclusive):
- (default) `fill_missing` — only embed rows whose target field is empty
- `--reembed-all` — overwrite all
- `--clean` — strip embeddings
Inputs are either a single seed manifest YAML or `--input/--output/--spec`. Selectors `--type T`, `--select T:field=value` filter rows. Streams JSONL → JSONL.

24
docs/user/errors.md Normal file
View file

@ -0,0 +1,24 @@
# Errors and Result Serialization
## Error taxonomy (`omnigraph::error::OmniError`)
- `Compiler(...)` — schema/query parse/typecheck errors
- `Lance(String)` — storage layer
- `DataFusion(String)` — execution layer
- `Io(io::Error)`
- `Manifest(ManifestError { kind: BadRequest|NotFound|Conflict|Internal, details: Option<ManifestConflictDetails>, … })`
- `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }` — caller's `expected_table_versions` did not match the manifest's current latest non-tombstoned version (set by `OmniError::manifest_expected_version_mismatch`).
- `ManifestConflictDetails::RowLevelCasContention` — Lance row-level CAS rejected the publish because a concurrent writer landed the same `object_id`. Retried internally by the publisher; only surfaces if the retry budget exhausts.
- **D₂ parse-time rejection** (MR-794): a single mutation query that mixes inserts/updates with deletes errors out *before any I/O* with kind `BadRequest`. Message: `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes`. See [docs/user/query-language.md](query-language.md) for the rule and [docs/dev/runs.md](../dev/runs.md) for the underlying staged-write rationale.
- `MergeConflicts(Vec<MergeConflict>)`
Compiler-side `NanoError` covers parse / catalog / type / storage / plan / execution / arrow / lance / IO / manifest / unique-constraint, each with structured spans (`SourceSpan { start, end }`) for ariadne-style diagnostics.
## Result serialization (`omnigraph_compiler::result::QueryResult`)
- `to_arrow_ipc()` — efficient binary
- `to_sdk_json()` — JS-safe JSON (large i64 wrapped in metadata)
- `to_rust_json()` — Rust-friendly JSON
- `batches()` — direct Arrow `RecordBatch` access
Mutation results: `{ affectedNodes: usize, affectedEdges: usize }` (also exposed as a tiny Arrow batch).

52
docs/user/index.md Normal file
View file

@ -0,0 +1,52 @@
# User Docs
**Audience:** users, CLI users, HTTP clients, and self-hosting operators
This is the public-facing entry point. These docs should describe behavior,
commands, configuration, and operational contracts without requiring knowledge
of MRs, internal recovery mechanics, or contributor-only invariants.
## Start Here
| Goal | Read |
|---|---|
| Install OmniGraph | [install.md](install.md) |
| Run the CLI locally | [cli.md](cli.md) |
| Look up every CLI flag and config field | [cli-reference.md](cli-reference.md) |
| Write schemas | [schema-language.md](schema-language.md) |
| Read schema-lint diagnostic codes | [schema-lint.md](schema-lint.md) |
| Write queries and mutations | [query-language.md](query-language.md) |
| Use embeddings | [embeddings.md](embeddings.md) |
## Operate A Repo
| Goal | Read |
|---|---|
| Understand repo layout and URI support | [storage.md](storage.md) |
| Work with branches, commits, and snapshots | [branches-commits.md](branches-commits.md) |
| Coordinate multi-query workflows | [transactions.md](transactions.md) |
| Read diffs and change feeds | [changes.md](changes.md) |
| Build and use indexes | [indexes.md](indexes.md) |
| Compact and clean old versions | [maintenance.md](maintenance.md) |
| Interpret errors and output formats | [errors.md](errors.md) |
## Run The Server
| Goal | Read |
|---|---|
| Deploy the binary or container | [deployment.md](deployment.md) |
| Use HTTP endpoints | [server.md](server.md) |
| Configure Cedar authorization | [policy.md](policy.md) |
| Track actors and audit behavior | [audit.md](audit.md) |
## Releases
Release notes live in [releases/](../releases/). Use them for user-visible
changes between versions, not for contributor design history.
## Boundary
User docs should focus on stable behavior. If a paragraph needs to explain
internal sidecars, Lance API blockers, MR numbers, test strategy, or review
rules, it probably belongs in [docs/dev/index.md](../dev/index.md) or a developer-area document
instead.

26
docs/user/indexes.md Normal file
View file

@ -0,0 +1,26 @@
# Indexes
## L1 — Lance index types OmniGraph exposes
| Index | Use | Notes |
|---|---|---|
| **BTREE scalar** | range / equality on any scalar | created on `@key`, `@index(...)`, and on key columns by `ensure_indices()` |
| **Inverted (FTS)** | `search`, `fuzzy`, `match_text`, `bm25` | created on text columns referenced by FTS queries |
| **Vector** | `nearest()` k-NN | Lance picks IVF_PQ vs HNSW family by configuration; OmniGraph stores as FixedSizeList(Float32, dim) |
## L2 — OmniGraph orchestration
- `ensure_indices()` / `ensure_indices_on(branch)` — idempotent build of BTREE + inverted indexes for the current head; safe to re-run.
- Indexes are built on the *branch head* (not on a snapshot), so reads always see the current index state.
- **Lazy branch forking for indexes**: a branch that hasn't mutated a sub-table doesn't need its own index — the main lineage's index is reused until the first write triggers a copy-on-write fork.
- Vector index parameters (metric, nlist, nprobe, etc.) are not exposed in the schema; they default at the Lance layer and are picked up automatically when an index is asked for on a Vector column.
## L2 — Graph topology index (`graph_index/mod.rs`)
This is OmniGraph-specific (not Lance):
- `TypeIndex`: dense `u32 ↔ String id` mapping per node type.
- `CsrIndex`: Compressed Sparse Row representation of edges per edge type — `offsets[i]..offsets[i+1]` slices into `targets`.
- `GraphIndex { type_indices, csr (out), csc (in) }` — built on demand from a snapshot's edge tables.
- Cached in `RuntimeCache::graph_indices` (LRU, max 8 entries, keyed by snapshot id + edge table versions).
- Built only when an `Expand` or `AntiJoin` IR op is present in the lowered query, so pure scans skip it.

94
docs/user/install.md Normal file
View file

@ -0,0 +1,94 @@
# Install
## Quick Install
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash
```
By default the installer places:
- `omnigraph`
- `omnigraph-server`
in `~/.local/bin`.
The default installer is binary-only. It downloads a published release asset,
verifies the SHA256 checksum, and unpacks it. It does not build from source.
If no stable tag is published yet, the installer automatically falls back to
the rolling `edge` release.
## Homebrew
```bash
brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph
```
## Channels
Stable binaries:
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | bash
```
Rolling edge binaries from `main`:
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | RELEASE_CHANNEL=edge bash
```
Install from source:
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install-source.sh | bash
```
## Useful Overrides
Install to a different directory:
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | INSTALL_DIR="$HOME/bin" bash
```
Install a specific tag:
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install.sh | VERSION=v0.1.0 bash
```
Build from a specific git ref:
```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/install-source.sh | SOURCE_REF=main bash
```
## Manual Source Build
```bash
cargo build --release --locked -p omnigraph-cli -p omnigraph-server
install -m 0755 target/release/omnigraph ~/.local/bin/omnigraph
install -m 0755 target/release/omnigraph-server ~/.local/bin/omnigraph-server
```
## Release Assets
Tagged releases are expected to publish:
- `omnigraph-linux-x86_64.tar.gz`
- `omnigraph-macos-x86_64.tar.gz`
- `omnigraph-macos-arm64.tar.gz`
Each archive contains both binaries:
- `omnigraph`
- `omnigraph-server`
## Verify The Install
```bash
omnigraph version
omnigraph-server --help
```

29
docs/user/maintenance.md Normal file
View file

@ -0,0 +1,29 @@
# Maintenance: Optimize & Cleanup
`db/omnigraph/optimize.rs`.
## `optimize_all_tables(db)` — non-destructive
- Lance `compact_files()` on every node + edge table on `main`.
- Rewrites small fragments into fewer large ones; old fragments remain reachable via older manifests.
- Bounded by `OMNIGRAPH_MAINTENANCE_CONCURRENCY` (default 8).
- Returns `[TableOptimizeStats { table_key, fragments_removed, fragments_added, committed }]`.
## `cleanup_all_tables(db, options)` — destructive
- Lance `cleanup_old_versions()` per table.
- Removes manifests (and their unique fragments) older than the retention policy.
- `CleanupPolicyOptions { keep_versions: Option<u32>, older_than: Option<Duration> }` — at least one is required.
- Returns `[TableCleanupStats { table_key, bytes_removed, old_versions_removed }]`.
- CLI guards with `--confirm`; without it, prints a preview line.
- **Recovery floor:** `--keep < 3` may garbage-collect Lance versions that the open-time recovery sweep needs as a rollback target (the sweep restores to the branch's manifest-pinned table version, which is HEAD-1 in the typical Phase B → Phase C drift case). Default `--keep 10` is safe.
## Tombstones
Logical sub-table delete markers in `__manifest`; `tombstone_object_id(table_key, version)` excludes a sub-table version from snapshot reconstruction.
## Internal schema migrations (`db/manifest/migrations.rs`)
Version evolutions of the on-disk `__manifest` shape are reconciled automatically on the first write under a new binary. `INTERNAL_MANIFEST_SCHEMA_VERSION` declares the shape the binary expects; the on-disk stamp `omnigraph:internal_schema_version` (Lance schema-level metadata) records the on-disk shape. The publisher's open-for-write path calls `migrate_internal_schema` before reading state; reads are side-effect-free. No operator action is required for in-place upgrades. See [storage.md → Internal schema versioning](storage.md) for the full mechanism.
A binary opening a manifest stamped at a version *higher* than it knows about refuses to publish with a clear "upgrade omnigraph first" error — old binaries cannot clobber a newer schema.

44
docs/user/policy.md Normal file
View file

@ -0,0 +1,44 @@
# Authorization (Cedar policy)
OmniGraph integrates AWS Cedar (`cedar-policy = 4.9`) for ABAC.
## Policy actions
1. `read` — query / snapshot / list branches & commits
2. `export` — NDJSON export
3. `change` — mutations
4. `schema_apply` — apply schema migrations
5. `branch_create`
6. `branch_delete`
7. `branch_merge`
8. `run_publish`
9. `run_abort`
10. `admin` — reserved
## Scope kinds
- `branch_scope` — applied to source branch (`read`, `export`, `change`)
- `target_branch_scope` — applied to destination (`schema_apply`, branch ops, run ops)
- `protected_branches` — named list with special rules; rule scopes are `any | protected | unprotected`
## Configuration
`omnigraph.yaml`:
```yaml
policy:
file: ./policy.yaml # Cedar rules + groups
tests: ./policy.tests.yaml # declarative test cases
```
Each rule must use exactly one of `branch_scope` or `target_branch_scope`.
## CLI
- `omnigraph policy validate` — parse + count actors, exit 1 on parse error.
- `omnigraph policy test` — run cases in `policy.tests.yaml`, exit 1 on any expectation mismatch.
- `omnigraph policy explain --actor … --action … [--branch …] [--target-branch …]` — show decision and matched rule.
## Server enforcement
Every mutating endpoint calls `authorize_request()` *before* the handler runs; decisions are logged with actor / action / branch / outcome / matched rule.

111
docs/user/query-language.md Normal file
View file

@ -0,0 +1,111 @@
# Query Language (`.gq`)
Pest grammar at `crates/omnigraph-compiler/src/query/query.pest`. AST in `query/ast.rs`. Type checker in `query/typecheck.rs`. Lowering in `ir/lower.rs`.
## Query declarations
```
query <name>($p1: T1, $p2: T2?, …)
@description("…") @instruction("…") {
}
```
Two body shapes:
- **Read**: `match { … } return { … } [order { … }] [limit N]`
- **Mutation**: one or more of `insert | update | delete` statements
Param types reuse all schema scalars; trailing `?` makes a param optional. The compiler reserves `$__nanograph_now` for `now()`.
## MATCH clauses
- **Binding**: `$x: NodeType { prop: <literal | $param | now()>, … }`
- **Traversal**: `$src EDGE_NAME { min, max? } $dst` — variable-length paths via hop bounds; default 1..1 if bounds omitted.
- **Filter**: `<expr> <op> <expr>` with operators `>=`, `<=`, `!=`, `>`, `<`, `=`, and string `contains`.
- **Negation**: `not { clause+ }` — desugars to anti-join over the inner pipeline.
## Search clauses (multi-modal)
Used inside MATCH or as expressions inside RETURN/ORDER:
| Function | Purpose | Underlying Lance facility |
|---|---|---|
| `nearest($x.vec, $q)` | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
| `search(field, q)` | Generic FTS | Inverted index |
| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | Inverted index |
| `match_text(field, q)` | Pattern match | Inverted index |
| `bm25(field, q)` | BM25 scoring | Inverted index |
| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |
`nearest()` requires a `LIMIT`; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
## RETURN clause
`return { <expr> [as <alias>], … }` with expressions:
- Variable / property access: `$x`, `$x.prop`
- Literals: string, int, float, bool, list
- `now()`
- Aggregates: `count`, `sum`, `avg`, `min`, `max`
- All search functions above (so you can return a score column)
- `AliasRef` — re-use a previous projection alias
## ORDER & LIMIT
- `order { <expr> [asc|desc], … }` — supports plain expressions and `nearest(...)`.
- `limit <integer>` — required when there is a `nearest(...)` ordering.
## Mutation statements
- `insert <Type> { prop: <value>, … }`
- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
- `delete <Type> where <prop> <op> <value>`
`<value>` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0).
### D₂ — mixed insert/update + delete is rejected at parse time
A single mutation query must be **either insert/update-only or delete-only**. Mixed → rejected before any I/O with the message:
> `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).`
Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance 4.0.0 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until Lance exposes `DeleteJob::execute_uncommitted`, the parse-time rejection keeps both paths atomic and correct. See [docs/dev/runs.md](../dev/runs.md) and [docs/dev/invariants.md](../dev/invariants.md).
## IR (Intermediate Representation)
`QueryIR { name, params, pipeline: Vec<IROp>, return_exprs, order_by, limit }`
Pipeline operations:
- `NodeScan { variable, type_name, filters }`
- `Expand { src_var, dst_var, edge_type, direction (Out|In), dst_type, min_hops, max_hops, dst_filters }` — destination filters are pushed *into* the expand so Lance scalar pushdown can prune.
- `Filter { left, op, right }`
- `AntiJoin { outer_var, inner: Vec<IROp> }` — for `not { … }`
Lowering:
1. Partition MATCH clauses (bindings, traversals, filters, negations).
2. Identify "deferred" bindings (a destination of a traversal that has filters) so the Expand can carry the filter as a pushdown.
3. Emit NodeScan for the first binding, then Expand operations, then remaining Filter operations, then AntiJoins for negations.
4. Translate RETURN / ORDER expressions; preserve LIMIT.
## Linting & validation (`query/lint.rs`)
Codes seen so far:
- **Q000** (Error): parse error
- **L201** (Warning): nullable property never set by any UPDATE — "{type}.{prop} exists in schema but no update query sets it"
- (Warning): mutation declares no params — hardcoded mutations are easy to miss
- Plus all type errors from `typecheck_query_decl()` (undefined types, mismatched operators, undefined edges, etc.)
Output:
```
QueryLintOutput { status, schema_source, query_path,
queries_processed, errors, warnings, infos,
results: [{ name, kind, status, error?, warnings[] }],
findings: [{ severity, code, message, type_name?, property?, query_names[] }] }
```
CLI exits non-zero only on `status = Error`.

View file

@ -0,0 +1,80 @@
# Schema Language (`.pg`)
Pest grammar at `crates/omnigraph-compiler/src/schema/schema.pest`. AST at `schema/ast.rs`. Catalog at `catalog/mod.rs`.
## Top-level declarations
- `interface <Name> { property* }` — reusable property contracts.
- `node <Name> [implements <Iface>, ...] { property* | constraint* }`
- `edge <Name>: <FromType> -> <ToType> [@card(min..max)] { property* | constraint* }`
- Comments: line `//` and block `/* … */`.
## Property declarations
`<ident>: <TypeRef> [annotation*]`
## Built-in scalar types
| Scalar | Arrow type |
|---|---|
| `String` | Utf8 |
| `Blob` | LargeBinary |
| `Bool` | Boolean |
| `I32` / `I64` | Int32 / Int64 |
| `U32` / `U64` | UInt32 / UInt64 |
| `F32` / `F64` | Float32 / Float64 |
| `Date` | Date32 |
| `DateTime` | Date64 |
| `Vector(<dim>)` | FixedSizeList(Float32, dim), `1 ≤ dim ≤ i32::MAX` |
| `[<scalar>]` | List(scalar) |
| `enum(v1, v2, …)` | Utf8 with sorted/dedup'd set of allowed string values |
| `<scalar>?` | Same as scalar but `nullable: true` |
## Constraints (body level)
| Constraint | On | Effect |
|---|---|---|
| `@key(p, …)` | node | Primary key; implies index on key columns; `key_property()` returns the first key |
| `@unique(p, …)` | node, edge | Uniqueness across listed columns |
| `@index(p, …)` | node, edge | Build a scalar (BTREE) index on the columns |
| `@range(p, min..max)` | node | Numeric range validation (open ranges allowed) |
| `@check(p, "regex")` | node | Regex pattern validation |
| `@card(min..max?)` | edge | Edge multiplicity — default `0..*`; `0..1`, `1..1`, `1..*`, etc. |
Edge bodies only allow `@unique` and `@index`.
## Annotations
- `@<ident>` or `@<ident>(<literal>)` on any declaration or property.
- Known annotations:
- `@embed` on a Vector property — names the *source* property whose text gets embedded into this vector at ingest (`embed_sources` map in NodeType).
- `@description("…")`, `@instruction("…")` on query declarations (carried through to clients).
- Custom annotations are accepted by the parser and surfaced in catalog metadata; unrecognized annotations don't fail compilation.
## Catalog construction
- Pass 0: collect interfaces.
- Pass 1: collect nodes, expand `implements`, build constraint and `@embed` mappings, build the Arrow schema for each node table (`id: Utf8` plus all properties; blob columns get `LargeBinary`).
- Pass 2: collect edges, validate that `from_type` / `to_type` exist, normalize edge names case-insensitively for lookup, validate constraints for edges. Edge Arrow schema: `id: Utf8, src: Utf8, dst: Utf8` plus edge properties.
## Schema IR & stable type IDs
- `SCHEMA_IR_VERSION = 1` (`catalog/schema_ir.rs`).
- Each interface/node/edge currently gets a `stable_type_id` from a kind+name hash.
- Rename-preserving accepted IDs are an architectural invariant, but the current hash-on-name implementation is a known gap until migration carries IDs across `@rename_from`.
- Serialized as JSON for diff/migration plans.
## Schema migration planning
`plan_schema_migration(accepted, desired) -> SchemaMigrationPlan { supported, steps[] }` with step types:
- `AddType { type_kind, name }`
- `RenameType { type_kind, from, to }`
- `AddProperty { type_kind, type_name, property_name, property_type }`
- `RenameProperty { type_kind, type_name, from, to }`
- `AddConstraint { type_kind, type_name, constraint }`
- `UpdateTypeMetadata { … annotations }`
- `UpdatePropertyMetadata { … annotations }`
- `UnsupportedChange { entity, reason }` (forces `supported=false`)
`apply_schema()` returns `SchemaApplyResult { supported, applied, manifest_version, steps }` and is gated by an internal `__schema_apply_lock__` system branch so concurrent schema applies serialize.

61
docs/user/schema-lint.md Normal file
View file

@ -0,0 +1,61 @@
# Schema lint
The migration planner emits **code-tagged diagnostics** for every schema change it rejects. Codes have the form `OG-XXX-NNN` and identify the rule (not the message); operators reference them in suppression directives, severity overrides, and CI reports.
This page is the catalog of codes shipped today. The chassis behind it is tracked in [MR-694](https://linear.app/modernrelay/issue/MR-694).
## What's shipped in v0
- Stable code attached to every rejection the planner emits (today: 5 of 17 paths — the rest carry `code: None` and are tagged as future work).
- Code appears in the user-visible error message: `[OG-DS-104] removing property 'Person.age' is not supported …`.
- CLI `omnigraph schema plan` shows the code on `unsupported change …` lines.
- Tests in `tests/schema_apply.rs` assert on codes, not on free-text prose.
## What's not shipped yet
- Severity configuration in `omnigraph.yaml` (planned: `lint: { OG-DS-103: error }`).
- `@allow(OG-XXX-NNN, "rationale")` suppression directives.
- Pre-migration checks (the `migration_check { … }` block — MR-941).
- The CD / VE / LK / NM families (MR-942..945).
- CI integration (MR-946).
- Cost-class annotations (MR-944).
See the parent chassis issue (MR-694) for the design and the per-family sub-issues for what's planned.
## Code catalog (v0)
The chassis defines ten families. Today only DS and MF have emitted codes. The remaining families are reserved for future PRs.
| Code | Family | Tier | Default severity | Meaning |
|---|---|---|---|---|
| `OG-DS-101` | Destructive | destructive | error | drop graph type with rows (reserved; not yet emitted) |
| `OG-DS-102` | Destructive | destructive | error | drop node type with rows |
| `OG-DS-103` | Destructive | destructive | error | drop edge type with rows |
| `OG-DS-104` | Destructive | destructive | error | drop property with rows |
| `OG-DS-105` | Destructive | destructive | error | drop populated vector column (reserved) |
| `OG-MF-103` | Maybe-fail | validated | error | add required property without `@default` to populated type |
| `OG-MF-104` | Maybe-fail | validated | error | tighten nullable to non-nullable (reserved) |
| `OG-MF-106` | Maybe-fail | destructive | error | narrowing scalar type |
The full code catalog source of truth lives in `crates/omnigraph-compiler/src/lint/codes.rs`. CI-level invariants (uniqueness, format, family coverage) are unit-tested in the same module.
## Families
The ten chassis families:
| Prefix | Family | Status |
|---|---|---|
| **DS** | Destructive (data-loss) | shipped, v0 |
| **MF** | Maybe-fail / data-dependent | shipped, v0 |
| **CD** | Constraint deletion (relaxation warning) | tracked in MR-942 |
| **BC** | Backward-incompatible (rename) | implicit in `@rename_from`; codify later |
| **NM** | Naming conventions | tracked in MR-945 |
| **OW** | Ownership (per-resource Cedar) | tracked in MR-722 |
| **NL** | Non-linear (branch-merge divergence) | stubbed in MR-947 |
| **VE** | Vector / embedding | tracked in MR-943 |
| **ED** | Edge / graph topology | tracked in MR-701, MR-943 |
| **LK** | Lock duration / cost | tracked in MR-944 |
## Prior art
The chassis is modeled on [Atlas's `sqlcheck` analyzers](https://atlasgo.io/lint/analyzers) (DS / MF / CD / BC / NM families). Atlas was the direct inspiration for stable codes, per-rule severity, suppression directives with rationale, and pre-migration checks. omnigraph adapts the chassis to a typed-IR substrate (no SQL injection vector, no per-engine locking, native vector / edge / embedding types Atlas doesn't have).

101
docs/user/server.md Normal file
View file

@ -0,0 +1,101 @@
# HTTP Server (`omnigraph-server`)
Axum 0.8 + tokio + utoipa-generated OpenAPI. Single repo per process; deploy multiple processes for multi-tenant.
## Endpoint inventory
| Method | Path | Auth | Action | Handler |
|---|---|---|---|---|
| GET | `/healthz` | none | — | `server_health` |
| GET | `/openapi.json` | none | — | `server_openapi` (strips security if auth disabled) |
| GET | `/snapshot?branch=` | bearer + `read` | snapshot of branch | `server_snapshot` |
| POST | `/read` | bearer + `read` | run named query | `server_read` |
| POST | `/export` | bearer + `export` | NDJSON stream | `server_export` |
| POST | `/change` | bearer + `change` | mutation | `server_change` |
| GET | `/schema` | bearer + `read` | get current `.pg` source | `server_schema_get` |
| POST | `/schema/apply` | bearer + `schema_apply` (target=`main`) | migrate | `server_schema_apply` |
| POST | `/ingest` | bearer + `branch_create` (if new) + `change` | bulk load | `server_ingest` (32 MB body limit) |
| GET | `/branches` | bearer + `read` | list branches | `server_branch_list` |
| POST | `/branches` | bearer + `branch_create` | create | `server_branch_create` |
| DELETE | `/branches/{branch}` | bearer + `branch_delete` | delete | `server_branch_delete` |
| POST | `/branches/merge` | bearer + `branch_merge` | merge `source → target` | `server_branch_merge` |
| GET | `/commits?branch=` | bearer + `read` | list | `server_commit_list` |
| GET | `/commits/{commit_id}` | bearer + `read` | show | `server_commit_show` |
## Streaming
Only `/export` streams (`application/x-ndjson`, MPSC channel + `Body::from_stream`). Everything else is buffered JSON.
## Error model
Uniform `ErrorOutput { error, code?, merge_conflicts[], manifest_conflict? }` with `code ∈ unauthorized | forbidden | bad_request | not_found | conflict | too_many_requests | internal`. Merge conflicts attach structured `MergeConflictOutput { table_key, row_id?, kind, message }`.
`manifest_conflict` is set on **publisher CAS rejections** (HTTP 409): the
caller's pre-write view of one table's manifest version was stale.
`ManifestConflictOutput { table_key, expected, actual }` tells the client
which table to refresh and retry. This is the conflict shape produced by
concurrent `/change` or `/ingest` calls landing the same `(table, branch)`
race.
HTTP status codes used: 200, 400, 401, 403, 404, 409, 429, 500.
## Per-actor admission control
Disjoint
`(table, branch)` writes from different actors now run concurrently,
guarded only by the engine's per-(table, branch) write queue. To keep
one heavy actor from exhausting shared capacity (Lance I/O, manifest
churn, network), the server gates mutating handlers through a
`WorkloadController` configured per-process from environment variables:
| Env var | Default | Purpose |
|---|---|---|
| `OMNIGRAPH_PER_ACTOR_INFLIGHT_MAX` | 16 | Concurrent in-flight mutations per actor |
| `OMNIGRAPH_PER_ACTOR_BYTES_MAX` | 4 GiB | In-flight estimated bytes per actor |
When an actor exceeds its in-flight count or byte budget, the server
returns **HTTP 429 Too Many Requests** with `code: too_many_requests`
and a `Retry-After` header (seconds). The actor should back off; other
actors are unaffected.
Cedar policy authorization runs **before** admission accounting so
denied requests don't consume admission slots.
Today admission gates every mutating handler: `/change`, `/ingest`,
`/branches/{create,delete,merge}`, and `/schema/apply`. Read-only
endpoints (`/snapshot`, `/read`, `/export`, `/branches` GET, `/commits`,
`/schema` GET) are not admission-gated.
## Body limits
- Default: 1 MB
- `/ingest`: 32 MB
## Auth model (`bearer + SHA-256`)
- Tokens are SHA-256 hashed on startup; plaintext is never persisted in memory.
- Constant-time comparison via `subtle::ConstantTimeEq`.
- Three sources, in precedence:
1. `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` — AWS Secrets Manager (build with `--features aws`)
2. `OMNIGRAPH_SERVER_BEARER_TOKENS_FILE` or `OMNIGRAPH_SERVER_BEARER_TOKENS_JSON` — JSON `{actor_id: token, …}`
3. `OMNIGRAPH_SERVER_BEARER_TOKEN` — single legacy token, actor `default`
- If no tokens configured, server runs unauthenticated (local dev) and `/openapi.json` strips the security scheme.
See [deployment.md](deployment.md) for token-source operational details.
## Tracing & observability
- `tower_http::TraceLayer::new_for_http()`
- Policy decisions logged at INFO level with actor, action, branch, decision, matched rule
- Startup logs: token source name, repo URI, bind address
- Graceful SIGINT shutdown
## Not implemented (by design or "TBD")
- CORS — not configured; add `tower_http::cors` if needed.
- Rate limiting — per-actor admission control gates `/change`, `/ingest`,
`/branches/{create,delete,merge}`, `/schema/apply` (see "Per-actor
admission control" above). No global rate limiter is configured;
add `tower_http::limit` if a graph-wide cap is needed.
- Pagination — none (commits/branches return everything; export streams).
- Multi-tenant routing — one repo per process.

115
docs/user/storage.md Normal file
View file

@ -0,0 +1,115 @@
# Storage
## L1 — Lance dataset (per node/edge type)
Every node type and every edge type is its own Lance dataset:
- **Columnar Arrow storage**: each property is a column; nullable per Arrow schema.
- **Fragments**: data is partitioned into fragments; new writes create new fragments.
- **Manifest versioning**: every commit produces a new dataset version; old versions remain readable.
- **Stable row IDs**: `enable_stable_row_ids: true` is set on every Lance dataset OmniGraph creates — node and edge data tables, `__manifest`, `_graph_commits.lance`, `_graph_commit_recoveries.lance`, and any future system tables. This is an architectural invariant: the flag is one-way at dataset create per Lance's row-id-lineage spec, so a future change that introduces a Lance dataset must preserve it. Consequences: `_row_created_at_version` and `_row_last_updated_at_version` are available on every dataset (load-bearing for change-feed validators); `CreateIndex × Rewrite` is not a retryable conflict, so indices survive `omnigraph optimize` without needing the Fragment Reuse Index; readers must use a Lance build that recognises the flag (our pinned 4.0.0 is fine). Pre-0.4.x repos created before this code path settled may have datasets without the flag and cannot be retrofitted in place — the supported path is dump-and-reload. The `stage_overwrite` rewrite path (used by `schema_apply`) preserves the flag through `Operation::Overwrite`; pinned by `stage_overwrite_preserves_stable_row_ids` in `crates/omnigraph/tests/staged_writes.rs`.
- **Append / delete / `merge_insert`**: native Lance write modes.
- **Per-dataset branches** (Lance native): copy-on-write at the dataset level.
- **Object-store agnostic**: file://, s3://, gs://, az://, http (read-only via Lance) — OmniGraph wires file:// and s3:// (`storage.rs`).
## L2 — Multi-dataset coordination via `__manifest`
OmniGraph is **not** a single Lance dataset; it is a *graph* of datasets coordinated through one append-only manifest table.
- **Manifest table**: `__manifest/` Lance dataset.
- **Layout** (`db/manifest/layout.rs`, `db/manifest/state.rs`):
- `nodes/{fnv1a64-hex(type_name)}` — one Lance dataset per node type
- `edges/{fnv1a64-hex(edge_type_name)}` — one Lance dataset per edge type
- `__manifest/` — the catalog of all sub-tables and their published versions
- `_graph_commits.lance` / `_graph_commit_actors.lance` — the commit graph and its actor map
- (legacy `_graph_runs.lance` / `_graph_run_actors.lance` from pre-v0.4.0 repos are inert; the run state machine was removed in MR-771 and these files are cleaned up via MR-770's production sweep)
- **Manifest row schema** (`object_id, object_type, location, metadata, base_objects, table_key, table_version, table_branch, row_count`):
- `object_type``table | table_version | table_tombstone`
- `table_key``node:<TypeName> | edge:<EdgeName>`
- `table_branch` is `null` for the main lineage and the branch name otherwise
- **Snapshot reconstruction**: latest visible `table_version` per `(table_key, table_branch)` minus tombstones — rows where `object_type = table_tombstone`, whose own `table_version` (acting as the tombstone version) is `>= the entry's table_version`.
- **Atomic publish**: multi-dataset commits publish via a `ManifestBatchPublisher` so a single write to `__manifest` flips all the new sub-table versions visible at once.
- **Row-level CAS on the merge-insert join key**: `object_id` carries `lance-schema:unenforced-primary-key=true` so Lance's bloom-filter conflict resolver rejects two concurrent commits that land the same `object_id` row. Without this annotation, Lance's transparent rebase would admit silent duplicates of `version:T@v=N` from racing publishers (see `.context/merge-insert-cas-granularity.md`).
- **Optimistic concurrency control on publish**: `ManifestBatchPublisher::publish` accepts a `expected_table_versions: HashMap<table_key, u64>` map. Each entry asserts the manifest's current latest non-tombstoned version for that table is exactly what the caller observed; mismatches surface as `OmniError::Manifest` with `ManifestConflictDetails::ExpectedVersionMismatch { table_key, expected, actual }`. Empty map preserves the legacy "best-effort publish" semantics. The publisher uses `conflict_retries(0)` against Lance and owns retry itself (`PUBLISHER_RETRY_BUDGET = 5`), re-running the pre-check on each iteration so concurrent advances surface as `ExpectedVersionMismatch` rather than being silently rebased through.
### Internal schema versioning (`db/manifest/migrations.rs`)
The on-disk shape of `__manifest` is reconciled with the binary via a single stamp + dispatcher. `INTERNAL_MANIFEST_SCHEMA_VERSION` declares the shape this binary writes; the on-disk stamp `omnigraph:internal_schema_version` lives in the manifest dataset's schema-level metadata (Lance `update_schema_metadata`).
- **`init_manifest_repo`** stamps the current version at creation, so newly initialized repos never need migration.
- **Publisher open-for-write path** (`load_publish_state`) calls `migrate_internal_schema(&mut dataset)` before reading state. When the on-disk stamp matches the binary, this is a single metadata read with no writes; otherwise the dispatcher walks `match`-arm steps forward (1→2, 2→3, …) until the stamp matches, then proceeds with the publish. Reads stay side-effect-free.
- **Forward-version protection**: a stamp *higher* than the binary's known version triggers a clear "upgrade omnigraph first" error. An old binary cannot clobber a newer schema by silently treating "unknown stamp" as "missing stamp".
- **Idempotency**: each migration step is safe to re-run. A crash between two metadata updates inside a single step leaves the partial state; the next open re-runs the step and the second update lands. The dispatcher itself is a cheap stamp-read on the steady-state path.
Adding a new on-disk shape change is one constant bump (`INTERNAL_MANIFEST_SCHEMA_VERSION`), one match arm in `migrate_internal_schema`, and one test. No code outside this module branches on the stamp.
| Stamp | Shape change |
|---|---|
| v1 (implicit, pre-stamp) | `__manifest.object_id` had no PK annotation; publisher had no row-level CAS protection. |
| v2 | `__manifest.object_id` carries `lance-schema:unenforced-primary-key=true`; row-level CAS engaged. Stamped as `omnigraph:internal_schema_version=2`. |
## On-disk layout
A repo on disk is a directory tree of Lance datasets. Each dataset follows the standard Lance layout (`_versions/`, `data/`, `_indices/`, `_refs/`); OmniGraph adds the multi-dataset coordination by keeping `__manifest/` alongside the per-type datasets.
```mermaid
flowchart TB
classDef l1 fill:#fef3e8,stroke:#c46900,color:#000
classDef l2 fill:#e8f4fd,stroke:#1e6aa8,color:#000
repo["repo URI<br/>file:// or s3://bucket/prefix"]:::l2
manifest["__manifest/<br/>L2 catalog of sub-tables"]:::l2
nodes["nodes/{fnv1a64-hex}/<br/>one dataset per node type"]:::l2
edges["edges/{fnv1a64-hex}/<br/>one dataset per edge type"]:::l2
cgraph["_graph_commits.lance/<br/>_graph_commit_actors.lance/<br/>_graph_commit_recoveries.lance/"]:::l2
recovery["__recovery/{ulid}.json<br/>recovery sidecars (transient)"]:::l2
refs["_refs/branches/{name}.json<br/>graph-level branches"]:::l2
repo --> manifest
repo --> nodes
repo --> edges
repo --> cgraph
repo --> recovery
repo --> refs
subgraph dataset[Inside each Lance dataset — L1]
ds_v["_versions/{n}.manifest<br/>per-dataset versions"]:::l1
ds_data["data/<br/>fragment files (Arrow IPC)"]:::l1
ds_idx["_indices/{uuid}/<br/>BTREE · Inverted FTS · IVF/HNSW"]:::l1
ds_refs["_refs/<br/>per-dataset Lance branches/tags"]:::l1
ds_tx["_transactions/<br/>commit transaction logs"]:::l1
end
nodes -.-> dataset
edges -.-> dataset
manifest -.-> dataset
```
**What's where:**
- **Repo root** is one directory (or S3 prefix). Everything below is part of one OmniGraph repo.
- **`__manifest/`** is a Lance dataset whose rows describe which sub-table version is published at which graph-branch. Reading a snapshot starts here.
- **`nodes/`** and **`edges/`** are sibling directories holding one Lance dataset per declared type. Names are `fnv1a64-hex` of the type name to keep paths fixed-length and case-safe.
- **`_graph_commits.lance`** is an L2 dataset that records the graph-level commit DAG, with a paired `_graph_commit_actors.lance` for the actor map. (Pre-v0.4.0 repos also have inert `_graph_runs.lance` / `_graph_run_actors.lance` from the removed Run state machine; MR-770 sweeps these in production.)
- **`_graph_commit_recoveries.lance`** — one row per recovery sweep action. Joined to `_graph_commits.lance` by `graph_commit_id`; the linked commit row carries `actor_id=omnigraph:recovery`. Operators correlate recoveries with the original mutations they rolled forward / back via this join. See `crates/omnigraph/src/db/recovery_audit.rs`.
- **`__recovery/{ulid}.json`** — transient sidecar files written by the four migrated writers (`MutationStaging::finalize`, `schema_apply`, `branch_merge`, `ensure_indices`) before Phase B begins, deleted after Phase C succeeds. A sidecar persisting after process exit means the writer crashed in the Phase B → Phase C window; the next `Omnigraph::open` recovery sweep processes it. Steady-state directory is empty. See `crates/omnigraph/src/db/manifest/recovery.rs`.
- **`_refs/branches/{name}.json`** is graph-level branch metadata — pointers from a branch name to the manifest version it heads.
- **Inside each Lance dataset** (orange): the standard Lance directory layout. `_versions/{n}.manifest` records every commit; `data/` holds the actual Arrow fragments; `_indices/{uuid}/` holds index segments with their own `fragment_bitmap` for partial coverage; `_refs/` holds Lance-native per-dataset branches and tags.
The split — L2 owns the cross-dataset catalog; L1 owns the per-dataset internals — means that schema work (which adds or removes datasets) updates `__manifest`, while data work (which adds fragments) updates `_versions/` inside the affected dataset and then bumps `__manifest`.
## URI scheme support (`storage.rs`)
| Scheme | Backend | Notes |
|---|---|---|
| local path / `file://` | `LocalStorageAdapter` (tokio) | Normalized to absolute paths |
| `s3://bucket/prefix` | `S3StorageAdapter` (object_store) | Honors `AWS_ENDPOINT_URL_S3`, `AWS_ALLOW_HTTP`, `AWS_S3_FORCE_PATH_STYLE` |
| `http(s)://host:port` | HTTP client to `omnigraph-server` | Used by CLI as a target, not a storage backend |
## Object-store env vars (S3-compatible)
- `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`
- `AWS_ENDPOINT_URL`, `AWS_ENDPOINT_URL_S3` — for MinIO / RustFS / GCS-via-XML
- `AWS_S3_FORCE_PATH_STYLE=true` — path-style URLs
- `AWS_ALLOW_HTTP=true` — allow plain HTTP (local dev)

168
docs/user/transactions.md Normal file
View file

@ -0,0 +1,168 @@
# Transactions in OmniGraph
OmniGraph does not have `BEGIN` / `COMMIT` / `ROLLBACK`. Branches do that job. This page explains the model, when to use which primitive, and shows worked examples for the patterns that come up most.
The architectural rule lives in [`docs/dev/invariants.md`](../dev/invariants.md):
> **Mutations publish at one boundary.** A `mutate_as` or `load` operation
> accumulates constructive writes, commits each touched table at the end, then
> publishes one manifest update.
If you need to coordinate multiple queries atomically, you fork a branch, run mutations on it, and merge when you're satisfied. If something goes wrong, you delete the branch.
## The atomicity model
Two primitives, two scopes:
| Scope | Primitive | Atomic? | Failure mode |
|---|---|---|---|
| **One `.gq` query** (any number of statements inside) | The query itself — handled by the publisher's atomic manifest commit | Yes — all statements land together or none of them do | The publisher never publishes; target unchanged |
| **Many queries that must succeed together** | Branches: `branch_create` → run N queries on the branch → `branch_merge` | Yes — the merge is a single atomic publish | Drop the branch (`branch_delete`); main is unaffected |
Snapshot isolation is per-query — every read inside one query sees one consistent manifest version. Two concurrent queries on the same branch see independent snapshots; the publisher's CAS catches racing writes.
## Comparison with `BEGIN` / `COMMIT`
| Postgres / MySQL | OmniGraph |
|---|---|
| `BEGIN; … ; COMMIT` | `branch_create review/X` → mutations on `review/X``branch_merge review/X --into main` |
| `ROLLBACK` | `branch_delete review/X` |
| Connection-bound session state | Branch-scoped lineage on disk |
| Locks (or MVCC + abort on conflict) | Snapshot isolation per query + three-way merge at branch-join |
| Transaction is invisible to ops | Branch is a durable artifact (visible in `branch_list`, queryable, time-travelable) |
The trade-off: branches are heavier than a connection-scoped transaction (they exist on disk, have a name, show up in `branch_list`), but they fit the agent-as-user model — agents naturally fork branches to plan, batch, and review work. And they're durable: if your process crashes mid-workflow, the branch survives and you can pick up where you left off.
## Worked examples
### 1. Single query, multi-statement (atomic by default)
A `.gq` query with multiple `insert` / `update` statements is one transaction. Either all statements land together at publish time, or none do.
```gq
query register_employee_with_team($name: String, $age: I32, $team: String) {
insert Person { name: $name, age: $age }
insert WorksAt { from: $name, to: $team }
}
```
```bash
omnigraph change --query ./mutations.gq --name register_employee_with_team \
--params '{"name":"Alice","age":30,"team":"Acme"}' ./repo.omni
```
If the second statement fails (e.g. `Acme` doesn't exist), the publisher never publishes; `Alice` is not in the database. Atomic.
### 2. Two separate queries on `main` (NOT atomic)
```bash
# Query 1
omnigraph change --query ./mutations.gq --name register_employee --params '{"name":"Alice","age":30}' ./repo.omni
# Query 2 — runs after Query 1 has already published
omnigraph change --query ./mutations.gq --name link_to_team --params '{"name":"Alice","team":"Acme"}' ./repo.omni
```
These are **two publishes** on `main`. If Query 2 fails, Query 1's effects are already visible. There is no `ROLLBACK` for Query 1.
If you want both-or-neither, you have two options:
- Combine them into a single `.gq` query (option 1 above), or
- Use a branch (option 3 below).
### 3. Many queries, atomic via a branch
The pattern when you need to run multiple queries — possibly across multiple commands, agents, or sessions — and have them succeed or fail as a unit.
```bash
# Fork a working branch from main.
omnigraph branch create --from main onboarding/2026-04-25 ./repo.omni
# Run any number of mutations on the branch — each one is its own publish on the branch.
# Concurrent reads of `main` are unaffected.
omnigraph change --branch onboarding/2026-04-25 \
--query ./mutations.gq --name register_employee \
--params '{"name":"Alice","age":30}' ./repo.omni
omnigraph change --branch onboarding/2026-04-25 \
--query ./mutations.gq --name register_employee \
--params '{"name":"Bob","age":25}' ./repo.omni
omnigraph change --branch onboarding/2026-04-25 \
--query ./mutations.gq --name link_to_team \
--params '{"name":"Alice","team":"Acme"}' ./repo.omni
# Inspect the branch — read queries work just like on main.
omnigraph read --branch onboarding/2026-04-25 \
--query ./queries.gq --name list_employees ./repo.omni
# Happy with what's on the branch? Merge it. This is one atomic publish:
# `main` flips to include every commit on the branch.
omnigraph branch merge onboarding/2026-04-25 --into main ./repo.omni
# OR: not happy? Throw it away. `main` is untouched.
# omnigraph branch delete onboarding/2026-04-25 ./repo.omni
```
Properties:
- Each query on the branch is its own publisher commit — so they're individually atomic. Per-query CAS works on branches just like on main.
- The branch lives on disk. Process crash mid-workflow? Re-open and resume.
- Multiple agents can work on different branches in parallel without blocking each other.
- The merge is a three-way merge at the row level. Conflicts surface as `OmniError::MergeConflicts(Vec<MergeConflict>)`, with structured kinds (`DivergentInsert`, `DivergentUpdate`, `DeleteVsUpdate`, …) so callers can handle them programmatically.
### 4. Coordinating multiple agents
Two agents writing to the same graph independently:
```bash
# Agent A
omnigraph branch create --from main agent-a/work ./repo.omni
omnigraph change --branch agent-a/work … ./repo.omni
# … many mutations …
omnigraph branch merge agent-a/work --into main ./repo.omni
# Agent B (running concurrently)
omnigraph branch create --from main agent-b/work ./repo.omni
omnigraph change --branch agent-b/work … ./repo.omni
# … many mutations …
omnigraph branch merge agent-b/work --into main ./repo.omni
```
Each agent sees a consistent snapshot of `main` at the time it forked. The first merge to `main` lands as a fast-forward (or a no-op if no concurrent change). The second merge runs three-way: rows touched by both branches surface as `MergeConflict`s for the caller to resolve.
This is the workflow MR-797 / agentic loops are designed around: **branches are the unit of "an agent's working set."**
## Failure modes
| Scenario | What happens | Caller action |
|---|---|---|
| Single query fails mid-flight | Publisher never publishes; target unchanged | Read the error, decide whether to retry |
| Concurrent writers race the same `(table, branch)` | Publisher CAS rejects the loser with `ManifestConflictDetails::ExpectedVersionMismatch` | Refresh handle, retry the query |
| Branch with N successful mutations, then merge fails (three-way conflict) | Each individual mutation already committed on the branch; merge surfaces `MergeConflicts` | Inspect, decide whether to keep working on the branch, abandon it (`branch_delete`), or resolve and re-merge |
| Process crashes mid-branch-workflow | Each completed mutation on the branch is durable | Re-open the repo, continue where you left off |
## When to use what
| Intent | Use |
|---|---|
| One conceptual change, multiple statements | One `.gq` query with multiple statements |
| Bulk import of a related set of records | One `omnigraph load` (the loader is one atomic query under the hood) |
| Many independent changes, no coordination needed | Many separate queries on `main`. Each is its own atomic unit. |
| "Do these N things, all together or not at all" | Branch → run N queries → merge |
| "Try things, evaluate, then commit" | Branch → mutate → read/inspect → merge or delete |
| "Multiple agents writing concurrently" | One branch per agent, merge to `main` at end of agent task |
| "Long-running workflow that may span sessions or process restarts" | Branch (durable on disk) |
## What this model can't do
- **Cross-query atomicity on `main` without a branch.** If you don't want to fork a branch, multiple queries on `main` publish independently. There is no implicit transaction.
- **Long-running interactive transactions.** No `BEGIN` over a connection. Branches are the durable equivalent.
- **Cross-graph (cross-repo) transactions.** Each repo is its own atomicity domain.
- **"Pessimistic" locks** that serialize writers before they reach the storage layer. Snapshot-MVCC + publisher CAS handles concurrency optimistically; the loser retries.
## See also
- [`docs/user/branches-commits.md`](branches-commits.md) — branch and commit-graph mechanics.
- [`docs/dev/merge.md`](../dev/merge.md) — three-way merge details and conflict kinds.
- [`docs/user/query-language.md`](query-language.md) — `.gq` syntax for the multi-statement queries used above.
- [`docs/dev/runs.md`](../dev/runs.md) — the per-query commit pipeline that gives single-query atomicity.
- [`docs/dev/invariants.md`](../dev/invariants.md) — the architectural rule.