docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)

Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the
  inserts-vs-deletes rule) and search/index.md (the multi-modal search
  functions + a hybrid-ranking overview tying nearest/bm25/rrf together).
  queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and
  branching/merge.md (three-way merge + the 7 conflict kinds, verified against
  error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real
actor-tracking page (server token-resolved vs CLI --as chain; reading the
trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new
page; also normalized AGENTS.md's docs/user link display text to match the
Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in
queries/branching), guides/, and a possible reference/config.md split (the
config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Andrew Altshuler 2026-06-14 13:53:46 +03:00 committed by GitHub
parent d46e50dd6d
commit 612741b387
GPG key ID: B5690EEEBB952194
11 changed files with 399 additions and 67 deletions

@@ -73,32 +73,38 @@ Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architec
| **Lance docs index — fetch upstream Lance docs by problem domain** | **[docs/dev/lance.md](docs/dev/lance.md)** |
| **Test coverage map — what's covered, what helpers to reuse, before-every-task checklist** | **[docs/dev/testing.md](docs/dev/testing.md)** |
| Architecture, L1/L2 framing, concurrency model | [docs/dev/architecture.md](docs/dev/architecture.md) |
| Storage layout, `__manifest` schema, URI schemes, S3 env vars | [docs/user/storage.md](docs/user/concepts/storage.md) |
| `.pg` schema language, types, constraints, annotations, migration planning | [docs/user/schema-language.md](docs/user/schema/index.md) |
| Schema-lint codes (`OG-XXX-NNN`), families, severity, suppression | [docs/user/schema-lint.md](docs/user/schema/lint.md) |
| `.gq` query language, MATCH/RETURN/ORDER, search funcs, mutations, IR ops, lint codes | [docs/user/query-language.md](docs/user/queries/index.md) |
| Indexes (BTREE / inverted / vector / graph topology) | [docs/user/indexes.md](docs/user/search/indexes.md) |
| Embeddings (compiler + engine clients, env vars, `@embed`) | [docs/user/embeddings.md](docs/user/search/embeddings.md) |
| Branches, commit graph, snapshots, system branches | [docs/user/branches-commits.md](docs/user/branching/index.md) |
| Transactions and atomicity (per-query atomic; branches as multi-query transactions) | [docs/user/transactions.md](docs/user/branching/transactions.md) |
| Storage layout, `__manifest` schema, URI schemes, S3 env vars | [docs/user/concepts/storage.md](docs/user/concepts/storage.md) |
| `.pg` schema language, types, constraints, annotations, migration planning | [docs/user/schema/index.md](docs/user/schema/index.md) |
| Schema-lint codes (`OG-XXX-NNN`), families, severity, suppression | [docs/user/schema/lint.md](docs/user/schema/lint.md) |
| `.gq` query language, MATCH/RETURN/ORDER, IR ops, lint codes | [docs/user/queries/index.md](docs/user/queries/index.md) |
| Mutations — insert/update/delete, D2, atomicity | [docs/user/mutations/index.md](docs/user/mutations/index.md) |
| Search funcs (`nearest`/`bm25`/`rrf`), hybrid ranking | [docs/user/search/index.md](docs/user/search/index.md) |
| Indexes (BTREE / inverted / vector / graph topology) | [docs/user/search/indexes.md](docs/user/search/indexes.md) |
| Embeddings (compiler + engine clients, env vars, `@embed`) | [docs/user/search/embeddings.md](docs/user/search/embeddings.md) |
| Concepts — what OmniGraph is, L1/L2 framing | [docs/user/concepts/index.md](docs/user/concepts/index.md) |
| Quickstart — init → load → query → branch | [docs/user/quickstart.md](docs/user/quickstart.md) |
| Branches, commit graph, system branches | [docs/user/branching/index.md](docs/user/branching/index.md) |
| Snapshots & time travel | [docs/user/branching/time-travel.md](docs/user/branching/time-travel.md) |
| Three-way merge and conflict kinds (user-facing) | [docs/user/branching/merge.md](docs/user/branching/merge.md) |
| Transactions and atomicity (per-query atomic; branches as multi-query transactions) | [docs/user/branching/transactions.md](docs/user/branching/transactions.md) |
| Direct-publish write path (staging, D2, recovery sidecars; the former Run state machine) | [docs/dev/writes.md](docs/dev/writes.md) |
| Three-way merge and conflict kinds | [docs/dev/merge.md](docs/dev/merge.md) |
| Diff / change feed (`diff_between`, `diff_commits`) | [docs/user/changes.md](docs/user/branching/changes.md) |
| Diff / change feed (`diff_between`, `diff_commits`) | [docs/user/branching/changes.md](docs/user/branching/changes.md) |
| Query execution, mutation execution, bulk loader, `load` vs `ingest` | [docs/dev/execution.md](docs/dev/execution.md) |
| `optimize` (compaction) and `cleanup` (version GC) | [docs/user/maintenance.md](docs/user/operations/maintenance.md) |
| Cluster operator guide (deploy/manage clusters, approvals, recovery, serving) | [docs/user/cluster.md](docs/user/clusters/index.md) |
| Cedar policy actions, scopes, CLI | [docs/user/policy.md](docs/user/operations/policy.md) |
| HTTP server endpoints, auth, error model, body limits | [docs/user/server.md](docs/user/operations/server.md) |
| CLI quick-start | [docs/user/cli.md](docs/user/cli/index.md) |
| CLI command surface and config schemas (`~/.omnigraph/config.yaml`, legacy `omnigraph.yaml`) | [docs/user/cli-reference.md](docs/user/cli/reference.md) |
| Audit / actor tracking | [docs/user/audit.md](docs/user/operations/audit.md) |
| Error taxonomy and result serialization | [docs/user/errors.md](docs/user/operations/errors.md) |
| `optimize` (compaction) and `cleanup` (version GC) | [docs/user/operations/maintenance.md](docs/user/operations/maintenance.md) |
| Cluster operator guide (deploy/manage clusters, approvals, recovery, serving) | [docs/user/clusters/index.md](docs/user/clusters/index.md) |
| Cedar policy actions, scopes, CLI | [docs/user/operations/policy.md](docs/user/operations/policy.md) |
| HTTP server endpoints, auth, error model, body limits | [docs/user/operations/server.md](docs/user/operations/server.md) |
| CLI quick-start | [docs/user/cli/index.md](docs/user/cli/index.md) |
| CLI command surface and config schemas (`~/.omnigraph/config.yaml`, legacy `omnigraph.yaml`) | [docs/user/cli/reference.md](docs/user/cli/reference.md) |
| Audit / actor tracking | [docs/user/operations/audit.md](docs/user/operations/audit.md) |
| Error taxonomy and result serialization | [docs/user/operations/errors.md](docs/user/operations/errors.md) |
| Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) |
| Deployment (binary / container / RustFS bootstrap / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) |
| CI / release workflows | [docs/dev/ci.md](docs/dev/ci.md) |
| Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) |
| Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) |
| Constants & tunables cheat sheet | [docs/user/constants.md](docs/user/reference/constants.md) |
| Constants & tunables cheat sheet | [docs/user/reference/constants.md](docs/user/reference/constants.md) |
| Per-version release notes | [docs/releases/](docs/releases/) |
---
@@ -257,7 +263,7 @@ omnigraph policy explain --actor act-alice --action change --branch main
| Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing |
| Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` |
| Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming |
| Cedar policy | — | Per-graph actions plus server-scoped actions (see [docs/user/policy.md](docs/user/operations/policy.md) for the current list), branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as` — the deprecated `ingest_as` shims route through it — `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. |
| Cedar policy | — | Per-graph actions plus server-scoped actions (see [docs/user/operations/policy.md](docs/user/operations/policy.md) for the current list), branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as` — the deprecated `ingest_as` shims route through it — `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. |
| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export, **multi-graph mode (v0.6.0+) with cluster routes + read-only `GET /graphs` enumeration + per-graph + server-level Cedar policies. Multi-graph boots from a cluster directory (`--cluster`) or the legacy `omnigraph.yaml`; add/remove graphs via `cluster apply` (or by editing the legacy file) and restarting.** |
| CLI with config | — | two-surface config (team `cluster.yaml` dir + per-operator `~/.omnigraph/config.yaml`; legacy `omnigraph.yaml` deprecated per RFC-008), aliases, multi-format output (json/jsonl/csv/kv/table) |
| Audit / actor tracking | — | `_as` write APIs + actor map in commit graph |
@@ -282,7 +288,7 @@ Rules:
7. **Re-verify before recommending.** If you cite a flag, env var, endpoint, or constant to the user or in code, grep for it in source first. Memory and docs go stale; the code is authoritative.
8. **Keep AGENTS.md short.** This file is always loaded into agent context, so every added line has a recurring context-window cost. Prefer pointers and terse invariants here; put detail in `docs/`.
9. **Keep AGENTS.md a map, not an encyclopedia.** New deep content goes into `docs/`. Add an entry to "Where to find each topic" instead of pasting prose into this file. The "Always-on rules" section is the exception — it's for invariants that should always be in scope.
10. **Re-read on schema/query/IR changes.** Edits to `schema.pest`, `query.pest`, `ir/lower.rs`, `query/typecheck.rs`, or `query/lint.rs` should trigger a re-read of [docs/user/schema-language.md](docs/user/schema/index.md), [docs/user/query-language.md](docs/user/queries/index.md), and [docs/dev/execution.md](docs/dev/execution.md) to confirm they still describe reality.
10. **Re-read on schema/query/IR changes.** Edits to `schema.pest`, `query.pest`, `ir/lower.rs`, `query/typecheck.rs`, or `query/lint.rs` should trigger a re-read of [docs/user/schema/index.md](docs/user/schema/index.md), [docs/user/queries/index.md](docs/user/queries/index.md), and [docs/dev/execution.md](docs/dev/execution.md) to confirm they still describe reality.
11. **Always make smaller commits.** Each commit does one thing, compiles, and passes tests; mechanical refactors land separately from the behavior changes they enable.
12. **Test-first for bug fixes.** When fixing an identified bug, write a regression test that reproduces the failure first. Confirm it fails against the current code with the predicted symptom (not an unrelated error). Then land the fix in a separate commit and confirm the test turns green. The test commit lands just before the fix commit so the red → green pair is visible in `git log` and a reviewer can check out the test commit alone and reproduce the failure.
13. **Correct by design over symptomatic patches.** When a bug surfaces, identify the root cause and make the fix correct by construction. Don't patch the symptom. If the design admits the bug class, the fix is to close the class, not to add a guard around the latest instance. A symptomatic patch is acceptable only as a stop-gap, with an explicit note in the commit message and a follow-up issue tracking the design fix.

@@ -43,11 +43,9 @@ Notes:
## L2 — Snapshots & time travel
- `snapshot()` — current snapshot for the bound branch; cached.
- `snapshot_of(target)` — snapshot at a `ReadTarget` (branch | snapshot id).
- `snapshot_at_version(v: u64)` — historical snapshot from any manifest version.
- `entity_at(table_key, id, version)` — single-entity time travel without building a full snapshot.
- A `Snapshot` is a `(version, HashMap<table_key, SubTableEntry>)` — cheap to build, snapshot-isolated cross-table reads.
Reading a branch at a past version, or a single entity at a past version, is
covered on the [time travel](time-travel.md) page. Merging branches and the
conflict kinds are on the [merge](merge.md) page.
## L2 — Internal system branches

@@ -0,0 +1,47 @@
# Merging Branches
Merging integrates the changes on one branch into another. OmniGraph merges are
**three-way and row-level**: the merge compares both branches against their common
ancestor, merges each node/edge table row by row, and publishes the result as
**one atomic commit** across the whole graph.
```bash
omnigraph branch merge review/2026-04-25 --into main s3://bucket/graph.omni
```
`branch merge <source> [--into <target>]` merges `<source>` into `<target>`
(default `main`).
## Outcomes
A merge resolves to one of three outcomes:
- **Already up to date** — the target already contains every change on the source;
nothing to do.
- **Fast-forward** — the target has no changes the source lacks, so the target
simply advances to the source.
- **Merged** — both sides diverged; a new merge commit is created with two parents.
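The three outcomes follow from ancestry in the commit DAG. The Rust sketch below models that decision on a toy DAG. `Dag`, `is_ancestor`, and `classify` are hypothetical names for illustration, not OmniGraph's commit-graph API; commit ids are `&'static str` for brevity, and a commit is assumed to count as its own ancestor.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical commit DAG: commit id -> parent ids (illustrative only).
type Dag = HashMap<&'static str, Vec<&'static str>>;

#[derive(Debug, PartialEq)]
enum MergeOutcome {
    AlreadyUpToDate,
    FastForward,
    Merged,
}

/// True if `ancestor` is reachable from `descendant` by following parent links.
fn is_ancestor(dag: &Dag, ancestor: &str, descendant: &'static str) -> bool {
    let mut stack = vec![descendant];
    let mut seen = HashSet::new();
    while let Some(commit) = stack.pop() {
        if commit == ancestor {
            return true;
        }
        if seen.insert(commit) {
            if let Some(parents) = dag.get(commit) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    false
}

/// Classify merging `source` into `target`.
fn classify(dag: &Dag, source: &'static str, target: &'static str) -> MergeOutcome {
    if is_ancestor(dag, source, target) {
        // Target already contains every commit on source.
        MergeOutcome::AlreadyUpToDate
    } else if is_ancestor(dag, target, source) {
        // Target has nothing source lacks: advance target to source.
        MergeOutcome::FastForward
    } else {
        // Both sides diverged: create a commit with two parents.
        MergeOutcome::Merged
    }
}
```

Only the `Merged` case needs the row-level, three-way comparison described next; the other two never touch table data in this model.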
## Conflicts
When the two branches changed the same data incompatibly, or when combining them
would break a schema constraint, the merge fails with a structured list of
conflicts (the HTTP server returns `409` with a `merge_conflicts[]` array). No
partial result is published; the merge is all-or-nothing. The conflict kinds are:
| Kind | Meaning |
|---|---|
| `DivergentInsert` | The same id was inserted on both branches. |
| `DivergentUpdate` | The same row was updated differently on both branches. |
| `DeleteVsUpdate` | One side deleted a row the other side updated. |
| `OrphanEdge` | An edge references a node the other side deleted. |
| `UniqueViolation` | The merged result would violate a unique constraint. |
| `CardinalityViolation` | The merged result would violate an edge cardinality constraint. |
| `ValueConstraintViolation` | The merged result would violate a value constraint (enum/range). |
Each conflict carries the table, the row id (when applicable), the kind, and a
message. To resolve conflicts, reconcile the two branches: typically, change one
side so it agrees with (or no longer collides with) the other, then re-run the merge.
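For the first three kinds, the decision for one row id can be sketched as a comparison of the common-ancestor row with each branch's row. This is a simplified model under stated assumptions: a row is compared as one whole value, `None` means the row is absent on that side, and identical changes on both sides are assumed to merge cleanly. `OrphanEdge` and the constraint kinds need whole-graph checks and are not modeled; `RowMerge` is a hypothetical type, not the real `MergeConflictKind`.

```rust
/// Hypothetical row: its value, or `None` if the row does not exist on that side.
type Row = Option<&'static str>;

#[derive(Debug, PartialEq)]
enum RowMerge {
    Take(Row), // clean result for this id
    DivergentInsert,
    DivergentUpdate,
    DeleteVsUpdate,
}

fn merge_row(base: Row, ours: Row, theirs: Row) -> RowMerge {
    use RowMerge::*;
    if ours == theirs {
        return Take(ours); // both sides agree, including "neither changed"
    }
    if ours == base {
        return Take(theirs); // only theirs changed
    }
    if theirs == base {
        return Take(ours); // only ours changed
    }
    // Both sides changed the row, and differently.
    match (base, ours, theirs) {
        (None, Some(_), Some(_)) => DivergentInsert,
        (Some(_), Some(_), Some(_)) => DivergentUpdate,
        // One side deleted the row, the other updated it.
        (Some(_), None, Some(_)) | (Some(_), Some(_), None) => DeleteVsUpdate,
        // Every remaining combination has one side equal to base (handled above).
        _ => unreachable!(),
    }
}
```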
See [branches & commits](index.md) for the branch and commit-DAG model, and
[changes](changes.md) for diffing two branches before you merge.

@@ -0,0 +1,31 @@
# Snapshots & Time Travel
Every read in OmniGraph happens against a **snapshot** — a consistent, cross-table
view of the graph at one manifest version. A query holds one snapshot for its whole
lifetime, so it never sees a partial write from a concurrent commit (see
[transactions](transactions.md)).
## Reading the past
- **Current head** — by default a read targets the current head of the bound branch.
- **By snapshot id** — target a specific snapshot id instead of the branch head
  (`--snapshot` on `omnigraph read`).
- **By version** — reconstruct a historical snapshot from any past manifest version.
- **Single entity** — look up one entity at a past version without building a full
snapshot (cheaper when you only need one node or edge).
Snapshots are cheap to build: a snapshot is only the set of sub-table versions
visible at one manifest version, and pinning every table to that set keeps
cross-table reads snapshot-isolated.
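Building a snapshot for a past manifest version amounts to picking, for each table, the latest sub-table version recorded at or before that version. The sketch below assumes a flat, hypothetical history of `(manifest version, table, sub-table version)` entries; it is not the real `__manifest` schema, and the `node:Person` style table keys are made up.

```rust
use std::collections::HashMap;

/// Hypothetical history entry: as of manifest `version`, table `table_key`
/// points at sub-table version `table_version`.
struct ManifestEntry {
    version: u64,
    table_key: &'static str,
    table_version: u64,
}

/// A manifest version plus the visible sub-table version of every table.
#[derive(Debug)]
struct Snapshot {
    version: u64,
    tables: HashMap<&'static str, u64>,
}

/// For each table, keep the newest entry whose manifest version is <= `v`.
/// The `>=` check means the history does not need to be sorted.
fn snapshot_at_version(history: &[ManifestEntry], v: u64) -> Snapshot {
    let mut latest: HashMap<&'static str, (u64, u64)> = HashMap::new();
    for e in history.iter().filter(|e| e.version <= v) {
        let slot = latest.entry(e.table_key).or_insert((e.version, e.table_version));
        if e.version >= slot.0 {
            *slot = (e.version, e.table_version);
        }
    }
    Snapshot {
        version: v,
        tables: latest.into_iter().map(|(k, (_, tv))| (k, tv)).collect(),
    }
}
```

Nothing is copied: a snapshot is a small map of version numbers, which is why building one per read stays cheap.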
## CLI
```bash
# Read a query against a past snapshot
omnigraph read --query ./q.gq --name find --snapshot <snapshot-id> s3://bucket/graph.omni
```
Time travel composes with branches: every branch has its own version history, and
you can read any branch at any of its past versions. The commits and commit DAG
behind those versions are described in [branches & commits](index.md); diffing
two versions is on the [changes](changes.md) page.

@@ -0,0 +1,49 @@
# Concepts
OmniGraph is a typed property-graph engine built as a coordination layer over the
[Lance](https://lance.org) columnar storage format. It gives you a schema-checked
graph with vector, full-text, and graph queries in one runtime, plus Git-style
branches and commits across the whole graph.
## The data model
- A graph has **node types** and **edge types**, declared in a
[schema](../schema/index.md).
- Each node type and each edge type is stored as its **own Lance dataset**:
  columnar, versioned, and kept on local disk or object storage.
- A single `__manifest` table coordinates all of those datasets, so the graph has
one coherent version even though it spans many datasets.
This split is what lets a graph commit be **atomic across every type at once**: a
publish flips every relevant dataset's version together in one manifest write, so
readers never see a half-applied change. See [storage](storage.md) for the layout.
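The atomicity argument rests on the manifest being the single publish point. A minimal compare-and-swap sketch, assuming an in-memory `Manifest` type invented for illustration: a publish applies every table's new version in one step, and only if nobody else has published since the writer read the manifest. It models the idea only; the real publisher works against Lance datasets on storage.

```rust
use std::collections::HashMap;

/// Hypothetical manifest: one graph version plus each dataset's current version.
struct Manifest {
    version: u64,
    tables: HashMap<String, u64>,
}

#[derive(Debug, PartialEq)]
enum PublishError {
    /// Another writer published after we read `expected`.
    VersionMismatch { expected: u64, actual: u64 },
}

impl Manifest {
    /// Compare-and-swap publish: all table updates land together, or none do.
    fn publish(&mut self, expected: u64, updates: &[(&str, u64)]) -> Result<u64, PublishError> {
        if self.version != expected {
            return Err(PublishError::VersionMismatch { expected, actual: self.version });
        }
        for (table, new_version) in updates {
            self.tables.insert(table.to_string(), *new_version);
        }
        self.version += 1;
        Ok(self.version)
    }
}
```

In this sketch a writer that loses the race gets an error rather than a partially applied graph, and can re-read the manifest before trying again.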
## Two layers: inherited vs. added
Throughout the docs, capabilities are framed as **L1** (inherited from Lance) or
**L2** (added by OmniGraph):
| | L1 — from Lance | L2 — added by OmniGraph |
|---|---|---|
| Storage | Columnar Arrow datasets on object storage | Per-type datasets coordinated as one graph |
| Versioning | Per-dataset versions + time travel | [Snapshots](../branching/time-travel.md) across all types at once |
| Branches | Per-dataset branches | [Graph-level branches](../branching/index.md), atomic across types |
| Commits | Per-dataset commits | [Commit DAG](../branching/index.md) for the whole graph; three-way [merge](../branching/merge.md) |
| Indexes | Scalar / vector / full-text indexes | Built per relevant column; graph topology index for traversal |
| Search | Vector + full-text primitives | [`nearest` / `bm25` / `rrf`](../search/index.md) in one query, plus graph traversal |
| Querying | — | The [`.gq` query language](../queries/index.md) and [`.pg` schema language](../schema/index.md) |
## How the pieces fit
- The **schema** (`.pg`) and **query** (`.gq`) languages are compiled to a typed
intermediate representation.
- The **engine** runs queries and mutations against Lance, coordinates the manifest,
maintains the commit graph, and builds indexes.
- The **CLI** ([`omnigraph`](../cli/index.md)) and the
**HTTP server** ([`operations/server.md`](../operations/server.md)) are two front
ends over the same engine, so embedded and remote behavior match.
- [Cedar policy](../operations/policy.md) enforcement is engine-wide — every writer
goes through the same authorization gate regardless of front end.
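Engine-wide enforcement works because writers are the only path to storage and each one checks policy before writing. The sketch below assumes a plain function in place of a Cedar policy and a single hypothetical `mutate_as` writer; `Engine` and `Denied` are illustrative types, and the action string `change` is illustrative too.

```rust
/// Returned when the policy rejects a write.
#[derive(Debug, PartialEq)]
struct Denied;

struct Engine {
    /// (action, branch, actor) -> allowed? Stand-in for a Cedar policy.
    policy: fn(&str, &str, &str) -> bool,
    commits: Vec<String>,
}

impl Engine {
    fn enforce(&self, action: &str, branch: &str, actor: &str) -> Result<(), Denied> {
        if (self.policy)(action, branch, actor) {
            Ok(())
        } else {
            Err(Denied)
        }
    }

    /// Every writer calls `enforce` before touching storage.
    fn mutate_as(&mut self, branch: &str, actor: &str) -> Result<(), Denied> {
        self.enforce("change", branch, actor)?; // the gate runs first
        self.commits.push(format!("{actor} changed {branch}"));
        Ok(())
    }
}
```

An HTTP handler (actor from a token) and a CLI command (actor from `--as`) would both call `mutate_as`, so neither can skip the check.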
For deployment-scale topics — multi-graph servers, control-plane operations,
recovery — see [clusters](../clusters/index.md).

@@ -12,6 +12,8 @@ start with install, then follow the section that matches your task.
| Goal | Read |
|---|---|
| Install OmniGraph | [install.md](install.md) |
| Run the core loop end to end | [quickstart.md](quickstart.md) |
| Understand the model | [concepts/index.md](concepts/index.md) |
| Run the CLI | [cli/index.md](cli/index.md) |
| Look up every CLI flag and config field | [cli/reference.md](cli/reference.md) |
@@ -21,8 +23,9 @@ start with install, then follow the section that matches your task.
|---|---|
| Write schemas (the `.pg` language) | [schema/index.md](schema/index.md) |
| Read schema-lint diagnostic codes | [schema/lint.md](schema/lint.md) |
| Write queries and mutations (the `.gq` language) | [queries/index.md](queries/index.md) |
| Use vector / full-text / hybrid search | [search/indexes.md](search/indexes.md) |
| Write queries (the `.gq` language) | [queries/index.md](queries/index.md) |
| Write data — inserts, updates, deletes | [mutations/index.md](mutations/index.md) |
| Use vector / full-text / hybrid search | [search/index.md](search/index.md) |
| Generate embeddings | [search/embeddings.md](search/embeddings.md) |
| Build and use indexes | [search/indexes.md](search/indexes.md) |
@@ -30,7 +33,9 @@ start with install, then follow the section that matches your task.
| Goal | Read |
|---|---|
| Work with branches, commits, and snapshots | [branching/index.md](branching/index.md) |
| Work with branches and commits | [branching/index.md](branching/index.md) |
| Read past versions (time travel) | [branching/time-travel.md](branching/time-travel.md) |
| Merge branches and resolve conflicts | [branching/merge.md](branching/merge.md) |
| Coordinate multi-query workflows | [branching/transactions.md](branching/transactions.md) |
| Read diffs and change feeds | [branching/changes.md](branching/changes.md) |
@@ -56,6 +61,7 @@ start with install, then follow the section that matches your task.
| Goal | Read |
|---|---|
| Understand the model and L1/L2 framing | [concepts/index.md](concepts/index.md) |
| Understand graph layout and URI support | [concepts/storage.md](concepts/storage.md) |
| Look up constants and tunables | [reference/constants.md](reference/constants.md) |

@@ -0,0 +1,52 @@
# Mutations
Write statements live inside a `query` declaration whose body is one or more
mutation statements (the [query language](../queries/index.md) covers the read
shape and shared declaration syntax).
```gq
query onboard($name: String, $title: String) {
  insert Person { name: $name, title: $title }
}
```
An edge type is inserted the same way — its endpoint columns are just
properties in the assignment block (`insert WorksAt { person: $p, org: $o }`).
## Statements
- `insert <Type> { prop: <value>, … }`
- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
- `delete <Type> where <prop> <op> <value>`
`<value>` is a literal, `$param`, or `now()`.
## Atomicity
A change query publishes **one commit** at the end of the query. Multiple
insert/update statements accumulate in memory and commit together — a mid-query
failure leaves the graph untouched. See [transactions](../branching/transactions.md)
for the per-query atomicity contract and [branches](../branching/index.md) for
multi-query workflows.
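The accumulate-then-publish contract can be sketched in a few lines. Assumptions: `Insert`, `validate`, and `run_change_query` are hypothetical stand-ins, and the graph is an in-memory map; the point is only that a failing statement returns before anything is published.

```rust
use std::collections::HashMap;

/// Hypothetical graph: table name -> inserted values.
type Graph = HashMap<&'static str, Vec<&'static str>>;

/// Hypothetical insert statement: target table plus the value to write.
struct Insert {
    table: &'static str,
    value: &'static str,
}

/// Stand-in for per-statement validation (schema checks and so on).
fn validate(stmt: &Insert) -> Result<(), String> {
    if stmt.value.is_empty() {
        Err(format!("empty value for {}", stmt.table))
    } else {
        Ok(())
    }
}

/// Stage every statement in memory, then publish all of them in one step.
fn run_change_query(graph: &mut Graph, stmts: &[Insert]) -> Result<(), String> {
    let mut pending: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    for stmt in stmts {
        validate(stmt)?; // returning here leaves `graph` untouched
        pending.entry(stmt.table).or_default().push(stmt.value);
    }
    // Single publish point: every staged row lands together.
    for (table, values) in pending {
        graph.entry(table).or_default().extend(values);
    }
    Ok(())
}
```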
## Inserts/updates and deletes cannot mix in one query
A single change query must be **either insert/update-only or delete-only**.
Mixing the two is rejected at parse time, before any I/O:
> `mutation '<name>' on the same query mixes inserts/updates and deletes; split
> into separate mutations: (1) inserts and updates, then (2) deletes.`
Run two separate queries instead — the inserts/updates first, then the deletes.
The restriction exists because inserts/updates and deletes commit through
different paths today, and mixing them in one query creates ordering hazards
(e.g. a same-row insert-then-delete, or a cascading delete of a just-inserted
edge). Keeping the two kinds in separate queries keeps each one atomic and
correct.
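The check only needs the statement kinds, which is why it can run at parse time before any I/O. A sketch with a hypothetical `Stmt` enum and `check_d2` function (not the compiler's real types); the error text is copied from the message quoted above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stmt {
    Insert,
    Update,
    Delete,
}

/// Reject a change query that mixes inserts/updates with deletes.
fn check_d2(name: &str, stmts: &[Stmt]) -> Result<(), String> {
    let has_write = stmts.iter().any(|s| matches!(s, Stmt::Insert | Stmt::Update));
    let has_delete = stmts.iter().any(|s| *s == Stmt::Delete);
    if has_write && has_delete {
        return Err(format!(
            "mutation '{name}' on the same query mixes inserts/updates and deletes; \
             split into separate mutations: (1) inserts and updates, then (2) deletes."
        ));
    }
    Ok(())
}
```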
## Bulk loading
For loading data from files rather than inline statements, use
[`omnigraph load`](../cli/index.md) (`--mode overwrite|append|merge`) — it is the
single bulk-write command and applies the same schema validation and atomic
publish as inline mutations.

@@ -1,7 +1,46 @@
# Audit / Actor tracking
# Audit & Actor Tracking
- `Omnigraph::audit_actor_id: Option<String>` is the actor in effect.
- `_as` variants of every write API let callers override the actor: `mutate_as`, `load_as`, `branch_merge_as`, `apply_schema_as`, etc.
- Actor IDs are persisted on `GraphCommit.actor_id` with split storage in `_graph_commit_actors.lance` (the commit graph is split into `_graph_commits.lance` for the linkage and `_graph_commit_actors.lance` for the actor map).
- HTTP server uses the bearer-token actor automatically. The CLI resolves one actor chain everywhere: `--as` > legacy `cli.actor` in `omnigraph.yaml` > `operator.actor` in `~/.omnigraph/config.yaml` > none (RFC-007).
- Pre-v0.4.0 graphs also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0. The v2→v3 manifest migration sweeps any stale `__run__*` branches on first write-open (MR-770); the inert dataset bytes remain until a `delete_prefix` primitive lands.
Every write in OmniGraph records **who made it**. The actor id is persisted on the
graph commit, so the commit history is an audit trail of which actor changed the
graph and when.
## Where the actor comes from
The actor is resolved differently depending on the front end, but it always lands
on the commit:
- **HTTP server** — the actor is resolved **server-side from the bearer token**. A
client cannot set its own actor id; it is derived from the authenticated token.
See [policy](policy.md) for how tokens map to actors.
- **CLI / embedded** — the actor is self-declared through one resolution chain:
  1. `--as <actor>` on the command,
  2. then `operator.actor` in `~/.omnigraph/config.yaml` (see the
     [CLI reference](../cli/reference.md)),
  3. otherwise none.
This difference is intentional: a CLI or embedded caller writes directly with its
own storage credentials, so its actor can only be self-declared, while the server
resolves the actor from a token it has authenticated and trusts.
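The two resolution paths can be sketched side by side. Function names and parameters are hypothetical, and a plain map stands in for the server's token store. Note that the server function has no parameter through which a client could pass its own actor.

```rust
use std::collections::HashMap;

/// CLI / embedded: `--as` wins, then `operator.actor` from config, else none.
fn resolve_cli_actor(as_flag: Option<&str>, operator_actor: Option<&str>) -> Option<String> {
    as_flag.or(operator_actor).map(str::to_owned)
}

/// Server: the actor comes only from the authenticated bearer token.
fn resolve_server_actor(tokens: &HashMap<String, String>, bearer: &str) -> Option<String> {
    tokens.get(bearer).cloned()
}
```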
## Reading the audit trail
Actor ids are stored on each commit in the [commit graph](../branching/index.md).
List commits to see who made each change:
```bash
omnigraph commit list graph.omni
```
System-initiated writes use reserved actor ids — for example, automatic recovery
of an interrupted write records `omnigraph:recovery`, so operator changes and
machine repairs are distinguishable in the history:
```bash
omnigraph commit list --filter actor=omnigraph:recovery graph.omni
```
## What is tracked
Every successful publish — load, change, branch merge, and schema apply — appends a
commit carrying the resolved actor. Because publishes are atomic, the actor on a
commit is exactly the actor responsible for that whole change.

@@ -13,8 +13,11 @@ query <name>($p1: T1, $p2: T2?, …)
Two body shapes:
- **Read**: `match { … } return { … } [order { … }] [limit N]`
- **Mutation**: one or more of `insert | update | delete` statements
- **Read**: `match { … } return { … } [order { … }] [limit N]` — covered on this page.
- **Mutation**: one or more of `insert | update | delete` statements — see [mutations](../mutations/index.md).
Multi-modal search functions (`nearest`, `bm25`, `rrf`, …) used inside `match`,
`return`, and `order` are documented on the [search](../search/index.md) page.
Param types reuse all schema scalars; trailing `?` makes a param optional. The compiler reserves `$__nanograph_now` for `now()`.
@@ -25,21 +28,6 @@ Param types reuse all schema scalars; trailing `?` makes a param optional. The c
- **Filter**: `<expr> <op> <expr>` with operators `>=`, `<=`, `!=`, `>`, `<`, `=`, and string `contains`.
- **Negation**: `not { clause+ }` — desugars to anti-join over the inner pipeline.
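An anti-join keeps the outer rows that have no match on the inner side. A minimal sketch, assuming bindings reduce to entity ids (`anti_join` is a hypothetical helper, not an IR operator name): finding people with no `WorksAt` edge means removing every person id the inner pattern matched.

```rust
use std::collections::HashSet;

/// Keep each outer id that does not appear among the inner pipeline's matches.
fn anti_join(outer: &[u64], inner_matches: &[u64]) -> Vec<u64> {
    let inner: HashSet<u64> = inner_matches.iter().copied().collect();
    outer.iter().copied().filter(|id| !inner.contains(id)).collect()
}
```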
## Search clauses (multi-modal)
Used inside MATCH or as expressions inside RETURN/ORDER:
| Function | Purpose | Underlying Lance facility |
|---|---|---|
| `nearest($x.vec, $q)` | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
| `search(field, q)` | Generic FTS | Inverted index |
| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | Inverted index |
| `match_text(field, q)` | Pattern match | Inverted index |
| `bm25(field, q)` | BM25 scoring | Inverted index |
| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |
`nearest()` requires a `LIMIT`; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
## RETURN clause
`return { <expr> [as <alias>], … }` with expressions:
@@ -48,7 +36,7 @@ Used inside MATCH or as expressions inside RETURN/ORDER:
- Literals: string, int, float, bool, list
- `now()`
- Aggregates: `count`, `sum`, `avg`, `min`, `max`
- All search functions above (so you can return a score column)
- [Search functions](../search/index.md) (so you can return a score column)
- `AliasRef` — re-use a previous projection alias
## ORDER & LIMIT
@@ -58,21 +46,8 @@ Used inside MATCH or as expressions inside RETURN/ORDER:
- **Total, deterministic order.** Rows with equal user-sort keys are broken by the bound entities' key columns (`<var>.id`, ascending) appended as a final tie-break, so the result is a *total* order — reproducible across runs, and `order … limit N` returns a deterministic top-N even when ties straddle the cutoff. (Aggregate results have no entity-key columns; their group rows are already distinct on the projected group keys.)
- **NULL placement** is *nulls-first ascending, nulls-last descending* (i.e. `nulls_first = !descending`): a NULL sorts as if smaller than any value.
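Both ordering rules can be reproduced with Rust's `Option` ordering, which already sorts `None` below every `Some`. A sketch assuming a hypothetical `Row` with one nullable sort key and the bound entity's id as the final tie-break:

```rust
/// Hypothetical result row: a nullable user sort key plus the entity id.
#[derive(Debug)]
struct Row {
    key: Option<i64>,
    id: u64,
}

/// Sort by `key` (asc or desc), then by `id` ascending for a total order.
/// NULLs come first ascending and, once reversed, last descending.
fn sort_rows(rows: &mut [Row], descending: bool) {
    rows.sort_by(|a, b| {
        let by_key = a.key.cmp(&b.key);
        let by_key = if descending { by_key.reverse() } else { by_key };
        by_key.then(a.id.cmp(&b.id)) // tie-break is always ascending
    });
}
```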
## Mutation statements
- `insert <Type> { prop: <value>, … }`
- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
- `delete <Type> where <prop> <op> <value>`
`<value>` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0).
### D₂ — mixed insert/update + delete is rejected at parse time
A single mutation query must be **either insert/update-only or delete-only**. Mixed → rejected before any I/O with the message:
> `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).`
Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance v6.0.1 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until the MR-A Lance v7 bump migrates `delete_where` to staged (`DeleteBuilder::execute_uncommitted` first ships in `v7.0.0-beta.10`), the parse-time rejection keeps both paths atomic and correct. See [docs/dev/writes.md](../../dev/writes.md), [docs/dev/lance.md](../../dev/lance.md), and [docs/dev/invariants.md](../../dev/invariants.md).
Write statements (`insert` / `update` / `delete`) are documented on the
[mutations](../mutations/index.md) page.
## IR (Intermediate Representation)

docs/user/quickstart.md

@@ -0,0 +1,81 @@
# Quickstart
This walks the core loop end to end: define a schema, initialize a graph, load
data, query it, and use a branch. It uses a local file-backed graph; swap the
path for an `s3://…` URI to run the same flow against object storage.
[Install](install.md) the `omnigraph` CLI first.
## 1. Write a schema
A schema (`.pg`) declares your node and edge types. Save this as `schema.pg`:
```
node Person {
  name: String,
  title: String?,
}
```
See the [schema language](schema/index.md) for types, constraints, and edges.
## 2. Initialize the graph
```bash
omnigraph init --schema schema.pg graph.omni
```
`init` creates an empty graph at the given URI with your schema applied.
## 3. Load data
`load` is the single bulk-write command. `--mode` is required
(`overwrite | append | merge`):
```bash
omnigraph load --data people.jsonl --mode overwrite graph.omni
```
`people.jsonl` is newline-delimited JSON, one record per line. For finer-grained
or inline writes, see [mutations](mutations/index.md).
## 4. Query
Write a query (`.gq`) — save as `queries.gq`:
```gq
query find_people($title: String) {
  match { $p: Person { title: $title } }
  return { $p.name }
}
```
Run it:
```bash
omnigraph read --query queries.gq --name find_people \
  --params '{"title":"Engineer"}' --format table graph.omni
```
The [query language](queries/index.md) covers `match`/`return`/`order`, and
[search](search/index.md) covers vector and full-text search.
## 5. Work on a branch
Branches isolate changes until you merge them — Git-style, across the whole graph:
```bash
omnigraph branch create review/new-hires graph.omni
omnigraph load --data new-hires.jsonl --mode append --branch review/new-hires graph.omni
# inspect the branch, then integrate it
omnigraph branch merge review/new-hires --into main graph.omni
```
See [branches & commits](branching/index.md) and [merging](branching/merge.md).
## Next steps
- [CLI reference](cli/reference.md) — every command and flag.
- [Schema language](schema/index.md) and [query language](queries/index.md).
- [Operating a cluster](clusters/index.md) and [running the server](operations/server.md)
for multi-graph, multi-user deployments.

docs/user/search/index.md

@@ -0,0 +1,48 @@
# Search
OmniGraph runs vector, full-text, and hybrid search in the same runtime as graph
traversal — a single [query](../queries/index.md) can combine a vector `nearest`,
a `bm25` text score, and an `Expand` traversal. Search functions are used inside
`match` (to filter), or as expressions inside `return` / `order` (to score and
rank).
## Functions
| Function | Purpose | Backing index |
|---|---|---|
| `nearest($x.vec, $q)` | k-NN vector search (cosine) | vector index (IVF / HNSW) |
| `search(field, q)` | Generic full-text search | inverted (FTS) index |
| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | inverted index |
| `match_text(field, q)` | Pattern match | inverted index |
| `bm25(field, q)` | BM25 relevance scoring | inverted index |
| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default `k=60`) | fuses scored rankings |
- `nearest()` requires a `limit`. The query vector is resolved from the param map,
or embedded from a text input at runtime via the configured
[embedding client](embeddings.md).
- Scores and ranks propagate as ordinary columns, so you can `return` a score and
`order` by it.
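Semantically, `nearest` ranks rows by cosine similarity to the query vector and keeps the top `limit`. The brute-force sketch below shows that ranking only; it is not how OmniGraph executes the search (an IVF / HNSW index avoids scanning every row), and `cosine` / `nearest` here are illustrative helpers. A zero-length vector would produce NaN in this sketch.

```rust
/// Cosine similarity of two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Score every row against `query` and keep the `k` most similar.
fn nearest(rows: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> =
        rows.iter().map(|(id, v)| (*id, cosine(v, query))).collect();
    // Highest similarity first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

The `truncate(k)` step is why a limit is needed: without it, a vector search would have to return every row in similarity order.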
## Hybrid ranking with `rrf`
Reciprocal Rank Fusion combines two independent rankings (typically one vector and
one text) into a single fused ranking, without needing the two score scales to be
comparable. Rank each retrieval separately, then fuse:
```gq
query hybrid($q: String) {
  match { $d: Document { } }
  return {
    $d,
    rrf(nearest($d.embedding, $q), bm25($d.body, $q)) as score
  }
  order { score desc }
  limit 10
}
```
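For reference, the textbook RRF score gives each document the sum of `1 / (k + rank)` over the rankings it appears in, with ranks starting at 1. The sketch below fuses two pre-ranked id lists that way; `rrf` here is an illustrative function over plain lists, and whether OmniGraph starts ranks at 0 or 1 is an assumption, not something this page states.

```rust
use std::collections::HashMap;

/// Fuse two rankings (best first) with Reciprocal Rank Fusion.
/// A document missing from one list gets no contribution from it.
fn rrf(ranking_a: &[&str], ranking_b: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [ranking_a, ranking_b] {
        for (i, doc) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64; // 1-based rank
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; break ties by id so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks enter the formula, a cosine similarity and a BM25 score never have to be put on the same scale.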
## Indexes and embeddings
Search functions only work when the backing index exists — see
[indexes](indexes.md) for building vector and inverted indexes, and
[embeddings](embeddings.md) for generating the vectors `nearest` searches over.