mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-15 01:55:13 +02:00
docs(user): split language/branching pages + add front-door pages (Phase 2) (#225)
Content build-out on top of the Phase 1 topic move. No behavior changes.

Splits (existing content relocated, cross-linked):
- queries/index.md → mutations/index.md (insert/update/delete + the inserts-vs-deletes rule) and search/index.md (the multi-modal search functions + a hybrid-ranking overview tying nearest/bm25/rrf together). queries/index.md now covers the read shape and points at both.
- branching/index.md → branching/time-travel.md (snapshots/time travel) and branching/merge.md (three-way merge + the 7 conflict kinds, verified against error.rs MergeConflictKind).

New pages (written from the code, user-facing):
- quickstart.md — init → load → query → branch, with verified CLI flags.
- concepts/index.md — what OmniGraph is + the L1/L2 (Lance/OmniGraph) framing.

Expanded operations/audit.md from a 7-line struct dump into a real actor-tracking page (server token-resolved vs CLI --as chain; reading the trail; the omnigraph:recovery reserved actor).

Index wiring: docs/user/index.md and AGENTS.md's topic table link every new page; also normalized AGENTS.md's docs/user link display text to match the Phase 1 retargeted paths.

Verified: zero broken .md links; check-agents-md.sh green (57 links, 54 docs).

Deferred to Phase 3: de-dev polish (grammar paths, IR internals still in queries/branching), guides/, and a possible reference/config.md split (the config schema is already coherent in cli/reference.md).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
parent
d46e50dd6d
commit
612741b387
11 changed files with 399 additions and 67 deletions
|
|
@ -43,11 +43,9 @@ Notes:
|
|||
|
||||
## L2 — Snapshots & time travel
|
||||
|
||||
- `snapshot()` — current snapshot for the bound branch; cached.
|
||||
- `snapshot_of(target)` — snapshot at a `ReadTarget` (branch | snapshot id).
|
||||
- `snapshot_at_version(v: u64)` — historical snapshot from any manifest version.
|
||||
- `entity_at(table_key, id, version)` — single-entity time travel without building a full snapshot.
|
||||
- A `Snapshot` is a `(version, HashMap<table_key, SubTableEntry>)` — cheap to build, snapshot-isolated cross-table reads.
|
||||
Reading a branch at a past version, or a single entity at a past version, is
|
||||
covered on the [time travel](time-travel.md) page. Merging branches and the
|
||||
conflict kinds are on the [merge](merge.md) page.
|
||||
|
||||
## L2 — Internal system branches
|
||||
|
||||
|
|
|
|||
47
docs/user/branching/merge.md
Normal file
|
|
@ -0,0 +1,47 @@
# Merging Branches

Merging integrates the changes on one branch into another. OmniGraph merges are
**three-way and row-level**: it compares both branches against their common
ancestor and merges each node/edge table row by row, then publishes the result as
**one atomic commit** across the whole graph.

```bash
omnigraph branch merge review/2026-04-25 --into main s3://bucket/graph.omni
```

`branch merge <source> [--into <target>]` merges `<source>` into `<target>`
(default `main`).

## Outcomes

A merge resolves to one of three outcomes:

- **Already up to date** — the target already contains every change on the source;
  nothing to do.
- **Fast-forward** — the target has no changes the source lacks, so the target
  simply advances to the source.
- **Merged** — both sides diverged; a new merge commit is created with two parents.
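
The three outcomes can be told apart from the commit DAG alone. The following is a minimal Python sketch, assuming each commit records its parent ids; the `parents` mapping, the `classify_merge` helper, and the commit ids are hypothetical illustrations, not OmniGraph's implementation.

```python
from enum import Enum


class MergeOutcome(Enum):
    ALREADY_UP_TO_DATE = "already up to date"
    FAST_FORWARD = "fast-forward"
    MERGED = "merged"


def ancestors(commit, parents):
    """Return `commit` plus every commit reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify_merge(source_head, target_head, parents):
    """Classify a merge of `source_head` into `target_head`."""
    # The target already contains the source: nothing to do.
    if source_head in ancestors(target_head, parents):
        return MergeOutcome.ALREADY_UP_TO_DATE
    # The source contains the target: the target can simply advance.
    if target_head in ancestors(source_head, parents):
        return MergeOutcome.FAST_FORWARD
    # Both sides have commits the other lacks: a two-parent merge commit.
    return MergeOutcome.MERGED


# Hypothetical commit DAG (commit id -> parent ids): c2 and c3 both branch off c1.
parents = {"c1": [], "c2": ["c1"], "c3": ["c1"]}
```

With this DAG, merging `c2` into `c1` classifies as a fast-forward, while merging `c2` into `c3` requires a real merge because each side has a commit the other lacks.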

## Conflicts

When both branches changed the same data incompatibly, the merge fails with a
structured list of conflicts (the HTTP server returns `409` with a
`merge_conflicts[]` array). No partial result is published — the merge is
all-or-nothing. The conflict kinds are:

| Kind | Meaning |
|---|---|
| `DivergentInsert` | The same id was inserted on both branches. |
| `DivergentUpdate` | The same row was updated differently on both branches. |
| `DeleteVsUpdate` | One side deleted a row the other side updated. |
| `OrphanEdge` | An edge references a node the other side deleted. |
| `UniqueViolation` | The merged result would violate a unique constraint. |
| `CardinalityViolation` | The merged result would violate an edge cardinality constraint. |
| `ValueConstraintViolation` | The merged result would violate a value constraint (enum/range). |

Each conflict carries the table, the row id (when applicable), the kind, and a
message. Resolve conflicts by reconciling the two branches — typically by making
the conflicting change on one side and re-merging.
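
To make the row-level, three-way idea concrete, here is a hedged Python sketch of merging one table against a common ancestor. It covers only the three row-shaped conflict kinds from the table above; the `merge_table` helper and the dict-based row layout are assumptions for illustration and do not reflect the engine's actual data structures.

```python
def merge_table(base, source, target):
    """Three-way merge of one table.

    Each argument maps row id -> row value (a dict); a missing id means the
    row does not exist on that side. Returns (merged_rows, conflicts).
    """
    merged, conflicts = {}, []
    for row_id in sorted(set(base) | set(source) | set(target)):
        b, s, t = base.get(row_id), source.get(row_id), target.get(row_id)
        if s == t:        # both sides agree (including "both deleted")
            result = s
        elif s == b:      # only the target changed this row
            result = t
        elif t == b:      # only the source changed this row
            result = s
        else:             # both sides changed it, in different ways
            if b is None:
                kind = "DivergentInsert"
            elif s is None or t is None:
                kind = "DeleteVsUpdate"
            else:
                kind = "DivergentUpdate"
            conflicts.append((row_id, kind))
            continue
        if result is not None:
            merged[row_id] = result
    if conflicts:
        return None, conflicts   # all-or-nothing: no partial result
    return merged, []


base = {"p1": {"title": "Eng"}, "p2": {"title": "PM"}}
source = {"p1": {"title": "Staff Eng"}, "p2": {"title": "PM"}, "p3": {"title": "New"}}
target = {"p1": {"title": "Eng"}}   # the target deleted p2
merged, conflicts = merge_table(base, source, target)
```

Here the source's update to `p1`, the source's insert of `p3`, and the target's delete of `p2` all merge cleanly. Had the source also updated `p2`, the sketch would report `("p2", "DeleteVsUpdate")` and return no merged rows.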

See [branches & commits](index.md) for the branch and commit-DAG model, and
[changes](changes.md) for diffing two branches before you merge.
31
docs/user/branching/time-travel.md
Normal file
|
|
@ -0,0 +1,31 @@
# Snapshots & Time Travel

Every read in OmniGraph happens against a **snapshot** — a consistent, cross-table
view of the graph at one manifest version. A query holds one snapshot for its whole
lifetime, so it never sees a partial write from a concurrent commit (see
[transactions](transactions.md)).

## Reading the past

- **Current head** — by default a read targets the current head of the bound branch.
- **By snapshot id** — read a branch or a specific snapshot id (`--snapshot` on
  `omnigraph read`).
- **By version** — reconstruct a historical snapshot from any past manifest version.
- **Single entity** — look up one entity at a past version without building a full
  snapshot (cheaper when you only need one node or edge).

Snapshots are cheap to build: a snapshot is just the set of visible sub-table
versions at a manifest version, so cross-table reads stay snapshot-isolated.
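
As a rough mental model, a snapshot can be pictured as a frozen mapping from table key to the sub-table version visible at one manifest version. The Python sketch below illustrates that idea only; the `ManifestHistory` class, its methods, and the `node:Person` / `edge:WorksAt` keys are hypothetical and not the engine's types.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Snapshot:
    version: int                # manifest version this snapshot is pinned to
    tables: MappingProxyType    # table key -> sub-table version (read-only)


class ManifestHistory:
    """Toy manifest: every publish appends one new graph-wide version."""

    def __init__(self):
        self._versions = [{}]   # version 0 is the empty graph

    def publish(self, changed):
        """Advance the given tables together and return the new version."""
        latest = dict(self._versions[-1])
        latest.update(changed)
        self._versions.append(latest)
        return len(self._versions) - 1

    def snapshot_at(self, version):
        """Build a snapshot for any past manifest version."""
        return Snapshot(version, MappingProxyType(dict(self._versions[version])))

    def head(self):
        return self.snapshot_at(len(self._versions) - 1)


history = ManifestHistory()
history.publish({"node:Person": 1})
held = history.head()                                  # a query pins this snapshot
history.publish({"node:Person": 2, "edge:WorksAt": 1})  # a concurrent commit lands
```

The snapshot held by the running query still reports `node:Person` at version 1 and does not see `edge:WorksAt`, while a fresh `history.head()` sees both tables at their new versions.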

## CLI

```bash
# Read a query against a past snapshot
omnigraph read --query ./q.gq --name find --snapshot <snapshot-id> s3://bucket/graph.omni
```

Time travel composes with branches: every branch has its own version history, and
you can read any branch at any of its past versions. Commits and the commit DAG
that these versions correspond to are described in
[branches & commits](index.md); diffing two versions is on the
[changes](changes.md) page.
49
docs/user/concepts/index.md
Normal file
|
|
@ -0,0 +1,49 @@
# Concepts

OmniGraph is a typed property-graph engine built as a coordination layer over the
[Lance](https://lance.org) columnar storage format. It gives you a schema-checked
graph with vector, full-text, and graph queries in one runtime, plus Git-style
branches and commits across the whole graph.

## The data model

- A graph has **node types** and **edge types**, declared in a
  [schema](../schema/index.md).
- Each node type and each edge type is stored as its **own Lance dataset** —
  columnar, versioned, on local disk or object storage.
- A single `__manifest` table coordinates all of those datasets, so the graph has
  one coherent version even though it spans many datasets.

This split is what lets a graph commit be **atomic across every type at once**: a
publish flips every relevant dataset's version together in one manifest write, so
readers never see a half-applied change. See [storage](storage.md) for the layout.

## Two layers: inherited vs. added

Throughout the docs, capabilities are framed as **L1** (inherited from Lance) or
**L2** (added by OmniGraph):

| | L1 — from Lance | L2 — added by OmniGraph |
|---|---|---|
| Storage | Columnar Arrow datasets on object storage | Per-type datasets coordinated as one graph |
| Versioning | Per-dataset versions + time travel | [Snapshots](../branching/time-travel.md) across all types at once |
| Branches | Per-dataset branches | [Graph-level branches](../branching/index.md), atomic across types |
| Commits | Per-dataset commits | [Commit DAG](../branching/index.md) for the whole graph; three-way [merge](../branching/merge.md) |
| Indexes | Scalar / vector / full-text indexes | Built per relevant column; graph topology index for traversal |
| Search | Vector + full-text primitives | [`nearest` / `bm25` / `rrf`](../search/index.md) in one query, plus graph traversal |
| Querying | — | The [`.gq` query language](../queries/index.md) and [`.pg` schema language](../schema/index.md) |

## How the pieces fit

- The **schema** (`.pg`) and **query** (`.gq`) languages are compiled to a typed
  intermediate representation.
- The **engine** runs queries and mutations against Lance, coordinates the manifest,
  maintains the commit graph, and builds indexes.
- The **CLI** ([`omnigraph`](../cli/index.md)) and the
  **HTTP server** ([`operations/server.md`](../operations/server.md)) are two front
  ends over the same engine, so embedded and remote behavior match.
- [Cedar policy](../operations/policy.md) enforcement is engine-wide — every writer
  goes through the same authorization gate regardless of front end.

For deployment-scale topics — multi-graph servers, control-plane operations,
recovery — see [clusters](../clusters/index.md).
|
|
@ -12,6 +12,8 @@ start with install, then follow the section that matches your task.
|
|||
| Goal | Read |
|
||||
|---|---|
|
||||
| Install OmniGraph | [install.md](install.md) |
|
||||
| Run the core loop end to end | [quickstart.md](quickstart.md) |
|
||||
| Understand the model | [concepts/index.md](concepts/index.md) |
|
||||
| Run the CLI | [cli/index.md](cli/index.md) |
|
||||
| Look up every CLI flag and config field | [cli/reference.md](cli/reference.md) |
|
||||
|
||||
|
|
@ -21,8 +23,9 @@ start with install, then follow the section that matches your task.
|
|||
|---|---|
|
||||
| Write schemas (the `.pg` language) | [schema/index.md](schema/index.md) |
|
||||
| Read schema-lint diagnostic codes | [schema/lint.md](schema/lint.md) |
|
||||
| Write queries and mutations (the `.gq` language) | [queries/index.md](queries/index.md) |
|
||||
| Use vector / full-text / hybrid search | [search/indexes.md](search/indexes.md) |
|
||||
| Write queries (the `.gq` language) | [queries/index.md](queries/index.md) |
|
||||
| Write data — inserts, updates, deletes | [mutations/index.md](mutations/index.md) |
|
||||
| Use vector / full-text / hybrid search | [search/index.md](search/index.md) |
|
||||
| Generate embeddings | [search/embeddings.md](search/embeddings.md) |
|
||||
| Build and use indexes | [search/indexes.md](search/indexes.md) |
|
||||
|
||||
|
|
@ -30,7 +33,9 @@ start with install, then follow the section that matches your task.
|
|||
|
||||
| Goal | Read |
|
||||
|---|---|
|
||||
| Work with branches, commits, and snapshots | [branching/index.md](branching/index.md) |
|
||||
| Work with branches and commits | [branching/index.md](branching/index.md) |
|
||||
| Read past versions (time travel) | [branching/time-travel.md](branching/time-travel.md) |
|
||||
| Merge branches and resolve conflicts | [branching/merge.md](branching/merge.md) |
|
||||
| Coordinate multi-query workflows | [branching/transactions.md](branching/transactions.md) |
|
||||
| Read diffs and change feeds | [branching/changes.md](branching/changes.md) |
|
||||
|
||||
|
|
@ -56,6 +61,7 @@ start with install, then follow the section that matches your task.
|
|||
|
||||
| Goal | Read |
|
||||
|---|---|
|
||||
| Understand the model and L1/L2 framing | [concepts/index.md](concepts/index.md) |
|
||||
| Understand graph layout and URI support | [concepts/storage.md](concepts/storage.md) |
|
||||
| Look up constants and tunables | [reference/constants.md](reference/constants.md) |
|
||||
|
||||
|
|
|
|||
52
docs/user/mutations/index.md
Normal file
|
|
@ -0,0 +1,52 @@
# Mutations

Write statements live inside a `query` declaration whose body is one or more
mutation statements (the [query language](../queries/index.md) covers the read
shape and shared declaration syntax).

```
query onboard($name: String, $title: String) {
    insert Person { name: $name, title: $title }
}
```

An edge type is inserted the same way — its endpoint columns are just
properties in the assignment block (`insert WorksAt { person: $p, org: $o }`).

## Statements

- `insert <Type> { prop: <value>, … }`
- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
- `delete <Type> where <prop> <op> <value>`

`<value>` is a literal, `$param`, or `now()`.

## Atomicity

A change query publishes **one commit** at the end of the query. Multiple
insert/update statements accumulate in memory and commit together — a mid-query
failure leaves the graph untouched. See [transactions](../branching/transactions.md)
for the per-query atomicity contract and [branches](../branching/index.md) for
multi-query workflows.

## Inserts/updates and deletes cannot mix in one query

A single change query must be **either insert/update-only or delete-only**.
Mixing the two is rejected at parse time, before any I/O:

> `mutation '<name>' on the same query mixes inserts/updates and deletes; split
> into separate mutations: (1) inserts and updates, then (2) deletes.`

Run two separate queries instead — the inserts/updates first, then the deletes.
The restriction exists because inserts/updates and deletes commit through
different paths today, and mixing them in one query creates ordering hazards
(e.g. a same-row insert-then-delete, or a cascading delete of a just-inserted
edge). Keeping the two kinds in separate queries keeps each one atomic and
correct.
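
The rule amounts to a simple check over the statement kinds in a query body. The Python sketch below shows one way such a parse-time check could look; the `check_mutation_kinds` helper and the list-of-keywords input are assumptions for illustration, and only the error text is taken from the message quoted above.

```python
class MixedMutationError(ValueError):
    """Raised when one query mixes inserts/updates with deletes."""


def check_mutation_kinds(name, statements):
    """Validate the statement keywords of one mutation query.

    `statements` is a list of keywords in source order, for example
    ["insert", "update"]. Nothing is executed; this is a pure check.
    """
    kinds = set(statements)
    unknown = kinds - {"insert", "update", "delete"}
    if unknown:
        raise ValueError(f"unknown statement kinds: {sorted(unknown)}")
    if "delete" in kinds and kinds & {"insert", "update"}:
        raise MixedMutationError(
            f"mutation '{name}' on the same query mixes inserts/updates and "
            "deletes; split into separate mutations: (1) inserts and updates, "
            "then (2) deletes."
        )


check_mutation_kinds("onboard", ["insert", "update"])   # accepted
check_mutation_kinds("offboard", ["delete", "delete"])  # accepted
```

A body such as `["insert", "delete"]` would raise `MixedMutationError` before any write is attempted, which mirrors the "before any I/O" behavior described above.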

## Bulk loading

For loading data from files rather than inline statements, use
[`omnigraph load`](../cli/index.md) (`--mode overwrite|append|merge`) — it is the
single bulk-write command and applies the same schema validation and atomic
publish as inline mutations.
|
|
@ -1,7 +1,46 @@
|
|||
# Audit / Actor tracking
|
||||
# Audit & Actor Tracking
|
||||
|
||||
- `Omnigraph::audit_actor_id: Option<String>` is the actor in effect.
|
||||
- `_as` variants of every write API let callers override the actor: `mutate_as`, `load_as`, `branch_merge_as`, `apply_schema_as`, etc.
|
||||
- Actor IDs are persisted on `GraphCommit.actor_id` with split storage in `_graph_commit_actors.lance` (the commit graph is split into `_graph_commits.lance` for the linkage and `_graph_commit_actors.lance` for the actor map).
|
||||
- HTTP server uses the bearer-token actor automatically. The CLI resolves one actor chain everywhere: `--as` > legacy `cli.actor` in `omnigraph.yaml` > `operator.actor` in `~/.omnigraph/config.yaml` > none (RFC-007).
|
||||
- Pre-v0.4.0 graphs also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0. The v2→v3 manifest migration sweeps any stale `__run__*` branches on first write-open (MR-770); the inert dataset bytes remain until a `delete_prefix` primitive lands.
Every write in OmniGraph records **who made it**. The actor id is persisted on the
graph commit, so the commit history is an audit trail of which actor changed the
graph and when.

## Where the actor comes from

The actor is resolved differently depending on the front end, but it always lands
on the commit:

- **HTTP server** — the actor is resolved **server-side from the bearer token**. A
  client cannot set its own actor id; it is derived from the authenticated token.
  See [policy](policy.md) for how tokens map to actors.
- **CLI / embedded** — the actor is self-declared through one resolution chain:

  1. `--as <actor>` on the command,
  2. then `operator.actor` in `~/.omnigraph/config.yaml` (see the
     [CLI reference](../cli/reference.md)),
  3. otherwise none.

This difference is intentional: storage credentials imply a self-declared actor,
while a server resolves the actor from a token it trusts.
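
The CLI chain is a first-match-wins lookup. As a small Python sketch of that precedence (the `resolve_cli_actor` function and the dict standing in for the parsed `~/.omnigraph/config.yaml` are hypothetical, not the CLI's code):

```python
def resolve_cli_actor(as_flag=None, operator_config=None):
    """Resolve the CLI actor: `--as` first, then `operator.actor`, else None.

    `operator_config` stands in for the parsed operator config file.
    """
    if as_flag:
        return as_flag
    configured = (operator_config or {}).get("operator", {}).get("actor")
    return configured or None


config = {"operator": {"actor": "bob"}}
flag_wins = resolve_cli_actor("alice", config)
from_config = resolve_cli_actor(None, config)
anonymous = resolve_cli_actor()
```

An explicit `--as` value always overrides the configured actor, and with neither source present the write carries no actor.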

## Reading the audit trail

Actor ids are stored on each commit in the [commit graph](../branching/index.md).
List commits to see who made each change:

```bash
omnigraph commit list graph.omni
```

System-initiated writes use reserved actor ids — for example, automatic recovery
of an interrupted write records `omnigraph:recovery`, so operator changes and
machine repairs are distinguishable in the history:

```bash
omnigraph commit list --filter actor=omnigraph:recovery graph.omni
```

## What is tracked

Every successful publish — load, change, branch merge, and schema apply — appends a
commit carrying the resolving actor. Because publishes are atomic, the actor on a
commit is exactly the actor responsible for that whole change.
|
|
|
|||
|
|
@ -13,8 +13,11 @@ query <name>($p1: T1, $p2: T2?, …)
|
|||
|
||||
Two body shapes:
|
||||
|
||||
- **Read**: `match { … } return { … } [order { … }] [limit N]`
|
||||
- **Mutation**: one or more of `insert | update | delete` statements
|
||||
- **Read**: `match { … } return { … } [order { … }] [limit N]` — covered on this page.
|
||||
- **Mutation**: one or more of `insert | update | delete` statements — see [mutations](../mutations/index.md).
|
||||
|
||||
Multi-modal search functions (`nearest`, `bm25`, `rrf`, …) used inside `match`,
|
||||
`return`, and `order` are documented on the [search](../search/index.md) page.
|
||||
|
||||
Param types reuse all schema scalars; trailing `?` makes a param optional. The compiler reserves `$__nanograph_now` for `now()`.
|
||||
|
||||
|
|
@ -25,21 +28,6 @@ Param types reuse all schema scalars; trailing `?` makes a param optional. The c
|
|||
- **Filter**: `<expr> <op> <expr>` with operators `>=`, `<=`, `!=`, `>`, `<`, `=`, and string `contains`.
|
||||
- **Negation**: `not { clause+ }` — desugars to anti-join over the inner pipeline.
|
||||
|
||||
## Search clauses (multi-modal)
|
||||
|
||||
Used inside MATCH or as expressions inside RETURN/ORDER:
|
||||
|
||||
| Function | Purpose | Underlying Lance facility |
|
||||
|---|---|---|
|
||||
| `nearest($x.vec, $q)` | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) |
|
||||
| `search(field, q)` | Generic FTS | Inverted index |
|
||||
| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | Inverted index |
|
||||
| `match_text(field, q)` | Pattern match | Inverted index |
|
||||
| `bm25(field, q)` | BM25 scoring | Inverted index |
|
||||
| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings |
|
||||
|
||||
`nearest()` requires a `LIMIT`; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input).
|
||||
|
||||
## RETURN clause
|
||||
|
||||
`return { <expr> [as <alias>], … }` with expressions:
|
||||
|
|
@ -48,7 +36,7 @@ Used inside MATCH or as expressions inside RETURN/ORDER:
|
|||
- Literals: string, int, float, bool, list
|
||||
- `now()`
|
||||
- Aggregates: `count`, `sum`, `avg`, `min`, `max`
|
||||
- All search functions above (so you can return a score column)
|
||||
- [Search functions](../search/index.md) (so you can return a score column)
|
||||
- `AliasRef` — re-use a previous projection alias
|
||||
|
||||
## ORDER & LIMIT
|
||||
|
|
@ -58,21 +46,8 @@ Used inside MATCH or as expressions inside RETURN/ORDER:
|
|||
- **Total, deterministic order.** Rows with equal user-sort keys are broken by the bound entities' key columns (`<var>.id`, ascending) appended as a final tie-break, so the result is a *total* order — reproducible across runs, and `order … limit N` returns a deterministic top-N even when ties straddle the cutoff. (Aggregate results have no entity-key columns; their group rows are already distinct on the projected group keys.)
|
||||
- **NULL placement** is *nulls-first ascending, nulls-last descending* (i.e. `nulls_first = !descending`): a NULL sorts as if smaller than any value.
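
The ordering rules above can be reproduced in a few lines. This Python sketch sorts rows on one user key with the same NULL placement and an `id` tie-break; the `order_rows` helper and the sample rows are hypothetical, and it models a single sort key only.

```python
def order_rows(rows, key, descending=False):
    """Sort dict rows by `key`, treating None as smaller than any value,
    with ties broken by `id` ascending.

    Python's sort is stable, also with reverse=True, so sorting by `id`
    first and by the user key second leaves ties in ascending `id` order.
    """
    by_id = sorted(rows, key=lambda r: r["id"])
    return sorted(
        by_id,
        # (False, 0) for NULLs sorts before (True, value) for real values.
        key=lambda r: (r[key] is not None, r[key] if r[key] is not None else 0),
        reverse=descending,
    )


rows = [
    {"id": "c", "age": 30},
    {"id": "a", "age": None},
    {"id": "b", "age": 30},
    {"id": "d", "age": 25},
]
ascending_ids = [r["id"] for r in order_rows(rows, "age")]
descending_ids = [r["id"] for r in order_rows(rows, "age", descending=True)]
```

Ascending order places the NULL row first (`a, d, b, c`) and descending places it last (`b, c, d, a`). In both directions the tied rows `b` and `c` keep ascending `id` order, so a `limit 1` over the descending result always returns `b`.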
|
||||
|
||||
## Mutation statements
|
||||
|
||||
- `insert <Type> { prop: <value>, … }`
|
||||
- `update <Type> set { prop: <value>, … } where <prop> <op> <value>`
|
||||
- `delete <Type> where <prop> <op> <value>`
|
||||
|
||||
`<value>` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0).
|
||||
|
||||
### D₂ — mixed insert/update + delete is rejected at parse time
|
||||
|
||||
A single mutation query must be **either insert/update-only or delete-only**. Mixed → rejected before any I/O with the message:
|
||||
|
||||
> `mutation '<name>' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).`
|
||||
|
||||
Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance v6.0.1 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until the MR-A Lance v7 bump migrates `delete_where` to staged (`DeleteBuilder::execute_uncommitted` first ships in `v7.0.0-beta.10`), the parse-time rejection keeps both paths atomic and correct. See [docs/dev/writes.md](../../dev/writes.md), [docs/dev/lance.md](../../dev/lance.md), and [docs/dev/invariants.md](../../dev/invariants.md).
|
||||
Write statements (`insert` / `update` / `delete`) are documented on the
|
||||
[mutations](../mutations/index.md) page.
|
||||
|
||||
## IR (Intermediate Representation)
|
||||
|
||||
|
|
|
|||
81
docs/user/quickstart.md
Normal file
|
|
@ -0,0 +1,81 @@
# Quickstart

This walks the core loop end to end: define a schema, initialize a graph, load
data, query it, and use a branch. It uses a local file-backed graph; swap the
path for an `s3://…` URI to run the same flow against object storage.

[Install](install.md) the `omnigraph` CLI first.

## 1. Write a schema

A schema (`.pg`) declares your node and edge types. Save this as `schema.pg`:

```
node Person {
    name: String,
    title: String?,
}
```

See the [schema language](schema/index.md) for types, constraints, and edges.

## 2. Initialize the graph

```bash
omnigraph init --schema schema.pg graph.omni
```

`init` creates an empty graph at the given URI with your schema applied.

## 3. Load data

`load` is the single bulk-write command. `--mode` is required
(`overwrite | append | merge`):

```bash
omnigraph load --data people.jsonl --mode overwrite graph.omni
```

`people.jsonl` is newline-delimited JSON, one record per line. For finer-grained
or inline writes, see [mutations](mutations/index.md).

## 4. Query

Write a query (`.gq`) — save as `queries.gq`:

```gq
query find_people($title: String) {
    match { $p: Person { title: $title } }
    return { $p.name }
}
```

Run it:

```bash
omnigraph read --query queries.gq --name find_people \
    --params '{"title":"Engineer"}' --format table graph.omni
```

The [query language](queries/index.md) covers `match`/`return`/`order`, and
[search](search/index.md) covers vector and full-text search.

## 5. Work on a branch

Branches isolate changes until you merge them — Git-style, across the whole graph:

```bash
omnigraph branch create review/new-hires graph.omni
omnigraph load --data new-hires.jsonl --mode append --branch review/new-hires graph.omni
# inspect the branch, then integrate it
omnigraph branch merge review/new-hires --into main graph.omni
```

See [branches & commits](branching/index.md) and [merging](branching/merge.md).

## Next steps

- [CLI reference](cli/reference.md) — every command and flag.
- [Schema language](schema/index.md) and [query language](queries/index.md).
- [Operating a cluster](clusters/index.md) and [running the server](operations/server.md)
  for multi-graph, multi-user deployments.
48
docs/user/search/index.md
Normal file
|
|
@ -0,0 +1,48 @@
# Search

OmniGraph runs vector, full-text, and hybrid search in the same runtime as graph
traversal — a single [query](../queries/index.md) can combine a vector `nearest`,
a `bm25` text score, and an `Expand` traversal. Search functions are used inside
`match` (to filter), or as expressions inside `return` / `order` (to score and
rank).

## Functions

| Function | Purpose | Backing index |
|---|---|---|
| `nearest($x.vec, $q)` | k-NN vector search (cosine) | vector index (IVF / HNSW) |
| `search(field, q)` | Generic full-text search | inverted (FTS) index |
| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | inverted index |
| `match_text(field, q)` | Pattern match | inverted index |
| `bm25(field, q)` | BM25 relevance scoring | inverted index |
| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default `k=60`) | fuses scored rankings |

- `nearest()` requires a `limit`. The query vector is resolved from the param map,
  or embedded from a text input at runtime via the configured
  [embedding client](embeddings.md).
- Scores and ranks propagate as ordinary columns, so you can `return` a score and
  `order` by it.

## Hybrid ranking with `rrf`

Reciprocal Rank Fusion combines two independent rankings (typically one vector and
one text) into a single fused ranking, without needing the two score scales to be
comparable. Rank each retrieval separately, then fuse:

```gq
query hybrid($q: String) {
    match { $d: Document { } }
    return {
        $d,
        rrf( nearest($d.embedding, $q), bm25($d.body, $q) ) as score
    }
    order { score desc }
    limit 10
}
```
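
For intuition, the standard Reciprocal Rank Fusion formula gives each item a score of `1 / (k + rank)` per ranking and sums those scores. The Python sketch below applies that textbook formula with `k = 60`; it is an assumption that OmniGraph's `rrf` follows it exactly, and the `rrf` helper and document ids here are hypothetical.

```python
def rrf(rank_a, rank_b, k=60):
    """Fuse two rankings (lists of ids, best first) with Reciprocal Rank Fusion.

    Each ranking contributes 1 / (k + position), positions starting at 1.
    Items missing from a ranking contribute nothing for that ranking.
    """
    scores = {}
    for ranking in (rank_a, rank_b):
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Highest fused score first; ids break exact ties deterministically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_ranking = ["d1", "d2", "d3"]   # e.g. ordered by vector similarity
text_ranking = ["d3", "d1", "d4"]     # e.g. ordered by text relevance
fused = rrf(vector_ranking, text_ranking)
fused_ids = [doc_id for doc_id, _ in fused]
```

Only positions matter, so the raw vector distances and text scores never need to share a scale. In this example `d1` ranks first because it is near the top of both lists, followed by `d3`, `d2`, and `d4`.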

## Indexes and embeddings

Search functions only work when the backing index exists — see
[indexes](indexes.md) for building vector and inverted indexes, and
[embeddings](embeddings.md) for generating the vectors `nearest` searches over.