mirror of
https://github.com/ModernRelay/omnigraph.git
synced 2026-06-18 02:24:27 +02:00
docs: onboarding-first README + in-repo agent skill + drop RustFS script (#257)
* docs: optimize README for dev onboarding; fix 0.7.0 staleness

The README's setup half drifted from the shipped 0.7.0 CLI and led with the heaviest path (Docker + RustFS). This reworks it for fast, correct onboarding:

README.md
- New zero-dependency "Your first graph in 60 seconds" hero: a fully copy-pasteable local file-backed loop (schema → init → load → query → branch).
- Add a correct "Serve it" section (cluster apply + omnigraph-server --cluster); the server is cluster-only on main, so the old positional-URI boot is gone.
- Demote the RustFS bootstrap to "rehearse the S3 path locally"; reframe the storage bullet as "filesystem or any S3-compatible store (AWS S3, R2, MinIO, RustFS)" — RustFS is a provider, not a storage class.
- Fix crate/MCP descriptions (query/mutate/load, not read/change/ingest).

docs/user/quickstart.md
- Fix the query example: `read --name <q> … <uri>` is removed — the query name is positional and the graph is addressed with `--store` (`omnigraph query find_people --query queries.gq --store graph.omni`).

scripts/local-rustfs-bootstrap.sh
- Convert to cluster mode: write a cluster.yaml (storage: s3://…), then validate → import → apply, load the fixture into the derived root with the now-required --mode, and serve with `omnigraph-server --cluster`. The old flow (`load` without --mode, `omnigraph-server <URI>` positional boot) no longer works on a cluster-only server.

* docs: move agent skill into the repo, add agent-setup snippet, drop rustfs script

skills/omnigraph
- The operational skill (formerly `omnigraph-best-practices` in the cookbooks repo) now lives with the engine it documents, co-versioned. Renamed to `omnigraph`; repository metadata repointed here.
- Broadened the description to trigger on intent — storing/retrieving/querying knowledge, agent memory, building a knowledge graph, operating Omnigraph — as well as on CLI/artifact sightings (stays ≤1024 chars).
- Install: `npx skills add ModernRelay/omnigraph@omnigraph`.

README
- New "Set it up with an AI agent" paste snippet: installs the skill, reads the docs (URL), browses the cookbooks, and asks the user about a use case before standing up a first graph.
- "Agent skill & starter graphs" section points at skills/omnigraph + cookbooks.

Drop scripts/local-rustfs-bootstrap.sh
- Not CI-tested (so it rotted: it broke on the cluster-only migration — positional server boot, load without --mode), demoed the now-optional S3 path, and was the most fragile artifact in the repo. Replaced with a "Testing against S3 locally" guide in deployment.md (docker run RustFS/MinIO + AWS_* env + cluster-on-S3). README/AGENTS references updated.
parent 05cb73eda6
commit ee4986e9a1
17 changed files with 2297 additions and 481 deletions
@@ -100,7 +100,7 @@ Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architec
| Audit / actor tracking | [docs/user/operations/audit.md](docs/user/operations/audit.md) |
| Error taxonomy and result serialization | [docs/user/operations/errors.md](docs/user/operations/errors.md) |
| Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) |
| Deployment (binary / container / RustFS bootstrap / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) |
| Deployment (binary / container / S3-local testing / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) |
| CI / release workflows | [docs/dev/ci.md](docs/dev/ci.md) |
| Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) |
| Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) |
@@ -192,7 +192,7 @@ cargo test -p omnigraph-engine --features failpoints --test failpoints # fault
cargo build -p omnigraph-server --features aws # AWS Secrets Manager bearer-token source
```

S3-backed tests (`s3_storage`, and the S3 paths in server/CLI system tests) **skip** unless `OMNIGRAPH_S3_TEST_BUCKET` + `AWS_*` (incl. `AWS_ENDPOINT_URL_S3` for non-AWS) are set; CI runs them against containerized RustFS. `scripts/local-rustfs-bootstrap.sh` stands up a local S3 environment.
S3-backed tests (`s3_storage`, and the S3 paths in server/CLI system tests) **skip** unless `OMNIGRAPH_S3_TEST_BUCKET` + `AWS_*` (incl. `AWS_ENDPOINT_URL_S3` for non-AWS) are set; CI runs them against containerized RustFS. To run RustFS/MinIO yourself, see [docs/user/deployment.md](docs/user/deployment.md) → *Testing against S3 locally*.
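
Because these tests skip instead of failing when the environment is incomplete, a quick preflight makes the skip visible. The function below is a minimal sketch, not part of the repo: `s3_test_env_missing` is a hypothetical name, and it assumes the skip check reads exactly `OMNIGRAPH_S3_TEST_BUCKET`, the AWS credential and region variables, and (for non-AWS stores) `AWS_ENDPOINT_URL_S3`. It prints each unset variable, one per line.

```bash
#!/usr/bin/env bash
# Hypothetical preflight (not shipped with Omnigraph): print every S3 test variable
# that is unset or empty. ASSUMPTION: this is the full set the S3-backed tests check;
# if a test still skips after this prints nothing, consult the test sources.
s3_test_env_missing() {
  # $1: "aws" for real AWS S3; anything else (default) means a non-AWS store such as
  # RustFS, MinIO or R2, which additionally needs AWS_ENDPOINT_URL_S3.
  local provider="${1:-s3-compatible}" required var value
  required="OMNIGRAPH_S3_TEST_BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION"
  [ "$provider" = "aws" ] || required="$required AWS_ENDPOINT_URL_S3"
  for var in $required; do
    # Indirect lookup that also works in POSIX sh: read the variable named by $var.
    eval "value=\${$var:-}"
    [ -n "$value" ] || printf '%s\n' "$var"
  done
  return 0
}

# Usage, right before the CI-equivalent gate:
#   missing="$(s3_test_env_missing)"
#   [ -z "$missing" ] || echo "S3 tests will skip; unset: $missing" >&2
#   cargo test --workspace --locked
```

An empty result means the S3 paths should run rather than skip, provided the variable list above matches what the tests actually read.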

CI does **not** run `clippy` or `rustfmt` as gates — but `cargo test --workspace --locked` is the exact gate, so run it before pushing. Two non-test CI checks: `scripts/check-agents-md.sh` (doc cross-link integrity — run it after moving/renaming docs) and OpenAPI drift (`crates/omnigraph-server/tests/openapi.rs` regenerates `openapi.json`; set `OMNIGRAPH_UPDATE_OPENAPI=1` to update the checked-in copy when a server/API change is intentional).

@@ -268,7 +268,8 @@ omnigraph policy explain --cluster ./company-brain --graph knowledge --actor act
| HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export, **cluster-only boot (RFC-011): always `--cluster <dir | s3://…>`, serving N graphs (N ≥ 1) under multi-graph routes + read-only `GET /graphs` enumeration + per-graph + server-level Cedar policies. Add/remove graphs via `cluster apply` and restart.** |
| CLI with config | — | two-surface config (team `cluster.yaml` dir + per-operator `~/.omnigraph/config.yaml`), scope addressing (`--store`/`--server`/`--cluster`/`--profile`/defaults, RFC-011), aliases, multi-format output (json/jsonl/csv/kv/table) |
| Audit / actor tracking | — | `_as` write APIs + actor map in commit graph |
| Local RustFS bootstrap | — | `scripts/local-rustfs-bootstrap.sh` one-shot S3-backed dev environment |
| Local S3 testing | — | run RustFS/MinIO + the `AWS_*` env; see [docs/user/deployment.md](docs/user/deployment.md) → *Testing against S3 locally* |
| Agent skill | — | `skills/omnigraph` — operational playbook for driving Omnigraph; install with `npx skills add ModernRelay/omnigraph@omnigraph` |

---

48
README.md
@@ -12,7 +12,7 @@ Hundreds of agents can enrich the graph on parallel isolated branches and change

- Git-style versioning & branching
- Multimodal retrieval (graph+vector/fts+filters) optimized for context assembly
- Object storage native (S3, RustFS)
- Runs on the local filesystem or any S3-compatible object store (AWS S3, R2, MinIO, RustFS)
- Native blob-as-data support (docs, images, videos, etc)
- VPC, On-prem, hybrid deployment
- [`Lance`](https://github.com/lance-format/lance) format as open storage layer
@@ -52,29 +52,45 @@ brew tap ModernRelay/tap
brew install ModernRelay/tap/omnigraph
```

For starter graphs and agent skills to bootstrap and operate Omnigraph, see [`ModernRelay/omnigraph-cookbooks`](https://github.com/ModernRelay/omnigraph-cookbooks).
## Set it up with an AI agent

## One-Command Local RustFS Bootstrap
Omnigraph is built to be set up by coding agents. Paste this into Claude Code,
Cursor, or any agent that can read a URL, install a package, and run a shell
command — it installs the skill, reads the docs, and walks you through setup for
your use case:

```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/local-rustfs-bootstrap.sh | bash
```text
Help me set up Omnigraph (a lakehouse-native graph engine for agents).

1. Install the Omnigraph skill so you operate it correctly:
   npx skills add ModernRelay/omnigraph@omnigraph
2. Read the docs at https://github.com/ModernRelay/omnigraph — start with
   docs/user/quickstart.md, then docs/user/clusters/index.md.
3. Skim the starter graphs and seed data in the cookbooks:
   https://github.com/ModernRelay/omnigraph-cookbooks
4. Ask me what I want to build (company brain, agent memory, dev graph,
   research / R&D layer, …). Then install the CLI, stand up a first graph for
   that use case, load a little data, and run a query so I can see it working.
```

That bootstrap:
Works with any agent that can browse a URL, install a package, and run a shell.

- starts RustFS on `127.0.0.1:9000`
- creates a bucket and S3-backed graph
- loads the checked-in context fixture
- launches `omnigraph-server` on `127.0.0.1:8080`
## Agent skill & starter graphs

Docker must be installed and running first.
This repo ships the [**`omnigraph` agent skill**](skills/omnigraph) — the
operational playbook (cluster mode, the two config surfaces, schema evolution,
query linting, data writes, branches, Cedar policy, and common gotchas) that
teaches a coding agent to drive Omnigraph correctly. Install it with:

The RustFS bootstrap prefers the rolling `edge` binaries and only falls back to
source builds when release assets are unavailable.
```bash
npx skills add ModernRelay/omnigraph@omnigraph
```

If a previous run left objects under the same graph prefix but did not finish
initializing the graph, rerun with `RESET_REPO=1` or set `PREFIX` to a new
value.
For ready-to-run graphs with real seed data (company brain, VC operating system,
pharma & industry intel),
[`ModernRelay/omnigraph-cookbooks`](https://github.com/ModernRelay/omnigraph-cookbooks)
is the fastest way to see Omnigraph shaped to a real domain. To rehearse the S3
path locally, see [deployment.md → Testing against S3 locally](docs/user/deployment.md#testing-against-s3-locally).

## Common Commands

@@ -129,49 +129,46 @@ shape above) — the simplest AWS architecture.
unvalidated** — boot is lock-free read-only so it should compose, but it
is not yet exercised by tests.

## One-Command Local RustFS Bootstrap
## Testing against S3 locally

The easiest local S3-backed deployment path is:
To exercise the S3 storage path without a cloud account, run any S3-compatible
store in Docker and point the standard `AWS_*` environment at it. RustFS is
shown; MinIO works the same way.

```bash
curl -fsSL https://raw.githubusercontent.com/ModernRelay/omnigraph/main/scripts/local-rustfs-bootstrap.sh | bash
docker run -d --name omnigraph-s3 -p 9000:9000 \
  -e RUSTFS_ACCESS_KEY=omnigraph -e RUSTFS_SECRET_KEY=omnigraph \
  -e RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true \
  rustfs/rustfs:latest /data

export AWS_ACCESS_KEY_ID=omnigraph AWS_SECRET_ACCESS_KEY=omnigraph \
  AWS_REGION=us-east-1 AWS_ENDPOINT_URL_S3=http://127.0.0.1:9000 \
  AWS_ALLOW_HTTP=true AWS_S3_FORCE_PATH_STYLE=true

# create the bucket once (any S3 client works)
aws --endpoint-url "$AWS_ENDPOINT_URL_S3" s3 mb s3://omnigraph-local
```

The bootstrap:
Now an `s3://…` URI works anywhere a graph or cluster root is expected. Root a
cluster on the bucket and serve it config-free:

- starts a local RustFS-backed object store
- creates a bucket and S3-backed Omnigraph graph
- loads the checked-in context fixture
- starts `omnigraph-server` on `127.0.0.1:8080`
```bash
# cluster.yaml
# version: 1
# storage: s3://omnigraph-local/clusters/demo
# graphs: { demo: { schema: schema.pg } }

Supported behavior:
omnigraph cluster validate --config .
omnigraph cluster import --config .
omnigraph cluster apply --config . --as you
omnigraph load --data seed.jsonl --mode merge \
  s3://omnigraph-local/clusters/demo/graphs/demo.omni
omnigraph-server --cluster s3://omnigraph-local/clusters/demo \
  --bind 127.0.0.1:8080 --unauthenticated
```

- downloads the rolling `edge` binary when one exists for the current platform
- otherwise clones `ModernRelay/omnigraph` and builds from source
- reuses an existing RustFS container if it is already running

Useful overrides:

- `WORKDIR=/path/to/state`
- `BUCKET=omnigraph-local`
- `PREFIX=graphs/context`
- `RESET_REPO=1` to delete an existing partially initialized graph prefix before recreating it
- `BIND=127.0.0.1:8080`
- `RUSTFS_CONTAINER_NAME=omnigraph-rustfs-demo`

The bootstrap expects:

- Docker
- `curl`
- either a matching release asset or a local Rust toolchain plus `git`

If `aws` is not installed, the script attempts a user-local AWS CLI install via
`python3 -m pip`. Docker Desktop or another Docker daemon must already be
running.

If a previous bootstrap left objects behind under the selected `PREFIX` but did
not finish initializing the graph, rerun with `RESET_REPO=1` or choose a new
`PREFIX`.
The same `AWS_*` contract applies to a production object store — swap the
endpoint and credentials. CI exercises this path against containerized RustFS.
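
The steps above can be scripted. The helpers below are a sketch under stated assumptions, not something the repo ships, and all three names are hypothetical. `wait_until` retries a probe because `docker run -d` returns before the store accepts requests (the removed bootstrap script polled `s3api list-buckets` for the same reason). `write_cluster_yaml` writes the file that the commented `cluster.yaml` lines describe. `graph_root` derives a graph's URI, assuming the `<storage>/graphs/<name>.omni` layout seen in the `load` example holds in general.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the local S3 rehearsal; none of these ship with Omnigraph.

# Retry a command until it succeeds.
# Usage: wait_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Write DIR/cluster.yaml in the shape shown in the commented example.
# Usage: write_cluster_yaml DIR STORAGE_ROOT GRAPH_NAME SCHEMA_FILE
write_cluster_yaml() {
  mkdir -p "$1"
  cat > "$1/cluster.yaml" <<EOF
version: 1
storage: $2
graphs: { $3: { schema: $4 } }
EOF
}

# Derive a graph's root from the cluster storage root (one trailing slash is trimmed).
# ASSUMPTION: the layout is <storage>/graphs/<name>.omni, as in the load example.
graph_root() {
  printf '%s/graphs/%s.omni\n' "${1%/}" "$2"
}

# Example flow, with the container running and AWS_* exported:
#   wait_until 30 2 aws --endpoint-url "$AWS_ENDPOINT_URL_S3" s3api list-buckets
#   aws --endpoint-url "$AWS_ENDPOINT_URL_S3" s3 mb s3://omnigraph-local
#   write_cluster_yaml . s3://omnigraph-local/clusters/demo demo schema.pg
#   omnigraph cluster validate --config . && omnigraph cluster import --config . \
#     && omnigraph cluster apply --config . --as you
#   omnigraph load --data seed.jsonl --mode merge \
#     "$(graph_root s3://omnigraph-local/clusters/demo demo)"
```

Keeping the flow in small functions, rather than one long script, means each step can be rerun on its own when a later step fails.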

## Container Deployment

@@ -53,10 +53,13 @@ query find_people($title: String) {
Run it:

```bash
omnigraph read --query queries.gq --name find_people \
  --params '{"title":"Engineer"}' --format table graph.omni
omnigraph query find_people --query queries.gq \
  --params '{"title":"Engineer"}' --format table --store graph.omni
```

The query name is positional; `--query` points at the `.gq` source and
`--store` addresses the graph's storage directly.

The [query language](queries/index.md) covers `match`/`return`/`order`, and
[search](search/index.md) covers vector and full-text search.

@@ -1,425 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

REPO_SLUG="${REPO_SLUG:-ModernRelay/omnigraph}"
SOURCE_REF="${SOURCE_REF:-main}"
RELEASE_CHANNEL="${RELEASE_CHANNEL:-edge}"
WORKDIR="${WORKDIR:-$PWD/.omnigraph-rustfs-demo}"
RUSTFS_CONTAINER_NAME="${RUSTFS_CONTAINER_NAME:-omnigraph-rustfs-demo}"
# Pinned to 1.0.0-beta.8 (2026-06-10), matching CI (.github/workflows/ci.yml).
# beta.4+ has a credentials-policy check that refuses to start when the
# access/secret keys are values it considers "default" (rustfsadmin/rustfsadmin
# here); this script passes RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
# below, so overriding RUSTFS_IMAGE to another tag is safe.
RUSTFS_IMAGE="${RUSTFS_IMAGE:-rustfs/rustfs:1.0.0-beta.8}"
RUSTFS_DATA_DIR="${RUSTFS_DATA_DIR:-$WORKDIR/rustfs-data}"
BUCKET="${BUCKET:-omnigraph-local}"
PREFIX="${PREFIX:-repos/context}"
BIND="${BIND:-127.0.0.1:8080}"
AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-rustfsadmin}"
AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-rustfsadmin}"
AWS_REGION="${AWS_REGION:-us-east-1}"
AWS_ENDPOINT_URL="${AWS_ENDPOINT_URL:-http://127.0.0.1:9000}"
AWS_ENDPOINT_URL_S3="${AWS_ENDPOINT_URL_S3:-$AWS_ENDPOINT_URL}"
AWS_ALLOW_HTTP="${AWS_ALLOW_HTTP:-true}"
AWS_S3_FORCE_PATH_STYLE="${AWS_S3_FORCE_PATH_STYLE:-true}"
FORCE_BUILD="${FORCE_BUILD:-0}"
RESET_REPO="${RESET_REPO:-0}"

REPO_URI="s3://$BUCKET/$PREFIX"
SERVER_LOG="$WORKDIR/omnigraph-server.log"
SERVER_PID_FILE="$WORKDIR/omnigraph-server.pid"
BIN_DIR=""
FIXTURE_DIR=""
AWS_BIN=""

log() {
  printf '==> %s\n' "$*"
}

die() {
  printf 'error: %s\n' "$*" >&2
  exit 1
}

need_cmd() {
  command -v "$1" >/dev/null 2>&1 || die "missing required command: $1"
}

repo_root_from_shell() {
  if [ -f "$PWD/Cargo.toml" ] && [ -f "$PWD/crates/omnigraph/tests/fixtures/context.pg" ]; then
    printf '%s\n' "$PWD"
    return 0
  fi

  if [ -n "${BASH_SOURCE[0]:-}" ] && [ -f "${BASH_SOURCE[0]}" ]; then
    local candidate
    candidate="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
    if [ -f "$candidate/Cargo.toml" ] && [ -f "$candidate/crates/omnigraph/tests/fixtures/context.pg" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  fi

  return 1
}

latest_release_tag() {
  local json
  json="$(curl -fsSL "https://api.github.com/repos/$REPO_SLUG/releases/latest" 2>/dev/null || true)"
  printf '%s' "$json" | sed -n 's/.*"tag_name":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

platform_asset_name() {
  local os arch
  os="$(uname -s)"
  arch="$(uname -m)"

  case "$os/$arch" in
    Linux/x86_64)
      printf 'omnigraph-linux-x86_64.tar.gz\n'
      ;;
    Darwin/arm64)
      printf 'omnigraph-macos-arm64.tar.gz\n'
      ;;
    *)
      return 1
      ;;
  esac
}

checksum_command() {
  if command -v shasum >/dev/null 2>&1; then
    printf 'shasum -a 256'
    return
  fi

  if command -v sha256sum >/dev/null 2>&1; then
    printf 'sha256sum'
    return
  fi

  die "missing checksum tool: expected shasum or sha256sum"
}

release_base_url() {
  case "$RELEASE_CHANNEL" in
    stable)
      printf 'https://github.com/%s/releases/latest/download\n' "$REPO_SLUG"
      ;;
    edge)
      printf 'https://github.com/%s/releases/download/edge\n' "$REPO_SLUG"
      ;;
    *)
      die "unsupported RELEASE_CHANNEL '$RELEASE_CHANNEL' (expected stable or edge)"
      ;;
  esac
}

verify_checksum() {
  local archive="$1"
  local checksum_file="$2"
  local expected actual tool

  expected="$(awk '{print $1}' "$checksum_file")"
  [ -n "$expected" ] || die "checksum file did not contain a SHA256 digest"

  tool="$(checksum_command)"
  actual="$($tool "$archive" | awk '{print $1}')"

  [ "$actual" = "$expected" ] || die "checksum verification failed for $(basename "$archive")"
}

ensure_aws_cli() {
  if command -v aws >/dev/null 2>&1; then
    AWS_BIN="$(command -v aws)"
    return
  fi

  need_cmd python3

  if ! python3 -m pip --version >/dev/null 2>&1; then
    python3 -m ensurepip --upgrade --user >/dev/null 2>&1 || die "aws cli not found and python3 pip bootstrap failed"
  fi

  log "Installing a user-local AWS CLI"
  python3 -m pip install --user awscli >/dev/null
  export PATH="$HOME/.local/bin:$PATH"

  command -v aws >/dev/null 2>&1 || die "aws cli installation succeeded but aws was not found on PATH"
  AWS_BIN="$(command -v aws)"
}

download_fixture_files() {
  local ref="$1"
  local fixture_target="$WORKDIR/fixtures"
  mkdir -p "$fixture_target"

  for file in context.pg context.jsonl; do
    curl -fsSL \
      "https://raw.githubusercontent.com/$REPO_SLUG/$ref/crates/omnigraph/tests/fixtures/$file" \
      -o "$fixture_target/$file" || return 1
  done

  FIXTURE_DIR="$fixture_target"
}

download_release_binaries() {
  local asset asset_stem archive_dir archive_path checksum_path base_url

  [ "$FORCE_BUILD" = "1" ] && return 1

  asset="$(platform_asset_name)" || return 1
  asset_stem="${asset%.tar.gz}"
  archive_dir="$WORKDIR/release"
  archive_path="$archive_dir/$asset"
  checksum_path="$archive_dir/$asset_stem.sha256"
  mkdir -p "$archive_dir" "$WORKDIR/bin"
  base_url="$(release_base_url)"

  log "Downloading release asset $asset"
  curl -fsSL \
    "$base_url/$asset" \
    -o "$archive_path" || return 1
  curl -fsSL \
    "$base_url/$asset_stem.sha256" \
    -o "$checksum_path" || return 1
  verify_checksum "$archive_path" "$checksum_path" || return 1
  tar -C "$WORKDIR/bin" -xzf "$archive_path" || return 1

  BIN_DIR="$WORKDIR/bin"
  if [ "$RELEASE_CHANNEL" = "stable" ]; then
    local tag
    tag="$(latest_release_tag)"
    [ -n "$tag" ] || return 1
    download_fixture_files "$tag" || return 1
  else
    download_fixture_files "main" || return 1
  fi
}

build_from_source() {
  local repo_root
  repo_root="${1:-}"

  if [ -z "$repo_root" ]; then
    need_cmd git
    need_cmd cargo

    repo_root="$WORKDIR/source"
    if [ ! -d "$repo_root/.git" ]; then
      log "Cloning $REPO_SLUG at $SOURCE_REF"
      git clone --depth 1 --branch "$SOURCE_REF" "https://github.com/$REPO_SLUG.git" "$repo_root"
    fi
  fi

  need_cmd cargo
  log "Building omnigraph binaries from source"
  (
    cd "$repo_root"
    cargo build --release --locked -p omnigraph-cli -p omnigraph-server
  )

  BIN_DIR="$repo_root/target/release"
  FIXTURE_DIR="$repo_root/crates/omnigraph/tests/fixtures"
}

setup_binaries() {
  local repo_root
  repo_root="$(repo_root_from_shell || true)"

  if [ -n "${OMNIGRAPH_BIN_DIR:-}" ]; then
    BIN_DIR="$OMNIGRAPH_BIN_DIR"
    if [ -n "${OMNIGRAPH_FIXTURE_DIR:-}" ]; then
      FIXTURE_DIR="$OMNIGRAPH_FIXTURE_DIR"
    elif [ -n "$repo_root" ]; then
      FIXTURE_DIR="$repo_root/crates/omnigraph/tests/fixtures"
    fi
  elif ! download_release_binaries; then
    if [ -n "$repo_root" ]; then
      build_from_source "$repo_root"
    else
      build_from_source
    fi
  fi

  [ -x "$BIN_DIR/omnigraph" ] || die "omnigraph binary not found in $BIN_DIR"
  [ -x "$BIN_DIR/omnigraph-server" ] || die "omnigraph-server binary not found in $BIN_DIR"
  [ -f "$FIXTURE_DIR/context.pg" ] || die "context fixture schema not found in $FIXTURE_DIR"
  [ -f "$FIXTURE_DIR/context.jsonl" ] || die "context fixture data not found in $FIXTURE_DIR"
}

start_rustfs() {
  mkdir -p "$RUSTFS_DATA_DIR"

  if docker ps --format '{{.Names}}' | grep -qx "$RUSTFS_CONTAINER_NAME"; then
    log "Reusing existing RustFS container $RUSTFS_CONTAINER_NAME"
    return
  fi

  if docker ps -a --format '{{.Names}}' | grep -qx "$RUSTFS_CONTAINER_NAME"; then
    log "Removing stopped RustFS container $RUSTFS_CONTAINER_NAME"
    docker rm -f "$RUSTFS_CONTAINER_NAME" >/dev/null
  fi

  log "Starting RustFS on $AWS_ENDPOINT_URL_S3"
  docker run -d \
    --name "$RUSTFS_CONTAINER_NAME" \
    -p 9000:9000 \
    -p 9001:9001 \
    -v "$RUSTFS_DATA_DIR:/data" \
    -e RUSTFS_ACCESS_KEY="$AWS_ACCESS_KEY_ID" \
    -e RUSTFS_SECRET_KEY="$AWS_SECRET_ACCESS_KEY" \
    -e RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true \
    "$RUSTFS_IMAGE" \
    /data >/dev/null
}

wait_for_rustfs() {
  local attempt
  for attempt in $(seq 1 30); do
    if "$AWS_BIN" --endpoint-url "$AWS_ENDPOINT_URL_S3" s3api list-buckets >/dev/null 2>&1; then
      return
    fi
    sleep 2
  done

  docker logs "$RUSTFS_CONTAINER_NAME" || true
  die "RustFS did not become ready"
}

ensure_bucket() {
  log "Ensuring bucket $BUCKET exists"
  "$AWS_BIN" --endpoint-url "$AWS_ENDPOINT_URL_S3" \
    s3api create-bucket --bucket "$BUCKET" >/dev/null 2>&1 || true
}

graph_prefix_has_objects() {
  local key_count
  key_count="$("$AWS_BIN" --endpoint-url "$AWS_ENDPOINT_URL_S3" \
    s3api list-objects-v2 \
    --bucket "$BUCKET" \
    --prefix "$PREFIX/" \
    --max-keys 1 \
    --query 'KeyCount' \
    --output text 2>/dev/null || true)"

  [ -n "$key_count" ] && [ "$key_count" != "None" ] && [ "$key_count" != "0" ]
}

reset_graph_prefix() {
  log "Removing existing objects under $REPO_URI"
  "$AWS_BIN" --endpoint-url "$AWS_ENDPOINT_URL_S3" \
    s3 rm "s3://$BUCKET/$PREFIX" --recursive >/dev/null
}

initialize_graph() {
  if "$BIN_DIR/omnigraph" snapshot "$REPO_URI" --json >/dev/null 2>&1; then
    log "Reusing existing graph at $REPO_URI"
    return
  fi

  if graph_prefix_has_objects; then
    if [ "$RESET_REPO" = "1" ]; then
      reset_graph_prefix
    else
      die "found existing objects under $REPO_URI but could not open an Omnigraph graph there. This usually means a previous bootstrap left a partially initialized prefix. Rerun with RESET_REPO=1 to delete that prefix and recreate it, or set PREFIX to a new value."
    fi
  fi

  log "Initializing graph at $REPO_URI"
  "$BIN_DIR/omnigraph" init --schema "$FIXTURE_DIR/context.pg" "$REPO_URI"

  log "Loading context fixture into $REPO_URI"
  "$BIN_DIR/omnigraph" load --data "$FIXTURE_DIR/context.jsonl" "$REPO_URI"
}

start_server() {
  mkdir -p "$WORKDIR"

  if [ -f "$SERVER_PID_FILE" ] && kill -0 "$(cat "$SERVER_PID_FILE")" >/dev/null 2>&1; then
    log "Stopping existing server process $(cat "$SERVER_PID_FILE")"
    kill "$(cat "$SERVER_PID_FILE")" >/dev/null 2>&1 || true
    sleep 1
  fi

  log "Starting omnigraph-server on $BIND"
  nohup "$BIN_DIR/omnigraph-server" "$REPO_URI" --bind "$BIND" >"$SERVER_LOG" 2>&1 &
  echo "$!" > "$SERVER_PID_FILE"
}

wait_for_server() {
  local bind_host bind_port health_host base_url
  bind_host="${BIND%:*}"
  bind_port="${BIND##*:}"
  health_host="$bind_host"
  if [ "$health_host" = "0.0.0.0" ]; then
    health_host="127.0.0.1"
  fi
  base_url="http://$health_host:$bind_port"

  for _ in $(seq 1 30); do
    if curl -fsSL "$base_url/healthz" >/dev/null 2>&1; then
      printf '%s\n' "$base_url"
      return
    fi
    sleep 1
  done

  cat "$SERVER_LOG" >&2 || true
  die "omnigraph-server did not pass /healthz"
}

print_summary() {
  local base_url="$1"

  cat <<EOF

Omnigraph local RustFS demo is up.

Server:
$base_url

Graph URI:
$REPO_URI

RustFS console:
http://127.0.0.1:9001

Useful commands:
curl -fsSL "$base_url/healthz"
curl -fsSL "$base_url/snapshot?branch=main"
"$BIN_DIR/omnigraph" snapshot "$REPO_URI" --json
tail -f "$SERVER_LOG"
kill \$(cat "$SERVER_PID_FILE")
docker logs -f "$RUSTFS_CONTAINER_NAME"

EOF
}

main() {
  need_cmd docker
  need_cmd curl
  docker info >/dev/null 2>&1 || die "docker is installed but the daemon is not reachable; start Docker Desktop or another daemon and rerun"

  export AWS_ACCESS_KEY_ID
  export AWS_SECRET_ACCESS_KEY
  export AWS_REGION
  export AWS_ENDPOINT_URL
  export AWS_ENDPOINT_URL_S3
  export AWS_ALLOW_HTTP
  export AWS_S3_FORCE_PATH_STYLE

  mkdir -p "$WORKDIR"

  setup_binaries
  ensure_aws_cli
  start_rustfs
  wait_for_rustfs
  ensure_bucket
  initialize_graph
  start_server
  print_summary "$(wait_for_server)"
}

main "$@"
414
skills/omnigraph/SKILL.md
Normal file
@@ -0,0 +1,414 @@
---
name: omnigraph
description: Store, retrieve, and query knowledge, memory, and relationships in an Omnigraph graph, and operate a local or remote Omnigraph deployment. Use when the user wants to capture or recall facts, notes, or entities, build or query a knowledge graph or agent memory, or run Omnigraph — and whenever you see Omnigraph CLI commands (omnigraph init/query/mutate/load/schema/lint/embed/branch/commit/login/profile/cluster), .pg schema or .gq query files, s3:// graph URIs, bearer-authed graph endpoints, 504 errors, or a cluster.yaml / omnigraph.yaml / ~/.omnigraph/config.yaml. Covers cluster-mode deployments (cluster.yaml plan/apply, omnigraph-server --cluster), the two config surfaces (cluster.yaml + ~/.omnigraph/config.yaml), schema evolution, query linting, data writes (mutate; load needs --mode/--from), branches, embeddings, Cedar policy, and remote ops. Especially important before schema apply (plan first), any load (--mode required), any .gq/.pg edit (lint after), or any remote write (verify via commit list).
license: MIT (see LICENSE at repo root)
compatibility: Requires omnigraph CLI >= 0.7.0 — the unified `load`, the two config surfaces (cluster.yaml + ~/.omnigraph/config.yaml), and cluster apply/serve all require 0.7.0.
metadata:
  author: ModernRelay
  version: "0.7.0"
  repository: https://github.com/ModernRelay/omnigraph
---

# Operating Omnigraph Locally

This skill captures the operational rules for working with a locally or remotely deployed Omnigraph. Follow them when authoring schema, writing queries, loading data, evolving schema, or automating graph operations.

## The Seven Rules

1. **Lint before commit** — `omnigraph lint --schema schema.pg --query queries/foo.gq` validates both sides against each other. No running repo required.
2. **Plan before apply** — never run `schema apply` without a successful `schema plan` first. Apply is destructive; plan is free. (Cluster mode has the same rule with different verbs: `cluster plan` before `cluster apply` — the plan embeds the engine's real migration steps.)
3. **Branches are for data; apply is for schema** — review bulk data loads on a feature branch then merge. Schema changes go straight to `main`: in cluster mode edit the `.pg` and run `cluster apply` (a direct `schema apply` **refuses** a cluster-managed graph); `schema plan`/`apply` is for a non-cluster store.
4. **Pick the right write command** — `mutate` for edits (typechecked, parameterized); `load` for bulk JSONL, local **or** remote, with a **required** `--mode` (`merge` upsert · `append` strict-insert · `overwrite` clean-slate). `load --from <base>` forks a review branch in one shot; bare `load` needs an existing target branch.
5. **Parameterize everything** — never string-interpolate values into `.gq` bodies or `--params`. Declare `$var: Type` and pass via `--params`.
6. **Expose agent operations as aliases** — not raw CLI invocations. Aliases decouple the operation name from the query implementation.
7. **Verify after every remote write** — compare `commit list --branch main` head before and after. The CLI's exit code is not authoritative on remote graphs; proxies can drop the response while the write commits server-side. See `references/remote-ops.md` for the verification ritual and how to recover from 504s.
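
A sketch of that before/after ritual, assuming the first non-empty line of your commit listing identifies the head of `main` (check what `omnigraph commit list` actually prints and adjust `branch_head` if not). `list_commits`, `branch_head`, and `verify_write` are hypothetical names; add whatever scope flags your setup uses, such as `--server` or `--profile`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Rule 7; not part of the omnigraph CLI.

# How to list commits on main. Add your own scope flags (e.g. --server or --profile).
list_commits() {
  omnigraph commit list --branch main
}

# ASSUMPTION: the first non-empty line of the listing identifies the head commit.
# awk reads all input (no early exit) so the producer never hits SIGPIPE.
branch_head() {
  list_commits | awk 'NF && !seen { print; seen = 1 }'
}

# Usage: verify_write WRITE_CMD [ARGS...]
# Succeeds only if the head of main moved, whatever the write's exit status was.
verify_write() {
  local before after
  before="$(branch_head)"
  if ! "$@"; then
    # A proxy may have dropped the response (e.g. a 504) after the commit landed.
    echo "write exited non-zero; checking the branch head anyway" >&2
  fi
  after="$(branch_head)"
  if [ -n "$after" ] && [ "$after" != "$before" ]; then
    printf 'write landed; head is now %s\n' "$after"
    return 0
  fi
  printf 'head unchanged (%s); treat the write as not committed\n' "$before" >&2
  return 1
}
```

Wrap your usual remote `mutate` or `load` invocation in `verify_write`. A concurrent writer can also move the head, so on a busy graph a moved head is evidence, not proof, that your own write landed.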
|
||||
|
||||
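Rule 7 can be scripted. The sketch below is hypothetical glue, not part of the CLI. `verify_write` runs any write command and ignores its exit status. It then compares the output of `omnigraph commit list --branch main --json` taken before and after the write. It assumes that any change in that output means a commit landed. It does not parse the JSON, so no field names are assumed. It also assumes the scope comes from your operator defaults or an exported `OMNIGRAPH_PROFILE`, so both calls hit the same graph. On a graph with other concurrent writers, someone else's commit can also change the output. In that case, treat "landed" as a prompt to inspect, not as proof.

```bash
#!/usr/bin/env bash
# Hypothetical helper for rule 7: verify a remote write by comparing main's
# commit list before and after, instead of trusting the CLI's exit code.
# Scope comes from operator defaults or an exported OMNIGRAPH_PROFILE.

main_commits() {
  omnigraph commit list --branch main --json
}

# Usage: verify_write <write command...>
# Returns 0 if main's commit list changed (the write landed), 1 if it did not,
# and 2 if the commit list itself could not be read.
verify_write() {
  local before after status=0
  before="$(main_commits)" || return 2   # no baseline, cannot verify
  "$@" || status=$?                      # advisory only: proxies can drop responses
  after="$(main_commits)" || return 2
  if [[ "$before" != "$after" ]]; then
    echo "write landed (command exit: $status)" >&2
    return 0
  fi
  echo "write did NOT land (command exit: $status); safe to retry" >&2
  return 1
}
```

For example, after `export OMNIGRAPH_PROFILE=staging`, run `verify_write omnigraph mutate retitle --query mutations.gq --params "$P"`. Retry only on a return of 1.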
## Essentials: Queries, Mutations, Loads

The patterns below cover the daily 80% — enough to write correct `.gq` and JSONL without leaving this file. The long tail (multi-hop, negation, aggregations, hybrid search, every decorator) is in [`references/queries.md`](references/queries.md) and [`references/schema.md`](references/schema.md).

**Comments in `.pg` and `.gq` are `//`, never `#`** (the #1 parse error).
### Read query (`.gq`)

```gq
query get_signal($slug: String) {
    match {
        $s: Signal { slug: $slug }  // inline property filter goes in the match block
        $s formsPattern $p          // edge FormsPattern declared PascalCase, traversed lowerCamelCase
    }
    return { $s.slug, $s.name, $p.slug }
}
```

- **Parameterize, never interpolate.** Declare `$var: Type` in the signature; pass via `--params '{"slug":"sig-foo"}'`. An empty signature still needs parens: `query foo() { ... }`.
- **Edge traversal is lowerCamelCase** even though the schema declares edges PascalCase (`FormsPattern` → `formsPattern`).
- **List/sort** by appending `order { $s.stagingTimestamp desc } limit 50` after `return`.
- **Ranking ops (`nearest`/`bm25`/`rrf`) require a trailing `limit N`** — omitting it is a compile error. They live in `order { }`, not as filters. Scope with `match`/filters first, then rank (`order { nearest($d.embedding, $q) } limit 10`).
### Mutation (`.gq`)

There is **no top-level `mutation { }`** — every block is a named `query`; the verb (`insert`/`update`/`delete`) makes it a write. Dispatch with `omnigraph mutate` (not `query`).

```gq
query add_signal($slug: String, $name: String, $brief: String, $createdAt: DateTime) {
    insert Signal { slug: $slug, name: $name, brief: $brief,
                    stagingTimestamp: $createdAt, createdAt: $createdAt, updatedAt: $createdAt }
}
query link($from: String, $to: String) { insert FormsPattern { from: $from, to: $to } }
query retitle($slug: String, $t: String) { update Signal set { name: $t } where slug = $slug }
query remove($slug: String) { delete Signal where slug = $slug }
```

- **Every non-nullable property must be supplied**, or lint fails (`T12: insert for 'Signal' must provide non-nullable property 'X'`).
- A single mutation is insert/update-only **or** delete-only, never both (parse-time D₂ rule); split them.
- Edges have no `@key`: identify the endpoints with `from`/`to` slugs. In JSONL, the edge's `data` block is `{}` when it has no properties.
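To follow the parameterization rule from a shell script, build the `--params` JSON with a JSON tool instead of pasting values into a string. This is a minimal sketch that assumes `jq` is installed. `add_signal_params` is a hypothetical helper for the `add_signal` mutation above. `jq --arg` always encodes its value as a JSON string, so quotes, backslashes, or newlines in a value cannot break the payload. The `DateTime` value is passed as an ISO-8601 UTC string, which is the form `mutate --params` expects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build --params for the add_signal mutation with jq.
# Every value goes through --arg, which JSON-encodes it as a string.
add_signal_params() {
  local slug="$1" name="$2" brief="$3"
  # DateTime params take ISO-8601 strings; default to "now" in UTC.
  local created_at="${4:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  jq -cn \
    --arg slug "$slug" --arg name "$name" --arg brief "$brief" \
    --arg createdAt "$created_at" \
    '{slug: $slug, name: $name, brief: $brief, createdAt: $createdAt}'
}
```

Usage: `omnigraph mutate add_signal --query mutations.gq --params "$(add_signal_params sig-foo 'Foo "v2"' 'Short brief')"`.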
### Bulk load (JSONL)

```jsonl
{"type":"Signal","data":{"slug":"sig-foo","name":"Foo","brief":"…","stagingTimestamp":"2026-04-14T00:00:00Z","createdAt":"2026-04-14T00:00:00Z","updatedAt":"2026-04-14T00:00:00Z"}}
{"edge":"FormsPattern","from":"sig-foo","to":"pat-bar","data":{}}
```

```bash
omnigraph load --data seed.jsonl --mode merge "$GRAPH"                              # --mode is REQUIRED (no default)
omnigraph load --data delta.jsonl --from main --branch review --mode merge "$GRAPH"  # fork a review branch in one shot
```

- `--mode`: `merge` (upsert by `@key`) · `append` (fails on collision) · `overwrite` (destructive, staged). `--from <base>` forks a missing `--branch`; bare `load` needs an existing branch. Works local **and** remote.
- **Date footgun**: `mutate --params` takes ISO strings (`Date` `"2026-04-29"`, `DateTime` `"…T00:00:00Z"`); `load` JSONL takes **integer days since epoch** for `Date` (`20572`) but ISO for `DateTime`.
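`load` JSONL wants `Date` values as integer days since the Unix epoch, while everything else uses ISO strings. Converting dates in the generator script avoids this footgun. This is a minimal sketch in pure bash arithmetic, so it behaves the same with GNU and BSD `date`. It uses Howard Hinnant's `days_from_civil` algorithm for the proleptic Gregorian calendar, and the function name is hypothetical. It agrees with the example above: `2026-04-29` gives `20572`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ISO date (YYYY-MM-DD) -> integer days since 1970-01-01,
# the form `load` JSONL expects for Date properties. DateTime stays ISO.
# Algorithm: Howard Hinnant's days_from_civil (proleptic Gregorian calendar).
iso_to_epoch_days() {
  local y m d
  IFS=- read -r y m d <<< "$1"
  y=$((10#$y)); m=$((10#$m)); d=$((10#$d))   # 10# keeps "08"/"09" from parsing as octal
  if (( m <= 2 )); then
    y=$((y - 1))                              # Jan/Feb count as months 11/12 of the previous year
  fi
  local era=$(( (y >= 0 ? y : y - 399) / 400 ))
  local yoe=$(( y - era * 400 ))              # year of era, 0..399
  local mp=$(( (m + 9) % 12 ))                # March = 0 ... February = 11
  local doy=$(( (153 * mp + 2) / 5 + d - 1 )) # day of year, 0..365
  local doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))  # day of era
  echo $(( era * 146097 + doe - 719468 ))     # 719468 = days from 0000-03-01 to 1970-01-01
}
```

Emit the result unquoted as the `Date` property's value in each JSONL `data` object.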
### Dispatching

```bash
omnigraph alias signal sig-foo                                                   # operator alias → its bound stored query (read or write)
omnigraph query get_signal --params '{"slug":"sig-foo"}'                         # served stored query by name (verb asserts read vs write)
omnigraph query -e 'query q() { match { $s: Signal } return { $s.slug } limit 5 }'  # ad-hoc/inline (or: --query f.gq <name>)
omnigraph mutate add_signal --query mutations.gq --params '{"slug":"sig-foo", ...}'  # name positional; ad-hoc file source
omnigraph lint --schema schema.pg --query queries/foo.gq                         # after EVERY .gq/.pg edit (no server needed)
```
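Lint needs no server, so it fits in a git pre-commit hook. This is a minimal sketch that assumes the layout used in this skill (`schema.pg` plus `queries/*.gq`). `lint_queries` is a hypothetical name. It lints every query file and reports all failures instead of stopping at the first one.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit helper: lint every .gq file against the schema.
# Returns non-zero if any file fails, after reporting all of them.
lint_queries() {
  local schema="${1:-schema.pg}" dir="${2:-queries}" failed=0 f
  for f in "$dir"/*.gq; do
    [[ -e "$f" ]] || continue   # no matches: the glob stays literal, skip it
    if ! omnigraph lint --schema "$schema" --query "$f"; then
      echo "lint failed: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

In `.git/hooks/pre-commit`, define the function and then run `lint_queries schema.pg queries || exit 1`.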
### `.gq` grammar

The non-obvious facts that bite, then the full grammar:

- **Scalar param types**: `String Bool I32 I64 U32 U64 F32 F64 DateTime Date Blob`. Modifiers: `T?` (optional), `[T]` (list), `Vector(N)`. There is **no `Int`**; use `I64`.
- **A read query needs `match` *and* `return`** (`order`/`limit` optional); a mutation has neither, only `insert`/`update`/`delete`.
- **`limit` takes an integer literal, not a param**: `limit 50`, never `limit $n`.
- **Variable-hop traversal**: `$p knows{1,3} $f` (`{1,}` = unbounded).
- **Literals & calls**: `now()`, `date("2026-04-29")`, `datetime("…T00:00:00Z")`, list `[…]`.
- **Filters** `= != > < >= <= contains`; **aggregates** `count/sum/avg/min/max` (`count($f) as n`).
- **Stored-query metadata**: `@description("…")` / `@instruction("…")` may follow the param list.
- **Casing**: type names uppercase-initial (`Signal`); idents/edges lowercase-initial (`formsPattern`); variables `$`-prefixed. `//` and `/* */` comments only.

Authoritative PEG grammar (pest) for `.gq` files ("NanoGraph" is the legacy engine name):

```pest
// NanoGraph Query Grammar (.gq files)

WHITESPACE = _{ " " | "\t" | "\r" | "\n" }
COMMENT = _{ LINE_COMMENT | BLOCK_COMMENT }
LINE_COMMENT = _{ "//" ~ (!"\n" ~ ANY)* }
BLOCK_COMMENT = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }

query_file = { SOI ~ query_decl* ~ EOI }

query_decl = {
    "query" ~ ident ~ "(" ~ param_list? ~ ")" ~ query_annotation* ~ "{"
    ~ query_body
    ~ "}"
}
query_annotation = { description_annotation | instruction_annotation }
description_annotation = { "@description" ~ "(" ~ string_lit ~ ")" }
instruction_annotation = { "@instruction" ~ "(" ~ string_lit ~ ")" }

query_body = { read_query_body | mutation_body }
mutation_body = { mutation_stmt+ }
read_query_body = {
    match_clause
    ~ return_clause
    ~ order_clause?
    ~ limit_clause?
}

mutation_stmt = { insert_stmt | update_stmt | delete_stmt }
insert_stmt = { "insert" ~ type_name ~ "{" ~ mutation_assignment+ ~ "}" }
update_stmt = { "update" ~ type_name ~ "set" ~ "{" ~ mutation_assignment+ ~ "}" ~ "where" ~ mutation_predicate }
delete_stmt = { "delete" ~ type_name ~ "where" ~ mutation_predicate }
mutation_assignment = { ident ~ ":" ~ match_value ~ ","? }
mutation_predicate = { ident ~ comp_op ~ match_value }

param_list = { param ~ ("," ~ param)* }
param = { variable ~ ":" ~ type_ref }

type_ref = { (list_type | base_type | vector_type) ~ "?"? }
list_type = { "[" ~ base_type ~ "]" }
vector_type = { "Vector" ~ "(" ~ integer ~ ")" }
base_type = { "String" | "Blob" | "Bool" | "I32" | "I64" | "U32" | "U64" | "F32" | "F64" | "DateTime" | "Date" }

match_clause = { "match" ~ "{" ~ clause+ ~ "}" }

clause = { negation | binding | traversal | filter | text_search_clause }
text_search_clause = { search_call | fuzzy_call | match_text_call }

// Binding: $p: Person { name: "Alice" }
binding = { variable ~ ":" ~ type_name ~ ("{" ~ prop_match_list ~ "}")? }

prop_match_list = { prop_match ~ ("," ~ prop_match)* ~ ","? }
prop_match = { ident ~ ":" ~ match_value }
match_value = { literal | variable | now_call }

// Traversal: $p knows $f
traversal = { variable ~ edge_ident ~ traversal_bounds? ~ variable }
traversal_bounds = { "{" ~ integer ~ "," ~ integer? ~ "}" }

// Filter: $f.age > 25
filter = { expr ~ filter_op ~ expr }

// Negation: not { ... }
negation = { "not" ~ "{" ~ clause+ ~ "}" }

// Return clause — projections separated by commas or newlines
return_clause = { "return" ~ "{" ~ projection+ ~ "}" }
projection = { expr ~ ("as" ~ ident)? ~ ","? }

// Order clause
order_clause = { "order" ~ "{" ~ ordering ~ ("," ~ ordering)* ~ "}" }
ordering = { nearest_ordering | (expr ~ order_dir?) }
nearest_ordering = { "nearest" ~ "(" ~ prop_access ~ "," ~ expr ~ ")" }
order_dir = { "asc" | "desc" }

// Limit clause
limit_clause = { "limit" ~ integer }

// Expressions
expr = { now_call | nearest_ordering | search_call | fuzzy_call | match_text_call | bm25_call | rrf_call | agg_call | prop_access | variable | literal | ident }
now_call = { "now" ~ "(" ~ ")" }
search_call = { "search" ~ "(" ~ expr ~ "," ~ expr ~ ")" }
fuzzy_call = { "fuzzy" ~ "(" ~ expr ~ "," ~ expr ~ ("," ~ expr)? ~ ")" }
match_text_call = { "match_text" ~ "(" ~ expr ~ "," ~ expr ~ ")" }
bm25_call = { "bm25" ~ "(" ~ expr ~ "," ~ expr ~ ")" }
rank_expr = { nearest_ordering | bm25_call }
rrf_call = { "rrf" ~ "(" ~ rank_expr ~ "," ~ rank_expr ~ ("," ~ expr)? ~ ")" }

prop_access = { variable ~ "." ~ ident }

agg_call = { agg_func ~ "(" ~ expr ~ ")" }
agg_func = { "count" | "sum" | "avg" | "min" | "max" }

comp_op = { ">=" | "<=" | "!=" | ">" | "<" | "=" }
filter_op = { "contains" | comp_op }

// Terminals
variable = @{ "$" ~ (ident_chars | "_") }
ident_chars = @{ (ASCII_ALPHA_LOWER | "_") ~ (ASCII_ALPHANUMERIC | "_")* }

// Edge identifier — lowercase start, same as ident but used in traversal context
// Must not match keywords
edge_ident = @{ !("not" ~ !ASCII_ALPHANUMERIC) ~ (ASCII_ALPHA_LOWER | "_") ~ (ASCII_ALPHANUMERIC | "_")* }

type_name = @{ ASCII_ALPHA_UPPER ~ (ASCII_ALPHANUMERIC | "_")* }
ident = @{ (ASCII_ALPHA_LOWER | "_") ~ (ASCII_ALPHANUMERIC | "_")* }

literal = { list_lit | datetime_lit | date_lit | string_lit | float_lit | integer | bool_lit }
date_lit = { "date" ~ "(" ~ string_lit ~ ")" }
datetime_lit = { "datetime" ~ "(" ~ string_lit ~ ")" }
list_lit = { "[" ~ (literal ~ ("," ~ literal)*)? ~ "]" }
string_lit = @{ "\"" ~ string_char* ~ "\"" }
string_char = @{ !("\"" | "\\") ~ ANY | "\\" ~ ANY }
float_lit = @{ ASCII_DIGIT+ ~ "." ~ ASCII_DIGIT+ }
integer = @{ ASCII_DIGIT+ }
bool_lit = { "true" | "false" }
```
## CLI Reference (condensed)

Notation: `<x>` required · `[x]` optional · `<a|b>` choice · `…` repeatable.

**Global addressing flags**: `--as <actor>` (direct/`--store` writes only; a server resolves the actor from its token), `--server <name|url>`, `--cluster <dir|uri>` (cluster-managed storage, for maintenance), `--graph <id>` (selects the graph within a `--server` or `--cluster` scope), `--profile <name>` (`$OMNIGRAPH_PROFILE`), `--store <uri>`. Data commands also take a positional `file://`/`s3://` URI (`--config <dir>` is for `cluster` commands only). Output: `--json`, or reads take `--format <json|jsonl|csv|kv|table>`. **Write guards:** `--yes` skips the confirm prompt for a destructive write (`cleanup`, overwrite `load`, `branch delete`) against a non-local scope (it *refuses* without it when non-TTY or `--json`); `--quiet` suppresses the resolved-target echo.

**Data plane** — `any` (served via `--server`/`--profile`, or direct via `--store`/URI):
- `query` (alias `read`) `<name>` — a **served stored query** by name (via `--server`/`--profile`); or ad-hoc `[<name>] (--query <f.gq> | -e '<GQ>')` where `<name>` picks which query in the source. `[--params <json> | --params-file <p>] [--branch <b> | --snapshot <id>] [--format <fmt> | --json]`. No positional URI; address via `--server`/`--store`/`--profile`.
- `mutate` (alias `change`) — same shape (served stored mutation by `<name>`, or ad-hoc `--query`/`-e`); `[--params …] [--branch <b>] [--json]`. The verb asserts kind: `query`→read, `mutate`→write (400 on mismatch).
- `alias <name> [args…]` — invoke an operator alias's bound stored query (read or write); `[--params … | --params-file <p>] [--format <fmt> | --json]` (server/graph/query come from the binding)
- `load --data <f.jsonl> --mode <overwrite|append|merge> [--branch <b>] [--from <base>] [--json]` — `--mode` required; `--from` forks a missing `--branch`
- `snapshot [--branch <b>] [--json]`
- `export [--branch <b>] [--type <T>…] [--table <K>…]` (streams JSONL)
- `branch <create <name> [--from <base>] | list | delete <name> | merge <source> --into <target>> [--json]`
- `commit <list [--branch <b>] | show <commit_id>> [--json]`
- `schema <plan | apply> --schema <f.pg> [--allow-data-loss] [--json]` · `schema show` (alias `get`) — `apply` **refuses a cluster-managed graph** (evolve those via `cluster apply`)

**Served only** (needs `--server`/`--profile`): `graphs list [--json]`

**Direct / storage** — reject `--server`; address by positional URI or `--cluster <dir|s3> --graph <id>`:
- `init --schema <f.pg> <uri> [--force]`
- `lint --query <f.gq> [--schema <f.pg>] [<uri>] [--json]` — offline with `--schema`, graph-backed with a URI
- `optimize [--json]` · `repair [--confirm] [--force] [--json]` · `cleanup (--keep <N> | --older-than <7d>) --confirm [--json]`
- `queries <validate [<uri>] | list> [--json]`

**Control plane** — cluster (`--config <dir>`, default `.`):
- `cluster <validate | plan | apply | status | refresh | import> [--config <dir>] [--json]`
- `cluster approve <resource> --as <actor> [--config <dir>] [--json]` · `cluster force-unlock <lock_id> [--config <dir>] [--json]`

**Local** (no graph):
- `policy <validate | test --tests <f> | explain --actor <a> --action <act> [--branch <b> | --target-branch <b>]> --cluster <dir> [--graph <id>]`
- `embed --seed <embed.yaml> [--reembed_all | --clean | --select "<Type>:<field>=<value>"]`
- `login <server> [--token <t>]` (prefer piping the token on stdin) · `logout <server>` · `profile <list | show [<name>]>` · `version`

Pre-0.7.0 spellings (`read`/`change`/`ingest`, `--target`, positional `http://`) → [`references/migrations.md`](references/migrations.md).
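The data-plane commands above combine into the review loop from rule 3. This is a sketch, not an official workflow. `review_load` is a hypothetical helper that uses only commands listed in this reference. It addresses the graph with `--store`; for a served graph, swap in `--server <name> --graph <id>`. It waits for a human `y` before merging. Per the write guards above, a non-interactive `branch delete` against a non-local scope also needs `--yes`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: load into a review branch, inspect, then merge on approval.
# Usage: review_load <store-uri> <data.jsonl> [branch]
review_load() {
  local store="$1" data="$2" branch="${3:-review}" answer
  # One shot: --from forks the missing --branch off main; --mode is required.
  omnigraph load --data "$data" --mode merge --from main --branch "$branch" \
    --store "$store" || return
  # Show the review branch's state so a human can check it.
  omnigraph snapshot --branch "$branch" --json --store "$store" || return
  read -r -p "Merge '$branch' into main? [y/N] " answer
  if [[ "$answer" != [yY] ]]; then
    echo "left '$branch' unmerged for review" >&2
    return 1
  fi
  omnigraph branch merge "$branch" --into main --store "$store" || return
  omnigraph branch delete "$branch" --store "$store"
}
```

If the store is remote, pair the merge with the rule 7 check, because a 504 here can hide a merge that actually landed.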
## Five Ontology Design Criteria (Gruber 1993)

Omnigraph schemas are ontologies. The canonical design criteria from Gruber's *Toward Principles for the Design of Ontologies Used for Knowledge Sharing* (Int. J. Human-Computer Studies 43:907–928) apply directly when authoring `.pg` files.

1. **Clarity** — definitions should communicate intended meaning unambiguously and be independent of social or computational context. In Omnigraph: precise type names, narrow enums over `String`, `@check`/`@range` for stated invariants. A reviewer should understand the domain from the schema alone.
2. **Coherence** — inferences sanctioned by the schema must be consistent with the domain modeled. Gruber's trap: defining quantity as a `(magnitude, unit)` pair makes `6 feet ≠ 2 yards` even though they describe the same length. In Omnigraph: watch for `@card`, `@unique`, and edge directionality that let the schema distinguish things the domain treats as equal.
3. **Extendibility** — the schema should support specialization without revising existing definitions. In Omnigraph: prefer interfaces for shared shape, leave enums open where the domain genuinely admits more, and model identifiers via mapping functions rather than baking units/formats into the entity.
4. **Minimal encoding bias** — the model should not depend on a particular encoding; representation choices made purely for notation or implementation convenience should not leak into it. In Omnigraph: don't type dates as `String` because the source API returns strings; separate conceptual entities (a publication date, a person) from their surface encoding (a year integer, a name string) when both matter.
5. **Minimal ontological commitment** — make as few claims about the world as the use case requires. In Omnigraph: don't add required properties, closed enums, or `@card(1..1)` "in case"; tighten later via `schema plan`/`apply` when a real constraint emerges. Weaker schemas leave consumers room to specialize.

The criteria trade off against each other: Clarity wants tight definitions, while Minimal Commitment wants weak ones. Gruber's resolution: *having decided a distinction is worth making, give it the tightest possible definition*. Decide what to model conservatively; once modeled, constrain precisely.
## Schema Authoring Principles

Twelve practical rules for `.pg` authoring, with full text and examples in [`docs/omni-schema.md`](../../docs/omni-schema.md). In short: schema-is-the-contract · explicit identity via `@key` · model meaning not tables · strong intentional types · deliberate optionality · shared shape in interfaces · schema-level constraints (`@unique`/`@index`/`@range`/`@check`/`@card`) · search as a schema decision · edge semantics matter · reviewable schemas · intentional migrations (`@rename_from`) · domain clarity over ORM habits.

Design flow: entities → stable keys → relationships worth their own edge → enum candidates → uniqueness/bounds/cardinality → search needs → shared shape into interfaces → evolution plan.
## Provenance Is Structural (Multi-Agent Source of Truth)

When Omnigraph serves as canonical truth across multiple agents, every assertion must answer *who said it, when, based on what evidence*. This is the runtime guarantee Gruber's criteria don't cover: his agents shared vocabulary, while ours must also share attribution. Provenance belongs in the schema, not in logs.

Without structural provenance, agents cannot reconcile contradictory assertions, retract facts when a source is discredited, replay graph state at a past timestamp, or distinguish high-evidence facts from speculation.

**In Omnigraph:** model provenance as a `Claim`-style interface (or a separate `Claim` node linked to each sourced fact) with required fields: `asserted_by: Actor`, `asserted_at: DateTime`, `evidence_source: Source`, and optionally `confidence: F64`. Don't stash provenance in a free-text `source: String` or a `metadata: JSON` dump. Structured provenance is queryable, indexable, and migratable; free-form provenance is none of these.
## Storage & Credentials

A graph's bytes live in one of two backends:

- **Local filesystem** — a path or `file://` URI. In cluster mode `storage:` defaults to the config directory, so local dev needs no object store.
- **S3-compatible object storage** — AWS, Railway, Tigris, etc. (`s3://bucket/prefix`). Authenticate with the standard `AWS_*` environment contract; keep dev creds in a git-ignored `.env.omni` and source it before CLI calls:

```bash
set -a && source .env.omni && set +a
```

`init` and `load` write storage directly (bypassing the server); the server reads from it. Check a running deployment with `curl http://127.0.0.1:8080/healthz`, then `omnigraph snapshot <graph-uri> --json`.
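When a script starts `omnigraph-server` and then runs checks, it should poll the health endpoint before sending the first request. This is a minimal sketch that assumes the server listens on the `127.0.0.1:8080` address used above. `wait_healthy` is a hypothetical helper. The retry count and the one-second interval are arbitrary choices.

```bash
#!/usr/bin/env bash
# Hypothetical readiness check: poll /healthz until it answers without an
# HTTP error, or give up after a fixed number of attempts.
wait_healthy() {
  local url="${1:-http://127.0.0.1:8080/healthz}" tries="${2:-30}" i
  for (( i = 1; i <= tries; i++ )); do
    # -f: non-zero exit on HTTP >= 400; -sS: no progress bar, but show errors
    if curl -fsS "$url" > /dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "no healthy response from $url after $tries attempts" >&2
  return 1
}
```

Usage: `omnigraph-server --cluster . &` followed by `wait_healthy && omnigraph snapshot <graph-uri> --json`.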
## Project Layout

### Deployment & access (omnigraph >= 0.7.0)

- **Cluster deployment — the only way to serve.** A `cluster.yaml` declares the whole deployment (graphs, schemas, stored queries, policies, optional S3 `storage:` root); `omnigraph cluster apply` converges it and `omnigraph-server --cluster .` (or `--cluster s3://bucket/prefix`, config-free) serves it. See `references/cluster.md`.
- **Direct / embedded access — no server.** Address a graph's storage directly with `--store <file://|s3:// uri>` or a positional URI for one-off CLI ops.

There is **no single-graph server mode**; the server is cluster-only.
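Here is rule 2 applied to the cluster path. This sketch rests on a few assumptions. `converge_cluster` is a hypothetical helper. It relies only on the `cluster validate | plan | apply` subcommands and the `--config <dir>` flag listed in the CLI reference. It pauses for a human `y` between plan and apply so that someone actually reads the plan.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate -> plan -> (human approval) -> apply.
converge_cluster() {
  local dir="${1:-.}" answer
  omnigraph cluster validate --config "$dir" || return
  omnigraph cluster plan --config "$dir" || return   # read this before answering
  read -r -p "Apply this plan? [y/N] " answer
  if [[ "$answer" != [yY] ]]; then
    echo "not applied" >&2
    return 1
  fi
  omnigraph cluster apply --config "$dir"
}
```

Usage: `converge_cluster . && omnigraph-server --cluster .`.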
### The two config surfaces (omnigraph >= 0.7.0)

Configuration has two single-owner homes (RFC-007/008), plus an everything-explicit flag/env tier:

| Surface | Owner | Location | Declares |
|---|---|---|---|
| **Cluster config** | the team, in the repo | `cluster.yaml` + the `.pg`/`.gq`/policy files it references | what the system **is**: graphs, schemas, queries, policies, storage |
| **Operator config** | one person | `~/.omnigraph/config.yaml` (`$OMNIGRAPH_HOME` relocates it) | who **I** am: identity, named servers, output defaults, personal aliases |
| Flags / env | per invocation | — | everything, explicitly |

```yaml
# ~/.omnigraph/config.yaml — per operator, never committed
operator:
  actor: act-andrew                  # default --as identity
servers:
  intel-dev:
    url: https://graph.example.com   # no tokens here, ever
defaults:
  output: table                      # read-format default
  server: intel-dev                  # default served scope (or `store: file://…/g.omni` for a local default — mutually exclusive)
  default_graph: spike               # graph within a server/cluster scope
profiles:                            # optional named scope bundles — pick with --profile <name>
  staging: { server: intel-staging, default_graph: spike }
aliases:                             # personal bindings to TEAM stored queries (see references/aliases.md)
  triage: { server: intel-dev, graph: spike, query: weekly_triage, args: [since] }
```

The operator config and credentials are **auto-discovered**, and no flag points at them. The CLI reads `$OMNIGRAPH_HOME/config.yaml` (default `~/.omnigraph/config.yaml`), and an absent file is just an empty layer (zero-config). `$OMNIGRAPH_HOME` relocates the *directory* only, not a specific file. (`--config`/`$OMNIGRAPH_CONFIG` is a separate flag for the cluster / server config, not this.)

Credentials live outside config: `echo "$TOKEN" | omnigraph login intel-dev` writes `~/.omnigraph/credentials` (`0600`); the matching token resolves via `OMNIGRAPH_TOKEN_INTEL_DEV` or that file.

**Addressing a graph**: `--store <file://|s3:// uri>` or a positional URI for direct storage; `--server <name|url>` (+ `--graph <id>`) for a served remote; `--profile <name>` for a named bundle; else the operator `defaults`. A remote is addressed with `--server` (a bare `http(s)://` URL is not a graph address). Run data-plane commands from a graph's project folder so relative `queries/`, `schema.pg`, and `.env.omni` paths resolve.
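In CI there is no interactive `login`, so the per-server environment variable is the natural place for a token. The naming rule here is inferred from the single example above, where `intel-dev` becomes `OMNIGRAPH_TOKEN_INTEL_DEV` (uppercase, `-` to `_`). How other characters map is an assumption, so check it against your server names. `token_var` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the token env var name for a named server.
# Inferred rule (from one documented example): uppercase, '-' -> '_'.
token_var() {
  local upper
  upper="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  printf 'OMNIGRAPH_TOKEN_%s\n' "${upper//-/_}"
}
```

Usage in a CI job, where `DEPLOY_TOKEN` is a hypothetical secret: `export "$(token_var intel-dev)=$DEPLOY_TOKEN"`.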
### What to commit

**Commit:** `schema.pg`, `queries/*.gq`, `cluster.yaml`, `seed.md`, `seed.jsonl`, and the project's `README.md` and `CLAUDE.md`.

**Ignore:** `.env.omni` (credentials), `.claude/` (local agent state), `*.omni/` (local graph artifacts), `__cluster/` and `graphs/` (cluster state + derived graph roots).
### Give agents a `CLAUDE.md`

A per-project `CLAUDE.md` tells coding agents where files live and what conventions matter. Without it, agents re-discover the same things every session.
## Common Gotchas

These are the traps most likely to bite. Scan this table before debugging any parse or runtime error.

| Trap | Symptom | Fix |
|------|---------|-----|
| `#` comments in `.pg` | `parse error: expected schema_file` | Use `//` |
| Standalone `enum Foo { ... }` block | `parse error: expected EOI or schema_decl` | Inline: `kind: enum(a, b)` |
| `[Category]` (list of enum) | compile error | Use `[String]`; lists must contain scalars |
| `@embed(text)` without quotes | `unexpected constraint_name` | `@embed("text")` |
| `@unique(src)` on edge without body block | parse error | `@card(1..1) { @unique(src) }` |
| `load --mode merge` after `@embed` source change | stale embeddings | `omnigraph embed --reembed_all` or `load --mode overwrite` |
| `schema apply` with feature branches open | rejected | Merge or delete branches first |
| `nearest(...)` / `bm25(...)` / `rrf(...)` without `limit` | compile error | Add `limit N` |
| Adding non-nullable property without backfill | unsupported migration | Make optional → backfill → tighten in follow-up apply |
| `omnigraph init --json` | `unexpected argument --json` | `init` doesn't support `--json`; drop the flag |
| `omnigraph init` on an already-initialized URI | `AlreadyInitialized` error (v0.6.0+) | `--force` to re-init (skips the schema preflight; does **not** purge data) |
| `schema apply` dropping a property/type | soft-dropped or rejected (no data loss) | Add `--allow-data-loss` to actually drop the column |
| Committing `.env.omni` | credential leak | Add `.env*` to `.gitignore` |
| Non-parameterized query values | typecheck surprise, injection risk | Declare `$param: Type` and pass via `--params` |
| Missing required field in `insert` | `T12: insert for 'X' must provide non-nullable property 'Y'` | Accept the param in the mutation signature |
| Long-lived feature branches | merge conflicts, schema apply blocked | Merge promptly; delete when done |
| `mutation { ... }` wrapper in `.gq` | `parse error: expected query_file` at line 1 | Use `query <name>(...) { insert T { ... } }`; there is no top-level `mutation` keyword |
| `--config` placed before subcommand | `unexpected argument --config` | Put `--config` **after** the subcommand (e.g. `omnigraph schema show --config X`) |
| Reading a large schema via stdout-capped tool | Truncated, garbled, or duplicated output | `omnigraph schema show > /tmp/schema.pg` first; then read the file with offset/limit |
| `omnigraph load` without `--mode` | error: `--mode` is required | Pass `--mode merge\|append\|overwrite`. There is no default (overwrite is destructive, so it is never implicit). `load` works against local and remote URIs |
| Blind retry after 504 | Duplicate Signal/Decision/Claim (append-only types lack `@key` dedup) | `commit list --branch main --json` first; head advanced means it landed; only retry if unchanged |
| `sync_branch()` mentioned in version-drift error | Searching for nonexistent CLI command | Server-internal directive in error text; just retry, and the next call re-pins to the new head |
| Stale empty branches at `main`'s head | 504-orphaned forks from a timed-out `load --from`; eventually block writes | List branches, find ones at `main`'s `graph_commit_id`, `omnigraph branch delete --config X <name>` |
| `omnigraph schema apply` / `init` on a cluster-managed graph | refused (bypasses the cluster ledger) | Evolve cluster graphs via `omnigraph cluster apply --config .`; `schema apply`/`init` are for a non-cluster store |
| `omnigraph optimize` against a table with a `Blob` property | table is **skipped**, not failed (Lance blob-v2 compaction bug) | Expected: `--json` reports it under `skipped`; non-blob tables still compact |
| `@unique` on a `[List]`/`Blob` column | `load` now errors loudly (was silently un-enforced before #160) | Use `@unique` only on scalar columns (and composite `@unique(a, b)`, now keyed as a true tuple); uniqueness needs a type that reduces to a scalar key |
## Deep Dives

For anything beyond the basics, load the relevant reference file. Each is self-contained, so load only what you need.

| Reference | When to load |
|-----------|--------------|
| [`references/cluster.md`](references/cluster.md) | Cluster-mode declarative deployments: `cluster.yaml`, the validate/import/plan/apply loop, approval-gated deletes, `--cluster` serving, the two-file contract, recovery |
| [`references/schema.md`](references/schema.md) | Editing `.pg` files, running `schema plan`/`apply`, renaming types, backfilling required fields |
| [`references/queries.md`](references/queries.md) | Writing or linting `.gq` files, search functions, aggregations, multi-hop patterns |
| [`references/data.md`](references/data.md) | Choosing between `mutate` and `load` (required `--mode`, `--from` to fork a review branch); branch review workflow; destructive ops |
| [`references/remote-ops.md`](references/remote-ops.md) | Operating against a remote/CloudFront-fronted graph: 504 verification ritual, version drift, fork-branch 504 fingerprints, append-only retry safety, operator `--server`/`login` targeting |
| [`references/search.md`](references/search.md) | Embeddings, `@embed`, vector/text ranking, scope-then-rank pattern |
| [`references/aliases.md`](references/aliases.md) | Defining aliases for agents, structured output, JSON args |
| [`references/stored-queries.md`](references/stored-queries.md) | Server-side stored-query registry: declared in `cluster.yaml`, `omnigraph queries validate/list`, `GET /graphs/{id}/queries` + `POST /graphs/{id}/queries/{name}`, `invoke_query` Cedar gating |
| [`references/server-policy.md`](references/server-policy.md) | Starting the HTTP server, routes, bearer auth, Cedar policy gating, multi-graph mode |
| [`references/commands.md`](references/commands.md) | `snapshot`, `export`, `commit list/show`, addressing & resolution |
| [`references/migrations.md`](references/migrations.md) | Migrating a pre-0.7.0 setup, or you hit an old config/command/flag/route/error and need its current form |
# Aliases & Agent Automation

## Contents
- What an alias is
- Operator alias schema
- Args binding & JSON-first parsing
- Default to structured output
- Alias naming convention
- Secrets don't belong in aliases
- Example alias set
- Invocation patterns

How to wire Omnigraph operations for agents and scripts.
## What an alias is

An **operator alias** decouples a stable **operation name** from its implementation, so an agent calling `omnigraph alias signal …` keeps working as the query evolves. Aliases live in `~/.omnigraph/config.yaml` and are personal *bindings* to a **stored query on a named server**. They carry no query content; the stored query in the cluster catalog is the team's contract.

```yaml
# ~/.omnigraph/config.yaml
aliases:
  triage:
    server: intel-dev       # an entry under servers:
    graph: spike            # optional (multi-graph servers)
    query: weekly_triage    # the STORED query's name — never a file
    args: [since]           # positional args → params, in order
    params: { limit: 20 }   # fixed defaults; positionals/--params win
    format: table
```

```bash
omnigraph alias triage 2026-06-01
# → POST <intel-dev>/graphs/spike/queries/weekly_triage with the keyed credential
```

> **Alias vs stored query.** The alias is *yours* (a personal name + defaults); the **stored query** it points at is the *team's*: declared in `cluster.yaml`, type-checked and served by the cluster (`GET /graphs/<id>/queries`, `POST /graphs/<id>/queries/<name>`, gated by `invoke_query`). See [`stored-queries.md`](stored-queries.md).
## Operator Alias Schema

```yaml
aliases:
  <alias-name>:
    server: <server-name>            # an entry under servers: in ~/.omnigraph/config.yaml
    graph: <graph-id>                # optional: for multi-graph servers
    query: <stored-query>            # the stored query's NAME (never a file path)
    args: [<name1>, <name2>]         # positional CLI args → named params, in order
    params: { <k>: <v> }             # fixed default params; positionals / --params win
    format: table|kv|csv|jsonl|json  # optional: output format
```

Dispatch with `omnigraph alias <name> [args]`, a single subcommand for read **and** write stored queries (a mutation alias is double-gated by `invoke_query` + `change`). Aliases live in their own namespace, so an alias can never shadow, or be shadowed by, a built-in verb.
### `args` bind to query parameters

If `args: [slug, name, age]`, then:

```bash
omnigraph alias foo sig-bar "Some Name" 29
```

...maps to `{"slug":"sig-bar","name":"Some Name","age":29}`.

### Args are JSON-first

Each arg, as the CLI receives it after your shell has removed its own quotes, is parsed as JSON first, then falls back to string:

- `29` → integer
- `"29"` → string (in a POSIX shell, type `'"29"'` so the double quotes reach the CLI)
- `true` → boolean
- `Alice` → string (JSON parse fails, falls back)
- `{"x":1}` → object

Explicit `--params '{...}'` wins on key conflict.

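The two rules above, plus the alias's `params:` defaults, can be modeled in a few lines of Python. This is an illustrative sketch, not the CLI's code: `parse_arg` and `bind_params` are hypothetical names, and rejecting surplus positionals is an assumption, since this page doesn't say how the CLI treats too many or too few args.

```python
import json
from typing import Any, Mapping, Optional, Sequence


def parse_arg(raw: str) -> Any:
    """JSON-first: try to decode the arg as JSON, else keep it as a plain string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def bind_params(
    arg_names: Sequence[str],
    positionals: Sequence[str],
    defaults: Optional[Mapping[str, Any]] = None,
    explicit: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Merge params lowest-to-highest: alias `params:` < positional args < --params."""
    if len(positionals) > len(arg_names):
        # Assumption: surplus positionals are an error in this sketch.
        raise ValueError(f"expected at most {len(arg_names)} args, got {len(positionals)}")
    params = dict(defaults or {})
    # zip pairs args with names in order; missing trailing args stay unbound here.
    params.update({name: parse_arg(raw) for name, raw in zip(arg_names, positionals)})
    params.update(explicit or {})  # explicit --params win on key conflict
    return params


# The `foo` example above:
print(bind_params(["slug", "name", "age"], ["sig-bar", "Some Name", "29"]))
# → {'slug': 'sig-bar', 'name': 'Some Name', 'age': 29}
```

Note that a value like `2026-06-01` is not valid JSON, so it falls back to the string `"2026-06-01"`.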
## Default to Structured Output

For scripts and agents, prefer `jsonl` or `json`; `table` is for humans. Set a default in `~/.omnigraph/config.yaml`:

```yaml
defaults:
  output: jsonl
```

You can also set it per alias (`format: jsonl`) or per call (`--format jsonl`).

### When to use which

- **`jsonl`** — one JSON object per line, first line is metadata; streams; ideal for agents
- **`json`** — pretty-printed JSON array; best for smaller results; human-readable
- **`kv`** — `key: value` per line; good for single-row lookups
- **`csv`** — for spreadsheets or analysis over many rows
- **`table`** — default human view; don't use in automation

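For agents that shell out to the CLI, a small reader for `jsonl` output could look like the Python sketch below. It relies only on the contract stated above (one JSON object per line, metadata first). The metadata's fields aren't documented here, so the sketch returns that object untouched. `read_jsonl_result` is a hypothetical helper.

```python
import json
from typing import Iterable, List, Tuple


def read_jsonl_result(lines: Iterable[str]) -> Tuple[dict, List[dict]]:
    """Split `--format jsonl` output into (metadata, rows)."""
    non_blank = (line for line in lines if line.strip())
    try:
        metadata = json.loads(next(non_blank))  # first line: metadata
    except StopIteration:
        raise ValueError("empty output: expected a metadata line first") from None
    rows = [json.loads(line) for line in non_blank]  # every later line is one row
    return metadata, rows


# Typical use (not executed here):
#   out = subprocess.run(["omnigraph", "alias", "signals", "--format", "jsonl"],
#                        capture_output=True, text=True, check=True).stdout
#   meta, rows = read_jsonl_result(out.splitlines())
```

Reading line by line keeps memory flat for large results, which is the point of choosing `jsonl` over `json` in automation.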
## Alias Naming Convention

Keep names short and hyphenated, matching the conceptual operation:

- `signal`, `pattern`, `element` — single lookup (typically paired with `format: kv`)
- `signals`, `patterns`, `elements` — list
- `signal-patterns`, `pattern-signals` — traversals
- `add-signal`, `link-forms-pattern` — mutations

## Secrets Don't Belong in Aliases

Credentials never live in an alias or any config file. For remote servers, `omnigraph login <server>` stores the bearer token in `~/.omnigraph/credentials` (`0600`); for S3-backed storage, AWS creds go in `.env.omni`. Aliases should contain only query names and parameter bindings — never tokens, passwords, or API keys.

## Example Alias Set

```yaml
# ~/.omnigraph/config.yaml
servers:
  intel-dev: { url: https://graph.example.com }
aliases:
  # Lookups (kv format for single-row readability)
  signal: { server: intel-dev, graph: spike, query: get_signal, args: [slug], format: kv }
  pattern: { server: intel-dev, graph: spike, query: get_pattern, args: [slug], format: kv }
  # Lists
  signals: { server: intel-dev, graph: spike, query: recent_signals }
  # Traversals
  pattern-signals: { server: intel-dev, graph: spike, query: pattern_signals, args: [slug] }
  # Mutations (stored mutation; invoke_query + change)
  add-signal: { server: intel-dev, graph: spike, query: add_signal, args: [slug, name, brief, stagingTimestamp, createdAt, updatedAt] }
  link-forms-pattern: { server: intel-dev, graph: spike, query: link_signal_forms_pattern, args: [signal, pattern] }
```

Each `query:` names a stored query the cluster serves; declare those queries in `cluster.yaml` and run `cluster apply` before using the aliases (see [`stored-queries.md`](stored-queries.md)).

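Because a mistake in an alias only surfaces at call time, a pre-flight check can help. Below is a hypothetical Python lint over the parsed config (load the YAML with any parser, e.g. PyYAML's `yaml.safe_load`). It checks only rules stated in this file: `server` must be declared under `servers:`, `query` is a stored-query name rather than a `.gq` path, `format` is one of the five listed values, and no keys outside the alias schema appear (which also catches a stray `token:`). It is not the CLI's own validation.

```python
from typing import List

ALIAS_KEYS = {"server", "graph", "query", "args", "params", "format"}
FORMATS = {"table", "kv", "csv", "jsonl", "json"}


def lint_aliases(config: dict) -> List[str]:
    """Return human-readable problems found in a parsed ~/.omnigraph/config.yaml."""
    problems: List[str] = []
    servers = config.get("servers") or {}
    for name, alias in (config.get("aliases") or {}).items():
        for key in sorted(set(alias) - ALIAS_KEYS):
            problems.append(f"{name}: unexpected key {key!r} (not part of the alias schema)")
        if alias.get("server") not in servers:
            problems.append(f"{name}: server {alias.get('server')!r} is not declared under servers:")
        query = alias.get("query")
        if not query:
            problems.append(f"{name}: missing query (a stored query's name)")
        elif query.endswith(".gq"):
            problems.append(f"{name}: query must name a stored query, not a file")
        fmt = alias.get("format")
        if fmt is not None and fmt not in FORMATS:
            problems.append(f"{name}: unknown format {fmt!r}")
    return problems
```

An empty list means the bindings are well-formed; whether each stored query actually exists on the server is only known once you call it (or list `GET /graphs/<id>/queries`).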
## Invocation Patterns

```bash
# Invoke an alias (read or write — the bound stored query decides)
omnigraph alias signal sig-kimi-k25
omnigraph alias add-signal sig-new "Name" "Brief" \
  2026-04-14T00:00:00Z 2026-04-14T00:00:00Z 2026-04-14T00:00:00Z

# Override output format
omnigraph alias signals --format jsonl

# Explicit --params (wins over positional args on key conflict)
omnigraph alias signal --params '{"slug":"sig-override"}'
```

The `alias` subcommand accepts `--params`/`--params-file`, `--format`/`--json`, and `--config`; the server, graph, and stored-query name come from the binding. For a different server or graph, or for a branch read, call `query`/`mutate` directly.

skills/omnigraph/references/cluster.md

# Cluster Mode — Declarative Deployments

## Contents

- The model
- The loop (validate → import → plan → apply → serve)
- The config contract (`cluster.yaml` vs `~/.omnigraph/config.yaml`)
- Serving (`--cluster`, config-free bucket boot)
- Recovery cheat-sheet

The cluster control plane (omnigraph >= 0.7.0) manages a whole deployment —
graphs, schemas, stored queries, Cedar policies — as **declared files in one
directory**, converged Terraform-style. It is the **only way to serve** a
graph (the server is cluster-only); the data-plane operations in the other
references work against the cluster's graphs unchanged.

## The model

```
company-brain/
├── cluster.yaml        # the deployment: graphs, schemas, queries, policies
├── schema.pg
├── queries/*.gq
├── *.policy.yaml
├── graphs/<id>.omni    # DERIVED — created by apply, never by hand (gitignore)
└── __cluster/          # ledger + catalog + approvals — local state (gitignore)
```

```yaml
# cluster.yaml
version: 1
# storage: s3://my-bucket/clusters/company-brain   # optional — put ledger,
#   catalog, and graph roots on S3 object storage (default: this folder)
state: { backend: cluster, lock: true }
graphs:
  knowledge:
    schema: schema.pg
    queries: queries/   # the .gq files ARE the declaration — every `query <name>` registers
policies:
  base: { file: base.policy.yaml, applies_to: [knowledge] }   # or [cluster] for server-level
```

`queries` also accepts a file list (`[a.gq, b.gq]`) or a fine-grained
`name: { file: ... }` map. Discovery is loud: unparseable files and duplicate
names across files fail validation.

## The loop (memorize this)

```bash
omnigraph cluster validate --config .              # parse + typecheck everything
omnigraph cluster import --config .                # one-time: create the state ledger
omnigraph cluster plan --config .                  # preview — REQUIRED reading before apply
omnigraph cluster apply --config . --as <you>      # converge (idempotent)
omnigraph-server --cluster . --bind 127.0.0.1:8080 --unauthenticated   # serve (local dev)
```

- **`apply` creates graphs** at `graphs/<id>.omni` — there is no separate
  `omnigraph init` in cluster mode.
- **Schema changes**: edit the `.pg`; `plan` shows the engine's real migration
  steps (`add_property`, `drop_property [soft]`, `unsupported: …`); `apply`
  migrates the live graph. **Soft drops only** — data-loss migrations are not
  reachable from cluster apply (prior versions retain dropped columns).
- **Applied changes are served after the next server restart.** There is no
  hot reload.
- **`storage: s3://bucket/prefix`** (optional) puts the entire cluster — state
  ledger, lock, content-addressed catalog, recovery sidecars, approval
  artifacts, and the derived graph roots (`<storage>/graphs/<id>.omni`) — on
  S3-compatible object storage. The ledger CAS uses S3 conditional writes and
  the lock becomes genuinely cross-machine. When `storage` is absent,
  everything defaults to the config directory (byte-compatible with
  pre-existing clusters). Credentials come from the standard `AWS_*` env
  contract, never `cluster.yaml`.
- **`--as <actor>` attributes every run** (sidecars, audit, engine commits).
  It defaults to your operator config's `operator.actor`, and it is required
  for `approve`.
- **Destructive changes are gated**: removing a graph from `cluster.yaml`
  blocks with `approval_required` until
  `omnigraph cluster approve graph.<id> --config . --as <you>` records a
  digest-bound approval. Any config/state drift after approving invalidates it.
- **Drift**: `cluster refresh` re-observes live graphs and marks out-of-band
  changes `drifted`; the next `apply` converges them back to the declaration.
- **Data is NOT the cluster's job**: rows flow through `omnigraph load` /
  `mutate` against the derived roots, with branches as usual.

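If you script this loop (in CI, say), keep the human review between `plan` and `apply`. The Python sketch below only assembles the argv lists from the commands above and gates `apply` behind a caller-supplied approval callback. `loop_commands`, `run_loop`, and the `first_run` flag are hypothetical names, and the sketch does not parse `plan` output.

```python
import subprocess
from typing import Callable, List


def loop_commands(config_dir: str, actor: str, first_run: bool = False) -> List[List[str]]:
    """argv lists for validate → (import) → plan → apply, in order."""
    cfg = ["--config", config_dir]
    steps = [["omnigraph", "cluster", "validate", *cfg]]
    if first_run:
        steps.append(["omnigraph", "cluster", "import", *cfg])  # one-time: create the state ledger
    steps.append(["omnigraph", "cluster", "plan", *cfg])
    steps.append(["omnigraph", "cluster", "apply", *cfg, "--as", actor])
    return steps


def run_loop(steps: List[List[str]], approve_plan: Callable[[], bool]) -> bool:
    """Run each step, stopping on the first failure; apply only if the plan was approved."""
    for argv in steps:
        if argv[2] == "apply" and not approve_plan():
            print("plan not approved; skipping apply")
            return False
        subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit
    return True
```

After a successful `apply`, restart the server to pick up the new revision, since there is no hot reload.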
## The config contract (do not blur this)

| File | Owns | Read by |
|---|---|---|
| `cluster.yaml` | the deployment: graph set, schemas, stored queries, policy bindings, storage | `cluster` commands; the `--cluster` server |
| `~/.omnigraph/config.yaml` | per-operator: identity (`operator.actor`), named `servers:`, output defaults, personal aliases | data-plane CLI commands (tokens live in `~/.omnigraph/credentials` via `omnigraph login`) |

Cluster commands read the operator config for **exactly one thing**: the actor
default when `--as` is omitted (`--as` > `operator.actor`). A `--cluster` server
reads it for **nothing**: a server boots from cluster state XOR the operator
file, never a merge of the two.

Address a cluster-managed graph's data directly with `--store <storage>/graphs/<id>.omni`,
or via `--server`/aliases against a serving instance — that is ergonomics, not
coupling.

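The actor rule above is a two-level fallback. A minimal Python sketch, assuming the operator config has already been parsed into a dict. `resolve_actor` is hypothetical, and raising only for `approve` reflects this page's note that an actor is required there; what other commands do without one is not specified.

```python
from typing import Optional


def resolve_actor(as_flag: Optional[str], operator_config: dict, command: str) -> Optional[str]:
    """--as wins; otherwise fall back to operator.actor from ~/.omnigraph/config.yaml."""
    # An empty --as value is treated as unset in this sketch.
    actor = as_flag or (operator_config.get("operator") or {}).get("actor")
    if actor is None and command == "approve":
        raise ValueError("cluster approve needs an actor: pass --as or set operator.actor")
    return actor
```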
## Serving

`omnigraph-server --cluster <dir>` is exclusive (cannot combine with a URI,
`--target`, or `--config`), always multi-graph (`/graphs/{id}/...`), and
fail-fast: missing, pending, or tampered state makes it refuse to boot, with
an error that names the remedy. Every declared query is exposed
(`GET /graphs/<id>/queries`, `POST /graphs/<id>/queries/<name>`); Cedar
bundles attach via `applies_to` (`cluster` → server-level gate incl.
`graph_list`; `graph.<id>` → that graph's gate incl. `invoke_query`). Bearer
tokens and the bind address stay process-level (env vars/flags).

**Config-free serving.** `--cluster` also accepts the storage-root URI
directly — `omnigraph-server --cluster s3://bucket/prefix` boots from the
applied revision on the bucket with **no checkout of the config repo**. The
ledger and catalog on the bucket are the whole deployment artifact; policy
bundles are served as digest-verified content from the catalog. The preferred
container shape is **bucket, no volume** (AWS ECS / Railway recipes in the
omnigraph repo's `docs/user/deployment.md`). For a mounted config directory
instead, `OMNIGRAPH_CLUSTER=<dir>` works and the image ships the CLI for
in-container `cluster apply`.

## Recovery cheat-sheet

| Symptom | Fix |
|---|---|
| Apply crashed mid-run | run `cluster apply` again — sidecars + sweep reconcile |
| Held lock | `cluster status` (shows lock id) → `cluster force-unlock <LOCK_ID> --config .` |
| Lost/corrupt `state.json` | `cluster import` rebuilds from config + live graphs, then `apply` |
| Server refuses to boot | the error names its remedy (usually `cluster refresh`, then `apply`, then restart) |
| `approval_stale` warning | re-run `cluster approve` — the plan changed since you approved |

Full reference: the omnigraph repo's `docs/user/clusters/index.md` (operator guide)
and `docs/user/clusters/config.md` (every key, flag, and diagnostic).

skills/omnigraph/references/commands.md

# Reference Commands

## Contents

- Inspect state (snapshot, export)
- Branches · commits · graphs
- Schema · lint · embed · init
- Load (bulk JSONL)
- Query / mutate
- Maintenance (optimize, cleanup)
- Stored queries
- Operator config & credentials
- Addressing a graph (resolution order)
- Output formats · health check
- Cluster control plane

A quick syntax reference for commands you'll reach for that don't need best-practice rules of their own.

## Inspect State

### `snapshot` — tables + row counts

```bash
omnigraph snapshot $REPO --branch main --json
```

Returns the manifest: all node/edge tables with row counts and versions. Use it to verify that a load succeeded or to see which types exist.

### `export` — full JSONL dump

```bash
omnigraph export $REPO --branch main > graph.jsonl
```

Streams all nodes and edges as JSONL, which makes it the right tool for inspecting a large snapshot. Don't try to page through the whole graph with read queries.

Filter by type:

```bash
omnigraph export $REPO --branch main --type Signal > signals.jsonl
```

## Branches

```bash
omnigraph branch create --from main <branch-name> --store $REPO
omnigraph branch list --store $REPO
omnigraph branch merge <branch-name> --into main --store $REPO
omnigraph branch delete <branch-name> --store $REPO
```

All support `--json`.

## Commits (History)

```bash
omnigraph commit list $REPO --branch main
omnigraph commit show $REPO <commit-id>
```

Inspect graph history; useful when investigating what changed between two points.

## Graphs (multi-graph servers)

```bash
omnigraph graphs list --config X --json
```

Lists the graphs a multi-graph server serves. Remote servers only (rejects local URIs); the server must expose `GET /graphs` via `server.policy.file`. See `references/server-policy.md`.

## Schema

```bash
omnigraph schema plan --schema next.pg $REPO --json
omnigraph schema apply --schema next.pg $REPO
```

See `references/schema.md` for the full workflow.

## Lint

```bash
omnigraph lint --schema schema.pg --query queries/foo.gq --json
# or against a live repo:
omnigraph lint --query queries/foo.gq $REPO --json
```

`lint` is the single query-validation command. See `references/queries.md`.

## Embed

```bash
omnigraph embed --seed embed-config.yaml                           # fill missing
omnigraph embed --seed embed-config.yaml --reembed_all             # regenerate all
omnigraph embed --seed embed-config.yaml --clean                   # delete
omnigraph embed --seed embed-config.yaml --select "Type:field=value"
```

See `references/search.md`.

## Init

```bash
omnigraph init --schema schema.pg $REPO
```

Creates a new standalone graph at `$REPO` with the given schema. To serve a graph, declare it in a `cluster.yaml` instead: `cluster apply` creates cluster-managed graphs, and `init` refuses them (see `references/cluster.md`).

**Strict by default (v0.6.0+):** `init` against a URI that already holds schema files errors with `AlreadyInitialized` instead of silently overwriting. Use `omnigraph init --force` to re-init deliberately. `--force` only skips the schema-file preflight — it does **not** purge existing Lance datasets.

**Note:** `init` does not accept `--json`. Drop the flag if you see `unexpected argument --json`.

## Load (bulk JSONL)

```bash
# bare load: operates on an existing branch (default main); --mode is required
omnigraph load --data seed.jsonl --mode merge $REPO

# --from forks a missing branch from <base>, then loads onto it (one-shot review branch)
omnigraph load --data delta.jsonl --branch feature-x --from main --mode merge $REPO
```

`--mode` is **required** (no default): `merge`, `append`, or `overwrite`. `load` works against direct storage (`--store` or a positional URI) **and** a served graph (`--server`). See `references/data.md`.

## Query / Mutate

```bash
omnigraph query get_signal --query queries/signals.gq --params '{"slug":"sig-foo"}'   # ad-hoc file; <name> is positional
omnigraph query get_signal --server intel-dev --params '{"slug":"sig-foo"}'           # served stored query by name
omnigraph mutate add_signal --query queries/mutations.gq --params '{"slug":"sig-foo",...}'
```

With aliases:

```bash
omnigraph alias signal sig-foo
omnigraph alias add-signal sig-foo "Name" "Brief" 2026-04-14T00:00:00Z 2026-04-14T00:00:00Z 2026-04-14T00:00:00Z
```

> `query` and `mutate` also accept inline source via `-e/--query-string '<gq>'` instead of `--query <file>`.

## Maintenance: Optimize & Cleanup (v0.6.1)

### `optimize` — non-destructive Lance compaction

```bash
omnigraph optimize $REPO --json
```

Compacts fragments and reclaims deleted-row space. It is non-destructive, so it's safe to run at any time. **Skips tables with a `Blob` property** (Lance blob-v2 compaction decode bug); skipped tables are reported in the `skipped` field of `--json` output and in logs. Non-blob tables compact normally. Blob-table fragment counts won't shrink until the upstream Lance fix lands; reads and writes are unaffected.

### `cleanup` — destructive version GC

```bash
omnigraph cleanup $REPO --keep 5 --older-than 7d --confirm
```

Garbage-collects old table versions, dropping time-travel reachability for anything pruned. **Destructive** — requires `--confirm`. Duration units for `--older-than`: `s`, `m`, `h`, `d`, `w`. Also reconciles orphaned per-table forks left by an interrupted `branch delete`.

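Because `cleanup` is irreversible, a wrapper script may want to validate `--older-than` itself before passing it on. A minimal Python sketch for the listed units, assuming the value is a single integer plus one unit (e.g. `7d`, `36h`). Whether the CLI also accepts compound values such as `1d12h` isn't stated here, so the sketch rejects them. `parse_older_than` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta, timezone

# Seconds per unit for the --older-than suffixes listed above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3_600, "d": 86_400, "w": 604_800}
_DURATION = re.compile(r"(\d+)([smhdw])")


def parse_older_than(value: str) -> timedelta:
    """Parse a single-unit duration such as '7d' or '90m'."""
    match = _DURATION.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"bad duration {value!r}: expected <integer><s|m|h|d|w>")
    amount, unit = match.groups()
    return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])


# Preview the age cutoff before running `cleanup ... --confirm`.
cutoff = datetime.now(timezone.utc) - parse_older_than("7d")
print("cutoff:", cutoff.isoformat())
```

The cutoff only reflects `--older-than`; how it combines with `--keep` is up to the CLI.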
## Stored Queries (v0.6.1)

```bash
omnigraph queries validate   # type-check the stored-query registry vs the live schema (offline; exits non-zero on drift)
omnigraph queries list       # list registry query names, MCP exposure, and typed params
```

`validate` opens the addressed graph and type-checks every applied stored query against the live schema, catching drift without restarting the server. `list` prints that graph's registry. Address the graph with `--store <uri>` or a positional URI. Both are distinct from `lint`, which validates a single `.gq` file. See `references/stored-queries.md`.

## Operator Config & Credentials

```bash
echo "$TOKEN" | omnigraph login <server>   # store a bearer token in ~/.omnigraph/credentials (0600)
omnigraph logout <server>                  # remove it (idempotent)
```

The operator config and `~/.omnigraph/credentials` are **auto-discovered — there is no flag to point at them.** `$OMNIGRAPH_HOME` relocates the `~/.omnigraph` *directory* (mainly for test isolation), and an absent file is just an empty layer (zero-config). Separately, `$OMNIGRAPH_CONFIG` stands in for the `--config` flag, which targets the **cluster directory / server config**, never the operator config. See SKILL.md → *The two config surfaces*.

## Addressing a Graph

How the CLI resolves which graph a data command (`query`, `mutate`, `load`, `branch`, …) runs against. A remote is addressed with `--server` (a bare `http(s)://` URL is not a graph address).

Precedence (highest first):

1. **`--store <uri>`** or a **positional `file://`/`s3://` URI** — direct storage access (bypasses any server; no catalog, so stored-query *names* don't resolve). `--store` is exclusive with a positional URI and with `--server`.
2. **`--server <name|url>`** (+ `--graph <id>` for a multi-graph server) — served/remote. A name resolves from `servers:` in `~/.omnigraph/config.yaml`; a literal `http(s)://` URL also works.
3. **`--profile <name>`** (or `$OMNIGRAPH_PROFILE`) — a named scope bundle from `profiles:` in the operator config (binds one of server/cluster/store + a default graph).
4. **Operator defaults** — `defaults.server` + `defaults.default_graph`, or `defaults.store` for a zero-flag local scope (mutually exclusive with `defaults.server`).

Control-plane commands use `--config <dir>` (cluster); maintenance against a cluster-managed graph uses `--cluster <dir|s3://> --graph <id>`. Each command declares a **capability** — `any` / `served` / `direct` / `control` / `local` — shown in `omnigraph --help`; mis-addressing (e.g. `--server` on a `direct` verb, or a remote URI to `optimize`) fails loudly.

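The precedence list reads as a first-match chain. The Python sketch below models it, including the exclusivity rules stated above and the rejection of bare `http(s)://` positionals. `Scope` and `resolve_scope` are hypothetical names; profile contents are not expanded, `--cluster` scopes are not modeled, the flag beating `$OMNIGRAPH_PROFILE` is assumed, and the final "nothing addressed" error is an assumption about behavior this page doesn't describe.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Scope:
    kind: str                    # "store" | "server" | "profile" | "defaults"
    target: str                  # URI, server name/URL, or profile name
    graph: Optional[str] = None  # --graph or defaults.default_graph, if any


def resolve_scope(
    store: Optional[str] = None,
    positional_uri: Optional[str] = None,
    server: Optional[str] = None,
    graph: Optional[str] = None,
    profile: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    defaults: Optional[Mapping[str, str]] = None,
) -> Scope:
    env = env or {}
    defaults = defaults or {}
    if store and positional_uri:
        raise ValueError("--store is exclusive with a positional URI")
    if store and server:
        raise ValueError("--store is exclusive with --server")
    if positional_uri and positional_uri.startswith(("http://", "https://")):
        raise ValueError("a bare http(s):// URL is not a graph address; use --server <url>")
    # 1. Direct storage access.
    if store or positional_uri:
        return Scope("store", store or positional_uri)
    # 2. Served/remote.
    if server:
        return Scope("server", server, graph)
    # 3. Named profile (assumed: the flag beats the env var).
    profile = profile or env.get("OMNIGRAPH_PROFILE")
    if profile:
        return Scope("profile", profile, graph)
    # 4. Operator defaults.
    if defaults.get("server") and defaults.get("store"):
        raise ValueError("defaults.server and defaults.store are mutually exclusive")
    if defaults.get("server"):
        return Scope("defaults", defaults["server"], graph or defaults.get("default_graph"))
    if defaults.get("store"):
        return Scope("defaults", defaults["store"], graph)
    raise ValueError("no graph addressed: pass --store, --server, or --profile")
```

For example, `resolve_scope(server="intel-dev", defaults={"store": "file:///tmp/g.omni"})` resolves to the server: an explicit flag always outranks operator defaults.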
For query source (`query`/`mutate`):

1. **`--query <file>`** or **`-e/--query-string '<gq>'`** — exactly one (operator aliases are invoked via the separate `alias` subcommand)
2. Relative `--query` paths resolve through **`query.roots`** in config

For params (highest first):

1. **Explicit `--params '{...}'`** wins on key conflict
2. **Positional alias args** map, in order, to the alias's `args` list
3. **Alias `params:`** supply fixed defaults

## Output Formats

`--format <fmt>` on query/mutate:

- `table` (default) — human-readable
- `kv` — `key: value` per line; good for single rows
- `csv` — comma-separated
- `jsonl` — NDJSON: one JSON object per line, with a metadata line first
- `json` — pretty JSON array

For admin commands (branch, commit, schema, policy), use `--json` for structured output; otherwise they print human-readable text.

## Health Check

```bash
curl http://127.0.0.1:8080/healthz
```

Returns `200 OK` if the server is up.

## Cluster Control Plane (omnigraph >= 0.7.0)

```bash
omnigraph cluster validate --config <dir>                          # parse + typecheck the declaration
omnigraph cluster import --config <dir>                            # one-time: create the state ledger
omnigraph cluster plan --config <dir> [--json]                     # preview (schema changes show migration steps)
omnigraph cluster apply --config <dir> --as <actor>                # converge; idempotent
omnigraph cluster approve <resource> --config <dir> --as <actor>   # gate destructive changes (graph deletes)
omnigraph cluster status --config <dir> [--json]                   # read the ledger (read-only)
omnigraph cluster refresh --config <dir>                           # re-observe live graphs; flags drift
omnigraph cluster force-unlock <LOCK_ID> --config <dir>            # clear a crashed run's lock (exact id from status)
```

Topology rule: `omnigraph schema apply` and `omnigraph init` **refuse a
cluster-managed graph** — in a cluster their jobs belong to `cluster apply`.
Data commands (`load`, `mutate`, branches) work either way — point them at the
derived root (`<dir>/graphs/<id>.omni`, or `<storage>/graphs/<id>.omni` for an
S3-backed cluster). See `references/cluster.md`.

skills/omnigraph/references/data.md

# Data Changes & Branches

## Contents

- Choose the right write command
- `mutate` — single edits
- `load` — bulk JSONL (`--mode`, `--from`)
- Branches: review before merge
- Destructive ops go through a branch
- Branch commands
- Inspecting state after changes

How to modify data safely in Omnigraph.

## Choose the Right Write Command

`load` is the one bulk-JSONL command — local **or** remote, against any
existing branch, with a **required** `--mode`. `mutate` is for single typed
edits.

| Task | Command | Why |
|------|---------|-----|
| Add/update a single entity | `mutate` with a named mutation | typechecked, parameterized, auditable |
| Bulk upsert by `@key` | `load --mode merge` | preserves rows not in the file |
| Additive-only bulk | `load --mode append` | fails on key collision |
| Clean-slate reseed | `load --mode overwrite` | **destructive** — wipes the branch |
| Bulk load onto a fresh review branch | `load --from main --mode merge --branch <name>` | forks `<name>` from `main`, loads onto it, leaves it for review |

> **`--mode` is required** — there is no default. Overwrite is destructive, so
> the CLI never picks a mode for you.
>
> **Local and remote are one command.** `load` works against a local repo URI
> (writing storage directly) *and* a remote `omnigraph-server` endpoint
> addressed with `--server` (the server orchestrates the write and publishes
> one atomic commit). See [`references/remote-ops.md`](remote-ops.md) for
> remote-specific concerns (504 handling, write-verification ritual).

## `mutate` — Single Edits

With no address flags, `mutate` targets your operator defaults, typically the running server and graph set by `defaults.server` + `defaults.default_graph`:

```bash
omnigraph mutate add_signal \
  --query mutations.gq \
  --params '{"slug":"sig-foo","name":"Foo","brief":"...","stagingTimestamp":"2026-04-14T00:00:00Z","createdAt":"2026-04-14T00:00:00Z","updatedAt":"2026-04-14T00:00:00Z"}'
```

Or via an alias:

```bash
omnigraph alias add-signal sig-foo "Foo" "..." 2026-04-14T00:00:00Z 2026-04-14T00:00:00Z 2026-04-14T00:00:00Z
```

Prefer `mutate` for interactive edits, mutations called from agents, and anything you want typechecked at call time.

## `load` — Bulk JSONL

JSONL format:

```jsonl
{"type":"Signal","data":{"id":"sig-foo","slug":"sig-foo","name":"Foo","brief":"...","stagingTimestamp":"2026-04-14T00:00:00Z","createdAt":"2026-04-14T00:00:00Z","updatedAt":"2026-04-14T00:00:00Z"}}
{"edge":"FormsPattern","from":"sig-foo","to":"pat-bar","data":{}}
```

- Nodes: `{"type":"<NodeType>","data":{...props...}}` — in this example, `id` equals `slug`
- Edges: `{"edge":"<EdgeType>","from":"<src_slug>","to":"<dst_slug>","data":{...edge_props...}}`

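When generating a seed file from code, emitting one compact object per line in exactly these shapes avoids hand-written JSON mistakes. A minimal Python sketch: the helper names are hypothetical, and it does no schema validation of its own, so every required property of your node types still has to be present.

```python
import json
from typing import Any, Iterable, Mapping, Optional


def node_line(node_type: str, data: Mapping[str, Any]) -> str:
    """One node record: {"type": ..., "data": {...props...}}."""
    return json.dumps({"type": node_type, "data": dict(data)}, separators=(",", ":"), ensure_ascii=False)


def edge_line(edge_type: str, src: str, dst: str, data: Optional[Mapping[str, Any]] = None) -> str:
    """One edge record: {"edge": ..., "from": src key, "to": dst key, "data": {...}}."""
    record = {"edge": edge_type, "from": src, "to": dst, "data": dict(data or {})}
    return json.dumps(record, separators=(",", ":"), ensure_ascii=False)


def to_jsonl(lines: Iterable[str]) -> str:
    """Join records with newlines, ending the file with a trailing newline."""
    return "".join(line + "\n" for line in lines)


ts = "2026-04-14T00:00:00Z"
seed = to_jsonl([
    node_line("Signal", {"id": "sig-foo", "slug": "sig-foo", "name": "Foo", "brief": "...",
                         "stagingTimestamp": ts, "createdAt": ts, "updatedAt": ts}),
    edge_line("FormsPattern", "sig-foo", "pat-bar"),
])
# Then e.g.: Path("seed.jsonl").write_text(seed, encoding="utf-8")
#            omnigraph load --data seed.jsonl --mode merge <graph>
```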
Load command:

```bash
omnigraph load --data seed.jsonl --mode merge s3://my-bucket/repos/spike-intel
```

`--from <base>` forks a missing `--branch` from `<base>` before loading (the
one-shot review-branch flow below). Without `--from`, the target `--branch`
(default `main`) must already exist.

### `--mode` semantics

- **`overwrite`** (destructive) — replaces every node/edge table on the branch with the file's contents. **Staged**: the loader validates node/edge constraints, referential integrity, and edge cardinality *before* any data moves, so a bad file fails before touching the branch. Safe on a **first** load; risky afterward. Don't run it against `main` in production without a branch backup path.
- **`merge`** (upsert) — for each row, insert if `@key` is new, update if it exists. Rows not in the file are preserved. The safe default for incremental bulk updates.
- **`append`** (strict insert) — fails on key collision. Use when you're certain every row is new.

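To make the three semantics concrete, here is a toy in-memory model in Python of a single table keyed by `@key`. It is not the engine: it ignores staging, constraint checks, and edges. Two behaviors are assumptions not covered above, namely that `merge` replaces a matching row wholesale (rather than patching individual fields) and that a later duplicate key in the same file wins.

```python
from typing import Dict, Iterable, Mapping

Table = Dict[str, dict]  # @key value -> row


def apply_load(table: Table, rows: Iterable[Mapping], key: str, mode: str) -> Table:
    """Return the table after a toy `load --mode <mode>`; the input table is not mutated."""
    incoming: Table = {}
    for row in rows:
        incoming[row[key]] = dict(row)  # assumption: later duplicates win
    if mode == "overwrite":
        return incoming  # every existing row is gone; only the file's rows remain
    if mode == "append":
        clashes = sorted(incoming.keys() & table.keys())
        if clashes:
            raise ValueError(f"append: keys already exist: {clashes}")
        return {**{k: dict(v) for k, v in table.items()}, **incoming}
    if mode == "merge":
        merged = {k: dict(v) for k, v in table.items()}  # rows not in the file are preserved
        merged.update(incoming)  # new keys inserted, existing keys replaced
        return merged
    raise ValueError(f"unknown mode {mode!r}: expected merge, append, or overwrite")
```

Like the CLI, the sketch takes `mode` with no default, so a destructive overwrite is never chosen implicitly.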
### `merge` does NOT recompute embeddings

If you change seed rows that feed into `@embed("source")` via `load --mode merge`, the source field updates but the embedding stays stale.

**Fix:** run `omnigraph embed --seed embed-config.yaml --reembed_all` afterward, or use `load --mode overwrite` once (which re-triggers embedding on load).

### `overwrite` is destructive

Wipes the entire branch's data for every node and edge type. Use only for:

- First-time seed
- Intentional full reseed on a feature branch
- Recovery scenarios

Never on `main` without a branch backup.

## Branches: Review Before Merge

Branches exist for **data review**, not schema changes. Schema goes straight to `main` via `plan` + `apply`.

### The review loop

```bash
REPO=s3://my-bucket/repos/spike-intel

# 1. Create feature branch from main
omnigraph branch create --from main staging-2026-04-14 --store $REPO

# 2. Load delta onto the branch (merge mode is typical for review)
omnigraph load --data delta.jsonl --branch staging-2026-04-14 --mode merge $REPO

# 3. Verify on the branch (reads can target --branch or --snapshot)
omnigraph query recent_signals --query queries/signals.gq --branch staging-2026-04-14 --store $REPO

# 4. Merge to main when happy
omnigraph branch merge staging-2026-04-14 --into main --store $REPO

# 5. Optionally delete the branch
omnigraph branch delete staging-2026-04-14 --store $REPO
```

### Fork a branch in one shot with `--from`

- Bare `load` operates on an existing branch (default `main`).
- `load --from main --branch <name>` forks `<name>` from `main`, loads onto it, and leaves it for review, covering steps 1 and 2 of the review loop in one command.

Use `--from` for anything you want reviewed before it touches `main`.

### Keep branches short-lived

Long-lived branches compound merge risk. The usual flow is create → load → verify → merge → delete, all in the same session. A week-old feature branch is a yellow flag.

### Schema apply blocks non-main branches

`omnigraph schema apply` rejects the request if any non-main branches exist. Merge or delete them first. This is enforced, not just a guideline.

## Destructive Ops Go Through a Branch

For any bulk load that could disrupt downstream queries (overwriting a heavily-referenced node type, removing edges en masse, reseeding a core table), use a feature branch:

```bash
omnigraph load --data risky.jsonl --branch recovery-2026-04-14 \
  --from main --mode overwrite $REPO
# inspect, diff, verify reads
omnigraph branch merge recovery-2026-04-14 --into main --store $REPO
```

## Branch Commands (quick reference)

```bash
omnigraph branch create --from main <branch-name> --store $REPO
omnigraph branch list --store $REPO
omnigraph branch merge <branch-name> --into main --store $REPO
omnigraph branch delete <branch-name> --store $REPO
```

All support `--json` for automation-friendly output. Address the graph with
`--store <uri>` (shown), a positional `file://`/`s3://` URI, or `--server <name>`.

## Inspecting State After Changes

```bash
omnigraph snapshot $REPO --branch main --json        # tables + row counts
omnigraph export $REPO --branch main > graph.jsonl   # full JSONL dump
omnigraph commit list $REPO --branch main --json     # history
```

`export` is the right tool for large-snapshot inspection — don't try to page through the whole graph with read queries.

> **Cluster note:** everything in this file applies unchanged in cluster
> deployments — the control plane owns schema/queries/policies; rows, loads,
> and branches stay on the data plane against the derived graph roots
> (`<dir>/graphs/<id>.omni`, or `<storage>/graphs/<id>.omni` for an S3-backed
> cluster).
65
skills/omnigraph/references/migrations.md
Normal file

# Migration & Deprecations (pre-0.7.0 → 0.7.0)

The rest of this skill teaches the **current 0.7.0 surface only**. Consult this page solely when you meet an old config file, command, flag, route, or error and need its current form. Pre-0.7.0 spellings keep working as deprecated aliases (they print a warning) unless marked **removed**.

## Config files

| Before (pre-0.7.0) | Now (0.7.0) |
|---|---|
| `omnigraph.yaml` (one combined file) | **`cluster.yaml`** (team deployment) + **`~/.omnigraph/config.yaml`** (operator) |
| `cli.actor` | `operator.actor` |
| `cli.graph` / `server.graph` | `defaults.default_graph` (+ `defaults.server`) |
| `targets:` / `target:` | `graphs:` / `graph:` |
| `omnigraph init` scaffolds `omnigraph.yaml` | `init` scaffolds nothing — start a `cluster.yaml` from [`cluster.md`](cluster.md) |

- **`omnigraph.yaml` is fully removed in 0.7.0** — no CLI command or server reads it, and there is **no `config migrate`**. Move team settings to `cluster.yaml` and personal settings (identity, `servers:`, `defaults:`, `aliases:`) to `~/.omnigraph/config.yaml` by hand.

## CLI addressing (RFC-011)

| Before | Now |
|---|---|
| `--target <name>` | **removed** — use `--server <name\|url>`, `--store <uri>`, or `--profile <name>` (SKILL.md → *Addressing a graph*) |
| positional `http(s)://` URL → a server | **removed** — address a remote with `--server <url>` |
| `--as` on a served (remote) write | no-op — the server resolves the actor from the bearer token (`--as` applies to direct `--store` writes) |
| `--cluster-graph <id>` | **removed** — `--cluster <dir\|uri>` is a global scope; pick the graph with `--graph <id>`. `--graph` now selects within a `--server` *or* `--cluster` scope |
| `query`/`mutate` `--name <q>` + positional graph URI / `--uri` | **removed** — the query name is the **positional** (`omnigraph query <name>`): a bare `<name>` invokes a served stored query (kind-asserted), `--query`/`-e` is the ad-hoc lane. Address the graph via `--server`/`--store`/`--profile` (not a positional URI on query/mutate) |

## Server boot & schema (RFC-011)

| Before | Now |
|---|---|
| `omnigraph-server <URI>` / `--config omnigraph.yaml` / `--target` / single-graph flat routes | **removed** — the server is **cluster-only**: `omnigraph-server --cluster <dir\|s3://>`; all HTTP is nested under `/graphs/<id>/...` (flat routes → 404) |
| `omnigraph schema apply` on a cluster-managed graph | **refused** — evolve cluster graphs via `cluster apply` (the ledger). `schema apply` still works on a non-cluster store or via `--server` |
| `policy …` / `queries validate` via `--config omnigraph.yaml` | `policy validate\|test\|explain` reads `--cluster <dir>` (+ `--graph`); `queries validate` takes the store URI |

## CLI verbs

| Before | Now |
|---|---|
| `omnigraph ingest …` | `omnigraph load --from main --mode merge …` |
| `omnigraph read` | `omnigraph query` |
| `omnigraph change` | `omnigraph mutate` |
| `omnigraph query lint` / `query check` | `omnigraph lint` |
| `omnigraph query --alias <n>` / `mutate --alias <n>` | `omnigraph alias <n>` (dedicated subcommand; the `--alias` flag was removed) |

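
When an agent meets an old invocation in a script or runbook, the verb renames in this table are mechanical enough to apply in code. A minimal Python sketch, assuming the invocation is already split into an argv list: it rewrites only the verbs listed above and passes every other token through unchanged. Addressing changes (for example a positional URI on `query`/`mutate` becoming `--store`, or `--name <q>` becoming the positional query name) still need a manual pass against the addressing table.

```python
from typing import List

# Verb renames from the table above: old verb -> replacement tokens.
VERB_RENAMES = {
    "ingest": ["load", "--from", "main", "--mode", "merge"],
    "read": ["query"],
    "change": ["mutate"],
}


def modernize(argv: List[str]) -> List[str]:
    """Rewrite the verb of a pre-0.7.0 `omnigraph` argv; other tokens pass through."""
    if len(argv) < 2 or argv[0] != "omnigraph":
        return list(argv)
    verb, rest = argv[1], argv[2:]
    if verb in VERB_RENAMES:
        return ["omnigraph", *VERB_RENAMES[verb], *rest]
    # `query lint` / `query check` became the top-level `lint` verb.
    if verb == "query" and rest[:1] in (["lint"], ["check"]):
        return ["omnigraph", "lint", *rest[1:]]
    # `query --alias <n>` / `mutate --alias <n>` became `alias <n>`.
    if verb in ("query", "mutate") and "--alias" in rest:
        i = rest.index("--alias")
        if i + 1 < len(rest):
            return ["omnigraph", "alias", rest[i + 1], *rest[:i], *rest[i + 2:]]
    return list(argv)
```

Anything that is already in 0.7.0 form comes back unchanged, so the function is safe to run over a mixed set of invocations.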
## HTTP routes

| Before | Now |
|---|---|
| `POST /ingest` | `POST /load` |
| `POST /read` | `POST /query` |
| `POST /change` | `POST /mutate` |

The old routes remain as **deprecated aliases** (retained indefinitely), carrying `Deprecation: true` + `Link: <successor>` response headers.

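
A client can detect that it is still calling a deprecated route by checking those headers. A minimal Python sketch: it assumes the `Link` value wraps the successor URI in angle brackets as shown and takes the first `<...>` it finds, ignoring any link parameters. `modern_route` applies the rename table to a request path, including paths nested under `/graphs/<id>/`.

```python
import re
from typing import Mapping, Optional

# Rename table from above: deprecated route suffix -> successor suffix.
ROUTE_SUCCESSORS = {"/ingest": "/load", "/read": "/query", "/change": "/mutate"}


def successor_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the successor URI if a response is marked deprecated, else None."""
    # HTTP header names are case-insensitive; normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("deprecation", "").strip().lower() != "true":
        return None
    match = re.search(r"<([^>]+)>", lowered.get("link", ""))
    return match.group(1) if match else None


def modern_route(path: str) -> str:
    """Map a deprecated route (including ones under /graphs/<id>/) to its successor."""
    for old, new in ROUTE_SUCCESSORS.items():
        if path.endswith(old):
            return path[: -len(old)] + new
    return path
```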
## Server token resolution

| Before | Now |
|---|---|
| `graphs.<name>.bearer_token_env` in `omnigraph.yaml` | `omnigraph login <server>` → `~/.omnigraph/credentials`, or `OMNIGRAPH_TOKEN_<NAME>` |

The client bearer token now comes only from `OMNIGRAPH_TOKEN_<NAME>` or the credentials file — the `omnigraph.yaml` `bearer_token_env` chain is gone with the file.

## Older removals (still worth knowing)

- The transactional **Run** state machine, its `/runs` routes, and the `run_publish` / `run_abort` Cedar actions were **removed in v0.4.0**. Writes publish directly — use `GET /commits` for history and the `change` action for write gating; `/runs` returns 404.

302
skills/omnigraph/references/queries.md
Normal file

# Query Authoring & Linting

## Contents
- File organization
- Linting
- Parameterization
- Query structure
- Search functions
- Aggregations
- Filter operators
- Mutations
- Naming convention
- Aliases over raw queries

Writing `.gq` query files in Omnigraph.

## File Organization

- One `.gq` file per primary node type (`signals.gq`, `patterns.gq`, `elements.gq`)
- One `mutations.gq` file for all insert/update/delete queries
- Put query files in `queries/` — cluster mode discovers `queries/*.gq` automatically

## Linting

```bash
omnigraph lint --schema schema.pg --query queries/signals.gq
```

Or lint against a live repo:

```bash
omnigraph lint --query queries/signals.gq s3://bucket/repo
```

Lint returns:
- `"status": "ok"` — all queries passed
- `"errors": N` — count of type errors (exit 1 when nonzero)
- `"warnings": N` — count of drift warnings

Run lint after every `.gq` or `.pg` edit, and wire it into a pre-commit hook.

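
A pre-commit hook only needs the exit code, because lint exits 1 when it finds type errors (warnings alone don't fail it). A minimal Python sketch, assuming the `schema.pg` + `queries/*.gq` layout described above and one `lint` call per file (the sketch does not rely on `--query` accepting several files at once). `lint_all` returns the number of failing files, and the runner is injectable for testing.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Iterable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def lint_all(schema: str, query_files: Iterable[str], runner: Runner = run) -> int:
    """Lint each query file against `schema`; return how many files failed."""
    failures = 0
    for query_file in query_files:
        argv = ["omnigraph", "lint", "--schema", schema, "--query", query_file]
        if runner(argv) != 0:  # nonzero exit = type errors found
            print(f"lint failed: {query_file}", file=sys.stderr)
            failures += 1
    return failures


def main() -> int:
    """Hook entry point: lint queries/*.gq against schema.pg."""
    files = sorted(str(p) for p in Path("queries").glob("*.gq"))
    return 1 if lint_all("schema.pg", files) else 0
```

The hook script itself then ends with `sys.exit(main())`.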
## Parameterization

### Always declare typed parameters

```gq
query get_signal($slug: String) {
  match { $s: Signal { slug: $slug } }
  return { $s.slug, $s.name }
}
```

Never string-interpolate values into query bodies. Pass them via `--params`:

```bash
omnigraph query get_signal --query signals.gq --params '{"slug":"sig-foo"}'
```

The compiler typechecks parameter values against declared types.

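
From a script, the same rule means building the `--params` value with a JSON encoder and passing an argv list rather than a shell string, so neither the query text nor the shell ever sees raw values. A minimal Python sketch; the `--store` addressing follows the quickstart form (`omnigraph query <name> --query <file> --store <uri>`), and `graph.omni` below is a placeholder.

```python
import json
import subprocess
from typing import Any, Dict, List


def query_argv(name: str, query_file: str, params: Dict[str, Any], store: str) -> List[str]:
    """argv for a query from a .gq file; values travel as JSON via --params."""
    return [
        "omnigraph", "query", name,
        "--query", query_file,
        "--params", json.dumps(params, separators=(",", ":")),
        "--store", store,
    ]


def run_query(name: str, query_file: str, params: Dict[str, Any], store: str) -> str:
    """Run the query and return its stdout; no shell, so no quoting pitfalls."""
    result = subprocess.run(query_argv(name, query_file, params, store),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_query("get_signal", "signals.gq", {"slug": "sig-foo"}, "graph.omni")` runs the same call as the bash line above, with the graph addressed explicitly.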
> For one-off/ad-hoc execution, pass the query inline instead of a file with `-e/--query-string` (v0.6.0+): `omnigraph query -e 'query q($slug: String){ match { $s: Signal { slug: $slug } } return { $s.name } }' --params '{"slug":"sig-foo"}'` (and `omnigraph mutate -e '...'`). `-e` is mutually exclusive with `--query <file>` — exactly one of the two is required. (Operator aliases are invoked via the separate `omnigraph alias <name>` subcommand.)

## Query Structure

### Match → Return → Order → Limit

```gq
query recent_signals() {
  match {
    $s: Signal
  }
  return { $s.slug, $s.name, $s.stagingTimestamp }
  order { $s.stagingTimestamp desc }
  limit 50
}
```

### Edge traversal (lowerCamelCase)

Schema edges are PascalCase; traversal uses lowerCamelCase:

```gq
match {
  $s: Signal { slug: $slug }
  $s formsPattern $p // edge FormsPattern: Signal -> Pattern
}
```

### Multi-hop

Chain traversal clauses:

```gq
query friends_of_friends($name: String) {
  match {
    $p: Person { name: $name }
    $p knows $mid
    $mid knows $fof
  }
  return { $fof.name }
}
```

### Reverse traversal

Flip the subject/object:

```gq
query employees_of($company: String) {
  match {
    $c: Company { name: $company }
    $p worksAt $c
  }
  return { $p.name }
}
```

### Negation

```gq
query orphan_signals() {
  match {
    $s: Signal
    not { $s formsPattern $_ }
  }
  return { $s.slug }
}
```

## Search Functions

### Text search

```gq
match {
  $d: Doc
  search($d.title, $q) // full-text on @index'd String
}
```

```gq
match {
  $d: Doc
  fuzzy($d.title, $q, 2) // fuzzy match, max 2 edits
}
```

```gq
match {
  $d: Doc
  match_text($d.body, $q) // phrase match
}
```

### Vector/ranking (require `limit`)

```gq
query vector_search($q: Vector(3072)) {
  match { $d: Doc }
  return { $d.slug, $d.title }
  order { nearest($d.embedding, $q) }
  limit 10
}
```

`nearest`, `bm25`, and `rrf` are ranking operators, not filters. Every query using them **must** end with `limit N` — omitting it is a compile error.

### Hybrid (reciprocal rank fusion)

```gq
query hybrid_search($vq: Vector(3072), $tq: String) {
  match { $d: Doc }
  return { $d.slug, $d.title }
  order { rrf(nearest($d.embedding, $vq), bm25($d.title, $tq)) }
  limit 10
}
```

## Aggregations

```gq
query friend_counts() {
  match {
    $p: Person
    $p knows $f
  }
  return {
    $p.name
    count($f) as friends
  }
  order { friends desc }
  limit 20
}
```

Supported: `count`, `sum`, `avg`, `min`, `max`. Grouping is implicit on non-aggregated return fields.

## Filter Operators

`=`, `!=`, `>`, `<`, `>=`, `<=`, `contains`

```gq
match {
  $p: Person
  $p.age > 30
  $p.name contains "Al"
}
```

## Mutations

> **No top-level `mutation { ... }` wrapper.** Agents trained on GraphQL reflexively write `mutation { insert T { ... } }` — that fails the parser at character 1 with `parse error: expected query_file`. Every executable block in a `.gq` file is a named `query`; the body's verb (`insert` / `update` / `delete`) determines whether it's a write. Dispatch via `omnigraph mutate` (not `query`).

### Insert

```gq
query add_signal($slug: String, $name: String, $brief: String,
                 $stagingTimestamp: DateTime, $createdAt: DateTime, $updatedAt: DateTime) {
  insert Signal {
    slug: $slug,
    name: $name,
    brief: $brief,
    stagingTimestamp: $stagingTimestamp,
    createdAt: $createdAt,
    updatedAt: $updatedAt
  }
}
```

**Every non-nullable property must be provided.** Lint catches missing ones as:

```
error: T12: insert for 'Signal' must provide non-nullable property 'brief'
```

### Insert edge

```gq
query link_signal_forms_pattern($signal: String, $pattern: String) {
  insert FormsPattern { from: $signal, to: $pattern }
}
```

Edge `data` block is `{}` if the edge has no properties — just specify `from` and `to` slugs.

### Update

```gq
query retitle_signal($slug: String, $new_title: String) {
  update Signal set { name: $new_title } where slug = $slug
}
```

### Delete

```gq
query remove_signal($slug: String) {
  delete Signal where slug = $slug
}
```

### Multi-statement

```gq
query add_and_link($slug: String, $pattern: String, $createdAt: DateTime, $updatedAt: DateTime) {
  insert Signal { slug: $slug, name: $slug, brief: $slug,
                  stagingTimestamp: $createdAt, createdAt: $createdAt, updatedAt: $updatedAt }
  insert FormsPattern { from: $slug, to: $pattern }
}
```

There's no `upsert` keyword at the query level — use `load --mode merge` for bulk upsert.

> **Insert/update-only OR delete-only (the D₂ rule).** A single mutation query may contain inserts and updates, **or** deletes, never both. Mixing a `delete` with an `insert`/`update` in the same query is rejected at parse time. Inserts and updates go through a staged two-phase publish, while deletes commit inline (Omnigraph doesn't yet use Lance's two-phase delete API, which shipped in Lance 7.0.0 but isn't wired in), so the two can't share one atomic statement. Split a delete-then-insert into two separate mutations.

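
Splitting a delete-then-insert looks like this from a script. A minimal Python sketch reusing the `remove_signal` and `add_signal` queries above, assuming both live in `mutations.gq` (per the file-organization convention) and the graph is addressed with `--store`; `fields` must carry the remaining non-nullable `add_signal` parameters. The pair is not atomic: if the second call fails, the signal stays deleted until the insert is re-run.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def mutate_argv(name: str, params: Dict[str, Any], store: str,
                query_file: str = "mutations.gq") -> List[str]:
    """argv for `omnigraph mutate <name>` with JSON-encoded params."""
    return ["omnigraph", "mutate", name, "--query", query_file,
            "--params", json.dumps(params), "--store", store]


def replace_signal(store: str, slug: str, fields: Dict[str, Any],
                   runner: Runner = run) -> bool:
    """Delete then re-insert a Signal as two mutations, per the D₂ rule."""
    # 1) Delete-only mutation; it commits on its own.
    if runner(mutate_argv("remove_signal", {"slug": slug}, store)) != 0:
        return False
    # 2) Insert-only mutation. Not atomic with step 1: if this fails, the
    #    signal stays deleted until the insert is re-run.
    return runner(mutate_argv("add_signal", {"slug": slug, **fields}, store)) == 0
```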
### Date and DateTime values

Date format is asymmetric between `mutate` (parameter values) and `load` (JSONL):

| Path | Date | DateTime |
|---|---|---|
| `mutate --params` | ISO string `"2026-04-29"` | ISO string `"2026-04-29T10:00:00Z"` |
| `load` JSONL | Integer days since epoch `20572` | ISO string `"2026-04-29T10:00:00Z"` |

Compute the integer-days form for a given date `d`:

```python
import datetime
(d - datetime.date(1970, 1, 1)).days  # d is the date you're loading, not today()
```

This asymmetry is one of the most common silent type errors when bulk-loading data prepared for one path through the other.

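
Converting at the boundary avoids that trap. A small Python sketch covering only the `Date` column of the table (DateTime is an ISO string on both paths): each helper accepts a `datetime.date`, a `datetime.datetime` (its time part is dropped), or an ISO `YYYY-MM-DD` string, and `date_from_load` reverses the JSONL encoding.

```python
import datetime
from typing import Union

EPOCH = datetime.date(1970, 1, 1)
DateLike = Union[datetime.date, str]


def _as_date(value: DateLike) -> datetime.date:
    """Normalize an ISO string, date, or datetime to a plain date."""
    if isinstance(value, str):
        return datetime.date.fromisoformat(value)
    if isinstance(value, datetime.datetime):  # datetime subclasses date; drop the time
        return value.date()
    return value


def date_for_load(value: DateLike) -> int:
    """Date for `load` JSONL: integer days since 1970-01-01."""
    return (_as_date(value) - EPOCH).days


def date_for_mutate(value: DateLike) -> str:
    """Date for `mutate --params`: ISO string."""
    return _as_date(value).isoformat()


def date_from_load(days: int) -> datetime.date:
    """Inverse of date_for_load."""
    return EPOCH + datetime.timedelta(days=days)
```

`date_for_load("2026-04-29")` returns `20572`, matching the table.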
## Naming Convention

`verb_object`:
- `get_signal`, `recent_signals`, `search_signals`
- `signal_patterns`, `signal_elements` (traversal queries)
- `add_signal`, `link_signal_forms_pattern` (mutations)

## Aliases Over Raw Queries

For anything an agent or script will call repeatedly, define an operator alias. See `references/aliases.md`.

142
skills/omnigraph/references/remote-ops.md
Normal file

# Remote Graph Operations

## Contents
- What's different about remote
- Verify after every write
- 504 Gateway Timeout
- Fork-branch 504 fingerprint
- Targeting a remote graph (`--server`, `login`)
- Version drift / `sync_branch()`
- `manifest_conflict` 409
- 429 Too Many Requests
- Duplicate risk on blind retry
- Reading large schemas safely
- Prevention checklist

When you address a graph through a remote `omnigraph-server` endpoint (behind ALB / CloudFront, bearer-authenticated) instead of opening its storage directly at a `file://` or `s3://` URI, several CLI behaviors change in ways the local-storage workflow never exposes. This reference covers the failures and operational rituals specific to remote graphs.

## What's different about remote

A remote graph runs server-side. Every write executes on the server — staged per touched table, then published atomically as a **single manifest commit** guarded by a compare-and-swap on expected table versions — and is gated by a connection-level idle timeout (CloudFront defaults to ~30s). There is no separate "run" object to poll — write status is implied by the HTTP response (and verifiable via `commit list`). The local CLI is a thin client; it never sees the commit happen, only the HTTP response. That asymmetry is the root of every gotcha below.

| Local repo | Remote repo |
|---|---|
| CLI writes S3 directly | Server executes the write, publishes one atomic manifest commit |
| No connection timeout | ~30s idle timeout (CloudFront) |
| No admission control | Per-actor `429` + `Retry-After` on writes |
| `load` writes S3-backed storage directly | `load` is server-orchestrated — same command, one atomic commit |
| CLI exit code is authoritative | CLI exit code can lie — verify via `commit list` |

## Verify after every write

The CLI's exit code is **not authoritative on remote graphs**. The proxy can drop a response after the server has already committed. Always verify by comparing `main`'s head:

```bash
HEAD_BEFORE=$(omnigraph commit list --config X --branch main --json | jq -r '.commits[0].graph_commit_id')

# … run your load / mutate …

HEAD_AFTER=$(omnigraph commit list --config X --branch main --json | jq -r '.commits[0].graph_commit_id')

if [[ "$HEAD_BEFORE" != "$HEAD_AFTER" ]]; then
  echo "landed"
else
  echo "did NOT land — safe to retry"
fi
```

For a `load --from` that forks a review branch, also compare the new branch head's `graph_commit_id` against `main`'s. **Identical means the load didn't land — empty fork left behind.**

For pointed verification of a single record:

```bash
omnigraph export --config X --type <NodeType> | grep <slug>
omnigraph export --config X --type <EdgeType> | grep <slug>
```

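
The same before/after check in Python, for scripts that already drive the CLI. A minimal sketch: `head_from_commit_list` reads the same `.commits[0].graph_commit_id` path as the `jq` filter above, and `address` is whatever addressing flags you already pass, left to the caller rather than assumed. An unchanged head means the write hasn't landed yet (after a 504 it may still be in progress, so re-check before retrying); on a busy branch a concurrent commit also moves the head, so pair this with the pointed `export` check when you need certainty.

```python
import json
import subprocess
from typing import Sequence


def head_from_commit_list(payload: str) -> str:
    """Newest commit id from `commit list --json` output (same path as the jq filter)."""
    return json.loads(payload)["commits"][0]["graph_commit_id"]


def branch_head(address: Sequence[str], branch: str = "main") -> str:
    """Run `omnigraph commit list` for `branch` and return its head commit id."""
    out = subprocess.run(
        ["omnigraph", "commit", "list", *address, "--branch", branch, "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return head_from_commit_list(out)


def landed(head_before: str, head_after: str) -> bool:
    """True if the head moved: some write (yours or a concurrent one) committed."""
    return head_before != head_after
```

Typical use: take `branch_head(...)` before the write, take it again afterwards, and compare the two with `landed`.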
## 504 Gateway Timeout: response lost, write status unknown

A 504 from the proxy means the server didn't respond within the idle timeout. Two server-side outcomes are possible — **the 504 alone cannot distinguish them**:

1. **Write completed and published** — landed, `main`'s head advanced. Common for small mutations finishing just past the 30s edge.
2. **Write still in progress** — will publish or fail soon. Re-check after a minute.

Always verify via `commit list` before retrying. Blind retry on append-only types creates duplicates.

## Fork-branch 504 fingerprint

`load --from <base>` creates the branch **before** loading data. A timed-out fork-load where the data didn't land leaves an empty branch at `<base>`'s head. Stale numbered branches (`feature-v2`, `-v3`, `-v4` …) all sitting at the same `graph_commit_id` as `main` are the fingerprint of prior 504-blocked attempts.

Find them by comparing each branch's head against `main`'s in `omnigraph branch list --config X --json`, then delete the empty ones.

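
Spotting those leftovers is a simple comparison once you have each branch's head. A minimal Python sketch: it takes a `{branch: graph_commit_id}` mapping and returns the branches still sitting at the base's head. The exact JSON shape of `branch list --json` isn't documented here, so building that mapping (from `branch list` or per-branch `commit list --json`) is left to the caller. A branch created on purpose that hasn't received its first write yet looks identical, so review the list before deleting anything.

```python
from typing import Dict, List


def empty_forks(branch_heads: Dict[str, str], base: str = "main") -> List[str]:
    """Branches whose head still equals the base head (likely 504-blocked fork-loads)."""
    base_head = branch_heads[base]
    return sorted(
        name for name, head in branch_heads.items()
        if name != base and head == base_head
    )
```

Each confirmed-empty branch can then go with `omnigraph branch delete <branch-name>` plus your usual addressing flag.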
## Targeting a remote graph: `--server` and `login`

`load`, `query`, and `mutate` all run against a remote `omnigraph-server` endpoint — there is no local-only restriction as of 0.7.0. Address an operator-defined server by name instead of pasting URLs and juggling tokens:

```bash
echo "$TOKEN" | omnigraph login intel-dev   # stores it in ~/.omnigraph/credentials (0600)
omnigraph load --server intel-dev --graph spike \
  --data delta.jsonl --from main --mode merge --branch staging
```

`--server <name>` resolves the URL from `~/.omnigraph/config.yaml` and the token via `OMNIGRAPH_TOKEN_<NAME>` or the credentials file. A token is only ever sent to the server it is keyed to. `--graph <id>` selects the graph on a multi-graph server.

## Version drift / `sync_branch()`

```
version drift on node:<Type>: snapshot pinned vN but dataset is at vM — call sync_branch() and retry
```

- `sync_branch()` is **not a CLI command** — it's a server-internal directive that leaked into the error text. Don't go looking for it.
- Cause: another actor committed to `main` between your CLI's snapshot pin and your `mutate` attempt.
- Usually self-resolves on retry — the next call re-pins.
- Calling `omnigraph snapshot` does **not** reliably re-pin for subsequent `mutate`s in the same session.
- If persistent, fall back to `load --from main` onto a fresh branch — a forked branch doesn't suffer from concurrent-commit drift on `main`.
- The cleaner, modern form of this conflict is a structured `manifest_conflict` **409** — see below.

## `manifest_conflict` 409 — stale snapshot, retry

When another actor commits to the same branch between your query's snapshot pin and your write, the server returns a structured **`manifest_conflict` 409** carrying `table_key` / `expected` / `actual`, rather than silently overwriting. Since v0.4.2 this is the form most concurrent update/delete/merge races take.

- **Retry it.** A 409 means your write was computed against a stale view and was rejected *before* committing — there is no partial state and no duplicate risk. Re-issue the same call; it re-pins to the new head.
- Concurrent `mutate` × branch-merge on the same target branch resolves to either success or a clean 409 depending on who wins the server's per-table queue — both outcomes are safe.

## 429 Too Many Requests — back off, then retry

The server applies **per-actor admission control** to every mutating endpoint (`mutate` / `load` / `schema apply` / branch create·delete·merge). An actor that exceeds its in-flight-request or estimated-byte budget gets a structured **HTTP 429** (`code: too_many_requests`) with a `Retry-After` header — instead of blocking unrelated actors behind a global lock.

- This is **not** a failed write — the write never started. Honor `Retry-After` and retry; it is always safe (no partial write, no duplicate risk).
- It's per-actor, so one noisy automation can't starve others. If you hit it constantly, batch less aggressively or space your calls out.
- Read-only endpoints are not admission-gated.

## Duplicate risk on blind retry

After a 504, never retry without verifying first. Different node kinds have different retry semantics:

| Kind | Retry safety |
|---|---|
| Pointer nodes (`Org`, `Person`, `Opportunity`, `Channel`, `Actor`, `ActionItem`, `Artifact`, `Meeting`, `Technology`, `Campaign`, `UseCase`) | ✓ Idempotent — `@key` upserts dedupe |
| Append-only nodes (`Signal`, `Claim`, `Decision`, `Event`, `Interaction`, `MarketingElement`, `Policy`, `Outcome`) | ✗ Duplicates on retry — verify before retrying |
| Edges | ⚠ No `@key`. Verify via `export --type <EdgeName>` + grep. Some simple edges dedupe server-side; don't rely on it. |

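
Those rules reduce to a small decision table. A minimal Python sketch with several assumptions: the 409 has already been identified as `manifest_conflict`, `Retry-After` arrives as a number of seconds (the sketch falls back to a hypothetical 1-second delay when the header is missing or in another form), and the pointer-node list is copied from the table above, which is specific to the schema it was written for. Edges and unknown types are treated as duplicate-prone, the conservative default.

```python
from typing import Optional, Tuple

# Pointer node types from the table above (@key upserts dedupe on retry).
# These names come from one particular schema; substitute your own.
POINTER_TYPES = {"Org", "Person", "Opportunity", "Channel", "Actor", "ActionItem",
                 "Artifact", "Meeting", "Technology", "Campaign", "UseCase"}


def retry_action(status: int, retry_after: Optional[str] = None) -> Tuple[str, float]:
    """Map a failed remote write's HTTP status to (action, delay_seconds)."""
    if status == 429:
        # Throttled before the write started: safe once you honor Retry-After.
        delay = float(retry_after) if retry_after and retry_after.isdigit() else 1.0
        return ("retry", delay)
    if status == 409:
        # manifest_conflict: rejected before committing; re-issue to re-pin.
        return ("retry", 0.0)
    if status == 504:
        # Outcome unknown: compare main's head via `commit list` first.
        return ("verify", 0.0)
    return ("fail", 0.0)


def duplicates_on_blind_retry(type_name: str) -> bool:
    """True unless `type_name` is a pointer node; edges and unknown types count as risky."""
    return type_name not in POINTER_TYPES
```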
## Reading large schemas safely

Remote schemas can be large (tens of KB). Tools that cap stdout (~50KB is common) will truncate or duplicate the output silently — leading to memory-based answers from agents that look correct but reference nonexistent fields.

Always redirect to a file before reading:

```bash
omnigraph schema show --config X > /tmp/schema.pg
wc -l /tmp/schema.pg
```

Then read the file with offset/limit, not via piped stdout.

## Prevention checklist

- Keep mutations small. Single-node inserts finish well under the timeout.
- Prefer `mutate` over `load` for ≤ a handful of records.
- Always run `commit list` after a 504 before deciding to retry.
- For destructive or large-batch work, use `load --from main` onto a feature branch and verify the branch head before merging.
- Read large schemas via file redirect, not piped stdout.
- A `429` (throttle) or a `manifest_conflict` `409` (stale snapshot) is always safe to retry — the write never committed. Honor `Retry-After` on a 429.

192
skills/omnigraph/references/schema.md
Normal file

# Schema Authoring & Evolution

## Contents
- Authoring (.pg files)
- Evolution (schema plan/apply)
- Supported types
- Decorators (quick reference)
- Interfaces
- Design principles
- Schema evolution in cluster mode

How to write and evolve `.pg` schemas in Omnigraph.

## Authoring (.pg files)

### Use `//` for comments

Not `#`. The compiler rejects `#` with a parse error that looks like:

```
parse error: expected schema_file
```

### Enums are inline, not standalone

The compiler does **not** accept top-level `enum Foo { ... }` blocks. Put the values inline on the property:

```pg
kind: enum(product, technology, framework, concept, ops) @index
```

If the same enum appears on multiple nodes, duplicate it inline — there's no shared enum type.

### Lists contain scalars only

`[String]` and `[I32]` are fine. `[Category]` (a list of enum values) is **not** supported. Use `[String]` with query-side filtering, or use a single-valued enum property if one value is enough.

### `@embed` takes a quoted string

```pg
embedding: Vector(3072) @embed("text") @index
```

Not `@embed(text)`. The source property name is a string literal.

### Edge constraints go inside a body block

Group constraints on an edge, such as `@unique(src, dst)` or the single-column `@unique(src)`, go inside `{ }`, after `@card(...)`:

```pg
edge PartOfArtifact: Chunk -> InformationArtifact @card(1..1) {
  @unique(src)
}
```

### Lint after every edit

```bash
omnigraph lint --schema schema.pg --query queries/signals.gq
```

This validates the schema **and** the queries against it. No running repo is required. Wire it into a pre-commit hook.

## Evolution (schema plan/apply)

### Plan before apply — always

```bash
omnigraph schema plan --schema next.pg s3://bucket/repo --json
# inspect "supported": true|false and the step list
omnigraph schema apply --schema next.pg s3://bucket/repo
```

If `supported: false`, fix the source before applying. Plan is free; run it as often as needed.

Plan/apply diagnostics carry stable codes of the form **`OG-XXX-NNN`** (since v0.5.0) — match on the code, not the free-form message text.

**Destructive drops are gated (since v0.5.0).** Dropping a property or type is a soft drop by default (or rejected); to actually lose data you must opt in:

```bash
omnigraph schema apply --schema next.pg s3://bucket/repo --allow-data-loss
```

Over HTTP the equivalent is `{"allow_data_loss": true}` in the schema-apply body. Without the flag, a destructive drop returns a structured diagnostic instead of silently deleting columns.

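
Scripted, the plan gate and the code-matching advice fit together. A minimal Python sketch with stated assumptions: `schema plan --json` prints one JSON object with a top-level `supported` field, and the middle segment of an `OG-XXX-NNN` code is uppercase letters or digits (only the pattern is documented here, not the list of codes). `--allow-data-loss` is passed only when the caller explicitly asks for it.

```python
import json
import re
import subprocess
from typing import List

# Assumed shape of the stable codes: "OG-", an uppercase/digit segment, "-", three digits.
OG_CODE = re.compile(r"\bOG-[A-Z0-9]+-\d{3}\b")


def diagnostic_codes(text: str) -> List[str]:
    """Extract OG-XXX-NNN codes from plan/apply output so callers match codes, not prose."""
    return OG_CODE.findall(text)


def plan_then_apply(schema: str, repo: str, allow_data_loss: bool = False) -> None:
    """Run `schema plan --json`; apply only if the plan reports supported: true."""
    plan = subprocess.run(
        ["omnigraph", "schema", "plan", "--schema", schema, repo, "--json"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(plan.stdout)
    if report.get("supported") is not True:
        raise RuntimeError(f"plan not supported; codes: {diagnostic_codes(plan.stdout)}")
    argv = ["omnigraph", "schema", "apply", "--schema", schema, repo]
    if allow_data_loss:
        argv.append("--allow-data-loss")  # explicit opt-in to destructive drops
    subprocess.run(argv, check=True)
```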
### Apply is main-only

`omnigraph schema apply` refuses to run while any non-`main` branch exists. Delete or merge feature branches first. This is deliberate: schema changes don't go through review branches. They go straight to main via `plan` + `apply`.

### Rename, don't replace

Use `@rename_from(...)` on renames so the planner emits a rename step (preserves data), not a drop+add pair (loses data):

```pg
node Account @rename_from("User") {
  full_name: String @rename_from("name")
}
```

Works on node types, edge types, and properties.

### Required properties need a backfill plan

Adding a non-nullable property to an existing node is rejected as unsupported. Pattern:

1. Add as optional: `new_prop: String?`
2. Apply
3. Backfill via a `mutate` or `load --mode merge`
4. Tighten to required in a follow-up apply: `new_prop: String`

### Keep `@key` stable

Changing the key field is effectively a replace — it invalidates every external reference to the node. Treat identity changes as deliberate, multi-step migrations, not casual field renames.

### `schema apply` blocks writes while running

No concurrent mutations during an apply. Plan for a short read-only window.

## Supported Types

- **Scalars:** `String`, `Bool`, `I32`, `I64`, `U32`, `U64`, `F32`, `F64`, `Date`, `DateTime`, `Blob`
- **Collections:** `Vector(N)` (fixed-size float vector), `[ScalarType]` (list of scalars)
- **Enums:** `enum(value1, value2, ...)` — inline only, values can contain alphanumerics, underscores, hyphens
- **Optional:** any type + `?` suffix (`String?`, `[I32]?`, `Vector(4)?`)

## Decorators (quick reference)

**Property-level:**
- `@key` — primary key (implies index; usually one per node)
- `@unique` — uniqueness constraint
- `@index` — query optimization
- `@range(min, max)` — numeric bounds (open ranges allowed)
- `@check(prop, "regex")` — regex pattern validation on a String property
- `@embed("source_prop")` — embed from a String source into a Vector property
- `@description("...")` — metadata (no migration impact)
- `@instruction("...")` — semantic hint for LLMs/operators

**Edge-level:**
- `@card(min..max)` — edge cardinality (default: `0..*`)

**Type-level (nodes/edges/properties):**
- `@rename_from("OldName")` — migration-aware rename

**Group-level (inside body block):**
- `@unique(prop1, prop2)` — composite uniqueness, enforced as a true tuple key at both intake and merge (works on edges too: `@unique(src, dst)`). Columns must reduce to a scalar key: `@unique` on a `[List]`/`Blob` column is rejected loudly at `load` (it used to be silently un-enforced — fixed in #160).
- `@index(prop1, prop2)` — composite index

## Interfaces

Supported but rarely used. Declare shared property contracts that node types implement:

```pg
interface Searchable {
  title: String @index
  embedding: Vector(3072) @embed("title")
}

node Doc implements Searchable {
  slug: String @key
  body: String
}
```

Most schemas are fine without interfaces. Reach for them only when 3+ node types need to share a property contract.

## Design Principles (brief)

- **Identity is explicit** — use `@key` on a semantic slug, not internal row IDs
- **Narrow types** — `Date` over `String` for dates, `enum` over `String` for lifecycle states
- **Edge semantics matter** — prefer `AuthoredBy` over `RelatedTo`
- **Constraints live in the schema** — `@unique`, `@range`, `@card` keep invariants out of application code
- **Schemas are reviewable** — clear names, explicit enums, obvious keys

## Schema Evolution in Cluster Mode

In a cluster deployment there is **no direct `omnigraph schema apply`** — the
schema is declared (`graphs.<id>.schema:` in `cluster.yaml`) and converged:

```bash
$EDITOR schema.pg
omnigraph cluster plan --config .    # shows the engine's migration steps
omnigraph cluster apply --config . --as <you>
# restart the --cluster server to serve the new shape
```

Differences from direct `schema apply` (on a non-cluster store): **soft drops
only** (`--allow-data-loss` is not reachable from cluster apply, so dropped
columns remain in prior versions), and out-of-band schema changes on the live
graph are *drift*: `cluster refresh` flags them and the next `apply` converges
the graph back to the declared schema. Everything else in this file
(`@rename_from`, backfills, linting, enum discipline) applies unchanged to the
`.pg` you edit.

150
skills/omnigraph/references/search.md
Normal file

# Search & Embeddings

## Contents

- Embeddings are schema-declared
- Generating embeddings
- Embeddings + `load --mode merge` interaction
- Search functions in queries
- The key pattern: scope first, rank second
- Model / config

Vector embeddings and text search in Omnigraph.
## Embeddings are Schema-Declared

```pg
node Chunk {
  text: String
  chunk_index: I32
  embedding: Vector(3072) @embed("text") @index
  createdAt: DateTime
}
```

- `Vector(N)`: fixed-size float vector with `N` dimensions
- `@embed("source_prop")`: the text property to embed from, given as a quoted string
- `@index`: enables vector search on this field

The schema says **where** embeddings live and **what** they come from. Queries don't recompute embeddings; they read the stored vectors.
## Generating Embeddings

### First time / refresh missing

```bash
omnigraph embed --seed embed-config.yaml
```

The default mode is `fill_missing`: it only generates embeddings for rows that don't have one yet.

### Re-embed everything

```bash
omnigraph embed --seed embed-config.yaml --reembed_all
```

Use when:
- You changed the source field: `@embed("body")` → `@embed("title")`
- You mutated text at scale and need fresh embeddings
- You switched embedding models (rare)

### Selective refresh

```bash
omnigraph embed --seed embed-config.yaml --select "Chunk:chunk_index=42"
```

Regenerates embeddings only for rows matching the selector.

### Clean (delete) embeddings

```bash
omnigraph embed --seed embed-config.yaml --clean
```
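To make the three modes concrete, here is a small in-memory model of *which rows* each invocation would embed. It is a sketch under stated assumptions, not Omnigraph's implementation. The `Row` type, `rows_to_embed`, and `parse_selector` are hypothetical helpers. The `Type:prop=value` selector grammar is inferred from the single `--select` example above. Letting `--select` override the mode is also an assumption.

```python
"""Hypothetical model of which rows `omnigraph embed` would (re)generate."""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Row:
    node_type: str
    props: dict = field(default_factory=dict)
    embedding: Optional[list] = None


def parse_selector(selector: str) -> tuple[str, str, str]:
    """Split an assumed 'Type:prop=value' selector into its three parts."""
    node_type, _, predicate = selector.partition(":")
    prop, _, value = predicate.partition("=")
    if not (node_type and prop and value):
        raise ValueError(f"unsupported selector: {selector!r}")
    return node_type, prop, value


def rows_to_embed(rows, mode="fill_missing", select=None):
    """Return the rows a run would embed (assumed semantics, see lead-in)."""
    if select is not None:
        node_type, prop, value = parse_selector(select)
        # Compare as strings because the selector value arrives as text.
        return [r for r in rows
                if r.node_type == node_type and str(r.props.get(prop)) == value]
    if mode == "reembed_all":
        return list(rows)
    if mode == "fill_missing":
        return [r for r in rows if r.embedding is None]
    raise ValueError(f"unknown mode: {mode!r}")


rows = [
    Row("Chunk", {"chunk_index": 41}, embedding=[0.1, 0.2]),
    Row("Chunk", {"chunk_index": 42}, embedding=[0.3, 0.4]),
    Row("Chunk", {"chunk_index": 43}),  # never embedded
]
print(len(rows_to_embed(rows)))                                  # 1 (fill_missing)
print(len(rows_to_embed(rows, mode="reembed_all")))              # 3
print(len(rows_to_embed(rows, select="Chunk:chunk_index=42")))   # 1
```

The point of `fill_missing` as a default is that re-running it is cheap and idempotent. Only a changed source field or model justifies paying for `--reembed_all`.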
## Embeddings + `load --mode merge` Interaction

**`load --mode merge` does NOT recompute embeddings.**

If you update rows whose source fields feed an `@embed(...)` property, the source value changes but the stored embedding stays stale.

Two fixes:
1. Run `omnigraph embed --seed embed-config.yaml --reembed_all` after the merge.
2. Use `load --mode overwrite` instead, which re-triggers embedding on load.
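The following sketch shows why a merge leaves vectors stale and how one could detect it. It records a hash of the text each embedding was computed from. This fingerprinting is a hypothetical technique added for illustration; nothing here says Omnigraph tracks source hashes. `fake_embed`, `merge`, and `stale_keys` are stand-ins.

```python
"""Hypothetical illustration: why a merge leaves embeddings stale."""
import hashlib


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fake_embed(text: str) -> dict:
    # Stand-in for a real embedding call; it records which text it came from.
    return {"vector": [float(len(text))], "source_hash": fingerprint(text)}


def merge(rows: dict, updates: dict) -> None:
    """Merge-style upsert: overwrite source props, leave `embedding` untouched."""
    for key, props in updates.items():
        rows.setdefault(key, {}).update(props)


def stale_keys(rows: dict, source_prop: str = "text") -> list:
    """Keys whose embedding is missing or was computed from different text."""
    return sorted(
        key for key, row in rows.items()
        if row.get("embedding") is None
        or row["embedding"]["source_hash"] != fingerprint(row[source_prop])
    )


rows = {
    "c1": {"text": "alpha", "embedding": fake_embed("alpha")},
    "c2": {"text": "beta", "embedding": fake_embed("beta")},
}
merge(rows, {"c2": {"text": "beta, revised"}})
print(stale_keys(rows))  # ['c2']
```

`--reembed_all` sidesteps this bookkeeping by recomputing every vector. That is simpler, but it costs a full embedding pass.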
## Search Functions in Queries

All ranking functions require `limit N`: they are ordering operators, not filters.

### Vector similarity

```gq
query nearest_chunks($q: Vector(3072)) {
  match { $c: Chunk }
  return { $c.text }
  order { nearest($c.embedding, $q) }
  limit 10
}
```
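For intuition, this is roughly what `order { nearest(...) } limit k` computes, written as brute-force Python. It assumes cosine similarity, which is consistent with the note under *Model / Config* that stored vectors are L2-normalized. Omnigraph's actual index and distance metric may differ. The toy 2-dimensional vectors stand in for `Vector(3072)`.

```python
"""Sketch of what `order { nearest(...) } limit k` computes, in plain Python."""
import heapq
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def nearest(rows, query, k):
    """Top-k row ids by cosine similarity; `k` plays the role of the required `limit`."""
    q = l2_normalize(query)
    scored = []
    for row_id, vec in rows.items():
        if len(vec) != len(q):
            raise ValueError(f"{row_id}: dimension {len(vec)} != {len(q)}")
        # For unit-length vectors, the dot product equals cosine similarity.
        score = sum(a * b for a, b in zip(l2_normalize(vec), q))
        scored.append((score, row_id))
    return [row_id for _, row_id in heapq.nlargest(k, scored)]


chunks = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(nearest(chunks, [1.0, 0.1], k=2))  # ['a', 'b']
```

Without a `k`, "nearest" has no natural cut-off, because every row has *some* similarity score. That is why ranking requires `limit`.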
### BM25 text ranking

```gq
query top_titles($q: String) {
  match { $d: Doc }
  return { $d.slug, $d.title }
  order { bm25($d.title, $q) }
  limit 10
}
```
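For intuition about `bm25(...)`, here is the textbook Okapi BM25 formula with a Lucene-style IDF and the common defaults `k1=1.2`, `b=0.75`. Omnigraph's tokenizer and parameters are not documented here. Treat the lowercase whitespace tokenizer and the constants as assumptions.

```python
"""Textbook Okapi BM25, for intuition about `bm25($d.title, $q)`."""
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_rank(docs, query, k, k1=1.2, b=0.75):
    """Return the ids of the top-k docs for `query` (Lucene-style IDF)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    # Highest score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


titles = {
    "d1": "graph database basics",
    "d2": "cooking with cast iron",
    "d3": "graph queries and graph storage",
}
print(bm25_rank(titles, "graph storage", k=2))  # ['d3', 'd1']
```

The rare term (`storage`) earns a higher IDF than the common one (`graph`). Length normalization (`b`) keeps long titles from winning just by repeating words.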
### Hybrid (Reciprocal Rank Fusion)

```gq
query hybrid($vq: Vector(3072), $tq: String) {
  match { $d: Doc }
  return { $d.slug, $d.title }
  order { rrf(nearest($d.embedding, $vq), bm25($d.title, $tq)) }
  limit 10
}
```
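`rrf(...)` fuses two rankings. The textbook Reciprocal Rank Fusion score is `sum(1 / (k + rank))` over the input lists, with 1-based ranks. The sketch below uses `k = 60`, the constant from the original RRF paper. Whether Omnigraph uses the same constant is an assumption.

```python
"""Reciprocal Rank Fusion over several ranked lists (textbook form)."""
from collections import defaultdict


def rrf(rankings, limit, k=60):
    """Fuse ranked id lists; each list contributes 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:limit]


vector_hits = ["d1", "d2", "d3"]   # e.g. from nearest(...)
text_hits = ["d3", "d1", "d4"]     # e.g. from bm25(...)
print(rrf([vector_hits, text_hits], limit=3))  # ['d1', 'd3', 'd2']
```

RRF only looks at ranks, never raw scores. That is what makes it safe to combine a cosine similarity and a BM25 score, which live on unrelated scales.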
### Text filter (not ranking, so no `limit` required)

```gq
match {
  $d: Doc
  search($d.title, $q)       // full-text filter
  fuzzy($d.title, $q, 2)     // fuzzy filter, max 2 edits
  match_text($d.body, $q)    // phrase filter
}
```
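"Max 2 edits" in `fuzzy(...)` refers to an edit distance. The sketch below uses classic Levenshtein distance (insertions, deletions, substitutions) and matches per whitespace token. Both choices are assumptions: Omnigraph's tokenizer, and whether it counts a transposition as one edit, are not documented here.

```python
"""Sketch of a 'max N edits' fuzzy predicate using Levenshtein distance."""


def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_match(text: str, query: str, max_edits: int) -> bool:
    """True if any whitespace token of `text` is within `max_edits` of `query`."""
    q = query.lower()
    return any(levenshtein(tok, q) <= max_edits for tok in text.lower().split())


print(fuzzy_match("Omnigraph server guide", "omnigrph", 2))  # True (1 edit)
print(fuzzy_match("Omnigraph server guide", "database", 2))  # False
```

Because this is a filter, it returns a yes/no per row and imposes no order, which is why no `limit` is needed.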
## The Key Pattern: Scope First, Rank Second

Filter with graph traversal before invoking vector or text ranking. Ranking over a narrow set is both cheaper and more relevant.

```gq
query related_chunks($artifact_slug: String, $q: Vector(3072)) {
  match {
    $a: InformationArtifact { slug: $artifact_slug }
    $c partOfArtifact $a                 // scope: only this artifact's chunks
  }
  return { $c.text }
  order { nearest($c.embedding, $q) }    // rank: vector similarity within scope
  limit 10
}
```

Don't rank over the entire chunk set when a traversal can narrow it first.
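The toy comparison below shows why the order matters. It uses an in-memory list of hypothetical chunks tagged with their artifact. Ranking the whole set lets a near-duplicate chunk from another artifact (`c3`) crowd out a relevant in-scope one. Scoping first avoids that and also ranks fewer rows.

```python
"""Scope-then-rank versus rank-everything, on a toy in-memory graph."""


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical chunks: (chunk_id, artifact_slug, approximately unit-length embedding).
CHUNKS = [
    ("c1", "spec-a", [1.0, 0.0]),
    ("c2", "spec-a", [0.6, 0.8]),
    ("c3", "spec-b", [0.99, 0.141]),
]


def related_chunks(artifact_slug, q, limit):
    # Scope: keep only chunks linked to this artifact (the traversal step).
    scoped = [c for c in CHUNKS if c[1] == artifact_slug]
    # Rank: order the narrowed set by similarity, then apply the limit.
    scoped.sort(key=lambda c: dot(c[2], q), reverse=True)
    return [chunk_id for chunk_id, _, _ in scoped[:limit]]


def rank_everything(q, limit):
    ranked = sorted(CHUNKS, key=lambda c: dot(c[2], q), reverse=True)
    return [chunk_id for chunk_id, _, _ in ranked[:limit]]


q = [1.0, 0.0]
print(rank_everything(q, 2))            # ['c1', 'c3']: c3 belongs to spec-b
print(related_chunks("spec-a", q, 2))   # ['c1', 'c2']
```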
## Model / Config

Omnigraph uses **two distinct embedding clients**; don't conflate them:

| Client | When it runs | Default model | Configured via |
|--------|--------------|---------------|----------------|
| **Engine / load-time** | At load, when an `@embed("source")` field is populated (and during `omnigraph embed`) | `gemini-embedding-2-preview` (3072-dim) | `GEMINI_API_KEY`, `OMNIGRAPH_GEMINI_BASE_URL`, `OMNIGRAPH_EMBED_*`, `OMNIGRAPH_EMBEDDINGS_MOCK` |
| **Compiler / query-time** | When a query passes a *string* to a ranking op (e.g. `nearest($c.embedding, "some text")`) and the server auto-embeds it | `text-embedding-3-small` (OpenAI-style) | `NANOGRAPH_EMBED_MODEL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`, `NANOGRAPH_EMBEDDINGS_MOCK` |

The vectors stored in the schema are produced by the **load-time (engine)** client, so `Vector(N)` must match that model's output dimension: `Vector(3072)` for `gemini-embedding-2-preview`. If the query-time client produces vectors of a different dimension than your stored vectors, similarity search returns meaningless results or errors. Note that `text-embedding-3-small` produces 1536-dimensional vectors by default, so the two defaults do not line up with a `Vector(3072)` column. You have two options. The first is to point the query-time client at a model whose vectors are comparable with the stored ones: a matching dimension is necessary but not sufficient, because vectors from different models live in different spaces. The second is to pass a precomputed query vector, as the `$q: Vector(3072)` examples above do. Vectors are stored L2-normalized.
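A cheap guard against the dimension mismatch described above is to check a precomputed query vector against the declared `Vector(N)` type before sending it. The helpers below (`declared_dim`, `check_query_vector`) are hypothetical and not part of Omnigraph. The normalization step mirrors the note that stored vectors are L2-normalized.

```python
"""Hypothetical pre-flight check: does a query vector fit a `Vector(N)` column?"""
import math
import re


def declared_dim(pg_type: str) -> int:
    """Extract N from a schema type such as 'Vector(3072)'."""
    m = re.fullmatch(r"Vector\((\d+)\)", pg_type.strip())
    if m is None:
        raise ValueError(f"not a vector type: {pg_type!r}")
    return int(m.group(1))


def check_query_vector(pg_type: str, vec: list) -> list:
    """Raise on a dimension mismatch; otherwise return the L2-normalized vector."""
    n = declared_dim(pg_type)
    if len(vec) != n:
        raise ValueError(f"query vector has {len(vec)} dims, column expects {n}")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


try:
    check_query_vector("Vector(3072)", [0.0] * 1536)  # e.g. a 1536-dim query embedding
except ValueError as err:
    print(err)  # query vector has 1536 dims, column expects 3072
```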
224
skills/omnigraph/references/server-policy.md
Normal file
@@ -0,0 +1,224 @@
# HTTP Server & Cedar Policy

## Contents

- Starting the server (boot sources)
- HTTP routes
- Auth
- Setup operations bypass the server
- Cedar policy
- Multi-graph serving
- Server + policy together
- Cluster-booted servers

How to run `omnigraph-server` and gate operations with Cedar policies.
## Starting the Server

The server is the canonical runtime entry point: all CLI queries, mutations, and admin ops go through it. Setup operations are the exception; see *Setup Operations Bypass the Server*. **Boot is cluster-only** (RFC-011): the server boots from a cluster and serves N graphs (N ≥ 1) under nested routes. There is **no** single-graph, bare-URI, or `omnigraph.yaml` boot.

```bash
omnigraph-server --cluster ./company-brain --bind 127.0.0.1:8080    # a config directory …
omnigraph-server --cluster s3://bucket/prefix --bind 0.0.0.0:8080   # … or a storage-root URI (config-free)
```

`--cluster` boots from the cluster's applied revision (see *Cluster-Booted Servers* below). Run it in a separate terminal or as a background process.
## HTTP Routes

All per-graph routes are nested under `/graphs/{id}/...`, where `{id}` is a graph id from the applied cluster. Bare flat paths (`/query`, `/snapshot`, …) return **404**. Only `/healthz` and `/graphs` stay flat.

| Route | Purpose |
|-------|---------|
| `GET /healthz` | liveness probe (flat) |
| `GET /graphs` | enumerate served graphs (flat; `graph_list`-gated) |
| `GET /graphs/{id}/snapshot?branch=` | table state + row counts |
| `POST /graphs/{id}/query` | read query (canonical; `/read` = deprecated alias) |
| `POST /graphs/{id}/mutate` | mutation (`/change` = deprecated alias) |
| `POST /graphs/{id}/load` | bulk JSONL load (32 MB limit); branch creation is opt-in via `from` (`/ingest` = deprecated alias) |
| `POST /graphs/{id}/export` | NDJSON stream of a branch |
| `GET /graphs/{id}/queries` · `POST /graphs/{id}/queries/{name}` | stored-query catalog (`read`) + invocation (`invoke_query`, plus `change` for a stored mutation; deny == 404) |
| `GET /graphs/{id}/schema` · `POST /graphs/{id}/schema/apply` | read `.pg` · migrate (`schema_apply`) |
| `GET/POST /graphs/{id}/branches` · `DELETE …/branches/{b}` · `POST …/branches/merge` | branch ops |
| `GET /graphs/{id}/commits?branch=` · `…/commits/{commit_id}` | history |

Read routes take `?branch=main` or `?snapshot=<id>`. Writes publish directly and commit atomically via `__manifest`; use the commits route for write/audit history.
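Here is a minimal, standard-library-only sketch of addressing the nested routes from a script. It only *builds* a request for `GET /graphs/{id}/snapshot?branch=main` and does not send it. The request bodies of the POST routes are not documented in this file, so they are left out. Sending the token as `Authorization: Bearer <token>` follows the standard bearer scheme and is assumed here. `graph_url` and `snapshot_request` are hypothetical helpers. The graph id `spike` is borrowed from the CLI example under *Auth*.

```python
"""Minimal client sketch for the nested per-graph routes (stdlib only)."""
import urllib.parse
import urllib.request


def graph_url(base: str, graph_id: str, route: str, **params) -> str:
    """Build /graphs/{id}/<route>[?query]; flat paths like /snapshot would 404."""
    path = "/graphs/{}/{}".format(
        urllib.parse.quote(graph_id, safe=""), route.lstrip("/"))
    query = urllib.parse.urlencode(params)
    return base.rstrip("/") + path + ("?" + query if query else "")


def snapshot_request(base, graph_id, token, branch="main"):
    """Prepare (but don't send) GET /graphs/{id}/snapshot?branch=..."""
    req = urllib.request.Request(graph_url(base, graph_id, "snapshot", branch=branch))
    req.add_header("Authorization", f"Bearer {token}")
    return req


req = snapshot_request("http://127.0.0.1:8080", "spike", "s3cret")
print(req.full_url)  # http://127.0.0.1:8080/graphs/spike/snapshot?branch=main
# urllib.request.urlopen(req) would send it to a running --cluster server.
```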
## Auth

Set bearer tokens on the server process. Three sources are checked, in order of precedence:

1. `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` (AWS Secrets Manager)
2. `OMNIGRAPH_SERVER_BEARER_TOKENS_JSON` / `_FILE` (JSON `{actor_id: token}`)
3. `OMNIGRAPH_SERVER_BEARER_TOKEN` (single token, actor `default`)

```bash
OMNIGRAPH_SERVER_BEARER_TOKENS_JSON='{"act-reader":"s3cret"}' \
  omnigraph-server --cluster ./company-brain --bind 0.0.0.0:8080
```
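The precedence list above can be modelled as a small resolver that returns a `{token: actor}` lookup table. Inverting the map this way is also why a client cannot choose its own actor: identity follows from the token. Several parts of this sketch are assumptions. The `_FILE` variable is taken to be `OMNIGRAPH_SERVER_BEARER_TOKENS_FILE`. `_JSON` is assumed to win over `_FILE`. The AWS secret is assumed to hold the same JSON shape, and its lookup is stubbed out.

```python
"""Hypothetical model of the server's bearer-token source precedence."""
import json
import os


def fetch_aws_secret(secret_id: str) -> str:
    # Placeholder: a real implementation would call AWS Secrets Manager.
    raise NotImplementedError(f"AWS secret lookup for {secret_id!r} not sketched")


def resolve_tokens(env=os.environ) -> dict:
    """Return {token: actor_id} from the highest-precedence source that is set."""
    if env.get("OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET"):
        raw = fetch_aws_secret(env["OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET"])
    elif env.get("OMNIGRAPH_SERVER_BEARER_TOKENS_JSON"):
        raw = env["OMNIGRAPH_SERVER_BEARER_TOKENS_JSON"]
    elif env.get("OMNIGRAPH_SERVER_BEARER_TOKENS_FILE"):
        with open(env["OMNIGRAPH_SERVER_BEARER_TOKENS_FILE"], encoding="utf-8") as f:
            raw = f.read()
    elif env.get("OMNIGRAPH_SERVER_BEARER_TOKEN"):
        return {env["OMNIGRAPH_SERVER_BEARER_TOKEN"]: "default"}
    else:
        return {}
    actor_to_token = json.loads(raw)  # documented shape: {actor_id: token}
    return {token: actor for actor, token in actor_to_token.items()}


env = {
    "OMNIGRAPH_SERVER_BEARER_TOKENS_JSON": '{"act-reader": "s3cret"}',
    "OMNIGRAPH_SERVER_BEARER_TOKEN": "ignored-because-lower-precedence",
}
print(resolve_tokens(env))  # {'s3cret': 'act-reader'}
```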
On the client side (0.7.0), register the server once and store its token out of band:

```bash
echo "s3cret" | omnigraph login remote   # → ~/.omnigraph/credentials (0600)
omnigraph query get_signal --server remote --graph spike --params '{"slug":"sig-foo"}'
```

`--server remote` resolves the URL from the `servers:` map in `~/.omnigraph/config.yaml`, and the token from `OMNIGRAPH_TOKEN_REMOTE` or the credentials file. A token is only ever sent to the server it is keyed to.
### Running without auth requires an explicit opt-in

You can no longer just "leave auth off." Since v0.6.0 the server **refuses to start** when it has neither bearer tokens nor a policy file, unless you explicitly opt in:

```bash
omnigraph-server --cluster . --unauthenticated
# or: OMNIGRAPH_UNAUTHENTICATED=1 omnigraph-server --cluster .
```

This is a guardrail against accidentally shipping an open server. For pure local dev, pass `--unauthenticated` deliberately.
## Setup Operations Bypass the Server

`init` and **local** `load` write storage directly; they don't go through the server. (A **remote** `load` is server-orchestrated and POSTs to `/graphs/{id}/load`.) Pass the repo URI:

```bash
omnigraph init --schema schema.pg s3://my-bucket/repos/<name>
omnigraph load --data seed.jsonl --mode overwrite s3://my-bucket/repos/<name>
```

Everything else (`query`, `mutate`, `snapshot`, `schema plan/apply`, `branch`, `commit`) goes through the running server.
## Cedar Policy

Omnigraph can gate sensitive actions with [Cedar](https://www.cedarpolicy.com/) policies.

### Default-deny posture

Policy is enforced engine-wide (every authoring path calls the same gate), and the default is **closed**, not open:

| Server state | Bearer tokens | Policy file | Behavior |
|---|---|---|---|
| **Open** | no | no | Every request is permitted, but the server refuses to start without `--unauthenticated` / `OMNIGRAPH_UNAUTHENTICATED=1`. |
| **DefaultDeny** | yes | no | Every authenticated request for an action other than `read` is rejected (HTTP 403). Configuring tokens but forgetting the policy file no longer creates an illusion of protection. |
| **PolicyEnabled** | yes | yes | Requests are evaluated against your Cedar rules. |

So configuring a policy file is what *enables* writes: once tokens are set, there is no "permit everything by default" mode.
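The table and the startup guard from *Running without auth* can be read as two small functions: one classifies the server state at boot, and one decides requests when there are no Cedar rules to consult. This is a hypothetical model, not Omnigraph code. "Policy file but no tokens" is not covered by the table, so the sketch treats it as an error. Keeping `graph_list` closed in the Open state follows the *Multi-graph serving* section.

```python
"""Sketch of the three server states and the startup guard, as tabulated above."""
from enum import Enum


class ServerState(Enum):
    OPEN = "Open"
    DEFAULT_DENY = "DefaultDeny"
    POLICY_ENABLED = "PolicyEnabled"


def classify(has_tokens: bool, has_policy: bool, unauthenticated: bool = False):
    if has_tokens and has_policy:
        return ServerState.POLICY_ENABLED
    if has_tokens:
        return ServerState.DEFAULT_DENY
    if has_policy:
        # Policy without tokens is not covered by the table; treated as an error here.
        raise ValueError("policy file without bearer tokens: not modelled")
    if not unauthenticated:
        raise SystemExit("refusing to start: no tokens and no policy; pass --unauthenticated")
    return ServerState.OPEN


def allowed_without_rules(state: ServerState, action: str) -> bool:
    """Decision for states that have no Cedar rules to consult."""
    if state is ServerState.OPEN:
        return action != "graph_list"  # GET /graphs stays closed even when unauthenticated
    if state is ServerState.DEFAULT_DENY:
        return action == "read"  # everything else is a 403
    raise ValueError("PolicyEnabled decisions come from the Cedar rules")


state = classify(has_tokens=True, has_policy=False)
print(state.value, allowed_without_rules(state, "read"), allowed_without_rules(state, "change"))
# DefaultDeny True False
```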
### Gated actions

Per-graph actions (evaluated against the graph being addressed):

| Action | Protects |
|--------|----------|
| `read` | query execution |
| `export` | data export |
| `change` | mutations |
| `invoke_query` | stored-query invocation via `POST /graphs/{id}/queries/{name}` (graph-scoped, not branch-scoped). A stored **mutation** is double-gated: it must also pass `change`. For a caller without the grant, a denial and an unknown query name both return the same **404**, so the catalog can't be probed. |
| `schema_apply` | schema migrations |
| `branch_create` | branch creation |
| `branch_delete` | branch deletion |
| `branch_merge` | merges (especially into protected branches) |

`admin` exists but is reserved: it has no call site yet, so don't write rules for it. A server-scoped `graph_list` action gates `GET /graphs`; declare it in a `[cluster]`-scoped bundle.

For any shared repo, gate at least `schema_apply` and `branch_merge`: grant them only to the actors who should hold them.
### Where policy is declared

Cedar bundles are declared in `cluster.yaml` and attach via `applies_to`. `[cluster]` is the server-level engine (gates `graph_list` / `GET /graphs`); `[<graph-id>]` is that graph's engine (gates `invoke_query`, `read`, `change`, `branch_*`, `schema_apply`). `cluster apply` publishes them, and the `--cluster` server enforces the applied revision. The `policy.yaml` rule format (below) is the bundle content.
### `policy.yaml` shape

The policy model is **allow-only**: every rule is a `permit`. You grant capabilities to groups; anything ungranted is denied by default. There is **no `deny` / `effect` key**; to forbid something, simply don't grant it.

```yaml
version: 1                                # required; must be 1

groups:
  admins: [act-alice, act-bob]
  team: [act-carol, act-dan]

protected_branches:
  - main

rules:
  - id: admins-can-apply-schema           # rules use `id`, not `name`
    allow:                                # required `allow:` block
      actors: { group: admins }           # references a group by name
      actions: [schema_apply]
      target_branch_scope: protected

  - id: team-can-merge-to-protected
    allow:
      actors: { group: team }
      actions: [branch_merge]
      target_branch_scope: protected

  - id: team-can-read-write-unprotected
    allow:
      actors: { group: team }
      actions: [read, change]
      branch_scope: unprotected
```

To "block unreviewed schema applies," you don't write a deny rule; you just don't grant `schema_apply` to that group. Default-deny does the rest.

Scope rules (a rule's `allow` block may use **at most one**):

- `branch_scope: any | protected | unprotected`: for `read`, `export`, `change` (matches the source branch).
- `target_branch_scope: any | protected | unprotected`: for `schema_apply`, `branch_create`, `branch_delete`, `branch_merge` (matches the destination branch).
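To see how the allow-only model and the two scope keys combine, here is a toy evaluator over the `policy.yaml` example, transcribed as a Python dict so the sketch needs no YAML library. It is not Omnigraph's Cedar engine, and `decide` and `scope_matches` are hypothetical. Two behaviours are assumptions: a rule with no scope key matches any branch, and a scoped rule never matches a request that names no branch. The two `print` calls reproduce the cases in `policy.tests.yaml` below.

```python
"""Toy evaluator for the allow-only policy.yaml model (not Omnigraph's Cedar engine)."""

# The policy.yaml example, transcribed as a Python dict.
POLICY = {
    "version": 1,
    "groups": {"admins": ["act-alice", "act-bob"], "team": ["act-carol", "act-dan"]},
    "protected_branches": ["main"],
    "rules": [
        {"id": "admins-can-apply-schema",
         "allow": {"actors": {"group": "admins"}, "actions": ["schema_apply"],
                   "target_branch_scope": "protected"}},
        {"id": "team-can-merge-to-protected",
         "allow": {"actors": {"group": "team"}, "actions": ["branch_merge"],
                   "target_branch_scope": "protected"}},
        {"id": "team-can-read-write-unprotected",
         "allow": {"actors": {"group": "team"}, "actions": ["read", "change"],
                   "branch_scope": "unprotected"}},
    ],
}


def scope_matches(scope, branch, protected):
    if scope is None or scope == "any":
        return True
    if branch is None:
        return False  # rule is scoped but the request names no branch
    is_protected = branch in protected
    return is_protected if scope == "protected" else not is_protected


def decide(policy, actor, action, branch=None, target_branch=None):
    """Return 'allow' if any rule grants the request, else 'deny' (default-deny)."""
    protected = set(policy.get("protected_branches", []))
    for rule in policy["rules"]:
        allow = rule["allow"]
        members = policy["groups"].get(allow["actors"]["group"], [])
        if actor not in members or action not in allow["actions"]:
            continue
        if "target_branch_scope" in allow:
            ok = scope_matches(allow["target_branch_scope"], target_branch, protected)
        else:
            ok = scope_matches(allow.get("branch_scope"), branch, protected)
        if ok:
            return "allow"
    return "deny"


print(decide(POLICY, "act-alice", "schema_apply", target_branch="main"))   # allow
print(decide(POLICY, "act-random", "branch_merge", target_branch="main"))  # deny
```

Note that there is no code path that returns "deny" because a rule said so. Denial is simply what remains when no rule grants the request, which is the allow-only model described above.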
### Validate, test, explain

```bash
# Compile Cedar + check the cluster's applied policies
omnigraph policy validate --cluster .

# Run declarative test cases
omnigraph policy test --cluster . --tests policy.tests.yaml

# Debug a single decision
omnigraph policy explain \
  --actor act-alice \
  --action schema_apply \
  --target-branch main \
  --cluster .
```
### Test cases (`policy.tests.yaml`)

```yaml
version: 1                              # required; must be 1
cases:
  - id: alice-can-apply-schema          # cases use `id`, not `name`
    actor: act-alice
    action: schema_apply
    target_branch: main                 # schema_apply is target-branch scoped
    expect: allow                       # `allow` / `deny` (not `permit`)

  - id: random-user-cannot-merge-to-main
    actor: act-random
    action: branch_merge
    target_branch: main
    expect: deny
```

Run `policy test` after every policy edit; tests are cheap.
## Multi-graph serving

A `--cluster` server serves every graph in the applied cluster, each under `/graphs/{id}/...`. `GET /graphs` enumerates them (sorted by id) and is gated by the cluster-level `graph_list` action. Even under `--unauthenticated`, topology stays closed until a `[cluster]` policy grants it. `omnigraph graphs list` mirrors the route (remote servers only).

Policy attaches at two levels via `applies_to` in `cluster.yaml`:
- `[<graph-id>]`: per-graph rules (`read`, `change`, `branch_*`, `schema_apply`, `invoke_query`).
- `[cluster]`: server-level rules (`graph_list`).

There is no runtime add/remove of graphs: edit `cluster.yaml`, run `cluster apply`, and restart.
## Server + Policy Together

When the server is running with a policy file:
1. Every request resolves the actor from the bearer token (the client cannot set actor identity) and checks it against the Cedar rules.
2. Unauthorized requests return `403 Forbidden`. The exception is stored-query invocation without `invoke_query`, which returns `404` so the catalog can't be probed.
3. The CLI doesn't bypass policy when it connects over HTTP; enforcement happens at the server. Enforcement is also engine-wide, so CLI direct-engine writes and embedded SDK consumers hit the same gate.

Setup ops (`init`, local `load`) write storage directly. With a policy configured, they still flow through the engine-layer enforcement gate for the actor you pass via `--as` (or `operator.actor` in `~/.omnigraph/config.yaml`). If the bucket is shared, also protect the raw storage layer (S3 bucket ACLs, object locks).
## Cluster-Booted Servers

`omnigraph-server --cluster <dir|s3://>` is the only boot source (covered above). It serves the cluster's **applied revision**: `cluster apply` changes take effect on the next restart (no hot reload). Boot fails fast with named remedies for missing, pending, or tampered state. Bearer tokens and the bind address stay process-level (env/flags). See `references/cluster.md`.
54
skills/omnigraph/references/stored-queries.md
Normal file
@@ -0,0 +1,54 @@
# Stored-Query Registries

A **stored query** is a `.gq` query that the *server* loads, type-checks at startup, and exposes by name, without ever accepting ad-hoc query source from the client. It's how you publish a vetted, typed query surface to remote callers and MCP tools.

This is a server-side feature introduced in **v0.6.1**. It is distinct from CLI `aliases:` (see [`aliases.md`](aliases.md)): an alias is local client ergonomics; a stored query is a server-published, policy-gated endpoint.
## Declaring stored queries (`cluster.yaml`)

Stored queries are declared in the cluster's `cluster.yaml`; every `query <name>` in the listed `.gq` files is registered:

```yaml
graphs:
  <id>:
    schema: schema.pg
    queries: queries/    # discover every `query <name>` in queries/*.gq
```

`queries` also accepts an explicit file list (`[a.gq, b.gq]`) or a fine-grained `name: { file: … }` map. An unparseable `.gq` file, or a query name duplicated across files, fails `cluster validate`. `cluster apply` publishes the queries to the content-addressed catalog, and the `--cluster` server type-checks and serves every applied query. Every applied query is listed; per-query `mcp:`/expose flags are planned for a later phase.
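The duplicate-name rule is easy to pre-check in CI before running `cluster validate`. The sketch below approximates it with a regex over in-memory file contents. A regex is a deliberate simplification of a real `.gq` parser: it can be fooled by comments or strings that start with `query`. The file names, the `find_duplicates` helper, and the abbreviated `.gq` snippets are hypothetical.

```python
"""Hypothetical pre-check mirroring `cluster validate`'s duplicate-name rule."""
import re
from collections import defaultdict

# Approximation: a real check parses the .gq files; a regex only finds declarations.
QUERY_DECL = re.compile(r"^\s*query\s+([A-Za-z_]\w*)", re.MULTILINE)


def find_duplicates(files: dict) -> dict:
    """Map each query name declared in more than one file to those file names."""
    seen = defaultdict(list)
    for filename, source in sorted(files.items()):
        for name in QUERY_DECL.findall(source):
            seen[name].append(filename)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


files = {
    "queries/people.gq": "query find_people($name: String) { match { $p: Person } return { $p.slug } }",
    "queries/search.gq": (
        "query nearest_chunks($q: Vector(3072)) { match { $c: Chunk } return { $c.text } }\n"
        "query find_people($slug: String) { match { $p: Person } return { $p.slug } }"
    ),
}
print(find_duplicates(files))
# {'find_people': ['queries/people.gq', 'queries/search.gq']}
```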
## CLI

```bash
omnigraph queries validate   # type-check every stored query against the live schema (offline; opens the graph; exits non-zero on drift)
omnigraph queries list       # print the addressed graph's registry: query names and typed params
```

- `validate` catches schema drift **without restarting the server**. Run it after a `schema apply` or before deploying a config change. The server also runs this check at startup and **refuses to boot** on drift or on a duplicate MCP tool name.
- `validate` opens the graph (address it with `--store <uri>` or a positional URI); `list` reads the addressed graph's catalog.
- `queries` is distinct from `lint`: `lint` validates a single `.gq` file you point it at, while `queries validate` validates the registry the server will actually serve.
## HTTP surface

| Route | Gate | Purpose |
|-------|------|---------|
| `GET /graphs/{id}/queries` | `read` | Typed tool catalog of the served queries. Graph-wide (branch-independent; `read` is authorized against `main`). |
| `POST /graphs/{id}/queries/{name}` | `invoke_query` (+ `change` for a stored mutation) | Invoke a named query. The body carries params only, **never** `.gq` source. A stored mutation cannot target a `snapshot` (`400`); a param type error is a structured `400` naming the param. |

The `?branch=` / `?snapshot=` query params apply to `POST /graphs/{id}/queries/{name}` reads. Branch/snapshot access is still enforced by the inner `read`/`change` gate; `invoke_query` itself is graph-scoped, not branch-scoped.
## Policy gating (`invoke_query`)

- **`invoke_query`** is a per-graph Cedar action gating the whole stored-query invocation surface. Grant it like any other action (see [`server-policy.md`](server-policy.md)).
- **Stored mutations are double-gated:** the caller needs `invoke_query` to reach the query **and** `change` for the write. An actor with `invoke_query` but not `change` gets `403` on a stored mutation.
- **Deny == unknown:** for a caller *lacking* `invoke_query`, a denial and an unknown query name return the **same 404** (identical body), so the catalog can't be probed. A caller who *holds* `invoke_query` may still get a `403` from the inner gate for a query it can't `read`/`change`, so existence is visible to grant-holders by design.
- **Default-deny mode** (bearer tokens, no `policy.file`) permits only `read`, so *every* `/graphs/{id}/queries/{name}` call returns `404` until an `invoke_query` rule is configured.
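The rules above reduce to a short decision function over the caller's grants. The sketch is a hypothetical model; `invoke_status` and the query name `retire_signal` are illustrative. When a stored mutation targets a snapshot and the caller also lacks `change`, the sketch checks the snapshot (`400`) before the grant (`403`); that ordering is an assumption. Param type errors (also `400`) are not modelled.

```python
"""Sketch of the status codes a stored-query call yields, per the rules above."""


def invoke_status(grants, catalog, name, use_snapshot=False):
    """`grants` is the caller's action set; `catalog` maps name -> 'read' | 'mutation'."""
    # Outer gate: without invoke_query, "denied" and "unknown" look identical.
    if "invoke_query" not in grants:
        return 404
    if name not in catalog:
        return 404
    if catalog[name] == "mutation":
        if use_snapshot:
            return 400  # a stored mutation cannot target a snapshot
        return 200 if "change" in grants else 403  # double-gated write
    return 200 if "read" in grants else 403  # inner read gate


catalog = {"get_signal": "read", "retire_signal": "mutation"}
print(invoke_status({"read"}, catalog, "get_signal"))                       # 404
print(invoke_status({"invoke_query", "read"}, catalog, "retire_signal"))    # 403
print(invoke_status({"invoke_query", "change"}, catalog, "retire_signal"))  # 200
```

The first call shows the probing defence: a `read`-only caller gets the same `404` for a real query as for a made-up name.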
## MCP exposure

Every applied query is listed in `GET /graphs/{id}/queries` as a typed MCP tool. Per-query exposure controls (`mcp.expose`, `tool_name`) are planned for a later phase; there is no per-query `mcp:` flag in cluster mode today.
## Note on per-query authorization

The catalog is **not** Cedar-filtered per query yet: a caller with `read` but not `invoke_query` can *list* a query it cannot *invoke* (invocation would return 404). Per-query authorization is future work; for now the catalog is a discovery surface and `invoke_query` is the invocation gate.