omnigraph-policy is a new crate this release cycle (Cedar policy
engine, MR-722). It wasn't added to the publish list when it was
created, so v0.5.0's tag-triggered publish run succeeded for
omnigraph-compiler but failed at omnigraph-engine:
failed to prepare local package for uploading
Caused by:
no matching package named `omnigraph-policy` found
location searched: crates.io index
required by package `omnigraph-engine v0.5.0`
omnigraph-policy has no internal omnigraph-* deps so it can publish
after omnigraph-compiler (either could go first). omnigraph-engine
depends on both; server on the three; cli on everything.
publish_if_new is idempotent — re-running with the v0.5.0 tag after
this lands will skip omnigraph-compiler (already published), then
publish policy + engine + server + cli.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr)
Switches `execute_node_scan` from string-flattened Lance SQL pushdown
(`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion
Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`).
## What this enables
1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned
`None` for list-contains (the comment said *"Can't pushdown list
contains"*) because string SQL can't easily express it. With Expr,
it lowers to DataFusion's `array_has(col, value)` builtin via the
`nested_expressions` feature, and pushes down to Lance's scan layer
the same way Eq/Lt/etc. do. Pinned by the new regression test
`end_to_end::ir_filter_with_list_contains_pushes_down`.
2. **DataFusion 53's optimizer rules now reach our predicates.** Once
the Expr lands at the Lance scanner, DF's planner runs:
- `IN`-list vectorized eq kernel (DF #20528)
- `PhysicalExprSimplifier` (DF #20111)
- CASE WHEN x THEN y ELSE NULL shortcut (DF #20097)
- Push limit into hash join (DF #20228)
None of these were applicable before because the string SQL path
short-circuited the optimizer.
## Scope
This is one of three string-flattened pushdown sites; the other two
(`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation
delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL
string path for now:
- The Expand pushdown still serializes through `hydrate_nodes`'s
`extra_filter_sql: Option<&str>` parameter. Migrating it changes the
`TableStorage` trait surface (`scan_stream(filter: Option<&str>)` →
`Option<Expr>`) and the cascading call sites — out of scope for this
MR.
- The mutation delete predicate still goes through `Dataset::delete(&str)`
in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the
Lance v7 bump per issue #112) will migrate that path to
`DeleteBuilder::execute_uncommitted` taking an Expr.
The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql`
helpers stay in place to serve the remaining string-SQL consumers
(mutation predicates). They get retired when the other call sites
migrate.
## Cargo
Enables the `nested_expressions` feature on the `datafusion` workspace
dep. Lance already pulls in `datafusion-functions-nested` transitively
(it's listed in their feature set), so this just exposes the
`datafusion::functions_nested::expr_fn::array_has` re-export. No
transitive dep change (Cargo.lock unchanged).
## Tests
- New: `ir_filter_with_list_contains_pushes_down` — pins the case that
was previously impossible (`ir_filter_to_sql` returning `None`).
- 906/906 workspace tests still pass.
- 417/417 engine integration tests pass (was 416 + the new one).
- 19/19 failpoints (recovery canary).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break)
The RustFS S3 Integration job started failing 2026-05-23 with all 3
tests panicking on the first PUT:
HTTP error: error sending request
The "Dump RustFS logs on failure" step revealed the container was
dying at startup:
[FATAL] Server encountered an error and is shutting down:
Default root credentials are not allowed on non-loopback listeners;
set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values,
bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
for local development only
`rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a
credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as
"default" values. PR #111 passed yesterday because it ran against
beta.3; today's runs against beta.4 fail at container startup.
This is unrelated to PR #113's Expr-pushdown refactor — the bump
just happened to hit the same week.
Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The
right long-term fix is one of:
- Rotate the CI creds to less-default values (less coupling to
RustFS's "default" set definition)
- Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the
error message
- Use a workflow service container with controlled lifecycle
Deferred — pinning is the minimal restore. Also incidentally
documents *which* version we tested against, which `:latest` never
did.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: generator + drift CI + initial roles
Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS
is generated and CI-enforced. Every role change is a reviewable PR
with a permanent in-repo audit trail. No GitHub UI clicks, no shadow
state.
Initial roles:
engineering @aaltshuler owns crates/** + default (.github/,
scripts/, Cargo.*, openapi.json,
everything else not docs)
docs @aaltshuler @ragnorc owns docs/**, README.md, AGENTS.md,
CLAUDE.md, SECURITY.md
Per GitHub semantics, multiple owners on a CODEOWNERS line means "any
one satisfies the review" — for docs, either named member can approve.
Strict "N distinct approvers" would need a CI workaround (not wired
today; tracked for future hardening).
Components:
- .github/codeowners-roles.yml — source of truth. Edit this.
- .github/scripts/render-codeowners.py — generator (PyYAML; ~100 LoC).
- .github/CODEOWNERS — generated. CI rejects hand-edits.
- .github/workflows/codeowners.yml — two checks:
* drift: re-render and assert CODEOWNERS matches.
* noedit: reject PRs that edit CODEOWNERS without editing the yml.
- docs/codeowners.md — explains the source-of-truth pattern, how to
change roles, how to add new roles.
- AGENTS.md topic-index row.
What's NOT in this PR:
- Branch protection on main (separate PR; needs `gh api` call against
the org).
- Required-reviewer enforcement (depends on branch protection landing).
- Required CI status checks (depends on branch protection landing).
- Scheduled rotation (the schedule: block in the yml + a weekly
workflow). Today's roles are stable; rotation isn't needed yet.
- Linear-as-source-of-truth integration (Approach 4 from the design
discussion; deferred).
Verified:
- Generator output is deterministic (idempotent re-runs).
- scripts/check-agents-md.sh OK (28 links, 28 docs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: fix catch-all ordering (Devin review #88)
Devin caught a real bug: GitHub CODEOWNERS uses "last match wins"
semantics, but the generator emitted the catch-all `*` AFTER specific
patterns. Net effect: `*` won for every file, silently nullifying the
docs role and never routing reviews to @ragnorc.
Fix is one-line — emit the default `*` line before iterating the
specific paths. Also:
- Added a regression assertion in the generator: after rendering, the
first non-comment line must start with `*` if a default is
configured. Generator exits non-zero otherwise. Catches the same
class of mistake in any future refactor.
- Rewrote the yml header comment, which incorrectly stated "keep
more-specific paths after broader patterns" (correct for GitHub
semantics but the generator was doing the opposite — so the comment
read as a description of behavior when it was actually a contradicted
intention).
Verified by re-rendering: `*` is now line 12, `crates/**` is line 14,
`docs/**` is line 15, etc. README.md matches both `*` and `README.md`;
`README.md` is later → wins → @aaltshuler + @ragnorc both assigned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The release.yml workflow builds binaries and updates Homebrew but never
published to crates.io — v0.4.0 and v0.4.1 are missing from the registry
even though the local Cargo.toml and the v0.4.1 tag are at 0.4.1.
This adds a separate workflow that:
- auto-publishes on every v* tag push (future releases self-publish)
- can be manually dispatched with a tag input (catch up on v0.4.1)
- is idempotent: skips a crate if its current crates.io version already
matches local, so a partial failure is safe to retry
- gates on CARGO_REGISTRY_TOKEN (already in repo secrets); skips cleanly
if the token is ever rotated out
Publishes in dependency order: omnigraph-compiler → omnigraph-engine →
omnigraph-server → omnigraph-cli. Path-only deps in Cargo.toml carry
explicit version fields, so cargo publish strips paths and resolves
against crates.io.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The standalone test_failpoints_feature job took 21min on first run
(cold cache; the omnigraph-engine crate has lance + datafusion deps
that make any fresh build expensive). Folding into Test Workspace
shares the warm cache so the failpoints invocation is incremental —
~30s vs 21min on subsequent runs, and within the workspace job's
existing budget.
The failpoints feature is gated behind a Cargo flag and only adds
the small `fail` crate dep + a few feature-gated code paths; it
doesn't change the dep tree of any other crate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new findings from Cubic on commit 3223b51:
* **Pending edge cardinality counted within-input duplicates** (P2):
count_src_per_edge's pending walk added every row to the count,
including duplicate rows that finalize will collapse via
dedupe_merge_batches_by_id. A LoadMode::Merge with the same edge id
twice would over-count → spurious @card violation. Fix: when
dedupe_key_column is Some, walk pending in reverse, track seen keys
via HashSet, count only the kept (last-occurrence) rows. Mirrors
finalize-time dedupe so cardinality counts what stage_merge_insert
actually publishes.
* **scan_with_pending silently disabled merge-shadow when projection
omitted key_column** (P2): if a caller passed Some("id") as
key_column but their projection didn't include "id", the
filter_out_rows_where_string_in helper passed batches through
unchanged — silently degrading to union semantics. Fix: validate
up front that projection contains key_column when both are Some;
return a typed Lance error otherwise. Tightened the helper too:
missing column is now an internal error (was a silent passthrough).
* **Cascade-vs-explicit delete test was too weak** (P2): asserted
only that edge count decreased after delete. The cascade alone
could satisfy that even if the explicit second-delete silently
no-op'd. Strengthened: assert post_knows == 0, which only holds
when both ops landed (Bob→Diana would survive if op-2 no-op'd).
CI gap: also added test_failpoints_feature job to .github/workflows/ci.yml.
The workspace test runs without --features failpoints (the feature is
behind a Cargo flag), so the failpoints test suite was never exercised
by CI before now. The new job builds + runs
`cargo test -p omnigraph-engine --features failpoints --test failpoints`
on every full CI run, mirroring the test_aws_feature pattern.
New tests on tests/runs.rs:
* load_merge_mode_dedupes_within_pending_for_cardinality_count
(Cubic P2 #2 — pending-vs-pending dedup, distinct from the
load_merge_mode_dedupes_edge_for_cardinality_count test which
covers committed-vs-pending dedup).
* scan_with_pending_rejects_key_column_missing_from_projection
(Cubic P2 #3 — verifies the up-front validation rejects bad
callers and that the happy path still works correctly).
Local test results:
* tests/runs.rs: 23/23 passed
* tests/failpoints.rs --features failpoints: 7/7 passed (includes the
two new finalize→publisher residual tests landed in 3223b51).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stop producing the omnigraph-macos-x86_64 archive in both the
stable and edge release workflows. The macos-15-intel runner
build was the slowest of the matrix and Apple Silicon is now
the default Mac developer target.
- release.yml + release-edge.yml: drop the macos-15-intel matrix entry
- install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear
"no prebuilt binary" error instead of attempting an absent download
- update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit
an arm64-only Homebrew formula. The on_macos block now declares
`depends_on arch: :arm64` so Intel `brew install` fails fast with
a clear architecture message instead of installing an arm64 binary
that errors at exec time.
Linux x86_64 build is unaffected.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Actions rejects `secrets.*` in job-level `if:` conditions at
runtime (job-level `if` is evaluated before secrets are available),
causing the workflow to abort in 0s with "workflow file issue" on
every trigger. Moving the guard into a step-level check that writes
`HOMEBREW_TAP_SKIP` to GITHUB_ENV lets the rest of the steps
conditionally no-op when the tap token isn't configured.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the default pull_request checkout (refs/pull/N/merge) so tests
see the merged state. The openapi.json auto-commit now uses a separate
shallow clone of the PR branch, so the pushed commit contains only the
spec change rather than the merge-commit tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The separate openapi-sync workflow was duplicating the workspace build
(~15 min cold-cache compile), paying the cost twice per PR. Fold the
regen + auto-commit into the existing test job: one compile, shared
rust-cache, same drift-check semantics.
- Same-repo PRs: OMNIGRAPH_UPDATE_OPENAPI=1 during the test run, then
commit the regenerated spec back to the PR branch
- Fork PRs / pushes: env var empty, test stays in strict drift-check mode
- openapi_spec_is_up_to_date treats empty env value as unset, so the
conditional workflow env expression works
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Actions doesn't expose the 'secrets' context in 'with:' when
calling a reusable workflow. The companion PR on the shared workflow
(ModernRelay/.github) moves the four AWS values into
on.workflow_call.secrets; this caller drops them from 'with:' and adds
'secrets: inherit' so all four flow through masked.
Trailing from PRs #33 and #34.
On a public repo, Actions variables are not masked in workflow logs.
The AWS role ARN and artifact bucket name embed the AWS account ID —
not catastrophic, but norm-preserving to keep them out of public logs.
Switch all four values (region, role, project, bucket) from
`${{ vars.* }}` to `${{ secrets.* }}`. When secrets are passed via
`with:` to a reusable workflow, GitHub's masking still applies because
the value is added to the run's mask list as soon as the secret
reference is resolved.
Followup to #33 — should have landed as secrets from the start.
Invokes the shared omnigraph-package reusable workflow twice per run —
once with default features, once with --features aws — producing two
ECR tags per source commit:
<sha> (default features)
<sha>-aws (--features aws → SecretsManagerTokenSource)
Manual-dispatch only for now. Neither release.yml nor release-edge.yml
currently invokes the CodeBuild-backed packaging path; this gives
operators a way to produce on-demand image variants without wiring
packaging into the tag/push cadence.
Prerequisites:
- Repo vars AWS_REGION, AWS_ROLE_TO_ASSUME, AWS_CODEBUILD_PACKAGE_PROJECT,
AWS_ARTIFACT_BUCKET must be set.
- Shared workflow must support the `features` and `image_tag_suffix`
inputs.
Uses @main as the shared-workflow ref until a versioned tag is cut.
Introduces an opt-in AWS Secrets Manager backend for bearer tokens,
behind the `aws` Cargo feature. Default builds (on-prem, local dev)
don't pull in the AWS SDK and don't pay its compile cost.
- New Cargo feature `aws` gates the `aws-config` + `aws-sdk-secretsmanager`
optional deps. Default features remain empty.
- New `auth::aws::SecretsManagerTokenSource` implements `TokenSource` by
fetching a JSON `{"actor_id": "token", ...}` payload from a named
Secrets Manager secret. Credentials resolve via the AWS default chain
(env, shared config, IMDSv2 instance role, ECS task role) so no
explicit plumbing is needed under an IAM role.
- New `resolve_token_source()` dispatches based on the
`OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` env var. If the var is set
but the binary was built without `--features aws`, returns a clear
rebuild instruction rather than silently falling back.
- `serve()` now uses `resolve_token_source()` and logs which source was
selected at startup.
- `parse_json_secret_payload()` is factored out as a free function so
the payload validation (trim whitespace, reject blank actor/token,
reject non-object) is unit-testable without the AWS SDK.
- New CI job `test_aws_feature` builds + tests with `--features aws`.
Not in this PR (follow-ups):
- Background refresh loop for rotation. `SecretsManagerTokenSource`
advertises `supports_refresh: true` but the AppState-level refresh
task isn't wired yet.
- Config-YAML dispatch (today the AWS source is selected via env var
only; eventually `server.bearer_tokens.source` in `omnigraph.yaml`).
Tests:
- Default-feature build: 33 lib + 41 integration + 64 openapi.
- `--features aws` build: 32 lib (one test is cfg-gated) + 41 + 64.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>