omnigraph-policy is a new crate this release cycle (Cedar policy
engine, MR-722). It wasn't added to the publish list when it was
created, so v0.5.0's tag-triggered publish run succeeded for
omnigraph-compiler but failed at omnigraph-engine:
failed to prepare local package for uploading
Caused by:
no matching package named `omnigraph-policy` found
location searched: crates.io index
required by package `omnigraph-engine v0.5.0`
omnigraph-policy has no internal omnigraph-* deps so it can publish
after omnigraph-compiler (either could go first). omnigraph-engine
depends on both; server on the three; cli on everything.
publish_if_new is idempotent — re-running with the v0.5.0 tag after
this lands will skip omnigraph-compiler (already published), then
publish policy + engine + server + cli.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr)
Switches `execute_node_scan` from string-flattened Lance SQL pushdown
(`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion
Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`).
## What this enables
1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned
`None` for list-contains (the comment said *"Can't pushdown list
contains"*) because string SQL can't easily express it. With Expr,
it lowers to DataFusion's `array_has(col, value)` builtin via the
`nested_expressions` feature, and pushes down to Lance's scan layer
the same way Eq/Lt/etc. do. Pinned by the new regression test
`end_to_end::ir_filter_with_list_contains_pushes_down`.
2. **DataFusion 53's optimizer rules now reach our predicates.** Once
the Expr lands at the Lance scanner, DF's planner runs:
- `IN`-list vectorized eq kernel (DF #20528)
- `PhysicalExprSimplifier` (DF #20111)
- CASE WHEN x THEN y ELSE NULL shortcut (DF #20097)
- Push limit into hash join (DF #20228)
None of these were applicable before because the string SQL path
short-circuited the optimizer.
## Scope
This is one of three string-flattened pushdown sites; the other two
(`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation
delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL
string path for now:
- The Expand pushdown still serializes through `hydrate_nodes`'s
`extra_filter_sql: Option<&str>` parameter. Migrating it changes the
`TableStorage` trait surface (`scan_stream(filter: Option<&str>)` →
`Option<Expr>`) and the cascading call sites — out of scope for this
MR.
- The mutation delete predicate still goes through `Dataset::delete(&str)`
in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the
Lance v7 bump per issue #112) will migrate that path to
`DeleteBuilder::execute_uncommitted` taking an Expr.
The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql`
helpers stay in place to serve the remaining string-SQL consumers
(mutation predicates). They get retired when the other call sites
migrate.
## Cargo
Enables the `nested_expressions` feature on the `datafusion` workspace
dep. Lance already pulls in `datafusion-functions-nested` transitively
(it's listed in their feature set), so this just exposes the
`datafusion::functions_nested::expr_fn::array_has` re-export. No
transitive dep change (Cargo.lock unchanged).
## Tests
- New: `ir_filter_with_list_contains_pushes_down` — pins the case that
was previously impossible (`ir_filter_to_sql` returning `None`).
- 906/906 workspace tests still pass.
- 417/417 engine integration tests pass (was 416 + the new one).
- 19/19 failpoints (recovery canary).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break)
The RustFS S3 Integration job started failing 2026-05-23 with all 3
tests panicking on the first PUT:
HTTP error: error sending request
The "Dump RustFS logs on failure" step revealed the container was
dying at startup:
[FATAL] Server encountered an error and is shutting down:
Default root credentials are not allowed on non-loopback listeners;
set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values,
bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
for local development only
`rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a
credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as
"default" values. PR #111 passed yesterday because it ran against
beta.3; today's runs against beta.4 fail at container startup.
This is unrelated to PR #113's Expr-pushdown refactor — the bump
just happened to hit the same week.
Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The
right long-term fix is one of:
- Rotate the CI creds to less-default values (less coupling to
RustFS's "default" set definition)
- Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the
error message
- Use a workflow service container with controlled lifecycle
Deferred — pinning is the minimal restore. Also incidentally
documents *which* version we tested against, which `:latest` never
did.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flip enforce_admins from true to false. Repo admins can now merge
their own PRs without waiting for code-owner review, by clicking
"Merge without waiting for requirements to be met" once CI is green.
The action is recorded in the audit log.
Non-admins still see full enforcement: code-owner review required,
1 approving review, required status checks must pass.
Rationale: as the solo owner of most CODEOWNERS scopes, the author
cannot satisfy GitHub's "non-self approver" rule on their own PRs,
which made every PR block on a second human. Admin bypass restores
the practical workflow while keeping the protection rules as the
default for everyone else.
Branch protection on main, declared as code rather than as opaque
GitHub UI state. Pairs with the CODEOWNERS chassis (#88): once this
PR lands and an admin runs the apply script, every PR to main must
satisfy code-owner review and the listed required checks.
Components:
- .github/branch-protection.json — the policy. Edit this to change
required checks, review counts, etc. Includes a _comment field for
human readers; the apply script strips it before PUT.
- scripts/apply-branch-protection.sh — idempotent apply via `gh api`.
Reads back current state for verification. Supports DRY_RUN=1.
- docs/branch-protection.md — explains the policy, how to apply, how
to change, why declared as code.
- AGENTS.md topic-index row.
Policy summary:
- Required status checks (strict): Classify Changes, Check AGENTS.md
Links, Test Workspace, Test omnigraph-server --features aws,
CODEOWNERS / drift, CODEOWNERS / noedit.
- Required approving reviews: 1, must be a code owner.
- Dismiss stale reviews on new commits.
- Required linear history (squash or rebase merges only).
- No force pushes, no deletions, no admin bypasses.
- Required conversation resolution.
What's NOT in this PR:
- Required signed commits — not yet; maintainers must enroll GPG/SSH
signing first or merges will block.
- Tag protection for v* tags — separate PR.
- Additional required checks (cargo deny, audit, fmt, clippy, CodeQL,
schema-lint MR-946) — separate PRs as each lands.
- The script is NOT run by CI. Branch-protection changes are admin
actions; CI-driven auto-apply would defeat the purpose. Manual
invocation is the audit point.
How to apply after merge:
./scripts/apply-branch-protection.sh
Requires gh-CLI auth with repo-admin permissions.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: generator + drift CI + initial roles
Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS
is generated and CI-enforced. Every role change is a reviewable PR
with a permanent in-repo audit trail. No GitHub UI clicks, no shadow
state.
Initial roles:
engineering @aaltshuler owns crates/** + default (.github/,
scripts/, Cargo.*, openapi.json,
everything else not docs)
docs @aaltshuler @ragnorc owns docs/**, README.md, AGENTS.md,
CLAUDE.md, SECURITY.md
Per GitHub semantics, multiple owners on a CODEOWNERS line means "any
one satisfies the review" — for docs, either named member can approve.
Strict "N distinct approvers" would need a CI workaround (not wired
today; tracked for future hardening).
Components:
- .github/codeowners-roles.yml — source of truth. Edit this.
- .github/scripts/render-codeowners.py — generator (PyYAML; ~100 LoC).
- .github/CODEOWNERS — generated. CI rejects hand-edits.
- .github/workflows/codeowners.yml — two checks:
* drift: re-render and assert CODEOWNERS matches.
* noedit: reject PRs that edit CODEOWNERS without editing the yml.
- docs/codeowners.md — explains the source-of-truth pattern, how to
change roles, how to add new roles.
- AGENTS.md topic-index row.
What's NOT in this PR:
- Branch protection on main (separate PR; needs `gh api` call against
the org).
- Required-reviewer enforcement (depends on branch protection landing).
- Required CI status checks (depends on branch protection landing).
- Scheduled rotation (the schedule: block in the yml + a weekly
workflow). Today's roles are stable; rotation isn't needed yet.
- Linear-as-source-of-truth integration (Approach 4 from the design
discussion; deferred).
Verified:
- Generator output is deterministic (idempotent re-runs).
- scripts/check-agents-md.sh OK (28 links, 28 docs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: fix catch-all ordering (Devin review #88)
Devin caught a real bug: GitHub CODEOWNERS uses "last match wins"
semantics, but the generator emitted the catch-all `*` AFTER specific
patterns. Net effect: `*` won for every file, silently nullifying the
docs role and never routing reviews to @ragnorc.
Fix is one-line — emit the default `*` line before iterating the
specific paths. Also:
- Added a regression assertion in the generator: after rendering, the
first non-comment line must start with `*` if a default is
configured. Generator exits non-zero otherwise. Catches the same
class of mistake in any future refactor.
- Rewrote the yml header comment, which incorrectly stated "keep
more-specific paths after broader patterns" (correct for GitHub
semantics but the generator was doing the opposite — so the comment
read as a description of behavior when it was actually a contradicted
intention).
Verified by re-rendering: `*` is now line 12, `crates/**` is line 14,
`docs/**` is line 15, etc. README.md matches both `*` and `README.md`;
`README.md` is later → wins → @aaltshuler + @ragnorc both assigned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The release.yml workflow builds binaries and updates Homebrew but never
published to crates.io — v0.4.0 and v0.4.1 are missing from the registry
even though the local Cargo.toml and the v0.4.1 tag are at 0.4.1.
This adds a separate workflow that:
- auto-publishes on every v* tag push (future releases self-publish)
- can be manually dispatched with a tag input (catch up on v0.4.1)
- is idempotent: skips a crate if its current crates.io version already
matches local, so a partial failure is safe to retry
- gates on CARGO_REGISTRY_TOKEN (already in repo secrets); skips cleanly
if the token is ever rotated out
Publishes in dependency order: omnigraph-compiler → omnigraph-engine →
omnigraph-server → omnigraph-cli. Path-only deps in Cargo.toml carry
explicit version fields, so cargo publish strips paths and resolves
against crates.io.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The standalone test_failpoints_feature job took 21min on first run
(cold cache; the omnigraph-engine crate has lance + datafusion deps
that make any fresh build expensive). Folding into Test Workspace
shares the warm cache so the failpoints invocation is incremental —
~30s vs 21min on subsequent runs, and within the workspace job's
existing budget.
The failpoints feature is gated behind a Cargo flag and only adds
the small `fail` crate dep + a few feature-gated code paths; it
doesn't change the dep tree of any other crate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new findings from Cubic on commit 3223b51:
* **Pending edge cardinality counted within-input duplicates** (P2):
count_src_per_edge's pending walk added every row to the count,
including duplicate rows that finalize will collapse via
dedupe_merge_batches_by_id. A LoadMode::Merge with the same edge id
twice would over-count → spurious @card violation. Fix: when
dedupe_key_column is Some, walk pending in reverse, track seen keys
via HashSet, count only the kept (last-occurrence) rows. Mirrors
finalize-time dedupe so cardinality counts what stage_merge_insert
actually publishes.
* **scan_with_pending silently disabled merge-shadow when projection
omitted key_column** (P2): if a caller passed Some("id") as
key_column but their projection didn't include "id", the
filter_out_rows_where_string_in helper passed batches through
unchanged — silently degrading to union semantics. Fix: validate
up front that projection contains key_column when both are Some;
return a typed Lance error otherwise. Tightened the helper too:
missing column is now an internal error (was a silent passthrough).
* **Cascade-vs-explicit delete test was too weak** (P2): asserted
only that edge count decreased after delete. The cascade alone
could satisfy that even if the explicit second-delete silently
no-op'd. Strengthened: assert post_knows == 0, which only holds
when both ops landed (Bob→Diana would survive if op-2 no-op'd).
CI gap: also added test_failpoints_feature job to .github/workflows/ci.yml.
The workspace test runs without --features failpoints (the feature is
behind a Cargo flag), so the failpoints test suite was never exercised
by CI before now. The new job builds + runs
`cargo test -p omnigraph-engine --features failpoints --test failpoints`
on every full CI run, mirroring the test_aws_feature pattern.
New tests on tests/runs.rs:
* load_merge_mode_dedupes_within_pending_for_cardinality_count
(Cubic P2 #2 — pending-vs-pending dedup, distinct from the
load_merge_mode_dedupes_edge_for_cardinality_count test which
covers committed-vs-pending dedup).
* scan_with_pending_rejects_key_column_missing_from_projection
(Cubic P2 #3 — verifies the up-front validation rejects bad
callers and that the happy path still works correctly).
Local test results:
* tests/runs.rs: 23/23 passed
* tests/failpoints.rs --features failpoints: 7/7 passed (includes the
two new finalize→publisher residual tests landed in 3223b51).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stop producing the omnigraph-macos-x86_64 archive in both the
stable and edge release workflows. The macos-15-intel runner
build was the slowest of the matrix and Apple Silicon is now
the default Mac developer target.
- release.yml + release-edge.yml: drop the macos-15-intel matrix entry
- install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear
"no prebuilt binary" error instead of attempting an absent download
- update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit
an arm64-only Homebrew formula. The on_macos block now declares
`depends_on arch: :arm64` so Intel `brew install` fails fast with
a clear architecture message instead of installing an arm64 binary
that errors at exec time.
Linux x86_64 build is unaffected.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Actions rejects `secrets.*` in job-level `if:` conditions at
runtime (job-level `if` is evaluated before secrets are available),
causing the workflow to abort in 0s with "workflow file issue" on
every trigger. Moving the guard into a step-level check that writes
`HOMEBREW_TAP_SKIP` to GITHUB_ENV lets the rest of the steps
conditionally no-op when the tap token isn't configured.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the default pull_request checkout (refs/pull/N/merge) so tests
see the merged state. The openapi.json auto-commit now uses a separate
shallow clone of the PR branch, so the pushed commit contains only the
spec change rather than the merge-commit tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The separate openapi-sync workflow was duplicating the workspace build
(~15 min cold-cache compile), paying the cost twice per PR. Fold the
regen + auto-commit into the existing test job: one compile, shared
rust-cache, same drift-check semantics.
- Same-repo PRs: OMNIGRAPH_UPDATE_OPENAPI=1 during the test run, then
commit the regenerated spec back to the PR branch
- Fork PRs / pushes: env var empty, test stays in strict drift-check mode
- openapi_spec_is_up_to_date treats empty env value as unset, so the
conditional workflow env expression works
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Actions doesn't expose the 'secrets' context in 'with:' when
calling a reusable workflow. The companion PR on the shared workflow
(ModernRelay/.github) moves the four AWS values into
on.workflow_call.secrets; this caller drops them from 'with:' and adds
'secrets: inherit' so all four flow through masked.
Trailing from PRs #33 and #34.
On a public repo, Actions variables are not masked in workflow logs.
The AWS role ARN and artifact bucket name embed the AWS account ID —
not catastrophic, but norm-preserving to keep them out of public logs.
Switch all four values (region, role, project, bucket) from
`${{ vars.* }}` to `${{ secrets.* }}`. When secrets are passed via
`with:` to a reusable workflow, GitHub's masking still applies because
the value is added to the run's mask list as soon as the secret
reference is resolved.
Followup to #33 — should have landed as secrets from the start.
Invokes the shared omnigraph-package reusable workflow twice per run —
once with default features, once with --features aws — producing two
ECR tags per source commit:
<sha> (default features)
<sha>-aws (--features aws → SecretsManagerTokenSource)
Manual-dispatch only for now. Neither release.yml nor release-edge.yml
currently invokes the CodeBuild-backed packaging path; this gives
operators a way to produce on-demand image variants without wiring
packaging into the tag/push cadence.
Prerequisites:
- Repo vars AWS_REGION, AWS_ROLE_TO_ASSUME, AWS_CODEBUILD_PACKAGE_PROJECT,
AWS_ARTIFACT_BUCKET must be set.
- Shared workflow must support the `features` and `image_tag_suffix`
inputs.
Uses @main as the shared-workflow ref until a versioned tag is cut.
Introduces an opt-in AWS Secrets Manager backend for bearer tokens,
behind the `aws` Cargo feature. Default builds (on-prem, local dev)
don't pull in the AWS SDK and don't pay its compile cost.
- New Cargo feature `aws` gates the `aws-config` + `aws-sdk-secretsmanager`
optional deps. Default features remain empty.
- New `auth::aws::SecretsManagerTokenSource` implements `TokenSource` by
fetching a JSON `{"actor_id": "token", ...}` payload from a named
Secrets Manager secret. Credentials resolve via the AWS default chain
(env, shared config, IMDSv2 instance role, ECS task role) so no
explicit plumbing is needed under an IAM role.
- New `resolve_token_source()` dispatches based on the
`OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` env var. If the var is set
but the binary was built without `--features aws`, returns a clear
rebuild instruction rather than silently falling back.
- `serve()` now uses `resolve_token_source()` and logs which source was
selected at startup.
- `parse_json_secret_payload()` is factored out as a free function so
the payload validation (trim whitespace, reject blank actor/token,
reject non-object) is unit-testable without the AWS SDK.
- New CI job `test_aws_feature` builds + tests with `--features aws`.
Not in this PR (follow-ups):
- Background refresh loop for rotation. `SecretsManagerTokenSource`
advertises `supports_refresh: true` but the AppState-level refresh
task isn't wired yet.
- Config-YAML dispatch (today the AWS source is selected via env var
only; eventually `server.bearer_tokens.source` in `omnigraph.yaml`).
Tests:
- Default-feature build: 33 lib + 41 integration + 64 openapi.
- `--features aws` build: 32 lib (one test is cfg-gated) + 41 + 64.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>