Branch protection on main, declared as code rather than as opaque
GitHub UI state. Pairs with the CODEOWNERS chassis (#88): once this
PR lands and an admin runs the apply script, every PR to main must
satisfy code-owner review and the listed required checks.
Components:
- .github/branch-protection.json — the policy. Edit this to change
required checks, review counts, etc. Includes a _comment field for
human readers; the apply script strips it before PUT.
- scripts/apply-branch-protection.sh — idempotent apply via `gh api`.
Reads back current state for verification. Supports DRY_RUN=1.
- docs/branch-protection.md — explains the policy, how to apply, how
to change, why declared as code.
- AGENTS.md topic-index row.
Policy summary:
- Required status checks (strict): Classify Changes, Check AGENTS.md
Links, Test Workspace, Test omnigraph-server --features aws,
CODEOWNERS / drift, CODEOWNERS / noedit.
- Required approving reviews: 1, must be a code owner.
- Dismiss stale reviews on new commits.
- Required linear history (squash or rebase merges only).
- No force pushes, no deletions, no admin bypasses.
- Required conversation resolution.
What's NOT in this PR:
- Required signed commits — not yet; maintainers must enroll GPG/SSH
signing first or merges will block.
- Tag protection for v* tags — separate PR.
- Additional required checks (cargo deny, audit, fmt, clippy, CodeQL,
schema-lint MR-946) — separate PRs as each lands.
- The script is NOT run by CI. Branch-protection changes are admin
actions; CI-driven auto-apply would defeat the purpose. Manual
invocation is the audit point.
How to apply after merge:
./scripts/apply-branch-protection.sh
Requires gh-CLI auth with repo-admin permissions.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: generator + drift CI + initial roles
Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS
is generated and CI-enforced. Every role change is a reviewable PR
with a permanent in-repo audit trail. No GitHub UI clicks, no shadow
state.
Initial roles:
engineering @aaltshuler owns crates/** + default (.github/,
scripts/, Cargo.*, openapi.json,
everything else not docs)
docs @aaltshuler @ragnorc owns docs/**, README.md, AGENTS.md,
CLAUDE.md, SECURITY.md
Per GitHub semantics, multiple owners on a CODEOWNERS line means "any
one satisfies the review" — for docs, either named member can approve.
Strict "N distinct approvers" would need a CI workaround (not wired
today; tracked for future hardening).
Components:
- .github/codeowners-roles.yml — source of truth. Edit this.
- .github/scripts/render-codeowners.py — generator (PyYAML; ~100 LoC).
- .github/CODEOWNERS — generated. CI rejects hand-edits.
- .github/workflows/codeowners.yml — two checks:
* drift: re-render and assert CODEOWNERS matches.
* noedit: reject PRs that edit CODEOWNERS without editing the yml.
- docs/codeowners.md — explains the source-of-truth pattern, how to
change roles, how to add new roles.
- AGENTS.md topic-index row.
What's NOT in this PR:
- Branch protection on main (separate PR; needs `gh api` call against
the org).
- Required-reviewer enforcement (depends on branch protection landing).
- Required CI status checks (depends on branch protection landing).
- Scheduled rotation (the schedule: block in the yml + a weekly
workflow). Today's roles are stable; rotation isn't needed yet.
- Linear-as-source-of-truth integration (Approach 4 from the design
discussion; deferred).
Verified:
- Generator output is deterministic (idempotent re-runs).
- scripts/check-agents-md.sh OK (28 links, 28 docs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* codeowners: fix catch-all ordering (Devin review #88)
Devin caught a real bug: GitHub CODEOWNERS uses "last match wins"
semantics, but the generator emitted the catch-all `*` AFTER specific
patterns. Net effect: `*` won for every file, silently nullifying the
docs role and never routing reviews to @ragnorc.
Fix is one-line — emit the default `*` line before iterating the
specific paths. Also:
- Added a regression assertion in the generator: after rendering, the
first non-comment line must start with `*` if a default is
configured. Generator exits non-zero otherwise. Catches the same
class of mistake in any future refactor.
- Rewrote the yml header comment, which incorrectly stated "keep
more-specific paths after broader patterns" (correct for GitHub
semantics but the generator was doing the opposite — so the comment
read as a description of behavior when it was actually a contradicted
intention).
Verified by re-rendering: `*` is now line 12, `crates/**` is line 14,
`docs/**` is line 15, etc. README.md matches both `*` and `README.md`;
`README.md` is later → wins → @aaltshuler + @ragnorc both assigned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The release.yml workflow builds binaries and updates Homebrew but never
published to crates.io — v0.4.0 and v0.4.1 are missing from the registry
even though the local Cargo.toml and the v0.4.1 tag are at 0.4.1.
This adds a separate workflow that:
- auto-publishes on every v* tag push (future releases self-publish)
- can be manually dispatched with a tag input (catch up on v0.4.1)
- is idempotent: skips a crate if its current crates.io version already
matches local, so a partial failure is safe to retry
- gates on CARGO_REGISTRY_TOKEN (already in repo secrets); skips cleanly
if the token is ever rotated out
Publishes in dependency order: omnigraph-compiler → omnigraph-engine →
omnigraph-server → omnigraph-cli. Path-only deps in Cargo.toml carry
explicit version fields, so cargo publish strips paths and resolves
against crates.io.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The standalone test_failpoints_feature job took 21min on first run
(cold cache; the omnigraph-engine crate has lance + datafusion deps
that make any fresh build expensive). Folding into Test Workspace
shares the warm cache so the failpoints invocation is incremental —
~30s vs 21min on subsequent runs, and within the workspace job's
existing budget.
The failpoints feature is gated behind a Cargo flag and only adds
the small `fail` crate dep + a few feature-gated code paths; it
doesn't change the dep tree of any other crate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new findings from Cubic on commit 3223b51:
* **Pending edge cardinality counted within-input duplicates** (P2):
count_src_per_edge's pending walk added every row to the count,
including duplicate rows that finalize will collapse via
dedupe_merge_batches_by_id. A LoadMode::Merge with the same edge id
twice would over-count → spurious @card violation. Fix: when
dedupe_key_column is Some, walk pending in reverse, track seen keys
via HashSet, count only the kept (last-occurrence) rows. Mirrors
finalize-time dedupe so cardinality counts what stage_merge_insert
actually publishes.
* **scan_with_pending silently disabled merge-shadow when projection
omitted key_column** (P2): if a caller passed Some("id") as
key_column but their projection didn't include "id", the
filter_out_rows_where_string_in helper passed batches through
unchanged — silently degrading to union semantics. Fix: validate
up front that projection contains key_column when both are Some;
return a typed Lance error otherwise. Tightened the helper too:
missing column is now an internal error (was a silent passthrough).
* **Cascade-vs-explicit delete test was too weak** (P2): asserted
only that edge count decreased after delete. The cascade alone
could satisfy that even if the explicit second-delete silently
no-op'd. Strengthened: assert post_knows == 0, which only holds
when both ops landed (Bob→Diana would survive if op-2 no-op'd).
CI gap: also added test_failpoints_feature job to .github/workflows/ci.yml.
The workspace test runs without --features failpoints (the feature is
behind a Cargo flag), so the failpoints test suite was never exercised
by CI before now. The new job builds + runs
`cargo test -p omnigraph-engine --features failpoints --test failpoints`
on every full CI run, mirroring the test_aws_feature pattern.
New tests on tests/runs.rs:
* load_merge_mode_dedupes_within_pending_for_cardinality_count
(Cubic P2 #2 — pending-vs-pending dedup, distinct from the
load_merge_mode_dedupes_edge_for_cardinality_count test which
covers committed-vs-pending dedup).
* scan_with_pending_rejects_key_column_missing_from_projection
(Cubic P2 #3 — verifies the up-front validation rejects bad
callers and that the happy path still works correctly).
Local test results:
* tests/runs.rs: 23/23 passed
* tests/failpoints.rs --features failpoints: 7/7 passed (includes the
two new finalize→publisher residual tests landed in 3223b51).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Splits the 990-line AGENTS.md into a 184-line map (architecture,
where-to-find index, always-on invariants, capability matrix,
maintenance contract) plus 18 new docs/*.md files holding the deep
content per topic (storage, schema and query languages, indexes,
embeddings, branches/commits, runs, merge, changes, execution, policy,
server, CLI reference, audit, errors, CI, constants, v0.3.1 notes).
Adds scripts/check-agents-md.sh and a check_agents_md CI job that
verifies every docs/ link in AGENTS.md resolves and every doc in the
canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stop producing the omnigraph-macos-x86_64 archive in both the
stable and edge release workflows. The macos-15-intel runner
build was the slowest of the matrix and Apple Silicon is now
the default Mac developer target.
- release.yml + release-edge.yml: drop the macos-15-intel matrix entry
- install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear
"no prebuilt binary" error instead of attempting an absent download
- update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit
an arm64-only Homebrew formula. The on_macos block now declares
`depends_on arch: :arm64` so Intel `brew install` fails fast with
a clear architecture message instead of installing an arm64 binary
that errors at exec time.
Linux x86_64 build is unaffected.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Actions rejects `secrets.*` in job-level `if:` conditions at
runtime (job-level `if` is evaluated before secrets are available),
causing the workflow to abort in 0s with "workflow file issue" on
every trigger. Moving the guard into a step-level check that writes
`HOMEBREW_TAP_SKIP` to GITHUB_ENV lets the rest of the steps
conditionally no-op when the tap token isn't configured.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the default pull_request checkout (refs/pull/N/merge) so tests
see the merged state. The openapi.json auto-commit now uses a separate
shallow clone of the PR branch, so the pushed commit contains only the
spec change rather than the merge-commit tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The separate openapi-sync workflow was duplicating the workspace build
(~15 min cold-cache compile), paying the cost twice per PR. Fold the
regen + auto-commit into the existing test job: one compile, shared
rust-cache, same drift-check semantics.
- Same-repo PRs: OMNIGRAPH_UPDATE_OPENAPI=1 during the test run, then
commit the regenerated spec back to the PR branch
- Fork PRs / pushes: env var empty, test stays in strict drift-check mode
- openapi_spec_is_up_to_date treats empty env value as unset, so the
conditional workflow env expression works
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub Actions doesn't expose the 'secrets' context in 'with:' when
calling a reusable workflow. The companion PR on the shared workflow
(ModernRelay/.github) moves the four AWS values into
on.workflow_call.secrets; this caller drops them from 'with:' and adds
'secrets: inherit' so all four flow through masked.
Trailing from PRs #33 and #34.
On a public repo, Actions variables are not masked in workflow logs.
The AWS role ARN and artifact bucket name embed the AWS account ID —
not catastrophic, but norm-preserving to keep them out of public logs.
Switch all four values (region, role, project, bucket) from
`${{ vars.* }}` to `${{ secrets.* }}`. When secrets are passed via
`with:` to a reusable workflow, GitHub's masking still applies because
the value is added to the run's mask list as soon as the secret
reference is resolved.
Followup to #33 — should have landed as secrets from the start.
Invokes the shared omnigraph-package reusable workflow twice per run —
once with default features, once with --features aws — producing two
ECR tags per source commit:
<sha> (default features)
<sha>-aws (--features aws → SecretsManagerTokenSource)
Manual-dispatch only for now. Neither release.yml nor release-edge.yml
currently invokes the CodeBuild-backed packaging path; this gives
operators a way to produce on-demand image variants without wiring
packaging into the tag/push cadence.
Prerequisites:
- Repo vars AWS_REGION, AWS_ROLE_TO_ASSUME, AWS_CODEBUILD_PACKAGE_PROJECT,
AWS_ARTIFACT_BUCKET must be set.
- Shared workflow must support the `features` and `image_tag_suffix`
inputs.
Uses @main as the shared-workflow ref until a versioned tag is cut.
Introduces an opt-in AWS Secrets Manager backend for bearer tokens,
behind the `aws` Cargo feature. Default builds (on-prem, local dev)
don't pull in the AWS SDK and don't pay its compile cost.
- New Cargo feature `aws` gates the `aws-config` + `aws-sdk-secretsmanager`
optional deps. Default features remain empty.
- New `auth::aws::SecretsManagerTokenSource` implements `TokenSource` by
fetching a JSON `{"actor_id": "token", ...}` payload from a named
Secrets Manager secret. Credentials resolve via the AWS default chain
(env, shared config, IMDSv2 instance role, ECS task role) so no
explicit plumbing is needed under an IAM role.
- New `resolve_token_source()` dispatches based on the
`OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` env var. If the var is set
but the binary was built without `--features aws`, returns a clear
rebuild instruction rather than silently falling back.
- `serve()` now uses `resolve_token_source()` and logs which source was
selected at startup.
- `parse_json_secret_payload()` is factored out as a free function so
the payload validation (trim whitespace, reject blank actor/token,
reject non-object) is unit-testable without the AWS SDK.
- New CI job `test_aws_feature` builds + tests with `--features aws`.
Not in this PR (follow-ups):
- Background refresh loop for rotation. `SecretsManagerTokenSource`
advertises `supports_refresh: true` but the AppState-level refresh
task isn't wired yet.
- Config-YAML dispatch (today the AWS source is selected via env var
only; eventually `server.bearer_tokens.source` in `omnigraph.yaml`).
Tests:
- Default-feature build: 33 lib + 41 integration + 64 openapi.
- `--features aws` build: 32 lib (one test is cfg-gated) + 41 + 64.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>