Commit graph

320 commits

Author SHA1 Message Date
Andrew Altshuler
f75b941a9e
Make schema apply atomic across crashes (#57)
Schema apply previously committed the manifest before writing the schema
source and IR contract files. A crash in that window left the manifest
pointing at the new schema while _schema.pg, _schema.ir.json, and
__schema_state.json still reflected the old one — a silent inconsistency
that subsequent reads hit as type errors.

Reorders the apply: write to staging filenames first, commit the
manifest, then atomically rename staging → final. On open, a recovery
sweep reconciles any leftover staging files against the manifest's table
set: pre-commit crashes get the staging files deleted, post-commit
crashes get the renames completed (idempotent — handles partial
renames). Property-only migrations where both schemas imply the same
table set return an operator-actionable error rather than guessing.

Adds rename_text + delete to StorageAdapter (atomic on local FS via
tokio::fs::rename; copy + delete on S3 — recovery is tolerant of the
non-atomic case). Failpoints test coverage at both crash boundaries plus
a partial-rename scenario.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:21:00 +03:00
Andrew Altshuler
372f793ad6
Drop macOS x86_64 build target (#55)
Stop producing the omnigraph-macos-x86_64 archive in both the
stable and edge release workflows. The macos-15-intel runner
build was the slowest of the matrix and Apple Silicon is now
the default Mac developer target.

- release.yml + release-edge.yml: drop the macos-15-intel matrix entry
- install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear
  "no prebuilt binary" error instead of attempting an absent download
- update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit
  an arm64-only Homebrew formula. The on_macos block now declares
  `depends_on arch: :arm64` so Intel `brew install` fails fast with
  a clear architecture message instead of installing an arm64 binary
  that errors at exec time.

Linux x86_64 build is unaffected.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 18:19:26 +03:00
andrew
0469b6883e Ignore local-only working files
Keep machine-local state (.claude/, .worktrees/, local
omnigraph.yaml, CLAUDE.md, and schema design notes) from
showing up as untracked in git status.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:41:15 +03:00
Andrew Altshuler
7310f69928
Revert "Merge pull request #49 from ModernRelay/ragnorc/x-request-id" (#54)
This reverts commit b352fca13c, reversing
changes made to 748ad334a9.
2026-04-26 15:56:29 +03:00
Ragnor Comerford
b352fca13c
Merge pull request #49 from ModernRelay/ragnorc/x-request-id
Add X-Request-Id middleware
2026-04-26 12:33:33 +02:00
Ragnor Comerford
e14b203208
Reuse X_REQUEST_ID constant for inbound header lookup
Both Cursor Bugbot and Cubic flagged that the inbound `headers().get(...)`
call constructed `HeaderName::from_static("x-request-id")` inline instead
of reusing the `X_REQUEST_ID` constant defined at the top of the file.
The two were already kept in sync by both being `from_static("x-request-id")`,
but a future rename would have to touch both sites or risk silent drift
between read and write.

Also drops the now-unused `header` module import.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 12:05:19 +02:00
Ragnor Comerford
748ad334a9
Merge pull request #48 from ModernRelay/ragnorc/api-sdk-research
Polish OpenAPI spec for SDK generation
2026-04-26 11:52:46 +02:00
Ragnor Comerford
189caf893c
Merge pull request #47 from ModernRelay/perf/expand-dense-ids
perf(expand): dense u32 ids end-to-end (follow-up to #45)
2026-04-25 23:54:03 +02:00
Ragnor Comerford
284c9377c2
Add X-Request-Id middleware
Per-request ULID minted at the edge, exposed in request extensions and
on the response header. Caller-supplied X-Request-Id is echoed when
well-formed (1..=128 ASCII printable characters); otherwise rejected
and replaced with a fresh ULID so the value is always safe to log.

Companion to the TypeScript SDK redesign — clients now correlate logs
across the wire by reading X-Request-Id from response headers (and the
SDK already surfaces it on every OmnigraphError as `requestId`).

No spec change required; the header is a transport-layer concern.

Tests:
- mint a ULID when no header is provided
- echo a valid caller-supplied id
- reject overlong header (200 chars), mint a fresh ULID

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 22:56:17 +02:00
Ragnor Comerford
7809bf607e
Polish OpenAPI spec for SDK generation
Add operation descriptions and examples to utoipa annotations so the
generated TypeScript SDK has rich JSDoc, and so future Python/Go SDKs
and any /openapi.json docs UI benefit from the same effort.

- Doc comments on all 18 handlers (utoipa picks up summary/description)
- #[schema(example = ...)] on free-text fields (query_source,
  schema_source, NDJSON data) and i64 timestamps
- Destructive/irreversible warnings on change, applySchema, ingest,
  mergeBranches, deleteBranch, publishRun, abortRun

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 16:36:51 +02:00
Ragnor Comerford
7ea868485e
Update README.md 2026-04-25 16:16:24 +02:00
Ragnor Comerford
7101565929
Update README.md 2026-04-25 16:16:07 +02:00
Andrew Altshuler
74eb5a5380
Parallel per-type load writes + omnigraph optimize/cleanup CLI (#46)
* Parallel per-type load writes + omnigraph optimize/cleanup CLI

## MR-677.3 — parallel per-type load writes

The load path already groups records into one RecordBatch per type and
makes one Lance commit per table (loader::mod.rs:249-..), but those
commits ran sequentially. Wrap node and edge write loops in
`futures::stream::buffered(N)` against a new helper
`write_batches_concurrently`. Concurrency tunable via
`OMNIGRAPH_LOAD_CONCURRENCY` (default 8).

## MR-676 — `omnigraph optimize` and `omnigraph cleanup`

New CLI subcommands that walk every node + edge table in the repo:

- `omnigraph optimize <uri>` — runs Lance `compact_files` on each
  table to merge small fragments into fewer larger ones.
- `omnigraph cleanup <uri> --keep N | --older-than 7d --confirm` —
  runs Lance `cleanup_old_versions` to prune historical manifests +
  unique fragments. Requires `--confirm` because it's destructive.
  Supports both count-based and time-based retention (or both AND'd
  together). Time uses chrono `DateTime<Utc>` (added as a workspace
  dep, default-features off).

Both commands run their per-table loops in parallel (8-way bounded,
`OMNIGRAPH_MAINTENANCE_CONCURRENCY` env override). Smoke-tested
against the 114-table prod graph: optimize went 7m15s sequential
→ 1m28s parallel. cleanup --keep 1 removed 137 historical versions
across 114 tables in 1m57s without disrupting `/healthz` or query
responses.

Public API on `Omnigraph`:

  pub async fn optimize(&mut self) -> Result<Vec<TableOptimizeStats>>
  pub async fn cleanup(&mut self, opts: CleanupPolicyOptions)
      -> Result<Vec<TableCleanupStats>>

All 10 existing loader tests still pass.

Closes MR-676.
Partially addresses MR-677 (the .3 — parallel by type — piece;
MR-677.1 is for the `omnigraph embed` path, not load, since load
doesn't call Gemini directly. .2 was already in place).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-25 14:22:14 +03:00
Ragnor Comerford
53d7f47909
Pass dense u32 ids through expand instead of round-tripping via String
BFS now emits Vec<u32> dense ids directly with HashSet<u32> per-source
dedup. Only the deduped set is stringified for Lance's IN-list. The
post-hydrate alignment uses a dense-indexed Vec<Option<u32>> instead of
HashMap<&str, usize>, giving O(1) lookup without repeated string hashing.

End-to-end on the bench_expand harness (release, M-series):

  query     baseline    after     speedup
  1k  hop3   460.2 ms    23.7 ms     19x
  10k hop2     4.21 s   139.9 ms     30x
  10k hop3    40.59 s    898.5 ms    45x
  30k hop2    11.71 s    490.2 ms    24x
  30k hop3   197.38 s      3.22 s    61x

The cost lived in stringifying every (src,dst) pair and re-hashing the
strings during alignment; once dense ids stay dense, the BFS inner loop
and the final fan-out both collapse to integer ops.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:07:25 +02:00
andrew
628bc2e607 Clean up bench_expand example
Remove vestigial code left from removed hasher variants: unused
BuildHasherDefault import, PhantomData suppression line, orphan planning
comments for Variant C/E. Also drop an unused `mut` on the PRNG closure
binding. No behavior change; compiles warning-free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 00:59:21 +03:00
Ragnor Comerford
d8e0bfeb22
Dedupe dst ids before hydrating nodes in execute_expand (#45)
The BFS in execute_expand emits one (src_idx, dst_id) pair per edge, so
dst_id_list contains heavy duplication when multi-hop traversals revisit
the same destination nodes. hydrate_nodes then built an
"id IN ('a', 'b', ...)" filter from the full list, passing it verbatim
to Lance. On a 30k-node Person graph, a 3-hop query produced a 15.4M-
entry IN-list against a 30k-row target — 512x more entries than unique
ids.

Deduplicate before the Lance scan; the post-hydrate alignment HashMap
already fans results back out to the original (src, dst) pairs, so
output is bit-identical.

Bench numbers (crates/omnigraph/examples/bench_expand.rs, min of 2-3
runs, release build):

  query         before     after    speedup
  1k   hop3     460 ms     28 ms     16x
  10k  hop2    4.21 s     188 ms     22x
  10k  hop3   40.59 s    1.30 s     31x
  30k  hop2   11.71 s    678 ms     17x
  30k  hop3  197.38 s    4.86 s     41x

All existing omnigraph-engine tests pass (72/72, 0 failures).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:56:18 +03:00
andrew
a1b00e2d06 Fix release.yml: move HOMEBREW_TAP_TOKEN guard into steps
GitHub Actions rejects `secrets.*` in job-level `if:` conditions at
runtime (job-level `if` is evaluated before secrets are available),
causing the workflow to abort in 0s with "workflow file issue" on
every trigger. Moving the guard into a step-level check that writes
`HOMEBREW_TAP_SKIP` to GITHUB_ENV lets the rest of the steps
conditionally no-op when the tap token isn't configured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 19:24:41 +03:00
Andrew Altshuler
8649b2084f
Prepare v0.3.0 release (#44)
* Prepare v0.3.0 release

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate openapi.json

* ci: retrigger CI on latest openapi.json

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-21 19:11:34 +03:00
Andrew Altshuler
102ccc05f7
Merge pull request #43 from ModernRelay/fix/mr-674-ephemeral-run-branches
Delete __run__ branches on every terminal state (MR-674)
2026-04-21 14:35:49 +03:00
andrew
2df578eab8 Delete __run__ branches on every terminal state (MR-674)
Run branches are transactional scaffolding — the durable audit lives
on RunRecord. Invariant: every terminal state (Published, Aborted,
Failed) deletes the __run__ branch.

- Add `terminate_run` helper: appends terminal RunRecord, then
  deletes the run branch. Delete errors are swallowed — the record
  is authoritative; `cleanup_terminal_run_branches_for_target`
  retries on later `branch_delete` of the target.
- Wire into `publish_run_as`, `abort_run`, `fail_run`.
- Include `Failed` in the cleanup filter (was `Published | Aborted`
  only) for legacy-repo GC during branch_delete.
- Cleanup now checks `coordinator.all_branches()` first to skip
  branches already deleted by a concurrent handle — avoids Lance
  NotFound when two handles publish/clean up independently.
- Drop `Failed` from `ensure_branch_delete_safe` — post-fix, Failed
  means the branch is already gone, so there's no reason to block
  target deletion (MR-674 "Downstream effects").

Tests:
- New regression: `run_branches_do_not_accumulate_across_repeated_loads`
  — 10 loads + 1 abort → `branch_list() == ["main"]`.
- New `failed_load_deletes_run_branch` asserts Failed path cleans up.
- Rename `abort_run_keeps_target_unchanged_and_preserves_hidden_branch_for_inspection`
  → `abort_run_leaves_target_unchanged_and_deletes_run_branch`, invert
  the hidden-branch assertion.
- Rewrite `public_{load,mutation}_preserves_staged_edge_ids_on_publish`
  to capture staged IDs before publish instead of inspecting the run
  branch after (branch is gone now).
- Update MR-670 regression test to assert the run branch is *absent*
  after publish.

Deferred to follow-up: `--keep-run-branch` debug flag, `omnigraph run gc`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:15:39 +03:00
Andrew Altshuler
674eee16bb
Merge pull request #42 from ModernRelay/refactor/extract-compiler-tests-p3
Extract remaining crowded compiler tests (Phase 3)
2026-04-20 23:09:14 +03:00
andrew
54101f7e2c Extract remaining crowded compiler test modules
Files where inline tests crowded out production code (test/prod ratio
≥ 0.8) move to sibling files via `#[path]`. Files where production
dominates (query_input.rs, schema_plan.rs) stay inline — extracting
would add noise, not reduce it.

- ir/lower.rs: 1239 → 577 lines (ratio 1.15)
- catalog/mod.rs: 594 → 326 lines (ratio 0.83)
- query/lint.rs: 562 → 314 lines (ratio 0.80)

catalog/tests.rs uses the shorter name since it's inside a module
directory (no ambiguity with filename).

All 229 compiler tests green, identical count to before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 22:49:09 +03:00
Andrew Altshuler
f2c3b11508
Merge pull request #41 from ModernRelay/refactor/extract-compiler-tests
Extract compiler test modules to sibling files (Phase 2)
2026-04-20 19:15:45 +03:00
andrew
94849a50b4 Extract compiler test modules to sibling files
typecheck.rs, schema/parser.rs, and query/parser.rs each had
~1000-line inline `mod tests` blocks that overshadowed the production
code in the file. Move each to a sibling `*_tests.rs` using
`#[path = "..."] mod tests;`.

- typecheck.rs: 2865 → 1708 lines; typecheck_tests.rs: 1156 lines
- schema/parser.rs: 1950 → 994 lines; parser_tests.rs: 955 lines
- query/parser.rs: 1737 → 803 lines; parser_tests.rs: 933 lines

No visibility change — the sibling module still has `use super::*`
access to crate-privates. No semantic edits beyond de-indenting by
4 spaces (mechanical). All 229 compiler tests green, identical
count to before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:50:18 +03:00
Andrew Altshuler
789c0633e8
Merge pull request #40 from ModernRelay/refactor/extract-omnigraph-tests
Extract public-API tests from omnigraph.rs to integration tests
2026-04-20 14:15:16 +03:00
andrew
f05ea2c7c3 Extract public-API tests from omnigraph.rs to integration tests
The inline `mod tests` in crates/omnigraph/src/db/omnigraph.rs had grown
to ~620 lines, mixing tests that need crate-private access with tests
that only exercise the public API. Splits the latter out.

- tests/lifecycle.rs: 10 init/open/snapshot/drift tests
- tests/schema_apply.rs: 5 plan/apply tests
- omnigraph.rs: 10 tests remain inline because they use
  db.coordinator, db.table_store(), ManifestCoordinator,
  SCHEMA_APPLY_LOCK_BRANCH, or is_internal_run_branch — all
  crate-private and intentionally kept so.

No behavior change. Zero semantic edits to the tests themselves beyond
replacing db.snapshot() (pub(crate)) with snapshot_main helper at
integration-test boundaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:09:34 +03:00
Andrew Altshuler
b96fb8abe0
Merge pull request #39 from ModernRelay/fix/dockerfile-ecr-public-base
Dockerfile: switch base from Docker Hub to ECR Public
2026-04-20 13:47:01 +03:00
andrew
a92f0be9c8 Dockerfile: switch base from Docker Hub to ECR Public
AWS CodeBuild shares an outbound IP pool with many other AWS customers,
so anonymous Docker Hub pulls (100/6h per IP) rate-limit quickly. The
aws-feature variant in Package run 24642508475 hit 429 on debian:bookworm-slim.

ECR Public hosts the same official Debian images at
public.ecr.aws/debian/debian, has no pull rate limit, and is
anonymously accessible. Same upstream image, just mirrored on AWS.
2026-04-20 13:46:23 +03:00
Andrew Altshuler
a35698e952
Merge pull request #38 from ModernRelay/fix/mr-670-cleanup-run-branches
Clean up __run__ branch on publish, unblock schema apply (MR-670)
2026-04-20 13:32:49 +03:00
andrew
26012d156e Filter internal run branches in schema_apply (MR-670)
Published `__run__` branches are intentionally retained after publish
for post-publish inspection (runs.rs tests verify edge IDs match
between run branch and main). `apply_schema` was counting them as
"non-main" branches and refusing to run — permanently blocking schema
evolution after any load or change, with no CLI recovery path
(`branch_delete` rejects internal refs, `run abort` rejects Published
runs).

Fix: `apply_schema` filters `is_internal_system_branch` (covers both
`__run__*` and the schema-apply lock) rather than just the lock.
Run branches remain available for inspection.

Regression: test_apply_schema_succeeds_after_load_creates_published_run_branch
pins that schema apply succeeds after a load even while the run
branch is still present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:32:20 +03:00
Andrew Altshuler
a56f1d140a
Merge pull request #37 from ModernRelay/test/lance-mem-pool-size
Raise LANCE_MEM_POOL_SIZE to 1 GB in .cargo/config.toml
2026-04-20 01:22:58 +03:00
andrew
dbde85b68d Raise LANCE_MEM_POOL_SIZE to 1 GB in .cargo/config.toml
Fixes flaky omnigraph-server integration suite under parallel cargo
test. Lance defaults to a 100 MB FairSpillPool per Omnigraph instance
(lance-datafusion/src/exec.rs:316). That's fine in prod (one server
process, bounded concurrent sorts) but too small when cargo test
spawns many Omnigraph instances in parallel, each running concurrent
BTree index builds during load.

Failure signature:
  Lance("create BTree index on node:Person(id): ... LanceError(IO):
  Not enough memory to continue external sort. ... 0.0 B remain
  available for the total pool")

Before: 10/41 OOM-fail on parallel run; passed with --test-threads=1.
After:  41/41 pass in parallel in ~3s.

[env] in .cargo/config.toml applies to cargo-launched processes only.
Shipped binaries (release tarballs, Docker images) are unaffected —
they inherit whatever the runtime env provides, defaulting to Lance's
100 MB when unset.
2026-04-19 22:27:49 +03:00
Ragnor Comerford
567ebe5f24
Merge pull request #24 from ModernRelay/ragnorc/explore-api
Add static OpenAPI spec and clean up operation IDs
2026-04-19 15:36:49 +02:00
Ragnor Comerford
bcddbdf485
Test merge commit; push openapi.json via separate clone
Restore the default pull_request checkout (refs/pull/N/merge) so tests
see the merged state. The openapi.json auto-commit now uses a separate
shallow clone of the PR branch, so the pushed commit contains only the
spec change rather than the merge-commit tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 12:10:40 +02:00
Andrew Altshuler
de1365b5d7
Merge pull request #36 from ModernRelay/fix/example-config-graphs-rename
Update example config to graphs / cli.graph (finishes MR-603)
2026-04-18 23:41:00 +03:00
andrew
206b5da20a example config: use graphs / cli.graph, matching the MR-603 rename
The target → graph rename shipped in PR #17 but omnigraph.example.yaml
still used the old form (`targets:` / `cli.target`). Since the serde
struct uses `rename = "graphs"` without a `targets` alias, the example
wouldn't deserialize against current code.

Update the example to the new form. No alias is being added — the
deserialization error for old configs is loud and clear, which is the
better migration signal for a young project.
2026-04-18 23:40:35 +03:00
Ragnor Comerford
a157f6a17c
Fold openapi.json auto-sync into main CI test job
The separate openapi-sync workflow was duplicating the workspace build
(~15 min cold-cache compile), paying the cost twice per PR. Fold the
regen + auto-commit into the existing test job: one compile, shared
rust-cache, same drift-check semantics.

- Same-repo PRs: OMNIGRAPH_UPDATE_OPENAPI=1 during the test run, then
  commit the regenerated spec back to the PR branch
- Fork PRs / pushes: env var empty, test stays in strict drift-check mode
- openapi_spec_is_up_to_date treats empty env value as unset, so the
  conditional workflow env expression works

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:00:46 +02:00
Andrew Altshuler
dc5718fd43
Merge pull request #35 from ModernRelay/fix/package-caller-secrets-inherit
package caller: pass AWS secrets via secrets: inherit
2026-04-18 22:00:27 +03:00
andrew
987c51c376 package caller: pass AWS secrets via secrets: inherit
GitHub Actions doesn't expose the 'secrets' context in 'with:' when
calling a reusable workflow. The companion PR on the shared workflow
(ModernRelay/.github) moves the four AWS values into
on.workflow_call.secrets; this caller drops them from 'with:' and adds
'secrets: inherit' so all four flow through masked.

Trailing from PRs #33 and #34.
2026-04-18 21:54:08 +03:00
Andrew Altshuler
eeb890a4f5
Merge pull request #34 from ModernRelay/fix/package-workflow-use-secrets
package workflow: read AWS config from secrets, not variables
2026-04-18 21:45:47 +03:00
andrew
8086a0099c package workflow: read AWS config from secrets, not variables
On a public repo, Actions variables are not masked in workflow logs.
The AWS role ARN and artifact bucket name embed the AWS account ID —
not catastrophic, but norm-preserving to keep them out of public logs.

Switch all four values (region, role, project, bucket) from
`${{ vars.* }}` to `${{ secrets.* }}`. When secrets are passed via
`with:` to a reusable workflow, GitHub's masking still applies because
the value is added to the run's mask list as soon as the secret
reference is resolved.

Followup to #33 — should have landed as secrets from the start.
2026-04-18 21:43:12 +03:00
Ragnor Comerford
9de2079263
Merge remote-tracking branch 'origin/main' into ragnorc/explore-api
# Conflicts:
#	CONTRIBUTING.md
2026-04-18 20:24:39 +02:00
Andrew Altshuler
aa260cc2b9
Merge pull request #33 from ModernRelay/feat/package-workflow-dispatch
Add manual-dispatch Package workflow
2026-04-18 17:57:33 +03:00
andrew
807c1ba4dc Add manual-dispatch Package workflow for CodeBuild image builds
Invokes the shared omnigraph-package reusable workflow twice per run —
once with default features, once with --features aws — producing two
ECR tags per source commit:

  <sha>         (default features)
  <sha>-aws     (--features aws → SecretsManagerTokenSource)

Manual-dispatch only for now. Neither release.yml nor release-edge.yml
currently invokes the CodeBuild-backed packaging path; this gives
operators a way to produce on-demand image variants without wiring
packaging into the tag/push cadence.

Prerequisites:
- Repo vars AWS_REGION, AWS_ROLE_TO_ASSUME, AWS_CODEBUILD_PACKAGE_PROJECT,
  AWS_ARTIFACT_BUCKET must be set.
- Shared workflow must support the `features` and `image_tag_suffix`
  inputs.

Uses @main as the shared-workflow ref until a versioned tag is cut.
2026-04-18 16:29:43 +03:00
Andrew Altshuler
4c298bab12
Merge pull request #31 from ModernRelay/docs/aws-build-variant
Document AWS build variant and bearer-token sources
2026-04-18 05:32:25 +03:00
Andrew Altshuler
060a7e9ce9
Merge pull request #30 from ModernRelay/feat/aws-secrets-manager-token-source
Add aws feature + SecretsManagerTokenSource
2026-04-18 05:32:09 +03:00
Andrew Altshuler
2b493c0063
Merge pull request #29 from ModernRelay/refactor/token-source-trait
Extract TokenSource trait (prep for AWS backend)
2026-04-18 05:30:19 +03:00
Andrew Altshuler
c6e4b1aa01
Merge pull request #28 from ModernRelay/fix/bearer-auth-hardening
Harden bearer auth: constant-time compare, hashed at rest, authoritative actor_id
2026-04-18 05:20:01 +03:00
andrew
d830ebcb64 Document AWS build variant and bearer-token sources
- docs/deployment.md: new "Token sources" section listing the three
  bearer-token source precedences (AWS SM, JSON file/env, single token).
  New "Build Variants" section explaining default vs aws builds and
  their release-artifact naming. New "AWS Secrets Manager" section
  covering env var, secret payload format, IAM role credential
  discovery, and the hard error for feature-less builds.
- CONTRIBUTING.md: documents the `aws` feature and the two test
  commands contributors should run when touching auth code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 04:04:45 +03:00
andrew
7a3bf5c758 Add aws feature + SecretsManagerTokenSource backend
Introduces an opt-in AWS Secrets Manager backend for bearer tokens,
behind the `aws` Cargo feature. Default builds (on-prem, local dev)
don't pull in the AWS SDK and don't pay its compile cost.

- New Cargo feature `aws` gates the `aws-config` + `aws-sdk-secretsmanager`
  optional deps. Default features remain empty.
- New `auth::aws::SecretsManagerTokenSource` implements `TokenSource` by
  fetching a JSON `{"actor_id": "token", ...}` payload from a named
  Secrets Manager secret. Credentials resolve via the AWS default chain
  (env, shared config, IMDSv2 instance role, ECS task role) so no
  explicit plumbing is needed under an IAM role.
- New `resolve_token_source()` dispatches based on the
  `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` env var. If the var is set
  but the binary was built without `--features aws`, returns a clear
  rebuild instruction rather than silently falling back.
- `serve()` now uses `resolve_token_source()` and logs which source was
  selected at startup.
- `parse_json_secret_payload()` is factored out as a free function so
  the payload validation (trim whitespace, reject blank actor/token,
  reject non-object) is unit-testable without the AWS SDK.
- New CI job `test_aws_feature` builds + tests with `--features aws`.

Not in this PR (follow-ups):
- Background refresh loop for rotation. `SecretsManagerTokenSource`
  advertises `supports_refresh: true` but the AppState-level refresh
  task isn't wired yet.
- Config-YAML dispatch (today the AWS source is selected via env var
  only; eventually `server.bearer_tokens.source` in `omnigraph.yaml`).

Tests:
- Default-feature build: 33 lib + 41 integration + 64 openapi.
- `--features aws` build: 32 lib (one test is cfg-gated) + 41 + 64.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 03:48:51 +03:00