omnigraph

mirror of https://github.com/ModernRelay/omnigraph.git synced 2026-06-09 01:35:18 +02:00

Author	SHA1	Message	Date
Andrew Altshuler	1a9f8b1f7f	ci(publish-crates): include omnigraph-policy in the publish list (#116) omnigraph-policy is a new crate this release cycle (Cedar policy engine, MR-722). It wasn't added to the publish list when it was created, so v0.5.0's tag-triggered publish run succeeded for omnigraph-compiler but failed at omnigraph-engine: failed to prepare local package for uploading Caused by: no matching package named `omnigraph-policy` found location searched: crates.io index required by package `omnigraph-engine v0.5.0` omnigraph-policy has no internal omnigraph-* deps so it can publish after omnigraph-compiler (either could go first). omnigraph-engine depends on both; server on the three; cli on everything. publish_if_new is idempotent: re-running with the v0.5.0 tag after this lands will skip omnigraph-compiler (already published), then publish policy + engine + server + cli. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 14:09:58 +01:00
Andrew Altshuler	cb80fa40f1	exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113) * exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr) Switches `execute_node_scan` from string-flattened Lance SQL pushdown (`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`). ## What this enables 1. `CompOp::Contains` now pushes down. `ir_filter_to_sql` returned `None` for list-contains (the comment said "Can't pushdown list contains") because string SQL can't easily express it. With Expr, it lowers to DataFusion's `array_has(col, value)` builtin via the `nested_expressions` feature, and pushes down to Lance's scan layer the same way Eq/Lt/etc. do. Pinned by the new regression test `end_to_end::ir_filter_with_list_contains_pushes_down`. 2. DataFusion 53's optimizer rules now reach our predicates. Once the Expr lands at the Lance scanner, DF's planner runs: - `IN`-list vectorized eq kernel (DF #20528) - `PhysicalExprSimplifier` (DF #20111) - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097) - Push limit into hash join (DF #20228) None of these were applicable before because the string SQL path short-circuited the optimizer. ## Scope This is one of three string-flattened pushdown sites; the other two (`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL string path for now: - The Expand pushdown still serializes through `hydrate_nodes`'s `extra_filter_sql: Option<&str>` parameter. Migrating it changes the `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` → `Option<Expr>`) and the cascading call sites; out of scope for this MR. - The mutation delete predicate still goes through `Dataset::delete(&str)` in Lance 6.0.1.
MR-A (delete two-phase via Lance #6658, gated on the Lance v7 bump per issue #112) will migrate that path to `DeleteBuilder::execute_uncommitted` taking an Expr. The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql` helpers stay in place to serve the remaining string-SQL consumers (mutation predicates). They get retired when the other call sites migrate. ## Cargo Enables the `nested_expressions` feature on the `datafusion` workspace dep. Lance already pulls in `datafusion-functions-nested` transitively (it's listed in their feature set), so this just exposes the `datafusion::functions_nested::expr_fn::array_has` re-export. No transitive dep change (Cargo.lock unchanged). ## Tests - New: `ir_filter_with_list_contains_pushes_down` — pins the case that was previously impossible (`ir_filter_to_sql` returning `None`). - 906/906 workspace tests still pass. - 417/417 engine integration tests pass (was 416 + the new one). - 19/19 failpoints (recovery canary). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break) The RustFS S3 Integration job started failing 2026-05-23 with all 3 tests panicking on the first PUT: HTTP error: error sending request The "Dump RustFS logs on failure" step revealed the container was dying at startup: [FATAL] Server encountered an error and is shutting down: Default root credentials are not allowed on non-loopback listeners; set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values, bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true for local development only `rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as "default" values. PR #111 passed yesterday because it ran against beta.3; today's runs against beta.4 fail at container startup. 
This is unrelated to PR #113's Expr-pushdown refactor — the bump just happened to hit the same week. Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The right long-term fix is one of: - Rotate the CI creds to less-default values (less coupling to RustFS's "default" set definition) - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the error message - Use a workflow service container with controlled lifecycle Deferred — pinning is the minimal restore. Also incidentally documents which version we tested against, which `:latest` never did. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-23 12:47:33 +01:00
Andrew Altshuler	730712b73f	codeowners: yml source of truth + generator + drift CI (#88) * codeowners: generator + drift CI + initial roles Source-of-truth approach to CODEOWNERS: yml is hand-edited, CODEOWNERS is generated and CI-enforced. Every role change is a reviewable PR with a permanent in-repo audit trail. No GitHub UI clicks, no shadow state. Initial roles: engineering @aaltshuler owns crates/** + default (.github/, scripts/, Cargo., openapi.json, everything else not docs) docs @aaltshuler @ragnorc owns docs/, README.md, AGENTS.md, CLAUDE.md, SECURITY.md Per GitHub semantics, multiple owners on a CODEOWNERS line means "any one satisfies the review"; for docs, either named member can approve. Strict "N distinct approvers" would need a CI workaround (not wired today; tracked for future hardening). Components: - .github/codeowners-roles.yml: source of truth. Edit this. - .github/scripts/render-codeowners.py: generator (PyYAML; ~100 LoC). - .github/CODEOWNERS: generated. CI rejects hand-edits. - .github/workflows/codeowners.yml, two checks: drift: re-render and assert CODEOWNERS matches. * noedit: reject PRs that edit CODEOWNERS without editing the yml. - docs/codeowners.md: explains the source-of-truth pattern, how to change roles, how to add new roles. - AGENTS.md topic-index row. What's NOT in this PR: - Branch protection on main (separate PR; needs `gh api` call against the org). - Required-reviewer enforcement (depends on branch protection landing). - Required CI status checks (depends on branch protection landing). - Scheduled rotation (the schedule: block in the yml + a weekly workflow). Today's roles are stable; rotation isn't needed yet. - Linear-as-source-of-truth integration (Approach 4 from the design discussion; deferred). Verified: - Generator output is deterministic (idempotent re-runs). - scripts/check-agents-md.sh OK (28 links, 28 docs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * codeowners: fix catch-all ordering (Devin review #88) Devin caught a real bug: GitHub CODEOWNERS uses "last match wins" semantics, but the generator emitted the catch-all `*` AFTER specific patterns. Net effect: `*` won for every file, silently nullifying the docs role and never routing reviews to @ragnorc. Fix is one-line: emit the default `*` line before iterating the specific paths. Also: - Added a regression assertion in the generator: after rendering, the first non-comment line must start with `*` if a default is configured. Generator exits non-zero otherwise. Catches the same class of mistake in any future refactor. - Rewrote the yml header comment, which incorrectly stated "keep more-specific paths after broader patterns" (correct for GitHub semantics but the generator was doing the opposite, so the comment read as a description of behavior when it was actually a contradicted intention). Verified by re-rendering: `*` is now line 12, `crates/` is line 14, `docs/` is line 15, etc. README.md matches both `*` and `README.md`; `README.md` is later → wins → @aaltshuler + @ragnorc both assigned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 17:26:06 +03:00
Andrew Altshuler	92e3886cfa	ci: add publish-crates workflow for crates.io releases (#74) The release.yml workflow builds binaries and updates Homebrew but never published to crates.io; v0.4.0 and v0.4.1 are missing from the registry even though the local Cargo.toml and the v0.4.1 tag are at 0.4.1. This adds a separate workflow that: - auto-publishes on every v* tag push (future releases self-publish) - can be manually dispatched with a tag input (catch up on v0.4.1) - is idempotent: skips a crate if its current crates.io version already matches local, so a partial failure is safe to retry - gates on CARGO_REGISTRY_TOKEN (already in repo secrets); skips cleanly if the token is ever rotated out Publishes in dependency order: omnigraph-compiler → omnigraph-engine → omnigraph-server → omnigraph-cli. Path-only deps in Cargo.toml carry explicit version fields, so cargo publish strips paths and resolves against crates.io. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 15:48:37 +03:00
Ragnor Comerford	675568ce85	ci: fold failpoints test into Test Workspace job The standalone test_failpoints_feature job took 21min on first run (cold cache; the omnigraph-engine crate has lance + datafusion deps that make any fresh build expensive). Folding into Test Workspace shares the warm cache so the failpoints invocation is incremental — ~30s vs 21min on subsequent runs, and within the workspace job's existing budget. The failpoints feature is gated behind a Cargo flag and only adds the small `fail` crate dep + a few feature-gated code paths; it doesn't change the dep tree of any other crate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 21:15:14 +02:00
Ragnor Comerford	052b6e680f	MR-794 step 2: address PR #68 follow-up review (Cubic) — pending dedupe + projection guard + CI Three new findings from Cubic on commit `3223b51`: * Pending edge cardinality counted within-input duplicates (P2): count_src_per_edge's pending walk added every row to the count, including duplicate rows that finalize will collapse via dedupe_merge_batches_by_id. A LoadMode::Merge with the same edge id twice would over-count → spurious @card violation. Fix: when dedupe_key_column is Some, walk pending in reverse, track seen keys via HashSet, count only the kept (last-occurrence) rows. Mirrors finalize-time dedupe so cardinality counts what stage_merge_insert actually publishes. * scan_with_pending silently disabled merge-shadow when projection omitted key_column (P2): if a caller passed Some("id") as key_column but their projection didn't include "id", the filter_out_rows_where_string_in helper passed batches through unchanged — silently degrading to union semantics. Fix: validate up front that projection contains key_column when both are Some; return a typed Lance error otherwise. Tightened the helper too: missing column is now an internal error (was a silent passthrough). * Cascade-vs-explicit delete test was too weak (P2): asserted only that edge count decreased after delete. The cascade alone could satisfy that even if the explicit second-delete silently no-op'd. Strengthened: assert post_knows == 0, which only holds when both ops landed (Bob→Diana would survive if op-2 no-op'd). CI gap: also added test_failpoints_feature job to .github/workflows/ci.yml. The workspace test runs without --features failpoints (the feature is behind a Cargo flag), so the failpoints test suite was never exercised by CI before now. The new job builds + runs `cargo test -p omnigraph-engine --features failpoints --test failpoints` on every full CI run, mirroring the test_aws_feature pattern. 
New tests on tests/runs.rs: * load_merge_mode_dedupes_within_pending_for_cardinality_count (Cubic P2 #2 — pending-vs-pending dedup, distinct from the load_merge_mode_dedupes_edge_for_cardinality_count test which covers committed-vs-pending dedup). * scan_with_pending_rejects_key_column_missing_from_projection (Cubic P2 #3 — verifies the up-front validation rejects bad callers and that the happy path still works correctly). Local test results: * tests/runs.rs: 23/23 passed * tests/failpoints.rs --features failpoints: 7/7 passed (includes the two new finalize→publisher residual tests landed in `3223b51`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 20:47:45 +02:00
Ragnor Comerford	a9430978fb	Merge pull request #60 from ModernRelay/ragnorc/omnigraph-spec Add AGENTS.md (map) + docs/ knowledge base + CI link check	2026-04-29 00:15:19 +02:00
Ragnor Comerford	a335d98854	Refactor AGENTS.md from encyclopedia to map; move spec into docs/ Splits the 990-line AGENTS.md into a 184-line map (architecture, where-to-find index, always-on invariants, capability matrix, maintenance contract) plus 18 new docs/*.md files holding the deep content per topic (storage, schema and query languages, indexes, embeddings, branches/commits, runs, merge, changes, execution, policy, server, CLI reference, audit, errors, CI, constants, v0.3.1 notes). Adds scripts/check-agents-md.sh and a check_agents_md CI job that verifies every docs/ link in AGENTS.md resolves and every doc in the canonical set is linked. CLAUDE.md remains a symlink to AGENTS.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 23:31:08 +02:00
Andrew Altshuler	372f793ad6	Drop macOS x86_64 build target (#55) Stop producing the omnigraph-macos-x86_64 archive in both the stable and edge release workflows. The macos-15-intel runner build was the slowest of the matrix and Apple Silicon is now the default Mac developer target. - release.yml + release-edge.yml: drop the macos-15-intel matrix entry - install.sh: drop the Darwin/x86_64 case so Intel Macs get a clear "no prebuilt binary" error instead of attempting an absent download - update-homebrew-formula.sh: drop the MACOS_X86_* variables and emit an arm64-only Homebrew formula. The on_macos block now declares `depends_on arch: :arm64` so Intel `brew install` fails fast with a clear architecture message instead of installing an arm64 binary that errors at exec time. Linux x86_64 build is unaffected. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 18:19:26 +03:00
andrew	a1b00e2d06	Fix release.yml: move HOMEBREW_TAP_TOKEN guard into steps GitHub Actions rejects `secrets.*` in job-level `if:` conditions at runtime (job-level `if` is evaluated before secrets are available), causing the workflow to abort in 0s with "workflow file issue" on every trigger. Moving the guard into a step-level check that writes `HOMEBREW_TAP_SKIP` to GITHUB_ENV lets the rest of the steps conditionally no-op when the tap token isn't configured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 19:24:41 +03:00
Ragnor Comerford	567ebe5f24	Merge pull request #24 from ModernRelay/ragnorc/explore-api Add static OpenAPI spec and clean up operation IDs	2026-04-19 15:36:49 +02:00
Ragnor Comerford	bcddbdf485	Test merge commit; push openapi.json via separate clone Restore the default pull_request checkout (refs/pull/N/merge) so tests see the merged state. The openapi.json auto-commit now uses a separate shallow clone of the PR branch, so the pushed commit contains only the spec change rather than the merge-commit tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 12:10:40 +02:00
Ragnor Comerford	a157f6a17c	Fold openapi.json auto-sync into main CI test job The separate openapi-sync workflow was duplicating the workspace build (~15 min cold-cache compile), paying the cost twice per PR. Fold the regen + auto-commit into the existing test job: one compile, shared rust-cache, same drift-check semantics. - Same-repo PRs: OMNIGRAPH_UPDATE_OPENAPI=1 during the test run, then commit the regenerated spec back to the PR branch - Fork PRs / pushes: env var empty, test stays in strict drift-check mode - openapi_spec_is_up_to_date treats empty env value as unset, so the conditional workflow env expression works Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 21:00:46 +02:00
andrew	987c51c376	package caller: pass AWS secrets via secrets: inherit GitHub Actions doesn't expose the 'secrets' context in 'with:' when calling a reusable workflow. The companion PR on the shared workflow (ModernRelay/.github) moves the four AWS values into on.workflow_call.secrets; this caller drops them from 'with:' and adds 'secrets: inherit' so all four flow through masked. Trailing from PRs #33 and #34.	2026-04-18 21:54:08 +03:00
andrew	8086a0099c	package workflow: read AWS config from secrets, not variables On a public repo, Actions variables are not masked in workflow logs. The AWS role ARN and artifact bucket name embed the AWS account ID — not catastrophic, but norm-preserving to keep them out of public logs. Switch all four values (region, role, project, bucket) from `${{ vars.* }}` to `${{ secrets.* }}`. When secrets are passed via `with:` to a reusable workflow, GitHub's masking still applies because the value is added to the run's mask list as soon as the secret reference is resolved. Followup to #33 — should have landed as secrets from the start.	2026-04-18 21:43:12 +03:00
Ragnor Comerford	9de2079263	Merge remote-tracking branch 'origin/main' into ragnorc/explore-api # Conflicts: # CONTRIBUTING.md	2026-04-18 20:24:39 +02:00
andrew	807c1ba4dc	Add manual-dispatch Package workflow for CodeBuild image builds Invokes the shared omnigraph-package reusable workflow twice per run — once with default features, once with --features aws — producing two ECR tags per source commit: <sha> (default features) <sha>-aws (--features aws → SecretsManagerTokenSource) Manual-dispatch only for now. Neither release.yml nor release-edge.yml currently invokes the CodeBuild-backed packaging path; this gives operators a way to produce on-demand image variants without wiring packaging into the tag/push cadence. Prerequisites: - Repo vars AWS_REGION, AWS_ROLE_TO_ASSUME, AWS_CODEBUILD_PACKAGE_PROJECT, AWS_ARTIFACT_BUCKET must be set. - Shared workflow must support the `features` and `image_tag_suffix` inputs. Uses @main as the shared-workflow ref until a versioned tag is cut.	2026-04-18 16:29:43 +03:00
andrew	7a3bf5c758	Add aws feature + SecretsManagerTokenSource backend Introduces an opt-in AWS Secrets Manager backend for bearer tokens, behind the `aws` Cargo feature. Default builds (on-prem, local dev) don't pull in the AWS SDK and don't pay its compile cost. - New Cargo feature `aws` gates the `aws-config` + `aws-sdk-secretsmanager` optional deps. Default features remain empty. - New `auth::aws::SecretsManagerTokenSource` implements `TokenSource` by fetching a JSON `{"actor_id": "token", ...}` payload from a named Secrets Manager secret. Credentials resolve via the AWS default chain (env, shared config, IMDSv2 instance role, ECS task role) so no explicit plumbing is needed under an IAM role. - New `resolve_token_source()` dispatches based on the `OMNIGRAPH_SERVER_BEARER_TOKENS_AWS_SECRET` env var. If the var is set but the binary was built without `--features aws`, returns a clear rebuild instruction rather than silently falling back. - `serve()` now uses `resolve_token_source()` and logs which source was selected at startup. - `parse_json_secret_payload()` is factored out as a free function so the payload validation (trim whitespace, reject blank actor/token, reject non-object) is unit-testable without the AWS SDK. - New CI job `test_aws_feature` builds + tests with `--features aws`. Not in this PR (follow-ups): - Background refresh loop for rotation. `SecretsManagerTokenSource` advertises `supports_refresh: true` but the AppState-level refresh task isn't wired yet. - Config-YAML dispatch (today the AWS source is selected via env var only; eventually `server.bearer_tokens.source` in `omnigraph.yaml`). Tests: - Default-feature build: 33 lib + 41 integration + 64 openapi. - `--features aws` build: 32 lib (one test is cfg-gated) + 41 + 64. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 03:48:51 +03:00
Ragnor Comerford	dda9728473	Add openapi.json auto-sync workflow	2026-04-17 19:09:36 +02:00
andrew	ad7027c7e9	Automate Homebrew tap updates on release tags	2026-04-15 17:57:21 +03:00
andrew	33bdab1fcb	Prepare v0.2.2 release	2026-04-14 20:13:00 +03:00
andrew	ff83e97cb5	Scope RustFS CI to relevant changes	2026-04-12 15:33:41 +03:00
andrew	af7a74bf2c	Skip heavy CI on text-only changes	2026-04-11 15:22:11 +03:00
andrew	446075f333	Update workflow actions and add Homebrew install docs	2026-04-11 04:01:39 +03:00
andrew	816b24d05e	Fix public binary install flow	2026-04-11 02:19:21 +03:00
andrew	cbb312e74f	Split binary and source install flows	2026-04-10 23:26:09 +03:00
andrew	338289656a	Initial public Omnigraph repository	2026-04-10 20:49:41 +03:00
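
The publish-order reasoning in commits 1a9f8b1f7f and 92e3886cfa (publish in dependency order, skip crates whose version is already on the registry) can be sketched roughly as follows. This is only an illustration and not the workflow's actual script: the dependency map is transcribed from the 1a9f8b1f7f message, while `publish_order`, `publish_all`, and the in-memory `registry` dict are hypothetical stand-ins for `cargo publish` and the crates.io index.

```python
from graphlib import TopologicalSorter

# Internal dependencies as described in the 1a9f8b1f7f message:
# policy and compiler have no internal deps, engine needs both,
# server needs those three, cli needs everything.
DEPS = {
    "omnigraph-compiler": set(),
    "omnigraph-policy": set(),
    "omnigraph-engine": {"omnigraph-compiler", "omnigraph-policy"},
    "omnigraph-server": {
        "omnigraph-compiler", "omnigraph-policy", "omnigraph-engine",
    },
    "omnigraph-cli": {
        "omnigraph-compiler", "omnigraph-policy",
        "omnigraph-engine", "omnigraph-server",
    },
}


def publish_order(deps):
    """Return crate names so that every crate comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def publish_all(deps, local_version, registry):
    """Publish each crate unless `registry` already holds `local_version`.

    `registry` maps crate name -> set of published versions. It is a
    hypothetical in-memory stand-in for the crates.io index.
    Returns the list of crates published by this call.
    """
    published = []
    for crate in publish_order(deps):
        if local_version in registry.get(crate, set()):
            continue  # already on the registry: safe to skip on a re-run
        registry.setdefault(crate, set()).add(local_version)
        published.append(crate)
    return published
```

Under these assumptions, a re-run after a partial failure only publishes what is still missing, which is the property the commit messages describe as idempotent.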

27 commits
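
The catch-all ordering fix described in the 730712b73f follow-up message relies on GitHub's "last match wins" rule for CODEOWNERS, where the catch-all pattern is `*`. A minimal sketch of that idea is below. It is not the repository's `render-codeowners.py`: the function names, the header comment, and the deliberately simplified matcher (only `*`, `dir/` prefixes, and exact file names) are assumptions made for illustration.

```python
def render_codeowners(default_owners, rules):
    """Render CODEOWNERS text with the catch-all line emitted first.

    `rules` is a list of (pattern, owners) pairs. All names here are
    hypothetical; the real generator reads its roles from a yml file.
    """
    lines = ["# GENERATED FILE. Edit the roles yml instead."]
    if default_owners:
        # Catch-all goes first so later, more specific lines override it.
        lines.append("* " + " ".join(default_owners))
    for pattern, owners in rules:
        lines.append(pattern + " " + " ".join(owners))

    # Regression check mirroring the commit message: the first
    # non-comment line must be the catch-all when a default exists.
    body = [line for line in lines if line and not line.startswith("#")]
    if default_owners and not body[0].startswith("* "):
        raise SystemExit("catch-all must be the first non-comment line")
    return "\n".join(lines) + "\n"


def owners_for(path, rendered):
    """Resolve owners with a simplified last-match-wins scan.

    Only '*', 'dir/' prefixes, and exact file names are supported,
    which is far less than GitHub's real pattern syntax.
    """
    winner = []
    for line in rendered.splitlines():
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        matches = (
            pattern == "*"
            or (pattern.endswith("/") and path.startswith(pattern))
            or path == pattern
        )
        if matches:
            winner = owners  # a later match replaces an earlier one
    return winner
```

With the default first, a path such as `README.md` matches both `*` and its own line, and the later, more specific line decides the owners. Emitting `*` last would instead make the default win for every file, which is the bug the commit describes.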