omnigraph/.github/workflows/ci.yml
Andrew Altshuler cb80fa40f1
exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113)
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr)

Switches `execute_node_scan` from string-flattened Lance SQL pushdown
(`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion
Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`).

## What this enables

1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned
   `None` for list-contains (the comment said *"Can't pushdown list
   contains"*) because string SQL can't easily express it. With Expr,
   it lowers to DataFusion's `array_has(col, value)` builtin via the
   `nested_expressions` feature, and pushes down to Lance's scan layer
   the same way Eq/Lt/etc. do. Pinned by the new regression test
   `end_to_end::ir_filter_with_list_contains_pushes_down`.

2. **DataFusion 53's optimizer rules now reach our predicates.** Once
   the Expr lands at the Lance scanner, DF's planner runs:
   - `IN`-list vectorized eq kernel (DF #20528)
   - `PhysicalExprSimplifier` (DF #20111)
   - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097)
   - Push limit into hash join (DF #20228)
   None of these were applicable before because the string SQL path
   short-circuited the optimizer.
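
The first point above can be illustrated with a toy model. This is a minimal sketch, not the omnigraph code: `IrFilter`, `Literal`, `ToyExpr`, `filter_to_sql`, and `filter_to_expr` are hypothetical stand-ins invented for illustration, and `ToyExpr` only plays the role that DataFusion's `Expr` plays in the real change. It shows why a string lowering has to give up on list-contains while a structured lowering can name a function call for it.

```rust
// Toy model only: every type here is a hypothetical stand-in, not
// omnigraph's real IR and not DataFusion's `Expr`.

#[derive(Debug, Clone, PartialEq)]
pub enum CompOp {
    Eq,
    Lt,
    Contains,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Literal {
    Int(i64),
    Str(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct IrFilter {
    pub column: String,
    pub op: CompOp,
    pub value: Literal,
}

/// Stand-in for a structured expression tree.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyExpr {
    Column(String),
    Literal(Literal),
    Binary { left: Box<ToyExpr>, op: &'static str, right: Box<ToyExpr> },
    Call { func: &'static str, args: Vec<ToyExpr> },
}

/// String path: render a literal as SQL text, doubling single quotes.
fn literal_to_sql(value: &Literal) -> String {
    match value {
        Literal::Int(n) => n.to_string(),
        Literal::Str(s) => format!("'{}'", s.replace('\'', "''")),
    }
}

/// String path: returns `None` for list-contains, so nothing is pushed down.
pub fn filter_to_sql(filter: &IrFilter) -> Option<String> {
    let op = match filter.op {
        CompOp::Eq => "=",
        CompOp::Lt => "<",
        CompOp::Contains => return None,
    };
    Some(format!("{} {} {}", filter.column, op, literal_to_sql(&filter.value)))
}

/// Helper that builds a binary comparison node.
fn binary(left: ToyExpr, op: &'static str, right: ToyExpr) -> ToyExpr {
    ToyExpr::Binary { left: Box::new(left), op, right: Box::new(right) }
}

/// Structured path: every operator, including `Contains`, has a lowering.
pub fn filter_to_expr(filter: &IrFilter) -> ToyExpr {
    let column = ToyExpr::Column(filter.column.clone());
    let value = ToyExpr::Literal(filter.value.clone());
    match filter.op {
        CompOp::Eq => binary(column, "=", value),
        CompOp::Lt => binary(column, "<", value),
        // Mirrors the idea of lowering to an `array_has(col, value)` call.
        CompOp::Contains => ToyExpr::Call { func: "array_has", args: vec![column, value] },
    }
}
```

In this sketch, `filter_to_sql` returns `None` for a `Contains` filter, while `filter_to_expr` returns a `Call` node named `array_has` for the same input. In the change described here, that node corresponds to the `Expr` built by DataFusion's `array_has(col, value)` and handed to `Scanner::filter_expr`.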

## Scope

This is one of three string-flattened pushdown sites; the other two
(`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation
delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL
string path for now:

- The Expand pushdown still serializes through `hydrate_nodes`'s
  `extra_filter_sql: Option<&str>` parameter. Migrating it changes the
  `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` →
  `Option<Expr>`) and the cascading call sites — out of scope for this
  MR.
- The mutation delete predicate still goes through `Dataset::delete(&str)`
  in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the
  Lance v7 bump per issue #112) will migrate that path to
  `DeleteBuilder::execute_uncommitted` taking an Expr.

The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql`
helpers stay in place to serve the remaining string-SQL consumers
(mutation predicates). They get retired when the other call sites
migrate.

## Cargo

Enables the `nested_expressions` feature on the `datafusion` workspace
dep. Lance already pulls in `datafusion-functions-nested` transitively
(it's listed in their feature set), so this just exposes the
`datafusion::functions_nested::expr_fn::array_has` re-export. No
transitive dep change (Cargo.lock unchanged).

## Tests

- New: `ir_filter_with_list_contains_pushes_down` — pins the case that
  was previously impossible (`ir_filter_to_sql` returning `None`).
- 906/906 workspace tests still pass.
- 417/417 engine integration tests pass (was 416 + the new one).
- 19/19 failpoints (recovery canary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break)

The RustFS S3 Integration job started failing 2026-05-23 with all 3
tests panicking on the first PUT:

  HTTP error: error sending request

The "Dump RustFS logs on failure" step revealed the container was
dying at startup:

  [FATAL] Server encountered an error and is shutting down:
  Default root credentials are not allowed on non-loopback listeners;
  set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values,
  bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
  for local development only

`rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a
credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as
"default" values. PR #111 passed yesterday because it ran against
beta.3; today's runs against beta.4 fail at container startup.

This is unrelated to PR #113's Expr-pushdown refactor — the bump
just happened to hit the same week.

Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The
right long-term fix is one of:
  - Rotate the CI creds to less-default values (less coupling to
    RustFS's "default" set definition)
  - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the
    error message
  - Use a workflow service container with controlled lifecycle

These are deferred; pinning is the minimal change that restores CI.
The pin also documents *which* version we tested against, which
`:latest` never did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 12:47:33 +01:00


name: CI
on:
  pull_request:
  push:
    branches:
      - main
    tags:
      - "v*"
  workflow_dispatch:
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  classify_changes:
    name: Classify Changes
    runs-on: ubuntu-latest
    permissions:
      contents: read
    outputs:
      run_full_ci: ${{ steps.filter.outputs.run_full_ci }}
      run_rustfs_ci: ${{ steps.filter.outputs.run_rustfs_ci }}
    steps:
      - name: Checkout source
        uses: actions/checkout@v5.0.1
        with:
          fetch-depth: 0
      - name: Detect text-only changes
        id: filter
        env:
          BEFORE_SHA: ${{ github.event.before }}
          EVENT_NAME: ${{ github.event_name }}
          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
          REF_TYPE: ${{ github.ref_type }}
        run: |
          set -euo pipefail
          if [[ "$EVENT_NAME" == "workflow_dispatch" || "$REF_TYPE" == "tag" ]]; then
            echo "run_full_ci=true" >> "$GITHUB_OUTPUT"
            echo "run_rustfs_ci=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          if [[ "$EVENT_NAME" == "pull_request" ]]; then
            base="$PR_BASE_SHA"
            head="$PR_HEAD_SHA"
          else
            base="$BEFORE_SHA"
            head="$GITHUB_SHA"
            if [[ "$base" == "0000000000000000000000000000000000000000" ]]; then
              base="$(git rev-parse "${head}^" 2>/dev/null || true)"
            fi
          fi
          if [[ -z "${base:-}" ]]; then
            echo "run_full_ci=true" >> "$GITHUB_OUTPUT"
            echo "run_rustfs_ci=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          mapfile -t changed < <(git diff --name-only "$base" "$head")
          if [[ "${#changed[@]}" -eq 0 ]]; then
            echo "run_full_ci=true" >> "$GITHUB_OUTPUT"
            echo "run_rustfs_ci=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          run_full_ci=false
          run_rustfs_ci=false
          for path in "${changed[@]}"; do
            case "$path" in
              *.md|*.mdx|*.txt|*.rst|*.adoc) ;;
              *)
                run_full_ci=true
                ;;
            esac
            if [[ "$EVENT_NAME" != "pull_request" ]]; then
              run_rustfs_ci=true
              continue
            fi
            case "$path" in
              .github/workflows/ci.yml|Cargo.toml|Cargo.lock|crates/*/Cargo.toml) run_rustfs_ci=true ;;
              crates/omnigraph/src/storage.rs) run_rustfs_ci=true ;;
              crates/omnigraph/src/db/manifest.rs|crates/omnigraph/src/db/manifest/*) run_rustfs_ci=true ;;
              crates/omnigraph/tests/s3_storage.rs|crates/omnigraph/tests/helpers/*) run_rustfs_ci=true ;;
              crates/omnigraph-server/tests/server.rs) run_rustfs_ci=true ;;
              crates/omnigraph-cli/tests/system_local.rs) run_rustfs_ci=true ;;
            esac
          done
          printf 'Changed files:\n'
          printf ' %s\n' "${changed[@]}"
          echo "run_full_ci=$run_full_ci" >> "$GITHUB_OUTPUT"
          echo "run_rustfs_ci=$run_rustfs_ci" >> "$GITHUB_OUTPUT"
  check_agents_md:
    name: Check AGENTS.md Links
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - name: Checkout source
        uses: actions/checkout@v5.0.1
      - name: Verify AGENTS.md ↔ docs/ cross-links
        run: bash scripts/check-agents-md.sh
  test:
    name: Test Workspace
    needs: classify_changes
    runs-on: ubuntu-latest
    timeout-minutes: 45
    permissions:
      contents: write
    env:
      CARGO_TERM_COLOR: always
    steps:
      - name: Skip heavy CI for text-only changes
        if: needs.classify_changes.outputs.run_full_ci != 'true'
        run: echo "Text-only change detected; skipping workspace test run."
      # Default checkout: on pull_request this gives us the merge commit
      # (refs/pull/N/merge), which is what we want to test. For same-repo PRs
      # the regenerated openapi.json is pushed to the head branch below via a
      # separate shallow clone.
      - name: Checkout source
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        uses: actions/checkout@v5.0.1
      - name: Install system dependencies
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        run: |
          sudo apt-get update
          sudo apt-get install -y protobuf-compiler libprotobuf-dev
      - name: Install Rust stable
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: stable
      - name: Cache Rust build data
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: |
            . -> target
      - name: Run workspace tests
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        # On same-repo PRs, regenerate openapi.json as part of the drift test
        # so the following step can commit the update. Elsewhere the env var
        # is empty, leaving the drift test in strict-check mode.
        env:
          OMNIGRAPH_UPDATE_OPENAPI: ${{ (github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository) && '1' || '' }}
        run: cargo test --workspace --locked
      - name: Run failpoints feature test
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        # Run after the workspace test so the build cache is warm, so
        # enabling --features failpoints is just an incremental rebuild
        # of omnigraph-engine + the small `fail` crate, not the full
        # dep tree (lance, datafusion). A separate job with its own
        # cache key would be a fresh ~20min build on first run; this
        # is ~30s on a warm cache.
        run: cargo test --locked -p omnigraph-engine --features failpoints --test failpoints
      - name: Commit regenerated openapi.json to PR branch
        if: |
          needs.classify_changes.outputs.run_full_ci == 'true' &&
          github.event_name == 'pull_request' &&
          github.event.pull_request.head.repo.full_name == github.repository
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Passed via env rather than inlined so the branch name is never
          # interpolated into the script text.
          HEAD_REF: ${{ github.head_ref }}
        run: |
          # The workspace was checked out at the PR's merge commit so tests
          # see the merged state. Pushing the regenerated openapi.json back
          # to the PR branch is done via a separate shallow clone so the
          # pushed commit contains only the spec change, not the merge state.
          if git diff --quiet -- openapi.json; then
            echo "openapi.json is already in sync."
            exit 0
          fi
          tmp=$(mktemp -d)
          git clone --depth 1 --branch "$HEAD_REF" \
            "https://x-access-token:${GITHUB_TOKEN}@github.com/${{ github.repository }}.git" \
            "$tmp"
          cp openapi.json "$tmp/openapi.json"
          cd "$tmp"
          if git diff --quiet -- openapi.json; then
            echo "openapi.json matches PR branch; nothing to push."
            exit 0
          fi
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add openapi.json
          git commit -m "chore: regenerate openapi.json"
          git push
  test_aws_feature:
    name: Test omnigraph-server --features aws
    needs: classify_changes
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: read
    env:
      CARGO_TERM_COLOR: always
    steps:
      - name: Skip for text-only changes
        if: needs.classify_changes.outputs.run_full_ci != 'true'
        run: echo "Text-only change detected; skipping aws feature build."
      - name: Checkout source
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        uses: actions/checkout@v5.0.1
      - name: Install system dependencies
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        run: |
          sudo apt-get update
          sudo apt-get install -y protobuf-compiler libprotobuf-dev
      - name: Install Rust stable
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: stable
      - name: Cache Rust build data
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: |
            . -> target
          key: aws-feature
      - name: Build omnigraph-server with aws feature
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        run: cargo build --locked -p omnigraph-server --features aws
      - name: Test omnigraph-server with aws feature
        if: needs.classify_changes.outputs.run_full_ci == 'true'
        run: cargo test --locked -p omnigraph-server --features aws
  rustfs_integration:
    name: RustFS S3 Integration
    needs:
      - classify_changes
      - test
    if: needs.classify_changes.outputs.run_rustfs_ci == 'true'
    runs-on: ubuntu-latest
    timeout-minutes: 75
    permissions:
      contents: read
    env:
      AWS_ACCESS_KEY_ID: rustfsadmin
      AWS_SECRET_ACCESS_KEY: rustfsadmin
      AWS_REGION: us-east-1
      AWS_ENDPOINT_URL: http://127.0.0.1:9000
      AWS_ENDPOINT_URL_S3: http://127.0.0.1:9000
      AWS_ALLOW_HTTP: "true"
      AWS_S3_FORCE_PATH_STYLE: "true"
      OMNIGRAPH_S3_TEST_BUCKET: omnigraph-ci
      OMNIGRAPH_S3_TEST_PREFIX: github-actions
      CARGO_TERM_COLOR: always
    steps:
      - name: Checkout source
        uses: actions/checkout@v5.0.1
      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y protobuf-compiler libprotobuf-dev python3-pip
      - name: Install Rust stable
        uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: stable
      - name: Cache Rust build data
        uses: Swatinem/rust-cache@v2
        with:
          workspaces: |
            . -> target
      - name: Start RustFS
        # Pinned to 1.0.0-beta.3 (2026-05-14), the last known-good tag.
        # `rustfs/rustfs:latest` (1.0.0-beta.4, 2026-05-21) added a
        # credentials-policy check that refuses to start when
        # RUSTFS_ACCESS_KEY/RUSTFS_SECRET_KEY (set below from
        # AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY) are values it considers
        # "default" (rustfsadmin/rustfsadmin in our case). Bumping to
        # beta.4+ requires either rotating those creds to less-default
        # values or setting RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true.
        # That is deliberate work, not an emergency. Pin first; upgrade later.
        run: |
          docker rm -f rustfs >/dev/null 2>&1 || true
          docker run -d \
            --name rustfs \
            -p 9000:9000 \
            -p 9001:9001 \
            -e RUSTFS_ACCESS_KEY="${AWS_ACCESS_KEY_ID}" \
            -e RUSTFS_SECRET_KEY="${AWS_SECRET_ACCESS_KEY}" \
            rustfs/rustfs:1.0.0-beta.3 \
            /data
      - name: Install AWS CLI
        run: |
          python3 -m pip install --user awscli
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
      - name: Create RustFS test bucket
        run: |
          for _ in $(seq 1 30); do
            if aws --endpoint-url "${AWS_ENDPOINT_URL_S3}" s3api list-buckets >/dev/null 2>&1; then
              break
            fi
            sleep 2
          done
          aws --endpoint-url "${AWS_ENDPOINT_URL_S3}" \
            s3api create-bucket \
            --bucket "${OMNIGRAPH_S3_TEST_BUCKET}" >/dev/null 2>&1 || true
      - name: Run RustFS storage tests
        run: cargo test --locked -p omnigraph-engine --test s3_storage -- --nocapture
      - name: Run RustFS server smoke
        run: cargo test --locked -p omnigraph-server --test server server_opens_s3_repo_directly_and_serves_snapshot_and_read -- --nocapture
      - name: Run RustFS CLI smoke
        run: cargo test --locked -p omnigraph-cli --test system_local local_cli_s3_end_to_end_init_load_read_flow -- --nocapture
      - name: Dump RustFS logs on failure
        if: failure()
        run: docker logs rustfs