omnigraph/crates/omnigraph/tests
Andrew Altshuler cb80fa40f1
exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113)
* exec/query: pushdown IR filters via DataFusion Expr (Scanner::filter_expr)

Switches `execute_node_scan` from string-flattened Lance SQL pushdown
(`build_lance_filter` + `scanner.filter(&str)`) to structured DataFusion
Expr pushdown (`build_lance_filter_expr` + `scanner.filter_expr(Expr)`).

## What this enables

1. **`CompOp::Contains` now pushes down.** `ir_filter_to_sql` returned
   `None` for list-contains (the comment said *"Can't pushdown list
   contains"*) because string SQL can't easily express it. With Expr,
   it lowers to DataFusion's `array_has(col, value)` builtin via the
   `nested_expressions` feature, and pushes down to Lance's scan layer
   the same way Eq/Lt/etc. do. Pinned by the new regression test
   `end_to_end::ir_filter_with_list_contains_pushes_down`.

2. **DataFusion 53's optimizer rules now reach our predicates.** Once
   the Expr lands at the Lance scanner, DF's planner runs:
   - `IN`-list vectorized eq kernel (DF #20528)
   - `PhysicalExprSimplifier` (DF #20111)
   - CASE WHEN x THEN y ELSE NULL shortcut (DF #20097)
   - Push limit into hash join (DF #20228)
   None of these were applicable before because the string SQL path
   short-circuited the optimizer.
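
As a rough illustration of why the structured path unblocks `CompOp::Contains`, here is a toy model in plain Rust. `ToyExpr`, `IrFilter`, `Literal`, and the `toy_*` functions are hypothetical stand-ins, not omnigraph's IR and not DataFusion's `Expr`; only the `array_has(col, value)` lowering target is taken from the description above.

```rust
// Toy model only: every type and function here is a hypothetical stand-in.
#[derive(Debug, Clone, PartialEq)]
enum CompOp {
    Eq,
    Lt,
    Contains,
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int(i64),
    Str(String),
}

#[derive(Debug, Clone, PartialEq)]
struct IrFilter {
    column: String,
    op: CompOp,
    value: Literal,
}

// Minimal expression tree standing in for a structured predicate.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Literal(Literal),
    Binary { op: &'static str, left: Box<ToyExpr>, right: Box<ToyExpr> },
    Call { func: &'static str, args: Vec<ToyExpr> },
}

fn toy_literal_to_sql(lit: &Literal) -> String {
    match lit {
        Literal::Int(i) => i.to_string(),
        // Escape single quotes by doubling them.
        Literal::Str(s) => format!("'{}'", s.replace('\'', "''")),
    }
}

// String path: returns None for list-contains, so that filter cannot push down.
fn toy_filter_to_sql(f: &IrFilter) -> Option<String> {
    let op = match f.op {
        CompOp::Eq => "=",
        CompOp::Lt => "<",
        CompOp::Contains => return None,
    };
    Some(format!("{} {} {}", f.column, op, toy_literal_to_sql(&f.value)))
}

// Structured path: every operator lowers, Contains becomes a function call node.
fn toy_filter_to_expr(f: &IrFilter) -> ToyExpr {
    let col = ToyExpr::Column(f.column.clone());
    let lit = ToyExpr::Literal(f.value.clone());
    match f.op {
        CompOp::Eq => ToyExpr::Binary { op: "=", left: Box::new(col), right: Box::new(lit) },
        CompOp::Lt => ToyExpr::Binary { op: "<", left: Box::new(col), right: Box::new(lit) },
        CompOp::Contains => ToyExpr::Call { func: "array_has", args: vec![col, lit] },
    }
}
```

In this sketch the string lowering is partial (`Option<String>`) while the structured lowering is total, which mirrors the shape of the change: the caller no longer needs a fallback for filters that could not be serialized.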

## Scope

This is one of three string-flattened pushdown sites; the other two
(`hydrate_nodes`/Expand pushdown at query.rs:771-796 and the mutation
delete path in `exec/mutation.rs::predicate_to_sql`) stay on the SQL
string path for now:

- The Expand pushdown still serializes through `hydrate_nodes`'s
  `extra_filter_sql: Option<&str>` parameter. Migrating it changes the
  `TableStorage` trait surface (`scan_stream(filter: Option<&str>)` →
  `Option<Expr>`) and the cascading call sites — out of scope for this
  MR.
- The mutation delete predicate still goes through `Dataset::delete(&str)`
  in Lance 6.0.1. MR-A (delete two-phase via Lance #6658, gated on the
  Lance v7 bump per issue #112) will migrate that path to
  `DeleteBuilder::execute_uncommitted` taking an Expr.
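
To make the trait-surface point concrete, here is a minimal sketch of the before/after shape in plain Rust. `TableStorageBefore`, `TableStorageAfter`, `MemTable`, `ToyExpr`, and `parse_toy_sql` are hypothetical names for illustration; the real `TableStorage` trait and its `scan_stream` return type are not shown in this message and are not reproduced here.

```rust
// Toy model only: hypothetical stand-ins, not omnigraph's TableStorage.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    AgeLt(i64),
    AgeEq(i64),
}

struct MemTable {
    ages: Vec<i64>,
}

// "Before" shape: the filter crosses the trait boundary as a SQL string.
trait TableStorageBefore {
    fn scan_stream(&self, filter: Option<&str>) -> Vec<i64>;
}

// "After" shape: the filter crosses the trait boundary as a structured value.
trait TableStorageAfter {
    fn scan_stream(&self, filter: Option<ToyExpr>) -> Vec<i64>;
}

// The string path has to re-parse what the caller already knew structurally.
fn parse_toy_sql(sql: &str) -> Option<ToyExpr> {
    let parts: Vec<&str> = sql.split_whitespace().collect();
    match parts.as_slice() {
        ["age", "<", n] => n.parse::<i64>().ok().map(ToyExpr::AgeLt),
        ["age", "=", n] => n.parse::<i64>().ok().map(ToyExpr::AgeEq),
        _ => None,
    }
}

impl TableStorageAfter for MemTable {
    fn scan_stream(&self, filter: Option<ToyExpr>) -> Vec<i64> {
        self.ages
            .iter()
            .copied()
            .filter(|age| match &filter {
                None => true,
                Some(ToyExpr::AgeLt(n)) => age < n,
                Some(ToyExpr::AgeEq(n)) => age == n,
            })
            .collect()
    }
}

impl TableStorageBefore for MemTable {
    fn scan_stream(&self, filter: Option<&str>) -> Vec<i64> {
        match filter {
            None => TableStorageAfter::scan_stream(self, None),
            Some(sql) => match parse_toy_sql(sql) {
                Some(expr) => TableStorageAfter::scan_stream(self, Some(expr)),
                // This toy returns no rows for SQL it cannot parse.
                None => Vec::new(),
            },
        }
    }
}
```

Every implementor and every caller of the "before" trait has to change when the parameter type changes, which is the cascade that keeps the Expand pushdown out of scope here.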

The existing `ir_filter_to_sql` / `ir_expr_to_sql` / `literal_to_sql`
helpers stay in place to serve the remaining string-SQL consumers
(mutation predicates). They get retired when the other call sites
migrate.

## Cargo

Enables the `nested_expressions` feature on the `datafusion` workspace
dep. Lance already pulls in `datafusion-functions-nested` transitively
(it's listed in their feature set), so this just exposes the
`datafusion::functions_nested::expr_fn::array_has` re-export. No
transitive dep change (Cargo.lock unchanged).

## Tests

- New: `ir_filter_with_list_contains_pushes_down` — pins the case that
  was previously impossible (`ir_filter_to_sql` returning `None`).
- 906/906 workspace tests still pass.
- 417/417 engine integration tests pass (was 416 + the new one).
- 19/19 failpoint tests pass (recovery canary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: pin rustfs/rustfs to 1.0.0-beta.3 (last known-good before creds-policy break)

The RustFS S3 Integration job started failing 2026-05-23 with all 3
tests panicking on the first PUT:

  HTTP error: error sending request

The "Dump RustFS logs on failure" step revealed the container was
dying at startup:

  [FATAL] Server encountered an error and is shutting down:
  Default root credentials are not allowed on non-loopback listeners;
  set RUSTFS_ACCESS_KEY and RUSTFS_SECRET_KEY to non-default values,
  bind to loopback, or set RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true
  for local development only

`rustfs/rustfs:latest` was updated 2026-05-21 (1.0.0-beta.4) with a
credentials-policy check that rejects `rustfsadmin`/`rustfsadmin` as
"default" values. PR #111 passed yesterday because it ran against
beta.3; today's runs against beta.4 fail at container startup.

This is unrelated to PR #113's Expr-pushdown refactor — the bump
just happened to hit the same week.

Pin to 1.0.0-beta.3 (2026-05-14, last tag before the change). The
right long-term fix is one of:
  - Rotate the CI creds to non-default values (avoids coupling to
    RustFS's definition of the "default" set)
  - Set `RUSTFS_ALLOW_INSECURE_DEFAULT_CREDENTIALS=true` per the
    error message
  - Use a workflow service container with controlled lifecycle

All three are deferred; pinning is the minimal change that restores
CI. It also documents *which* version we tested against, which
`:latest` never did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 12:47:33 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| fixtures | Merge pull request #6 from ModernRelay/claude/omnigraph-aggregates-a53rG | 2026-04-13 10:26:07 +02:00 |
| helpers | recovery: align merge sidecar branch with active_branch + record rollback drift | 2026-05-05 19:33:32 +02:00 |
| aggregation.rs | Implement aggregate execution with wide-batch model | 2026-04-12 20:59:13 +00:00 |
| branching.rs | chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111) | 2026-05-23 00:42:29 +01:00 |
| changes.rs | Initial public Omnigraph repository | 2026-04-10 20:49:41 +03:00 |
| composite_flow.rs | tests: pin refresh() deadlock after schema_apply (red) | 2026-05-08 17:46:07 +02:00 |
| consistency.rs | engine: opt MergeInsertBuilder into FirstSeen for Lance dup-rowid bug (MR-957) (#109) | 2026-05-22 18:19:54 +01:00 |
| end_to_end.rs | exec/query: structured Expr pushdown via Scanner::filter_expr (unblocks CompOp::Contains) (#113) | 2026-05-23 12:47:33 +01:00 |
| export.rs | Initial public Omnigraph repository | 2026-04-10 20:49:41 +03:00 |
| failpoints.rs | chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111) | 2026-05-23 00:42:29 +01:00 |
| forbidden_apis.rs | recovery: rename composite test, strip ticket references, address review | 2026-05-03 13:56:36 +02:00 |
| lance_surface_guards.rs | chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111) | 2026-05-23 00:42:29 +01:00 |
| lance_version_columns.rs | Initial public Omnigraph repository | 2026-04-10 20:49:41 +03:00 |
| lifecycle.rs | Extract public-API tests from omnigraph.rs to integration tests | 2026-04-20 14:09:34 +03:00 |
| maintenance.rs | Strengthen cleanup-then-optimize sequencing test with postconditions | 2026-05-12 23:36:01 +03:00 |
| merge_truth_table.rs | MR-786: merge-pair truth table with exhaustive op-variant matrix (#81) | 2026-05-12 22:36:01 +03:00 |
| point_in_time.rs | Initial public Omnigraph repository | 2026-04-10 20:49:41 +03:00 |
| policy_engine_chassis.rs | tests: policy chassis e2e gap-fills (MR-722 follow-up) (#106) | 2026-05-18 22:25:04 +03:00 |
| recovery.rs | recovery: record RolledForward audit on stale-after-success sidecar | 2026-05-05 20:12:43 +02:00 |
| runs.rs | engine: opt MergeInsertBuilder into FirstSeen for Lance dup-rowid bug (MR-957) (#109) | 2026-05-22 18:19:54 +01:00 |
| s3_storage.rs | chore: scrub Linear ticket numbers and review-bot mentions from code comments | 2026-05-01 22:45:38 +02:00 |
| schema_apply.rs | schema-lint chassis v1.2: --allow-data-loss flag + Hard mode (MR-694) — completes v1 (#100) | 2026-05-16 22:12:46 +03:00 |
| search.rs | chore(lance): bump 4.0.0 → 6.0.1 (DataFusion 52→53, Arrow 57→58) (#111) | 2026-05-23 00:42:29 +01:00 |
| staged_writes.rs | tests: pin stable-row-id preservation across stage_overwrite | 2026-05-12 16:56:58 -07:00 |
| traversal.rs | Add comprehensive tests from morphological matrix analysis | 2026-04-13 15:31:08 +02:00 |
| validators.rs | Enforce schema validators on every write path (#59) | 2026-04-28 04:51:10 +03:00 |