We have only two direct DataFusion consumers; everything else is
transitive through Lance.
| Site | Role | State |
|---|---|---|
| `crates/omnigraph/src/exec/query.rs::build_lance_filter_expr` (and helpers `ir_filter_to_expr`, `ir_expr_to_expr`, `literal_to_expr`) | Lower typed IR filters to a DataFusion `Expr` and apply via `Scanner::filter_expr` | **Structured** (PR #113) |
| `crates/omnigraph/src/table_store.rs::scan_pending_batches` | Run SQL against an in-memory `MemTable` registered in a fresh `SessionContext` to filter the in-flight `MutationStaging.pending` batches | String SQL — small enough that the migration cost outweighs the benefit; out of scope |

We have **no custom `impl ExecutionPlan`**, no exhaustive `match` on
`ScalarValue`, no direct `tantivy`/`tokenizer` imports. Three classes of
| `nested_expressions` feature enabled | PR #113 | Made `datafusion::functions_nested::expr_fn::array_has` (and the rest of the nested-type expr-fns) reachable from our code. |
| `execute_node_scan` → `Scanner::filter_expr(Expr)` | PR #113 | Removed string-flattened pushdown from the bulk of the read path. **`CompOp::Contains` now pushes down** (via `array_has`); previously `ir_filter_to_sql` returned `None` for it, so the filter fell through to in-memory post-scan filtering. DF 53 optimizer rules now act on our predicates instead of being short-circuited by the string-SQL detour. |
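
The structured lowering described in the rows above can be sketched as follows. This is a minimal, self-contained illustration using stand-in types: `IrFilter`, `Literal`, the toy `Expr` enum, `lower_filter`, and `lower_filters` are all hypothetical and are not the real omnigraph IR or DataFusion types. The point is only the shape of the technique: each typed filter becomes a tree node (with `Contains` mapped to an `array_has`-style node) rather than a fragment of SQL text.

```rust
// Stand-in types only: none of these are the real omnigraph IR or DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int(i64),
    Str(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum CompOp {
    Eq,
    Gt,
    Contains,
}

#[derive(Debug, Clone, PartialEq)]
struct IrFilter {
    column: String,
    op: CompOp,
    value: Literal,
}

/// Toy expression tree, shaped loosely like a structured predicate.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(Literal),
    Binary { left: Box<Expr>, op: &'static str, right: Box<Expr> },
    ArrayHas { array: Box<Expr>, element: Box<Expr> },
    And(Box<Expr>, Box<Expr>),
}

fn binary(left: Expr, op: &'static str, right: Expr) -> Expr {
    Expr::Binary { left: Box::new(left), op, right: Box::new(right) }
}

/// Lower one typed filter to a tree node; no SQL string is built.
fn lower_filter(f: &IrFilter) -> Expr {
    let column = Expr::Column(f.column.clone());
    let value = Expr::Literal(f.value.clone());
    match f.op {
        CompOp::Eq => binary(column, "=", value),
        CompOp::Gt => binary(column, ">", value),
        // List membership gets its own node instead of falling back to `None`.
        CompOp::Contains => Expr::ArrayHas {
            array: Box::new(column),
            element: Box::new(value),
        },
    }
}

/// AND all lowered filters together; `None` means there is nothing to push down.
fn lower_filters(filters: &[IrFilter]) -> Option<Expr> {
    filters
        .iter()
        .map(lower_filter)
        .reduce(|acc, next| Expr::And(Box::new(acc), Box::new(next)))
}

fn main() {
    let filters = vec![
        IrFilter { column: "age".into(), op: CompOp::Gt, value: Literal::Int(30) },
        IrFilter { column: "tags".into(), op: CompOp::Contains, value: Literal::Str("admin".into()) },
    ];
    println!("{:#?}", lower_filters(&filters));
}
```

Because the result is a tree, an optimizer can inspect and rewrite it, which is what a string-SQL detour prevents.
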
## Passive wins active on DF 53
These activated automatically when PR #111 landed. They apply to any
predicate / plan that reaches DataFusion (now including our
`execute_node_scan` filters via the structured Expr path):
| DF PR | Win | Where it bites us |
|---|---|---|
| [#20528](https://github.com/apache/datafusion/pull/20528) | Vectorized `IN`-list eq kernel | `id IN (…)` predicates in cascade-delete (`exec/merge.rs:1016`) and the structured Expr path |
| [#20111](https://github.com/apache/datafusion/pull/20111) | `PhysicalExprSimplifier` constant-folds before exec | All predicates handed to Lance via `Scanner::filter_expr` |
| [#20097](https://github.com/apache/datafusion/pull/20097) | `CASE WHEN x THEN y ELSE NULL` shortcut | Any generated CASE expressions in our predicates |
| [#20228](https://github.com/apache/datafusion/pull/20228) | Push limit into hash join | Anti-join (`not { … }`) lowered to `JoinType::LeftAnti` with a query-level `LIMIT N` |
| [#19918](https://github.com/apache/datafusion/pull/19918) | `null_aware` flag on `HashJoinExec::try_new` | Correct `NOT IN` semantics when our anti-join involves nullable columns |
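
To make the `NOT IN` row above concrete, here is a small sketch of why nullable columns need a null-aware anti-join. It models standard SQL three-valued logic on plain `Option<i64>` values; it does not use or imitate DataFusion's `HashJoinExec`, and the function names (`not_in`, `plain_anti_join`, `null_aware_anti_join`) are hypothetical.

```rust
/// SQL three-valued result of `probe NOT IN (build...)`:
/// `Some(true)`, `Some(false)`, or `None` for UNKNOWN.
fn not_in(probe: Option<i64>, build: &[Option<i64>]) -> Option<bool> {
    // NOT IN over an empty set is TRUE, even for a NULL probe value.
    if build.is_empty() {
        return Some(true);
    }
    // A NULL probe compared against a non-empty set is UNKNOWN.
    let p = match probe {
        Some(p) => p,
        None => return None,
    };
    // A definite match makes the predicate FALSE.
    if build.iter().any(|b| *b == Some(p)) {
        return Some(false);
    }
    // No match, but a NULL on the build side leaves the answer UNKNOWN.
    if build.iter().any(|b| b.is_none()) {
        return None;
    }
    Some(true)
}

/// Plain anti-join on equality: keep rows with no equal match.
/// NULL keys never match, so NULL probe rows are always kept.
fn plain_anti_join(probe: &[Option<i64>], build: &[Option<i64>]) -> Vec<Option<i64>> {
    probe
        .iter()
        .copied()
        .filter(|p| match p {
            Some(v) => !build.contains(&Some(*v)),
            None => true,
        })
        .collect()
}

/// Null-aware anti-join: keep only rows where `NOT IN` is definitely TRUE.
fn null_aware_anti_join(probe: &[Option<i64>], build: &[Option<i64>]) -> Vec<Option<i64>> {
    probe
        .iter()
        .copied()
        .filter(|p| not_in(*p, build) == Some(true))
        .collect()
}

fn main() {
    let probe = [Some(1), Some(2), None];
    let build = [Some(1), None];
    println!("plain:      {:?}", plain_anti_join(&probe, &build));
    println!("null-aware: {:?}", null_aware_anti_join(&probe, &build));
}
```

With a `NULL` on the build side, the plain anti-join keeps `Some(2)` and `None`, while the null-aware version keeps nothing, which is the result `NOT IN` requires.
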
| **`hydrate_nodes` (Expand-time pushdown) → `Expr`** | Medium (~2 days) | The Expand pipeline (`exec/query.rs::hydrate_nodes`) still serializes through its `extra_filter_sql: Option<&str>` parameter. Migrating it means changing `TableStorage::scan_stream(filter: Option<&str>)` to take `Option<Expr>`, a signature change that cascades through 6+ call sites (including `scan_stream_with`, `count_rows`, `count_rows_with_staged`). This is the largest remaining tech-debt slice of the structured-Expr refactor. |
| **Mutation delete predicate → `Expr`** via `DeleteBuilder::execute_uncommitted` (Lance [#6658](https://github.com/lance-format/lance/issues/6658)) | Our 6.0.1 → 7.0.0 bump | **Upstream gate now satisfied:** the API shipped in `v7.0.0-beta.10` and is in Lance **7.0.0 stable** (2026-05-28). The only remaining gate is the repo's own Lance bump (still pinned 6.0.1). Couples with **MR-A** (delete two-phase migration — tracked at [issue #112](https://github.com/ModernRelay/omnigraph/issues/112)). The DF Expr move at this site is half the work; the rest is retiring the parse-time D₂ rule and extending recovery sidecar coverage. |
| **`DeleteBuilder::from_expr(...)`** (Lance #6343, v5.0) | Same | The structured Expr variant of the inline delete path. Useful only while the inline `delete_where` residual still exists; supplanted by the staged form above once MR-A lands. |
### Tier 3 — future-shape (require owning more of the planner)
| Item | DF PR | Notes |
|---|---|---|
| Extension planner for `TableScan` | [#20548](https://github.com/apache/datafusion/pull/20548) | Would let us plug a custom planner converting `IROp::NodeScan` directly into a DF logical plan, bypassing the Lance string-SQL detour entirely. Big change. Only worth it if we own more of the physical plan. |
| `ExpressionPlacement` enum | [#20065](https://github.com/apache/datafusion/pull/20065) | Optimizer-hint enum letting the planner decide whether an expression evaluates in scan / filter / projection. Relevant only if we own optimizer rules. We don't. |
| `ExtractLeafExpressions` optimizer rule for `get_field` pushdown | [#20117](https://github.com/apache/datafusion/pull/20117) | Applies automatically when we use struct projections in pushdown. We don't generate them today. |
| `feat: support Set Comparison Subquery` | [#19109](https://github.com/apache/datafusion/pull/19109) | New subquery shape. Not relevant today — we don't lower to subqueries. |