omnigraph/crates
Ragnor Comerford d8e0bfeb22
Dedupe dst ids before hydrating nodes in execute_expand (#45)
The BFS in execute_expand emits one (src_idx, dst_id) pair per edge, so
dst_id_list contains heavy duplication when multi-hop traversals revisit
the same destination nodes. hydrate_nodes then built an
"id IN ('a', 'b', ...)" filter from the full list, passing it verbatim
to Lance. On a 30k-node Person graph, a 3-hop query produced a 15.4M-
entry IN-list against a 30k-row target — 512x more entries than unique
ids.

Deduplicate before the Lance scan; the post-hydrate alignment HashMap
already fans results back out to the original (src, dst) pairs, so
output is bit-identical.

Bench numbers (crates/omnigraph/examples/bench_expand.rs, min of 2-3
runs, release build):

  query         before     after    speedup
  1k   hop3     460 ms     28 ms     16x
  10k  hop2    4.21 s     188 ms     22x
  10k  hop3   40.59 s    1.30 s     31x
  30k  hop2   11.71 s    678 ms     17x
  30k  hop3  197.38 s    4.86 s     41x

All existing omnigraph-engine tests pass (72/72, 0 failures).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:56:18 +03:00
..
omnigraph Dedupe dst ids before hydrating nodes in execute_expand (#45) 2026-04-25 00:56:18 +03:00
omnigraph-cli Prepare v0.3.0 release (#44) 2026-04-21 19:11:34 +03:00
omnigraph-compiler Prepare v0.3.0 release (#44) 2026-04-21 19:11:34 +03:00
omnigraph-server Prepare v0.3.0 release (#44) 2026-04-21 19:11:34 +03:00