Convert the SPARQL algebra evaluator from eager list-based evaluation to
lazy async generators so results stream incrementally. This lets Slice
terminate early (via generator cleanup) and avoids materialising full
result sets for streamable operators like Project, Filter, Union, and
Extend. Blocking operators (Join, LeftJoin, OrderBy, Group) materialise
at their boundary then yield.
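
As a rough illustration of the streaming model (a minimal sketch, not the
evaluator's actual code: scan, filter_op, slice_op and collect are
hypothetical names, and solutions are plain dicts), a Slice built on async
generators stops pulling from its source once the limit is reached:

```python
import asyncio


async def scan(rows, log):
    """Hypothetical source operator; records every row it actually produces."""
    for row in rows:
        log.append(row)
        yield row


async def filter_op(source, pred):
    """Streamable operator: passes matching solutions through one at a time."""
    async for sol in source:
        if pred(sol):
            yield sol


async def slice_op(source, offset, limit):
    """Yield at most `limit` solutions after skipping `offset`, then stop."""
    skipped = emitted = 0
    try:
        if limit <= 0:
            return
        async for sol in source:
            if skipped < offset:
                skipped += 1
                continue
            yield sol
            emitted += 1
            if emitted >= limit:
                break  # early termination: nothing more is pulled upstream
    finally:
        # Close the upstream generator so its cleanup code runs promptly.
        await source.aclose()


async def collect(source):
    """Drain an async generator into a list."""
    return [sol async for sol in source]
```

With a 1000-row source, an even-number filter and LIMIT 3, only the first
five rows are ever produced by the scan in this sketch.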

Add bind join optimization for Join nodes where one side is small
(VALUES/ToMultiSet): instead of materialising both sides independently
and hash-joining, iterate the small side's bindings and evaluate the
large side with those bindings pre-seeded. This turns wildcard BGP
queries into selective ones — e.g. VALUES ?x { <uri> } joined with a
BGP now queries the triple store with ?x bound rather than fetching
all triples.
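
The bind join idea can be sketched as follows. This is an illustration
under assumptions: FakeStore, eval_pattern and bind_join are hypothetical
names, triples are plain tuples, and variables are strings starting with
"?". The real evaluator and TriplesClient will differ in detail.

```python
import asyncio


class FakeStore:
    """Hypothetical in-memory triple store; None acts as a wildcard."""

    def __init__(self, triples):
        self.triples = triples
        self.calls = []  # every (s, p, o) lookup, for inspection

    async def query(self, s=None, p=None, o=None):
        self.calls.append((s, p, o))
        for triple in self.triples:
            if all(w is None or w == g for w, g in zip((s, p, o), triple)):
                yield triple


async def eval_pattern(store, pattern, seed):
    """Evaluate one triple pattern with the bindings in `seed` pre-applied."""
    def resolve(term):
        # Bound variables become constants; unbound ones stay wildcards.
        return seed.get(term) if term.startswith("?") else term

    s, p, o = (resolve(term) for term in pattern)
    async for triple in store.query(s, p, o):
        sol = dict(seed)
        consistent = True
        for term, value in zip(pattern, triple):
            if term.startswith("?") and sol.setdefault(term, value) != value:
                consistent = False  # same variable matched two different values
                break
        if consistent:
            yield sol


async def bind_join(store, small_side, pattern):
    """Iterate the small side and evaluate the large side once per binding."""
    for seed in small_side:
        async for sol in eval_pattern(store, pattern, seed):
            yield sol
```

In this sketch, joining VALUES ?x { a } with the pattern (?x, knows, ?y)
issues a single lookup with the subject bound instead of a full scan.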

Add TriplesClient.query_gen() async generator that wraps the existing
streaming callback API via an asyncio.Queue bridge, yielding individual
Triple objects as batches arrive.
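
A minimal sketch of such a queue bridge, assuming the streaming API takes
an async per-batch callback (fake_query_stream is a hypothetical stand-in
for it, and this query_gen is a free function rather than the real
TriplesClient method):

```python
import asyncio

_DONE = object()  # sentinel marking the end of the stream


async def fake_query_stream(batches, on_batch):
    """Hypothetical callback-style API: invokes on_batch once per batch."""
    for batch in batches:
        await asyncio.sleep(0)  # simulate waiting for the next batch
        await on_batch(batch)


async def query_gen(stream_fn):
    """Bridge a callback-based stream into an async generator of triples."""
    queue = asyncio.Queue()

    async def on_batch(batch):
        await queue.put(batch)

    async def produce():
        try:
            await stream_fn(on_batch)
        except Exception as exc:
            await queue.put(exc)  # surface producer errors to the consumer
        else:
            await queue.put(_DONE)

    task = asyncio.create_task(produce())
    try:
        while True:
            item = await queue.get()
            if item is _DONE:
                break
            if isinstance(item, Exception):
                raise item
            for triple in item:
                yield triple  # one triple at a time, as batches arrive
    finally:
        # If the consumer stops early, cancel the producer and wait for it.
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
```

The finally block is what lets an early-terminating consumer (such as
Slice) stop the underlying stream instead of leaving it running.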

Add streaming request path in the SPARQL query service that batches
solutions from the live async generator and sends them as they fill.
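
The batching step might look roughly like this (a sketch only: batched and
numbers are hypothetical names, and the actual send call in the service is
not shown):

```python
import asyncio


async def batched(source, size):
    """Group items from an async generator into lists of up to `size`."""
    batch = []
    async for item in source:
        batch.append(item)
        if len(batch) >= size:
            yield batch  # a full batch can be sent immediately
            batch = []
    if batch:
        yield batch  # flush the final partial batch


async def numbers(n):
    """Hypothetical stand-in for the live solution generator."""
    for i in range(n):
        yield i
```

Each yielded list corresponds to one response message, so the first batch
can go out before the query has finished evaluating.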

Fix FILTER IN/NOT IN: rdflib represents these as RelationalExpression
nodes with op="IN", not as Builtin_IN — handle both representations.

Fix Builtin_IN/Builtin_NOTIN dispatch ordering so the specific handlers
are checked before the generic Builtin_ prefix match.
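
The two IN fixes above can be sketched together. This is illustrative
only: Node is a hypothetical stand-in for an rdflib expression node, and
the field names "expr" and "other" and the "NOT IN" op string are
assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an rdflib algebra expression node."""
    name: str
    fields: dict = field(default_factory=dict)


def eval_in(value, candidates, negate=False):
    """IN / NOT IN over already-evaluated operands."""
    found = value in candidates
    return not found if negate else found


def evaluate(node):
    f = node.fields
    # Representation 1: RelationalExpression carrying an IN / NOT IN op.
    if node.name == "RelationalExpression" and f.get("op") in ("IN", "NOT IN"):
        return eval_in(f["expr"], f["other"], negate=f["op"] == "NOT IN")
    # Representation 2: dedicated builtin nodes. These must be checked
    # before the generic prefix match below, or they are never reached.
    if node.name == "Builtin_IN":
        return eval_in(f["expr"], f["other"])
    if node.name == "Builtin_NOTIN":
        return eval_in(f["expr"], f["other"], negate=True)
    # Generic fallback for every other Builtin_* function.
    if node.name.startswith("Builtin_"):
        return "generic:" + node.name[len("Builtin_"):]
    raise ValueError(f"unsupported expression: {node.name}")
```

Moving the two specific checks below the startswith() branch would
reproduce the original bug in this sketch.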

Fix VALUES handling for rdflib's two representations: positional
(var/value) and dict-based (res).

Add 30+ SPARQL 1.1 built-in functions and the MINUS algebra operator to the
custom SPARQL query backend.
String functions:
- SUBSTR (2-arg and 3-arg forms), STRBEFORE, STRAFTER
- REPLACE (regex with flags), ENCODE_FOR_URI
Numeric functions:
- FLOOR, CEIL, ROUND, ABS
Date/time accessors:
- YEAR, MONTH, DAY, HOURS, MINUTES, SECONDS
- NOW, TZ
Hash functions:
- MD5, SHA1, SHA256, SHA512
Term constructors:
- IRI/URI, BNODE, UUID, STRUUID
Other functions:
- LANGMATCHES, RAND
- EXISTS / NOT EXISTS (with async pre-evaluation to bridge the
sync expression evaluator and async algebra evaluator)
Algebra:
- MINUS set-difference operator
- HAVING already works via rdflib's Filter mapping (verified)
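
For reference, MINUS semantics as defined by SPARQL 1.1 can be sketched
over plain dict solutions (compatible and minus are hypothetical helper
names, not necessarily those used in the backend). A left solution is
dropped only if some right solution shares at least one variable with it
and agrees on every shared variable:

```python
def compatible(a, b):
    """True if two solutions agree on every variable they share."""
    return all(b[var] == val for var, val in a.items() if var in b)


def minus(left, right):
    """SPARQL MINUS: keep left solutions not excluded by any right solution."""
    kept = []
    for sol in left:
        excluded = any(
            (sol.keys() & other.keys()) and compatible(sol, other)
            for other in right
        )
        if not excluded:
            kept.append(sol)
    return kept
```

Note that a right-hand solution with no variables in common never removes
anything, which is the main difference from FILTER NOT EXISTS.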

Fix SPARQL ORDER BY handling.

Includes 653 lines of new unit tests covering all added functionality
across expressions, solutions, and algebra layers.

Fix threading of workspace parameter:
- The SPARQL algebra evaluator was threading a workspace parameter
through every function and passing it to TriplesClient.query(),
which doesn't accept it. Workspace isolation is handled by pub/sub
topic routing — the TriplesClient is already scoped to a
workspace-specific flow, same as GraphRAG. Passing workspace
explicitly was both incorrect and unnecessary.

Update tests:
- tests/unit/test_query/test_sparql_algebra.py (new) — Tests
_query_pattern, _eval_bgp, and evaluate() with various algebra
nodes. Key tests assert workspace is never in tc.query() kwargs,
plus correctness tests for BGP, JOIN, UNION, SLICE, DISTINCT, and
edge cases.
- tests/unit/test_retrieval/test_graph_rag.py — Added
test_triples_query_never_passes_workspace (checks query()) and
test_follow_edges_never_passes_workspace (checks query_stream()).
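
The shape of the "never passes workspace" assertions is roughly as follows.
This is a sketch, not the real test code: query_pattern is a hypothetical
stand-in for the evaluator helper, and the s/p/o/limit keyword names are
assumptions about the client's signature.

```python
import asyncio
from unittest.mock import AsyncMock


async def query_pattern(tc, s=None, p=None, o=None, limit=100):
    """Hypothetical stand-in for the evaluator's _query_pattern helper."""
    return await tc.query(s=s, p=p, o=o, limit=limit)


def check_query_never_passes_workspace():
    """Return True if the mocked client was called without a workspace kwarg."""
    tc = AsyncMock()
    tc.query.return_value = []
    asyncio.run(query_pattern(tc, s="http://example.org/a"))
    tc.query.assert_awaited_once()
    # The mocked client records the exact keyword arguments it received.
    return "workspace" not in tc.query.call_args.kwargs
```

Asserting on the recorded kwargs, rather than on results, guards against
the parameter being reintroduced anywhere along the call path.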