mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
SPARQL query service (#754)

SPARQL 1.1 query service wrapping the pub/sub triples interface. Adds a backend-agnostic SPARQL query service that parses SPARQL queries using rdflib, decomposes them into triple pattern lookups via the existing TriplesClient pub/sub interface, and performs in-memory joins, filters, and projections.

Includes:
- SPARQL parser, algebra evaluator, expression evaluator, solution sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND, VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates)
- FlowProcessor service with TriplesClientSpec
- Gateway dispatcher, request/response translators, API spec
- Python SDK method (FlowInstance.sparql_query)
- CLI command (tg-invoke-sparql-query)
- Tech spec (docs/tech-specs/sparql-query.md)
- New unit tests for SPARQL query
This commit is contained in:
parent
62c30a3a50
commit
d9dc4cbab5
23 changed files with 3498 additions and 3 deletions
4  Makefile

@@ -77,8 +77,8 @@ some-containers:
	-t ${CONTAINER_BASE}/trustgraph-base:${VERSION} .
	${DOCKER} build -f containers/Containerfile.flow \
	-t ${CONTAINER_BASE}/trustgraph-flow:${VERSION} .
	${DOCKER} build -f containers/Containerfile.unstructured \
	-t ${CONTAINER_BASE}/trustgraph-unstructured:${VERSION} .
#	${DOCKER} build -f containers/Containerfile.unstructured \
#	-t ${CONTAINER_BASE}/trustgraph-unstructured:${VERSION} .
#	${DOCKER} build -f containers/Containerfile.vertexai \
#	-t ${CONTAINER_BASE}/trustgraph-vertexai:${VERSION} .
#	${DOCKER} build -f containers/Containerfile.mcp \
268  docs/tech-specs/sparql-query.md  Normal file

@@ -0,0 +1,268 @@

# SPARQL Query Service Technical Specification

## Overview

A pub/sub-hosted SPARQL query service that accepts SPARQL queries, decomposes
them into triple pattern lookups via the existing triples query pub/sub
interface, performs in-memory joins/filters/projections, and returns SPARQL
result bindings.

This makes the triple store queryable using a standard graph query language
without coupling to any specific backend (Neo4j, Cassandra, FalkorDB, etc.).

## Goals

- **SPARQL 1.1 support**: SELECT, ASK, CONSTRUCT, DESCRIBE queries
- **Backend-agnostic**: query via the pub/sub triples interface, not direct
  database access
- **Standard service pattern**: FlowProcessor with ConsumerSpec/ProducerSpec,
  using TriplesClientSpec to call the triples query service
- **Correct SPARQL semantics**: proper BGP evaluation, joins, OPTIONAL, UNION,
  FILTER, BIND, aggregation, solution modifiers (ORDER BY, LIMIT, OFFSET,
  DISTINCT)

## Background

The triples query service provides a single-pattern lookup: given optional
(s, p, o) values, return matching triples. This is the equivalent of one
triple pattern in a SPARQL Basic Graph Pattern.

To evaluate a full SPARQL query, we need to:

1. Parse the SPARQL string into an algebra tree
2. Walk the algebra tree, issuing triple pattern lookups for each BGP pattern
3. Join results across patterns (nested-loop or hash join)
4. Apply filters, optionals, unions, and aggregations in-memory
5. Project and return the requested variables

rdflib (already a dependency) provides a SPARQL 1.1 parser and algebra
compiler. We use rdflib to parse queries into algebra trees, then evaluate
the algebra ourselves using the triples query client as the data source.
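The evaluation steps above can be sketched as a recursive dispatch over algebra nodes. This is an illustrative toy, not the service's actual code: a `Node` dataclass stands in for rdflib's `CompValue`, the `lookup` callable stands in for the pub/sub triples client, and the filter expression is a plain Python callable rather than an expression tree.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Toy stand-in for an rdflib algebra node (CompValue has .name plus keys).
    name: str
    args: dict = field(default_factory=dict)

def join(left, right):
    # Naive nested-loop join on shared variables (step 3 above).
    out = []
    for a in left:
        for b in right:
            if all(a[v] == b[v] for v in a.keys() & b.keys()):
                out.append({**a, **b})
    return out

def evaluate(node, lookup):
    """Recursively evaluate an algebra tree into a solution sequence."""
    if node.name == "BGP":
        # Step 2: one triple-pattern lookup per pattern, joined as we go.
        sols = [{}]
        for pattern in node.args["triples"]:
            sols = join(sols, lookup(pattern))
        return sols
    if node.name == "Filter":
        # Step 4: here the filter is a plain callable; the real evaluator
        # walks an expression tree instead.
        keep = node.args["expr"]
        return [s for s in evaluate(node.args["p"], lookup) if keep(s)]
    if node.name == "Union":
        return (evaluate(node.args["p1"], lookup)
                + evaluate(node.args["p2"], lookup))
    raise NotImplementedError(node.name)

# lookup: maps a triple pattern to its solution sequence (a stand-in for
# issuing a triples query over pub/sub).
data = {
    ("?s", "knows", "?o"): [{"?s": "alice", "?o": "bob"}],
    ("?s", "age", "?a"): [{"?s": "alice", "?a": 30}, {"?s": "bob", "?a": 25}],
}
bgp = Node("BGP", {"triples": [("?s", "knows", "?o"), ("?s", "age", "?a")]})
print(evaluate(bgp, data.get))   # [{'?s': 'alice', '?o': 'bob', '?a': 30}]
```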
## Technical Design

### Architecture

```
                         pub/sub
[Client] ──request──> [SPARQL Query Service] ──triples-request──> [Triples Query Service]
[Client] <─response── [SPARQL Query Service] <─triples-response── [Triples Query Service]
```

The service is a FlowProcessor that:
- Consumes SPARQL query requests
- Uses TriplesClientSpec to issue triple pattern lookups
- Evaluates the SPARQL algebra in-memory
- Produces result responses

### Components

1. **SPARQL Query Service (FlowProcessor)**
   - ConsumerSpec for incoming SPARQL requests
   - ProducerSpec for outgoing results
   - TriplesClientSpec for calling the triples query service
   - Delegates parsing and evaluation to the components below

   Module: `trustgraph-flow/trustgraph/query/sparql/service.py`

2. **SPARQL Parser (rdflib wrapper)**
   - Uses `rdflib.plugins.sparql.prepareQuery` / `parseQuery` and
     `rdflib.plugins.sparql.algebra.translateQuery` to produce an algebra tree
   - Extracts PREFIX declarations, query type (SELECT/ASK/CONSTRUCT/DESCRIBE),
     and the algebra root

   Module: `trustgraph-flow/trustgraph/query/sparql/parser.py`

3. **Algebra Evaluator**
   - Recursive evaluator over the rdflib algebra tree
   - Each algebra node type maps to an evaluation function
   - BGP nodes issue triple pattern queries via TriplesClient
   - Join/Filter/Optional/Union etc. operate on in-memory solution sequences

   Module: `trustgraph-flow/trustgraph/query/sparql/algebra.py`

4. **Solution Sequence**
   - A solution is a dict mapping variable names to Term values
   - Solution sequences are lists of solutions
   - Join: hash join on shared variables
   - LeftJoin (OPTIONAL): hash join preserving unmatched left rows
   - Union: concatenation
   - Filter: evaluate SPARQL expressions against each solution
   - Projection/Distinct/Order/Slice: standard post-processing

   Module: `trustgraph-flow/trustgraph/query/sparql/solutions.py`
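The join operations described above can be sketched in a few lines. This is a simplified illustration (solutions as plain dicts of strings, not Term values) of the semantics, not the `solutions.py` implementation itself:

```python
def hash_join(left, right):
    """Hash join two solution sequences (lists of var->value dicts) on
    their shared variables; no shared variables means cross product."""
    if not left or not right:
        return []
    shared = tuple(sorted(set().union(*map(set, left))
                          & set().union(*map(set, right))))
    index = {}
    for sol in left:
        index.setdefault(tuple(sol.get(v) for v in shared), []).append(sol)
    out = []
    for sol in right:
        for match in index.get(tuple(sol.get(v) for v in shared), []):
            out.append({**match, **sol})
    return out

def left_join(left, right):
    """LeftJoin (OPTIONAL): left rows with no matching right row survive
    unextended."""
    out = []
    for a in left:
        matches = [{**a, **b} for b in right
                   if all(a[v] == b[v] for v in a.keys() & b.keys())]
        out.extend(matches if matches else [a])
    return out

left = [{"s": "alice", "p": "knows"}, {"s": "bob", "p": "knows"}]
right = [{"s": "alice", "o": "carol"}]
print(hash_join(left, right))  # [{'s': 'alice', 'p': 'knows', 'o': 'carol'}]
print(left_join(left, right))
# [{'s': 'alice', 'p': 'knows', 'o': 'carol'}, {'s': 'bob', 'p': 'knows'}]
```

Union is list concatenation, so the only subtle operation is LeftJoin: the unmatched `bob` row survives, which is exactly what OPTIONAL requires.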
### Data Models

#### Request

```python
@dataclass
class SparqlQueryRequest:
    user: str = ""
    collection: str = ""
    query: str = ""          # SPARQL query string
    limit: int = 10000       # Safety limit on results
```

#### Response

```python
@dataclass
class SparqlQueryResponse:
    error: Error | None = None
    query_type: str = ""     # "select", "ask", "construct", "describe"

    # For SELECT queries
    variables: list[str] = field(default_factory=list)
    bindings: list[SparqlBinding] = field(default_factory=list)

    # For ASK queries
    ask_result: bool = False

    # For CONSTRUCT/DESCRIBE queries
    triples: list[Triple] = field(default_factory=list)

@dataclass
class SparqlBinding:
    values: list[Term | None] = field(default_factory=list)
```
### BGP Evaluation Strategy

For each triple pattern in a BGP:
- Extract bound terms (concrete IRIs/literals) and variables
- Call `TriplesClient.query_stream(s, p, o)` with bound terms, None for
  variables
- Map returned triples back to variable bindings

For multi-pattern BGPs, join solutions incrementally:
- Order patterns by selectivity (patterns with more bound terms first)
- For each subsequent pattern, substitute bound variables from the current
  solution sequence before querying
- This avoids full cross-products and reduces the number of triples queries
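A sketch of the ordering and substitution steps above. Variables are represented here as `?`-prefixed strings for illustration; the real service works with Term objects and the TriplesClient:

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def order_by_selectivity(patterns):
    """Patterns with more bound (concrete) terms are evaluated first."""
    return sorted(patterns, key=lambda p: sum(is_var(t) for t in p))

def to_lookup(pattern, solution):
    """Build the (s, p, o) lookup for one pattern under a partial solution:
    concrete terms pass through, already-bound variables are substituted,
    and still-free variables become None (wildcard)."""
    return tuple(solution.get(t) if is_var(t) else t for t in pattern)

patterns = [("?s", "?p", "?o"), ("?s", "rdf:type", "ex:Person")]
print(order_by_selectivity(patterns)[0])   # ('?s', 'rdf:type', 'ex:Person')
print(to_lookup(("?s", "ex:name", "?n"), {"?s": "ex:alice"}))
# ('ex:alice', 'ex:name', None)
```

After a batch of triples comes back, each result row is mapped to bindings for whatever variables were left as `None` in the lookup.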
### Streaming and Early Termination

The triples query service supports streaming responses (batched delivery via
`TriplesClient.query_stream`). The SPARQL evaluator should use streaming
from the start, not as an optimisation. This is important because:

- **Early termination**: when the SPARQL query has a LIMIT, or when only one
  solution is needed (ASK queries), we can stop consuming triples as soon as
  we have enough results. Without streaming, a wildcard pattern like
  `?s ?p ?o` would fetch the entire graph before we could apply the limit.
- **Memory efficiency**: results are processed batch-by-batch rather than
  materialising the full result set in memory before joining.

The batch callback in `query_stream` returns a boolean to signal completion.
The evaluator should signal completion (return True) as soon as sufficient
solutions have been produced, allowing the underlying pub/sub consumer to
stop pulling batches.
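The callback contract described above can be illustrated with a toy producer. The `stream` function here simulates the pub/sub consumer; only the shape of the callback (return True to stop) is taken from the spec:

```python
def collect_with_limit(limit):
    """Build a batch callback that signals completion once `limit` rows
    have arrived, mirroring the query_stream callback contract."""
    rows = []
    def on_batch(batch):
        rows.extend(batch)
        return len(rows) >= limit   # True => stop streaming
    return rows, on_batch

def stream(batches, on_batch):
    # Simulated producer: stops as soon as the callback returns True.
    for batch in batches:
        if on_batch(batch):
            break

rows, cb = collect_with_limit(3)
stream([[1, 2], [3, 4], [5, 6]], cb)
print(rows)   # [1, 2, 3, 4] -- the third batch is never fetched
```

Note that batching means slightly more than `limit` rows may arrive; the evaluator trims the excess when applying the LIMIT modifier.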
### Parallel BGP Execution (Phase 2 Optimisation)

Within a BGP, patterns that share variables benefit from sequential
evaluation with bound-variable substitution (query results from earlier
patterns narrow later queries). However, patterns with no shared variables
are independent and could be issued concurrently via `asyncio.gather`.

A practical approach for a future optimisation pass:
- Analyse BGP patterns and identify connected components (groups of
  patterns linked by shared variables)
- Execute independent components in parallel
- Within each component, evaluate patterns sequentially with substitution

This is not needed for correctness -- the sequential approach works for all
cases -- but could significantly reduce latency for queries with independent
pattern groups. Flagged as a phase 2 optimisation.
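The connected-component analysis above is a small union-find over patterns. A sketch (variables again as `?`-prefixed strings); each returned group would be a candidate for one `asyncio.gather` task:

```python
def connected_components(patterns):
    """Group triple patterns that share variables (union-find sketch)."""
    parent = list(range(len(patterns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def variables(p):
        return {t for t in p if isinstance(t, str) and t.startswith("?")}

    for i in range(len(patterns)):
        for j in range(i + 1, len(patterns)):
            if variables(patterns[i]) & variables(patterns[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(patterns):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())

patterns = [("?s", "p1", "?x"), ("?x", "p2", "?y"), ("?a", "p3", "?b")]
print(connected_components(patterns))
# two groups: the ?x-linked pair, and the independent ?a/?b pattern
```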
### FILTER Expression Evaluation

rdflib's algebra represents FILTER expressions as expression trees. We
evaluate these against each solution row, supporting:
- Comparison operators (=, !=, <, >, <=, >=)
- Logical operators (&&, ||, !)
- SPARQL built-in functions (isIRI, isLiteral, isBlank, str, lang,
  datatype, bound, regex, etc.)
- Arithmetic operators (+, -, *, /)
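One concrete piece of this: after an expression is evaluated, SPARQL's effective boolean value (EBV) rules decide whether the FILTER keeps the row. A sketch consistent with the behaviour the unit tests in this commit exercise, with plain Python values standing in for Term objects:

```python
def effective_boolean(value):
    """Effective boolean value of a FILTER expression result (sketch)."""
    if value is None:
        return False           # unbound variable / evaluation error
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0      # numeric zero is false
    if isinstance(value, str):
        return value != ""     # empty string is false
    return True                # other terms (e.g. IRIs) treated as true here

print([effective_boolean(v) for v in [None, 0, 42, "", "hi"]])
# [False, False, True, False, True]
```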
## Implementation Order

1. **Schema and service skeleton** -- define SparqlQueryRequest/Response
   dataclasses, create the FlowProcessor subclass with ConsumerSpec,
   ProducerSpec, and TriplesClientSpec wired up. Verify it starts and
   connects.

2. **SPARQL parsing** -- wrap rdflib's parser to produce algebra trees from
   SPARQL strings. Handle parse errors gracefully. Unit test with a range of
   query shapes.

3. **BGP evaluation** -- implement single-pattern and multi-pattern BGP
   evaluation using TriplesClient. This is the core building block. Test
   with simple SELECT WHERE { ?s ?p ?o } queries.

4. **Joins and solution sequences** -- implement hash join, left join (for
   OPTIONAL), and union. Test with multi-pattern queries.

5. **FILTER evaluation** -- implement the expression evaluator for FILTER
   clauses. Start with comparisons and logical operators, then add built-in
   functions incrementally.

6. **Solution modifiers** -- DISTINCT, ORDER BY, LIMIT, OFFSET, projection.

7. **ASK / CONSTRUCT / DESCRIBE** -- extend beyond SELECT. ASK is trivial
   (non-empty result = true). CONSTRUCT builds triples from a template.
   DESCRIBE fetches all triples for matched resources.

8. **Aggregation** -- GROUP BY, HAVING, COUNT, SUM, AVG, MIN, MAX,
   GROUP_CONCAT, SAMPLE.

9. **BIND, VALUES, subqueries** -- remaining SPARQL 1.1 features.

10. **API gateway integration** -- add SparqlQueryRequestor dispatcher,
    request/response translators, and API endpoint so that the SPARQL
    service is accessible via the HTTP gateway.

11. **SDK support** -- add `sparql_query()` method to FlowInstance in the
    Python API SDK, following the same pattern as `triples_query()`.

12. **CLI command** -- add a `tg-invoke-sparql-query` CLI command that takes
    a SPARQL query string (or reads from a file/stdin), submits it via the
    SDK, and prints results in a readable format (table for SELECT,
    true/false for ASK, Turtle for CONSTRUCT/DESCRIBE).
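Step 7's CONSTRUCT handling can be illustrated with a small template instantiation. This is a sketch only (plain tuples and `?`-strings; the real implementation builds Triple objects): each template triple is filled in from each solution, and triples left incomplete by unbound variables are dropped, as SPARQL requires.

```python
def instantiate(template, solutions):
    """CONSTRUCT: fill each template triple from each solution, skipping
    any triple left incomplete by an unbound variable."""
    def is_var(t):
        return isinstance(t, str) and t.startswith("?")
    out = []
    for sol in solutions:
        for s, p, o in template:
            triple = tuple(sol.get(t) if is_var(t) else t for t in (s, p, o))
            if None not in triple:
                out.append(triple)
    return out

template = [("?s", "ex:knows", "?o")]
solutions = [{"?s": "a", "?o": "b"}, {"?s": "c"}]   # second lacks ?o
print(instantiate(template, solutions))   # [('a', 'ex:knows', 'b')]
```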
## Performance Considerations

In-memory join over pub/sub round-trips will be slower than native SPARQL on
a graph database. Key mitigations:

- **Streaming with early termination**: use `query_stream` so that
  limit-bound queries don't fetch entire result sets. A `SELECT ... LIMIT 1`
  against a wildcard pattern fetches one batch, not the whole graph.
- **Bound-variable substitution**: when evaluating BGP patterns sequentially,
  substitute known bindings into subsequent patterns to issue narrow queries
  rather than broad ones followed by in-memory filtering.
- **Parallel independent patterns** (phase 2): patterns with no shared
  variables can be issued concurrently.
- **Query complexity limits**: may need a cap on the number of triple pattern
  queries issued per SPARQL query to prevent runaway evaluation.
### Named Graph Mapping

SPARQL's `GRAPH ?g { ... }` and `GRAPH <uri> { ... }` clauses map to the
triples query service's graph filter parameter:

- `GRAPH <uri> { ?s ?p ?o }` — pass `g=uri` to the triples query
- Patterns outside any GRAPH clause — pass `g=""` (default graph only)
- `GRAPH ?g { ?s ?p ?o }` — pass `g="*"` (all graphs), then bind `?g` from
  the returned triple's graph field

The triples query interface does not support a wildcard graph natively in
the SPARQL sense, but `g="*"` (all graphs) combined with client-side
filtering on the returned graph values achieves the same effect.
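The three-way mapping above is a small dispatch. A sketch of it (the `g` value convention is taken from this section; the graph term representation as a `?`-string or plain IRI string is illustrative):

```python
def graph_param(graph_term):
    """Map a GRAPH clause to the triples query graph filter value."""
    if graph_term is None:
        return ""         # no GRAPH clause: default graph only
    if isinstance(graph_term, str) and graph_term.startswith("?"):
        return "*"        # GRAPH ?g: all graphs; bind ?g from results
    return graph_term     # GRAPH <uri>: that named graph

print(graph_param(None))                    # ""
print(graph_param("?g"))                    # "*"
print(graph_param("http://example.com/g"))  # "http://example.com/g"
```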
## Open Questions

- **SPARQL 1.2**: rdflib's parser support for SPARQL 1.2 features is still an
  open question (property paths are already in 1.1; 1.2 adds lateral joins,
  ADJUST, etc.). Start with 1.1 and extend as rdflib support matures.
145  specs/api/paths/flow/sparql-query.yaml  Normal file

@@ -0,0 +1,145 @@
post:
  tags:
    - Flow Services
  summary: SPARQL query - execute SPARQL 1.1 queries against the knowledge graph
  description: |
    Execute a SPARQL 1.1 query against the knowledge graph.

    ## Supported Query Types

    - **SELECT**: Returns variable bindings as a table of results
    - **ASK**: Returns true/false for existence checks
    - **CONSTRUCT**: Returns a set of triples built from a template
    - **DESCRIBE**: Returns triples describing matched resources

    ## SPARQL Features

    Supports standard SPARQL 1.1 features including:
    - Basic Graph Patterns (BGPs) with triple pattern matching
    - OPTIONAL, UNION, FILTER
    - BIND, VALUES
    - ORDER BY, LIMIT, OFFSET, DISTINCT
    - GROUP BY with aggregates (COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT)
    - Built-in functions (isIRI, STR, REGEX, CONTAINS, etc.)

    ## Query Examples

    Find all entities of a type:
    ```sparql
    SELECT ?s ?label WHERE {
      ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.com/Person> .
      ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }
    LIMIT 10
    ```

    Check if an entity exists:
    ```sparql
    ASK { <http://example.com/alice> ?p ?o }
    ```
  operationId: sparqlQueryService
  security:
    - bearerAuth: []
  parameters:
    - name: flow
      in: path
      required: true
      schema:
        type: string
      description: Flow instance ID
      example: my-flow
  requestBody:
    required: true
    content:
      application/json:
        schema:
          type: object
          required:
            - query
          properties:
            query:
              type: string
              description: SPARQL 1.1 query string
            user:
              type: string
              default: trustgraph
              description: User/keyspace identifier
            collection:
              type: string
              default: default
              description: Collection identifier
            limit:
              type: integer
              default: 10000
              description: Safety limit on number of results
        examples:
          selectQuery:
            summary: SELECT query
            value:
              query: "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
              user: trustgraph
              collection: default
          askQuery:
            summary: ASK query
            value:
              query: "ASK { <http://example.com/alice> ?p ?o }"
  responses:
    '200':
      description: Successful response
      content:
        application/json:
          schema:
            type: object
            properties:
              query-type:
                type: string
                enum: [select, ask, construct, describe]
              variables:
                type: array
                items:
                  type: string
                description: Variable names (SELECT only)
              bindings:
                type: array
                items:
                  type: object
                  properties:
                    values:
                      type: array
                      items:
                        $ref: '../../components/schemas/common/RdfValue.yaml'
                description: Result rows (SELECT only)
              ask-result:
                type: boolean
                description: Boolean result (ASK only)
              triples:
                type: array
                description: Result triples (CONSTRUCT/DESCRIBE only)
              error:
                type: object
                properties:
                  type:
                    type: string
                  message:
                    type: string
          examples:
            selectResult:
              summary: SELECT result
              value:
                query-type: select
                variables: [s, p, o]
                bindings:
                  - values:
                      - {t: i, i: "http://example.com/alice"}
                      - {t: i, i: "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
                      - {t: i, i: "http://example.com/Person"}
            askResult:
              summary: ASK result
              value:
                query-type: ask
                ask-result: true
    '401':
      $ref: '../../components/responses/Unauthorized.yaml'
    '500':
      $ref: '../../components/responses/Error.yaml'
424  tests/unit/test_query/test_sparql_expressions.py  Normal file

@@ -0,0 +1,424 @@
"""
|
||||
Tests for SPARQL FILTER expression evaluator.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.schema import Term, IRI, LITERAL, BLANK
|
||||
from trustgraph.query.sparql.expressions import (
|
||||
evaluate_expression, _effective_boolean, _to_string, _to_numeric,
|
||||
_comparable_value,
|
||||
)
|
||||
|
||||
|
||||
# --- Helpers ---
|
||||
|
||||
def iri(v):
|
||||
return Term(type=IRI, iri=v)
|
||||
|
||||
def lit(v, datatype="", language=""):
|
||||
return Term(type=LITERAL, value=v, datatype=datatype, language=language)
|
||||
|
||||
def blank(v):
|
||||
return Term(type=BLANK, id=v)
|
||||
|
||||
XSD = "http://www.w3.org/2001/XMLSchema#"
|
||||
|
||||
|
||||
class TestEvaluateExpression:
|
||||
"""Test expression evaluation with rdflib algebra nodes."""
|
||||
|
||||
def test_variable_bound(self):
|
||||
from rdflib.term import Variable
|
||||
result = evaluate_expression(Variable("x"), {"x": lit("hello")})
|
||||
assert result.value == "hello"
|
||||
|
||||
def test_variable_unbound(self):
|
||||
from rdflib.term import Variable
|
||||
result = evaluate_expression(Variable("x"), {})
|
||||
assert result is None
|
||||
|
||||
def test_uriref_constant(self):
|
||||
from rdflib import URIRef
|
||||
result = evaluate_expression(
|
||||
URIRef("http://example.com/a"), {}
|
||||
)
|
||||
assert result.type == IRI
|
||||
assert result.iri == "http://example.com/a"
|
||||
|
||||
def test_literal_constant(self):
|
||||
from rdflib import Literal
|
||||
result = evaluate_expression(Literal("hello"), {})
|
||||
assert result.type == LITERAL
|
||||
assert result.value == "hello"
|
||||
|
||||
def test_boolean_constant(self):
|
||||
assert evaluate_expression(True, {}) is True
|
||||
assert evaluate_expression(False, {}) is False
|
||||
|
||||
def test_numeric_constant(self):
|
||||
assert evaluate_expression(42, {}) == 42
|
||||
assert evaluate_expression(3.14, {}) == 3.14
|
||||
|
||||
def test_none_returns_true(self):
|
||||
assert evaluate_expression(None, {}) is True
|
||||
|
||||
|
||||
class TestRelationalExpressions:
|
||||
"""Test comparison operators via CompValue nodes."""
|
||||
|
||||
def _make_relational(self, left, op, right):
|
||||
from rdflib.plugins.sparql.parserutils import CompValue
|
||||
return CompValue("RelationalExpression",
|
||||
expr=left, op=op, other=right)
|
||||
|
||||
def test_equal_literals(self):
|
||||
from rdflib import Literal
|
||||
expr = self._make_relational(Literal("a"), "=", Literal("a"))
|
||||
assert evaluate_expression(expr, {}) is True
|
||||
|
||||
def test_not_equal_literals(self):
|
||||
from rdflib import Literal
|
||||
expr = self._make_relational(Literal("a"), "!=", Literal("b"))
|
||||
assert evaluate_expression(expr, {}) is True
|
||||
|
||||
def test_less_than(self):
|
||||
from rdflib import Literal
|
||||
expr = self._make_relational(Literal("a"), "<", Literal("b"))
|
||||
assert evaluate_expression(expr, {}) is True
|
||||
|
||||
def test_greater_than(self):
|
||||
from rdflib import Literal
|
||||
expr = self._make_relational(Literal("b"), ">", Literal("a"))
|
||||
assert evaluate_expression(expr, {}) is True
|
||||
|
||||
def test_equal_with_variables(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_relational(Variable("x"), "=", Variable("y"))
|
||||
sol = {"x": lit("same"), "y": lit("same")}
|
||||
assert evaluate_expression(expr, sol) is True
|
||||
|
||||
def test_unequal_with_variables(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_relational(Variable("x"), "=", Variable("y"))
|
||||
sol = {"x": lit("one"), "y": lit("two")}
|
||||
assert evaluate_expression(expr, sol) is False
|
||||
|
||||
def test_none_operand_returns_false(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_relational(Variable("x"), "=", Literal("a"))
|
||||
assert evaluate_expression(expr, {}) is False
|
||||
|
||||
|
||||
class TestLogicalExpressions:
|
||||
|
||||
def _make_and(self, exprs):
|
||||
from rdflib.plugins.sparql.parserutils import CompValue
|
||||
return CompValue("ConditionalAndExpression",
|
||||
expr=exprs[0], other=exprs[1:])
|
||||
|
||||
def _make_or(self, exprs):
|
||||
from rdflib.plugins.sparql.parserutils import CompValue
|
||||
return CompValue("ConditionalOrExpression",
|
||||
expr=exprs[0], other=exprs[1:])
|
||||
|
||||
def _make_not(self, expr):
|
||||
from rdflib.plugins.sparql.parserutils import CompValue
|
||||
return CompValue("UnaryNot", expr=expr)
|
||||
|
||||
def test_and_true_true(self):
|
||||
result = evaluate_expression(self._make_and([True, True]), {})
|
||||
assert result is True
|
||||
|
||||
def test_and_true_false(self):
|
||||
result = evaluate_expression(self._make_and([True, False]), {})
|
||||
assert result is False
|
||||
|
||||
def test_or_false_true(self):
|
||||
result = evaluate_expression(self._make_or([False, True]), {})
|
||||
assert result is True
|
||||
|
||||
def test_or_false_false(self):
|
||||
result = evaluate_expression(self._make_or([False, False]), {})
|
||||
assert result is False
|
||||
|
||||
def test_not_true(self):
|
||||
result = evaluate_expression(self._make_not(True), {})
|
||||
assert result is False
|
||||
|
||||
def test_not_false(self):
|
||||
result = evaluate_expression(self._make_not(False), {})
|
||||
assert result is True
|
||||
|
||||
|
||||
class TestBuiltinFunctions:
|
||||
|
||||
def _make_builtin(self, name, **kwargs):
|
||||
from rdflib.plugins.sparql.parserutils import CompValue
|
||||
return CompValue(f"Builtin_{name}", **kwargs)
|
||||
|
||||
def test_bound_true(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("BOUND", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {"x": lit("hi")}) is True
|
||||
|
||||
def test_bound_false(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("BOUND", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {}) is False
|
||||
|
||||
def test_isiri_true(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("isIRI", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {"x": iri("http://x")}) is True
|
||||
|
||||
def test_isiri_false(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("isIRI", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is False
|
||||
|
||||
def test_isliteral_true(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("isLITERAL", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is True
|
||||
|
||||
def test_isliteral_false(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("isLITERAL", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {"x": iri("http://x")}) is False
|
||||
|
||||
def test_isblank_true(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("isBLANK", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {"x": blank("b1")}) is True
|
||||
|
||||
def test_isblank_false(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("isBLANK", arg=Variable("x"))
|
||||
assert evaluate_expression(expr, {"x": iri("http://x")}) is False
|
||||
|
||||
def test_str(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("STR", arg=Variable("x"))
|
||||
result = evaluate_expression(expr, {"x": iri("http://example.com/a")})
|
||||
assert result.type == LITERAL
|
||||
assert result.value == "http://example.com/a"
|
||||
|
||||
def test_lang(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("LANG", arg=Variable("x"))
|
||||
result = evaluate_expression(
|
||||
expr, {"x": lit("hello", language="en")}
|
||||
)
|
||||
assert result.value == "en"
|
||||
|
||||
def test_lang_no_tag(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("LANG", arg=Variable("x"))
|
||||
result = evaluate_expression(expr, {"x": lit("hello")})
|
||||
assert result.value == ""
|
||||
|
||||
def test_datatype(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("DATATYPE", arg=Variable("x"))
|
||||
result = evaluate_expression(
|
||||
expr, {"x": lit("42", datatype=XSD + "integer")}
|
||||
)
|
||||
assert result.type == IRI
|
||||
assert result.iri == XSD + "integer"
|
||||
|
||||
def test_strlen(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("STRLEN", arg=Variable("x"))
|
||||
result = evaluate_expression(expr, {"x": lit("hello")})
|
||||
assert result == 5
|
||||
|
||||
def test_ucase(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("UCASE", arg=Variable("x"))
|
||||
result = evaluate_expression(expr, {"x": lit("hello")})
|
||||
assert result.value == "HELLO"
|
||||
|
||||
def test_lcase(self):
|
||||
from rdflib.term import Variable
|
||||
expr = self._make_builtin("LCASE", arg=Variable("x"))
|
||||
result = evaluate_expression(expr, {"x": lit("HELLO")})
|
||||
assert result.value == "hello"
|
||||
|
||||
def test_contains_true(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_builtin("CONTAINS",
|
||||
arg1=Variable("x"), arg2=Literal("ell"))
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is True
|
||||
|
||||
def test_contains_false(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_builtin("CONTAINS",
|
||||
arg1=Variable("x"), arg2=Literal("xyz"))
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is False
|
||||
|
||||
def test_strstarts_true(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_builtin("STRSTARTS",
|
||||
arg1=Variable("x"), arg2=Literal("hel"))
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is True
|
||||
|
||||
def test_strends_true(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_builtin("STRENDS",
|
||||
arg1=Variable("x"), arg2=Literal("llo"))
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is True
|
||||
|
||||
def test_regex_match(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_builtin("REGEX",
|
||||
text=Variable("x"),
|
||||
pattern=Literal("^hel"),
|
||||
flags=None)
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is True
|
||||
|
||||
def test_regex_case_insensitive(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_builtin("REGEX",
|
||||
text=Variable("x"),
|
||||
pattern=Literal("HELLO"),
|
||||
flags=Literal("i"))
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is True
|
||||
|
||||
def test_regex_no_match(self):
|
||||
from rdflib.term import Variable
|
||||
from rdflib import Literal
|
||||
expr = self._make_builtin("REGEX",
|
||||
text=Variable("x"),
|
||||
pattern=Literal("^world"),
|
||||
flags=None)
|
||||
assert evaluate_expression(expr, {"x": lit("hello")}) is False
|
||||
|
||||
|
||||
class TestEffectiveBoolean:
|
||||
|
||||
def test_true(self):
|
||||
assert _effective_boolean(True) is True
|
||||
|
||||
def test_false(self):
|
||||
assert _effective_boolean(False) is False
|
||||
|
||||
def test_none(self):
|
||||
assert _effective_boolean(None) is False
|
||||
|
||||
def test_nonzero_int(self):
|
||||
assert _effective_boolean(42) is True
|
    def test_zero_int(self):
        assert _effective_boolean(0) is False

    def test_nonempty_string(self):
        assert _effective_boolean("hello") is True

    def test_empty_string(self):
        assert _effective_boolean("") is False

    def test_iri_term(self):
        assert _effective_boolean(iri("http://x")) is True

    def test_nonempty_literal(self):
        assert _effective_boolean(lit("hello")) is True

    def test_empty_literal(self):
        assert _effective_boolean(lit("")) is False

    def test_boolean_literal_true(self):
        assert _effective_boolean(
            lit("true", datatype=XSD + "boolean")
        ) is True

    def test_boolean_literal_false(self):
        assert _effective_boolean(
            lit("false", datatype=XSD + "boolean")
        ) is False

    def test_numeric_literal_nonzero(self):
        assert _effective_boolean(
            lit("42", datatype=XSD + "integer")
        ) is True

    def test_numeric_literal_zero(self):
        assert _effective_boolean(
            lit("0", datatype=XSD + "integer")
        ) is False


class TestToString:

    def test_none(self):
        assert _to_string(None) == ""

    def test_string(self):
        assert _to_string("hello") == "hello"

    def test_iri_term(self):
        assert _to_string(iri("http://example.com")) == "http://example.com"

    def test_literal_term(self):
        assert _to_string(lit("hello")) == "hello"

    def test_blank_term(self):
        assert _to_string(blank("b1")) == "b1"


class TestToNumeric:

    def test_none(self):
        assert _to_numeric(None) is None

    def test_int(self):
        assert _to_numeric(42) == 42

    def test_float(self):
        assert _to_numeric(3.14) == 3.14

    def test_integer_literal(self):
        assert _to_numeric(lit("42")) == 42

    def test_decimal_literal(self):
        assert _to_numeric(lit("3.14")) == 3.14

    def test_non_numeric_literal(self):
        assert _to_numeric(lit("hello")) is None

    def test_numeric_string(self):
        assert _to_numeric("42") == 42

    def test_non_numeric_string(self):
        assert _to_numeric("abc") is None


class TestComparableValue:

    def test_none(self):
        assert _comparable_value(None) == (0, "")

    def test_int(self):
        assert _comparable_value(42) == (2, 42)

    def test_iri(self):
        assert _comparable_value(iri("http://x")) == (4, "http://x")

    def test_literal(self):
        assert _comparable_value(lit("hello")) == (3, "hello")

    def test_numeric_literal(self):
        assert _comparable_value(lit("42")) == (2, 42)

    def test_ordering(self):
        vals = [lit("b"), lit("a"), lit("c")]
        sorted_vals = sorted(vals, key=_comparable_value)
        assert sorted_vals[0].value == "a"
        assert sorted_vals[1].value == "b"
        assert sorted_vals[2].value == "c"
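The effective-boolean-value behaviour these tests exercise follows the SPARQL 1.1 EBV rules (section 17.2.2 of the spec): booleans and numerics coerce by value, strings by non-emptiness, and typed literals by their datatype. A minimal standalone sketch, assuming a simple `Lit` stand-in for the literal type; the names here are illustrative, not the TrustGraph implementation:

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass
class Lit:
    """Illustrative stand-in for an RDF literal term."""
    value: str
    datatype: str = ""

def effective_boolean(v):
    """SPARQL 1.1 effective boolean value, sketched for plain
    Python values and Lit literals."""
    if v is None:
        return False
    if isinstance(v, bool):
        return v
    if isinstance(v, (int, float)):
        return v != 0
    if isinstance(v, str):
        return v != ""
    if isinstance(v, Lit):
        if v.datatype == XSD + "boolean":
            return v.value == "true"
        if v.datatype.startswith(XSD) and v.datatype != XSD + "string":
            # Numeric datatypes: zero (or unparseable) is false
            try:
                return float(v.value) != 0
            except ValueError:
                return False
        # Plain/string literals: non-empty is true
        return v.value != ""
    return True  # IRIs and blank nodes count as true
```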
205  tests/unit/test_query/test_sparql_parser.py  Normal file

@@ -0,0 +1,205 @@
"""
|
||||
Tests for the SPARQL parser module.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.query.sparql.parser import (
|
||||
parse_sparql, ParseError, rdflib_term_to_term, term_to_rdflib,
|
||||
)
|
||||
from trustgraph.schema import Term, IRI, LITERAL, BLANK
|
||||
|
||||
|
||||
class TestParseSparql:
|
||||
"""Tests for parse_sparql function."""
|
||||
|
||||
def test_select_query_type(self):
|
||||
parsed = parse_sparql("SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
|
||||
assert parsed.query_type == "select"
|
||||
|
||||
def test_select_variables(self):
|
||||
parsed = parse_sparql("SELECT ?s ?p ?o WHERE { ?s ?p ?o }")
|
||||
assert parsed.variables == ["s", "p", "o"]
|
||||
|
||||
def test_select_subset_variables(self):
|
||||
parsed = parse_sparql("SELECT ?s ?o WHERE { ?s ?p ?o }")
|
||||
assert parsed.variables == ["s", "o"]
|
||||
|
||||
def test_ask_query_type(self):
|
||||
parsed = parse_sparql(
|
||||
"ASK { <http://example.com/a> ?p ?o }"
|
||||
)
|
||||
assert parsed.query_type == "ask"
|
||||
|
||||
def test_ask_no_variables(self):
|
||||
parsed = parse_sparql(
|
||||
"ASK { <http://example.com/a> ?p ?o }"
|
||||
)
|
||||
assert parsed.variables == []
|
||||
|
||||
def test_construct_query_type(self):
|
||||
parsed = parse_sparql(
|
||||
"CONSTRUCT { ?s <http://example.com/knows> ?o } "
|
||||
"WHERE { ?s <http://example.com/friendOf> ?o }"
|
||||
)
|
||||
assert parsed.query_type == "construct"
|
||||
|
||||
def test_describe_query_type(self):
|
||||
parsed = parse_sparql(
|
||||
"DESCRIBE <http://example.com/alice>"
|
||||
)
|
||||
assert parsed.query_type == "describe"
|
||||
|
||||
def test_select_with_limit(self):
|
||||
parsed = parse_sparql(
|
||||
"SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
assert parsed.variables == ["s"]
|
||||
|
||||
def test_select_with_distinct(self):
|
||||
parsed = parse_sparql(
|
||||
"SELECT DISTINCT ?s WHERE { ?s ?p ?o }"
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
assert parsed.variables == ["s"]
|
||||
|
||||
def test_select_with_filter(self):
|
||||
parsed = parse_sparql(
|
||||
'SELECT ?s ?label WHERE { '
|
||||
' ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label . '
|
||||
' FILTER(CONTAINS(STR(?label), "test")) '
|
||||
'}'
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
assert parsed.variables == ["s", "label"]
|
||||
|
||||
def test_select_with_optional(self):
|
||||
parsed = parse_sparql(
|
||||
"SELECT ?s ?p ?o ?label WHERE { "
|
||||
" ?s ?p ?o . "
|
||||
" OPTIONAL { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } "
|
||||
"}"
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
assert set(parsed.variables) == {"s", "p", "o", "label"}
|
||||
|
||||
def test_select_with_union(self):
|
||||
parsed = parse_sparql(
|
||||
"SELECT ?s ?label WHERE { "
|
||||
" { ?s <http://example.com/name> ?label } "
|
||||
" UNION "
|
||||
" { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } "
|
||||
"}"
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
|
||||
def test_select_with_order_by(self):
|
||||
parsed = parse_sparql(
|
||||
"SELECT ?s ?label WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } "
|
||||
"ORDER BY ?label"
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
|
||||
def test_select_with_group_by(self):
|
||||
parsed = parse_sparql(
|
||||
"SELECT ?p (COUNT(?o) AS ?count) WHERE { ?s ?p ?o } "
|
||||
"GROUP BY ?p ORDER BY DESC(?count)"
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
|
||||
def test_select_with_prefixes(self):
|
||||
parsed = parse_sparql(
|
||||
"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
|
||||
"SELECT ?s ?label WHERE { ?s rdfs:label ?label }"
|
||||
)
|
||||
assert parsed.query_type == "select"
|
||||
assert parsed.variables == ["s", "label"]
|
||||
|
||||
def test_algebra_not_none(self):
|
||||
parsed = parse_sparql("SELECT ?s WHERE { ?s ?p ?o }")
|
||||
assert parsed.algebra is not None
|
||||
|
||||
def test_parse_error_invalid_sparql(self):
|
||||
with pytest.raises(ParseError):
|
||||
parse_sparql("NOT VALID SPARQL AT ALL")
|
||||
|
||||
def test_parse_error_incomplete_query(self):
|
||||
with pytest.raises(ParseError):
|
||||
parse_sparql("SELECT ?s WHERE {")
|
||||
|
||||
def test_parse_error_message(self):
|
||||
with pytest.raises(ParseError, match="SPARQL parse error"):
|
||||
parse_sparql("GIBBERISH")
|
||||
|
||||
|
||||
class TestRdflibTermToTerm:
|
||||
"""Tests for rdflib-to-Term conversion."""
|
||||
|
||||
def test_uriref_to_term(self):
|
||||
from rdflib import URIRef
|
||||
term = rdflib_term_to_term(URIRef("http://example.com/alice"))
|
||||
assert term.type == IRI
|
||||
assert term.iri == "http://example.com/alice"
|
||||
|
||||
def test_literal_to_term(self):
|
||||
from rdflib import Literal
|
||||
term = rdflib_term_to_term(Literal("hello"))
|
||||
assert term.type == LITERAL
|
||||
assert term.value == "hello"
|
||||
|
||||
def test_typed_literal_to_term(self):
|
||||
from rdflib import Literal, URIRef
|
||||
term = rdflib_term_to_term(
|
||||
Literal("42", datatype=URIRef("http://www.w3.org/2001/XMLSchema#integer"))
|
||||
)
|
||||
assert term.type == LITERAL
|
||||
assert term.value == "42"
|
||||
assert term.datatype == "http://www.w3.org/2001/XMLSchema#integer"
|
||||
|
||||
def test_lang_literal_to_term(self):
|
||||
from rdflib import Literal
|
||||
term = rdflib_term_to_term(Literal("hello", lang="en"))
|
||||
assert term.type == LITERAL
|
||||
assert term.value == "hello"
|
||||
assert term.language == "en"
|
||||
|
||||
def test_bnode_to_term(self):
|
||||
from rdflib import BNode
|
||||
term = rdflib_term_to_term(BNode("b1"))
|
||||
assert term.type == BLANK
|
||||
assert term.id == "b1"
|
||||
|
||||
|
||||
class TestTermToRdflib:
|
||||
"""Tests for Term-to-rdflib conversion."""
|
||||
|
||||
def test_iri_term_to_uriref(self):
|
||||
from rdflib import URIRef
|
||||
result = term_to_rdflib(Term(type=IRI, iri="http://example.com/x"))
|
||||
assert isinstance(result, URIRef)
|
||||
assert str(result) == "http://example.com/x"
|
||||
|
||||
def test_literal_term_to_literal(self):
|
||||
from rdflib import Literal
|
||||
result = term_to_rdflib(Term(type=LITERAL, value="hello"))
|
||||
assert isinstance(result, Literal)
|
||||
assert str(result) == "hello"
|
||||
|
||||
def test_typed_literal_roundtrip(self):
|
||||
from rdflib import URIRef
|
||||
original = Term(
|
||||
type=LITERAL, value="42",
|
||||
datatype="http://www.w3.org/2001/XMLSchema#integer"
|
||||
)
|
||||
rdflib_term = term_to_rdflib(original)
|
||||
assert rdflib_term.datatype == URIRef("http://www.w3.org/2001/XMLSchema#integer")
|
||||
|
||||
def test_lang_literal_roundtrip(self):
|
||||
original = Term(type=LITERAL, value="bonjour", language="fr")
|
||||
rdflib_term = term_to_rdflib(original)
|
||||
assert rdflib_term.language == "fr"
|
||||
|
||||
def test_blank_term_to_bnode(self):
|
||||
from rdflib import BNode
|
||||
result = term_to_rdflib(Term(type=BLANK, id="b1"))
|
||||
assert isinstance(result, BNode)
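The roundtrips above convert between rdflib node classes and the wire-level `Term` discriminated by a type tag. A self-contained sketch of that term model and its N-Triples-style rendering; the `Term` dataclass and the `"iri"`/`"literal"`/`"blank"` tag values are illustrative stand-ins for `trustgraph.schema.Term` and its `IRI`/`LITERAL`/`BLANK` constants, whose real values this diff does not show:

```python
from dataclasses import dataclass

# Assumed tag values, mirroring the IRI/LITERAL/BLANK constants
IRI, LITERAL, BLANK = "iri", "literal", "blank"

@dataclass
class Term:
    """Stand-in for the schema-level RDF term."""
    type: str
    iri: str = ""
    id: str = ""
    value: str = ""
    datatype: str = ""
    language: str = ""

def n3(term):
    """Render a Term in N-Triples style: <iri>, _:blank,
    "literal" with optional @lang or ^^<datatype>."""
    if term.type == IRI:
        return f"<{term.iri}>"
    if term.type == BLANK:
        return f"_:{term.id}"
    if term.language:
        return f'"{term.value}"@{term.language}'
    if term.datatype:
        return f'"{term.value}"^^<{term.datatype}>'
    return f'"{term.value}"'
```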
345  tests/unit/test_query/test_sparql_solutions.py  Normal file

@@ -0,0 +1,345 @@
"""
|
||||
Tests for SPARQL solution sequence operations.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
from trustgraph.schema import Term, IRI, LITERAL
|
||||
from trustgraph.query.sparql.solutions import (
|
||||
hash_join, left_join, union, project, distinct,
|
||||
order_by, slice_solutions, _terms_equal, _compatible,
|
||||
)
|
||||
|
||||
|
||||
# --- Test helpers ---
|
||||
|
||||
def iri(v):
|
||||
return Term(type=IRI, iri=v)
|
||||
|
||||
def lit(v):
|
||||
return Term(type=LITERAL, value=v)
|
||||
|
||||
|
||||
# --- Fixtures ---
|
||||
|
||||
@pytest.fixture
|
||||
def alice():
|
||||
return iri("http://example.com/alice")
|
||||
|
||||
@pytest.fixture
|
||||
def bob():
|
||||
return iri("http://example.com/bob")
|
||||
|
||||
@pytest.fixture
|
||||
def carol():
|
||||
return iri("http://example.com/carol")
|
||||
|
||||
@pytest.fixture
|
||||
def knows():
|
||||
return iri("http://example.com/knows")
|
||||
|
||||
@pytest.fixture
|
||||
def name_alice():
|
||||
return lit("Alice")
|
||||
|
||||
@pytest.fixture
|
||||
def name_bob():
|
||||
return lit("Bob")
|
||||
|
||||
|
||||
class TestTermsEqual:
|
||||
|
||||
def test_equal_iris(self):
|
||||
assert _terms_equal(iri("http://x.com/a"), iri("http://x.com/a"))
|
||||
|
||||
def test_unequal_iris(self):
|
||||
assert not _terms_equal(iri("http://x.com/a"), iri("http://x.com/b"))
|
||||
|
||||
def test_equal_literals(self):
|
||||
assert _terms_equal(lit("hello"), lit("hello"))
|
||||
|
||||
def test_unequal_literals(self):
|
||||
assert not _terms_equal(lit("hello"), lit("world"))
|
||||
|
||||
def test_iri_vs_literal(self):
|
||||
assert not _terms_equal(iri("hello"), lit("hello"))
|
||||
|
||||
def test_none_none(self):
|
||||
assert _terms_equal(None, None)
|
||||
|
||||
def test_none_vs_term(self):
|
||||
assert not _terms_equal(None, iri("http://x.com/a"))
|
||||
|
||||
|
||||
class TestCompatible:
|
||||
|
||||
def test_no_shared_variables(self):
|
||||
assert _compatible({"a": iri("http://x")}, {"b": iri("http://y")})
|
||||
|
||||
def test_shared_variable_same_value(self, alice):
|
||||
assert _compatible({"s": alice, "x": lit("1")}, {"s": alice, "y": lit("2")})
|
||||
|
||||
def test_shared_variable_different_value(self, alice, bob):
|
||||
assert not _compatible({"s": alice}, {"s": bob})
|
||||
|
||||
def test_empty_solutions(self):
|
||||
assert _compatible({}, {})
|
||||
|
||||
def test_empty_vs_nonempty(self, alice):
|
||||
assert _compatible({}, {"s": alice})
|
||||
|
||||
|
||||
class TestHashJoin:
|
||||
|
||||
def test_join_on_shared_variable(self, alice, bob, name_alice, name_bob):
|
||||
left = [
|
||||
{"s": alice, "p": iri("http://example.com/knows"), "o": bob},
|
||||
{"s": bob, "p": iri("http://example.com/knows"), "o": alice},
|
||||
]
|
||||
right = [
|
||||
{"s": alice, "label": name_alice},
|
||||
{"s": bob, "label": name_bob},
|
||||
]
|
||||
result = hash_join(left, right)
|
||||
assert len(result) == 2
|
||||
# Check that joined solutions have all variables
|
||||
for sol in result:
|
||||
assert "s" in sol
|
||||
assert "p" in sol
|
||||
assert "o" in sol
|
||||
assert "label" in sol
|
||||
|
||||
def test_join_no_shared_variables_cross_product(self, alice, bob):
|
||||
left = [{"a": alice}]
|
||||
right = [{"b": bob}, {"b": alice}]
|
||||
result = hash_join(left, right)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_join_no_matches(self, alice, bob):
|
||||
left = [{"s": alice}]
|
||||
right = [{"s": bob}]
|
||||
result = hash_join(left, right)
|
||||
assert len(result) == 0
|
||||
|
||||
def test_join_empty_left(self, alice):
|
||||
result = hash_join([], [{"s": alice}])
|
||||
assert len(result) == 0
|
||||
|
||||
def test_join_empty_right(self, alice):
|
||||
result = hash_join([{"s": alice}], [])
|
||||
assert len(result) == 0
|
||||
|
||||
def test_join_multiple_matches(self, alice, name_alice):
|
||||
left = [
|
||||
{"s": alice, "p": iri("http://e.com/a")},
|
||||
{"s": alice, "p": iri("http://e.com/b")},
|
||||
]
|
||||
right = [{"s": alice, "label": name_alice}]
|
||||
result = hash_join(left, right)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_join_preserves_values(self, alice, name_alice):
|
||||
left = [{"s": alice, "x": lit("1")}]
|
||||
right = [{"s": alice, "y": lit("2")}]
|
||||
result = hash_join(left, right)
|
||||
assert len(result) == 1
|
||||
assert result[0]["x"].value == "1"
|
||||
assert result[0]["y"].value == "2"
|
||||
|
||||
|
||||
class TestLeftJoin:
|
||||
|
||||
def test_left_join_with_matches(self, alice, bob, name_alice):
|
||||
left = [{"s": alice}, {"s": bob}]
|
||||
right = [{"s": alice, "label": name_alice}]
|
||||
result = left_join(left, right)
|
||||
assert len(result) == 2
|
||||
# Alice has label
|
||||
alice_sols = [s for s in result if s["s"].iri == "http://example.com/alice"]
|
||||
assert len(alice_sols) == 1
|
||||
assert "label" in alice_sols[0]
|
||||
# Bob preserved without label
|
||||
bob_sols = [s for s in result if s["s"].iri == "http://example.com/bob"]
|
||||
assert len(bob_sols) == 1
|
||||
assert "label" not in bob_sols[0]
|
||||
|
||||
def test_left_join_no_matches(self, alice, bob):
|
||||
left = [{"s": alice}]
|
||||
right = [{"s": bob, "label": lit("Bob")}]
|
||||
result = left_join(left, right)
|
||||
assert len(result) == 1
|
||||
assert result[0]["s"].iri == "http://example.com/alice"
|
||||
assert "label" not in result[0]
|
||||
|
||||
def test_left_join_empty_right(self, alice):
|
||||
left = [{"s": alice}]
|
||||
result = left_join(left, [])
|
||||
assert len(result) == 1
|
||||
|
||||
def test_left_join_empty_left(self):
|
||||
result = left_join([], [{"s": iri("http://x")}])
|
||||
assert len(result) == 0
|
||||
|
||||
def test_left_join_with_filter(self, alice, bob):
|
||||
left = [{"s": alice}, {"s": bob}]
|
||||
right = [
|
||||
{"s": alice, "val": lit("yes")},
|
||||
{"s": bob, "val": lit("no")},
|
||||
]
|
||||
# Filter: only keep joins where val == "yes"
|
||||
result = left_join(
|
||||
left, right,
|
||||
filter_fn=lambda sol: sol.get("val") and sol["val"].value == "yes"
|
||||
)
|
||||
assert len(result) == 2
|
||||
# Alice matches filter
|
||||
alice_sols = [s for s in result if s["s"].iri == "http://example.com/alice"]
|
||||
assert "val" in alice_sols[0]
|
||||
assert alice_sols[0]["val"].value == "yes"
|
||||
# Bob doesn't match filter, preserved without val
|
||||
bob_sols = [s for s in result if s["s"].iri == "http://example.com/bob"]
|
||||
assert "val" not in bob_sols[0]
|
||||
|
||||
|
||||
class TestUnion:
|
||||
|
||||
def test_union_concatenates(self, alice, bob):
|
||||
left = [{"s": alice}]
|
||||
right = [{"s": bob}]
|
||||
result = union(left, right)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_union_preserves_order(self, alice, bob):
|
||||
left = [{"s": alice}]
|
||||
right = [{"s": bob}]
|
||||
result = union(left, right)
|
||||
assert result[0]["s"].iri == "http://example.com/alice"
|
||||
assert result[1]["s"].iri == "http://example.com/bob"
|
||||
|
||||
def test_union_empty_left(self, alice):
|
||||
result = union([], [{"s": alice}])
|
||||
assert len(result) == 1
|
||||
|
||||
def test_union_both_empty(self):
|
||||
result = union([], [])
|
||||
assert len(result) == 0
|
||||
|
||||
def test_union_allows_duplicates(self, alice):
|
||||
result = union([{"s": alice}], [{"s": alice}])
|
||||
assert len(result) == 2
|
||||
|
||||
|
||||
class TestProject:
|
||||
|
||||
def test_project_keeps_selected(self, alice, name_alice):
|
||||
solutions = [{"s": alice, "label": name_alice, "extra": lit("x")}]
|
||||
result = project(solutions, ["s", "label"])
|
||||
assert len(result) == 1
|
||||
assert "s" in result[0]
|
||||
assert "label" in result[0]
|
||||
assert "extra" not in result[0]
|
||||
|
||||
def test_project_missing_variable(self, alice):
|
||||
solutions = [{"s": alice}]
|
||||
result = project(solutions, ["s", "missing"])
|
||||
assert len(result) == 1
|
||||
assert "s" in result[0]
|
||||
assert "missing" not in result[0]
|
||||
|
||||
def test_project_empty(self):
|
||||
result = project([], ["s"])
|
||||
assert len(result) == 0
|
||||
|
||||
|
||||
class TestDistinct:
|
||||
|
||||
def test_removes_duplicates(self, alice):
|
||||
solutions = [{"s": alice}, {"s": alice}, {"s": alice}]
|
||||
result = distinct(solutions)
|
||||
assert len(result) == 1
|
||||
|
||||
def test_keeps_different(self, alice, bob):
|
||||
solutions = [{"s": alice}, {"s": bob}]
|
||||
result = distinct(solutions)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_empty(self):
|
||||
result = distinct([])
|
||||
assert len(result) == 0
|
||||
|
||||
def test_multi_variable_distinct(self, alice, bob):
|
||||
solutions = [
|
||||
{"s": alice, "o": bob},
|
||||
{"s": alice, "o": bob},
|
||||
{"s": alice, "o": alice},
|
||||
]
|
||||
result = distinct(solutions)
|
||||
assert len(result) == 2
|
||||
|
||||
|
||||
class TestOrderBy:
|
||||
|
||||
def test_order_by_ascending(self):
|
||||
solutions = [
|
||||
{"label": lit("Charlie")},
|
||||
{"label": lit("Alice")},
|
||||
{"label": lit("Bob")},
|
||||
]
|
||||
key_fns = [(lambda sol: sol.get("label"), True)]
|
||||
result = order_by(solutions, key_fns)
|
||||
assert result[0]["label"].value == "Alice"
|
||||
assert result[1]["label"].value == "Bob"
|
||||
assert result[2]["label"].value == "Charlie"
|
||||
|
||||
def test_order_by_descending(self):
|
||||
solutions = [
|
||||
{"label": lit("Alice")},
|
||||
{"label": lit("Charlie")},
|
||||
{"label": lit("Bob")},
|
||||
]
|
||||
key_fns = [(lambda sol: sol.get("label"), False)]
|
||||
result = order_by(solutions, key_fns)
|
||||
assert result[0]["label"].value == "Charlie"
|
||||
assert result[1]["label"].value == "Bob"
|
||||
assert result[2]["label"].value == "Alice"
|
||||
|
||||
def test_order_by_empty(self):
|
||||
result = order_by([], [(lambda sol: sol.get("x"), True)])
|
||||
assert len(result) == 0
|
||||
|
||||
def test_order_by_no_keys(self, alice):
|
||||
solutions = [{"s": alice}]
|
||||
result = order_by(solutions, [])
|
||||
assert len(result) == 1
|
||||
|
||||
|
||||
class TestSlice:
|
||||
|
||||
def test_limit(self, alice, bob, carol):
|
||||
solutions = [{"s": alice}, {"s": bob}, {"s": carol}]
|
||||
result = slice_solutions(solutions, limit=2)
|
||||
assert len(result) == 2
|
||||
|
||||
def test_offset(self, alice, bob, carol):
|
||||
solutions = [{"s": alice}, {"s": bob}, {"s": carol}]
|
||||
result = slice_solutions(solutions, offset=1)
|
||||
assert len(result) == 2
|
||||
assert result[0]["s"].iri == "http://example.com/bob"
|
||||
|
||||
def test_offset_and_limit(self, alice, bob, carol):
|
||||
solutions = [{"s": alice}, {"s": bob}, {"s": carol}]
|
||||
result = slice_solutions(solutions, offset=1, limit=1)
|
||||
assert len(result) == 1
|
||||
assert result[0]["s"].iri == "http://example.com/bob"
|
||||
|
||||
def test_limit_zero(self, alice):
|
||||
result = slice_solutions([{"s": alice}], limit=0)
|
||||
assert len(result) == 0
|
||||
|
||||
def test_offset_beyond_length(self, alice):
|
||||
result = slice_solutions([{"s": alice}], offset=10)
|
||||
assert len(result) == 0
|
||||
|
||||
def test_no_slice(self, alice, bob):
|
||||
solutions = [{"s": alice}, {"s": bob}]
|
||||
result = slice_solutions(solutions)
|
||||
assert len(result) == 2
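The join semantics these tests pin down can be sketched as a classic hash join over solution mappings: bucket the right-hand sequence by the values of the shared variables, then probe with each left-hand solution. A minimal sketch, not the TrustGraph implementation; for brevity it derives the shared variables from the first solution on each side, so it assumes each sequence binds a uniform variable set:

```python
def hash_join(left, right):
    """Hash join of two solution sequences (lists of var->value dicts)
    on the variables they share. No shared variables degenerates to a
    cross product; incompatible bindings are dropped."""
    if not left or not right:
        return []
    shared = tuple(sorted(set(left[0]) & set(right[0])))
    # Build phase: bucket right solutions by their shared-variable values
    table = {}
    for r in right:
        table.setdefault(tuple(r[v] for v in shared), []).append(r)
    # Probe phase: merge each left solution with compatible right ones
    out = []
    for l in left:
        for r in table.get(tuple(l[v] for v in shared), []):
            out.append({**l, **r})
    return out
```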
@@ -1122,6 +1122,45 @@ class FlowInstance:
        return result

    def sparql_query(
        self, query, user="trustgraph", collection="default",
        limit=10000
    ):
        """
        Execute a SPARQL query against the knowledge graph.

        Args:
            query: SPARQL 1.1 query string
            user: User/keyspace identifier (default: "trustgraph")
            collection: Collection identifier (default: "default")
            limit: Safety limit on results (default: 10000)

        Returns:
            dict with query results. Structure depends on query type:
            - SELECT: {"query-type": "select", "variables": [...], "bindings": [...]}
            - ASK: {"query-type": "ask", "ask-result": bool}
            - CONSTRUCT/DESCRIBE: {"query-type": "construct", "triples": [...]}

        Raises:
            ProtocolException: If an error occurs
        """

        input = {
            "query": query,
            "user": user,
            "collection": collection,
            "limit": limit,
        }

        response = self.request("service/sparql", input)

        if "error" in response and response["error"]:
            error_type = response["error"].get("type", "unknown")
            error_message = response["error"].get("message", "Unknown error")
            raise ProtocolException(f"{error_type}: {error_message}")

        return response

    def nlp_query(self, question, max_results=100):
        """
        Convert a natural language question to a GraphQL query.
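The documented SELECT return shape pairs `variables` positionally with each binding's ordered `values`. A sketch of consuming such a response; the response dict here is hypothetical (hand-written, not fetched), using the compact term keys (`"t"`, `"i"`, `"v"`) that the gateway wire format and CLI in this change decode:

```python
# Hypothetical SELECT response in the documented shape
response = {
    "query-type": "select",
    "variables": ["s", "label"],
    "bindings": [
        {"values": [{"t": "i", "i": "http://example.com/alice"},
                    {"t": "l", "v": "Alice"}]},
    ],
}

def rows(response):
    """Turn positional bindings into per-row var->term dicts by
    zipping each binding's values against the variables list."""
    names = response["variables"]
    return [dict(zip(names, b["values"])) for b in response["bindings"]]
```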
@@ -27,6 +27,7 @@ from .translators.nlp_query import QuestionToStructuredQueryRequestTranslator, Q
from .translators.structured_query import StructuredQueryRequestTranslator, StructuredQueryResponseTranslator
from .translators.diagnosis import StructuredDataDiagnosisRequestTranslator, StructuredDataDiagnosisResponseTranslator
from .translators.collection import CollectionManagementRequestTranslator, CollectionManagementResponseTranslator
from .translators.sparql_query import SparqlQueryRequestTranslator, SparqlQueryResponseTranslator

# Register all service translators
TranslatorRegistry.register_service(
@@ -149,6 +150,12 @@ TranslatorRegistry.register_service(
    CollectionManagementResponseTranslator()
)

TranslatorRegistry.register_service(
    "sparql-query",
    SparqlQueryRequestTranslator(),
    SparqlQueryResponseTranslator()
)

# Register single-direction translators for document loading
TranslatorRegistry.register_request("document", DocumentTranslator())
TranslatorRegistry.register_request("text-document", TextDocumentTranslator())
111  trustgraph-base/trustgraph/messaging/translators/sparql_query.py  Normal file

@@ -0,0 +1,111 @@
from typing import Dict, Any, Tuple
from ...schema import (
    SparqlQueryRequest, SparqlQueryResponse, SparqlBinding,
    Error, Term, Triple, IRI, LITERAL, BLANK,
)
from .base import MessageTranslator
from .primitives import TermTranslator, TripleTranslator


class SparqlQueryRequestTranslator(MessageTranslator):
    """Translator for SparqlQueryRequest schema objects."""

    def decode(self, data: Dict[str, Any]) -> SparqlQueryRequest:
        return SparqlQueryRequest(
            user=data.get("user", "trustgraph"),
            collection=data.get("collection", "default"),
            query=data.get("query", ""),
            limit=int(data.get("limit", 10000)),
        )

    def encode(self, obj: SparqlQueryRequest) -> Dict[str, Any]:
        return {
            "user": obj.user,
            "collection": obj.collection,
            "query": obj.query,
            "limit": obj.limit,
        }


class SparqlQueryResponseTranslator(MessageTranslator):
    """Translator for SparqlQueryResponse schema objects."""

    def __init__(self):
        self.term_translator = TermTranslator()
        self.triple_translator = TripleTranslator()

    def decode(self, data: Dict[str, Any]) -> SparqlQueryResponse:
        raise NotImplementedError(
            "Response translation to schema not typically needed"
        )

    def _encode_term(self, v):
        """Encode a Term, handling both Term objects and dicts from
        pub/sub deserialization."""
        if v is None:
            return None
        if isinstance(v, dict):
            # Reconstruct Term from dict (pub/sub deserializes nested
            # dataclasses as dicts)
            term = Term(
                type=v.get("type", ""),
                iri=v.get("iri", ""),
                id=v.get("id", ""),
                value=v.get("value", ""),
                datatype=v.get("datatype", ""),
                language=v.get("language", ""),
            )
            return self.term_translator.encode(term)
        return self.term_translator.encode(v)

    def _encode_error(self, error):
        """Encode an Error, handling both Error objects and dicts."""
        if isinstance(error, dict):
            return {
                "type": error.get("type", ""),
                "message": error.get("message", ""),
            }
        return {
            "type": error.type,
            "message": error.message,
        }

    def encode(self, obj: SparqlQueryResponse) -> Dict[str, Any]:
        result = {
            "query-type": obj.query_type,
        }

        if obj.error:
            result["error"] = self._encode_error(obj.error)

        if obj.query_type == "select":
            result["variables"] = obj.variables
            bindings = []
            for binding in obj.bindings:
                # binding may be a SparqlBinding or a dict
                if isinstance(binding, dict):
                    values = binding.get("values", [])
                else:
                    values = binding.values
                bindings.append({
                    "values": [
                        self._encode_term(v) for v in values
                    ]
                })
            result["bindings"] = bindings

        elif obj.query_type == "ask":
            result["ask-result"] = obj.ask_result

        elif obj.query_type in ("construct", "describe"):
            result["triples"] = [
                self.triple_translator.encode(t)
                for t in obj.triples
            ]

        return result

    def encode_with_completion(
        self, obj: SparqlQueryResponse
    ) -> Tuple[Dict[str, Any], bool]:
        return self.encode(obj), True
@@ -14,3 +14,4 @@ from .diagnosis import *
from .collection import *
from .storage import *
from .tool_service import *
from .sparql_query import *
40  trustgraph-base/trustgraph/schema/services/sparql_query.py  Normal file

@@ -0,0 +1,40 @@
from dataclasses import dataclass, field

from ..core.primitives import Error, Term, Triple
from ..core.topic import queue

############################################################################

# SPARQL query

@dataclass
class SparqlBinding:
    """A single row of SPARQL SELECT results.
    Values are ordered to match the variables list in SparqlQueryResponse.
    """
    values: list[Term | None] = field(default_factory=list)

@dataclass
class SparqlQueryRequest:
    user: str = ""
    collection: str = ""
    query: str = ""  # SPARQL query string
    limit: int = 10000  # Safety limit on results

@dataclass
class SparqlQueryResponse:
    error: Error | None = None
    query_type: str = ""  # "select", "ask", "construct", "describe"

    # For SELECT queries
    variables: list[str] = field(default_factory=list)
    bindings: list[SparqlBinding] = field(default_factory=list)

    # For ASK queries
    ask_result: bool = False

    # For CONSTRUCT/DESCRIBE queries
    triples: list[Triple] = field(default_factory=list)

sparql_query_request_queue = queue('sparql-query', cls='request')
sparql_query_response_queue = queue('sparql-query', cls='response')
@@ -51,6 +51,7 @@ tg-invoke-document-embeddings = "trustgraph.cli.invoke_document_embeddings:main"
tg-invoke-mcp-tool = "trustgraph.cli.invoke_mcp_tool:main"
tg-invoke-nlp-query = "trustgraph.cli.invoke_nlp_query:main"
tg-invoke-rows-query = "trustgraph.cli.invoke_rows_query:main"
tg-invoke-sparql-query = "trustgraph.cli.invoke_sparql_query:main"
tg-invoke-row-embeddings = "trustgraph.cli.invoke_row_embeddings:main"
tg-invoke-prompt = "trustgraph.cli.invoke_prompt:main"
tg-invoke-structured-query = "trustgraph.cli.invoke_structured_query:main"
230  trustgraph-cli/trustgraph/cli/invoke_sparql_query.py  Normal file

@@ -0,0 +1,230 @@
"""
|
||||
Execute a SPARQL query against the TrustGraph knowledge graph.
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import os
|
||||
import json
|
||||
import sys
|
||||
from trustgraph.api import Api
|
||||
|
||||
default_url = os.getenv("TRUSTGRAPH_URL", 'http://localhost:8088/')
|
||||
default_user = 'trustgraph'
|
||||
default_collection = 'default'
|
||||
|
||||
|
||||
def format_select(response, output_format):
|
||||
"""Format SELECT query results."""
|
||||
variables = response.get("variables", [])
|
||||
bindings = response.get("bindings", [])
|
||||
|
||||
if not bindings:
|
||||
return "No results."
|
||||
|
||||
if output_format == "json":
|
||||
rows = []
|
||||
for binding in bindings:
|
||||
row = {}
|
||||
for var, val in zip(variables, binding.get("values", [])):
|
||||
if val is None:
|
||||
row[var] = None
|
||||
elif val.get("t") == "i":
|
||||
row[var] = val.get("i", "")
|
||||
elif val.get("t") == "l":
|
||||
row[var] = val.get("v", "")
|
||||
else:
|
||||
row[var] = val.get("v", val.get("i", ""))
|
||||
rows.append(row)
|
||||
return json.dumps(rows, indent=2)
|
||||
|
||||
# Table format
|
||||
col_widths = [len(v) for v in variables]
|
||||
rows = []
|
||||
for binding in bindings:
|
||||
row = []
|
||||
for i, val in enumerate(binding.get("values", [])):
|
||||
if val is None:
|
||||
cell = ""
|
||||
elif val.get("t") == "i":
|
||||
cell = val.get("i", "")
|
||||
elif val.get("t") == "l":
|
||||
cell = val.get("v", "")
|
||||
else:
|
||||
cell = val.get("v", val.get("i", ""))
|
||||
row.append(cell)
|
||||
if i < len(col_widths):
|
||||
col_widths[i] = max(col_widths[i], len(cell))
|
||||
rows.append(row)
|
||||
|
||||
# Build table
|
||||
header = " | ".join(
|
||||
v.ljust(col_widths[i]) for i, v in enumerate(variables)
|
||||
)
|
||||
separator = "-+-".join("-" * w for w in col_widths)
|
||||
lines = [header, separator]
|
||||
for row in rows:
|
||||
line = " | ".join(
|
||||
cell.ljust(col_widths[i]) if i < len(col_widths) else cell
|
||||
for i, cell in enumerate(row)
|
||||
)
|
||||
lines.append(line)
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def format_triples(response, output_format):
|
||||
"""Format CONSTRUCT/DESCRIBE results."""
|
||||
triples = response.get("triples", [])
|
||||
|
||||
if not triples:
|
||||
return "No triples."
|
||||
|
||||
if output_format == "json":
|
||||
return json.dumps(triples, indent=2)
|
||||
|
||||
lines = []
|
||||
for t in triples:
|
||||
s = _term_str(t.get("s"))
|
||||
p = _term_str(t.get("p"))
|
||||
o = _term_str(t.get("o"))
|
||||
lines.append(f"{s} {p} {o} .")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def _term_str(val):
|
||||
"""Convert a wire-format term to a display string."""
|
||||
if val is None:
|
||||
return "?"
|
||||
t = val.get("t", "")
|
||||
if t == "i":
|
||||
return f"<{val.get('i', '')}>"
|
||||
elif t == "l":
|
||||
v = val.get("v", "")
|
||||
dt = val.get("d", "")
|
||||
lang = val.get("l", "")
|
||||
if lang:
|
||||
return f'"{v}"@{lang}'
|
||||
elif dt:
|
||||
return f'"{v}"^^<{dt}>'
|
||||
return f'"{v}"'
|
||||
return str(val)
|
||||
|
||||
|
||||
def sparql_query(url, token, flow_id, query, user, collection, limit,
                 output_format):

    api = Api(url=url, token=token).flow().id(flow_id)

    resp = api.sparql_query(
        query=query,
        user=user,
        collection=collection,
        limit=limit,
    )

    query_type = resp.get("query-type", "select")

    if query_type == "select":
        print(format_select(resp, output_format))
    elif query_type == "ask":
        print("true" if resp.get("ask-result") else "false")
    elif query_type in ("construct", "describe"):
        print(format_triples(resp, output_format))
    else:
        print(json.dumps(resp, indent=2))


def main():

    parser = argparse.ArgumentParser(
        prog='tg-invoke-sparql-query',
        description=__doc__,
    )

    parser.add_argument(
        '-u', '--url',
        default=default_url,
        help=f'API URL (default: {default_url})',
    )

    parser.add_argument(
        '-t', '--token',
        default=os.getenv("TRUSTGRAPH_TOKEN"),
        help='API bearer token (default: TRUSTGRAPH_TOKEN env var)',
    )

    parser.add_argument(
        '-f', '--flow-id',
        default="default",
        help='Flow ID (default: default)',
    )

    parser.add_argument(
        '-q', '--query',
        help='SPARQL query string',
    )

    parser.add_argument(
        '-i', '--input',
        help='Read SPARQL query from file (use - for stdin)',
    )

    parser.add_argument(
        '-U', '--user',
        default=default_user,
        help=f'User ID (default: {default_user})',
    )

    parser.add_argument(
        '-C', '--collection',
        default=default_collection,
        help=f'Collection ID (default: {default_collection})',
    )

    parser.add_argument(
        '-l', '--limit',
        type=int,
        default=10000,
        help='Result limit (default: 10000)',
    )

    parser.add_argument(
        '--format',
        choices=['table', 'json'],
        default='table',
        help='Output format (default: table)',
    )

    args = parser.parse_args()

    # Get query from argument or file
    query = args.query
    if not query and args.input:
        if args.input == '-':
            query = sys.stdin.read()
        else:
            with open(args.input) as f:
                query = f.read()

    if not query:
        parser.error("Either -q/--query or -i/--input is required")

    try:

        sparql_query(
            url=args.url,
            token=args.token,
            flow_id=args.flow_id,
            query=query,
            user=args.user,
            collection=args.collection,
            limit=args.limit,
            output_format=args.format,
        )

    except Exception as e:
        print(f"Exception: {e}", flush=True, file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()

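The CLI's handling of the response envelope can be exercised standalone. Below is a minimal sketch of the dispatch on `query-type`; the `bindings` response key and the cell shape `{"t": "i"/"l", "i": ..., "v": ...}` are inferred from the formatting code above, so treat the exact field names as assumptions rather than the wire contract.

```python
import json

def render(resp):
    # Dispatch on query-type, mirroring the CLI above:
    # select -> tabular rows, ask -> true/false, otherwise raw JSON
    qt = resp.get("query-type", "select")
    if qt == "ask":
        return "true" if resp.get("ask-result") else "false"
    if qt == "select":
        rows = []
        # "bindings" is an assumed response key; each cell is tagged
        # "i" (IRI) or "l" (literal), as in format_select above
        for b in resp.get("bindings", []):
            cells = [
                v.get("i", "") if v.get("t") == "i" else v.get("v", "")
                for v in b.get("values", [])
            ]
            rows.append(" | ".join(cells))
        return "\n".join(rows)
    return json.dumps(resp, indent=2)

print(render({"query-type": "ask", "ask-result": True}))
```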
@@ -101,6 +101,7 @@ pdf-ocr-mistral = "trustgraph.decoding.mistral_ocr:run"
prompt-template = "trustgraph.prompt.template:run"
rev-gateway = "trustgraph.rev_gateway:run"
run-processing = "trustgraph.processing:run"
sparql-query = "trustgraph.query.sparql:run"
structured-query = "trustgraph.retrieval.structured_query:run"
structured-diag = "trustgraph.retrieval.structured_diag:run"
text-completion-azure = "trustgraph.model.text_completion.azure:run"

@@ -22,6 +22,7 @@ from . document_rag import DocumentRagRequestor
from . triples_query import TriplesQueryRequestor
from . rows_query import RowsQueryRequestor
from . nlp_query import NLPQueryRequestor
from . sparql_query import SparqlQueryRequestor
from . structured_query import StructuredQueryRequestor
from . structured_diag import StructuredDiagRequestor
from . embeddings import EmbeddingsRequestor

@@ -65,6 +66,7 @@ request_response_dispatchers = {
    "structured-query": StructuredQueryRequestor,
    "structured-diag": StructuredDiagRequestor,
    "row-embeddings": RowEmbeddingsQueryRequestor,
    "sparql": SparqlQueryRequestor,
}

global_dispatchers = {

30
trustgraph-flow/trustgraph/gateway/dispatch/sparql_query.py
Normal file
@@ -0,0 +1,30 @@
from ... schema import SparqlQueryRequest, SparqlQueryResponse
from ... messaging import TranslatorRegistry

from . requestor import ServiceRequestor

class SparqlQueryRequestor(ServiceRequestor):

    def __init__(
            self, backend, request_queue, response_queue, timeout,
            consumer, subscriber,
    ):

        super(SparqlQueryRequestor, self).__init__(
            backend=backend,
            request_queue=request_queue,
            response_queue=response_queue,
            request_schema=SparqlQueryRequest,
            response_schema=SparqlQueryResponse,
            subscription=subscriber,
            consumer_name=consumer,
            timeout=timeout,
        )

        self.request_translator = TranslatorRegistry.get_request_translator("sparql-query")
        self.response_translator = TranslatorRegistry.get_response_translator("sparql-query")

    def to_request(self, body):
        return self.request_translator.decode(body)

    def from_response(self, message):
        return self.response_translator.encode_with_completion(message)

1
trustgraph-flow/trustgraph/query/sparql/__init__.py
Normal file
@@ -0,0 +1 @@
from . service import *

6
trustgraph-flow/trustgraph/query/sparql/__main__.py
Normal file
@@ -0,0 +1,6 @@
#!/usr/bin/env python3

from . service import run

if __name__ == '__main__':
    run()

541
trustgraph-flow/trustgraph/query/sparql/algebra.py
Normal file
@@ -0,0 +1,541 @@
"""
SPARQL algebra evaluator.

Recursively evaluates an rdflib SPARQL algebra tree by issuing triple
pattern queries via TriplesClient (streaming) and performing in-memory
joins, filters, and projections.
"""

import logging
from collections import defaultdict

from rdflib.term import Variable, URIRef, Literal, BNode
from rdflib.plugins.sparql.parserutils import CompValue

from ... schema import Term, Triple, IRI, LITERAL, BLANK
from ... knowledge import Uri
from ... knowledge import Literal as KgLiteral
from . parser import rdflib_term_to_term
from . solutions import (
    hash_join, left_join, union, project, distinct,
    order_by, slice_solutions, _term_key,
)
from . expressions import evaluate_expression, _effective_boolean

logger = logging.getLogger(__name__)


class EvaluationError(Exception):
    """Raised when SPARQL evaluation fails."""
    pass


async def evaluate(node, triples_client, user, collection, limit=10000):
    """
    Evaluate a SPARQL algebra node.

    Args:
        node: rdflib CompValue algebra node
        triples_client: TriplesClient instance for triple pattern queries
        user: user/keyspace identifier
        collection: collection identifier
        limit: safety limit on results

    Returns:
        list of solutions (dicts mapping variable names to Term values)
    """
    if not isinstance(node, CompValue):
        logger.warning(f"Expected CompValue, got {type(node)}: {node}")
        return [{}]

    name = node.name
    handler = _HANDLERS.get(name)

    if handler is None:
        logger.warning(f"Unsupported algebra node: {name}")
        return [{}]

    return await handler(node, triples_client, user, collection, limit)


# --- Node handlers ---

async def _eval_select_query(node, tc, user, collection, limit):
    """Evaluate a SelectQuery node."""
    return await evaluate(node.p, tc, user, collection, limit)


async def _eval_project(node, tc, user, collection, limit):
    """Evaluate a Project node (SELECT variable projection)."""
    solutions = await evaluate(node.p, tc, user, collection, limit)
    variables = [str(v) for v in node.PV]
    return project(solutions, variables)


async def _eval_bgp(node, tc, user, collection, limit):
    """
    Evaluate a Basic Graph Pattern.

    Issues streaming triple pattern queries and joins results. Patterns
    are ordered by selectivity (more bound terms first) and evaluated
    sequentially with bound-variable substitution.
    """
    triples = node.triples
    if not triples:
        return [{}]

    # Sort patterns by selectivity: more bound terms = more selective
    def selectivity(pattern):
        return sum(1 for t in pattern if not isinstance(t, Variable))

    sorted_patterns = sorted(
        enumerate(triples), key=lambda x: -selectivity(x[1])
    )

    solutions = [{}]

    for _, pattern in sorted_patterns:
        s_tmpl, p_tmpl, o_tmpl = pattern

        new_solutions = []

        for sol in solutions:
            # Substitute known bindings into the pattern
            s_val = _resolve_term(s_tmpl, sol)
            p_val = _resolve_term(p_tmpl, sol)
            o_val = _resolve_term(o_tmpl, sol)

            # Query the triples store
            results = await _query_pattern(
                tc, s_val, p_val, o_val, user, collection, limit
            )

            # Map results back to variable bindings,
            # converting Uri/Literal to Term objects
            for triple in results:
                binding = dict(sol)
                if isinstance(s_tmpl, Variable):
                    binding[str(s_tmpl)] = _to_term(triple.s)
                if isinstance(p_tmpl, Variable):
                    binding[str(p_tmpl)] = _to_term(triple.p)
                if isinstance(o_tmpl, Variable):
                    binding[str(o_tmpl)] = _to_term(triple.o)
                new_solutions.append(binding)

        solutions = new_solutions

        if not solutions:
            break

    return solutions[:limit]

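The selectivity heuristic used by `_eval_bgp` can be seen in isolation. This sketch uses a stand-in `Var` marker in place of rdflib's `Variable` (the real code sorts rdflib triple patterns, but the ordering logic is the same):

```python
class Var(str):
    """Stand-in for rdflib.term.Variable in this sketch."""
    pass

def selectivity(pattern):
    # More bound (non-variable) terms = more selective pattern
    return sum(1 for t in pattern if not isinstance(t, Var))

patterns = [
    (Var("s"), Var("p"), Var("o")),        # 0 bound terms
    (Var("s"), "rdf:type", "ex:Person"),   # 2 bound terms
    (Var("s"), "ex:name", Var("name")),    # 1 bound term
]

# Most selective first, as in _eval_bgp
ordered = sorted(enumerate(patterns), key=lambda x: -selectivity(x[1]))
print([i for i, _ in ordered])
```

Evaluating the most selective pattern first keeps the intermediate solution set small before the less constrained patterns multiply it.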
async def _eval_join(node, tc, user, collection, limit):
    """Evaluate a Join node."""
    left = await evaluate(node.p1, tc, user, collection, limit)
    right = await evaluate(node.p2, tc, user, collection, limit)
    return hash_join(left, right)[:limit]

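`hash_join` is imported from the `solutions` module, which is not part of this hunk. One plausible shape for a hash join on the variables shared between two solution sequences is sketched below; it is illustrative only (it assumes homogeneous solution dicts and is not the actual implementation):

```python
from collections import defaultdict

def hash_join(left, right):
    # Join two solution lists on their shared variables
    if not left or not right:
        return []
    # Assumes all solutions on a side bind the same variables
    shared = sorted(set(left[0]) & set(right[0]))
    # Index the right side by its shared-variable bindings
    index = defaultdict(list)
    for sol in right:
        index[tuple(sol[v] for v in shared)].append(sol)
    out = []
    for sol in left:
        for match in index[tuple(sol[v] for v in shared)]:
            merged = dict(sol)
            merged.update(match)
            out.append(merged)
    return out

left = [{"s": "a", "name": "Alice"}, {"s": "b", "name": "Bob"}]
right = [{"s": "a", "age": "30"}]
print(hash_join(left, right))
```

Probing a hash table built on one side avoids the quadratic scan of a nested-loop join.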
async def _eval_left_join(node, tc, user, collection, limit):
    """Evaluate a LeftJoin node (OPTIONAL)."""
    left_sols = await evaluate(node.p1, tc, user, collection, limit)
    right_sols = await evaluate(node.p2, tc, user, collection, limit)

    filter_fn = None
    if hasattr(node, "expr") and node.expr is not None:
        expr = node.expr
        if not (isinstance(expr, CompValue) and expr.name == "TrueFilter"):
            filter_fn = lambda sol: _effective_boolean(
                evaluate_expression(expr, sol)
            )

    return left_join(left_sols, right_sols, filter_fn)[:limit]


async def _eval_union(node, tc, user, collection, limit):
    """Evaluate a Union node."""
    left = await evaluate(node.p1, tc, user, collection, limit)
    right = await evaluate(node.p2, tc, user, collection, limit)
    return union(left, right)[:limit]


async def _eval_filter(node, tc, user, collection, limit):
    """Evaluate a Filter node."""
    solutions = await evaluate(node.p, tc, user, collection, limit)
    expr = node.expr
    return [
        sol for sol in solutions
        if _effective_boolean(evaluate_expression(expr, sol))
    ]

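`_effective_boolean` is imported from the `expressions` module. The SPARQL 1.1 effective boolean value (EBV) rules it presumably follows can be sketched as below; this is a sketch of the spec's coercions, not the module's actual code:

```python
def effective_boolean(val):
    # SPARQL 1.1 EBV: booleans pass through; numbers are true unless
    # zero or NaN; strings are true unless empty; unbound (None) is false.
    if val is None:
        return False
    if isinstance(val, bool):
        return val
    if isinstance(val, (int, float)):
        return val != 0 and val == val   # NaN != NaN
    if isinstance(val, str):
        return len(val) > 0
    return True

print(effective_boolean(""), effective_boolean("x"), effective_boolean(0))
```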
async def _eval_distinct(node, tc, user, collection, limit):
    """Evaluate a Distinct node."""
    solutions = await evaluate(node.p, tc, user, collection, limit)
    return distinct(solutions)


async def _eval_reduced(node, tc, user, collection, limit):
    """Evaluate a Reduced node (like Distinct but implementation-defined)."""
    # Treat same as Distinct
    solutions = await evaluate(node.p, tc, user, collection, limit)
    return distinct(solutions)


async def _eval_order_by(node, tc, user, collection, limit):
    """Evaluate an OrderBy node."""
    solutions = await evaluate(node.p, tc, user, collection, limit)

    key_fns = []
    for cond in node.expr:
        if isinstance(cond, CompValue) and cond.name == "OrderCondition":
            ascending = cond.order != "DESC"
            expr = cond.expr
            key_fns.append((
                lambda sol, e=expr: evaluate_expression(e, sol),
                ascending,
            ))
        else:
            # Simple variable or expression
            key_fns.append((
                lambda sol, e=cond: evaluate_expression(e, sol),
                True,
            ))

    return order_by(solutions, key_fns)


async def _eval_slice(node, tc, user, collection, limit):
    """Evaluate a Slice node (LIMIT/OFFSET)."""
    # Pass tighter limit downstream if possible
    inner_limit = limit
    if node.length is not None:
        offset = node.start or 0
        inner_limit = min(limit, offset + node.length)

    solutions = await evaluate(node.p, tc, user, collection, inner_limit)
    return slice_solutions(solutions, node.start or 0, node.length)

async def _eval_extend(node, tc, user, collection, limit):
    """Evaluate an Extend node (BIND)."""
    solutions = await evaluate(node.p, tc, user, collection, limit)
    var_name = str(node.var)
    expr = node.expr

    result = []
    for sol in solutions:
        val = evaluate_expression(expr, sol)
        new_sol = dict(sol)
        if isinstance(val, Term):
            new_sol[var_name] = val
        elif isinstance(val, bool):
            # bool is checked before int/float: bool is a subclass of
            # int, so the numeric branch would otherwise swallow it
            new_sol[var_name] = Term(
                type=LITERAL, value=str(val).lower(),
                datatype="http://www.w3.org/2001/XMLSchema#boolean"
            )
        elif isinstance(val, (int, float)):
            new_sol[var_name] = Term(type=LITERAL, value=str(val))
        elif isinstance(val, str):
            new_sol[var_name] = Term(type=LITERAL, value=val)
        elif val is not None:
            new_sol[var_name] = Term(type=LITERAL, value=str(val))
        result.append(new_sol)

    return result

async def _eval_group(node, tc, user, collection, limit):
    """Evaluate a Group node (GROUP BY with aggregation)."""
    solutions = await evaluate(node.p, tc, user, collection, limit)

    # Extract grouping expressions
    group_exprs = []
    if hasattr(node, "expr") and node.expr:
        for expr in node.expr:
            if isinstance(expr, CompValue) and expr.name == "GroupAs":
                var_name = str(expr.var) if getattr(expr, "var", None) else None
                group_exprs.append((expr.expr, var_name))
            elif isinstance(expr, Variable):
                group_exprs.append((expr, str(expr)))
            else:
                group_exprs.append((expr, None))

    # Group solutions
    groups = defaultdict(list)
    for sol in solutions:
        key_parts = []
        for expr, _ in group_exprs:
            val = evaluate_expression(expr, sol)
            key_parts.append(_term_key(val) if isinstance(val, Term) else val)
        groups[tuple(key_parts)].append(sol)

    if not group_exprs:
        # No GROUP BY - the entire result is a single group. Assign
        # rather than extend: the loop above has already placed every
        # solution under the empty key, so extending would double-count.
        groups = {(): list(solutions)}

    # Build grouped solutions (one per group)
    result = []
    for key, group_sols in groups.items():
        sol = {}
        # Include group key variables
        if group_sols:
            for (expr, var_name), k in zip(group_exprs, key):
                if var_name:
                    sol[var_name] = evaluate_expression(expr, group_sols[0])
        sol["__group__"] = group_sols
        result.append(sol)

    return result

async def _eval_aggregate_join(node, tc, user, collection, limit):
    """Evaluate an AggregateJoin (aggregation functions after GROUP BY)."""
    solutions = await evaluate(node.p, tc, user, collection, limit)

    result = []
    for sol in solutions:
        group = sol.get("__group__", [sol])
        new_sol = {k: v for k, v in sol.items() if k != "__group__"}

        # Apply aggregate functions
        if hasattr(node, "A") and node.A:
            for agg in node.A:
                var_name = str(agg.res)
                agg_val = _compute_aggregate(agg, group)
                new_sol[var_name] = agg_val

        result.append(new_sol)

    return result

async def _eval_graph(node, tc, user, collection, limit):
    """Evaluate a Graph node (GRAPH clause)."""
    term = node.term

    if isinstance(term, URIRef):
        # GRAPH <uri> { ... } - fixed graph
        # We'd need to pass graph to triples queries
        # For now, evaluate inner pattern normally
        logger.info(f"GRAPH <{term}> clause - graph filtering not yet wired")
        return await evaluate(node.p, tc, user, collection, limit)
    elif isinstance(term, Variable):
        # GRAPH ?g { ... } - variable graph
        logger.info(f"GRAPH ?{term} clause - variable graph not yet wired")
        return await evaluate(node.p, tc, user, collection, limit)
    else:
        return await evaluate(node.p, tc, user, collection, limit)


async def _eval_values(node, tc, user, collection, limit):
    """Evaluate a VALUES clause (inline data)."""
    variables = [str(v) for v in node.var]
    solutions = []

    for row in node.value:
        sol = {}
        for var_name, val in zip(variables, row):
            if val is not None and str(val) != "UNDEF":
                sol[var_name] = rdflib_term_to_term(val)
        solutions.append(sol)

    return solutions


async def _eval_to_multiset(node, tc, user, collection, limit):
    """Evaluate a ToMultiSet node (subquery)."""
    return await evaluate(node.p, tc, user, collection, limit)

# --- Aggregate computation ---

def _compute_aggregate(agg, group):
    """Compute a single aggregate function over a group of solutions."""
    agg_name = agg.name if hasattr(agg, "name") else ""

    # Get the expression to aggregate
    expr = agg.vars if hasattr(agg, "vars") else None

    if agg_name == "Aggregate_Count":
        if hasattr(agg, "distinct") and agg.distinct:
            vals = set()
            for sol in group:
                if expr:
                    val = evaluate_expression(expr, sol)
                    if val is not None:
                        vals.add(_term_key(val) if isinstance(val, Term) else val)
                else:
                    vals.add(id(sol))
            return Term(type=LITERAL, value=str(len(vals)),
                        datatype="http://www.w3.org/2001/XMLSchema#integer")
        return Term(type=LITERAL, value=str(len(group)),
                    datatype="http://www.w3.org/2001/XMLSchema#integer")

    if agg_name == "Aggregate_Sum":
        total = 0
        for sol in group:
            val = evaluate_expression(expr, sol) if expr else None
            num = _try_numeric(val)
            if num is not None:
                total += num
        return Term(type=LITERAL, value=str(total),
                    datatype="http://www.w3.org/2001/XMLSchema#decimal")

    if agg_name == "Aggregate_Avg":
        total = 0
        count = 0
        for sol in group:
            val = evaluate_expression(expr, sol) if expr else None
            num = _try_numeric(val)
            if num is not None:
                total += num
                count += 1
        avg = total / count if count > 0 else 0
        return Term(type=LITERAL, value=str(avg),
                    datatype="http://www.w3.org/2001/XMLSchema#decimal")

    if agg_name == "Aggregate_Min":
        min_val = None
        for sol in group:
            val = evaluate_expression(expr, sol) if expr else None
            if val is not None:
                cmp = _term_key(val) if isinstance(val, Term) else val
                if min_val is None or cmp < min_val[0]:
                    min_val = (cmp, val)
        if min_val:
            val = min_val[1]
            if isinstance(val, Term):
                return val
            return Term(type=LITERAL, value=str(val))
        return None

    if agg_name == "Aggregate_Max":
        max_val = None
        for sol in group:
            val = evaluate_expression(expr, sol) if expr else None
            if val is not None:
                cmp = _term_key(val) if isinstance(val, Term) else val
                if max_val is None or cmp > max_val[0]:
                    max_val = (cmp, val)
        if max_val:
            val = max_val[1]
            if isinstance(val, Term):
                return val
            return Term(type=LITERAL, value=str(val))
        return None

    if agg_name == "Aggregate_GroupConcat":
        separator = agg.separator if hasattr(agg, "separator") else " "
        vals = []
        for sol in group:
            val = evaluate_expression(expr, sol) if expr else None
            if val is not None:
                if isinstance(val, Term):
                    vals.append(val.value if val.type == LITERAL else val.iri)
                else:
                    vals.append(str(val))
        return Term(type=LITERAL, value=separator.join(vals))

    if agg_name == "Aggregate_Sample":
        if group:
            val = evaluate_expression(expr, group[0]) if expr else None
            if isinstance(val, Term):
                return val
            if val is not None:
                return Term(type=LITERAL, value=str(val))
        return None

    logger.warning(f"Unsupported aggregate: {agg_name}")
    return None

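The two-phase shape used above (Group buckets the solutions, then AggregateJoin folds each bucket) can be seen with plain dicts. A minimal grouped-average sketch, with illustrative field names:

```python
from collections import defaultdict

solutions = [
    {"dept": "eng", "salary": 100},
    {"dept": "eng", "salary": 80},
    {"dept": "ops", "salary": 60},
]

# Group phase: one bucket per distinct grouping key
groups = defaultdict(list)
for sol in solutions:
    groups[sol["dept"]].append(sol)

# Aggregate phase: fold each bucket, as AggregateJoin does per group
result = {
    dept: sum(s["salary"] for s in sols) / len(sols)
    for dept, sols in groups.items()
}
print(result)
```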
# --- Helper functions ---

def _to_term(val):
    """
    Convert a value to a schema Term. Handles Uri and Literal from the
    knowledge module (returned by TriplesClient) as well as plain strings.
    """
    if val is None:
        return None
    if isinstance(val, Term):
        return val
    if isinstance(val, Uri):
        return Term(type=IRI, iri=str(val))
    if isinstance(val, KgLiteral):
        return Term(type=LITERAL, value=str(val))
    if isinstance(val, str):
        if val.startswith(("http://", "https://", "urn:")):
            return Term(type=IRI, iri=val)
        return Term(type=LITERAL, value=val)
    return Term(type=LITERAL, value=str(val))


def _resolve_term(tmpl, solution):
    """
    Resolve a triple pattern term. If it's a variable and bound in the
    solution, return the bound Term. Otherwise return None (wildcard)
    for variables, or convert concrete terms.
    """
    if isinstance(tmpl, Variable):
        name = str(tmpl)
        if name in solution:
            return solution[name]
        return None
    else:
        return rdflib_term_to_term(tmpl)


async def _query_pattern(tc, s, p, o, user, collection, limit):
    """
    Issue a streaming triple pattern query via TriplesClient.

    Returns a list of Triple-like objects with s, p, o attributes.
    """
    results = await tc.query(
        s=s, p=p, o=o,
        limit=limit,
        user=user,
        collection=collection,
    )
    return results


def _try_numeric(val):
    """Try to convert a value to a number, return None on failure."""
    if val is None:
        return None
    if isinstance(val, (int, float)):
        return val
    if isinstance(val, Term) and val.type == LITERAL:
        try:
            if "." in val.value:
                return float(val.value)
            return int(val.value)
        except (ValueError, TypeError):
            return None
    return None

# --- Handler registry ---

_HANDLERS = {
    "SelectQuery": _eval_select_query,
    "Project": _eval_project,
    "BGP": _eval_bgp,
    "Join": _eval_join,
    "LeftJoin": _eval_left_join,
    "Union": _eval_union,
    "Filter": _eval_filter,
    "Distinct": _eval_distinct,
    "Reduced": _eval_reduced,
    "OrderBy": _eval_order_by,
    "Slice": _eval_slice,
    "Extend": _eval_extend,
    "Group": _eval_group,
    "AggregateJoin": _eval_aggregate_join,
    "Graph": _eval_graph,
    "values": _eval_values,
    "ToMultiSet": _eval_to_multiset,
}

481
trustgraph-flow/trustgraph/query/sparql/expressions.py
Normal file
@@ -0,0 +1,481 @@
"""
|
||||
SPARQL FILTER expression evaluator.
|
||||
|
||||
Evaluates rdflib algebra expression nodes against a solution (variable
|
||||
binding) to produce a value or boolean result.
|
||||
"""
|
||||
|
||||
import re
|
||||
import logging
|
||||
import operator
|
||||
|
||||
from rdflib.term import Variable, URIRef, Literal, BNode
|
||||
from rdflib.plugins.sparql.parserutils import CompValue
|
||||
|
||||
from ... schema import Term, IRI, LITERAL, BLANK
|
||||
from . parser import rdflib_term_to_term
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ExpressionError(Exception):
|
||||
"""Raised when a SPARQL expression cannot be evaluated."""
|
||||
pass
|
||||
|
||||
|
||||
def evaluate_expression(expr, solution):
|
||||
"""
|
||||
Evaluate a SPARQL expression against a solution binding.
|
||||
|
||||
Args:
|
||||
expr: rdflib algebra expression node
|
||||
solution: dict mapping variable names to Term values
|
||||
|
||||
Returns:
|
||||
The result value (Term, bool, number, string, or None)
|
||||
"""
|
||||
if expr is None:
|
||||
return True
|
||||
|
||||
# rdflib Variable
|
||||
if isinstance(expr, Variable):
|
||||
name = str(expr)
|
||||
return solution.get(name)
|
||||
|
||||
# rdflib concrete terms
|
||||
if isinstance(expr, URIRef):
|
||||
return Term(type=IRI, iri=str(expr))
|
||||
|
||||
if isinstance(expr, Literal):
|
||||
return rdflib_term_to_term(expr)
|
||||
|
||||
if isinstance(expr, BNode):
|
||||
return Term(type=BLANK, id=str(expr))
|
||||
|
||||
# Boolean constants
|
||||
if isinstance(expr, bool):
|
||||
return expr
|
||||
|
||||
# Numeric constants
|
||||
if isinstance(expr, (int, float)):
|
||||
return expr
|
||||
|
||||
# String constants
|
||||
if isinstance(expr, str):
|
||||
return expr
|
||||
|
||||
# CompValue nodes from rdflib algebra
|
||||
if isinstance(expr, CompValue):
|
||||
return _evaluate_comp_value(expr, solution)
|
||||
|
||||
# List/tuple (e.g. function arguments)
|
||||
if isinstance(expr, (list, tuple)):
|
||||
return [evaluate_expression(e, solution) for e in expr]
|
||||
|
||||
logger.warning(f"Unknown expression type: {type(expr)}: {expr}")
|
||||
return None
|
||||
|
||||
|
||||
def _evaluate_comp_value(node, solution):
    """Evaluate a CompValue expression node."""
    name = node.name

    # Relational expressions: =, !=, <, >, <=, >=
    if name == "RelationalExpression":
        return _eval_relational(node, solution)

    # Conditional AND / OR
    if name == "ConditionalAndExpression":
        return _eval_conditional_and(node, solution)

    if name == "ConditionalOrExpression":
        return _eval_conditional_or(node, solution)

    # Unary NOT
    if name == "UnaryNot":
        val = evaluate_expression(node.expr, solution)
        return not _effective_boolean(val)

    # Unary plus/minus
    if name == "UnaryPlus":
        return _to_numeric(evaluate_expression(node.expr, solution))

    if name == "UnaryMinus":
        val = _to_numeric(evaluate_expression(node.expr, solution))
        return -val if val is not None else None

    # Arithmetic
    if name == "AdditiveExpression":
        return _eval_additive(node, solution)

    if name == "MultiplicativeExpression":
        return _eval_multiplicative(node, solution)

    # Exists / NotExists. These must be checked before the generic
    # Builtin_ dispatch below, which would otherwise shadow them.
    if name == "Builtin_EXISTS":
        # EXISTS requires graph pattern evaluation - not handled here
        logger.warning("EXISTS not supported in filter expressions")
        return True

    if name == "Builtin_NOTEXISTS":
        logger.warning("NOT EXISTS not supported in filter expressions")
        return True

    # IN / NOT IN (also ahead of the generic Builtin_ dispatch)
    if name == "Builtin_IN":
        return _eval_in(node, solution)

    if name == "Builtin_NOTIN":
        return not _eval_in(node, solution)

    # SPARQL built-in functions
    if name.startswith("Builtin_"):
        return _eval_builtin(name, node, solution)

    # Function call
    if name == "Function":
        return _eval_function(node, solution)

    # TrueFilter (used with OPTIONAL)
    if name == "TrueFilter":
        return True

    logger.warning(f"Unknown CompValue expression: {name}")
    return None

def _eval_relational(node, solution):
    """Evaluate a relational expression (=, !=, <, >, <=, >=)."""
    left = evaluate_expression(node.expr, solution)
    right = evaluate_expression(node.other, solution)
    op = node.op

    if left is None or right is None:
        return False

    left_cmp = _comparable_value(left)
    right_cmp = _comparable_value(right)

    ops = {
        "=": operator.eq, "==": operator.eq,
        "!=": operator.ne,
        "<": operator.lt,
        ">": operator.gt,
        "<=": operator.le,
        ">=": operator.ge,
    }

    op_fn = ops.get(str(op))
    if op_fn is None:
        logger.warning(f"Unknown relational operator: {op}")
        return False

    try:
        return op_fn(left_cmp, right_cmp)
    except TypeError:
        return False

def _eval_conditional_and(node, solution):
    """Evaluate AND expression."""
    result = _effective_boolean(evaluate_expression(node.expr, solution))
    if not result:
        return False
    for other in node.other:
        result = _effective_boolean(evaluate_expression(other, solution))
        if not result:
            return False
    return True


def _eval_conditional_or(node, solution):
    """Evaluate OR expression."""
    result = _effective_boolean(evaluate_expression(node.expr, solution))
    if result:
        return True
    for other in node.other:
        result = _effective_boolean(evaluate_expression(other, solution))
        if result:
            return True
    return False

def _eval_additive(node, solution):
    """Evaluate additive expression (a + b - c ...)."""
    result = _to_numeric(evaluate_expression(node.expr, solution))
    if result is None:
        return None
    for op, operand in zip(node.op, node.other):
        val = _to_numeric(evaluate_expression(operand, solution))
        if val is None:
            return None
        if str(op) == "+":
            result = result + val
        elif str(op) == "-":
            result = result - val
    return result


def _eval_multiplicative(node, solution):
    """Evaluate multiplicative expression (a * b / c ...)."""
    result = _to_numeric(evaluate_expression(node.expr, solution))
    if result is None:
        return None
    for op, operand in zip(node.op, node.other):
        val = _to_numeric(evaluate_expression(operand, solution))
        if val is None:
            return None
        if str(op) == "*":
            result = result * val
        elif str(op) == "/":
            if val == 0:
                return None
            result = result / val
    return result

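Both arithmetic evaluators above fold `node.op` against `node.other` left to right. The same shape in isolation, with plain lists standing in for the rdflib node (illustrative only):

```python
def eval_additive(first, ops, others):
    # Left-to-right fold: ((first op0 others[0]) op1 others[1]) ...
    result = first
    for op, operand in zip(ops, others):
        if op == "+":
            result += operand
        elif op == "-":
            result -= operand
    return result

# 10 + 5 - 3 + 2
print(eval_additive(10, ["+", "-", "+"], [5, 3, 2]))
```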
def _eval_builtin(name, node, solution):
    """Evaluate SPARQL built-in functions."""
    builtin = name[len("Builtin_"):]

    if builtin == "BOUND":
        var_name = str(node.arg)
        return var_name in solution and solution[var_name] is not None

    if builtin == "isIRI" or builtin == "isURI":
        val = evaluate_expression(node.arg, solution)
        return isinstance(val, Term) and val.type == IRI

    if builtin == "isLITERAL":
        val = evaluate_expression(node.arg, solution)
        return isinstance(val, Term) and val.type == LITERAL

    if builtin == "isBLANK":
        val = evaluate_expression(node.arg, solution)
        return isinstance(val, Term) and val.type == BLANK

    if builtin == "STR":
        val = evaluate_expression(node.arg, solution)
        return Term(type=LITERAL, value=_to_string(val))

    if builtin == "LANG":
        val = evaluate_expression(node.arg, solution)
        if isinstance(val, Term) and val.type == LITERAL:
            return Term(type=LITERAL, value=val.language or "")
        return Term(type=LITERAL, value="")

    if builtin == "DATATYPE":
        val = evaluate_expression(node.arg, solution)
        if isinstance(val, Term) and val.type == LITERAL and val.datatype:
            return Term(type=IRI, iri=val.datatype)
        return Term(type=IRI, iri="http://www.w3.org/2001/XMLSchema#string")

    if builtin == "REGEX":
        text = _to_string(evaluate_expression(node.text, solution))
        pattern = _to_string(evaluate_expression(node.pattern, solution))
        flags_str = ""
        if hasattr(node, "flags") and node.flags is not None:
            flags_str = _to_string(evaluate_expression(node.flags, solution))

        re_flags = 0
        if "i" in flags_str:
            re_flags |= re.IGNORECASE
        if "m" in flags_str:
            re_flags |= re.MULTILINE
        if "s" in flags_str:
            re_flags |= re.DOTALL

        try:
            return bool(re.search(pattern, text, re_flags))
        except re.error:
            return False

    if builtin == "STRLEN":
        val = _to_string(evaluate_expression(node.arg, solution))
        return len(val)

    if builtin == "UCASE":
        val = _to_string(evaluate_expression(node.arg, solution))
        return Term(type=LITERAL, value=val.upper())

    if builtin == "LCASE":
        val = _to_string(evaluate_expression(node.arg, solution))
        return Term(type=LITERAL, value=val.lower())

    if builtin == "CONTAINS":
        string = _to_string(evaluate_expression(node.arg1, solution))
        pattern = _to_string(evaluate_expression(node.arg2, solution))
        return pattern in string

    if builtin == "STRSTARTS":
        string = _to_string(evaluate_expression(node.arg1, solution))
        prefix = _to_string(evaluate_expression(node.arg2, solution))
        return string.startswith(prefix)

    if builtin == "STRENDS":
        string = _to_string(evaluate_expression(node.arg1, solution))
        suffix = _to_string(evaluate_expression(node.arg2, solution))
        return string.endswith(suffix)

    if builtin == "CONCAT":
        args = [_to_string(evaluate_expression(a, solution)) for a in node.arg]
        return Term(type=LITERAL, value="".join(args))

    if builtin == "IF":
        cond = _effective_boolean(evaluate_expression(node.arg1, solution))
        if cond:
            return evaluate_expression(node.arg2, solution)
        else:
            return evaluate_expression(node.arg3, solution)

    if builtin == "COALESCE":
        for arg in node.arg:
            val = evaluate_expression(arg, solution)
            if val is not None:
                return val
        return None

    if builtin == "sameTerm":
        left = evaluate_expression(node.arg1, solution)
        right = evaluate_expression(node.arg2, solution)
        if not isinstance(left, Term) or not isinstance(right, Term):
            return False
        from . solutions import _term_key
        return _term_key(left) == _term_key(right)

    logger.warning(f"Unsupported built-in function: {builtin}")
    return None

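The REGEX branch translates SPARQL's `i`/`m`/`s` flag string into Python `re` flags before searching. That mapping can be exercised directly with the standard library:

```python
import re

def re_flags(flags_str):
    # SPARQL REGEX flags to Python re flags, as in the
    # REGEX branch above.
    f = 0
    if "i" in flags_str:
        f |= re.IGNORECASE
    if "m" in flags_str:
        f |= re.MULTILINE
    if "s" in flags_str:
        f |= re.DOTALL
    return f

print(bool(re.search("abc", "xABCx", re_flags("i"))))  # True
```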
def _eval_function(node, solution):
    """Evaluate a SPARQL function call."""
    # Cast functions (xsd:integer, xsd:string, etc.)
    iri = str(node.iri) if hasattr(node, "iri") else ""
    args = [evaluate_expression(a, solution) for a in node.expr]

    xsd = "http://www.w3.org/2001/XMLSchema#"
    if iri == xsd + "integer":
        try:
            return int(_to_numeric(args[0]))
        except (TypeError, ValueError):
            return None
    elif iri == xsd + "decimal" or iri == xsd + "double" or iri == xsd + "float":
        try:
            return float(_to_numeric(args[0]))
        except (TypeError, ValueError):
            return None
    elif iri == xsd + "string":
        return Term(type=LITERAL, value=_to_string(args[0]))
    elif iri == xsd + "boolean":
        return _effective_boolean(args[0])

    logger.warning(f"Unsupported function: {iri}")
    return None

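The xsd cast dispatch above reduces to numeric/string coercion keyed on the function IRI. A standalone sketch with plain Python inputs (the real code coerces via `_to_numeric`/`_to_string`, which also handle Term values):

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def cast(iri, value):
    # Mirrors the cast dispatch above for plain Python inputs;
    # returns None on a failed or unsupported cast.
    try:
        if iri == XSD + "integer":
            return int(float(value))
        if iri in (XSD + "decimal", XSD + "double", XSD + "float"):
            return float(value)
        if iri == XSD + "string":
            return str(value)
    except (TypeError, ValueError):
        return None
    return None

print(cast(XSD + "integer", "42"), cast(XSD + "double", "bad"))  # 42 None
```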
def _eval_in(node, solution):
    """Evaluate IN expression."""
    val = evaluate_expression(node.expr, solution)
    for item in node.other:
        other = evaluate_expression(item, solution)
        if _comparable_value(val) == _comparable_value(other):
            return True
    return False

# --- Value conversion helpers ---

def _effective_boolean(val):
    """Convert a value to its effective boolean value (EBV)."""
    if isinstance(val, bool):
        return val
    if val is None:
        return False
    if isinstance(val, (int, float)):
        return val != 0
    if isinstance(val, str):
        return len(val) > 0
    if isinstance(val, Term):
        if val.type == LITERAL:
            v = val.value
            if val.datatype == "http://www.w3.org/2001/XMLSchema#boolean":
                return v.lower() in ("true", "1")
            if val.datatype in (
                "http://www.w3.org/2001/XMLSchema#integer",
                "http://www.w3.org/2001/XMLSchema#decimal",
                "http://www.w3.org/2001/XMLSchema#double",
                "http://www.w3.org/2001/XMLSchema#float",
            ):
                try:
                    return float(v) != 0
                except ValueError:
                    return False
            return len(v) > 0
        return True
    return bool(val)


def _to_string(val):
    """Convert a value to a string."""
    if val is None:
        return ""
    if isinstance(val, str):
        return val
    if isinstance(val, Term):
        if val.type == IRI:
            return val.iri
        elif val.type == LITERAL:
            return val.value
        elif val.type == BLANK:
            return val.id
    return str(val)


def _to_numeric(val):
    """Convert a value to a number."""
    if val is None:
        return None
    if isinstance(val, (int, float)):
        return val
    if isinstance(val, Term) and val.type == LITERAL:
        try:
            if "." in val.value:
                return float(val.value)
            return int(val.value)
        except (ValueError, TypeError):
            return None
    if isinstance(val, str):
        try:
            if "." in val:
                return float(val)
            return int(val)
        except (ValueError, TypeError):
            return None
    return None


def _comparable_value(val):
    """
    Convert a value to a form suitable for comparison.
    Returns a tuple (type, value) for consistent ordering.
    """
    if val is None:
        return (0, "")
    if isinstance(val, bool):
        return (1, val)
    if isinstance(val, (int, float)):
        return (2, val)
    if isinstance(val, str):
        return (3, val)
    if isinstance(val, Term):
        if val.type == IRI:
            return (4, val.iri)
        elif val.type == LITERAL:
            # Try numeric comparison for numeric types
            num = _to_numeric(val)
            if num is not None:
                return (2, num)
            return (3, val.value)
        elif val.type == BLANK:
            return (5, val.id)
    return (6, str(val))
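The EBV rules in `_effective_boolean` can be sketched standalone for plain Python values; this illustrative reimplementation omits the Term-literal branches (boolean and numeric datatypes) that the module's version also handles:

```python
def ebv(val):
    # Effective boolean value for plain Python values:
    # None -> False, numbers -> nonzero, strings -> non-empty.
    if isinstance(val, bool):
        return val
    if val is None:
        return False
    if isinstance(val, (int, float)):
        return val != 0
    if isinstance(val, str):
        return len(val) > 0
    return bool(val)

print(ebv(0.0), ebv("x"), ebv(None))  # False True False
```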
trustgraph-flow/trustgraph/query/sparql/parser.py (new file, 139 lines)
@@ -0,0 +1,139 @@
"""
|
||||
SPARQL parser wrapping rdflib's SPARQL 1.1 parser and algebra compiler.
|
||||
Parses a SPARQL query string into an algebra tree for evaluation.
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
||||
from rdflib.plugins.sparql import prepareQuery
|
||||
from rdflib.plugins.sparql.algebra import translateQuery
|
||||
from rdflib.plugins.sparql.parserutils import CompValue
|
||||
from rdflib.term import Variable, URIRef, Literal, BNode
|
||||
|
||||
from ... schema import Term, Triple, IRI, LITERAL, BLANK
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class ParseError(Exception):
|
||||
"""Raised when a SPARQL query cannot be parsed."""
|
||||
pass
|
||||
|
||||
|
||||
class ParsedQuery:
|
||||
"""Result of parsing a SPARQL query string."""
|
||||
|
||||
def __init__(self, algebra, query_type, variables=None):
|
||||
self.algebra = algebra
|
||||
self.query_type = query_type # "select", "ask", "construct", "describe"
|
||||
self.variables = variables or [] # projected variable names (SELECT)
|
||||
|
||||
|
||||
def rdflib_term_to_term(t):
|
||||
"""Convert an rdflib term (URIRef, Literal, BNode) to a schema Term."""
|
||||
if isinstance(t, URIRef):
|
||||
return Term(type=IRI, iri=str(t))
|
||||
elif isinstance(t, Literal):
|
||||
term = Term(type=LITERAL, value=str(t))
|
||||
if t.datatype:
|
||||
term.datatype = str(t.datatype)
|
||||
if t.language:
|
||||
term.language = t.language
|
||||
return term
|
||||
elif isinstance(t, BNode):
|
||||
return Term(type=BLANK, id=str(t))
|
||||
else:
|
||||
return Term(type=LITERAL, value=str(t))
|
||||
|
||||
|
||||
def term_to_rdflib(t):
|
||||
"""Convert a schema Term to an rdflib term."""
|
||||
if t.type == IRI:
|
||||
return URIRef(t.iri)
|
||||
elif t.type == LITERAL:
|
||||
kwargs = {}
|
||||
if t.datatype:
|
||||
kwargs["datatype"] = URIRef(t.datatype)
|
||||
if t.language:
|
||||
kwargs["lang"] = t.language
|
||||
return Literal(t.value, **kwargs)
|
||||
elif t.type == BLANK:
|
||||
return BNode(t.id)
|
||||
else:
|
||||
return Literal(t.value)
|
||||
|
||||
|
||||
def parse_sparql(query_string):
|
||||
"""
|
||||
Parse a SPARQL query string into a ParsedQuery.
|
||||
|
||||
Args:
|
||||
query_string: SPARQL 1.1 query string
|
||||
|
||||
Returns:
|
||||
ParsedQuery with algebra tree, query type, and projected variables
|
||||
|
||||
Raises:
|
||||
ParseError: if the query cannot be parsed
|
||||
"""
|
||||
try:
|
||||
prepared = prepareQuery(query_string)
|
||||
except Exception as e:
|
||||
raise ParseError(f"SPARQL parse error: {e}") from e
|
||||
|
||||
algebra = prepared.algebra
|
||||
|
||||
# Determine query type and extract variables
|
||||
query_type = _detect_query_type(algebra)
|
||||
variables = _extract_variables(algebra, query_type)
|
||||
|
||||
return ParsedQuery(
|
||||
algebra=algebra,
|
||||
query_type=query_type,
|
||||
variables=variables,
|
||||
)
|
||||
|
||||
|
||||
def _detect_query_type(algebra):
|
||||
"""Detect the SPARQL query type from the algebra root."""
|
||||
name = algebra.name
|
||||
|
||||
if name == "SelectQuery":
|
||||
return "select"
|
||||
elif name == "AskQuery":
|
||||
return "ask"
|
||||
elif name == "ConstructQuery":
|
||||
return "construct"
|
||||
elif name == "DescribeQuery":
|
||||
return "describe"
|
||||
|
||||
# The top-level algebra node may be a modifier (Project, Slice, etc.)
|
||||
# wrapping the actual query. Check for common patterns.
|
||||
if name in ("Project", "Distinct", "Reduced", "OrderBy", "Slice"):
|
||||
return "select"
|
||||
|
||||
logger.warning(f"Unknown algebra root type: {name}, assuming select")
|
||||
return "select"
|
||||
|
||||
|
||||
def _extract_variables(algebra, query_type):
|
||||
"""Extract projected variable names from the algebra."""
|
||||
if query_type != "select":
|
||||
return []
|
||||
|
||||
# For SELECT queries, the Project node has PV (projected variables)
|
||||
if hasattr(algebra, "PV"):
|
||||
return [str(v) for v in algebra.PV]
|
||||
|
||||
# Walk down through modifiers to find Project
|
||||
node = algebra
|
||||
while hasattr(node, "p"):
|
||||
node = node.p
|
||||
if hasattr(node, "PV"):
|
||||
return [str(v) for v in node.PV]
|
||||
|
||||
# Fallback: collect all variables from the algebra
|
||||
if hasattr(algebra, "_vars"):
|
||||
return [str(v) for v in algebra._vars]
|
||||
|
||||
return []
|
||||
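Query-type detection above dispatches only on the algebra root's `name`. A minimal standalone sketch of the same dispatch, using `SimpleNamespace` as a hypothetical stand-in for the rdflib `CompValue` node:

```python
from types import SimpleNamespace

def detect_query_type(algebra):
    # Mirror of _detect_query_type: explicit query nodes first,
    # then common SELECT modifier wrappers, else assume SELECT.
    mapping = {
        "SelectQuery": "select", "AskQuery": "ask",
        "ConstructQuery": "construct", "DescribeQuery": "describe",
    }
    name = algebra.name
    if name in mapping:
        return mapping[name]
    if name in ("Project", "Distinct", "Reduced", "OrderBy", "Slice"):
        return "select"
    return "select"

print(detect_query_type(SimpleNamespace(name="AskQuery")))  # ask
print(detect_query_type(SimpleNamespace(name="Slice")))     # select
```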
trustgraph-flow/trustgraph/query/sparql/service.py (new file, 230 lines)
@@ -0,0 +1,230 @@
"""
|
||||
SPARQL query service. Accepts SPARQL queries, decomposes them into triple
|
||||
pattern lookups via the triples query pub/sub interface, performs in-memory
|
||||
joins/filters/projections, and returns SPARQL result bindings.
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
||||
from ... schema import SparqlQueryRequest, SparqlQueryResponse
|
||||
from ... schema import SparqlBinding, Error, Term, Triple
|
||||
from ... base import FlowProcessor, ConsumerSpec, ProducerSpec
|
||||
from ... base import TriplesClientSpec
|
||||
|
||||
from . parser import parse_sparql, ParseError
|
||||
from . algebra import evaluate, EvaluationError
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
default_ident = "sparql-query"
|
||||
default_concurrency = 10
|
||||
|
||||
|
||||
class Processor(FlowProcessor):
|
||||
|
||||
def __init__(self, **params):
|
||||
|
||||
id = params.get("id", default_ident)
|
||||
concurrency = params.get("concurrency", default_concurrency)
|
||||
|
||||
super(Processor, self).__init__(
|
||||
**params | {
|
||||
"id": id,
|
||||
"concurrency": concurrency,
|
||||
}
|
||||
)
|
||||
|
||||
self.register_specification(
|
||||
ConsumerSpec(
|
||||
name="request",
|
||||
schema=SparqlQueryRequest,
|
||||
handler=self.on_message,
|
||||
concurrency=concurrency,
|
||||
)
|
||||
)
|
||||
|
||||
self.register_specification(
|
||||
ProducerSpec(
|
||||
name="response",
|
||||
schema=SparqlQueryResponse,
|
||||
)
|
||||
)
|
||||
|
||||
self.register_specification(
|
||||
TriplesClientSpec(
|
||||
request_name="triples-request",
|
||||
response_name="triples-response",
|
||||
)
|
||||
)
|
||||
|
||||
async def on_message(self, msg, consumer, flow):
|
||||
|
||||
try:
|
||||
|
||||
request = msg.value()
|
||||
id = msg.properties()["id"]
|
||||
|
||||
logger.debug(f"Handling SPARQL query request {id}...")
|
||||
|
||||
response = await self.execute_sparql(request, flow)
|
||||
|
||||
await flow("response").send(response, properties={"id": id})
|
||||
|
||||
logger.debug("SPARQL query request completed")
|
||||
|
||||
except Exception as e:
|
||||
|
||||
logger.error(
|
||||
f"Exception in SPARQL query service: {e}", exc_info=True
|
||||
)
|
||||
|
||||
r = SparqlQueryResponse(
|
||||
error=Error(
|
||||
type="sparql-query-error",
|
||||
message=str(e),
|
||||
),
|
||||
)
|
||||
|
||||
await flow("response").send(r, properties={"id": id})
|
||||
|
||||
async def execute_sparql(self, request, flow):
|
||||
"""Parse and evaluate a SPARQL query."""
|
||||
|
||||
# Parse the SPARQL query
|
||||
try:
|
||||
parsed = parse_sparql(request.query)
|
||||
except ParseError as e:
|
||||
return SparqlQueryResponse(
|
||||
error=Error(
|
||||
type="sparql-parse-error",
|
||||
message=str(e),
|
||||
),
|
||||
)
|
||||
|
||||
# Get the triples client from the flow
|
||||
triples_client = flow("triples-request")
|
||||
|
||||
# Evaluate the algebra
|
||||
try:
|
||||
solutions = await evaluate(
|
||||
parsed.algebra,
|
||||
triples_client,
|
||||
user=request.user or "trustgraph",
|
||||
collection=request.collection or "default",
|
||||
limit=request.limit or 10000,
|
||||
)
|
||||
except EvaluationError as e:
|
||||
return SparqlQueryResponse(
|
||||
error=Error(
|
||||
type="sparql-evaluation-error",
|
||||
message=str(e),
|
||||
),
|
||||
)
|
||||
|
||||
# Build response based on query type
|
||||
if parsed.query_type == "select":
|
||||
return self._build_select_response(parsed, solutions)
|
||||
elif parsed.query_type == "ask":
|
||||
return self._build_ask_response(solutions)
|
||||
elif parsed.query_type == "construct":
|
||||
return self._build_construct_response(parsed, solutions)
|
||||
elif parsed.query_type == "describe":
|
||||
return self._build_describe_response(parsed, solutions)
|
||||
else:
|
||||
return SparqlQueryResponse(
|
||||
error=Error(
|
||||
type="sparql-unsupported",
|
||||
message=f"Unsupported query type: {parsed.query_type}",
|
||||
),
|
||||
)
|
||||
|
||||
def _build_select_response(self, parsed, solutions):
|
||||
"""Build response for SELECT queries."""
|
||||
variables = parsed.variables
|
||||
|
||||
bindings = []
|
||||
for sol in solutions:
|
||||
values = [sol.get(v) for v in variables]
|
||||
bindings.append(SparqlBinding(values=values))
|
||||
|
||||
return SparqlQueryResponse(
|
||||
query_type="select",
|
||||
variables=variables,
|
||||
bindings=bindings,
|
||||
)
|
||||
|
||||
def _build_ask_response(self, solutions):
|
||||
"""Build response for ASK queries."""
|
||||
return SparqlQueryResponse(
|
||||
query_type="ask",
|
||||
ask_result=len(solutions) > 0,
|
||||
)
|
||||
|
||||
def _build_construct_response(self, parsed, solutions):
|
||||
"""Build response for CONSTRUCT queries."""
|
||||
# CONSTRUCT template is in the algebra
|
||||
template = []
|
||||
if hasattr(parsed.algebra, "template"):
|
||||
template = parsed.algebra.template
|
||||
|
||||
triples = []
|
||||
seen = set()
|
||||
|
||||
for sol in solutions:
|
||||
for s_tmpl, p_tmpl, o_tmpl in template:
|
||||
from rdflib.term import Variable
|
||||
from . parser import rdflib_term_to_term
|
||||
|
||||
s = self._resolve_construct_term(s_tmpl, sol)
|
||||
p = self._resolve_construct_term(p_tmpl, sol)
|
||||
o = self._resolve_construct_term(o_tmpl, sol)
|
||||
|
||||
if s is not None and p is not None and o is not None:
|
||||
key = (
|
||||
s.type, s.iri or s.value,
|
||||
p.type, p.iri or p.value,
|
||||
o.type, o.iri or o.value,
|
||||
)
|
||||
if key not in seen:
|
||||
seen.add(key)
|
||||
triples.append(Triple(s=s, p=p, o=o))
|
||||
|
||||
return SparqlQueryResponse(
|
||||
query_type="construct",
|
||||
triples=triples,
|
||||
)
|
||||
|
||||
def _build_describe_response(self, parsed, solutions):
|
||||
"""Build response for DESCRIBE queries."""
|
||||
# DESCRIBE returns all triples about the described resources
|
||||
# For now, return empty - would need additional triples queries
|
||||
return SparqlQueryResponse(
|
||||
query_type="describe",
|
||||
triples=[],
|
||||
)
|
||||
|
||||
def _resolve_construct_term(self, tmpl, solution):
|
||||
"""Resolve a CONSTRUCT template term."""
|
||||
from rdflib.term import Variable
|
||||
from . parser import rdflib_term_to_term
|
||||
|
||||
if isinstance(tmpl, Variable):
|
||||
return solution.get(str(tmpl))
|
||||
else:
|
||||
return rdflib_term_to_term(tmpl)
|
||||
|
||||
@staticmethod
|
||||
def add_args(parser):
|
||||
FlowProcessor.add_args(parser)
|
||||
|
||||
parser.add_argument(
|
||||
'-c', '--concurrency',
|
||||
type=int,
|
||||
default=default_concurrency,
|
||||
help=f'Number of concurrent requests '
|
||||
f'(default: {default_concurrency})'
|
||||
)
|
||||
|
||||
|
||||
def run():
|
||||
Processor.launch(default_ident, __doc__)
|
||||
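`_build_select_response` reduces to projecting each solution dict onto the declared variable order, with `None` for unbound variables. A sketch with plain values standing in for Term-valued solutions:

```python
def select_rows(solutions, variables):
    # One row per solution; unbound variables yield None,
    # matching sol.get(v) in _build_select_response.
    return [[sol.get(v) for v in variables] for sol in solutions]

rows = select_rows(
    [{"s": "urn:a", "o": "1"}, {"s": "urn:b"}],
    ["s", "o"],
)
print(rows)  # [['urn:a', '1'], ['urn:b', None]]
```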
trustgraph-flow/trustgraph/query/sparql/solutions.py (new file, 248 lines)
@@ -0,0 +1,248 @@
"""
|
||||
Solution sequence operations for SPARQL evaluation.
|
||||
|
||||
A solution is a dict mapping variable names (str) to Term values.
|
||||
A solution sequence is a list of solutions.
|
||||
"""
|
||||
|
||||
import logging
|
||||
from collections import defaultdict
|
||||
|
||||
from ... schema import Term, IRI, LITERAL, BLANK
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _term_key(term):
|
||||
"""Create a hashable key from a Term for join/distinct operations."""
|
||||
if term is None:
|
||||
return None
|
||||
if term.type == IRI:
|
||||
return ("i", term.iri)
|
||||
elif term.type == LITERAL:
|
||||
return ("l", term.value, term.datatype, term.language)
|
||||
elif term.type == BLANK:
|
||||
return ("b", term.id)
|
||||
else:
|
||||
return ("?", str(term))
|
||||
|
||||
|
||||
def _solution_key(solution, variables):
|
||||
"""Create a hashable key from a solution for the given variables."""
|
||||
return tuple(_term_key(solution.get(v)) for v in variables)
|
||||
|
||||
|
||||
def _terms_equal(a, b):
|
||||
"""Check if two Terms are equal."""
|
||||
if a is None and b is None:
|
||||
return True
|
||||
if a is None or b is None:
|
||||
return False
|
||||
return _term_key(a) == _term_key(b)
|
||||
|
||||
|
||||
def _compatible(sol_a, sol_b):
|
||||
"""Check if two solutions are compatible (agree on shared variables)."""
|
||||
shared = set(sol_a.keys()) & set(sol_b.keys())
|
||||
return all(_terms_equal(sol_a[v], sol_b[v]) for v in shared)
|
||||
|
||||
|
||||
def _merge(sol_a, sol_b):
|
||||
"""Merge two compatible solutions into one."""
|
||||
result = dict(sol_a)
|
||||
result.update(sol_b)
|
||||
return result
|
||||
|
||||
|
||||
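The tagged-tuple key scheme in `_term_key` is what makes the join and distinct operations dict-based: terms of different kinds never compare equal. A sketch with a tuple `(kind, *fields)` as a hypothetical stand-in for the schema `Term`:

```python
def term_key(term):
    # term: (kind, *fields) stand-in for the schema Term.
    # IRIs, literals, and blanks map to distinctly tagged tuples,
    # so equal strings of different kinds never collide.
    kind = term[0]
    if kind == "iri":
        return ("i", term[1])
    if kind == "literal":
        value, datatype, language = term[1], term[2], term[3]
        return ("l", value, datatype, language)
    return ("b", term[1])

print(term_key(("iri", "urn:x")) == term_key(("literal", "urn:x", None, None)))
# False
```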
def hash_join(left, right):
    """
    Inner join two solution sequences on shared variables.
    Uses hash join for efficiency.
    """
    if not left or not right:
        return []

    left_vars = set()
    for sol in left:
        left_vars.update(sol.keys())

    right_vars = set()
    for sol in right:
        right_vars.update(sol.keys())

    shared = sorted(left_vars & right_vars)

    if not shared:
        # Cross product
        return [_merge(l, r) for l in left for r in right]

    # Build hash table on the smaller side
    if len(left) <= len(right):
        index = defaultdict(list)
        for sol in left:
            key = _solution_key(sol, shared)
            index[key].append(sol)

        results = []
        for sol_r in right:
            key = _solution_key(sol_r, shared)
            for sol_l in index.get(key, []):
                results.append(_merge(sol_l, sol_r))
        return results
    else:
        index = defaultdict(list)
        for sol in right:
            key = _solution_key(sol, shared)
            index[key].append(sol)

        results = []
        for sol_l in left:
            key = _solution_key(sol_l, shared)
            for sol_r in index.get(key, []):
                results.append(_merge(sol_l, sol_r))
        return results

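The shared-variable hash join can be exercised with plain string values in place of Terms. An illustrative standalone version of the same build/probe pattern (simplified: it always builds the table on the left side):

```python
from collections import defaultdict

def join(left, right):
    # Join on variables common to both sequences; build a hash
    # table on one side, probe with the other (as in hash_join).
    shared = sorted(
        {v for s in left for v in s} & {v for s in right for v in s}
    )
    index = defaultdict(list)
    for sol in left:
        index[tuple(sol.get(v) for v in shared)].append(sol)
    out = []
    for r in right:
        for l in index.get(tuple(r.get(v) for v in shared), []):
            out.append({**l, **r})
    return out

left = [{"x": "a", "y": "1"}, {"x": "b", "y": "2"}]
right = [{"x": "a", "z": "9"}]
print(join(left, right))  # [{'x': 'a', 'y': '1', 'z': '9'}]
```

With no shared variables, every solution hashes to the empty key, so the same code degenerates to the cross product.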
def left_join(left, right, filter_fn=None):
    """
    Left outer join (OPTIONAL semantics).
    Every left solution is preserved. If it joins with right solutions
    (and passes the optional filter), the merged solutions are included.
    Otherwise the original left solution is kept.
    """
    if not left:
        return []

    if not right:
        return list(left)

    right_vars = set()
    for sol in right:
        right_vars.update(sol.keys())

    left_vars = set()
    for sol in left:
        left_vars.update(sol.keys())

    shared = sorted(left_vars & right_vars)

    # Build hash table on right side
    index = defaultdict(list)
    for sol in right:
        key = _solution_key(sol, shared) if shared else ()
        index[key].append(sol)

    results = []
    for sol_l in left:
        key = _solution_key(sol_l, shared) if shared else ()
        matches = index.get(key, [])

        matched = False
        for sol_r in matches:
            merged = _merge(sol_l, sol_r)
            if filter_fn is None or filter_fn(merged):
                results.append(merged)
                matched = True

        if not matched:
            results.append(dict(sol_l))

    return results

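OPTIONAL's keep-the-left-row behaviour can be sketched with the same plain-dict solutions. This is an assumed simplification of `left_join` (nested-loop rather than hashed, no filter callback):

```python
def optional_join(left, right):
    # Every left solution survives; merge with compatible right
    # solutions when any exist (OPTIONAL semantics).
    out = []
    for l in left:
        matches = [
            r for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())
        ]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append(dict(l))
    return out

people = [{"p": "alice"}, {"p": "bob"}]
emails = [{"p": "alice", "e": "a@x"}]
print(optional_join(people, emails))
# [{'p': 'alice', 'e': 'a@x'}, {'p': 'bob'}]
```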
def union(left, right):
    """Union two solution sequences (concatenation)."""
    return list(left) + list(right)


def project(solutions, variables):
    """Keep only the specified variables in each solution."""
    return [
        {v: sol[v] for v in variables if v in sol}
        for sol in solutions
    ]


def distinct(solutions):
    """Remove duplicate solutions."""
    seen = set()
    results = []
    for sol in solutions:
        key = tuple(sorted(
            (k, _term_key(v)) for k, v in sol.items()
        ))
        if key not in seen:
            seen.add(key)
            results.append(sol)
    return results

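DISTINCT deduplicates via sorted-item keys with first occurrence winning, which keeps the operation order-preserving. A sketch over plain-dict solutions (the module's version keys Term values through `_term_key`):

```python
def distinct_rows(solutions):
    # Deduplicate dict solutions via sorted-item tuples, as in
    # distinct() above; first occurrence wins.
    seen, out = set(), []
    for sol in solutions:
        key = tuple(sorted(sol.items()))
        if key not in seen:
            seen.add(key)
            out.append(sol)
    return out

rows = [{"x": "a"}, {"x": "a"}, {"x": "b"}]
print(distinct_rows(rows))  # [{'x': 'a'}, {'x': 'b'}]
```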
def order_by(solutions, key_fns):
    """
    Sort solutions by the given key functions.

    key_fns is a list of (fn, ascending) tuples where fn extracts
    a comparable value from a solution.
    """
    if not key_fns:
        return solutions

    def sort_key(sol):
        keys = []
        for fn, ascending in key_fns:
            val = fn(sol)
            # Convert to comparable form
            if val is None:
                comparable = ("", "")
            elif isinstance(val, Term):
                comparable = _term_key(val)
            else:
                comparable = ("v", str(val))
            keys.append(comparable)
        return keys

    # Handle ascending/descending. Sort ascending, then reverse if
    # every key is descending; mixed directions need a comparator
    # wrapper instead.
    result = sorted(solutions, key=sort_key)

    if key_fns and all(not asc for _, asc in key_fns):
        result.reverse()
    elif key_fns and not all(asc for _, asc in key_fns):
        # Mixed ascending/descending - use a comparison wrapper
        result = _mixed_sort(solutions, key_fns)

    return result


def _mixed_sort(solutions, key_fns):
    """Sort with mixed ascending/descending keys."""
    import functools

    def compare(a, b):
        for fn, ascending in key_fns:
            va = fn(a)
            vb = fn(b)
            ka = _term_key(va) if isinstance(va, Term) \
                else ("v", str(va)) if va is not None else ("", "")
            kb = _term_key(vb) if isinstance(vb, Term) \
                else ("v", str(vb)) if vb is not None else ("", "")

            if ka < kb:
                return -1 if ascending else 1
            elif ka > kb:
                return 1 if ascending else -1

        return 0

    return sorted(solutions, key=functools.cmp_to_key(compare))


def slice_solutions(solutions, offset=0, limit=None):
    """Apply OFFSET and LIMIT to a solution sequence."""
    if offset:
        solutions = solutions[offset:]
    if limit is not None:
        solutions = solutions[:limit]
    return solutions
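Mixed ASC/DESC ordering via `functools.cmp_to_key`, as in `_mixed_sort`, can be sketched over plain tuples:

```python
import functools

def mixed_sort(rows, key_fns):
    # key_fns: list of (extractor, ascending) pairs, compared in order;
    # direction is applied per key inside the comparator.
    def cmp(a, b):
        for fn, asc in key_fns:
            ka, kb = fn(a), fn(b)
            if ka < kb:
                return -1 if asc else 1
            if ka > kb:
                return 1 if asc else -1
        return 0
    return sorted(rows, key=functools.cmp_to_key(cmp))

rows = [("x", 2), ("y", 1), ("x", 1)]
out = mixed_sort(rows, [(lambda r: r[0], True), (lambda r: r[1], False)])
print(out)  # [('x', 2), ('x', 1), ('y', 1)]
```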