feat: query_policy semantic-layer-only restricts agents to predefined semantic-layer measures (#334)

* feat(sl): add predefined_measures_only guard to semantic query planning

SemanticQuery gains a predefined_measures_only flag; the planner rejects
any measure resolved with Provenance.COMPOSED (runtime aggregate
expressions and query-time derivations) while predefined measures,
predefined derived chains, dimensions, filters, and segments pass.
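
As a rough illustration of the guard described above, the Python sketch below rejects composed measures when the flag is set. Only `Provenance.COMPOSED` and `predefined_measures_only` come from this commit; `Provenance.PREDEFINED`, `ResolvedMeasure`, `PolicyViolation`, and `check_measures` are hypothetical names, and the real planner may be structured differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Provenance(Enum):
    PREDEFINED = "predefined"  # hypothetical member name
    COMPOSED = "composed"      # member named in the commit message


@dataclass(frozen=True)
class ResolvedMeasure:  # hypothetical stand-in for a planner-resolved measure
    name: str
    provenance: Provenance


class PolicyViolation(Exception):
    """Hypothetical error type for a rejected query."""


def check_measures(measures: Iterable[ResolvedMeasure],
                   predefined_measures_only: bool) -> None:
    """Raise if the flag is set and any measure was composed at query time."""
    if not predefined_measures_only:
        return  # unrestricted connection: composed measures are allowed
    composed = [m.name for m in measures if m.provenance is Provenance.COMPOSED]
    if composed:
        raise PolicyViolation(
            "composed measures are not allowed: " + ", ".join(composed)
        )
```

Dimensions, filters, and segments are not modelled here; the sketch only covers the measure check.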

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(config): add per-connection query_policy to warehouse connections

query_policy: semantic-layer-only | read-only-sql (default) on the
warehouse connection schema, plus a policy module with the raw-SQL
guard, federated member restriction lookup, and the project-level
predicate used to gate sql_execution registration.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli): enforce query_policy on raw SQL through one shared executor

ktx sql and the MCP sql_execution tool now share executeProjectRawSql
(resolve, policy check, read-only validation, execute), collapsing
their duplicated validate-then-execute paths. Restricted connections
are rejected before validation; federated raw SQL is rejected when any
member is restricted. sql_execution is not registered when every SQL
connection is restricted, and connection_list marks restricted
connections so agents route to sl_query. executeProjectReadOnlySql
stays generic for ktx-internal SQL (scan, ingest, SL-generated).
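
The ordering above (resolve, policy check, read-only validation, execute) can be sketched in Python as follows. This is an assumption-laden illustration, not the actual executeProjectRawSql: `execute_raw_sql`, `looks_read_only`, `PolicyError`, `ReadOnlyError`, and the dict-shaped connections are all hypothetical, and the read-only check is a toy stand-in for the real parser-based validation.

```python
RESTRICTED = "semantic-layer-only"
DEFAULT_POLICY = "read-only-sql"


class PolicyError(Exception):
    """Hypothetical error: the connection forbids raw SQL."""


class ReadOnlyError(Exception):
    """Hypothetical error: the statement is not read-only."""


def looks_read_only(sql: str) -> bool:
    """Toy check: the first keyword is SELECT or WITH. Not a real SQL parser."""
    words = sql.lstrip().split(None, 1)
    return bool(words) and words[0].upper() in {"SELECT", "WITH"}


def execute_raw_sql(connections, connection_id, sql, run):
    """Run agent-authored SQL through the four steps in a fixed order."""
    connection = connections[connection_id]              # 1. resolve
    policy = connection.get("query_policy", DEFAULT_POLICY)
    if policy == RESTRICTED:                             # 2. policy check
        raise PolicyError(f"{connection_id} is {RESTRICTED}; use sl_query")
    if not looks_read_only(sql):                         # 3. read-only validation
        raise ReadOnlyError("only read-only statements are allowed")
    return run(connection, sql)                          # 4. execute
```

Because the policy check comes first, a restricted connection fails with the policy error even when the statement would also have failed validation.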

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(sl): compile queries with predefined_measures_only from query_policy

compileLocalSlQuery injects the flag from the connection's query_policy,
never from caller input, covering both ktx sl query and the MCP
sl_query tool through the daemon compile path.
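
A minimal Python sketch of the injection rule above, assuming dict-shaped queries and connections: `compile_sl_query` is a hypothetical name, not the real compileLocalSlQuery, and it only shows that whatever the caller supplies for the flag is overwritten from the connection's policy.

```python
def compile_sl_query(caller_query: dict, connection: dict) -> dict:
    """Return a copy of the query with the flag derived from the connection."""
    compiled = dict(caller_query)
    # Overwrite unconditionally so a caller-supplied value can never win.
    compiled["predefined_measures_only"] = (
        connection.get("query_policy", "read-only-sql") == "semantic-layer-only"
    )
    return compiled
```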

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: document query_policy semantic-layer-only

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(sl): close semantic-layer-only bypasses via filters and federated hint

The predefined_measures_only guard only inspected query.measures, so a
composed aggregate written into `filters` slipped through _classify_filters
into a HAVING clause untouched, letting a restricted agent evaluate
arbitrary aggregates (e.g. threshold-probing `sum(x) BETWEEN a AND b`).
Reject filter clauses that compose an aggregate function; a HAVING that
compares a predefined measure by name (`orders.revenue > 100`) still works.
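
To make the distinction concrete, here is a deliberately simple Python sketch that flags filter clauses containing an aggregate function call. It is an assumption, not the planner's logic: `filter_composes_aggregate` and the list of function names are hypothetical, and a regex like this does not understand string literals or dialect-specific aggregates, which a real expression parser would.

```python
import re

# Hypothetical, non-exhaustive list of aggregate function names.
AGGREGATE_CALL = re.compile(r"\b(?:sum|avg|count|min|max)\s*\(", re.IGNORECASE)


def filter_composes_aggregate(clause: str) -> bool:
    """True if the clause appears to call an aggregate function."""
    return AGGREGATE_CALL.search(clause) is not None
```

Under this sketch `sum(x) BETWEEN a AND b` is flagged, while `orders.revenue > 100` is not, because it refers to a predefined measure by name rather than composing an aggregate.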

Also make the federated sl_query error policy-aware: when a member is
restricted, raw federated SQL is disabled too, so stop directing the agent
to `ktx sql -c _ktx_federated` / sql_execution (a guaranteed failure) and
point to per-connection semantic-layer queries instead.

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Luca Martial 2026-07-03 01:54:17 -07:00 committed by GitHub
parent 66768fe009
commit a651b82e2f
21 changed files with 887 additions and 68 deletions


@@ -139,6 +139,11 @@ connections are declared. It surfaces in `ktx connection` and in the agent's
connection list so the id is discoverable. Querying a single member database
directly with its own connection id (`ktx sql -c pg_books ...`) is unchanged.
If any member connection sets
[`query_policy: semantic-layer-only`](/docs/configuration/ktx-yaml#query-policy),
raw SQL against `_ktx_federated` is rejected as a whole: a federated query can
touch any member's tables, so one restricted member restricts the federation.
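
The "one restricted member restricts the federation" rule amounts to an any-of check. A minimal Python sketch, assuming member policies are available as a mapping from connection id to policy string (`federation_is_restricted` is a hypothetical name):

```python
def federation_is_restricted(member_policies: dict) -> bool:
    """True if any member connection is semantic-layer-only."""
    return any(
        policy == "semantic-layer-only" for policy in member_policies.values()
    )
```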
## Federated queries are read-only
DuckDB attaches every member database with read-only access. Federated queries


@@ -217,6 +217,41 @@ connections:
observed in-scope query history. The block uses `mode: exclude` and remains
hand-editable.
### Query policy
Set `query_policy: semantic-layer-only` on a warehouse connection to stop
agents from authoring SQL against it. The default, `read-only-sql`, allows
parser-validated read-only SQL through `ktx sql` and the `sql_execution` MCP
tool alongside semantic-layer queries.
```yaml
connections:
  warehouse:
    driver: snowflake
    query_policy: semantic-layer-only
```
With `semantic-layer-only`:
- `ktx sql` and the `sql_execution` MCP tool reject the connection with a
clear error. When every SQL connection in the project is restricted, the
`sql_execution` tool is not registered at all.
- Raw SQL against the federated connection (`_ktx_federated`) is rejected
when any member connection is restricted.
- Semantic-layer queries (`ktx sl query`, the `sl_query` tool) accept only
measures predefined in the semantic-layer sources. Composed aggregate
expressions such as `sum(orders.amount)` are rejected wherever they appear,
including inside `filters` (a `HAVING`-style clause may only compare a
predefined measure by name, e.g. `orders.revenue > 100`). Grouping by
declared dimensions, filtering on columns, and segments remain available.
- `connection_list` marks the connection as restricted so agents route to
`sl_query` instead of spending a raw-SQL call that is guaranteed to fail.
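
The project-level effects in the list above can be sketched in Python. This is an illustration under stated assumptions: the function names, the dict-shaped connections, and the `restricted` field in the listing are hypothetical, not the actual tool output.

```python
def is_restricted(connection: dict) -> bool:
    """True if the connection opts into semantic-layer-only."""
    return connection.get("query_policy", "read-only-sql") == "semantic-layer-only"


def should_register_sql_execution(connections: dict) -> bool:
    """Register the raw-SQL tool only if some connection still allows raw SQL."""
    return any(not is_restricted(c) for c in connections.values())


def list_connections(connections: dict) -> list:
    """Mark each connection so an agent can pick sl_query up front."""
    return [
        {"id": cid, "restricted": is_restricted(c)}
        for cid, c in connections.items()
    ]
```

With one restricted and one unrestricted connection, the raw-SQL tool stays registered and only the restricted connection is marked.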
The policy governs agent-facing query authorship, not data access: **ktx**'s
own scan, ingest, and semantic-layer-generated SQL still run, and context
tools such as `entity_details` and `dictionary_search` still expose schema
metadata and sampled values.
### Metabase
```yaml