feat(query-history): scope mining to modeled schemas by default (#258)

* feat(query-history): structure SQL analysis table refs
* feat(query-history): qualify SQL analysis table refs
* feat(query-history): wire modeled scope floor through ingest
* chore(query-history): verify scope floor
* test(query-history): align daemon SQL batch endpoint contract
* feat(query-history): build scope from same-run scan catalog
* feat(query-history): fail open on scope-floor catalog failures
* chore(query-history): verify scope-floor v1 closure
* refactor(query-history): share scope membership
* feat(setup): apply derived query history filters
* docs: document derived query history filters
* fix(query-history): redact filter picker LLM prompt SQL
* fix(setup): run filter picker SQL analysis through managed daemon
* chore(query-history): verify filter picker v1 closure
* fix(query-history): fail open on partial service-account attribution
* fix(query-history): aggregate BigQuery users by execution count
* fix(query-history): aggregate Snowflake users by execution count
* fix(query-history): use BigQuery query info hash
2026-06-19 08:28:06 +02:00 · 2026-06-03 17:19:42 +02:00 · 2026-06-03 17:19:42 +02:00 · e70ae1e63b
commit e70ae1e63b
parent ce1516b357
42 changed files with 3090 additions and 274 deletions
--- a/docs-site/content/docs/cli-reference/ktx-setup.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-setup.mdx
@@ -148,6 +148,13 @@ fix the prerequisite. If the later schema-context build also fails, interactive
 setup offers **Disable query history and retry** so you can finish database
 setup with `connections.<id>.context.queryHistory.enabled: false`.

+After the schema scan completes, setup can derive query-history service-account
+filters from in-scope history. If **ktx** finds clear operational roles, it
+prints each proposed exclusion with a reason and writes
+`connections.<id>.context.queryHistory.filters.serviceAccounts` only when you
+apply the proposal. In non-interactive setup with `--yes`, the proposal is
+applied automatically. Existing `serviceAccounts` blocks are never overwritten.
+
 For BigQuery, the remediation tells you to grant `roles/bigquery.resourceViewer`
 on the BigQuery project, or grant a custom role that contains
 `bigquery.jobs.listAll`.
--- a/docs-site/content/docs/configuration/ktx-yaml.mdx
+++ b/docs-site/content/docs/configuration/ktx-yaml.mdx
@@ -179,9 +179,22 @@ connections:
    context:
      queryHistory:
        enabled: true
+        enabledSchemas:
+          - orbit_raw
+          - orbit_analytics
        minExecutions: 5
 ```

+- `enabledSchemas`: Optional list of schema or dataset names that query-history
+  ingest may mine. Omit it to let **ktx** derive the modeled schema floor from
+  the connection and semantic-layer sources. Use `["*"]` to disable the floor
+  for discovery runs.
+- `filters.serviceAccounts`: Optional service-account filter block. During
+  setup, when query history is enabled and no service-account block already
+  exists, **ktx** can propose exact role patterns such as `^svc_loader$` from
+  observed in-scope query history. The block uses `mode: exclude` and remains
+  hand-editable.
+
 ### Metabase

 ```yaml
--- a/docs-site/content/docs/guides/building-context.mdx
+++ b/docs-site/content/docs/guides/building-context.mdx
@@ -57,7 +57,10 @@ isolation.
 ## Query history

 PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins,
-filters, service-account patterns, redaction rules, and high-usage templates.
+filters, redaction rules, high-usage templates, and service-account exclusions.
+When query history is enabled during setup, **ktx** reviews observed in-scope
+roles and can write exact `filters.serviceAccounts` patterns for operational
+traffic such as loader or refresh roles.

 Enable it during setup, store it under `connections.<id>.context.queryHistory`,
 or request it for one run: