From aa413501ffb8b7ce4f818f28947d10950941a592 Mon Sep 17 00:00:00 2001
From: Andrey Avtomonov
Date: Wed, 13 May 2026 19:34:09 +0200
Subject: [PATCH] docs: update setup and primary source ingest wording

---
 .../content/docs/cli-reference/ktx-setup.mdx |  15 ++-
 .../docs/getting-started/quickstart.mdx      |  58 +++++-----
 .../docs/integrations/primary-sources.mdx    | 109 +++++++++---------
 3 files changed, 94 insertions(+), 88 deletions(-)

diff --git a/docs-site/content/docs/cli-reference/ktx-setup.mdx b/docs-site/content/docs/cli-reference/ktx-setup.mdx
index f490988a..fb4ba545 100644
--- a/docs-site/content/docs/cli-reference/ktx-setup.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-setup.mdx
@@ -63,17 +63,16 @@ ktx setup [options]
 | `--database-schema ` | Database schema to include; repeatable | — |
 | `--skip-databases` | Leave database setup incomplete | `false` |
 
-### Historic SQL
+### Query history
 
 | Flag | Description | Default |
 |------|-------------|---------|
-| `--enable-historic-sql` | Enable Historic SQL when the selected database supports it | `false` |
-| `--disable-historic-sql` | Disable Historic SQL for the selected database | `false` |
-| `--historic-sql-window-days ` | Historic SQL query-history window in days | — |
-| `--historic-sql-min-executions ` | Minimum executions for a Historic SQL template | — |
-| `--historic-sql-min-calls ` | Alias for `--historic-sql-min-executions` for one release | — |
-| `--historic-sql-service-account-pattern ` | Historic SQL service-account regex; repeatable | — |
-| `--historic-sql-redaction-pattern ` | Historic SQL SQL-literal redaction regex; repeatable | — |
+| `--enable-query-history` | Enable query history when the selected database supports it | `false` |
+| `--disable-query-history` | Disable query history for the selected database | `false` |
+| `--query-history-window-days ` | Query-history lookback window in days | — |
+| `--query-history-min-executions ` | Minimum executions for a query-history template | — |
+| `--query-history-service-account-pattern ` | Query-history service-account regex; repeatable | — |
+| `--query-history-redaction-pattern ` | Query-history SQL-literal redaction regex; repeatable | — |
 
 ### Context Source Configuration
 
diff --git a/docs-site/content/docs/getting-started/quickstart.mdx b/docs-site/content/docs/getting-started/quickstart.mdx
index 7aba00fd..a8e416c5 100644
--- a/docs-site/content/docs/getting-started/quickstart.mdx
+++ b/docs-site/content/docs/getting-started/quickstart.mdx
@@ -81,7 +81,8 @@ ktx dev runtime start --feature local-embeddings
 
 ## Step 3: Connect a database
 
-Select one or more databases for KTX to scan. The wizard supports SQLite, PostgreSQL, MySQL, ClickHouse, SQL Server, BigQuery, and Snowflake.
+Select one or more databases for KTX to connect to. The wizard supports
+SQLite, PostgreSQL, MySQL, ClickHouse, SQL Server, BigQuery, and Snowflake.
 
 For PostgreSQL, you can enter connection details field by field or paste a connection URL:
 
@@ -93,22 +94,27 @@ For PostgreSQL, you can enter connection details field by field or paste a conne
 
 If your URL contains credentials, KTX saves it to `.ktx/secrets/` and writes a `file:` reference in `ktx.yaml`. You can also use `env:DATABASE_URL` to reference an environment variable.
 
-After connecting, KTX automatically runs a connection test and a structural scan:
+After connecting, KTX automatically runs a connection test and builds fast
+schema context:
 
 ```
-◇ Testing postgres-warehouse
-│ ✓ Connection test passed
-│ Driver: PostgreSQL · Tables: 42
-│
-◇ Scanning postgres-warehouse
-│ ✓ Structural scan completed
-│ Changes: 42 new tables
-│
-◇ Primary source ready
-│ postgres-warehouse · PostgreSQL · structural scan complete
+Testing postgres-warehouse
+  Connection test passed
+  Driver: PostgreSQL - Tables: 42
+
+Building schema context for postgres-warehouse
+  Running fast database ingest
+
+Schema context complete for postgres-warehouse
+  Changes: 42 new tables
+
+Primary source ready
+  postgres-warehouse - PostgreSQL - schema context complete
 ```
 
-For Snowflake and BigQuery, the wizard offers **Historic SQL** configuration for query history views. For PostgreSQL, enable Historic SQL with `--enable-historic-sql` when `pg_stat_statements` is configured.
+For PostgreSQL, Snowflake, and BigQuery, the wizard can enable query-history
+ingest when the warehouse history feature is available. Query-history settings
+are stored under `connections..context.queryHistory` in `ktx.yaml`.
 
 ## Step 4: Add context sources
 
@@ -138,7 +144,8 @@ Context sources are saved to `ktx.yaml` and built during the next step.
 
 ## Step 5: Build context
 
-This is where KTX does the heavy lifting. It runs an enriched scan of your database (generating AI-powered column and table descriptions) and ingests metadata from any configured context sources.
+This is where KTX builds agent-ready context. It uses the database context
+depth saved by setup and ingests metadata from any configured context sources.
 
 ```
 ◆ Build KTX context for agents?
@@ -146,19 +153,14 @@ This is where KTX does the heavy lifting. It runs an enriched scan of your datab
 │ ○ Leave context unbuilt and exit setup
 ```
 
-The build scans each primary source with LLM enrichment, detects table relationships, and runs ingestion agents that reconcile metadata from your context sources into semantic-layer YAML files and wiki pages.
+Fast database context builds deterministic schema grounding. Deep database
+context also generates AI descriptions, embeddings, and relationship evidence
+when those capabilities are configured.
 
-For a small database (under 50 tables), this takes a few minutes. Larger warehouses can take longer. You can press d to detach and let it run in the background:
-
-```
-KTX context build
-Run: setup-context-local-abc123
-Project: /home/user/analytics
-
-Detach: press d to leave this running.
-Resume: ktx setup --project-dir /home/user/analytics
-Status: ktx status --project-dir /home/user/analytics
-```
+For a small database (under 50 tables), this can take a few minutes. Larger
+warehouses can take longer. Context builds run in the foreground; press
+Ctrl+C to stop the current run and rerun `ktx setup` or `ktx ingest`
+when you are ready to try again.
 
 When the build completes, KTX verifies that agent-ready context was produced:
 
@@ -166,7 +168,7 @@ When the build completes, KTX verifies that agent-ready context was produced:
 KTX context is ready for agents.
 
 Primary sources:
-  postgres-warehouse: enriched scan complete
+  postgres-warehouse: deep context complete
 
 Context sources:
   dbt-main: memory update complete
@@ -246,7 +248,7 @@ Agent integration ready: yes (claude-code:project)
 
 ## Next steps
 
-- **Build more context** — learn about [scanning](/docs/guides/building-context), relationship detection, and ingestion workflows in the Building Context guide.
+- **Build more context** — learn about [database ingest](/docs/guides/building-context), relationship detection, and source ingestion workflows in the Building Context guide.
 - **Refine your semantic layer** — the [Writing Context](/docs/guides/writing-context) guide covers source YAML, measures, joins, and wiki pages.
 - **Understand the architecture** — read [The Context Layer](/docs/concepts/the-context-layer) to learn why a context layer is more than a semantic layer.
 - **Connect more agents** — see the [Agent Clients](/docs/integrations/agent-clients) integration page for per-tool setup details.
diff --git a/docs-site/content/docs/integrations/primary-sources.mdx b/docs-site/content/docs/integrations/primary-sources.mdx
index 94dc4e44..62b45fcf 100644
--- a/docs-site/content/docs/integrations/primary-sources.mdx
+++ b/docs-site/content/docs/integrations/primary-sources.mdx
@@ -3,13 +3,17 @@ title: Primary Sources
 description: Connect KTX to PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, or SQLite.
 ---
 
-KTX connects to your data warehouse or database to scan schemas, discover relationships, and execute semantic layer queries. Each connection is defined in `ktx.yaml` under the `connections` key.
+KTX connects to your data warehouse or database to build schema context,
+discover relationships, and execute semantic layer queries. Each connection is
+defined in `ktx.yaml` under the `connections` key.
 
 All connectors share these conventions:
 
-- Sensitive values support `env:VAR_NAME` (read from environment) and `file:/path/to/secret` (read from file) references
-- Connections are read-only — KTX never writes to your database
-- Schema scanning discovers tables, columns, types, and constraints automatically
+- Sensitive values support `env:VAR_NAME` (read from environment) and
+  `file:/path/to/secret` (read from file) references
+- Connections are read-only; KTX never writes to your database
+- Database ingest discovers tables, columns, types, and constraints
+  automatically
 
 ## Connection field reference
 
@@ -22,12 +26,12 @@ Agents should prefer environment or file references over literal secrets.
 | `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, ClickHouse, SQL Server | Field-by-field connection values |
 | `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
 | `readonly` | Strongly recommended | all primary sources | Marks the connection as read-only in KTX config |
-| `historicSql` | No | supported warehouses | Enables query-history ingestion when the warehouse supports it |
+| `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it |
 | `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
 
 ## PostgreSQL
 
-The most full-featured connector. Supports schema introspection, foreign key detection, column statistics, and historic SQL via `pg_stat_statements`.
+The most full-featured connector. Supports schema introspection, foreign key detection, column statistics, and query history via `pg_stat_statements`.
 
 ### Connection config
 
@@ -75,12 +79,13 @@ connections:
 | Foreign keys | Yes | Full constraint detection |
 | Row count estimates | Yes | Via `pg_class.reltuples` |
 | Column statistics | Yes | Requires `pg_read_all_stats` role |
-| Historic SQL | Yes | Via `pg_stat_statements` extension |
+| Query history | Yes | Via `pg_stat_statements` extension |
 | Table sampling | Yes | `TABLESAMPLE SYSTEM` |
 
-### Historic SQL
+### Query history
 
-PostgreSQL Historic SQL mines real query patterns from `pg_stat_statements`. This is the most mature local Historic SQL path and helps KTX understand how your team actually queries the data.
+PostgreSQL query history mines real query patterns from `pg_stat_statements`.
+This helps KTX understand how your team actually queries the data.
 
 **Requirements:**
 - `pg_stat_statements` extension enabled
@@ -89,12 +94,12 @@ PostgreSQL Historic SQL mines real query patterns from `pg_stat_statements`. Thi
 **Config options:**
 
 ```yaml
-historicSql:
-  enabled: true
-  dialect: postgres
-  minExecutions: 5
-  filters:
-    dropTrivialProbes: true
+  context:
+    queryHistory:
+      enabled: true
+      minExecutions: 5
+      filters:
+        dropTrivialProbes: true
 ```
 
 ### Dialect notes
@@ -108,7 +113,7 @@ historicSql:
 
 ## Snowflake
 
-Connects via the Snowflake SDK. Supports multi-schema scanning, RSA key authentication, and Historic SQL configuration for Snowflake query history.
+Connects via the Snowflake SDK. Supports multi-schema scanning, RSA key authentication, and query-history ingestion from `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY`.
 
 ### Connection config
 
@@ -151,27 +156,27 @@ For multiple schemas:
 | Foreign keys | No | Not available in Snowflake |
 | Row count estimates | Yes | From `INFORMATION_SCHEMA.TABLES.ROW_COUNT` |
 | Column statistics | No | — |
-| Historic SQL | Yes | Via `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` when enabled |
+| Query history | Yes | Via `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` when enabled |
 | Table sampling | Yes | — |
 
-### Historic SQL
+### Query history
 
-Snowflake Historic SQL reads aggregated query-history templates from
+Snowflake query history reads aggregated query-history templates from
 `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` and feeds the same unified staged
 artifact shape as Postgres and BigQuery.
 
 ```yaml
-historicSql:
-  enabled: true
-  dialect: snowflake
-  windowDays: 90
-  minExecutions: 5
-  filters:
-    dropTrivialProbes: true
-    serviceAccounts:
-      patterns: ['^svc_']
-      mode: exclude
-  redactionPatterns: []
+  context:
+    queryHistory:
+      enabled: true
+      windowDays: 90
+      minExecutions: 5
+      filters:
+        dropTrivialProbes: true
+        serviceAccounts:
+          patterns: ['^svc_']
+          mode: exclude
+      redactionPatterns: []
 ```
 
 ### Dialect notes
@@ -185,7 +190,7 @@ historicSql:
 
 ## BigQuery
 
-Authenticates via GCP service account credentials. Supports multi-dataset scanning and Historic SQL configuration for `INFORMATION_SCHEMA.JOBS_BY_PROJECT`.
+Authenticates via GCP service account credentials. Supports multi-dataset scanning and query-history ingestion from region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT`.
 
 ### Connection config
 
@@ -226,27 +231,27 @@ The project ID is extracted automatically from the service account JSON file.
 | Foreign keys | No | Not available in BigQuery |
 | Row count estimates | Yes | From table metadata |
 | Column statistics | No | — |
-| Historic SQL | Yes | Via region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` when enabled |
+| Query history | Yes | Via region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` when enabled |
 | Table sampling | Yes | — |
 
-### Historic SQL
+### Query history
 
-BigQuery Historic SQL reads aggregated query-history templates from
+BigQuery query history reads aggregated query-history templates from
 region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` and feeds the same unified
 staged artifact shape as Postgres and Snowflake.
 
 ```yaml
-historicSql:
-  enabled: true
-  dialect: bigquery
-  windowDays: 90
-  minExecutions: 5
-  filters:
-    dropTrivialProbes: true
-    serviceAccounts:
-      patterns: ['@bot\\.']
-      mode: exclude
-  redactionPatterns: []
+  context:
+    queryHistory:
+      enabled: true
+      windowDays: 90
+      minExecutions: 5
+      filters:
+        dropTrivialProbes: true
+        serviceAccounts:
+          patterns: ['@bot\.']
+          mode: exclude
+      redactionPatterns: []
 ```
 
 ### Dialect notes
@@ -304,7 +309,7 @@ connections:
 | Foreign keys | No | Not a ClickHouse concept |
 | Row count estimates | Yes | Via `system.parts` aggregation |
 | Column statistics | No | — |
-| Historic SQL | No | — |
+| Query history | No | — |
 | Table sampling | Yes | — |
 
 ### Dialect notes
@@ -363,7 +368,7 @@ connections:
 | Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
 | Row count estimates | Yes | From `TABLE_ROWS` (InnoDB estimate) |
 | Column statistics | No | — |
-| Historic SQL | No | — |
+| Query history | No | — |
 | Table sampling | Yes | Uses `RAND()` filter |
 
 ### Dialect notes
@@ -431,7 +436,7 @@ For multiple schemas:
 | Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
 | Row count estimates | Yes | Via `sys.dm_db_partition_stats` |
 | Column statistics | No | — |
-| Historic SQL | No | — |
+| Query history | No | — |
 | Table sampling | Yes | — |
 | Nested analysis | No | — |
 
@@ -490,7 +495,7 @@ No authentication required — SQLite is file-based. The file must be readable b
 | Foreign keys | Yes | Via `PRAGMA foreign_key_list()` (requires `PRAGMA foreign_keys = ON`) |
 | Row count estimates | Yes | Exact count via `SELECT COUNT(*)` |
 | Column statistics | No | — |
-| Historic SQL | No | — |
+| Query history | No | — |
 | Table sampling | Yes | — |
 | Nested analysis | No | — |
 
@@ -508,7 +513,7 @@ No authentication required — SQLite is file-based. The file must be readable b
 | Error or symptom | Likely cause | Recovery |
 |------------------|--------------|----------|
 | Connection URL appears in git diff | A literal credential URL was written to `ktx.yaml` | Replace it with `env:NAME` or `file:/path/to/secret` and rotate exposed credentials |
-| Scan returns no tables | Schema/database/project filter is wrong or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
-| Historic SQL is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun scan or setup |
-| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on structural scan output |
+| Database ingest returns no tables | Schema, database, or project filter is wrong, or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
+| Query history is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun `ktx ingest --query-history` or `ktx setup` |
+| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on fast schema context |
 | Semantic query execution fails | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test ` and check the `ktx sl query` flags |
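Scripts that still pass the old `--historic-sql-*` flags to `ktx setup` can be rewritten mechanically using the rename table in `ktx-setup.mdx`. The following is a minimal sketch with a hypothetical helper, `migrate_setup_args`, that is not part of KTX. It assumes each old flag maps one-to-one onto its `--query-history-*` counterpart, and it folds the removed `--historic-sql-min-calls` alias into `--query-history-min-executions`. The patch does not say whether KTX still accepts the old names.

```python
import shlex

# Pre-patch `ktx setup` flags mapped to the names introduced by this patch.
# --historic-sql-min-calls was a one-release alias for
# --historic-sql-min-executions, so both map to the same new flag.
FLAG_RENAMES = {
    "--enable-historic-sql": "--enable-query-history",
    "--disable-historic-sql": "--disable-query-history",
    "--historic-sql-window-days": "--query-history-window-days",
    "--historic-sql-min-executions": "--query-history-min-executions",
    "--historic-sql-min-calls": "--query-history-min-executions",
    "--historic-sql-service-account-pattern": "--query-history-service-account-pattern",
    "--historic-sql-redaction-pattern": "--query-history-redaction-pattern",
}


def migrate_setup_args(argv: list[str]) -> list[str]:
    """Return a copy of argv with old query-history flag names replaced.

    Handles both `--flag value` and `--flag=value`; any argument that is
    not an old flag name is passed through unchanged.
    """
    migrated = []
    for arg in argv:
        # Split `--flag=value` at the first "="; plain args have an empty sep.
        name, sep, value = arg.partition("=")
        new_name = FLAG_RENAMES.get(name)
        migrated.append(arg if new_name is None else new_name + sep + value)
    return migrated


if __name__ == "__main__":
    old = shlex.split(
        "ktx setup --enable-historic-sql --historic-sql-min-calls 5 "
        "--historic-sql-window-days=90"
    )
    print(shlex.join(migrate_setup_args(old)))
```

Splitting on the first `=` preserves values that themselves contain `=`, which can occur in redaction regexes.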
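The same rename applies to `ktx.yaml`. Each connection's `historicSql` block moves to `context.queryHistory`, and the new YAML examples no longer include `dialect`. The following is a minimal sketch, assuming the file has already been parsed into plain Python dicts (for example, by a YAML library) and that no other keys changed. `migrate_query_history` is a hypothetical name, and the connection name in the usage is made up.

```python
import copy
from typing import Any


def migrate_query_history(config: dict[str, Any]) -> dict[str, Any]:
    """Move each connection's `historicSql` block to `context.queryHistory`.

    `dialect` is dropped because the new examples omit it; every other key
    (enabled, windowDays, minExecutions, filters, ...) is copied unchanged.
    The input dict is not modified.
    """
    migrated = copy.deepcopy(config)
    for conn in migrated.get("connections", {}).values():
        old = conn.pop("historicSql", None)
        if old is None:
            continue  # connection never had query history configured
        old.pop("dialect", None)
        conn.setdefault("context", {})["queryHistory"] = old
    return migrated


if __name__ == "__main__":
    config = {
        "connections": {
            "snowflake-prod": {  # hypothetical connection name
                "readonly": True,
                "historicSql": {"enabled": True, "dialect": "snowflake", "windowDays": 90},
            }
        }
    }
    print(migrate_query_history(config))
```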
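The following is a hypothetical illustration of how the `minExecutions` and `serviceAccounts` (`mode: exclude`) settings could apply to aggregated query-history templates. It is not KTX's implementation. The `QueryTemplate` shape is assumed, and patterns are assumed to use an unanchored regex search, consistent with the Snowflake example anchoring `'^svc_'` explicitly. `dropTrivialProbes` and redaction are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryTemplate:
    """One aggregated query-history template (hypothetical shape)."""
    sql: str
    user: str
    executions: int


def filter_templates(
    templates: list[QueryTemplate],
    min_executions: int,
    service_account_patterns: list[str],
) -> list[QueryTemplate]:
    """Apply minExecutions and serviceAccounts (mode: exclude) filters.

    A template is kept when it ran at least `min_executions` times and its
    user matches none of the service-account regexes. re.search is used, so
    a pattern matches anywhere in the user name unless it is anchored.
    """
    compiled = [re.compile(p) for p in service_account_patterns]
    return [
        t
        for t in templates
        if t.executions >= min_executions
        and not any(rx.search(t.user) for rx in compiled)
    ]


if __name__ == "__main__":
    templates = [
        QueryTemplate("select * from orders", "alice", 12),
        QueryTemplate("select * from orders", "svc_loader", 400),
        QueryTemplate("select * from users", "bob", 2),
        QueryTemplate("select * from events", "etl@bot.example.com", 30),
    ]
    kept = filter_templates(templates, 5, [r"^svc_", r"@bot\."])
    print([t.user for t in kept])
```

Only `alice` survives this run. `bob` falls below the execution threshold, and the other two users match a service-account pattern.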