mirror of https://github.com/Kaelio/ktx.git
synced 2026-06-10 08:05:14 +02:00
docs: align docs with current KTX behavior
parent 6bc8d200ea
commit 3f022148c4
10 changed files with 51 additions and 29 deletions
|
|
@ -29,14 +29,16 @@ connections when you use `--all`.
|
|||
| `--deep` | Use AI-enriched database ingest | Stored connection default, or `fast` |
|
||||
| `--query-history` | Include database query-history usage patterns | Stored connection default |
|
||||
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
|
||||
| `--query-history-window-days <days>` | Query-history lookback window for this run | Stored connection default |
|
||||
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
|
||||
| `--plain` | Print plain text output | `true` |
|
||||
| `--json` | Print JSON output | `false` |
|
||||
| `--no-input` | Disable interactive terminal input | — |
|
||||
|
||||
`--fast` and `--deep` are mutually exclusive. Depth flags apply only to
|
||||
database connections. Query-history flags apply only to database connections
|
||||
that support query history. Query-history ingest runs after schema ingest and
|
||||
that support query history. The window flag applies to BigQuery and Snowflake;
|
||||
Postgres reads the current `pg_stat_statements` aggregate data instead of a
|
||||
time-windowed history table. Query-history ingest runs after schema ingest and
|
||||
requires deep ingest readiness.
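
The flag rules in this hunk can be sketched as a small argument builder. This is a hypothetical helper, not part of KTX: the function name, the lowercase `driver` strings, and the choice to silently drop the window for drivers without a time-windowed history are assumptions. Only the flags themselves come from the table above.

```python
from typing import List, Optional

# Drivers whose query history is time-windowed, per the docs text above.
# The lowercase driver names are an assumption made for this sketch.
WINDOWED_DRIVERS = {"bigquery", "snowflake"}


def build_ingest_args(
    connection_id: str,
    driver: str,
    depth: Optional[str] = None,
    query_history: Optional[bool] = None,
    window_days: Optional[int] = None,
) -> List[str]:
    """Build an argv list for `ktx ingest <connection>` (hypothetical helper)."""
    if depth not in (None, "fast", "deep"):
        raise ValueError("depth must be 'fast', 'deep', or None")

    args = ["ktx", "ingest", connection_id]

    # A single `depth` value keeps --fast and --deep mutually exclusive.
    if depth is not None:
        args.append(f"--{depth}")

    # None leaves the stored connection default in effect.
    if query_history is True:
        args.append("--query-history")
    elif query_history is False:
        args.append("--no-query-history")

    # The window only applies to BigQuery and Snowflake; Postgres reads
    # pg_stat_statements aggregates, so the flag is omitted there.
    if window_days is not None and driver in WINDOWED_DRIVERS:
        args += ["--query-history-window-days", str(window_days)]

    return args
```

Passing `depth=None` and `query_history=None` leaves both choices to the stored connection default, which matches the "Default" column in the table.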
|
||||
|
||||
When `--all` selects both databases and context sources, database ingest runs
|
||||
|
|
@ -70,6 +72,7 @@ ktx ingest warehouse --deep
|
|||
|
||||
# Include query-history usage patterns
|
||||
ktx ingest warehouse --deep --query-history
|
||||
# Set the lookback window for BigQuery or Snowflake query history
|
||||
ktx ingest warehouse --query-history-window-days 30
|
||||
|
||||
# Build a source connection
|
||||
|
|
|
|||
|
|
@ -96,13 +96,16 @@ incomplete.
|
|||
|------|-------------|
|
||||
| `--enable-query-history` | Enable query-history ingest when the selected database supports it |
|
||||
| `--disable-query-history` | Disable query-history ingest for the selected database |
|
||||
| `--query-history-window-days <number>` | Query-history lookback window |
|
||||
| `--query-history-window-days <number>` | BigQuery/Snowflake query-history lookback window |
|
||||
| `--query-history-min-executions <number>` | Minimum executions for a query-history template |
|
||||
| `--query-history-service-account-pattern <pattern>` | Query-history service-account regex; repeatable |
|
||||
| `--query-history-redaction-pattern <pattern>` | Query-history SQL-literal redaction regex; repeatable |
|
||||
|
||||
Query history setup is supported for Postgres, BigQuery, and Snowflake. Enabling
|
||||
query history makes deep ingest readiness matter for later `ktx ingest` runs.
|
||||
Query history setup is supported for Postgres, BigQuery, and Snowflake. The
|
||||
window flag applies to BigQuery and Snowflake; Postgres reads the current
|
||||
`pg_stat_statements` aggregate data instead of a time-windowed history table.
|
||||
Enabling query history makes deep ingest readiness matter for later
|
||||
`ktx ingest` runs.
|
||||
|
||||
### Context Sources
|
||||
|
||||
|
|
|
|||
|
|
@ -289,7 +289,7 @@ my-project/
|
|||
│ └── data-quality-notes.md
|
||||
├── raw-sources/
|
||||
│ └── warehouse/
|
||||
│ └── database-ingest/ # Schema ingest artifacts and reports
|
||||
│ └── live-database/ # Schema ingest artifacts and reports
|
||||
└── .ktx/
|
||||
├── db.sqlite # Local state (git-ignored)
|
||||
└── cache/ # Runtime cache (git-ignored)
|
||||
|
|
|
|||
|
|
@ -60,7 +60,8 @@ Use KTX when you want agents to:
|
|||
- **Explain metric provenance** with warehouse evidence
|
||||
- **Work alongside** dbt, LookML, MetricFlow, Looker, Metabase, and modern BI platforms
|
||||
|
||||
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, and SQL Server.
|
||||
Works with SQLite, PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, and SQL
|
||||
Server.
|
||||
|
||||
## Explore the docs
|
||||
|
||||
|
|
|
|||
|
|
@ -51,8 +51,8 @@ For scripted setup, pass the project directory explicitly:
|
|||
ktx setup --project-dir ./analytics
|
||||
```
|
||||
|
||||
If setup exits early, rerun `ktx setup` in the same directory. KTX tracks
|
||||
completed setup steps and resumes from the remaining work.
|
||||
If setup exits early, rerun `ktx setup` in the same directory. KTX keeps local
|
||||
setup progress under `.ktx/setup/` and resumes from the remaining work.
|
||||
|
||||
## Step 2: Configure the LLM
|
||||
|
||||
|
|
@ -122,7 +122,8 @@ Database ready
|
|||
|
||||
PostgreSQL, BigQuery, and Snowflake can also enable query-history ingest. Query
|
||||
history helps KTX learn common query patterns, joins, service-account filters,
|
||||
and warehouse-specific usage.
|
||||
and warehouse-specific usage. BigQuery and Snowflake support a lookback window;
|
||||
Postgres reads the current `pg_stat_statements` aggregate data instead.
|
||||
|
||||
## Step 5: Add context sources
|
||||
|
||||
|
|
@ -200,7 +201,7 @@ KTX writes plain files so people and agents can inspect changes in git.
|
|||
|
||||
| Path | Purpose |
|
||||
|------|---------|
|
||||
| `ktx.yaml` | Project configuration for LLMs, embeddings, connections, context sources, and setup state |
|
||||
| `ktx.yaml` | Project configuration for LLMs, embeddings, connections, context sources, and query-history settings |
|
||||
| `.ktx/secrets/*` | Local secret files referenced from `ktx.yaml`; do not commit these |
|
||||
| `.ktx/setup/*` | Local setup and context-build state |
|
||||
| `.ktx/agents/install-manifest.json` | Manifest used to manage installed agent files |
|
||||
|
|
|
|||
|
|
@ -62,13 +62,15 @@ configured, run `ktx setup` or use `--fast`.
|
|||
|
||||
PostgreSQL, BigQuery, and Snowflake can add query-history context. This helps
|
||||
KTX learn common joins, filters, service-account patterns, redaction rules, and
|
||||
usage-heavy query templates.
|
||||
usage-heavy query templates. BigQuery and Snowflake support a lookback window;
|
||||
Postgres reads the current `pg_stat_statements` aggregate data instead.
|
||||
|
||||
Enable it during setup, store it under `connections.<id>.context.queryHistory`,
|
||||
or request it for one run:
|
||||
|
||||
```bash
|
||||
ktx ingest warehouse --deep --query-history
|
||||
# Set the lookback window for BigQuery or Snowflake query history
|
||||
ktx ingest warehouse --query-history-window-days 30
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@ -60,21 +60,25 @@ semantic-layer/<connection-id>/<source-name>.yaml
|
|||
|
||||
```yaml
|
||||
name: orders
|
||||
description: Customer orders with booked revenue.
|
||||
descriptions:
|
||||
user: Customer orders with booked revenue.
|
||||
table: public.orders
|
||||
grain:
|
||||
- order_id
|
||||
columns:
|
||||
- name: order_id
|
||||
type: string
|
||||
description: Unique order identifier.
|
||||
descriptions:
|
||||
user: Unique order identifier.
|
||||
- name: order_date
|
||||
type: time
|
||||
role: time
|
||||
description: Date the order was placed.
|
||||
descriptions:
|
||||
user: Date the order was placed.
|
||||
- name: total_amount
|
||||
type: number
|
||||
description: Booked order value in USD.
|
||||
descriptions:
|
||||
user: Booked order value in USD.
|
||||
measures:
|
||||
- name: total_revenue
|
||||
expr: SUM(total_amount)
|
||||
|
|
@ -85,7 +89,8 @@ measures:
|
|||
|
||||
```yaml
|
||||
name: orders
|
||||
description: Customer orders with line-item totals.
|
||||
descriptions:
|
||||
user: Customer orders with line-item totals.
|
||||
table: public.orders
|
||||
grain:
|
||||
- order_id
|
||||
|
|
@ -93,26 +98,31 @@ grain:
|
|||
columns:
|
||||
- name: order_id
|
||||
type: string
|
||||
description: Unique order identifier.
|
||||
descriptions:
|
||||
user: Unique order identifier.
|
||||
|
||||
- name: order_date
|
||||
type: time
|
||||
role: time
|
||||
description: Date the order was placed.
|
||||
descriptions:
|
||||
user: Date the order was placed.
|
||||
|
||||
- name: status
|
||||
type: string
|
||||
visibility: public
|
||||
description: Current order status.
|
||||
descriptions:
|
||||
user: Current order status.
|
||||
|
||||
- name: _etl_loaded_at
|
||||
type: time
|
||||
visibility: hidden
|
||||
description: Internal load timestamp.
|
||||
descriptions:
|
||||
user: Internal load timestamp.
|
||||
|
||||
- name: total_amount
|
||||
type: number
|
||||
description: Order total in USD.
|
||||
descriptions:
|
||||
user: Order total in USD.
|
||||
|
||||
measures:
|
||||
- name: total_revenue
|
||||
|
|
@ -149,9 +159,10 @@ joins:
|
|||
| Field | Required | Description |
|
||||
|-------|----------|-------------|
|
||||
| `name` | Yes | Source identifier. Use lowercase words and underscores. |
|
||||
| `descriptions` | No | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
|
||||
| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. |
|
||||
| `grain` | Yes | Columns that uniquely identify a row at the source grain. |
|
||||
| `columns` | No | Column definitions with type, role, visibility, and descriptions. |
|
||||
| `columns` | Yes | Non-empty column definitions with type, role, visibility, and descriptions. |
|
||||
| `measures` | No | Aggregation expressions such as `SUM`, `COUNT`, and `AVG`. |
|
||||
| `segments` | No | Named predicates agents can reuse. |
|
||||
| `joins` | No | Relationships to other semantic sources. |
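
The YAML examples in this commit move each scalar `description` under a `descriptions` map keyed by source. A minimal sketch of that rewrite on already-parsed data follows. The function name is hypothetical, and the diff does not say whether KTX still accepts the scalar `description` field, so treat this as an illustration of the shape change rather than a required migration.

```python
from typing import Any, Dict


def to_descriptions_map(node: Dict[str, Any], source_key: str = "user") -> Dict[str, Any]:
    """Return a copy of `node` with `description` moved to `descriptions[source_key]`.

    Hypothetical helper: works on plain dicts (for example, parsed YAML) and
    recurses into `columns`. The input is not modified.
    """
    result = {key: value for key, value in node.items() if key != "description"}

    if "description" in node:
        merged = dict(node.get("descriptions") or {})
        # Keep an existing entry for this source key rather than overwrite it.
        merged.setdefault(source_key, node["description"])
        result["descriptions"] = merged

    if isinstance(node.get("columns"), list):
        result["columns"] = [to_descriptions_map(col, source_key) for col in node["columns"]]

    return result
```

Defaulting `source_key` to `user` mirrors the examples in the diff; `dbt` and `ai` are the other keys the field table names.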
|
||||
|
|
@ -165,7 +176,7 @@ joins:
|
|||
| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean`. |
|
||||
| Column | `role` | No | Special role such as `time` for default time dimensions. |
|
||||
| Column | `visibility` | No | `public`, `internal`, or `hidden`. |
|
||||
| Column | `description` | Strongly recommended | Business meaning and usage notes. |
|
||||
| Column | `descriptions` | Strongly recommended | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
|
||||
| Measure | `name` | Yes | Queryable metric name. |
|
||||
| Measure | `expr` | Yes | SQL aggregation expression at the source grain. |
|
||||
| Measure | `filter` | No | SQL predicate applied only to this measure. |
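
The required-field rules in the two tables above can be checked with a short function over parsed YAML. This is a sketch under stated assumptions, not the logic behind `ktx sl validate`: the function name and message wording are invented for illustration, and only the rules visible in this diff are checked.

```python
from typing import Any, Dict, List

# Allowed values as listed in the field tables above.
COLUMN_TYPES = {"string", "number", "time", "boolean"}
VISIBILITIES = {"public", "internal", "hidden"}


def check_source(source: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one semantic source (hypothetical check)."""
    problems: List[str] = []

    if not source.get("name"):
        problems.append("name is required")

    # Exactly one of `table` or `sql` must be present.
    if ("table" in source) == ("sql" in source):
        problems.append("use exactly one of table or sql")

    if not source.get("grain"):
        problems.append("grain is required")

    columns = source.get("columns") or []
    if not columns:
        problems.append("columns must be non-empty")

    for col in columns:
        label = col.get("name", "<unnamed>")
        if col.get("type") not in COLUMN_TYPES:
            problems.append(f"column {label}: unsupported type")
        if "visibility" in col and col["visibility"] not in VISIBILITIES:
            problems.append(f"column {label}: unsupported visibility")

    for measure in source.get("measures") or []:
        if not measure.get("name") or not measure.get("expr"):
            problems.append("measure requires name and expr")

    return problems
```

An empty list means the source passes these checks only; it says nothing about whether the table, columns, or measure expressions are valid against the database.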
|
||||
|
|
|
|||
|
|
@ -75,7 +75,7 @@ Available commands:
|
|||
- `ktx status --json --project-dir /path/to/project`
|
||||
- `ktx sl list --json --project-dir /path/to/project`
|
||||
- `ktx sl search '<text>' --json --project-dir /path/to/project --connection-id '<id>'`
|
||||
- `ktx sl query --json --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --execute --max-rows 100`
|
||||
- `ktx sl query --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --format json --execute --max-rows 100`
|
||||
- `ktx wiki search '<query>' --json --project-dir /path/to/project --limit 10`
|
||||
```
|
||||
|
||||
|
|
@ -172,7 +172,7 @@ All supported agent clients call the same KTX CLI commands:
|
|||
| `ktx sl list --json` | List semantic-layer sources |
|
||||
| `ktx sl search <query> --json` | Search semantic-layer sources |
|
||||
| `ktx sl validate <source> --connection-id <id>` | Validate semantic source definitions |
|
||||
| `ktx sl query --json` | Execute a semantic-layer query when semantic compute is configured |
|
||||
| `ktx sl query --format json` | Execute a semantic-layer query when semantic compute is configured |
|
||||
|
||||
### Security constraints
|
||||
|
||||
|
|
|
|||
|
|
@ -34,8 +34,9 @@ automation flags documented in [`ktx setup`](/docs/cli-reference/ktx-setup).
|
|||
|
||||
| Path | Purpose |
|
||||
|------|---------|
|
||||
| `ktx.yaml` | Main project configuration for providers, embeddings, connections, source mappings, query history, and setup state |
|
||||
| `ktx.yaml` | Main project configuration for providers, embeddings, connections, source mappings, and query history |
|
||||
| `.ktx/secrets/*` | Local file-backed secrets when you choose file references during setup |
|
||||
| `.ktx/setup/*` | Local setup progress and context-build state |
|
||||
| `semantic-layer/<connection-id>/` | YAML semantic sources generated by database and source ingestion |
|
||||
| `wiki/` | Markdown business context, definitions, and ingested knowledge |
|
||||
| `.ktx/agents/install-manifest.json` | Manifest of agent integration files installed by `ktx setup --agents` |
|
||||
|
|
|
|||
|
|
@ -228,7 +228,7 @@ mapping metadata. The BigQuery connector still authenticates with the
|
|||
| Feature | Supported | Notes |
|
||||
|---------|-----------|-------|
|
||||
| Tables & views | Yes | Including materialized views and external tables |
|
||||
| Primary keys | No | - |
|
||||
| Primary keys | Yes | Via `INFORMATION_SCHEMA` table constraints when declared |
|
||||
| Foreign keys | No | Not available in BigQuery |
|
||||
| Row count estimates | Yes | From table metadata |
|
||||
| Column statistics | No | - |
|
||||
|
|
@ -500,7 +500,7 @@ No authentication required - SQLite is file-based. The file must be readable by
|
|||
- Uses `LIMIT X OFFSET Y` for pagination
|
||||
- SQLite type affinity system: `TEXT`, `NUMERIC`, `INTEGER`, `REAL`, `BLOB`
|
||||
- Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON`
|
||||
- In-memory databases supported with `path: ":memory:"` (for testing)
|
||||
- Database file must exist before `ktx connection test` or ingest runs
|
||||
|
||||
## Common errors
|
||||
|
||||
|
|
|
|||