Merge origin/main into merge-scan-into-ingest-v1

Andrey Avtomonov 2026-05-14 00:47:03 +02:00
commit 9131c82724
98 changed files with 3207 additions and 1007 deletions

@ -63,8 +63,7 @@ agents.
"connections": [
{
"id": "my-warehouse",
"driver": "postgres",
"readonly": false
"driver": "postgres"
}
]
}

@ -18,8 +18,6 @@ ktx setup [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--project-dir <path>` | KTX project directory | `KTX_PROJECT_DIR`, nearest `ktx.yaml`, or cwd |
| `--new` | Create a new KTX project before setup | `false` |
| `--existing` | Use an existing KTX project | `false` |
| `--yes` | Accept safe defaults in non-interactive setup | `false` |
| `--no-input` | Disable interactive terminal input | — |
@ -29,75 +27,11 @@ ktx setup [options]
|------|-------------|---------|
| `--agents` | Install agent integration only | `false` |
| `--target <target>` | Agent target (`claude-code`, `codex`, `cursor`, `opencode`, `universal`) | — |
| `--agent-scope <scope>` | Agent install scope (`project` or `global`) | `project` |
| `--project` | Install agent integration into the project scope | `false` |
| `--global` | Install agent integration into the global target scope (Claude Code and Codex only) | `false` |
| `--skip-agents` | Leave agent integration incomplete for now | `false` |
### LLM Configuration
| Flag | Description | Default |
|------|-------------|---------|
| `--anthropic-api-key-env <name>` | Environment variable containing the Anthropic API key | — |
| `--anthropic-api-key-file <path>` | File containing the Anthropic API key | — |
| `--anthropic-model <model>` | Anthropic model ID to validate and save | — |
| `--skip-llm` | Leave LLM setup incomplete for now | `false` |
### Embedding Configuration
| Flag | Description | Default |
|------|-------------|---------|
| `--embedding-backend <backend>` | Embedding backend (`openai` or `sentence-transformers`) | — |
| `--embedding-api-key-env <name>` | Environment variable containing the embedding provider API key | — |
| `--embedding-api-key-file <path>` | File containing the embedding provider API key | — |
| `--skip-embeddings` | Leave embedding setup incomplete for now | `false` |
### Database Configuration
| Flag | Description | Default |
|------|-------------|---------|
| `--database <driver>` | Database driver to configure; repeatable (`sqlite`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, `snowflake`) | — |
| `--database-connection-id <id>` | Existing or new connection id; repeatable | — |
| `--new-database-connection-id <id>` | Connection id for one new database connection | — |
| `--database-url <url>` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection | — |
| `--database-schema <schema>` | Database schema to include; repeatable | — |
| `--skip-databases` | Leave database setup incomplete | `false` |
### Query history
| Flag | Description | Default |
|------|-------------|---------|
| `--enable-query-history` | Enable query history when the selected database supports it | `false` |
| `--disable-query-history` | Disable query history for the selected database | `false` |
| `--query-history-window-days <number>` | Query-history lookback window in days | — |
| `--query-history-min-executions <number>` | Minimum executions for a query-history template | — |
| `--query-history-service-account-pattern <pattern>` | Query-history service-account regex; repeatable | — |
| `--query-history-redaction-pattern <pattern>` | Query-history SQL-literal redaction regex; repeatable | — |
### Context Source Configuration
| Flag | Description | Default |
|------|-------------|---------|
| `--source <type>` | Source connector type (`dbt`, `metricflow`, `metabase`, `looker`, `lookml`, `notion`) | — |
| `--source-connection-id <id>` | Connection id for source setup | — |
| `--source-path <path>` | Local source path for dbt, MetricFlow, or LookML | — |
| `--source-git-url <url>` | Git URL for dbt, MetricFlow, or LookML | — |
| `--source-branch <branch>` | Git branch for source setup | — |
| `--source-subpath <path>` | Repo subpath for source setup | — |
| `--source-auth-token-ref <ref>` | `env:` or `file:` credential ref for source repo auth | — |
| `--source-url <url>` | Source service URL for Metabase or Looker | — |
| `--source-api-key-ref <ref>` | `env:` or `file:` API key ref for Metabase or Notion | — |
| `--source-client-id <id>` | Looker client id | — |
| `--source-client-secret-ref <ref>` | `env:` or `file:` Looker client secret ref | — |
| `--source-warehouse-connection-id <id>` | Mapped warehouse connection id | — |
| `--source-project-name <name>` | dbt project name override | — |
| `--source-profiles-path <path>` | dbt profiles path | — |
| `--source-target <target>` | dbt target or source-specific mapping target | — |
| `--metabase-database-id <id>` | Metabase database id to map | — |
| `--notion-crawl-mode <mode>` | Notion crawl mode (`all_accessible` or `selected_roots`) | — |
| `--notion-root-page-id <id>` | Notion root page id; repeatable | — |
| `--skip-initial-source-ingest` | Validate source setup without building source context during setup | `false` |
| `--skip-sources` | Mark optional source setup complete with no sources | `false` |
The setup wizard is the public configuration interface. It prompts for LLM
credentials, embeddings, database connections, context sources, query history,
and agent integration when those values are needed.
## Examples
@ -105,17 +39,8 @@ ktx setup [options]
# Run the interactive setup wizard
ktx setup
# Create a new project and run setup
ktx setup --new
# Resume setup in an existing project
ktx setup --existing
# Non-interactive setup with Anthropic key from environment
ktx setup --yes --anthropic-api-key-env ANTHROPIC_API_KEY
# Set up a Postgres connection
ktx setup --database postgres --database-url "env:DATABASE_URL"
# Run setup for a specific project directory
ktx setup --project-dir ./analytics
# Install agent integration for Claude Code only
ktx setup --agents --target claude-code
@ -123,12 +48,6 @@ ktx setup --agents --target claude-code
# Install agent integration globally for Codex
ktx setup --agents --target codex --global
# Add a dbt source from a local path
ktx setup --source dbt --source-path ./my-dbt-project
# Skip optional steps for a minimal setup
ktx setup --skip-sources --skip-agents
# Check setup readiness
ktx status
```
@ -155,5 +74,5 @@ Agent integration ready: yes (codex:project)
|-------|-------|----------|
| Setup resumes an unexpected project | `KTX_PROJECT_DIR` or nearest `ktx.yaml` points to another directory | Pass `--project-dir <path>` explicitly |
| Health check for model fails | Provider key or model id is invalid | Set the correct environment variable or secret file and rerun setup |
| Setup cannot run in CI | Interactive prompts need a TTY | Use `--yes --no-input` with explicit flags for required values |
| Setup cannot run in CI | Interactive prompts need a TTY | Run setup interactively before CI, or provide a fixture `ktx.yaml` for automated tests |
| Agent integration missing | Setup skipped the agents step | Run `ktx setup --agents --target <target>` |

@ -244,7 +244,7 @@ Agent integration ready: yes (claude-code:project)
| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx dev runtime status`, then run `ktx dev runtime install --feature local-embeddings --yes` and rerun setup |
| Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx setup` and reconfigure the connection |
| `KTX context built: no` in `ktx status` | Setup saved configuration but did not build context | Run `ktx setup` and choose to build context now |
| Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex --project` using the target you need |
| Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex` using the target you need |
## Next steps

@ -24,7 +24,6 @@ Agents must configure and ingest context sources in this order:
| Field | Required | Description |
|-------|----------|-------------|
| `driver` | Yes | Source adapter: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, or `notion` |
| `readonly` | Strongly recommended | Marks the source as read-only for KTX |
| `source_dir` | For local file sources | Absolute or project-relative source directory |
| `repo_url` | For Git-hosted sources | Git repository URL |
| `branch` | No | Git branch to read |
@ -50,7 +49,6 @@ connections:
my-dbt:
driver: dbt
source_dir: /path/to/dbt/project
readonly: true
```
For a Git-hosted project:
@ -63,7 +61,6 @@ connections:
branch: main
path: analytics/dbt # For monorepos
auth_token_ref: env:GITHUB_TOKEN
readonly: true
```
### Authentication
@ -111,7 +108,6 @@ connections:
branch: main
path: dbt_metrics # Subdirectory for monorepos
auth_token_ref: env:GITHUB_TOKEN
readonly: true
```
For a local path:
@ -158,7 +154,6 @@ connections:
branch: main
path: analytics # Subdirectory for monorepos
auth_token_ref: env:GITHUB_TOKEN
readonly: true
```
For a local path:
@ -220,7 +215,6 @@ connections:
syncEnabled:
"3": true
syncMode: ONLY # Only ingest mapped databases
readonly: true
```
### Authentication
@ -277,7 +271,6 @@ connections:
mappings:
connectionMappings:
postgres_connection: postgres-main # Looker conn → KTX conn
readonly: true
```
### Authentication
@ -330,7 +323,6 @@ connections:
crawl_mode: selected_roots
root_page_ids:
- "abc123def456..."
readonly: true
```
For crawling all accessible pages:
@ -341,7 +333,6 @@ connections:
driver: notion
auth_token_ref: env:NOTION_TOKEN
crawl_mode: all_accessible
readonly: true
```
### Authentication

@ -25,7 +25,6 @@ Agents should prefer environment or file references over literal secrets.
| `url` | One of the connection methods | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` |
| `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, ClickHouse, SQL Server | Field-by-field connection values |
| `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
| `readonly` | Strongly recommended | all primary sources | Marks the connection as read-only in KTX config |
| `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it |
| `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
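
The `env:NAME` and `file:/path/to/secret` forms in the table above keep literal secrets out of the config file. The sketch below shows one way such a reference could be resolved at load time. It is an illustration only: `resolve_ref` is a hypothetical helper, not a KTX function, and trimming whitespace and expanding `~` are assumptions rather than documented KTX behavior.

```python
import os
from pathlib import Path


def resolve_ref(value: str) -> str:
    """Resolve an `env:NAME` or `file:/path` reference (hypothetical helper).

    Any value without one of those prefixes is returned unchanged, so a
    literal URL still works.
    """
    if value.startswith("env:"):
        name = value[len("env:"):]
        try:
            return os.environ[name]
        except KeyError:
            # Fail loudly instead of silently connecting with an empty value.
            raise ValueError(f"environment variable {name!r} is not set") from None
    if value.startswith("file:"):
        # Assumption: `~` is expanded and surrounding whitespace is trimmed.
        path = Path(value[len("file:"):]).expanduser()
        return path.read_text(encoding="utf-8").strip()
    return value
```

With `DATABASE_URL` exported in the shell, `resolve_ref("env:DATABASE_URL")` returns that variable's value, while a plain string passes through untouched.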
@ -39,9 +38,8 @@ The most full-featured connector. Supports schema introspection, foreign key det
connections:
my-postgres:
driver: postgres
url: postgresql://user:password@host:5432/database
url: env:DATABASE_URL
schema: public
readonly: true
```
Or with individual fields:
@ -59,7 +57,6 @@ connections:
- public
- analytics
ssl: true
readonly: true
```
### Authentication
@ -128,7 +125,6 @@ connections:
username: KTX_SERVICE
password: env:SNOWFLAKE_PASSWORD
role: ANALYST
readonly: true
```
For multiple schemas:
@ -201,7 +197,6 @@ connections:
credentials_json: file:~/.config/gcloud/bq-service-account.json
dataset_id: analytics
location: US
readonly: true
```
For multiple datasets:
@ -274,7 +269,6 @@ connections:
my-clickhouse:
driver: clickhouse
url: http://localhost:8123/analytics
readonly: true
```
Or with individual fields:
@ -289,7 +283,6 @@ connections:
username: default
password: env:CH_PASSWORD
ssl: false
readonly: true
```
### Authentication
@ -332,8 +325,7 @@ Standard MySQL/MariaDB connector with full foreign key support and schema intros
connections:
my-mysql:
driver: mysql
url: mysql://user:password@host:3306/database
readonly: true
url: env:MYSQL_DATABASE_URL
```
Or with individual fields:
@ -348,7 +340,6 @@ connections:
username: ktx_reader
password: env:MYSQL_PASSWORD
ssl: true
readonly: true
```
### Authentication
@ -391,8 +382,7 @@ Connects to Microsoft SQL Server and Azure SQL. Supports multi-schema scanning w
connections:
my-sqlserver:
driver: sqlserver
url: mssql://user:password@host:1433/database?trustServerCertificate=true
readonly: true
url: env:SQLSERVER_DATABASE_URL
```
Or with individual fields:
@ -408,7 +398,6 @@ connections:
password: env:MSSQL_PASSWORD
schema: dbo
trustServerCertificate: true
readonly: true
```
For multiple schemas:
@ -460,7 +449,6 @@ connections:
my-sqlite:
driver: sqlite
path: ./data/warehouse.sqlite
readonly: true
```
Path supports multiple formats: