mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-19 08:28:06 +02:00
chore: move docs site workspace
This commit is contained in:
parent
0ae9b6effd
commit
a46563bb01
52 changed files with 3 additions and 3 deletions
279
docs-site/content/docs/integrations/agent-clients.mdx
Normal file
|
|
@@ -0,0 +1,279 @@
|
|||
---
|
||||
title: Agent Clients
|
||||
description: Set up KTX with Claude Code, Cursor, Codex, and OpenCode.
|
||||
---
|
||||
|
||||
KTX integrates with coding agents through two channels that can be used independently or together:
|
||||
|
||||
- **MCP server** — A persistent Model Context Protocol server that exposes KTX tools (semantic queries, knowledge search, SQL execution) directly to the agent
|
||||
- **CLI skills** — Command definitions that teach the agent how to invoke KTX via the terminal
|
||||
|
||||
Run `ktx setup` and select your agent targets, or configure manually using the snippets below.
|
||||
|
||||
## Claude Code
|
||||
|
||||
### Install via `ktx setup`
|
||||
|
||||
During setup, select **Claude Code** from the agent targets. KTX writes:
|
||||
|
||||
| Mode | File |
|
||||
|------|------|
|
||||
| CLI skills | `.claude/skills/ktx/SKILL.md` |
|
||||
| MCP server | `.mcp.json` (under `mcpServers.ktx`) |
|
||||
|
||||
Both project-scoped and global installations are supported. Global installs write to `~/.claude/skills/ktx/SKILL.md`.
|
||||
|
||||
### Manual MCP configuration
|
||||
|
||||
Add KTX to `.mcp.json` at your project root:
|
||||
|
||||
```json title=".mcp.json"
|
||||
{
|
||||
"mcpServers": {
|
||||
"ktx": {
|
||||
"command": "ktx",
|
||||
"args": [
|
||||
"--project-dir", "/path/to/ktx-project",
|
||||
"serve",
|
||||
"--mcp", "stdio",
|
||||
"--semantic-compute",
|
||||
"--execute-queries"
|
||||
],
|
||||
"env": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Replace `/path/to/ktx-project` with your KTX project directory. For a pinned local checkout, set `command` to the absolute path of the built CLI and use the arguments generated by `ktx setup`.
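
If `.mcp.json` already lists other servers, hand-editing risks overwriting them. The following is a minimal Python sketch, not part of KTX, that adds or replaces only the `mcpServers.ktx` entry from the snippet above. The helper names `ktx_server_entry` and `upsert_ktx_server` are hypothetical.

```python
import json
from pathlib import Path


def ktx_server_entry(project_dir: str) -> dict:
    """Build the `mcpServers.ktx` entry with the arguments documented above."""
    return {
        "command": "ktx",
        "args": [
            "--project-dir", project_dir,
            "serve",
            "--mcp", "stdio",
            "--semantic-compute",
            "--execute-queries",
        ],
        "env": {},
    }


def upsert_ktx_server(config_path: Path, project_dir: str) -> dict:
    """Add or replace only the `ktx` server; other servers are left untouched."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})["ktx"] = ktx_server_entry(project_dir)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `upsert_ktx_server(Path(".mcp.json"), "/path/to/ktx-project")` from the project root would write the same structure as the manual snippet.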
|
||||
|
||||
### Manual CLI skills configuration
|
||||
|
||||
Create `.claude/skills/ktx/SKILL.md`:
|
||||
|
||||
```markdown title=".claude/skills/ktx/SKILL.md"
|
||||
---
|
||||
name: ktx
|
||||
description: Use local KTX semantic context, wiki knowledge, and safe SQL execution for this project.
|
||||
---
|
||||
|
||||
Available commands:
|
||||
- `ktx agent context --json --project-dir /path/to/project`
|
||||
- `ktx agent sl list --json --project-dir /path/to/project`
|
||||
- `ktx agent sl read '<sourceName>' --json --project-dir /path/to/project`
|
||||
- `ktx agent sl query --json --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --execute --max-rows 100`
|
||||
- `ktx agent wiki search '<query>' --json --project-dir /path/to/project`
|
||||
- `ktx agent wiki read '<pageId>' --json --project-dir /path/to/project`
|
||||
- `ktx agent sql execute --json --project-dir /path/to/project --connection-id '<id>' --sql-file '<path>' --max-rows 100`
|
||||
```
|
||||
|
||||
### Workflow tips
|
||||
|
||||
- Claude Code discovers skills automatically from `.claude/skills/` — no restart needed after setup
|
||||
- The MCP server starts on-demand when Claude Code first calls a KTX tool
|
||||
- Pass `--semantic-compute` to enable semantic layer queries, and add `--execute-queries` to allow read-only SQL execution
|
||||
- Global installation (`~/.claude/skills/ktx/SKILL.md`) makes KTX available in all projects without per-project setup
|
||||
|
||||
---
|
||||
|
||||
## Cursor
|
||||
|
||||
### Install via `ktx setup`
|
||||
|
||||
During setup, select **Cursor** from the agent targets. KTX writes:
|
||||
|
||||
| Mode | File |
|
||||
|------|------|
|
||||
| CLI rules | `.cursor/rules/ktx.mdc` |
|
||||
| MCP server | `.cursor/mcp.json` (under `mcpServers.ktx`) |
|
||||
|
||||
Cursor supports project-scoped installation only.
|
||||
|
||||
### Manual MCP configuration
|
||||
|
||||
Create or edit `.cursor/mcp.json`:
|
||||
|
||||
```json title=".cursor/mcp.json"
|
||||
{
|
||||
"mcpServers": {
|
||||
"ktx": {
|
||||
"command": "ktx",
|
||||
"args": [
|
||||
"--project-dir", "/path/to/ktx-project",
|
||||
"serve",
|
||||
"--mcp", "stdio",
|
||||
"--semantic-compute",
|
||||
"--execute-queries"
|
||||
],
|
||||
"env": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Manual CLI rules configuration
|
||||
|
||||
Create `.cursor/rules/ktx.mdc` with the same content structure as the Claude Code SKILL.md file — Cursor rules use the `.mdc` extension but support the same markdown format with command definitions.
|
||||
|
||||
### Workflow tips
|
||||
|
||||
- After adding MCP config, restart Cursor or reload the window for the server to connect
|
||||
- Cursor rules in `.cursor/rules/` are automatically loaded into agent context
|
||||
- MCP tools appear in Cursor's tool list once the server is running
|
||||
- Project-scoped only — no global installation option
|
||||
|
||||
---
|
||||
|
||||
## Codex
|
||||
|
||||
### Install via `ktx setup`
|
||||
|
||||
During setup, select **Codex** from the agent targets. KTX writes:
|
||||
|
||||
| Mode | File |
|
||||
|------|------|
|
||||
| CLI skills | `.agents/skills/ktx/SKILL.md` |
|
||||
| MCP server | `.agents/mcp/ktx.json` (under `mcpServers.ktx`) |
|
||||
|
||||
Both project-scoped and global installations are supported. Global installs write to `$CODEX_HOME/skills/ktx/SKILL.md` (defaults to `~/.codex/skills/ktx/SKILL.md`).
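
The global path rule above can be sketched in Python as follows. This is an illustration of the documented default, not KTX's own code. The function name is hypothetical, and treating an empty `CODEX_HOME` as unset is an assumption.

```python
import os
from pathlib import Path


def codex_global_skill_path(environ=None) -> Path:
    """Return the global SKILL.md path: $CODEX_HOME, falling back to ~/.codex."""
    env = os.environ if environ is None else environ
    # Assumption: an empty CODEX_HOME is treated the same as an unset one.
    codex_home = env.get("CODEX_HOME") or str(Path.home() / ".codex")
    return Path(codex_home).expanduser() / "skills" / "ktx" / "SKILL.md"
```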
|
||||
|
||||
### Manual MCP configuration
|
||||
|
||||
Create or edit `.agents/mcp/ktx.json`:
|
||||
|
||||
```json title=".agents/mcp/ktx.json"
|
||||
{
|
||||
"mcpServers": {
|
||||
"ktx": {
|
||||
"command": "ktx",
|
||||
"args": [
|
||||
"--project-dir", "/path/to/ktx-project",
|
||||
"serve",
|
||||
"--mcp", "stdio",
|
||||
"--semantic-compute",
|
||||
"--execute-queries"
|
||||
],
|
||||
"env": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Manual CLI skills configuration
|
||||
|
||||
Create `.agents/skills/ktx/SKILL.md` with the same content structure as Claude Code's SKILL.md.
|
||||
|
||||
### Workflow tips
|
||||
|
||||
- Set `CODEX_HOME` environment variable to customize the global installation directory
|
||||
- Codex shares the `.agents/` directory structure with the universal format — one install covers both
|
||||
- Global installation makes KTX available across all Codex sessions
|
||||
|
||||
---
|
||||
|
||||
## OpenCode
|
||||
|
||||
### Install via `ktx setup`
|
||||
|
||||
During setup, select **OpenCode** from the agent targets. KTX writes:
|
||||
|
||||
| Mode | File |
|
||||
|------|------|
|
||||
| CLI commands | `.opencode/commands/ktx.md` |
|
||||
| MCP server | `.opencode/mcp.json` (under `mcpServers.ktx`) |
|
||||
|
||||
OpenCode supports project-scoped installation only.
|
||||
|
||||
### Manual MCP configuration
|
||||
|
||||
Create or edit `.opencode/mcp.json`:
|
||||
|
||||
```json title=".opencode/mcp.json"
|
||||
{
|
||||
"mcpServers": {
|
||||
"ktx": {
|
||||
"command": "ktx",
|
||||
"args": [
|
||||
"--project-dir", "/path/to/ktx-project",
|
||||
"serve",
|
||||
"--mcp", "stdio",
|
||||
"--semantic-compute",
|
||||
"--execute-queries"
|
||||
],
|
||||
"env": {}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Manual CLI commands configuration
|
||||
|
||||
Create `.opencode/commands/ktx.md` with the same command definitions as Claude Code's SKILL.md.
|
||||
|
||||
### Workflow tips
|
||||
|
||||
- OpenCode reads commands from `.opencode/commands/` on startup
|
||||
- Project-scoped only — no global installation option
|
||||
- Commands file uses standard markdown format (`.md` extension)
|
||||
|
||||
---
|
||||
|
||||
## MCP server reference
|
||||
|
||||
All agent clients connect to the same KTX MCP server. The server exposes these tools:
|
||||
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| `connection_list` | List configured database connections |
|
||||
| `connection_test` | Test a database connection |
|
||||
| `knowledge_search` | Semantic + full-text search across knowledge pages |
|
||||
| `knowledge_read` | Read a specific knowledge page |
|
||||
| `knowledge_write` | Write or update a knowledge page |
|
||||
| `sl_list_sources` | List semantic layer sources |
|
||||
| `sl_read_source` | Read a semantic source definition |
|
||||
| `sl_write_source` | Write or update a semantic source |
|
||||
| `sl_validate` | Validate a source against the database schema |
|
||||
| `sl_query` | Execute a semantic layer query |
|
||||
| `ingest_trigger` | Trigger an ingestion run |
|
||||
| `ingest_status` | Check ingestion status |
|
||||
| `ingest_report` | View an ingestion report |
|
||||
| `ingest_replay` | Replay a past ingestion session |
|
||||
| `scan_trigger` | Trigger a structural, enriched, or relationship scan |
|
||||
| `scan_status` | Check scan status |
|
||||
| `scan_report` | View a completed scan report |
|
||||
| `scan_list_artifacts` | List artifacts produced by a scan |
|
||||
| `scan_read_artifact` | Read a scan artifact |
|
||||
| `memory_capture` | Capture reusable context from an agent conversation when memory capture is enabled |
|
||||
| `memory_capture_status` | Check a memory capture run |
|
||||
|
||||
### Server flags
|
||||
|
||||
| Flag | Description | Default |
|
||||
|------|-------------|---------|
|
||||
| `--project-dir` | KTX project directory; otherwise KTX uses `KTX_PROJECT_DIR`, the nearest `ktx.yaml`, or the current directory | Auto-detected |
|
||||
| `--mcp stdio` | Transport mode (stdio only) | Required |
|
||||
| `--semantic-compute` | Enable semantic layer queries | `false` |
|
||||
| `--execute-queries` | Allow read-only SQL execution | `false` |
|
||||
| `--semantic-compute-url` | Remote compute endpoint URL | — |
|
||||
| `--database-introspection-url` | Live schema introspection endpoint | — |
|
||||
| `--memory-capture` | Record agent interactions | `false` |
|
||||
| `--memory-model` | LLM model for memory processing | — |
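
The `--project-dir` fallback order in the table can be sketched in Python as follows. This is a hedged illustration only. It assumes "nearest `ktx.yaml`" means searching upward from the current directory, and the function name `resolve_project_dir` is hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(
    flag_value: Optional[str], environ: Mapping[str, str], cwd: Path
) -> Path:
    """Resolve the project directory: flag, then env var, then ktx.yaml, then cwd."""
    if flag_value:  # 1. explicit --project-dir
        return Path(flag_value)
    env_value = environ.get("KTX_PROJECT_DIR")
    if env_value:  # 2. KTX_PROJECT_DIR
        return Path(env_value)
    # 3. Assumption: "nearest" means the closest ancestor containing ktx.yaml.
    for candidate in (cwd, *cwd.parents):
        if (candidate / "ktx.yaml").is_file():
            return candidate
    return cwd  # 4. current directory
```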
|
||||
|
||||
### Security constraints
|
||||
|
||||
- SQL execution is always read-only
|
||||
- Agent CLI SQL execution requires an explicit `--max-rows` limit from 1 to 1000; MCP semantic queries default to a 1000-row cap
|
||||
- Secrets and credentials are never exposed in tool responses
|
||||
- The server runs as a child process of the agent client (no network exposure)
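
A wrapper script that calls the agent CLI can check the `--max-rows` bound before spawning a process. The Python sketch below builds the argument list for the `ktx agent sql execute` command listed in the skills file. The function name and the constant are hypothetical, and KTX's own validation may behave differently.

```python
from typing import List

MAX_ROWS_LIMIT = 1000  # upper bound stated in the security constraints


def sql_execute_argv(
    project_dir: str, connection_id: str, sql_file: str, max_rows: int
) -> List[str]:
    """Build argv for `ktx agent sql execute`, rejecting out-of-range limits."""
    if isinstance(max_rows, bool) or not isinstance(max_rows, int):
        raise TypeError("max_rows must be an integer")
    if not 1 <= max_rows <= MAX_ROWS_LIMIT:
        raise ValueError(f"max_rows must be between 1 and {MAX_ROWS_LIMIT}")
    return [
        "ktx", "agent", "sql", "execute",
        "--json",
        "--project-dir", project_dir,
        "--connection-id", connection_id,
        "--sql-file", sql_file,
        "--max-rows", str(max_rows),
    ]
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems with paths and connection IDs.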
|
||||
|
||||
---
|
||||
|
||||
## Comparison
|
||||
|
||||
| | Claude Code | Cursor | Codex | OpenCode |
|
||||
|---|---|---|---|---|
|
||||
| MCP support | Yes | Yes | Yes | Yes |
|
||||
| CLI skills | Yes | Yes (.mdc) | Yes | Yes |
|
||||
| Global install | Yes | No | Yes | No |
|
||||
| Config location | `.mcp.json` | `.cursor/mcp.json` | `.agents/mcp/ktx.json` | `.opencode/mcp.json` |
|
||||
| Skills location | `.claude/skills/` | `.cursor/rules/` | `.agents/skills/` | `.opencode/commands/` |
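
For tooling that targets several clients, the comparison table can be encoded as a lookup. The Python sketch below copies the paths from the table. The client keys (`claude-code`, `cursor`, `codex`, `opencode`) and the helper name are hypothetical identifiers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientPaths:
    mcp_config: str       # project-relative MCP config file
    skills_dir: str       # project-relative skills, rules, or commands directory
    global_install: bool  # whether a global install is supported


CLIENTS = {
    "claude-code": ClientPaths(".mcp.json", ".claude/skills/", True),
    "cursor": ClientPaths(".cursor/mcp.json", ".cursor/rules/", False),
    "codex": ClientPaths(".agents/mcp/ktx.json", ".agents/skills/", True),
    "opencode": ClientPaths(".opencode/mcp.json", ".opencode/commands/", False),
}


def require_global_support(client: str) -> ClientPaths:
    """Return the client's paths, or fail if it is project-scoped only."""
    paths = CLIENTS[client]
    if not paths.global_install:
        raise ValueError(f"{client} supports project-scoped installation only")
    return paths
```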
|
||||
353
docs-site/content/docs/integrations/context-sources.mdx
Normal file
|
|
@@ -0,0 +1,353 @@
|
|||
---
|
||||
title: Context Sources
|
||||
description: Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, and Notion.
|
||||
---
|
||||
|
||||
Context sources feed your existing analytics tooling into KTX. During ingestion, KTX extracts metadata from each source and uses an LLM agent to reconcile it with your existing semantic layer and knowledge base — merging intelligently rather than overwriting.
|
||||
|
||||
All context sources are configured in `ktx.yaml` under `connections` with their respective `driver` value.
|
||||
|
||||
## dbt
|
||||
|
||||
Ingests schema definitions, model descriptions, column metadata, and test coverage from a dbt project.
|
||||
|
||||
### What it provides
|
||||
|
||||
- Model and source definitions from `schema.yml` files
|
||||
- Column descriptions and types
|
||||
- Test coverage signals
|
||||
- Semantic model references (if using dbt semantic layer)
|
||||
- Data lineage between models
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-dbt:
|
||||
driver: dbt
|
||||
source_dir: /path/to/dbt/project
|
||||
readonly: true
|
||||
```
|
||||
|
||||
For a Git-hosted project:
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-dbt:
|
||||
driver: dbt
|
||||
repo_url: https://github.com/org/dbt-repo
|
||||
branch: main
|
||||
path: analytics/dbt # For monorepos
|
||||
auth_token_ref: env:GITHUB_TOKEN
|
||||
readonly: true
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| Local path | `source_dir: /absolute/path/to/dbt/project` |
|
||||
| Public repo | `repo_url: https://github.com/org/repo` |
|
||||
| Private repo | `repo_url` + `auth_token_ref: env:GITHUB_TOKEN` |
|
||||
|
||||
**Optional fields:**
|
||||
|
||||
| Field | Description |
|
||||
|-------|-------------|
|
||||
| `profiles_path` | Path to `profiles.yml` (if non-standard location) |
|
||||
| `target` | dbt target name (e.g., `dev`, `prod`) |
|
||||
| `project_name` | Override auto-detected project name |
|
||||
|
||||
### What gets ingested
|
||||
|
||||
- YAML semantic sources generated from dbt schema files
|
||||
- Projects with more than 25 YAML files are split into one work unit per model file; smaller projects are ingested as a single work unit
|
||||
- Column descriptions, tests, and relationships are preserved
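
The work-unit rule in the list above can be sketched in Python as follows. This is an illustration of the stated threshold only. The function name and the sorted ordering are assumptions, not KTX's implementation.

```python
from typing import List, Sequence


def plan_work_units(yaml_files: Sequence[str], threshold: int = 25) -> List[List[str]]:
    """Split a dbt project into work units using the 25-file threshold."""
    files = sorted(yaml_files)  # assumption: a stable order, for reproducibility
    if len(files) > threshold:
        return [[path] for path in files]  # one unit per file
    return [files] if files else []        # a single unit for small projects
```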
|
||||
|
||||
---
|
||||
|
||||
## MetricFlow
|
||||
|
||||
Ingests MetricFlow semantic models and metric definitions. Useful when your team defines metrics in MetricFlow's YAML format.
|
||||
|
||||
### What it provides
|
||||
|
||||
- Semantic model definitions (entities, dimensions, measures)
|
||||
- Cross-model metric definitions
|
||||
- Dimension and entity relationships between models
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-metricflow:
|
||||
driver: metricflow
|
||||
metricflow:
|
||||
repoUrl: https://github.com/org/metricflow-repo
|
||||
branch: main
|
||||
path: dbt_metrics # Subdirectory for monorepos
|
||||
auth_token_ref: env:GITHUB_TOKEN
|
||||
readonly: true
|
||||
```
|
||||
|
||||
For a local path:
|
||||
|
||||
```yaml
|
||||
metricflow:
|
||||
repoUrl: file:///absolute/path/to/project
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| Public repo | `repoUrl: https://github.com/org/repo` |
|
||||
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
|
||||
| Local path | `repoUrl: file:///path/to/project` |
|
||||
|
||||
### What gets ingested
|
||||
|
||||
- Semantic models with their entities, dimensions, and measures
|
||||
- Metric definitions with their expressions and filters
|
||||
- Work units organized by connected component (metrics + related semantic models grouped together)
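
Grouping by connected component means that a metric and every semantic model reachable from it land in the same work unit. The Python sketch below shows the general technique with a depth-first traversal. The node names in the usage example are hypothetical, and KTX's actual grouping logic may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> List[Set[str]]:
    """Group nodes into components of an undirected reference graph."""
    adjacency: Dict[str, Set[str]] = {}
    for node in nodes:
        adjacency.setdefault(node, set())  # keep isolated nodes
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components
```

For example, with edges `("metric:revenue", "model:orders")` and `("model:orders", "model:customers")`, all three nodes fall into one component, while an unrelated `metric:signups` linked only to `model:users` forms a second.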
|
||||
|
||||
---
|
||||
|
||||
## LookML
|
||||
|
||||
Ingests LookML view and model definitions from a Git repository. Extracts field definitions, SQL table references, and join relationships.
|
||||
|
||||
### What it provides
|
||||
|
||||
- View definitions (dimensions, measures, derived tables)
|
||||
- Model explore definitions and joins
|
||||
- SQL table name references
|
||||
- Field-level descriptions and labels
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-lookml:
|
||||
driver: lookml
|
||||
repoUrl: https://github.com/org/lookml-repo
|
||||
branch: main
|
||||
path: analytics # Subdirectory for monorepos
|
||||
auth_token_ref: env:GITHUB_TOKEN
|
||||
readonly: true
|
||||
```
|
||||
|
||||
For a local path:
|
||||
|
||||
```yaml
|
||||
repoUrl: file:///absolute/path/to/lookml
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| Public repo | `repoUrl: https://github.com/org/repo` |
|
||||
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
|
||||
| Local path | `repoUrl: file:///path/to/project` |
|
||||
|
||||
### What gets ingested
|
||||
|
||||
- View and model definitions organized by connected component
|
||||
- LookML field types mapped to semantic layer column types
|
||||
- Join definitions and relationship cardinalities
|
||||
- SQL table references for warehouse mapping validation
|
||||
|
||||
### Warehouse mapping
|
||||
|
||||
Optionally validate that LookML references match your expected Looker connection:
|
||||
|
||||
```yaml
|
||||
mappings:
|
||||
expectedLookerConnectionName: postgres_connection
|
||||
```
|
||||
|
||||
This validates that LookML model `connection:` declarations match expectations, flagging mismatches during ingestion.
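
A rough pre-flight version of this check can be sketched in Python as follows. It uses a regular expression rather than a full LookML parser. The function names are hypothetical, and flagging models that have no `connection:` declaration is an assumption.

```python
import re
from typing import Dict, Optional

# Matches a top-level declaration such as: connection: "postgres_connection"
_CONNECTION_RE = re.compile(r'^\s*connection:\s*"([^"]+)"', re.MULTILINE)


def declared_connection(model_source: str) -> Optional[str]:
    """Return the connection name declared in a LookML model file, if any."""
    match = _CONNECTION_RE.search(model_source)
    return match.group(1) if match else None


def connection_mismatches(
    models: Dict[str, str], expected: str
) -> Dict[str, Optional[str]]:
    """Map each mismatching model file to the connection it declares."""
    mismatches: Dict[str, Optional[str]] = {}
    for filename, source in models.items():
        found = declared_connection(source)
        if found != expected:  # assumption: a missing declaration is a mismatch
            mismatches[filename] = found
    return mismatches
```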
|
||||
|
||||
---
|
||||
|
||||
## Metabase
|
||||
|
||||
Ingests dashboards, questions, and their underlying SQL queries from a Metabase instance. Maps Metabase databases to your KTX warehouse connections.
|
||||
|
||||
### What it provides
|
||||
|
||||
- Dashboard metadata and organization
|
||||
- Question/query definitions (native SQL and structured queries)
|
||||
- Table and column usage patterns from queries
|
||||
- Database-to-warehouse relationship mapping
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-metabase:
|
||||
driver: metabase
|
||||
api_url: https://metabase.company.com
|
||||
api_key_ref: env:METABASE_API_KEY
|
||||
mappings:
|
||||
databaseMappings:
|
||||
"3": postgres-main # Metabase DB ID → KTX connection
|
||||
syncEnabled:
|
||||
"3": true
|
||||
syncMode: ONLY # Only ingest mapped databases
|
||||
readonly: true
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| API key | `api_key_ref: env:METABASE_API_KEY` |
|
||||
|
||||
Generate an API key in Metabase: **Admin > Settings > Authentication > API Keys**.
|
||||
|
||||
### What gets ingested
|
||||
|
||||
- Semantic sources generated from SQL queries in questions
|
||||
- Knowledge pages for dashboards (purpose, key metrics, relationships)
|
||||
- Work units per dashboard and per question
|
||||
|
||||
### Warehouse mapping
|
||||
|
||||
Metabase databases must be mapped to KTX connections so ingested context links to the correct warehouse:
|
||||
|
||||
```yaml
|
||||
mappings:
|
||||
databaseMappings:
|
||||
"<metabase_db_id>": "<ktx_connection_id>"
|
||||
syncEnabled:
|
||||
"<metabase_db_id>": true
|
||||
syncMode: ONLY # ONLY = restrict to mapped DBs
|
||||
```
|
||||
|
||||
Find Metabase database IDs in **Admin > Databases** — the ID is in the URL when editing a database.
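
To reason about which databases a given `mappings` block selects, the Python sketch below filters the mapped IDs. It assumes that under `syncMode: ONLY` a database is ingested only when it is both mapped and has `syncEnabled` set to `true`. That interaction is an assumption, the function name is hypothetical, and other sync modes are not modeled.

```python
from typing import Dict


def databases_to_sync(mappings: dict) -> Dict[str, str]:
    """Return {metabase_db_id: ktx_connection_id} for databases to ingest."""
    if mappings.get("syncMode") != "ONLY":
        # Only the ONLY mode is described in the text; anything else is unknown.
        raise NotImplementedError("only syncMode: ONLY is modeled in this sketch")
    database_mappings = mappings.get("databaseMappings", {})
    sync_enabled = mappings.get("syncEnabled", {})
    return {
        db_id: connection_id
        for db_id, connection_id in database_mappings.items()
        if sync_enabled.get(db_id) is True
    }
```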
|
||||
|
||||
---
|
||||
|
||||
## Looker
|
||||
|
||||
Ingests explores, looks, and dashboards from a Looker instance via the Looker API. Maps Looker connections to your KTX warehouse connections.
|
||||
|
||||
### What it provides
|
||||
|
||||
- Explore definitions and field metadata
|
||||
- Dashboard and look configurations
|
||||
- Query patterns and usage signals
|
||||
- Looker folder structure for organization context
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-looker:
|
||||
driver: looker
|
||||
base_url: https://looker.company.com
|
||||
client_id: your-looker-client-id
|
||||
client_secret_ref: env:LOOKER_CLIENT_SECRET
|
||||
mappings:
|
||||
connectionMappings:
|
||||
postgres_connection: postgres-main # Looker conn → KTX conn
|
||||
readonly: true
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| OAuth client credentials | `client_id` + `client_secret_ref: env:LOOKER_CLIENT_SECRET` |
|
||||
|
||||
Generate API credentials in Looker: **Admin > Users > Edit > API Keys**.
|
||||
|
||||
### What gets ingested
|
||||
|
||||
- Semantic sources from explore field definitions
|
||||
- Knowledge pages for dashboards (purpose, audience, key metrics)
|
||||
- Triage signals for automated content classification
|
||||
- Work units per explore and per dashboard
|
||||
|
||||
### Warehouse mapping
|
||||
|
||||
Map Looker connection names to KTX connections so explores link to the correct warehouse:
|
||||
|
||||
```yaml
|
||||
mappings:
|
||||
connectionMappings:
|
||||
"<looker_connection_name>": "<ktx_connection_id>"
|
||||
```
|
||||
|
||||
Find Looker connection names in **Admin > Database > Connections**.
|
||||
|
||||
---
|
||||
|
||||
## Notion
|
||||
|
||||
Ingests pages and databases from a Notion workspace as knowledge pages. Useful for capturing business definitions, data dictionaries, and team documentation that agents need for context.
|
||||
|
||||
### What it provides
|
||||
|
||||
- Knowledge pages synthesized from Notion content
|
||||
- Page hierarchy and relationships
|
||||
- Database schemas (when Notion databases describe data sources)
|
||||
- Semantic clustering for organized ingestion
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-notion:
|
||||
driver: notion
|
||||
auth_token_ref: env:NOTION_TOKEN
|
||||
crawl_mode: selected_roots
|
||||
root_page_ids:
|
||||
- "abc123def456..."
|
||||
readonly: true
|
||||
```
|
||||
|
||||
For crawling all accessible pages:
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-notion:
|
||||
driver: notion
|
||||
auth_token_ref: env:NOTION_TOKEN
|
||||
crawl_mode: all_accessible
|
||||
readonly: true
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| Internal integration token | `auth_token_ref: env:NOTION_TOKEN` |
|
||||
|
||||
Create an integration at [notion.so/my-integrations](https://www.notion.so/my-integrations), then share target pages with the integration.
|
||||
|
||||
### Configuration options
|
||||
|
||||
| Field | Description | Default |
|
||||
|-------|-------------|---------|
|
||||
| `crawl_mode` | `all_accessible` or `selected_roots` | — |
|
||||
| `root_page_ids` | Page IDs to crawl from (for `selected_roots`) | `[]` |
|
||||
| `root_database_ids` | Database IDs to include | `[]` |
|
||||
| `max_pages_per_run` | Pages processed per sync | `1000` |
|
||||
| `max_knowledge_creates_per_run` | New pages created per sync | `5` |
|
||||
| `max_knowledge_updates_per_run` | Pages updated per sync | `20` |
|
||||
|
||||
### What gets ingested
|
||||
|
||||
- Knowledge pages synthesized from Notion content (not raw copies)
|
||||
- Domain context extracted and organized by topic
|
||||
- Triage signals for classifying page relevance
|
||||
- Work units clustered by semantic similarity for efficient processing
|
||||
|
||||
### Notes
|
||||
|
||||
- Notion is knowledge-only — it does not produce semantic layer sources
|
||||
- Rate limits apply; large workspaces may require multiple ingestion runs
|
||||
- `last_successful_cursor` is auto-managed for incremental sync
|
||||
5
docs-site/content/docs/integrations/meta.json
Normal file
|
|
@@ -0,0 +1,5 @@
|
|||
{
|
||||
"title": "Integrations",
|
||||
"defaultOpen": true,
|
||||
"pages": ["primary-sources", "context-sources", "agent-clients"]
|
||||
}
|
||||
469
docs-site/content/docs/integrations/primary-sources.mdx
Normal file
|
|
@@ -0,0 +1,469 @@
|
|||
---
|
||||
title: Primary Sources
|
||||
description: Connect KTX to PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, or SQLite.
|
||||
---
|
||||
|
||||
KTX connects to your data warehouse or database to scan schemas, discover relationships, and execute semantic layer queries. Each connection is defined in `ktx.yaml` under the `connections` key.
|
||||
|
||||
All connectors share these conventions:
|
||||
|
||||
- Sensitive values support `env:VAR_NAME` (read from environment) and `file:/path/to/secret` (read from file) references
|
||||
- Connections are read-only — KTX never writes to your database
|
||||
- Schema scanning discovers tables, columns, types, and constraints automatically
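
The `env:` and `file:` reference convention can be sketched in Python as follows. This illustrates the convention only and is not KTX's resolver. The function name is hypothetical, and both the whitespace stripping and the `~` expansion are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_secret(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Resolve `env:VAR_NAME` and `file:/path` references; pass literals through."""
    env = os.environ if environ is None else environ
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    if value.startswith("file:"):
        # Assumption: `~` is expanded and surrounding whitespace is stripped.
        path = Path(value[len("file:"):]).expanduser()
        return path.read_text(encoding="utf-8").strip()
    return value
```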
|
||||
|
||||
## PostgreSQL
|
||||
|
||||
The most full-featured connector. Supports schema introspection, foreign key detection, column statistics, and historic SQL via `pg_stat_statements`.
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-postgres:
|
||||
driver: postgres
|
||||
url: postgresql://user:password@host:5432/database
|
||||
schema: public
|
||||
readonly: true
|
||||
```
|
||||
|
||||
Or with individual fields:
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-postgres:
|
||||
driver: postgres
|
||||
host: localhost
|
||||
port: 5432
|
||||
database: analytics
|
||||
username: ktx_reader
|
||||
password: env:PG_PASSWORD
|
||||
schemas:
|
||||
- public
|
||||
- analytics
|
||||
ssl: true
|
||||
readonly: true
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| Password | `password: env:PG_PASSWORD` or `password: file:/path/to/secret` |
|
||||
| Connection URL | `url: env:DATABASE_URL` |
|
||||
| SSL | `ssl: true`, optionally `rejectUnauthorized: false` for self-signed certs |
|
||||
|
||||
### Features
|
||||
|
||||
| Feature | Supported | Notes |
|
||||
|---------|-----------|-------|
|
||||
| Tables & views | Yes | Via `pg_catalog` |
|
||||
| Primary keys | Yes | Via `information_schema.table_constraints` |
|
||||
| Foreign keys | Yes | Full constraint detection |
|
||||
| Row count estimates | Yes | Via `pg_class.reltuples` |
|
||||
| Column statistics | Yes | Requires `pg_read_all_stats` role |
|
||||
| Historic SQL | Yes | Via `pg_stat_statements` extension |
|
||||
| Table sampling | Yes | `TABLESAMPLE SYSTEM` |
|
||||
|
||||
### Historic SQL
|
||||
|
||||
PostgreSQL Historic SQL mines real query patterns from `pg_stat_statements`. This is the most mature local Historic SQL path and helps KTX understand how your team actually queries the data.
|
||||
|
||||
**Requirements:**
|
||||
- `pg_stat_statements` extension enabled
|
||||
- `pg_read_all_stats` role granted to the KTX user
|
||||
|
||||
**Config options:**
|
||||
|
||||
```yaml
|
||||
historicSql:
|
||||
minCalls: 5 # Minimum call count to include a query template
|
||||
maxTemplatesPerRun: 5000
|
||||
```
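
The effect of these two options can be sketched in Python as follows. The `calls` field mirrors the `calls` column of `pg_stat_statements`. The function name and the most-called-first ordering are assumptions made for illustration.

```python
from typing import Dict, List


def select_templates(
    templates: List[Dict], min_calls: int = 5, max_templates: int = 5000
) -> List[Dict]:
    """Keep templates with at least `min_calls` calls, capped at `max_templates`."""
    eligible = [t for t in templates if t["calls"] >= min_calls]
    # Assumption: when the cap applies, the most frequently called templates win.
    eligible.sort(key=lambda t: t["calls"], reverse=True)
    return eligible[:max_templates]
```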
|
||||
|
||||
### Dialect notes
|
||||
|
||||
- SQL generation uses `LIMIT/OFFSET` pagination
|
||||
- Named parameters converted to positional (`$1`, `$2`, ...)
|
||||
- Supports `COUNT(*) FILTER (WHERE ...)` for null analysis
|
||||
- Full support for PostgreSQL types: `uuid`, `jsonb`, `timestamptz`, `numeric`, `text[]`, etc.
|
||||
|
||||
---
|
||||
|
||||
## Snowflake
|
||||
|
||||
Connects via the Snowflake SDK. Supports multi-schema scanning, RSA key authentication, and Historic SQL configuration for Snowflake query history.
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-snowflake:
|
||||
driver: snowflake
|
||||
account: xy12345
|
||||
warehouse: ANALYTICS_WH
|
||||
database: PROD
|
||||
schema_name: PUBLIC
|
||||
username: KTX_SERVICE
|
||||
password: env:SNOWFLAKE_PASSWORD
|
||||
role: ANALYST
|
||||
readonly: true
|
||||
```
|
||||
|
||||
For multiple schemas:
|
||||
|
||||
```yaml
|
||||
schema_names:
|
||||
- PUBLIC
|
||||
- ANALYTICS
|
||||
- STAGING
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| Password | `password: env:SNOWFLAKE_PASSWORD` |
|
||||
| RSA key pair | `authMethod: rsa`, `privateKey: file:~/.ssh/snowflake_key.pem`, optional `passphrase` |
|
||||
|
||||
### Features
|
||||
|
||||
| Feature | Supported | Notes |
|
||||
|---------|-----------|-------|
|
||||
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
|
||||
| Primary keys | Yes | Via table constraints |
|
||||
| Foreign keys | No | Snowflake foreign keys are informational and not enforced |
|
||||
| Row count estimates | Yes | From `INFORMATION_SCHEMA.TABLES.ROW_COUNT` |
|
||||
| Column statistics | No | — |
|
||||
| Historic SQL | Configurable | Query-history settings can be stored, but local CLI Historic SQL ingest currently runs only through the Postgres path |
|
||||
| Table sampling | Yes | — |
|
||||
|
||||
### Historic SQL
|
||||
|
||||
Snowflake Historic SQL settings describe how query history should be sampled when that runtime path is available.
|
||||
|
||||
```yaml
|
||||
historicSql:
|
||||
windowDays: 90
|
||||
redactionPatterns: []
|
||||
serviceAccountUserPatterns: []
|
||||
```
|
||||
|
||||
### Dialect notes
|
||||
|
||||
- Unquoted identifiers resolve to uppercase by default, so matching is case-insensitive
|
||||
- Connection context set per query (`USE ROLE`, `USE WAREHOUSE`, `USE DATABASE`, `USE SCHEMA`)
|
||||
- Parameter binding uses positional `?` placeholders
|
||||
- Date values normalized to ISO 8601 strings
|
||||
|
||||
---
|
||||
|
||||
## BigQuery
|
||||
|
||||
Authenticates via GCP service account credentials. Supports multi-dataset scanning and Historic SQL configuration for `INFORMATION_SCHEMA.JOBS_BY_PROJECT`.
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-bigquery:
|
||||
driver: bigquery
|
||||
credentials_json: file:~/.config/gcloud/bq-service-account.json
|
||||
dataset_id: analytics
|
||||
location: US
|
||||
readonly: true
|
||||
```
|
||||
|
||||
For multiple datasets:
|
||||
|
||||
```yaml
|
||||
dataset_ids:
|
||||
- analytics
|
||||
- marketing
|
||||
- finance
|
||||
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|
||||
|--------|--------|
|
||||
| Service account JSON | `credentials_json: file:/path/to/key.json` |
|
||||
| Environment variable | `credentials_json: env:GCP_CREDENTIALS_JSON` |
|
||||
|
||||
The project ID is extracted automatically from the service account JSON file.
|
||||
|
||||
### Features
|
||||
|
||||
| Feature | Supported | Notes |
|
||||
|---------|-----------|-------|
|
||||
| Tables & views | Yes | Including materialized views and external tables |
|
||||
| Primary keys | No | — |
|
||||
| Foreign keys | No | BigQuery key constraints are unenforced metadata |
|
||||
| Row count estimates | Yes | From table metadata |
|
||||
| Column statistics | No | — |
|
||||
| Historic SQL | Configurable | Query-history settings can be stored, but local CLI Historic SQL ingest currently runs only through the Postgres path |
|
||||
| Table sampling | Yes | — |
|
||||
|
||||
### Historic SQL
|
||||
|
||||
BigQuery Historic SQL settings describe how `INFORMATION_SCHEMA.JOBS_BY_PROJECT` should be sampled when that runtime path is available.
|
||||
|
||||
```yaml
|
||||
historicSql:
|
||||
windowDays: 90
|
||||
redactionPatterns: []
|
||||
serviceAccountUserPatterns: []
|
||||
```
|
||||
|
||||
### Dialect notes
|
||||
|
||||
- Parameter binding uses named `@param` syntax
|
||||
- Arrays flattened to comma-separated strings in results
|
||||
- Location specified at query execution time
|
||||
- Supports `maxBytesBilled` and `jobTimeoutMs` limits
|
||||
|
||||
---
|
||||
|
||||
## ClickHouse
|
||||
|
||||
Connects over HTTP (port 8123) or HTTPS (port 8443). Supports the ClickHouse native type system including `Nullable`, `LowCardinality`, and `Array` wrappers.
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-clickhouse:
|
||||
driver: clickhouse
|
||||
url: http://localhost:8123/analytics
|
||||
readonly: true
|
||||
```
|
||||
|
||||
Or with individual fields:
|
||||
|
||||
```yaml title="ktx.yaml"
|
||||
connections:
|
||||
my-clickhouse:
|
||||
driver: clickhouse
|
||||
host: clickhouse.internal
|
||||
port: 8123
|
||||
database: analytics
|
||||
username: default
|
||||
password: env:CH_PASSWORD
|
||||
ssl: false
|
||||
readonly: true
|
||||
```

### Authentication

| Method | Config |
|--------|--------|
| Basic auth | `username` + `password` (HTTP basic auth) |
| No auth | Default user `default` with no password |
| HTTPS | Set `ssl: true` (uses port 8443 by default) |
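
As a sketch of the HTTPS row, the individual-field form with TLS enabled might look like the following. The host name and the `CH_PASSWORD` variable are placeholders, and the explicit `port: 8443` only restates the default that `ssl: true` is documented to select.

```yaml title="ktx.yaml"
connections:
  my-clickhouse:
    driver: clickhouse
    host: clickhouse.internal   # placeholder host
    port: 8443                  # documented default when ssl is true
    database: analytics
    username: default
    password: env:CH_PASSWORD   # placeholder environment variable
    ssl: true
    readonly: true
```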

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `system.tables`, engine-based detection |
| Primary keys | Yes | Via `system.columns` |
| Foreign keys | No | Not a ClickHouse concept |
| Row count estimates | Yes | Via `system.parts` aggregation |
| Column statistics | No | — |
| Historic SQL | No | — |
| Table sampling | Yes | — |

### Dialect notes

- Parameter binding uses `{param:Type}` syntax (e.g., `{database:String}`)
- Detects views vs. tables by engine name (`View`, `MaterializedView`)
- Handles `Nullable(T)` and `LowCardinality(Nullable(T))` type wrappers
- Dictionary tables are excluded from scanning
- Results are returned in `JSONCompact` or `JSONEachRow` format

---

## MySQL

Standard MySQL/MariaDB connector with full foreign key support and schema introspection.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-mysql:
    driver: mysql
    url: mysql://user:password@host:3306/database
    readonly: true
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-mysql:
    driver: mysql
    host: mysql.internal
    port: 3306
    database: analytics
    username: ktx_reader
    password: env:MYSQL_PASSWORD
    ssl: true
    readonly: true
```

### Authentication

| Method | Config |
|--------|--------|
| Password | `password: env:MYSQL_PASSWORD` or `password: file:/path/to/secret` |
| SSL | `ssl: true` or `ssl: { rejectUnauthorized: false }` |
| URL parameters | `?ssl=true` or `?sslmode=required` in connection URL |
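
The password and SSL rows can be combined in a single connection. This is a minimal sketch, assuming the object form of `ssl` nests under the connection exactly as the table writes it; the secret path is a placeholder. The name `rejectUnauthorized` suggests Node-style TLS options, where `false` skips certificate verification, so this form is probably best kept to development or self-signed setups.

```yaml title="ktx.yaml"
connections:
  my-mysql:
    driver: mysql
    host: mysql.internal
    port: 3306
    database: analytics
    username: ktx_reader
    password: file:/path/to/secret   # placeholder path to a secret file
    ssl:
      rejectUnauthorized: false      # accept self-signed certificates
    readonly: true
```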

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | From `TABLE_ROWS` (InnoDB estimate) |
| Column statistics | No | — |
| Historic SQL | No | — |
| Table sampling | Yes | Uses `RAND()` filter |

### Dialect notes

- Parameter binding uses positional `?` placeholders
- Uses `LIMIT X OFFSET Y` for pagination
- Single database per connection (no multi-schema)
- Supports 20+ MySQL types including `enum`, `json`, `datetime`, `decimal`
- Table comments are extracted with the InnoDB metadata prefix stripped

---

## SQL Server

Connects to Microsoft SQL Server and Azure SQL. Supports multi-schema scanning with `dbo` as the default schema.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-sqlserver:
    driver: sqlserver
    url: mssql://user:password@host:1433/database?trustServerCertificate=true
    readonly: true
```

Or with individual fields:

```yaml title="ktx.yaml"
connections:
  my-sqlserver:
    driver: sqlserver
    host: sql.internal
    port: 1433
    database: Analytics
    username: ktx_reader
    password: env:MSSQL_PASSWORD
    schema: dbo
    trustServerCertificate: true
    readonly: true
```

For multiple schemas:

```yaml
schemas:
  - dbo
  - analytics
  - staging
```
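
The `schemas` fragment is shown without its surrounding connection. The sketch below shows where it might sit, assuming `schemas` is a sibling of the other connection fields and takes the place of the single `schema` key. That placement is an assumption made for illustration and is not confirmed by this page.

```yaml title="ktx.yaml"
connections:
  my-sqlserver:
    driver: sqlserver
    host: sql.internal
    port: 1433
    database: Analytics
    username: ktx_reader
    password: env:MSSQL_PASSWORD
    schemas:            # assumed to replace the single `schema` key
      - dbo
      - analytics
      - staging
    trustServerCertificate: true
    readonly: true
```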

### Authentication

| Method | Config |
|--------|--------|
| SQL Server auth | `username` + `password` |
| Encrypted connection | Always enabled; set `trustServerCertificate: true` for self-signed certificates |

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `INFORMATION_SCHEMA.TABLES` |
| Primary keys | Yes | Via `TABLE_CONSTRAINTS` and `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | Via `sys.dm_db_partition_stats` |
| Column statistics | No | — |
| Historic SQL | No | — |
| Table sampling | Yes | — |
| Nested analysis | No | — |

### Dialect notes

- Parameter binding uses `@paramName` syntax
- Row limiting uses `SELECT TOP N * FROM (query) AS ktx_query_result`
- Encryption is always required; certificate validation is optional
- Multi-schema support with per-schema isolation

---

## SQLite

File-based connector using `better-sqlite3`. Ideal for local development, embedded analytics, or testing.

### Connection config

```yaml title="ktx.yaml"
connections:
  my-sqlite:
    driver: sqlite
    path: ./data/warehouse.sqlite
    readonly: true
```

Path supports multiple formats:

```yaml
# Relative path (resolved against project directory)
path: ./warehouse.sqlite

# Absolute path
path: /var/data/analytics.db

# Home directory expansion
path: ~/data/warehouse.sqlite

# Environment variable
path: env:SQLITE_DB_PATH

# URL format
url: sqlite:///path/to/db.sqlite
```

### Authentication

No authentication is required because SQLite is file-based. The file must be readable by the process running KTX.

### Features

| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Via `sqlite_master` |
| Primary keys | Yes | Via `PRAGMA table_info()` |
| Foreign keys | Yes | Via `PRAGMA foreign_key_list()` (lists declared constraints; enforcement needs `PRAGMA foreign_keys = ON`) |
| Row count estimates | Yes | Exact count via `SELECT COUNT(*)` |
| Column statistics | No | — |
| Historic SQL | No | — |
| Table sampling | Yes | — |
| Nested analysis | No | — |

### Dialect notes

- Synchronous query execution (no connection pooling)
- Parameter binding uses `:paramName` syntax
- Uses `LIMIT X OFFSET Y` for pagination
- SQLite type affinity system: `TEXT`, `NUMERIC`, `INTEGER`, `REAL`, `BLOB`
- Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON`
- In-memory databases supported with `path: ":memory:"` (for testing)
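
For the in-memory case in the last note, a test connection might be declared as follows. The connection name `scratch` is a placeholder. `readonly` is left out on the assumption that an in-memory database starts empty and has to be written to before it is useful.

```yaml title="ktx.yaml"
connections:
  scratch:              # placeholder connection name
    driver: sqlite
    path: ":memory:"    # quoted so YAML treats the leading colon as a string
```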