chore: move docs site workspace

2026-07-01 08:59:39 +02:00 · 2026-05-11 16:53:42 +02:00 · 2026-05-11 16:53:42 +02:00 · a46563bb01
commit a46563bb01
parent 0ae9b6effd
52 changed files with 3 additions and 3 deletions
--- a/docs-site/content/docs/guides/building-context.mdx
+++ b/docs-site/content/docs/guides/building-context.mdx
@@ -0,0 +1,241 @@
+---
+title: Building Context
+description: Scan your database schema and ingest context from dbt, Looker, Metabase, and more.
+---
+
+Building context is a two-step process. First, you **scan** your database to discover its structure — tables, columns, types, constraints, and relationships. Then you **ingest** from your existing tools to enrich that structure with semantic meaning — metric definitions, business descriptions, join logic, and knowledge that agents need to generate correct analytics.
+
+## Scanning
+
+Scanning connects to your database and extracts structural metadata. KTX stores the results locally so agents can understand your schema without querying the database directly.
+
+### Running a scan
+
+```bash
+ktx dev scan <connection-id>
+```
+
+This runs a structural scan by default. You can control what the scan does with the `--mode` flag:
+
+| Mode | What it does |
+|------|-------------|
+| `structural` | Tables, columns, types, constraints, row counts (default) |
+| `enriched` | Structural scan plus LLM-generated column descriptions |
+| `relationships` | Structural scan plus foreign key relationship detection |
+
+```bash
+# Scan with relationship detection
+ktx dev scan my-postgres --mode relationships
+
+# Preview without writing results
+ktx dev scan my-postgres --dry-run
+```
+
+### Checking scan status
+
+Every scan produces a run ID. Use it to check progress or review results:
+
+```bash
+# Check status of a scan run
+ktx dev scan status <run-id>
+
+# Print the full scan report
+ktx dev scan report <run-id>
+
+# Get the report as JSON for scripting
+ktx dev scan report <run-id> --json
+```
+
+### Relationship detection
+
+Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals — name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:
+
+| Score range | Status | Meaning |
+|-------------|--------|---------|
+| &ge; 0.85 | `accepted` | High confidence; applied automatically |
+| &ge; 0.55 and &lt; 0.85 | `review` | Plausible; needs human review |
+| &lt; 0.55 | `rejected` | Low confidence; not applied |
+
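The score bands in the table above can be sketched as a small shell function. This is an illustration only: `classify_score` is a hypothetical helper, not a KTX command, and it assumes that a score of exactly 0.85 is accepted and that the `review` band covers everything from 0.55 up to, but not including, 0.85.

```bash
# Hypothetical helper: map a relationship score to the status bands above.
# awk is used because shell arithmetic is integer-only.
classify_score() {
  awk -v s="$1" 'BEGIN {
    if (s >= 0.85)      print "accepted"
    else if (s >= 0.55) print "review"
    else                print "rejected"
  }'
}

classify_score 0.91   # accepted
classify_score 0.60   # review
classify_score 0.30   # rejected
```
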
+After a relationship scan, review the candidates:
+
+```bash
+# Show candidates pending review (default)
+ktx dev scan relationships <run-id>
+
+# Show all candidates regardless of status
+ktx dev scan relationships <run-id> --status all
+
+# Accept a specific candidate
+ktx dev scan relationships <run-id> --accept <candidate-id>
+
+# Reject a candidate with a note
+ktx dev scan relationships <run-id> --reject <candidate-id> --note "These columns share a name but are unrelated"
+```
+
+Once you've reviewed candidates, apply the accepted ones as joins in your semantic layer:
+
+```bash
+# Apply all accepted relationships
+ktx dev scan relationship-apply <run-id> --all-accepted
+
+# Preview what would be applied
+ktx dev scan relationship-apply <run-id> --all-accepted --dry-run
+
+# Apply a specific candidate
+ktx dev scan relationship-apply <run-id> --candidate <candidate-id>
+```
+
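A cautious way to combine the commands above is to preview first and apply only if the preview succeeds. The wrapper below is a sketch: `apply_accepted` is a hypothetical function name, it uses only the `relationship-apply` flags shown above, and it assumes `ktx` exits with a non-zero status when the dry run fails.

```bash
# Hypothetical wrapper: preview accepted relationships, then apply them.
# Assumes ktx returns a non-zero exit status if the dry run fails.
apply_accepted() {
  local run_id="$1"
  if [ -z "$run_id" ]; then
    echo "usage: apply_accepted <run-id>" >&2
    return 2
  fi
  # Preview first; stop here if the preview reports a problem.
  ktx dev scan relationship-apply "$run_id" --all-accepted --dry-run || return 1
  # Write the accepted relationships as joins.
  ktx dev scan relationship-apply "$run_id" --all-accepted
}
```

Call it with the run ID of a completed relationship scan: `apply_accepted <run-id>`.
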
+### Calibrating thresholds
+
+As you review more relationships, KTX can evaluate whether the default thresholds (0.85 accept, 0.55 review) are optimal for your schema:
+
+```bash
+# See how your feedback aligns with current thresholds
+ktx dev scan relationship-calibration --connection my-postgres
+
+# Get threshold recommendations (needs 20+ labels, 5+ accepted, 5+ rejected)
+ktx dev scan relationship-thresholds --connection my-postgres
+
+# Export your review decisions as calibration labels
+ktx dev scan relationship-feedback --connection my-postgres
+```
+
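The minimums noted in the comment above (20 or more labels, with at least 5 accepted and 5 rejected) can be checked before asking for recommendations. `has_enough_labels` is a hypothetical helper; how you obtain the three counts is left open, since the export format of `relationship-feedback` is not described here.

```bash
# Hypothetical pre-check for the documented minimums:
# 20+ labels in total, 5+ accepted, 5+ rejected.
has_enough_labels() {
  local total="$1" accepted="$2" rejected="$3"
  [ "$total" -ge 20 ] && [ "$accepted" -ge 5 ] && [ "$rejected" -ge 5 ]
}

if has_enough_labels 24 15 9; then
  echo "enough labels for threshold recommendations"
else
  echo "keep reviewing candidates"
fi
```
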
+## Ingestion
+
+Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and knowledge pages.
+
+### How it works
+
+Each ingest run follows this flow:
+
+1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
+2. An **LLM agent** reconciles the extracted metadata with your existing context, merging new details into what is already there rather than overwriting it
+3. **Semantic sources** (YAML) and **knowledge pages** (Markdown) are written to your project directory
+
+### Running an ingest
+
+```bash
+# Ingest one configured context source
+ktx ingest my-dbt-source
+
+# Ingest every configured context source
+ktx ingest --all
+```
+
+The public `ktx ingest` command uses the source configuration in `ktx.yaml`, including the source `driver` and any adapter-specific paths or credentials.
+
+For adapter-level debugging, use the low-level `ktx dev ingest run` command:
+
+```bash
+ktx dev ingest run --connection-id my-dbt-source --adapter dbt
+```
+
+Useful low-level flags:
+
+| Flag | Description |
+|------|-------------|
+| `--source-dir <path>` | Directory containing source files (e.g., your dbt project) |
+| `--viz` | Render the memory-flow TUI for real-time progress |
+| `--json` | Output as JSON |
+| `--plain` | Plain text output |
+
+### Watching progress
+
+```bash
+# Check status of the latest ingest
+ktx ingest status
+
+# Check a specific run
+ktx ingest status <run-id>
+
+# Open the visual ingest report (TUI)
+ktx ingest watch
+
+# Replay a past ingest run
+ktx dev ingest replay <run-id>
+```
+
+The `watch` command opens an interactive TUI that shows the memory-flow output — every tool call, LLM decision, and artifact written during the ingest.
+
+### Available adapters
+
+| Adapter | Source | What gets ingested |
+|---------|--------|--------------------|
+| `dbt` | dbt project | Model definitions, column descriptions, tests, tags |
+| `metricflow` | MetricFlow semantic models | Metrics, dimensions, entities, semantic joins |
+| `lookml` | LookML files | Views, explores, dimensions, measures, joins |
+| `looker` | Looker API | Explores, looks, dashboard metadata |
+| `metabase` | Metabase API | Questions, dashboards, table metadata |
+| `notion` | Notion API | Database pages, knowledge articles |
+| `historic-sql` | Query history | Frequent queries, usage patterns, runtime stats |
+| `live-database` | Direct DB connection | Live schema introspection |
+
+See [Context Sources](/docs/integrations/context-sources) for adapter-specific setup and auth configuration.
+
+### What gets generated
+
+A typical dbt ingest produces semantic sources and knowledge pages in your project:
+
+**Semantic source** (`semantic-layer/my-postgres/orders.yaml`):
+
+```yaml title="semantic-layer/my-postgres/orders.yaml"
+name: orders
+table: public.orders
+grain:
+  - order_id
+columns:
+  - name: order_id
+    type: string
+    description: Unique order identifier
+  - name: customer_id
+    type: string
+    description: Foreign key to customers table
+  - name: order_date
+    type: time
+    role: time
+    description: Date the order was placed
+  - name: total_amount
+    type: number
+    description: Total order value in USD
+measures:
+  - name: total_revenue
+    expr: SUM(total_amount)
+    description: Sum of all order values
+  - name: order_count
+    expr: COUNT(DISTINCT order_id)
+    description: Number of distinct orders
+joins:
+  - to: customers
+    on: orders.customer_id = customers.customer_id
+    relationship: many_to_one
+```
+
+**Knowledge page** (`knowledge/global/order-status-definitions.md`):
+
+```markdown
+---
+summary: Business definitions for order status values
+tags: [orders, definitions]
+sl_refs: [orders]
+---
+
+## Order Statuses
+
+- **pending**: Order placed but not yet processed
+- **confirmed**: Payment received, awaiting fulfillment
+- **shipped**: Order dispatched to carrier
+- **delivered**: Order received by customer
+- **cancelled**: Order cancelled before shipment
+
+Orders in "pending" status for more than 48 hours are flagged for review.
+```
+
+### Deterministic replay
+
+Every ingest session records a full transcript — tool calls, LLM responses, and write decisions. You can replay any session to debug why a source was written a certain way:
+
+```bash
+ktx dev ingest replay <run-id> --viz
+```
+
+This opens the same TUI view as the original run, letting you step through the agent's reasoning.
--- a/docs-site/content/docs/guides/meta.json
+++ b/docs-site/content/docs/guides/meta.json
@@ -0,0 +1,5 @@
+{
+  "title": "Guides",
+  "defaultOpen": true,
+  "pages": ["building-context", "writing-context", "serving-agents"]
+}
--- a/docs-site/content/docs/guides/serving-agents.mdx
+++ b/docs-site/content/docs/guides/serving-agents.mdx
@@ -0,0 +1,207 @@
+---
+title: Serving Agents
+description: Expose your context to Claude Code, Cursor, Codex, and other coding agents.
+---
+
+Once you've built and refined your context, the final step is exposing it to coding agents. KTX provides two channels: an **MCP server** for persistent integration with tools like Claude Code and Cursor, and **CLI commands** for direct terminal access.
+
+## MCP Server
+
+The MCP (Model Context Protocol) server gives agents structured access to your entire context layer — semantic sources, knowledge pages, scans, and ingestion — through a standard tool-calling interface.
+
+### Starting the server
+
+```bash
+ktx serve --mcp stdio
+```
+
+This starts an MCP server on stdio, which is how Claude Code, Cursor, and other MCP-compatible tools communicate with KTX. You typically don't run this manually — your agent's configuration handles it.
+
+### Configuration options
+
+| Flag | Description | Default |
+|------|-------------|---------|
+| `--mcp <mode>` | MCP transport mode; required (currently only `stdio`) | &mdash; |
+| `--user-id <id>` | User identifier for knowledge scoping | `local` |
+| `--semantic-compute` | Enable semantic layer planning and query execution | `false` |
+| `--semantic-compute-url <url>` | URL for the semantic compute daemon | &mdash; |
+| `--database-introspection-url <url>` | Daemon URL for live database access | &mdash; |
+| `--execute-queries` | Allow agents to execute SQL queries | `false` |
+| `--memory-capture` | Enable memory capture from conversations | `false` |
+| `--memory-model <model>` | Model used for memory capture | &mdash; |
+
+### Available tools
+
+When an agent connects via MCP, it can call these tools:
+
+**Connections**
+
+| Tool | Description |
+|------|-------------|
+| `connection_list` | List configured data connections |
+| `connection_test` | Test a connection through the scan connector |
+
+**Semantic Layer**
+
+| Tool | Description |
+|------|-------------|
+| `sl_list_sources` | List sources, optionally filtered by connection or search query |
+| `sl_read_source` | Read a source YAML by connection and name |
+| `sl_write_source` | Create, replace, or delete a source |
+| `sl_validate` | Validate sources against the database schema |
+| `sl_query` | Execute a semantic query — returns rows, SQL, and query plan |
+
+**Knowledge**
+
+| Tool | Description |
+|------|-------------|
+| `knowledge_search` | Search knowledge pages by query, returns ranked summaries |
+| `knowledge_read` | Read a knowledge page by key |
+| `knowledge_write` | Create or replace a knowledge page |
+
+**Scanning**
+
+| Tool | Description |
+|------|-------------|
+| `scan_trigger` | Run a structural, enriched, or relationship scan |
+| `scan_status` | Check the status of a running scan |
+| `scan_report` | Read a completed scan report |
+| `scan_list_artifacts` | List files produced by a scan run |
+| `scan_read_artifact` | Read a scan artifact by path |
+
+**Ingestion**
+
+| Tool | Description |
+|------|-------------|
+| `ingest_trigger` | Trigger an ingest run for an adapter and connection |
+| `ingest_status` | Check ingest progress, including diff and work-unit summaries |
+| `ingest_report` | Read a stored ingest report |
+| `ingest_replay` | Read the memory-flow replay for a past ingest |
+
+**Memory**
+
+| Tool | Description |
+|------|-------------|
+| `memory_capture` | Capture knowledge and semantic updates from a conversation |
+| `memory_capture_status` | Check the status of a memory capture run |
+
+### How agents use these tools
+
+A typical agent interaction flows like this:
+
+1. Agent calls `connection_list` to see available databases
+2. Agent calls `sl_list_sources` to discover what semantic sources exist
+3. Agent calls `knowledge_search` to find business context relevant to the user's question
+4. Agent calls `sl_query` with measures, dimensions, and filters to get data
+5. Agent presents results with the business context it found
+
+Agents should use the semantic layer for analytics questions because it enforces correct joins, grain-aware aggregation, and consistent metric definitions. If SQL execution is enabled, KTX allows only read-only SQL and enforces row limits.
+
+## CLI Commands
+
+For agents that work through the terminal rather than MCP, KTX provides a set of machine-readable commands under `ktx agent`. These return JSON output designed for programmatic consumption.
+
+### Available commands
+
+```bash
+# List available tools and their descriptions
+ktx agent tools --json
+
+# Get project context for planning
+ktx agent context --json
+```
+
+**Semantic layer:**
+
+```bash
+# List sources
+ktx agent sl list --json
+ktx agent sl list --json --connection-id my-postgres
+
+# Read a source
+ktx agent sl read orders --json --connection-id my-postgres
+
+# Run a query from a JSON file
+ktx agent sl query --json \
+  --connection-id my-postgres \
+  --query-file query.json \
+  --execute \
+  --max-rows 100
+```
+
+**Knowledge:**
+
+```bash
+# Search knowledge pages
+ktx agent wiki search "revenue recognition" --json --limit 10
+
+# Read a specific page
+ktx agent wiki read order-status-definitions --json
+```
+
+**SQL execution:**
+
+```bash
+# Execute read-only SQL with a row limit
+ktx agent sql execute --json \
+  --connection-id my-postgres \
+  --sql-file query.sql \
+  --max-rows 500
+```
+
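For a terminal-based agent, the commands above can be chained in roughly the same order as the MCP flow described earlier: discover sources, search knowledge, then query. The sketch below uses only the flags shown in this section; `agent_lookup` and its arguments are hypothetical, and the contents of `query.json` are not specified here.

```bash
# Hypothetical wrapper around the documented `ktx agent` commands.
# Arguments: connection ID, search text, path to a query JSON file.
agent_lookup() {
  local connection_id="$1" question="$2" query_file="$3"
  # 1. Discover which semantic sources exist for the connection.
  ktx agent sl list --json --connection-id "$connection_id" || return 1
  # 2. Look up business context related to the question.
  ktx agent wiki search "$question" --json --limit 10 || return 1
  # 3. Run the semantic query with a row cap.
  ktx agent sl query --json \
    --connection-id "$connection_id" \
    --query-file "$query_file" \
    --execute \
    --max-rows 100
}
```

Each step prints JSON to stdout, so a calling agent can parse the three outputs in order.
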
+### When to use CLI vs MCP
+
+| | MCP | CLI |
+|---|-----|-----|
+| **Best for** | Persistent agent integrations | Terminal-based workflows, scripting |
+| **Protocol** | Structured tool calls over stdio | Shell commands with JSON output |
+| **Used by** | Claude Code, Cursor, Codex | Shell scripts, custom agents, debugging |
+| **State** | Server runs continuously | Stateless per invocation |
+
+Most users should set up MCP — it gives agents richer context and a more natural interaction model. The CLI commands are useful for scripting, debugging, and agents that operate through terminal tools.
+
+## Setting Up Your Agent
+
+The fastest way to connect an agent is through the setup wizard:
+
+```bash
+ktx setup
+```
+
+The wizard's agents step auto-detects installed tools and generates the right configuration for each. For manual setup or per-tool details, see the [Agent Clients](/docs/integrations/agent-clients) integration page.
+
+### Quick manual setup
+
+For **Claude Code**, add this to `.mcp.json` at the project root:
+
+```json
+{
+  "mcpServers": {
+    "ktx": {
+      "command": "ktx",
+      "args": ["serve", "--mcp", "stdio", "--semantic-compute", "--execute-queries"],
+      "env": {
+        "KTX_PROJECT_DIR": "/path/to/your/ktx/project"
+      }
+    }
+  }
+}
+```
+
+**Cursor** — add to `.cursor/mcp.json`:
+
+```json
+{
+  "mcpServers": {
+    "ktx": {
+      "command": "ktx",
+      "args": ["serve", "--mcp", "stdio", "--semantic-compute", "--execute-queries"],
+      "env": {
+        "KTX_PROJECT_DIR": "/path/to/your/ktx/project"
+      }
+    }
+  }
+}
+```
+
+After configuration, the agent can immediately start calling KTX tools — listing sources, searching knowledge, and querying your semantic layer.
--- a/docs-site/content/docs/guides/writing-context.mdx
+++ b/docs-site/content/docs/guides/writing-context.mdx
@@ -0,0 +1,273 @@
+---
+title: Writing Context
+description: Write and refine semantic sources and knowledge pages.
+---
+
+After building context through scanning and ingestion, you'll want to refine it — edit semantic sources to match your business logic, add knowledge pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works.
+
+## Semantic Sources
+
+Semantic sources are YAML files that describe your tables, columns, measures, and joins. They're the core of the context layer — the structured definitions that agents use to generate correct SQL.
+
+### Listing sources
+
+```bash
+# List all sources across connections
+ktx sl list
+
+# List sources for a specific connection
+ktx sl list --connection-id my-postgres
+
+# Output as JSON
+ktx sl list --json
+```
+
+### Reading a source
+
+```bash
+ktx sl read orders --connection-id my-postgres
+```
+
+This prints the full YAML definition for the source.
+
+### The source schema
+
+A semantic source defines a single queryable entity — usually a table or a SQL expression. Here's a fully annotated example:
+
+```yaml
+name: orders
+description: Customer orders with line-item totals
+table: public.orders          # or use `sql:` for a custom SQL expression
+grain:
+  - order_id                  # columns that uniquely identify a row
+
+columns:
+  - name: order_id
+    type: string              # string | number | time | boolean
+    description: Unique order identifier
+
+  - name: order_date
+    type: time
+    role: time                # marks this as the default time dimension
+    description: Date the order was placed
+
+  - name: status
+    type: string
+    visibility: public        # public (default) | internal | hidden
+    description: Current order status
+
+  - name: _etl_loaded_at
+    type: time
+    visibility: hidden        # hidden columns are excluded from agent queries
+    description: Internal ETL timestamp
+
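+  - name: total_amount
+    type: number
+    description: Order total in USD
+
+  - name: customer_id
+    type: string
+    description: Foreign key to customers table; used by the customers join
+
+  - name: country
+    type: string
+    description: Customer country code; used by the us_orders segment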
+
+measures:
+  - name: total_revenue
+    expr: SUM(total_amount)
+    description: Sum of all order values
+  - name: order_count
+    expr: COUNT(DISTINCT order_id)
+    description: Number of distinct orders
+  - name: avg_order_value
+    expr: AVG(total_amount)
+    description: Average order value
+  - name: high_value_revenue
+    expr: SUM(total_amount)
+    filter: total_amount > 100
+    description: Revenue from orders over $100
+
+segments:
+  - name: us_orders
+    expr: country = 'US'
+    description: Orders from US customers
+
+joins:
+  - to: customers
+    on: orders.customer_id = customers.customer_id
+    relationship: many_to_one   # many_to_one | one_to_many | one_to_one
+  - to: order_items
+    on: orders.order_id = order_items.order_id
+    relationship: one_to_many
+    alias: items                # optional alias for the joined source
+```
+
+Key fields:
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `name` | Yes | Source identifier (lowercase, underscores) |
+| `table` or `sql` | Yes | Database table or custom SQL expression (exactly one) |
+| `grain` | Yes | Columns that define row uniqueness |
+| `columns` | No | Column definitions with type, role, visibility |
+| `measures` | No | Aggregation expressions (SUM, COUNT, AVG, etc.) |
+| `joins` | No | Relationships to other sources |
+| `segments` | No | Named filter conditions |
+| `inherits_columns_from` | No | Inherit column metadata from a manifest entry |
+
+Column visibility controls what agents see:
+
+| Visibility | Behavior |
+|------------|----------|
+| `public` | Included in agent queries and listings (default) |
+| `internal` | Available for joins and measures but not shown to agents |
+| `hidden` | Excluded entirely — useful for ETL columns |
+
+### Writing a source
+
+```bash
+ktx sl write orders --connection-id my-postgres --yaml '
+name: orders
+table: public.orders
+grain: [order_id]
+columns:
+  - name: order_id
+    type: string
+  - name: total_amount
+    type: number
+measures:
+  - name: total_revenue
+    expr: SUM(total_amount)
+'
+```
+
+You can also edit source files directly — they live at `semantic-layer/<connection-id>/<source-name>.yaml` in your project directory.
+
+### Validating sources
+
+Validation checks a source definition against the actual database schema:
+
+```bash
+ktx sl validate orders --connection-id my-postgres
+```
+
+This catches mismatches — columns that don't exist in the table, type mismatches, invalid join targets — before an agent tries to use the source.
+
+### Querying
+
+The semantic layer compiles your measures and dimensions into SQL, optionally executing it against the database:
+
+```bash
+# Compile a query to SQL
+ktx sl query \
+  --connection-id my-postgres \
+  --measure total_revenue \
+  --measure order_count \
+  --dimension "order_date" \
+  --filter "status = 'delivered'" \
+  --order-by order_date:desc \
+  --limit 10 \
+  --format sql
+```
+
+This outputs the compiled SQL without executing it. To run the query:
+
+```bash
+# Execute and return results
+ktx sl query \
+  --connection-id my-postgres \
+  --measure total_revenue \
+  --dimension "order_date" \
+  --execute \
+  --max-rows 100
+```
+
+Query flags:
+
+| Flag | Description |
+|------|-------------|
+| `--measure <name>` | Measure to query (repeatable, at least one required) |
+| `--dimension <name>` | Dimension to group by (repeatable) |
+| `--filter <expr>` | Filter expression (repeatable) |
+| `--segment <name>` | Named segment to apply (repeatable) |
+| `--order-by <field[:dir]>` | Sort field, optionally with `:asc` or `:desc` (repeatable) |
+| `--limit <n>` | Maximum rows in the compiled query |
+| `--format <mode>` | Output format: `json` (default) or `sql` |
+| `--execute` | Execute the query against the database |
+| `--max-rows <n>` | Maximum rows to return when executing |
+| `--include-empty` | Include empty/null rows in results |
+
+The query planner is grain-aware — it understands the cardinality of joins and avoids chasm traps (double-counting caused by many-to-many fan-outs). When you query measures that span multiple sources, KTX generates sub-queries at the correct grain before joining.
+
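To see why grain matters, the sketch below reproduces fan-out double-counting on made-up data: two orders worth 100 and 50, joined to three line items. It is plain `awk`, not KTX output, and the data and function names are hypothetical. Summing `total_amount` after the join counts order 1 twice; aggregating at the `order_id` grain first gives the correct total.

```bash
# Made-up data: order_id,total_amount
orders='1,100
2,50'
# Made-up data: order_id,item_id (order 1 has two items).
# Item IDs are letters so the awk script can tell the two row types apart.
order_items='1,a
1,b
2,c'

# Naive: join orders to order_items, then sum total_amount over joined rows.
naive_revenue() {
  printf '%s\n%s\n' "$orders" "$order_items" | awk -F, '
    $2 ~ /^[0-9]+$/ { amount[$1] = $2; next }  # order rows: remember the amount
    { sum += amount[$1] }                       # item rows: one joined row each
    END { print sum }'
}

# Grain-aware: aggregate orders at the order_id grain before any join.
grain_aware_revenue() {
  printf '%s\n' "$orders" | awk -F, '{ sum += $2 } END { print sum }'
}

naive_revenue         # 250 (order 1 is counted twice)
grain_aware_revenue   # 150
```
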
+## Knowledge Pages
+
+Knowledge pages are Markdown files that capture business context — definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it.
+
+### What they are
+
+When an agent asks "what counts as an active user?" or "why do revenue numbers differ between the dashboard and the SQL query?", the answer isn't in the schema. It's tribal knowledge that lives in Slack threads, Notion pages, or someone's head. Knowledge pages make that context searchable and available to agents.
+
+### Organization
+
+Knowledge pages are organized by scope:
+
+```
+knowledge/
+├── global/                          # Cross-cutting definitions
+│   ├── order-status-definitions.md
+│   ├── revenue-recognition-rules.md
+│   └── data-freshness-sla.md
+└── user/
+    └── local/                       # User-scoped context
+        ├── schema-conventions.md
+        └── known-data-issues.md
+```
+
+- **Global pages** apply across all connections — business definitions, metric standards, company terminology.
+- **User-scoped pages** are private to a user ID — personal notes, local gotchas, or context you do not want shared globally.
+
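Based on the tree above, a page's file path follows from its scope. The function below is a sketch that assumes the directory under `user/` is the user ID (shown as `local` in the tree) and that each file name is the page key plus `.md`; `knowledge_path` is a hypothetical helper, not a KTX command.

```bash
# Hypothetical helper: derive a knowledge page path from scope and key.
# The third argument (user ID) is optional and defaults to "local".
knowledge_path() {
  local scope="$1" key="$2" user_id="${3:-local}"
  case "$scope" in
    global) printf 'knowledge/global/%s.md\n' "$key" ;;
    user)   printf 'knowledge/user/%s/%s.md\n' "$user_id" "$key" ;;
    *)      echo "unknown scope: $scope" >&2; return 1 ;;
  esac
}

knowledge_path global order-status-definitions
# knowledge/global/order-status-definitions.md
knowledge_path user known-data-issues
# knowledge/user/local/known-data-issues.md
```
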
+### Writing pages
+
+```bash
+ktx wiki write order-status-definitions \
+  --scope global \
+  --summary "Business definitions for order status values" \
+  --content "## Order Statuses
+
+- **pending**: Order placed but not yet processed
+- **confirmed**: Payment received, awaiting fulfillment
+- **shipped**: Order dispatched to carrier
+- **delivered**: Order received by customer
+- **cancelled**: Order cancelled before shipment
+
+Orders in pending status for more than 48 hours are flagged for review." \
+  --tag orders \
+  --tag definitions \
+  --sl-ref orders
+```
+
+Write flags:
+
+| Flag | Description |
+|------|-------------|
+| `--scope <scope>` | `global` (default) or `user` |
+| `--summary <text>` | Short description for search results (required) |
+| `--content <text>` | Full Markdown content (required) |
+| `--tag <tag>` | Categorization tag (repeatable) |
+| `--ref <ref>` | Reference to an external resource (repeatable) |
+| `--sl-ref <ref>` | Link to a semantic source (repeatable) |
+
+You can also create and edit knowledge pages directly as Markdown files in the `knowledge/` directory.
+
+### Listing pages
+
+```bash
+ktx wiki list
+```
+
+### Reading a page
+
+```bash
+ktx wiki read order-status-definitions
+```
+
+### Searching
+
+```bash
+ktx wiki search "revenue recognition"
+```
+
+Search uses both full-text matching and semantic similarity — it finds relevant pages even when the exact terms don't match. Agents call this automatically when they need business context to answer a question.