diff --git a/docs-site/content/docs/getting-started/quickstart.mdx b/docs-site/content/docs/getting-started/quickstart.mdx index 4f0c3a65..335aedfa 100644 --- a/docs-site/content/docs/getting-started/quickstart.mdx +++ b/docs-site/content/docs/getting-started/quickstart.mdx @@ -1,254 +1,286 @@ --- title: Quickstart -description: Set up KTX and build your first context in under 10 minutes. +description: Set up KTX, build local context, and connect your coding agent. --- -This guide walks you through `ktx setup` - an interactive wizard that configures your LLM provider, connects your database, optionally ingests from your existing tools, builds context, and installs agent integration. +This guide gets a local analytics project ready for KTX. You will install the +CLI, run the setup wizard, connect a database, build context, and install agent +rules that teach your coding assistant which KTX commands to run. -If you are a coding assistant trying to decide which KTX docs page to read, start with the [Agent Quickstart](/docs/ai-resources/agent-quickstart). This page is the human setup walkthrough. +If you are a coding assistant choosing a docs route, start with the +[Agent Quickstart](/docs/ai-resources/agent-quickstart). This page is the +human setup walkthrough. -## Workflow summary +## What setup does -Use this sequence when you are setting up KTX in an analytics project: +`ktx setup` is the main project workflow. It can create or resume `ktx.yaml`, +configure model and embedding providers, add database connections, add optional +context sources, build the first context artifacts, and install agent +integration. -1. `npm install -g @kaelio/ktx` - install the published KTX CLI from npm. -2. `ktx setup` - create or resume a KTX project. +When you run bare `ktx` in an interactive terminal outside a KTX project, the +CLI opens the same setup experience. 
Inside an existing project, `ktx setup` +resumes incomplete work or opens a menu for changing setup, connecting an +agent, checking status, or exploring a demo project. -The setup wizard is stateful. If it exits before completion, rerun `ktx setup` in the same project directory to resume from the first incomplete step. +## Install the CLI -## Install and run setup - -Install the published [`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) CLI: +Install the published `@kaelio/ktx` package: ```bash npm install -g @kaelio/ktx ``` -Then run the setup wizard: +Then run setup from the analytics project directory: ```bash ktx setup ``` -The local checkout flow is only for contributors working on KTX itself. See [Contributing](/docs/community/contributing) for that setup. +The local checkout workflow is only for KTX contributors. See +[Contributing](/docs/community/contributing) for that path. -The wizard walks through six steps. You can go back at any point, and if you exit early, rerunning `ktx setup` resumes where you left off. +## Step 1: Choose the project -## Step 1: Configure LLM +In an interactive terminal, setup can create a new KTX project or resume the +nearest existing project. The main project file is `ktx.yaml`. -KTX uses an Anthropic model to enrich schema descriptions, generate semantic sources during ingestion, and reconcile metadata from your tools. +For scripted setup, pass the project directory explicitly: -The wizard asks how to find your API key: - -``` -◆ How should KTX find your Anthropic API key? -│ ○ Use ANTHROPIC_API_KEY from the environment -│ ○ Paste a key and save it as a local secret file +```bash +ktx setup --project-dir ./analytics ``` -If you choose to paste a key, KTX saves it in `.ktx/secrets/anthropic-api-key` with local file permissions. Your `ktx.yaml` stores a `file:` reference, never the raw key. +If setup exits early, rerun `ktx setup` in the same directory. 
KTX tracks +completed setup steps and resumes from the remaining work. -Next, choose a model: +## Step 2: Configure the LLM -``` -◆ Which Anthropic model should KTX use? -│ ○ Claude Sonnet 4.6 (recommended) -│ ○ Claude Opus 4.6 -│ ○ Claude Haiku 4.5 -│ ○ Enter a model ID manually -``` +KTX uses a Claude model for ingest agents that turn schemas, SQL, BI metadata, +and documents into semantic-layer sources and wiki context. -KTX runs a health check to verify your key and model work before saving. +Setup supports two LLM provider paths: -## Step 2: Configure embeddings +| Provider | Use when | Credential model | +|----------|----------|------------------| +| Anthropic API | You have an Anthropic API key | `ANTHROPIC_API_KEY` or a local `file:` secret | +| Google Vertex AI for Anthropic Claude | Your organization runs Claude through Google Cloud | Application Default Credentials plus Vertex project and location | -KTX uses embeddings for semantic search over sources, wiki content, schema metadata, and relationship evidence. +For Anthropic API, setup can read the key from the environment or save a pasted +key to `.ktx/secrets/anthropic-api-key`. `ktx.yaml` stores an `env:` or `file:` +reference, not the raw key. -``` -◆ Which embedding option should KTX use? -│ ○ Local sentence-transformers embeddings -│ ○ OpenAI embeddings (recommended) -``` +For Vertex AI, setup uses Google Application Default Credentials. It can read +your active `gcloud` project, list visible projects, or accept explicit +`--vertex-project` and `--vertex-location` values. -**OpenAI embeddings** use `text-embedding-3-small` (1536 dimensions) and require an `OPENAI_API_KEY`. +Setup checks the selected model before saving. Anthropic API setup fetches live +Claude model choices when possible and falls back to bundled defaults if model +discovery is unavailable. -**Local embeddings** use `all-MiniLM-L6-v2` (384 dimensions) via the KTX managed Python runtime. No API key is needed. 
KTX can install and start the runtime during setup; to prepare it ahead of time, run: +## Step 3: Configure embeddings + +KTX uses embeddings for semantic search over semantic-layer sources, wiki +context, schema metadata, and relationship evidence. + +| Backend | Default model | Notes | +|---------|---------------|-------| +| OpenAI | `text-embedding-3-small` | Recommended for hosted embeddings. Requires an OpenAI API key. | +| Local sentence-transformers | `all-MiniLM-L6-v2` | Runs through the KTX-managed Python runtime. No hosted embedding key is required. | + +OpenAI setup reads `OPENAI_API_KEY` or saves a local secret file. Local +sentence-transformers setup can install and start the managed runtime during +setup. To prepare that runtime before setup, run: ```bash ktx dev runtime install --feature local-embeddings --yes ktx dev runtime start --feature local-embeddings ``` -## Step 3: Connect a database +## Step 4: Add a database -Select one or more databases for KTX to connect to. The wizard supports -SQLite, PostgreSQL, MySQL, ClickHouse, SQL Server, BigQuery, and Snowflake. +KTX needs at least one primary database connection before it can build database +context. The wizard supports SQLite, PostgreSQL, MySQL, ClickHouse, SQL Server, +BigQuery, and Snowflake. -For PostgreSQL, you can enter connection details field by field or paste a connection URL: +You can usually enter connection fields interactively or provide a URL. Secret +URLs can be stored as local files under `.ktx/secrets/` or referenced with +`env:NAME` in `ktx.yaml`. -``` -◆ How do you want to connect to PostgreSQL? -│ ○ Enter connection details (host, port, database, user) -│ ○ Paste a connection URL -``` +After saving a connection, setup tests it and builds fast schema context: -If your URL contains credentials, KTX saves it to `.ktx/secrets/` and writes a `file:` reference in `ktx.yaml`. You can also use `env:DATABASE_URL` to reference an environment variable. 
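Setup keeps raw secrets out of `ktx.yaml` by storing an `env:` or `file:` reference instead. Sometimes you need the underlying value outside KTX. For example, the Common errors table suggests testing a failing database connection with the database's native client. A small shell helper can follow the reference for you.

This is a sketch, not a KTX command. `resolve_ktx_ref` is a hypothetical name. It assumes references look like `env:NAME` or `file:PATH`, and that relative `file:` paths resolve against the project directory. This page does not specify how KTX resolves them.

```bash
# Hypothetical helper (not a KTX command): print the value behind a
# ktx.yaml-style secret reference.
#   env:NAME  -> value of the exported environment variable NAME
#   file:PATH -> contents of PATH; relative paths are resolved against the
#                project directory (ASSUMPTION: not documented on this page)
#   anything else is printed unchanged as a literal value
resolve_ktx_ref() {
  local ref=$1 project_dir=${2:-.} path
  case $ref in
    env:*)
      # printenv exits non-zero when the variable is not in the environment.
      printenv "${ref#env:}" || {
        echo "environment variable ${ref#env:} is not set" >&2
        return 1
      }
      ;;
    file:*)
      path=${ref#file:}
      case $path in
        /*) ;;                          # absolute path: use as-is
        *) path=$project_dir/$path ;;   # relative path: anchor to the project
      esac
      cat -- "$path"
      ;;
    *)
      printf '%s\n' "$ref"
      ;;
  esac
}
```

For a PostgreSQL connection stored as `env:DATABASE_URL`, you could then run `psql "$(resolve_ktx_ref env:DATABASE_URL)" -c 'select 1'` to check credentials outside KTX.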
- -After connecting, KTX automatically runs a connection test and builds fast -schema context: - -``` -Testing postgres-warehouse +```text +Testing warehouse Connection test passed - Driver: PostgreSQL - Status: ok -Building schema context for postgres-warehouse +Building schema context for warehouse Running fast database ingest -Schema context complete for postgres-warehouse - Changes: 42 new tables - Database ready - postgres-warehouse - PostgreSQL - schema context complete + warehouse - PostgreSQL - schema context complete ``` -For PostgreSQL, Snowflake, and BigQuery, the wizard can enable query-history -ingest when the warehouse history feature is available. Query history is stored -under `connections..context.queryHistory` in `ktx.yaml`. +PostgreSQL, BigQuery, and Snowflake can also enable query-history ingest. Query +history helps KTX learn common query patterns, joins, service-account filters, +and warehouse-specific usage. -## Step 4: Add context sources +## Step 5: Add context sources -Context sources let KTX ingest metadata from your existing analytics tools. This step is optional - you can skip it and add sources later. +Context sources are optional, but they make the first context layer much richer. +Setup can add: -``` -◆ Which context sources should KTX ingest? 
-│ ◻ dbt -│ ◻ MetricFlow -│ ◻ Metabase -│ ◻ Looker -│ ◻ LookML -│ ◻ Notion -``` +| Source | Typical input | What KTX learns | +|--------|---------------|-----------------| +| dbt | Local project or Git repo | Models, columns, tests, descriptions, tags | +| MetricFlow | Local project or Git repo | Semantic models, metrics, dimensions, entities | +| LookML | Local files or Git repo | Views, explores, dimensions, measures, joins | +| Looker | API URL and credentials | Explores, looks, dashboards, model metadata | +| Metabase | API URL and key | Questions, dashboards, BI database mappings | +| Notion | Integration token and crawl settings | Business docs and knowledge pages | -For **dbt**, point KTX at a local path or git URL. KTX reads your `dbt_project.yml` and schema files to extract model metadata: +Setup maps BI and source metadata back to your primary warehouse connection so +generated context points at the right tables. -``` -◆ dbt source location -│ ○ Local path -│ ○ Git URL -``` +You can skip this step and add sources later by rerunning `ktx setup`. -For **Metabase** and **Looker**, you provide an API URL and credentials. KTX maps BI databases to your KTX primary source connections so it knows which warehouse tables the BI metadata refers to. +## Step 6: Build context -Context sources are saved to `ktx.yaml` and built during the next step. +The context build turns configured databases and sources into local artifacts +agents can read. It runs database ingest first, then source ingest and memory +updates. -## Step 5: Build context +Fast database ingest records deterministic schema grounding. Deep ingest adds +AI-enriched descriptions, embeddings, relationship evidence, and query-history +context when configured. -This is where KTX builds agent-ready context. It uses the database context -depth saved by setup and ingests metadata from any configured context sources. 
+When the build finishes, setup verifies that agent-ready context exists: -``` -◆ Build KTX context for agents? -│ ○ Build context now (recommended) -│ ○ Leave context unbuilt and exit setup -``` - -Fast database context builds deterministic schema grounding. Deep database -context also generates AI descriptions, embeddings, and relationship evidence -when those capabilities are configured. - -For a small database (under 50 tables), this can take a few minutes. Larger -warehouses can take longer. Context builds run in the foreground; press -Ctrl+C to stop the current run and rerun `ktx setup` or `ktx ingest` -when you are ready to try again. - -When the build completes, KTX verifies that agent-ready context was produced: - -``` +```text KTX context is ready for agents. Databases: - postgres-warehouse: deep context complete + warehouse: deep context complete Context sources: - dbt-main: memory update complete + dbt_main: memory update complete Verification: Agent context: ready Semantic search: ready ``` -## Step 6: Install agent integration +If a foreground build is interrupted, rerun `ktx setup` or build the same target +with `ktx ingest `. -The final step connects KTX to your coding agent. Choose how agents should access the project: +## Step 7: Install agent integration -``` -◆ How should agents use this KTX project? -│ ○ CLI tools and skills +The final setup step installs project-local rules for your coding assistant. +Supported targets are Claude Code, Codex, Cursor, OpenCode, and universal +`.agents`. + +You can also run this step later: + +```bash +ktx setup --agents --target codex ``` -Then select which agents to install for: +Claude Code and Codex also support global installs: -``` -◆ Which agent targets should KTX install? 
-│ ◻ Claude Code -│ ◻ Codex -│ ◻ Cursor -│ ◻ OpenCode -│ ◻ Custom agent (.agents) +```bash +ktx setup --agents --target codex --global ``` -**CLI mode** writes a skill file (e.g., `.claude/skills/ktx/SKILL.md`) that teaches the agent to call KTX commands directly. - -**Custom agent** uses the universal `.agents` target for agents that can read project-local skills. +Agent rules are CLI-based. They point agents at the KTX CLI path that created +the file, so agents do not need a separate `ktx` binary in `PATH`. If the CLI +path changes after reinstalling or moving a checkout, rerun `ktx setup --agents`. ## Generated files -KTX writes project state as plain files so agents can inspect and edit changes in git. +KTX writes plain files so people and agents can inspect changes in git. -| Path | Created by | Purpose | -|------|------------|---------| -| `ktx.yaml` | `ktx setup` | Main project configuration: connections, LLM settings, embeddings, and context sources | -| `.ktx/secrets/*` | `ktx setup` when file-backed secrets are selected | Local secret files referenced from `ktx.yaml`; do not commit these | -| `semantic-layer//*.yaml` | context build, ingestion, or direct file edits | Semantic source definitions agents use for SQL generation | -| `wiki/global/*.md` | ingestion, memory capture, or direct file edits | Shared business context and metric definitions | -| `wiki/user//*.md` | memory capture or direct file edits | User-scoped notes for one agent/user context | -| `.claude/skills/ktx/SKILL.md`, `.agents/skills/ktx/SKILL.md` | CLI-mode agent integration setup | Agent instructions for calling public `ktx` commands | +| Path | Purpose | +|------|---------| +| `ktx.yaml` | Project configuration for LLMs, embeddings, connections, context sources, and setup state | +| `.ktx/secrets/*` | Local secret files referenced from `ktx.yaml`; do not commit these | +| `.ktx/setup/*` | Local setup and context-build state | +| `.ktx/agents/install-manifest.json` | Manifest used to 
manage installed agent files | +| `semantic-layer//*.yaml` | Semantic source definitions used for SQL generation | +| `wiki/global/*.md` | Shared business context and metric definitions | +| `wiki/user//*.md` | User-scoped notes and local context | +| `.claude/skills/ktx/SKILL.md` | Claude Code project skill | +| `.agents/skills/ktx/SKILL.md` | Codex or universal project skill | +| `.cursor/rules/ktx.mdc` | Cursor project rule | +| `.opencode/commands/ktx.md` | OpenCode project command | -## Verify it worked +## Verify setup -Check your project status: +Run: ```bash ktx status ``` -``` +Example output: + +```text KTX project: /home/user/analytics Project ready: yes LLM ready: yes (claude-sonnet-4-6) Embeddings ready: yes (text-embedding-3-small) -Databases configured: yes (postgres-warehouse) -Context sources configured: yes (dbt-main) +Databases configured: yes (warehouse) +Context sources configured: yes (dbt_main) KTX context built: yes -Agent integration ready: yes (claude-code:project) +Agent integration ready: yes (codex:project) ``` +Use JSON when an agent or script needs a structured readiness check: + +```bash +ktx status --json +``` + +## Scripted setup example + +Use non-interactive setup when creating repeatable fixtures or automation: + +```bash +ktx setup \ + --project-dir ./analytics \ + --no-input \ + --skip-llm \ + --skip-embeddings \ + --database postgres \ + --new-database-connection-id warehouse \ + --database-url env:DATABASE_URL \ + --database-schema public +``` + +Then build context: + +```bash +ktx ingest warehouse --fast +``` + +See [ktx setup](/docs/cli-reference/ktx-setup) for the full automation flag +surface. 
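`ktx status --json` is the structured option for scripts, but this page does not show its field names. As a stopgap, a shell function can read the human-readable lines instead.

This sketch assumes every line after `KTX project:` has the `Label: yes (detail)` or `Label: no` shape seen in the example output. That format is an assumption and may change between releases. Prefer the JSON output once you know its schema. `ktx_ready_check` is a hypothetical name.

```bash
# Hypothetical helper: read `ktx status` text output on stdin, print every
# readiness line whose value is not "yes", and exit non-zero if any exist.
# ASSUMPTION: lines look like "Label: yes (detail)", as in the example output.
ktx_ready_check() {
  awk -F': ' '
    # Skip the project path line; it is not a yes/no readiness field.
    NF >= 2 && $1 != "KTX project" {
      split($2, words, " ")   # words[1] is "yes" or "no"
      if (words[1] != "yes") {
        print "not ready: " $1 " (" $2 ")"
        missing = 1
      }
    }
    # An unset awk variable is 0, so this exits 0 when everything is ready.
    END { exit missing }
  '
}
```

Example: `ktx status | ktx_ready_check || echo "finish setup with: ktx setup"`.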
+ ## Common errors -| Error or symptom | Likely cause | Recovery | -|------------------|--------------|----------| -| `ktx: command not found` | The KTX package is not installed globally, or the shell cannot find the global binary | Run `npm install -g @kaelio/ktx` and open a new shell | -| LLM health check fails | Missing, invalid, or unauthorized Anthropic API key | Export `ANTHROPIC_API_KEY` or rerun `ktx setup` and choose the file-backed secret option | -| OpenAI embedding check fails | `OPENAI_API_KEY` is missing when OpenAI embeddings are selected | Export `OPENAI_API_KEY`, or rerun setup and choose local sentence-transformers embeddings | -| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx dev runtime status`, then run `ktx dev runtime install --feature local-embeddings --yes` and rerun setup | -| Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx setup` and reconfigure the connection | -| `KTX context built: no` in `ktx status` | Setup saved configuration but did not build context | Run `ktx setup` and choose to build context now | -| Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex` using the target you need | +| Symptom | Likely cause | Recovery | +|---------|--------------|----------| +| `ktx: command not found` | The global package is not installed or your shell cannot find it | Reinstall `@kaelio/ktx` and open a new shell | +| Setup resumes the wrong project | `KTX_PROJECT_DIR` or the nearest `ktx.yaml` points somewhere else | Pass `--project-dir ` | +| Anthropic health check fails | API key, model id, or access is invalid | Fix `ANTHROPIC_API_KEY` or rerun setup with a different key or model | +| Vertex AI health check fails | Vertex API, Claude 
access, project, location, or IAM permissions are missing | Check the project, location, Application Default Credentials, and Vertex AI permissions | +| OpenAI embeddings fail | `OPENAI_API_KEY` is missing or invalid | Export the key or choose local sentence-transformers embeddings | +| Local embeddings fail | Managed Python runtime cannot install or start | Run `ktx dev runtime status`, then install the local embeddings runtime | +| Database test fails | Credentials, network access, database, warehouse, or schema is wrong | Test the same values with the database's native client, then rerun setup | +| Context is not built | Setup saved configuration but skipped or interrupted the build | Run `ktx setup` or `ktx ingest --all` | +| Agent integration is incomplete | Setup skipped the agents step or installed a different target | Run `ktx setup --agents --target ` | ## Next steps -- **Build more context** - learn about [database ingest](/docs/guides/building-context), relationship detection, and source ingestion workflows in the Building Context guide. -- **Refine your semantic layer** - the [Writing Context](/docs/guides/writing-context) guide covers source YAML, measures, joins, and wiki pages. -- **Understand the architecture** - read [The Context Layer](/docs/concepts/the-context-layer) to learn why a context layer is more than a semantic layer. -- **Connect more agents** - see the [Agent Clients](/docs/integrations/agent-clients) integration page for per-tool setup details. +- Build and refresh context with [Building Context](/docs/guides/building-context). +- Edit semantic sources and wiki pages with [Writing Context](/docs/guides/writing-context). +- Connect more tools with [Agent Clients](/docs/integrations/agent-clients). +- Read [The Context Layer](/docs/concepts/the-context-layer) to understand the architecture. 
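The Generated files table marks `.ktx/secrets/*` as local files that must not be committed. Before pushing, you can ask git directly whether they are protected.

This sketch uses only standard git commands. The function name and the `probe` file name are illustrative. It assumes you run it from the KTX project root inside a git work tree.

```bash
# Illustrative guard: fail if KTX secret files could end up in a commit.
# Run from the KTX project root inside a git work tree.
check_ktx_secrets_ignored() {
  # Files that are already tracked are not protected by .gitignore at all.
  if [ -n "$(git ls-files -- .ktx/secrets)" ]; then
    echo "error: files under .ktx/secrets/ are already tracked by git" >&2
    return 1
  fi
  # check-ignore exits 0 when the path matches an ignore rule; the probe
  # path only has to match the rules, it does not have to exist.
  if git check-ignore -q .ktx/secrets/probe; then
    echo ".ktx/secrets/ is ignored by git"
  else
    echo "error: add .ktx/secrets/* to .gitignore" >&2
    return 1
  fi
}
```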
diff --git a/docs-site/content/docs/guides/building-context.mdx b/docs-site/content/docs/guides/building-context.mdx index e18bc3cb..c21b7921 100644 --- a/docs-site/content/docs/guides/building-context.mdx +++ b/docs-site/content/docs/guides/building-context.mdx @@ -1,171 +1,195 @@ --- title: Building Context -description: Build database and source context from configured KTX connections. +description: Build and refresh KTX context from databases, source tools, query history, and text. --- -Building context reads your configured connections and writes local context that -agents can use. Database connections produce schema context, and source -connections such as dbt, Looker, Metabase, and Notion produce semantic sources -and wiki pages. +Building context turns configured connections into local semantic-layer sources +and wiki pages. Agents use those files to understand your schema, business +definitions, metric logic, joins, and known caveats before they write SQL. + +Use this guide after `ktx setup` has created `ktx.yaml` and at least one +database or context-source connection. + +## The build loop + +Most projects use this loop: + +1. Check readiness with `ktx status`. +2. Build one connection with `ktx ingest `, or build everything + with `ktx ingest --all`. +3. Search or inspect the generated files under `semantic-layer/` and `wiki/`. +4. Edit source YAML or Markdown when business logic needs refinement. +5. Validate and query representative sources before handing the context to an + agent. + +`ktx ingest --all` runs database connections first, then context-source +connections. That order lets dbt, BI, Notion, and text ingest attach context to +known warehouse tables. ## Database ingest -Database ingest connects to your warehouse and extracts structural metadata. -KTX stores the results locally so agents can understand your schema without -querying the database directly. 
- -### Running database ingest +Database ingest connects to a configured warehouse and records local schema +context. It gives agents table, column, type, constraint, and row-count +grounding without requiring them to inspect the database directly. ```bash -ktx ingest -``` - -This runs a fast schema ingest by default. You can choose the depth with public -flags: - -| Flag | What it does | -|------|-------------| -| `--fast` | Tables, columns, types, constraints, and row counts | -| `--deep` | Fast ingest plus AI-enriched database context | - -```bash -# Build one connection quickly -ktx ingest my-postgres --fast - -# Build AI-enriched database context -ktx ingest my-postgres --deep +# Build one configured database connection +ktx ingest warehouse # Build all configured connections ktx ingest --all ``` -### Checking results +Depth controls how much context KTX builds: -Every ingest prints a summary and writes local artifacts. Use `ktx status` -after ingest to review project readiness and follow-up setup work: +| Flag | Best for | What it does | +|------|----------|--------------| +| `--fast` | First setup, quick refreshes, CI smoke checks | Deterministic schema ingest with tables, columns, types, constraints, and row counts | +| `--deep` | Agent-ready context for real analysis | Fast ingest plus AI-enriched descriptions, embeddings, relationship evidence, and optional query history | + +Examples: ```bash -ktx status +ktx ingest warehouse --fast +ktx ingest warehouse --deep +ktx ingest --all --deep ``` -### Relationship detection +Deep ingest needs LLM and embedding readiness. If those providers are not +configured, run `ktx setup` or use `--fast`. -Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals - name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. 
The weighted score determines each candidate's status: +## Query history -| Score range | Status | Meaning | -|-------------|--------|---------| -| ≥ 0.85 | `accepted` | High confidence - applied automatically | -| 0.55 – 0.84 | `review` | Plausible - needs human review | -| < 0.55 | `rejected` | Low confidence - not applied | +PostgreSQL, BigQuery, and Snowflake can add query-history context. This helps +KTX learn common joins, filters, service-account patterns, redaction rules, and +usage-heavy query templates. -Deep database ingest can include relationship evidence where the connector can -provide it. Relationship review and calibration subcommands are not part of the -current public CLI surface. - -## Ingestion - -Ingestion pulls semantic context from your existing analytics tools - dbt projects, Looker models, Metabase questions, and more - and writes it into your KTX project as semantic sources and wiki pages. - -### How it works - -Each ingest run follows this flow: - -1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.) -2. An **LLM agent** reconciles the extracted metadata with your existing context - it merges intelligently rather than overwriting -3. **Semantic sources** (YAML) and **wiki pages** (Markdown) are written to your project directory - -### Running an ingest +Enable it during setup, store it under `connections..context.queryHistory`, +or request it for one run: ```bash -ktx ingest my-dbt-source +ktx ingest warehouse --deep --query-history +ktx ingest warehouse --query-history-window-days 30 ``` -Useful output flags: +Use `--no-query-history` when you want to skip a stored query-history setting +for one run. + +## Relationship evidence + +Many databases do not declare all foreign keys. KTX can score relationship +candidates using signals such as name similarity, type compatibility, value +overlap, embedding similarity, uniqueness, null rate, and structural priors. 
+ +The public CLI does not expose separate relationship review subcommands. +Relationship evidence is built as part of deep database ingest when the +connector and readiness checks support it. + +## Context-source ingest + +Context-source connections pull business metadata from tools your team already +uses. The current public `ktx ingest` command is connection-centric: pass one +configured connection id, or pass `--all`. + +```bash +# Build one source connection +ktx ingest dbt_main + +# Build every configured database and source connection +ktx ingest --all +``` + +Supported source types: + +| Driver | Typical source | Output | +|--------|----------------|--------| +| `dbt` | dbt project or Git repo | Semantic sources with model, column, test, tag, and description metadata | +| `metricflow` | MetricFlow project or Git repo | Metrics, dimensions, entities, and semantic joins | +| `lookml` | LookML files or Git repo | Views, explores, dimensions, measures, and joins | +| `looker` | Looker API | Explores, looks, dashboards, and model metadata | +| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings | +| `notion` | Notion API | Wiki pages and business knowledge | + +Source ingest extracts metadata, reconciles it with existing local context, and +writes semantic-layer YAML plus wiki Markdown. It merges rather than blindly +overwriting local edits. + +## Text ingest + +Use `ktx ingest text` for notes, Markdown files, runbooks, Slack exports, or +other free-form knowledge that should become searchable KTX memory. + +```bash +# Capture a Markdown file +ktx ingest text docs/revenue-notes.md --connection-id warehouse + +# Capture one stdin item +printf "Refunds are excluded from net revenue." | ktx ingest text - + +# Capture direct text +ktx ingest text --text "ARR excludes one-time implementation fees." 
+``` + +Useful flags: | Flag | Description | |------|-------------| -| `--json` | Output as JSON | -| `--plain` | Plain text output | +| `--connection-id ` | Attach the captured memory to a KTX connection | +| `--user-id ` | Attribute capture to a user scope, default `local-cli` | +| `--json` | Print structured output | +| `--fail-fast` | Stop after the first failed text item | -Foreground context builds do not detach into background control sessions. If a -run is interrupted, rerun `ktx ingest ` or `ktx ingest --all`. +Text ingest is a good fit for small, high-signal documents. For system-specific +connectors such as Notion, dbt, or Metabase, prefer configured source ingest so +KTX can preserve source metadata. -### Supported context sources +## Output and artifacts -| Driver | Source | What gets ingested | -|--------|--------|--------------------| -| `dbt` | dbt project | Model definitions, column descriptions, tests, tags | -| `metricflow` | MetricFlow semantic models | Metrics, dimensions, entities, semantic joins | -| `lookml` | LookML files | Views, explores, dimensions, measures, joins | -| `looker` | Looker API | Explores, looks, dashboard metadata | -| `metabase` | Metabase API | Questions, dashboards, table metadata | -| `notion` | Notion API | Database pages, knowledge articles | +Every ingest run prints a summary. Use `--json` when an agent or script needs a +structured plan and per-target results. -Query history is a database connection facet. Enable it with -`connections..context.queryHistory` or pass `--query-history` for a current -run. See [Context Sources](/docs/integrations/context-sources) for -driver-specific setup and auth configuration. 
- -### What gets generated - -A typical dbt ingest produces semantic sources and wiki pages in your project: - -**Semantic source** (`semantic-layer/my-postgres/orders.yaml`): - -```yaml title="semantic-layer/my-postgres/orders.yaml" -name: orders -table: public.orders -grain: - - order_id -columns: - - name: order_id - type: string - description: Unique order identifier - - name: customer_id - type: string - description: Foreign key to customers table - - name: order_date - type: time - role: time - description: Date the order was placed - - name: total_amount - type: number - description: Total order value in USD -measures: - - name: total_revenue - expr: SUM(total_amount) - description: Sum of all order values - - name: order_count - expr: COUNT(DISTINCT order_id) - description: Number of distinct orders -joins: - - to: customers - on: orders.customer_id = customers.customer_id - relationship: many_to_one +```bash +ktx ingest --all --json ``` -**Wiki page** (`wiki/global/order-status-definitions.md`): +Typical generated files: -```markdown ---- -summary: Business definitions for order status values -tags: [orders, definitions] -sl_refs: [orders] ---- +| Path | Created by | Purpose | +|------|------------|---------| +| `semantic-layer//*.yaml` | Database and source ingest | Queryable semantic source definitions | +| `wiki/global/*.md` | Source, text, and memory ingest | Shared business definitions and notes | +| `wiki/user//*.md` | Text and memory ingest | User-scoped context | +| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup | -## Order Statuses +Ingest sessions also record transcripts with tool calls, LLM responses, and +write decisions. Inspect them when you need to debug why a source or wiki page +was written a certain way. 
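The text ingest examples capture one file at a time. For a folder of runbooks or notes, a shell loop can run the same documented command once per file and report failures at the end.

This is a sketch with a few assumptions:
- `ingest_notes_dir` is a hypothetical name.
- It assumes `ktx ingest text` takes one file path per call, as in the examples.
- `KTX` is an optional override for the CLI path, for when `ktx` is not on `PATH`.

```bash
# Hypothetical helper: capture every Markdown file under a directory as KTX
# text memory attached to one connection, one `ktx ingest text` call per file.
# Set KTX to a full CLI path if `ktx` is not on PATH.
ingest_notes_dir() {
  local dir=$1 connection_id=$2 failures=0 file
  # -print0 / read -d '' keep file names with spaces intact; sort -z gives a
  # stable order.
  while IFS= read -r -d '' file; do
    # </dev/null stops the CLI from consuming the file list on stdin.
    if ! "${KTX:-ktx}" ingest text "$file" --connection-id "$connection_id" </dev/null; then
      echo "failed: $file" >&2
      failures=$((failures + 1))
    fi
  done < <(find "$dir" -type f -name '*.md' -print0 | sort -z)
  # Succeed only if every file was captured.
  [ "$failures" -eq 0 ]
}
```

Example: `ingest_notes_dir docs/runbooks warehouse`. Unlike `--fail-fast`, this loop keeps going after a failed file and reports all failures at the end.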
-- **pending**: Order placed but not yet processed -- **confirmed**: Payment received, awaiting fulfillment -- **shipped**: Order dispatched to carrier -- **delivered**: Order received by customer -- **cancelled**: Order cancelled before shipment +## Example: first full refresh -Orders in "pending" status for more than 48 hours are flagged for review. +After interactive setup: + +```bash +ktx status +ktx ingest --all --deep +ktx status ``` -### Ingest transcripts +Then inspect what changed: -Every ingest session records a full transcript: tool calls, LLM responses, and -write decisions. Inspect the stored transcript files when you need to debug why -a source was written a certain way. +```bash +git status --short +ktx sl list --json +ktx wiki search "revenue" --json --limit 10 +``` + +## Common errors + +| Symptom | Likely cause | Recovery | +|---------|--------------|----------| +| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` | +| Deep readiness is missing | LLM or embeddings are not setup-ready | Run `ktx setup`, or rerun with `--fast` | +| Query history is unsupported | The selected database driver does not expose query history | Run schema ingest without query-history flags | +| No target selected | You omitted both a connection id and `--all` | Run `ktx ingest ` or `ktx ingest --all` | +| Source flags have no effect | Depth and query-history flags were supplied for a source connector | Use those flags only for database connections | +| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` | diff --git a/docs-site/content/docs/guides/serving-agents.mdx b/docs-site/content/docs/guides/serving-agents.mdx index 4a93ae43..192b1c7f 100644 --- a/docs-site/content/docs/guides/serving-agents.mdx +++ b/docs-site/content/docs/guides/serving-agents.mdx @@ -1,59 +1,167 @@ --- title: Serving Agents -description: Expose your context to Claude Code, Cursor, 
Codex, and other coding agents. +description: Expose KTX context to Claude Code, Codex, Cursor, OpenCode, and custom agents. --- -Once you've built and refined your context, expose it to coding agents through -the public KTX CLI. Claude Code, Cursor, Codex, OpenCode, and custom agent -workflows can call the same commands you use at a terminal. +KTX serves agents through the public CLI and project-local instruction files. +Agents do not need a separate server. They read the generated rules, call KTX +commands, inspect local context files, and use JSON output when they need +structured results. -## CLI Commands +## Recommended setup -KTX public commands support JSON output for the context reads that agents use -most often. Use `--project-dir` when the agent is not already running inside the -KTX project directory. - -### Available commands +Run the agent install step from a KTX project: + +```bash +ktx setup --agents +``` + +Or install a specific target: + +```bash +ktx setup --agents --target codex +``` + +Supported targets: + +| Target | Generated project file | +|--------|------------------------| +| Claude Code | `.claude/skills/ktx/SKILL.md` | +| Codex | `.agents/skills/ktx/SKILL.md` | +| Cursor | `.cursor/rules/ktx.mdc` | +| OpenCode | `.opencode/commands/ktx.md` | +| Universal `.agents` | `.agents/skills/ktx/SKILL.md` | + +Claude Code and Codex also support global installs: + +```bash +ktx setup --agents --target claude-code --global +ktx setup --agents --target codex --global +``` + +KTX records installed files in `.ktx/agents/install-manifest.json`. Rerun +`ktx setup --agents` after moving a checkout or reinstalling the CLI so the +generated instructions point at the current CLI path. + +## Agent command set + +All supported agent clients use the same command surface. Use `--project-dir` +when the agent is running outside the KTX project directory. 
+ +### Readiness ```bash -# Check setup and context readiness ktx status --json ``` -**Semantic layer:** +Agents should run this before relying on context. It reports project, LLM, +embedding, database, context-source, context-build, and agent-integration +readiness. + +### Semantic layer discovery ```bash -# List sources ktx sl list --json -ktx sl list --json --connection-id my-postgres -ktx sl search "revenue" --json +ktx sl list --connection-id warehouse --json +ktx sl search "revenue" --json --limit 10 +``` -# Run a query from a JSON file -ktx sl query --json \ - --connection-id my-postgres \ - --query-file query.json \ +Agents use these commands to discover source names, connection ids, measures, +dimensions, and likely files to inspect. + +### Semantic-layer validation and queries + +```bash +ktx sl validate orders --connection-id warehouse +``` + +Compile SQL before executing: + +```bash +ktx sl query \ + --connection-id warehouse \ + --measure orders.total_revenue \ + --dimension orders.created_date \ + --format sql +``` + +Execute only when the task calls for live data: + +```bash +ktx sl query \ + --connection-id warehouse \ + --measure orders.total_revenue \ + --dimension orders.status \ --execute \ --max-rows 100 ``` -**Wiki:** +For complex calls, agents can write a JSON query object and pass it with +`--query-file`. + +### Wiki context ```bash -# Search wiki pages +ktx wiki list --json ktx wiki search "revenue recognition" --json --limit 10 ``` -## Setting Up Your Agent +Agents should search wiki context when a question depends on business +definitions, metric caveats, process rules, or terms that are not obvious from +schema names. 
-The fastest way to connect an agent is through the setup wizard: +### Context refresh + +Agents can refresh context when the user asks them to: ```bash -ktx setup +ktx ingest warehouse --fast +ktx ingest --all +ktx ingest text docs/revenue-notes.md --connection-id warehouse ``` -The agents step auto-detects installed tools and generates the right -configuration. For manual setup or per-tool details, see the -[Agent Clients](/docs/integrations/agent-clients) integration page. +Use `--deep` only when LLM and embedding setup is ready and the user expects an +AI-enriched refresh. -After configuration, the agent can immediately call KTX commands to list -sources, search wiki pages, and query your semantic layer. +## Good agent behavior + +Agents should: + +- Run `ktx status --json` before using KTX context. +- Use `ktx sl search` and `ktx wiki search` before writing SQL from memory. +- Inspect the relevant YAML or Markdown files after search returns candidates. +- Compile SQL with `ktx sl query --format sql` before executing. +- Use `--max-rows` whenever executing a live query. +- Validate edited semantic sources with `ktx sl validate`. +- Keep generated context changes reviewable in git. + +Agents should not assume a background server, ORPC route, frontend app, or +external migration system exists. KTX is a local context layer with a CLI and +plain project files. + +## Manual setup + +Manual setup is useful for custom agents that can read project-local +instructions but are not yet a named target. + +1. Install the universal target: + + ```bash + ktx setup --agents --target universal + ``` + +2. Configure the agent to read `.agents/skills/ktx/SKILL.md`. +3. Open the agent in the KTX project directory. +4. Ask it to run `ktx status --json` and summarize readiness. + +For per-client notes, see [Agent Clients](/docs/integrations/agent-clients). 
+ +## Troubleshooting + +| Symptom | Likely cause | Recovery | +|---------|--------------|----------| +| Agent says KTX is unavailable | Agent did not load the generated instruction file | Rerun `ktx setup --agents --target ` and restart the agent session | +| Agent command cannot find the project | Agent is running outside the KTX directory | Add `--project-dir ` or open the agent in the project root | +| Generated rules point at a missing CLI path | CLI was moved, rebuilt, or reinstalled | Rerun `ktx setup --agents` | +| Agent cannot find a metric | Context is missing or stale | Run `ktx sl search`, inspect source YAML, then refresh with `ktx ingest` if needed | +| Agent query returns too many rows | The command executed without a result cap | Require `--max-rows` for executed queries | diff --git a/docs-site/content/docs/guides/writing-context.mdx b/docs-site/content/docs/guides/writing-context.mdx index 488e11e2..fe9d3fdb 100644 --- a/docs-site/content/docs/guides/writing-context.mdx +++ b/docs-site/content/docs/guides/writing-context.mdx @@ -1,295 +1,341 @@ --- title: Writing Context -description: Write and refine semantic sources and wiki pages. +description: Edit semantic sources and wiki pages so agents use your business logic. --- -After building context through scanning and ingestion, you'll want to refine it - edit semantic sources to match your business logic, add wiki pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works. +KTX context is meant to be edited. Ingest gives you a grounded first draft, then +you refine source YAML and wiki Markdown until agents can answer data questions +with the same definitions your team uses. -## Agent workflow summary +Use this guide when you are adding measures, fixing joins, documenting business +rules, or reviewing context changes made by an agent. -Agents should refine context in this order: +## Editing workflow -1. 
`ktx sl list --json` - discover available sources and connection ids.
-2. `ktx sl search --json` - find source candidates for a concept.
-3. Edit the source YAML directly in `semantic-layer//`.
-4. `ktx sl validate --connection-id ` - verify columns, joins, and table references.
-5. `ktx sl query ... --format sql` - compile a representative query without executing it.
-6. `ktx wiki search ...` - check business context captured by ingest or memory.
+Use this order for most context changes:

-## Semantic Sources
+1. Discover existing context.

-Semantic sources are YAML files that describe your tables, columns, measures, and joins. They're the core of the context layer - the structured definitions that agents use to generate correct SQL.
+   ```bash
+   ktx sl list --json
+   ktx sl search "revenue" --json
+   ktx wiki search "revenue recognition" --json --limit 10
+   ```

-### Listing sources
+2. Edit the smallest relevant files under `semantic-layer//` or
+   `wiki/`.
+3. Validate semantic source changes.

-```bash
-# List all sources across connections
-ktx sl list
+   ```bash
+   ktx sl validate orders --connection-id warehouse
+   ```

-# List sources for a specific connection
-ktx sl list --connection-id my-postgres
+4. Compile a representative query before executing it.

-# Output as JSON
-ktx sl list --json
+   ```bash
+   ktx sl query \
+     --connection-id warehouse \
+     --measure orders.total_revenue \
+     --dimension orders.order_date \
+     --format sql
+   ```
+
+5. Search again using likely user wording to confirm the new context is
+   discoverable.
+
+## Semantic sources
+
+Semantic sources are YAML files that describe queryable entities. A source is
+usually a table, but it can also point at a custom SQL expression. Sources
+define the vocabulary agents use for measures, dimensions, segments, joins, and
+grain-aware query planning.
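+
+A source can use `sql:` in place of `table:`. The fragment below is a minimal
+sketch of that form, assuming `sql` accepts a single SELECT statement as a
+string. The source name, columns, and measure are hypothetical examples, not
+generated output.
+
+```yaml
+# Hypothetical custom-SQL source: `sql` replaces `table`; use exactly one.
+name: completed_orders
+description: Completed orders only, defined with a custom SQL expression.
+sql: |
+  SELECT order_id, order_date, total_amount
+  FROM public.orders
+  WHERE status = 'completed'
+grain:
+  - order_id
+columns:
+  - name: order_id
+    type: string
+    description: Unique order identifier.
+  - name: order_date
+    type: time
+    role: time
+    description: Date the order was placed.
+  - name: total_amount
+    type: number
+    description: Booked order value in USD.
+measures:
+  - name: completed_revenue
+    expr: SUM(total_amount)
+    description: Sum of booked value for completed orders.
+```
+
+Run `ktx sl validate` on a custom-SQL source the same way as on a table-backed
+source before relying on it.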
+ +Source files live at: + +```text +semantic-layer//.yaml ``` -### Searching sources - -```bash -ktx sl search "revenue" --connection-id my-postgres --json -``` - -Search returns ranked source summaries. To inspect or edit a source, open the -YAML file under `semantic-layer//`. - -### The source schema - -A semantic source defines a single queryable entity - usually a table or a SQL expression. Here's a fully annotated example: +### Minimal source ```yaml name: orders -description: Customer orders with line-item totals -table: public.orders # or use `sql:` for a custom SQL expression +description: Customer orders with booked revenue. +table: public.orders grain: - - order_id # columns that uniquely identify a row + - order_id +columns: + - name: order_id + type: string + description: Unique order identifier. + - name: order_date + type: time + role: time + description: Date the order was placed. + - name: total_amount + type: number + description: Booked order value in USD. +measures: + - name: total_revenue + expr: SUM(total_amount) + description: Sum of booked order value before refunds. +``` + +### Full source shape + +```yaml +name: orders +description: Customer orders with line-item totals. +table: public.orders +grain: + - order_id columns: - name: order_id - type: string # string | number | time | boolean - description: Unique order identifier + type: string + description: Unique order identifier. - name: order_date type: time - role: time # marks this as the default time dimension - description: Date the order was placed + role: time + description: Date the order was placed. - name: status type: string - visibility: public # public (default) | internal | hidden - description: Current order status + visibility: public + description: Current order status. - name: _etl_loaded_at type: time - visibility: hidden # hidden columns are excluded from agent queries - description: Internal ETL timestamp + visibility: hidden + description: Internal load timestamp. 
- name: total_amount type: number - description: Order total in USD + description: Order total in USD. measures: - name: total_revenue expr: SUM(total_amount) - description: Sum of all order values + description: Sum of all order values. - name: order_count expr: COUNT(DISTINCT order_id) - description: Number of distinct orders + description: Number of distinct orders. - name: avg_order_value expr: AVG(total_amount) - description: Average order value + description: Average booked order value. - name: high_value_revenue expr: SUM(total_amount) filter: total_amount > 100 - description: Revenue from orders over $100 + description: Revenue from orders over $100. segments: - - name: us_orders - expr: country = 'US' - description: Orders from US customers + - name: completed_orders + expr: status = 'completed' + description: Orders that completed fulfillment. joins: - to: customers on: orders.customer_id = customers.customer_id - relationship: many_to_one # many_to_one | one_to_many | one_to_one + relationship: many_to_one - to: order_items on: orders.order_id = order_items.order_id relationship: one_to_many - alias: items # optional alias for the joined source + alias: items ``` -Key fields: +### Source fields | Field | Required | Description | |-------|----------|-------------| -| `name` | Yes | Source identifier (lowercase, underscores) | -| `table` or `sql` | Yes | Database table or custom SQL expression (exactly one) | -| `grain` | Yes | Columns that define row uniqueness | -| `columns` | No | Column definitions with type, role, visibility | -| `measures` | No | Aggregation expressions (SUM, COUNT, AVG, etc.) | -| `joins` | No | Relationships to other sources | -| `segments` | No | Named filter conditions | -| `inherits_columns_from` | No | Inherit column metadata from a manifest entry | +| `name` | Yes | Source identifier. Use lowercase words and underscores. | +| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. 
| +| `grain` | Yes | Columns that uniquely identify a row at the source grain. | +| `columns` | No | Column definitions with type, role, visibility, and descriptions. | +| `measures` | No | Aggregation expressions such as `SUM`, `COUNT`, and `AVG`. | +| `segments` | No | Named predicates agents can reuse. | +| `joins` | No | Relationships to other semantic sources. | +| `inherits_columns_from` | No | Inherit column metadata from a manifest entry. | -Source component fields: +### Component fields | Component | Field | Required | Description | |-----------|-------|----------|-------------| -| Column | `name` | Yes | Column identifier as used in SQL expressions | -| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean` | -| Column | `role` | No | Special role such as `time` for default time dimensions | -| Column | `visibility` | No | `public`, `internal`, or `hidden` | -| Column | `description` | Strongly recommended | Human-readable business meaning | -| Measure | `name` | Yes | Queryable metric name | -| Measure | `expr` | Yes | SQL aggregation expression at the source grain | -| Measure | `filter` | No | SQL predicate applied only to this measure | -| Measure | `description` | Strongly recommended | Definition agents can cite and compare | -| Segment | `name` | Yes | Reusable filter name | -| Segment | `expr` | Yes | SQL predicate for the segment | -| Join | `to` | Yes | Target semantic source name | -| Join | `on` | Yes | SQL join condition using source names or aliases | -| Join | `relationship` | Yes | `many_to_one`, `one_to_many`, or `one_to_one` | -| Join | `alias` | No | Query alias for repeated or clearer joins | +| Column | `name` | Yes | Column identifier used in SQL expressions. | +| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean`. | +| Column | `role` | No | Special role such as `time` for default time dimensions. | +| Column | `visibility` | No | `public`, `internal`, or `hidden`. 
| +| Column | `description` | Strongly recommended | Business meaning and usage notes. | +| Measure | `name` | Yes | Queryable metric name. | +| Measure | `expr` | Yes | SQL aggregation expression at the source grain. | +| Measure | `filter` | No | SQL predicate applied only to this measure. | +| Measure | `description` | Strongly recommended | Definition agents can cite and compare. | +| Segment | `name` | Yes | Reusable filter name. | +| Segment | `expr` | Yes | SQL predicate for the segment. | +| Join | `to` | Yes | Target semantic source name. | +| Join | `on` | Yes | SQL join condition using source names or aliases. | +| Join | `relationship` | Yes | `many_to_one`, `one_to_many`, or `one_to_one`. | +| Join | `alias` | No | Query alias for repeated or clearer joins. | -Column visibility controls what agents see: +### Visibility -| Visibility | Behavior | -|------------|----------| -| `public` | Included in agent queries and listings (default) | -| `internal` | Available for joins and measures but not shown to agents | -| `hidden` | Excluded entirely - useful for ETL columns | +| Visibility | Agent behavior | +|------------|----------------| +| `public` | Included in listings and available for agent queries. | +| `internal` | Available for joins and measures, but not highlighted to agents. | +| `hidden` | Excluded from agent-facing context. Use for ETL fields and sensitive internals. | -### Editing a source +## Measures -Edit source files directly. They live at -`semantic-layer//.yaml` in your project directory. +Good measures have precise names, SQL expressions at the correct grain, and +descriptions that say what is included and excluded. 
-### Validating sources - -Validation checks a source definition against the actual database schema: - -```bash -ktx sl validate orders --connection-id my-postgres +```yaml +measures: + - name: net_revenue + expr: SUM(total_amount - refunded_amount) + filter: status = 'completed' + description: Completed order revenue after refunds, excluding cancelled orders. ``` -This catches mismatches - columns that don't exist in the table, type mismatches, invalid join targets - before an agent tries to use the source. +Prefer one canonical measure plus wiki synonyms over several nearly identical +measures. If your team uses multiple definitions, document the distinction in a +wiki page and link it with `sl_refs`. -### Querying +## Joins and grain -The semantic layer compiles your measures and dimensions into SQL, optionally executing it against the database: +`grain` and `relationship` prevent agents from producing double-counted SQL. +State the row grain even when it seems obvious. + +```yaml +grain: + - order_id +joins: + - to: customers + on: orders.customer_id = customers.customer_id + relationship: many_to_one +``` + +Use `many_to_one` for dimensions such as customer, account, product, or plan. +Use `one_to_many` only when the target can fan out the source rows, such as +orders to order items. + +## Validate and query + +Validation checks source YAML against the live database schema: + +```bash +ktx sl validate orders --connection-id warehouse +``` + +It catches missing columns, invalid join targets, and table-reference problems +before an agent relies on the source. 
+ +Compile a query to inspect generated SQL: ```bash -# Compile a query to SQL ktx sl query \ - --connection-id my-postgres \ - --measure total_revenue \ - --measure order_count \ - --dimension "order_date" \ - --filter "status = 'completed'" \ - --order-by order_date:desc \ + --connection-id warehouse \ + --measure orders.total_revenue \ + --dimension orders.order_date \ + --filter "orders.status = 'completed'" \ + --order-by orders.order_date:desc \ --limit 10 \ --format sql ``` -This outputs the compiled SQL without executing it. To run the query: +Execute only when you need live rows: ```bash -# Execute and return results ktx sl query \ - --connection-id my-postgres \ - --measure total_revenue \ - --dimension "order_date" \ + --connection-id warehouse \ + --measure orders.total_revenue \ + --dimension orders.status \ --execute \ --max-rows 100 ``` -Query flags: +## Wiki pages -| Flag | Description | -|------|-------------| -| `--measure ` | Measure to query (repeatable, at least one required) | -| `--dimension ` | Dimension to group by (repeatable) | -| `--filter ` | Filter expression (repeatable) | -| `--segment ` | Named segment to apply (repeatable) | -| `--order-by ` | Sort field, optionally with `:asc` or `:desc` (repeatable) | -| `--limit ` | Maximum rows in the compiled query | -| `--format ` | Output format: `json` (default) or `sql` | -| `--execute` | Execute the query against the database | -| `--max-rows ` | Maximum rows to return when executing | -| `--include-empty` | Include empty/null rows in results | +Wiki pages capture business context that does not belong in a single source +file: metric policies, dashboard caveats, company vocabulary, data freshness, +known issues, and source-of-truth notes. -The query planner is grain-aware - it understands the cardinality of joins and avoids chasm traps (double-counting caused by many-to-many fan-outs). 
When you query measures that span multiple sources, KTX generates sub-queries at the correct grain before joining. +Wiki files live under: -### Workflow: edit and validate a source - -1. Open `semantic-layer/my-postgres/orders.yaml`. -2. Edit the file to add columns, measures, joins, or descriptions. -3. `ktx sl validate orders --connection-id my-postgres` - check the definition against the live schema. -4. `ktx sl query --connection-id my-postgres --measure total_revenue --dimension order_date --format sql` - compile a representative query. - -If validation fails, fix the YAML before asking an agent to use the source. Common validation failures are missing columns, invalid join targets, and measure expressions that reference fields outside the source. - -## Wiki Pages - -Wiki pages are Markdown files that capture business context - definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it. - -### What they are - -When an agent asks "what counts as an active user?" or "why do revenue numbers differ between the dashboard and the SQL query?", the answer isn't in the schema. It's tribal knowledge that lives in Slack threads, Notion pages, or someone's head. Wiki pages make that context searchable and available to agents. - -### Organization - -Wiki pages are organized by scope: - -``` +```text wiki/ -├── global/ # Cross-cutting definitions -│ ├── order-status-definitions.md -│ ├── revenue-recognition-rules.md -│ └── data-freshness-sla.md -└── user/ - └── local/ # User-scoped context - ├── schema-conventions.md - └── known-data-issues.md + global/ + user// ``` -- **Global pages** apply across all connections - business definitions, metric standards, company terminology. -- **User-scoped pages** are private to a user ID - personal notes, local gotchas, or context you do not want shared globally. +Use global pages for shared business rules. 
Use user-scoped pages for local +notes, personal conventions, or context that should not be shared broadly. -### Editing pages +### Wiki page example -Create and edit wiki pages directly as Markdown files in the `wiki/` -directory. Ingest and memory capture also create these pages automatically. +```markdown +--- +summary: Revenue recognition rules for finance reporting. +tags: [revenue, finance, reporting] +sl_refs: [orders] +external_refs: + - type: notion + id: finance-revenue-policy +--- -Wiki page fields: +## Recognized Revenue + +Recognized revenue includes completed orders after refunds. It excludes +cancelled orders, test orders, implementation fees, and tax. + +Finance reporting uses order completion date, not invoice creation date. +``` + +Useful frontmatter: | Field | Required | Description | |-------|----------|-------------| -| Key | Yes | Stable page identifier used as the Markdown filename | -| Summary | Yes | Short text shown in search results | -| Content | Yes | Full Markdown business context | -| Scope | No | `global` for shared context or `user` for user-scoped notes | -| Tags | No | Search and organization labels | -| External refs | No | Links or identifiers for source-of-truth systems | -| Semantic-layer refs | No | Source names the page explains or constrains | +| `summary` | Yes | Short text shown in search results. | +| `tags` | No | Business terms and synonyms that improve search. | +| `sl_refs` | No | Semantic source names the page explains or constrains. | +| `external_refs` | No | Source-of-truth system links or ids. | -### Listing pages +## Add searchable business context + +1. Search first. + + ```bash + ktx wiki search "active customer definition" --json --limit 10 + ``` + +2. If no page covers the rule, create or edit a Markdown file under + `wiki/global/`. +3. Write a compact `summary` with the wording users are likely to ask. +4. Add tags for synonyms and related business areas. +5. Add `sl_refs` for relevant semantic sources. 
+6. Search again with a user-like phrase. + +## Review context changes + +Before accepting agent-written context: ```bash -ktx wiki list +git diff -- semantic-layer wiki +ktx sl validate orders --connection-id warehouse +ktx sl search "revenue" --json +ktx wiki search "revenue recognition" --json --limit 10 ``` -### Searching - -```bash -ktx wiki search "revenue recognition" -``` - -Search uses both full-text matching and semantic similarity - it finds relevant pages even when the exact terms don't match. Agents call this automatically when they need business context to answer a question. - -### Workflow: add searchable business context - -1. Search first: `ktx wiki search "order status definitions"`. -2. If no page already covers the rule, create or edit a Markdown file under `wiki/global/`. -3. Include concise frontmatter; agents see the summary before loading full content. -4. Add `tags` values for the business area and `sl_refs` values for related semantic sources. -5. Search again with the user's likely wording to confirm the page is discoverable. +Check that definitions are specific, hidden columns stay hidden, joins have +explicit relationships, and measures compile into the expected SQL. 
## Common errors -| Error or symptom | Likely cause | Recovery | -|------------------|--------------|----------| -| `ktx sl validate` reports a missing column | YAML references a column that is absent from the scanned table | Run a fresh scan or update the YAML to match the warehouse schema | -| Query compilation double-counts a measure | Join relationship or grain is missing or wrong | Add `grain` and explicit `relationship` values, then validate and recompile | -| Agent cannot find a metric | Measure name or description does not match business terminology | Add a measure description and a wiki page with common synonyms | -| Wiki search misses a page | Summary and tags do not include likely user wording | Rewrite the summary and add relevant tags, then search again | -| Semantic-layer changes are hard to review | The YAML edit is too large or unfocused | Split the change into smaller source-file edits, then review the git diff | +| Symptom | Likely cause | Recovery | +|---------|--------------|----------| +| `ktx sl validate` reports a missing column | YAML references a column absent from the scanned table | Refresh database context or update the YAML | +| Query compilation double-counts a measure | `grain` or join `relationship` is missing or wrong | Add explicit grain and relationship values, then recompile | +| Agent cannot find a metric | Measure name and description do not match business terminology | Add a clearer measure description and a wiki page with synonyms | +| Wiki search misses a page | Summary, tags, or content do not match user wording | Rewrite the summary and add likely synonyms | +| Context diff is hard to review | One edit changed too many concepts | Split the change into focused source and wiki edits |