--- title: Writing Context description: Write and refine semantic sources and knowledge pages. --- After building context through scanning and ingestion, you'll want to refine it — edit semantic sources to match your business logic, add knowledge pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works. ## Agent workflow summary Agents should refine context in this order: 1. `ktx sl list --json` — discover available sources and connection ids. 2. `ktx sl read --connection-id ` — inspect the current YAML. 3. Edit the source YAML directly or use `ktx sl write`. 4. `ktx sl validate --connection-id ` — verify columns, joins, and table references. 5. `ktx sl query ... --format sql` — compile a representative query without executing it. 6. `ktx wiki search ...` and `ktx wiki write ...` — add business context that does not belong in schema YAML. ## Semantic Sources Semantic sources are YAML files that describe your tables, columns, measures, and joins. They're the core of the context layer — the structured definitions that agents use to generate correct SQL. ### Listing sources ```bash # List all sources across connections ktx sl list # List sources for a specific connection ktx sl list --connection-id my-postgres # Output as JSON ktx sl list --json ``` ### Reading a source ```bash ktx sl read orders --connection-id my-postgres ``` This prints the full YAML definition for the source. ### The source schema A semantic source defines a single queryable entity — usually a table or a SQL expression. Here's a fully annotated example: ```yaml name: orders description: Customer orders with line-item totals table: public.orders # or use `sql:` for a custom SQL expression grain: - order_id # columns that uniquely identify a row columns: - name: order_id type: string # string | number | time | boolean description: Unique order identifier - name: order_date type: time role: time # marks this as the default time dimension description: Date the order was placed - name: status type: string visibility: public # public (default) | internal | hidden description: Current order status - name: _etl_loaded_at type: time visibility: hidden # hidden columns are excluded from agent queries description: Internal ETL timestamp - name: total_amount type: number description: Order total in USD measures: - name: total_revenue expr: SUM(total_amount) description: Sum of all order values - name: order_count expr: COUNT(DISTINCT order_id) description: Number of distinct orders - name: avg_order_value expr: AVG(total_amount) description: Average order value - name: high_value_revenue expr: SUM(total_amount) filter: total_amount > 100 description: Revenue from orders over $100 segments: - name: us_orders expr: country = 'US' description: Orders from US customers joins: - to: customers on: orders.customer_id = customers.customer_id relationship: many_to_one # many_to_one | one_to_many | one_to_one - to: order_items on: orders.order_id = order_items.order_id relationship: one_to_many alias: items # optional alias for the joined source ``` Key fields: | Field | Required | Description | |-------|----------|-------------| | `name` | Yes | Source identifier (lowercase, underscores) | | `table` or `sql` | Yes | Database table or custom SQL expression (exactly one) | | `grain` | Yes | Columns that define row uniqueness | | `columns` | No | Column definitions with type, role, visibility | | `measures` | No | Aggregation expressions (SUM, COUNT, AVG, etc.) | | `joins` | No | Relationships to other sources | | `segments` | No | Named filter conditions | | `inherits_columns_from` | No | Inherit column metadata from a manifest entry | Source component fields: | Component | Field | Required | Description | |-----------|-------|----------|-------------| | Column | `name` | Yes | Column identifier as used in SQL expressions | | Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean` | | Column | `role` | No | Special role such as `time` for default time dimensions | | Column | `visibility` | No | `public`, `internal`, or `hidden` | | Column | `description` | Strongly recommended | Human-readable business meaning | | Measure | `name` | Yes | Queryable metric name | | Measure | `expr` | Yes | SQL aggregation expression at the source grain | | Measure | `filter` | No | SQL predicate applied only to this measure | | Measure | `description` | Strongly recommended | Definition agents can cite and compare | | Segment | `name` | Yes | Reusable filter name | | Segment | `expr` | Yes | SQL predicate for the segment | | Join | `to` | Yes | Target semantic source name | | Join | `on` | Yes | SQL join condition using source names or aliases | | Join | `relationship` | Yes | `many_to_one`, `one_to_many`, or `one_to_one` | | Join | `alias` | No | Query alias for repeated or clearer joins | Column visibility controls what agents see: | Visibility | Behavior | |------------|----------| | `public` | Included in agent queries and listings (default) | | `internal` | Available for joins and measures but not shown to agents | | `hidden` | Excluded entirely — useful for ETL columns | ### Writing a source ```bash ktx sl write orders --connection-id my-postgres --yaml ' name: orders table: public.orders grain: [order_id] columns: - name: order_id type: string - name: total_amount type: number measures: - name: total_revenue expr: SUM(total_amount) ' ``` You can also edit source files directly — they live at `semantic-layer//.yaml` in your project directory. ### Validating sources Validation checks a source definition against the actual database schema: ```bash ktx sl validate orders --connection-id my-postgres ``` This catches mismatches — columns that don't exist in the table, type mismatches, invalid join targets — before an agent tries to use the source. ### Querying The semantic layer compiles your measures and dimensions into SQL, optionally executing it against the database: ```bash # Compile a query to SQL ktx sl query \ --connection-id my-postgres \ --measure total_revenue \ --measure order_count \ --dimension "order_date" \ --filter "status = 'completed'" \ --order-by order_date:desc \ --limit 10 \ --format sql ``` This outputs the compiled SQL without executing it. To run the query: ```bash # Execute and return results ktx sl query \ --connection-id my-postgres \ --measure total_revenue \ --dimension "order_date" \ --execute \ --max-rows 100 ``` Query flags: | Flag | Description | |------|-------------| | `--measure ` | Measure to query (repeatable, at least one required) | | `--dimension ` | Dimension to group by (repeatable) | | `--filter ` | Filter expression (repeatable) | | `--segment ` | Named segment to apply (repeatable) | | `--order-by ` | Sort field, optionally with `:asc` or `:desc` (repeatable) | | `--limit ` | Maximum rows in the compiled query | | `--format ` | Output format: `json` (default) or `sql` | | `--execute` | Execute the query against the database | | `--max-rows ` | Maximum rows to return when executing | | `--include-empty` | Include empty/null rows in results | The query planner is grain-aware — it understands the cardinality of joins and avoids chasm traps (double-counting caused by many-to-many fan-outs). When you query measures that span multiple sources, KTX generates sub-queries at the correct grain before joining. ### Workflow: edit and validate a source 1. `ktx sl read orders --connection-id my-postgres > /tmp/orders.yaml` — capture the current definition. 2. Edit `/tmp/orders.yaml` to add columns, measures, joins, or descriptions. 3. `ktx sl write orders --connection-id my-postgres --yaml "$(cat /tmp/orders.yaml)"` — write the updated source. 4. `ktx sl validate orders --connection-id my-postgres` — check the definition against the live schema. 5. `ktx sl query --connection-id my-postgres --measure total_revenue --dimension order_date --format sql` — compile a representative query. If validation fails, fix the YAML before asking an agent to use the source. Common validation failures are missing columns, invalid join targets, and measure expressions that reference fields outside the source. ## Knowledge Pages Knowledge pages are Markdown files that capture business context — definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it. ### What they are When an agent asks "what counts as an active user?" or "why do revenue numbers differ between the dashboard and the SQL query?", the answer isn't in the schema. It's tribal knowledge that lives in Slack threads, Notion pages, or someone's head. Knowledge pages make that context searchable and available to agents. ### Organization Knowledge pages are organized by scope: ``` knowledge/ ├── global/ # Cross-cutting definitions │ ├── order-status-definitions.md │ ├── revenue-recognition-rules.md │ └── data-freshness-sla.md └── user/ └── local/ # User-scoped context ├── schema-conventions.md └── known-data-issues.md ``` - **Global pages** apply across all connections — business definitions, metric standards, company terminology. - **User-scoped pages** are private to a user ID — personal notes, local gotchas, or context you do not want shared globally. ### Writing pages ```bash ktx wiki write order-status-definitions \ --scope global \ --summary "Business definitions for order status values" \ --content "## Order Statuses - **pending**: Order placed but not yet processed - **confirmed**: Payment received, awaiting fulfillment - **shipped**: Order dispatched to carrier - **delivered**: Order received by customer - **cancelled**: Order cancelled before shipment Orders in pending status for more than 48 hours are flagged for review." \ --tag orders \ --tag definitions \ --sl-ref orders ``` Write flags: | Flag | Description | |------|-------------| | `--scope ` | `global` (default) or `user` | | `--summary ` | Short description for search results (required) | | `--content ` | Full Markdown content (required) | | `--tag ` | Categorization tag (repeatable) | | `--ref ` | Reference to external resources (repeatable) | | `--sl-ref ` | Link to a semantic source (repeatable) | Knowledge page fields: | Field | Required | Description | |-------|----------|-------------| | Key | Yes | Stable page identifier passed to `ktx wiki read` | | Summary | Yes | Short text shown in search results | | Content | Yes | Full Markdown business context | | Scope | No | `global` for shared context or `user` for user-scoped notes | | Tags | No | Search and organization labels | | External refs | No | Links or identifiers for source-of-truth systems | | Semantic-layer refs | No | Source names the page explains or constrains | You can also create and edit knowledge pages directly as Markdown files in the `knowledge/` directory. ### Listing pages ```bash ktx wiki list ``` ### Reading a page ```bash ktx wiki read order-status-definitions ``` ### Searching ```bash ktx wiki search "revenue recognition" ``` Search uses both full-text matching and semantic similarity — it finds relevant pages even when the exact terms don't match. Agents call this automatically when they need business context to answer a question. ### Workflow: add searchable business context 1. Search first: `ktx wiki search "order status definitions"`. 2. If no page already covers the rule, write a page with `ktx wiki write`. 3. Include a concise `--summary`; agents see this before loading full content. 4. Add `--tag` values for the business area and `--sl-ref` values for related semantic sources. 5. Search again with the user's likely wording to confirm the page is discoverable. ## Common errors | Error or symptom | Likely cause | Recovery | |------------------|--------------|----------| | `ktx sl validate` reports a missing column | YAML references a column that is absent from the scanned table | Run a fresh scan or update the YAML to match the warehouse schema | | Query compilation double-counts a measure | Join relationship or grain is missing or wrong | Add `grain` and explicit `relationship` values, then validate and recompile | | Agent cannot find a metric | Measure name or description does not match business terminology | Add a measure description and a knowledge page with common synonyms | | Knowledge search misses a page | Summary and tags do not include likely user wording | Rewrite the summary and add relevant tags, then search again | | `ktx sl write` changes are hard to review | Large YAML was passed inline | Edit the source file directly or write from a temporary file, then review the git diff |