mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
341 lines
10 KiB
Text
341 lines
10 KiB
Text
---
|
|
title: Writing Context
|
|
description: Edit semantic sources and wiki pages so agents use your business logic.
|
|
---
|
|
|
|
KTX context is meant to be edited. Ingest gives you a grounded first draft, then
|
|
you refine source YAML and wiki Markdown until agents can answer data questions
|
|
with the same definitions your team uses.
|
|
|
|
Use this guide when you are adding measures, fixing joins, documenting business
|
|
rules, or reviewing context changes made by an agent.
|
|
|
|
## Editing workflow
|
|
|
|
Use this order for most context changes:
|
|
|
|
1. Discover existing context.
|
|
|
|
```bash
|
|
ktx sl list --json
|
|
ktx sl search "revenue" --json
|
|
ktx wiki search "revenue recognition" --json --limit 10
|
|
```
|
|
|
|
2. Edit the smallest relevant files under `semantic-layer/<connection-id>/` or
|
|
`wiki/`.
|
|
3. Validate semantic source changes.
|
|
|
|
```bash
|
|
ktx sl validate orders --connection-id warehouse
|
|
```
|
|
|
|
4. Compile a representative query before executing it.
|
|
|
|
```bash
|
|
ktx sl query \
|
|
--connection-id warehouse \
|
|
--measure orders.total_revenue \
|
|
--dimension orders.created_date \
|
|
--format sql
|
|
```
|
|
|
|
5. Search again using likely user wording to confirm the new context is
|
|
discoverable.
|
|
|
|
## Semantic sources
|
|
|
|
Semantic sources are YAML files that describe queryable entities. A source is
|
|
usually a table, but it can also point at a custom SQL expression. Sources
|
|
define the vocabulary agents use for measures, dimensions, segments, joins, and
|
|
grain-aware query planning.
|
|
|
|
Source files live at:
|
|
|
|
```text
|
|
semantic-layer/<connection-id>/<source-name>.yaml
|
|
```
|
|
|
|
### Minimal source
|
|
|
|
```yaml
|
|
name: orders
|
|
description: Customer orders with booked revenue.
|
|
table: public.orders
|
|
grain:
|
|
- order_id
|
|
columns:
|
|
- name: order_id
|
|
type: string
|
|
description: Unique order identifier.
|
|
- name: order_date
|
|
type: time
|
|
role: time
|
|
description: Date the order was placed.
|
|
- name: total_amount
|
|
type: number
|
|
description: Booked order value in USD.
|
|
measures:
|
|
- name: total_revenue
|
|
expr: SUM(total_amount)
|
|
description: Sum of booked order value before refunds.
|
|
```
|
|
|
|
### Full source shape
|
|
|
|
```yaml
|
|
name: orders
|
|
description: Customer orders with line-item totals.
|
|
table: public.orders
|
|
grain:
|
|
- order_id
|
|
|
|
columns:
|
|
- name: order_id
|
|
type: string
|
|
description: Unique order identifier.
|
|
|
|
- name: order_date
|
|
type: time
|
|
role: time
|
|
description: Date the order was placed.
|
|
|
|
- name: status
|
|
type: string
|
|
visibility: public
|
|
description: Current order status.
|
|
|
|
- name: _etl_loaded_at
|
|
type: time
|
|
visibility: hidden
|
|
description: Internal load timestamp.
|
|
|
|
- name: total_amount
|
|
type: number
|
|
description: Order total in USD.
|
|
|
|
measures:
|
|
- name: total_revenue
|
|
expr: SUM(total_amount)
|
|
description: Sum of all order values.
|
|
- name: order_count
|
|
expr: COUNT(DISTINCT order_id)
|
|
description: Number of distinct orders.
|
|
- name: avg_order_value
|
|
expr: AVG(total_amount)
|
|
description: Average booked order value.
|
|
- name: high_value_revenue
|
|
expr: SUM(total_amount)
|
|
filter: total_amount > 100
|
|
description: Revenue from orders over $100.
|
|
|
|
segments:
|
|
- name: completed_orders
|
|
expr: status = 'completed'
|
|
description: Orders that completed fulfillment.
|
|
|
|
joins:
|
|
- to: customers
|
|
on: orders.customer_id = customers.customer_id
|
|
relationship: many_to_one
|
|
- to: order_items
|
|
on: orders.order_id = order_items.order_id
|
|
relationship: one_to_many
|
|
alias: items
|
|
```
|
|
|
|
### Source fields
|
|
|
|
| Field | Required | Description |
|
|
|-------|----------|-------------|
|
|
| `name` | Yes | Source identifier. Use lowercase words and underscores. |
|
|
| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. |
|
|
| `grain` | Yes | Columns that uniquely identify a row at the source grain. |
|
|
| `columns` | No | Column definitions with type, role, visibility, and descriptions. |
|
|
| `measures` | No | Aggregation expressions such as `SUM`, `COUNT`, and `AVG`. |
|
|
| `segments` | No | Named predicates agents can reuse. |
|
|
| `joins` | No | Relationships to other semantic sources. |
|
|
| `inherits_columns_from` | No | Inherit column metadata from a manifest entry. |
|
|
|
|
### Component fields
|
|
|
|
| Component | Field | Required | Description |
|
|
|-----------|-------|----------|-------------|
|
|
| Column | `name` | Yes | Column identifier used in SQL expressions. |
|
|
| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean`. |
|
|
| Column | `role` | No | Special role such as `time` for default time dimensions. |
|
|
| Column | `visibility` | No | `public`, `internal`, or `hidden`. |
|
|
| Column | `description` | Strongly recommended | Business meaning and usage notes. |
|
|
| Measure | `name` | Yes | Queryable metric name. |
|
|
| Measure | `expr` | Yes | SQL aggregation expression at the source grain. |
|
|
| Measure | `filter` | No | SQL predicate applied only to this measure. |
|
|
| Measure | `description` | Strongly recommended | Definition agents can cite and compare. |
|
|
| Segment | `name` | Yes | Reusable filter name. |
|
|
| Segment | `expr` | Yes | SQL predicate for the segment. |
|
|
| Join | `to` | Yes | Target semantic source name. |
|
|
| Join | `on` | Yes | SQL join condition using source names or aliases. |
|
|
| Join | `relationship` | Yes | `many_to_one`, `one_to_many`, or `one_to_one`. |
|
|
| Join | `alias` | No | Query alias for repeated or clearer joins. |
|
|
|
|
### Visibility
|
|
|
|
| Visibility | Agent behavior |
|
|
|------------|----------------|
|
|
| `public` | Included in listings and available for agent queries. |
|
|
| `internal` | Available for joins and measures, but not highlighted to agents. |
|
|
| `hidden` | Excluded from agent-facing context. Use for ETL fields and sensitive internals. |
|
|
|
|
## Measures
|
|
|
|
Good measures have precise names, SQL expressions at the correct grain, and
|
|
descriptions that say what is included and excluded.
|
|
|
|
```yaml
|
|
measures:
|
|
- name: net_revenue
|
|
expr: SUM(total_amount - refunded_amount)
|
|
filter: status = 'completed'
|
|
description: Completed order revenue after refunds, excluding cancelled orders.
|
|
```
|
|
|
|
Prefer one canonical measure plus wiki synonyms over several nearly identical
|
|
measures. If your team uses multiple definitions, document the distinction in a
|
|
wiki page and link it with `sl_refs`.
|
|
|
|
## Joins and grain
|
|
|
|
`grain` and `relationship` prevent agents from producing double-counted SQL.
|
|
State the row grain even when it seems obvious.
|
|
|
|
```yaml
|
|
grain:
|
|
- order_id
|
|
joins:
|
|
- to: customers
|
|
on: orders.customer_id = customers.customer_id
|
|
relationship: many_to_one
|
|
```
|
|
|
|
Use `many_to_one` for dimensions such as customer, account, product, or plan.
|
|
Use `one_to_many` only when the target can fan out the source rows, such as
|
|
orders to order items.
|
|
|
|
## Validate and query
|
|
|
|
Validation checks source YAML against the live database schema:
|
|
|
|
```bash
|
|
ktx sl validate orders --connection-id warehouse
|
|
```
|
|
|
|
It catches missing columns, invalid join targets, and table-reference problems
|
|
before an agent relies on the source.
|
|
|
|
Compile a query to inspect generated SQL:
|
|
|
|
```bash
|
|
ktx sl query \
|
|
--connection-id warehouse \
|
|
--measure orders.total_revenue \
|
|
--dimension orders.order_date \
|
|
--filter "orders.status = 'completed'" \
|
|
--order-by orders.order_date:desc \
|
|
--limit 10 \
|
|
--format sql
|
|
```
|
|
|
|
Execute only when you need live rows:
|
|
|
|
```bash
|
|
ktx sl query \
|
|
--connection-id warehouse \
|
|
--measure orders.total_revenue \
|
|
--dimension orders.status \
|
|
--execute \
|
|
--max-rows 100
|
|
```
|
|
|
|
## Wiki pages
|
|
|
|
Wiki pages capture business context that does not belong in a single source
|
|
file: metric policies, dashboard caveats, company vocabulary, data freshness,
|
|
known issues, and source-of-truth notes.
|
|
|
|
Wiki files live under:
|
|
|
|
```text
|
|
wiki/
|
|
global/
|
|
user/<user-id>/
|
|
```
|
|
|
|
Use global pages for shared business rules. Use user-scoped pages for local
|
|
notes, personal conventions, or context that should not be shared broadly.
|
|
|
|
### Wiki page example
|
|
|
|
```markdown
|
|
---
|
|
summary: Revenue recognition rules for finance reporting.
|
|
tags: [revenue, finance, reporting]
|
|
sl_refs: [orders]
|
|
external_refs:
|
|
- type: notion
|
|
id: finance-revenue-policy
|
|
---
|
|
|
|
## Recognized Revenue
|
|
|
|
Recognized revenue includes completed orders after refunds. It excludes
|
|
cancelled orders, test orders, implementation fees, and tax.
|
|
|
|
Finance reporting uses order completion date, not invoice creation date.
|
|
```
|
|
|
|
Useful frontmatter:
|
|
|
|
| Field | Required | Description |
|
|
|-------|----------|-------------|
|
|
| `summary` | Yes | Short text shown in search results. |
|
|
| `tags` | No | Business terms and synonyms that improve search. |
|
|
| `sl_refs` | No | Semantic source names the page explains or constrains. |
|
|
| `external_refs` | No | Source-of-truth system links or ids. |
|
|
|
|
## Add searchable business context
|
|
|
|
1. Search first.
|
|
|
|
```bash
|
|
ktx wiki search "active customer definition" --json --limit 10
|
|
```
|
|
|
|
2. If no page covers the rule, create or edit a Markdown file under
|
|
`wiki/global/`.
|
|
3. Write a compact `summary` with the wording users are likely to ask.
|
|
4. Add tags for synonyms and related business areas.
|
|
5. Add `sl_refs` for relevant semantic sources.
|
|
6. Search again with a user-like phrase.
|
|
|
|
## Review context changes
|
|
|
|
Before accepting agent-written context:
|
|
|
|
```bash
|
|
git diff -- semantic-layer wiki
|
|
ktx sl validate orders --connection-id warehouse
|
|
ktx sl search "revenue" --json
|
|
ktx wiki search "revenue recognition" --json --limit 10
|
|
```
|
|
|
|
Check that definitions are specific, hidden columns stay hidden, joins have
|
|
explicit relationships, and measures compile into the expected SQL.
|
|
|
|
## Common errors
|
|
|
|
| Symptom | Likely cause | Recovery |
|
|
|---------|--------------|----------|
|
|
| `ktx sl validate` reports a missing column | YAML references a column absent from the scanned table | Refresh database context or update the YAML |
|
|
| Query compilation double-counts a measure | `grain` or join `relationship` is missing or wrong | Add explicit grain and relationship values, then recompile |
|
|
| Agent cannot find a metric | Measure name and description do not match business terminology | Add a clearer measure description and a wiki page with synonyms |
|
|
| Wiki search misses a page | Summary, tags, or content do not match user wording | Rewrite the summary and add likely synonyms |
|
|
| Context diff is hard to review | One edit changed too many concepts | Split the change into focused source and wiki edits |
|