Mirror of https://github.com/Kaelio/ktx.git, synced 2026-06-25 08:48:08 +02:00
Merge remote-tracking branch 'origin/main' into fix-sl-query-source-column-type
# Conflicts:
#	packages/context/skills/metabase_ingest/SKILL.md
#	packages/context/skills/sl_capture/SKILL.md
commit cd49d5d4ae
168 changed files with 3567 additions and 1621 deletions
@@ -1,6 +1,6 @@
 ---
 name: sl
-description: KTX's semantic layer — a structured catalog of sources (tables/views), measures, joins, and segments expressed as YAML. Covers the schema and how to query it via `sl_query`. Use when the task involves querying pre-defined metrics (ARR, churn, retention, LTV, MAU) or reading SL source YAML to understand the catalog. Capture is handled by the `sl_capture` skill (memory-agent only).
+description: KTX's semantic layer - a structured catalog of sources (tables/views), measures, joins, and segments expressed as YAML. Covers the schema and how to query it via `sl_query`. Use when the task involves querying pre-defined metrics (ARR, churn, retention, LTV, MAU) or reading SL source YAML to understand the catalog. Capture is handled by the `sl_capture` skill (memory-agent only).
 ---
 
 # Semantic Layer
@@ -8,10 +8,10 @@ description: KTX's semantic layer — a structured catalog of sources (tables/vi
 KTX's semantic layer (SL) is a structured catalog. Each **source** represents a table, a SQL view, or an overlay that enriches a manifest-backed table with measures, computed columns, joins, and named segments. The catalog is the single source of truth for reusable business metrics.
 
 This skill covers two parts:
-- **Part 1** — Schema reference (what an SL source looks like).
-- **Part 2** — Querying via `sl_query`.
+- **Part 1** - Schema reference (what an SL source looks like).
+- **Part 2** - Querying via `sl_query`.
 
-Capture (when and how to add new patterns to the SL) is a separate concern handled by the memory-agent — see the `sl_capture` skill if you are running in capture mode. The research agent **reads** and **queries** the SL via the tools described here; it does not write to it.
+Capture (when and how to add new patterns to the SL) is a separate concern handled by the memory-agent - see the `sl_capture` skill if you are running in capture mode. The research agent **reads** and **queries** the SL via the tools described here; it does not write to it.
 
 For capture-time identifier verification, load `sl_capture`. Synthesis writer
 skills must verify warehouse identifiers with `discover_data`,
@@ -19,7 +19,7 @@ skills must verify warehouse identifiers with `discover_data`,
 
 ---
 
-## Part 1 — Schema reference
+## Part 1 - Schema reference
 
 An SL source is a YAML file at `semantic-layer/<connectionId>/<source_name>.yaml`. There are three flavors:
 
@@ -34,7 +34,7 @@ descriptions:
 measures:
   - name: total_revenue
     expr: sum(amount)
-    description: Total order revenue — filter by status or region at query time
+    description: Total order revenue - filter by status or region at query time
 columns: # computed dimensions only
   - name: is_large_order
     type: boolean
@@ -53,7 +53,7 @@ joins:
 ```
 
 Rules:
-- Do **not** repeat base-table columns, grain, `table`, or `source_type` in an overlay — those are inherited.
+- Do **not** repeat base-table columns, grain, `table`, or `source_type` in an overlay - those are inherited.
 - Overlay columns MUST be computed (`expr` + `type`).
 - Use `column_overrides` to add descriptions or metadata to inherited manifest columns. Do not put `type` or `expr` in `column_overrides`.
 - `exclude_columns` hides specific manifest columns; `disable_joins` suppresses specific auto-detected joins.
@@ -111,7 +111,7 @@ measures:
     expr: count(*)
 ```
 
-An SQL source is a one-shot answer: the aggregation is frozen, callers cannot re-group or re-filter by columns the SQL has collapsed, and the source is disconnected from the join graph. Prefer overlays + measures over SQL sources when possible — the `sl_capture` skill covers when SQL is justified.
+An SQL source is a one-shot answer: the aggregation is frozen, callers cannot re-group or re-filter by columns the SQL has collapsed, and the source is disconnected from the join graph. Prefer overlays + measures over SQL sources when possible - the `sl_capture` skill covers when SQL is justified.
 
 ### Columns
 
@@ -124,7 +124,7 @@ Every standalone column requires `name` and `type`. Overlays have computed colum
 
 ### Grain
 
-`grain: [col_a, col_b]` — the set of columns that uniquely identify one row. The query engine uses grain to prevent fan-out in joins. Overlays inherit grain from the manifest unless they override.
+`grain: [col_a, col_b]` - the set of columns that uniquely identify one row. The query engine uses grain to prevent fan-out in joins. Overlays inherit grain from the manifest unless they override.
 
 ### Joins
 
@@ -133,7 +133,7 @@ joins:
   - to: customers # target source name
     on: "customer_id = customers.id" # local_col = TARGET.target_col
     relationship: many_to_one # or one_to_many, one_to_one
-    alias: primary_customer # optional — lets you join the same target twice
+    alias: primary_customer # optional - lets you join the same target twice
 ```
 
 - `on` format: `local_col = TARGET.target_col`. Always qualify the right side with the target source name.
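The `on` convention in the hunk above (`local_col = TARGET.target_col`, right side qualified with the target source name) lends itself to a mechanical check. The following is a minimal sketch only: `parse_join_on` is a hypothetical helper written for illustration, not part of KTX, and it assumes plain identifier names on both sides.

```python
import re

# Hypothetical helper (not part of KTX): accepts only the shape
# "local_col = TARGET.target_col" with plain identifier names.
_ON_PATTERN = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\.(\w+)\s*$")


def parse_join_on(on, target):
    """Split an `on` string into (local_col, target, target_col).

    Raises ValueError when the right side is unqualified or is qualified
    with something other than the target source name.
    """
    match = _ON_PATTERN.match(on)
    if match is None:
        raise ValueError(f"expected 'local_col = TARGET.target_col', got {on!r}")
    local_col, qualifier, target_col = match.groups()
    if qualifier != target:
        raise ValueError(f"right side must be qualified with {target!r}, got {qualifier!r}")
    return local_col, qualifier, target_col


# The join from the hunk above: orders -> customers.
print(parse_join_on("customer_id = customers.id", "customers"))
```

An unqualified right side such as `customer_id = id` is rejected by this sketch, which mirrors the "always qualify" rule stated in the hunk.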
@@ -145,13 +145,13 @@ joins:
 measures:
   - name: total_arr
     expr: sum(arr_amount)
-    description: Sum of ARR — filter by plan_name at query time
+    description: Sum of ARR - filter by plan_name at query time
     filter: "is_active = true"
     segments: [paid_non_refunded]
 ```
 
 - `name` (required, snake_case).
-- `expr` (required): any valid SQL aggregate — `sum(x)`, `count(*)`, `count(distinct user_id)`, `avg(score)`.
+- `expr` (required): any valid SQL aggregate - `sum(x)`, `count(*)`, `count(distinct user_id)`, `avg(score)`.
 - `description` (required on capture): what the measure computes and how to use it.
 - `filter` (optional): SQL predicate applied as a WHERE clause specific to this measure.
 - `segments` (optional): names of segments defined on the same source. The engine AND-composes each segment's `expr` into this measure's effective filter.
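The AND-composition of a measure's `filter` with its `segments`, as described in the hunk above, can be pictured roughly as follows. This is a sketch under stated assumptions, not the engine's implementation: `effective_filter` is a hypothetical function, and the `expr` used for `paid_non_refunded` is invented for illustration because the diff does not show it.

```python
from typing import Optional


def effective_filter(measure: dict, segment_exprs: dict) -> Optional[str]:
    """AND-compose a measure's own filter with the expr of each named segment.

    `segment_exprs` maps segment name -> SQL predicate for segments defined
    on the same source. Returns None when the measure has no predicates.
    """
    predicates = []
    if measure.get("filter"):
        predicates.append(measure["filter"])
    for name in measure.get("segments", []):
        if name not in segment_exprs:
            raise KeyError(f"segment {name!r} is not defined on this source")
        predicates.append(segment_exprs[name])
    if not predicates:
        return None
    # Parenthesise each predicate so an inner OR cannot change the meaning.
    return " AND ".join(f"({p})" for p in predicates)


# The measure from the hunk above; the segment expr is an assumed example.
total_arr = {
    "name": "total_arr",
    "expr": "sum(arr_amount)",
    "filter": "is_active = true",
    "segments": ["paid_non_refunded"],
}
segments = {"paid_non_refunded": "status = 'paid' AND refunded = false"}
print(effective_filter(total_arr, segments))
```

How the engine then attaches the composed predicate to the aggregate is not specified in this diff, so the sketch stops at producing the predicate string.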
@@ -167,23 +167,23 @@ segments:
     description: Orders that were paid and not refunded
 ```
 
-Named, reusable boolean predicates scoped to one source. Reference by bare name in a measure's `segments: []`, or by dotted form `source.segment_name` in an `sl_query`. Segments are predicates only — they are NOT selectable as dimensions. If you need to group by the predicate, add a `columns[]` entry instead.
+Named, reusable boolean predicates scoped to one source. Reference by bare name in a measure's `segments: []`, or by dotted form `source.segment_name` in an `sl_query`. Segments are predicates only - they are NOT selectable as dimensions. If you need to group by the predicate, add a `columns[]` entry instead.
 
 ### Cross-references with the wiki
 
-The reverse edge (wiki pages that cite this source) is derived automatically from each wiki's `sl_refs:` — you don't emit anything on the SL side. Author the edge once on the wiki via `sl_refs:`; the post-write reconciler populates the knowledge↔SL index.
+The reverse edge (wiki pages that cite this source) is derived automatically from each wiki's `sl_refs:` - you don't emit anything on the SL side. Author the edge once on the wiki via `sl_refs:`; the post-write reconciler populates the knowledge↔SL index.
 
 ---
 
-## Part 2 — Querying via `sl_query`
+## Part 2 - Querying via `sl_query`
 
 The `sl_query` tool generates correct SQL from a structured query. It handles joins, fan-out prevention, aggregation correctness, and filter classification automatically. Prefer it over writing raw SQL whenever the SL has the relevant sources.
 
 ### When to prefer sl_query over raw SQL
 
 - A pre-defined measure already exists (`source.measure_name` appears in the catalog).
-- The question combines fields from multiple sources — the engine resolves the join path automatically.
-- The question asks for a standard metric (revenue, ARR, churn, retention, LTV, conversion, MAU, etc.) — even if no pre-defined measure exists, a runtime aggregation over a catalog column is usually correct.
+- The question combines fields from multiple sources - the engine resolves the join path automatically.
+- The question asks for a standard metric (revenue, ARR, churn, retention, LTV, conversion, MAU, etc.) - even if no pre-defined measure exists, a runtime aggregation over a catalog column is usually correct.
 
 Use raw SQL (`sql_execution`) only when:
 - The computation requires multi-step CTEs whose intermediate grain is not a column in any source.
@@ -206,17 +206,17 @@ Use raw SQL (`sql_execution`) only when:
 - **`measures`**: mix pre-defined refs (`source.measure`) and runtime aggregations (`sum(source.column)`).
 - **`dimensions`**: column refs or `{ field, granularity }` objects for time grains (`day`, `week`, `month`, `quarter`, `year`).
 - **`filters`**: free-form SQL predicates. The engine auto-classifies each as WHERE or HAVING based on whether it references an aggregated measure.
-- **`segments`**: dotted `source.segment_name`. Each segment is AND-ed into the effective filter of every measure whose base source matches. Segments never become a global WHERE — use `filters` for cross-source predicates.
+- **`segments`**: dotted `source.segment_name`. Each segment is AND-ed into the effective filter of every measure whose base source matches. Segments never become a global WHERE - use `filters` for cross-source predicates.
 - **`order_by`**: string or `{ field, direction }`. Direction defaults to `asc`.
 - **`limit`**: integer row cap.
 
 ### Join resolution
 
-You don't specify a base table. The engine infers the set of sources needed from the fields you reference and resolves the shortest join path through the catalog's declared joins. If no path exists between two sources, the query fails with a path-not-found error — check `discover_data` or `sl_discover` to see which sources are connected.
+You don't specify a base table. The engine infers the set of sources needed from the fields you reference and resolves the shortest join path through the catalog's declared joins. If no path exists between two sources, the query fails with a path-not-found error - check `discover_data` or `sl_discover` to see which sources are connected.
 
 ### Worked examples
 
-Cross-source query — engine resolves `account_health_scores → accounts ← opportunities` automatically:
+Cross-source query - engine resolves `account_health_scores → accounts ← opportunities` automatically:
 
 ```json
 {
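The join resolution described in the last hunk (shortest path through the catalog's declared joins) can be illustrated with a breadth-first search. This is a sketch only: it assumes declared joins may be traversed in either direction, and `shortest_join_path` is a hypothetical function, since the engine's actual algorithm is not shown in this diff.

```python
from collections import deque


def shortest_join_path(joins, start, goal):
    """Return the shortest list of source names linking start to goal.

    `joins` is an iterable of (source, target) pairs taken from declared
    joins. Returns None when the two sources are not connected, which is
    the case the skill describes as a path-not-found error.
    """
    graph = {}
    for a, b in joins:
        # Assumption: a declared join can be walked from either side.
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Source names from the worked example in the hunk above.
declared = [
    ("account_health_scores", "accounts"),
    ("opportunities", "accounts"),
]
print(shortest_join_path(declared, "account_health_scores", "opportunities"))
```

Breadth-first search is used here because every join counts as one hop, so the first path that reaches the goal is a shortest one.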