---
title: Building Context
description: Build and refresh ktx context from databases, context sources, query history, and text.
---

Build context after `ktx setup` creates `ktx.yaml` and at least one database or
context-source connection. **ktx** writes local semantic sources and wiki
pages for agents to use before writing SQL.

## The build loop

Most projects use this loop:

1. Check readiness with `ktx status`.
2. Build one connection with `ktx ingest <connectionId>`, or build everything
   with `ktx ingest --all`.
3. Search or inspect the generated files under `semantic-layer/` and `wiki/`.
4. Edit source YAML or Markdown when business logic needs refinement.
5. Validate and query representative sources before handing the context to an
   agent.

`ktx ingest --all` runs databases first, then context-source connections, so
external metadata can attach to known warehouse tables.
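
The automated part of the loop above can be scripted. The following is a minimal sketch, not part of **ktx** itself: `refresh_context` is a hypothetical helper name, and the `KTX` variable is an assumed override that lets the script point at a different binary (or a stub for dry runs). It uses only the `ktx status` and `ktx ingest` invocations shown on this page.

```bash
# Hypothetical helper: run the status -> ingest -> status part of the loop.
# KTX is an assumed override for the binary name; it defaults to `ktx`.
KTX="${KTX:-ktx}"

refresh_context() {
  # Step 1: check readiness before doing any work.
  "$KTX" status || return 1

  # Step 2: build the named connections, or everything when none are given.
  if [ "$#" -eq 0 ]; then
    "$KTX" ingest --all || return 1
  else
    for conn in "$@"; do
      "$KTX" ingest "$conn" || return 1
    done
  fi

  # Re-check readiness so the effect of the run is visible.
  "$KTX" status
}
```

Calling `refresh_context` with no arguments rebuilds everything. `refresh_context warehouse dbt_main` builds those two connections in the order given, so list database connections before context-source connections to match the order `--all` uses. Steps 3 to 5 (inspecting, editing, and validating) stay manual.
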

## Database ingest

Database ingest always builds enriched context: tables, columns, types,
constraints, and row counts, plus AI-generated descriptions, embeddings, and
relationship evidence.

```bash
# Build one configured database connection
ktx ingest warehouse

# Build all configured connections
ktx ingest --all
```

Enriched ingest needs a configured model and embeddings. Run `ktx setup` first;
connections without that configuration fail before any work starts.

Local-auth backends keep provider credentials out of `ktx.yaml`:

```bash
ktx setup --llm-backend claude-code --no-input
ktx setup --llm-backend codex --no-input
```

With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools
for the current run. With `codex`, **ktx** restricts the temporary runtime MCP
server to the current run's tool set, disables Codex web search, requests a
read-only sandbox, and sets `approval_policy=never`. The public Codex SDK and
CLI surface may still load user Codex config and built-in command execution or
read-only file capabilities, so use `claude-code` for stricter runtime tool
isolation.
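
For scripted setup, the backend choice can be wrapped so that the stricter option is the default. This is a sketch under assumptions: `setup_local_backend` is a hypothetical helper, `KTX` is an assumed override for the binary name, and the restriction to two backend names is this sketch's choice (they are the only local-auth backends named on this page), not a statement about what **ktx** accepts.

```bash
# Hypothetical helper: configure a local-auth backend non-interactively.
# KTX is an assumed override for the binary name; it defaults to `ktx`.
KTX="${KTX:-ktx}"

setup_local_backend() {
  # Default to claude-code, which this page recommends for stricter isolation.
  backend="${1:-claude-code}"

  case "$backend" in
    claude-code|codex)
      "$KTX" setup --llm-backend "$backend" --no-input
      ;;
    *)
      # Reject anything this sketch does not know about.
      echo "unsupported local-auth backend: $backend" >&2
      return 2
      ;;
  esac
}
```

`setup_local_backend` with no argument selects `claude-code`; pass `codex` explicitly when the looser isolation described above is acceptable.
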

## Query history

PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins,
filters, redaction rules, high-usage templates, and service-account exclusions.
When query history is enabled during setup, **ktx** reviews observed in-scope
roles and can write exact `filters.serviceAccounts` patterns for operational
traffic such as loader or refresh roles.

Enable it during setup, store it under `connections.<id>.context.queryHistory`,
or request it for one run:

```bash
# Request query history for this run only
ktx ingest warehouse --query-history

# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
```

Use `--no-query-history` when you want to skip a stored query-history setting
for one run.
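
The per-run flags above map onto a small dispatcher. This is an illustrative sketch: `ingest_with_history` and its mode names (`stored`, `on`, `off`) are hypothetical, and `KTX` is an assumed override for the binary name. Each branch emits exactly one of the invocations documented in this section and does not combine flags.

```bash
# Hypothetical helper: pick the query-history flag for a single ingest run.
# KTX is an assumed override for the binary name; it defaults to `ktx`.
KTX="${KTX:-ktx}"

ingest_with_history() {
  conn="$1"
  mode="${2:-stored}"   # stored | on | off | <number of days>

  case "$mode" in
    stored)    "$KTX" ingest "$conn" ;;                     # use the stored setting
    on)        "$KTX" ingest "$conn" --query-history ;;     # request it for this run
    off)       "$KTX" ingest "$conn" --no-query-history ;;  # skip a stored setting
    *[!0-9]*)
      # Anything else containing a non-digit is not a valid day count.
      echo "mode must be stored, on, off, or a number of days" >&2
      return 2
      ;;
    *)         "$KTX" ingest "$conn" --query-history-window-days "$mode" ;;
  esac
}
```

For example, `ingest_with_history warehouse 30` runs `ktx ingest warehouse --query-history-window-days 30`, and `ingest_with_history warehouse off` skips a stored setting for that run.
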

## Relationship evidence

**ktx** scores relationship candidates during database ingest. The public CLI
does not expose separate relationship review subcommands.

## Context-source ingest

Context-source connections pull metadata from dbt, BI tools, Notion, and other
configured systems. Pass one connection id or `--all`.

```bash
# Build one context-source connection
ktx ingest dbt_main

# Build every configured database and context-source connection
ktx ingest --all
```

Supported source types:

| Driver | Typical source | Output |
|--------|----------------|--------|
| `dbt` | dbt project or Git repo | Semantic sources with model, column, test, tag, and description metadata |
| `metricflow` | MetricFlow project or Git repo | Metrics, dimensions, entities, and semantic joins |
| `lookml` | LookML files or Git repo | Views, explores, dimensions, measures, and joins |
| `looker` | Looker API | Explores, looks, dashboards, and model metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
| `notion` | Notion API | Wiki pages and business knowledge |
| `sigma` | Sigma API | Data model specs, pages, element metadata, and workbook metadata |

Context-source ingest writes semantic source YAML and wiki Markdown, reconciling
with local edits.

## Text ingest

Use `ktx ingest --text` or `ktx ingest --file` for notes, Markdown, runbooks,
Slack exports, or other searchable memory.

```bash
# Capture a Markdown file
ktx ingest --file docs/revenue-notes.md --connection-id warehouse

# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -

# Capture direct text
ktx ingest --text "ARR excludes one-time implementation fees."
```

Useful flags:

| Flag | Description |
|------|-------------|
| `--text <content>` | Capture inline text into memory; repeatable |
| `--file <path>` | Capture a text file (or `-` for stdin) into memory; repeatable |
| `--connection-id <connectionId>` | Attach the captured memory to a **ktx** connection |
| `--user-id <id>` | Attribute the capture to a user scope (default: `local-cli`) |
| `--json` | Print structured output |
| `--fail-fast` | Stop after the first failed text/file item |

Use text ingest for small, high-signal documents. Prefer configured context-source
ingest for Notion, dbt, Metabase, and similar systems.
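
Because `--file` is repeatable, several notes can go into one run. The sketch below is a hypothetical wrapper (`capture_notes` is not a **ktx** command, and `KTX` is an assumed override for the binary name) that skips unreadable paths, then issues a single `ktx ingest` call with one `--file` per remaining path plus `--connection-id` and `--fail-fast`.

```bash
# Hypothetical helper: capture several files in one text-ingest run.
# KTX is an assumed override for the binary name; it defaults to `ktx`.
KTX="${KTX:-ktx}"

capture_notes() {
  conn="$1"
  shift
  count=0

  # Rewrite the positional list in place: each readable path is re-appended
  # as a `--file <path>` pair, and every original entry is shifted off.
  for path in "$@"; do
    if [ -r "$path" ]; then
      set -- "$@" --file "$path"
      count=$((count + 1))
    else
      echo "skipping unreadable file: $path" >&2
    fi
    shift
  done

  if [ "$count" -eq 0 ]; then
    echo "no readable files to capture" >&2
    return 1
  fi

  # "$@" now holds only the --file pairs built above.
  "$KTX" ingest "$@" --connection-id "$conn" --fail-fast
}
```

A call such as `capture_notes warehouse docs/revenue-notes.md docs/refunds.md` (the second path is illustrative) attaches both files to the `warehouse` connection. Passing `--fail-fast` is a choice made for this sketch so that a partial capture is noticed immediately; drop it to let the remaining items continue.
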

## Output and artifacts

Every ingest run prints a summary. Use `--json` for scripts and agents.

```bash
ktx ingest --all --json
```

Typical generated files:

| Path | Created by | Purpose |
|------|------------|---------|
| `semantic-layer/<connection-id>/*.yaml` | Database and context-source ingest | Queryable semantic source definitions |
| `wiki/global/*.md` | Context-source, text, and memory ingest | Shared business definitions and notes |
| `wiki/user/<user-id>/*.md` | Text and memory ingest | User-scoped context |
| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |

Ingest transcripts include tool calls, LLM responses, and write decisions.
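
A quick way to see what an ingest run produced is to count files under the paths in the table above. This sketch uses only standard tools (`find`, `wc`, `tr`); `count_artifacts` is a hypothetical helper name, and it assumes the generated directories sit directly under the project root passed as its argument.

```bash
# Hypothetical helper: count generated files per artifact directory.
count_artifacts() {
  root="${1:-.}"   # project root; defaults to the current directory

  # Each entry is "<directory>:<file pattern>", taken from the table above.
  for spec in "semantic-layer:*.yaml" "wiki/global:*.md" "wiki/user:*.md"; do
    dir="${spec%%:*}"
    pattern="${spec#*:}"

    if [ -d "$root/$dir" ]; then
      # tr strips the padding some wc implementations add.
      n="$(find "$root/$dir" -type f -name "$pattern" | wc -l | tr -d ' ')"
    else
      n=0
    fi

    printf '%s\t%s\n' "$dir" "$n"
  done
}
```

Running `count_artifacts .` before and after an ingest gives a rough view of which directories changed; it reports counts only and does not validate file contents.
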

## Example: first full refresh

After interactive setup:

```bash
ktx status
ktx ingest --all
ktx status
```

Then inspect what changed:

```bash
git status --short
ktx sl --json
ktx wiki "revenue" --json --limit 10
```

## Common errors

| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
| Enrichment is not configured | LLM or embeddings are not setup-ready | Run `ktx setup` to configure a model and embeddings |
| Query history is unsupported | The selected database driver does not expose query history | Run ingest without query-history flags |
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Context-source flags have no effect | Query-history flags were supplied for a context-source connector | Use query-history flags only for database connections |
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |
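
Two of the rows above (a missing `ktx.yaml` entry and no connections at all) can be caught before ingest starts. The following preflight is a rough sketch, not a **ktx** feature: `preflight` is a hypothetical helper, and the `grep` check assumes `connections` is written as a top-level block key at column 0 in `ktx.yaml`. It does not confirm that any connection entries exist under that key.

```bash
# Hypothetical helper: cheap checks before running ingest.
preflight() {
  project_dir="${1:-.}"
  config="$project_dir/ktx.yaml"

  # `ktx setup` creates ktx.yaml, so a missing file means setup has not run here.
  if [ ! -f "$config" ]; then
    echo "ktx.yaml not found in $project_dir; run ktx setup first" >&2
    return 1
  fi

  # Assumption: connections live under a top-level `connections:` key.
  if ! grep -q '^connections:' "$config"; then
    echo "no top-level connections key in $config; run ktx setup to add one" >&2
    return 1
  fi

  return 0
}
```

Used as `preflight . && ktx ingest --all`, it stops early with a recovery hint instead of starting a run that cannot succeed.
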