mirror of
https://github.com/Kaelio/ktx.git
synced 2026-07-01 08:59:39 +02:00
feat(sigma): add Sigma Computing context-source adapter (#316)
* feat(sigma): add Sigma Computing context-source adapter

Closes #168

Adds a full ingest adapter for Sigma Computing so `ktx ingest` can pull data model specs and workbook summaries into the ktx context layer. The implementation follows the same fetch → chunk → project → LLM pattern used by the Looker, Metabase, and MetricFlow adapters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sigma): address PR review comments

- Remove manifest from rawFiles; moves to peerFileIndex so fetchedAt changes don't mark all work units dirty every run
- Fix workbookFilter.updatedSince eviction bug: fetch full universe first, apply filter client-side, evict only on archived/deleted
- Remove measure projection entirely; project() writes measures: [] and the sigma_ingest skill surfaces Lookup/aggregation formulas as wiki prose
- Remove joins projection (v1 limitation); project() writes joins: [] and Lookup relationships are described in wiki prose instead
- Remove write-back dead code: createDataModel, updateDataModel, SigmaDataModelPushResult, mutate/post/put
- Fix emitBatches notes pluralization bug ('2 data modelss' → '2 data models')
- Add tokenInflight dedup on ensureToken to coalesce concurrent auth requests
- Retry spec fetch when existing staged spec is null (transient failure cache)
- Drop unused WorkbookFilter import from client-port.ts
- Note in docs that joins are not projected from Sigma data models in this release

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* updates

* fix(sigma): restore sigma in local adapter test + small cleanups

The gdrive↔sigma merge dropped 'sigma' from the expected adapter source list in local-adapters.test.ts while keeping gdrive, so the slow TS suite failed even though the source registers both. Add 'sigma' back at its registration position (after metabase, before gdrive).

Also:
- Move the orphaned SigmaPullConfig docstring onto the schema it documents and drop the stale BullMQ reference (standalone ktx has no BullMQ; the config lives in the ingest job's bundleRef.config).
- Drop an O(n^2) find() round-trip in fetch() when building the active data-model list; filter once and reuse for the eviction id set.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com>
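
The `tokenInflight` dedup mentioned in the review fixes can be pictured roughly as follows. This is a minimal sketch, not the adapter's code: only the names `ensureToken` and `tokenInflight` come from the commit message, while `TokenCache`, `Token`, and `requestToken` are hypothetical stand-ins for whatever the Sigma client actually uses.

```typescript
// Hypothetical token shape; the real client may store different fields.
type Token = { value: string; expiresAt: number };

class TokenCache {
  private token: Token | null = null;
  private tokenInflight: Promise<Token> | null = null;
  private requestToken: () => Promise<Token>;

  constructor(requestToken: () => Promise<Token>) {
    this.requestToken = requestToken;
  }

  // Deliberately not `async`: concurrent callers get the same promise object,
  // so only one auth request is sent while a refresh is in flight.
  ensureToken(now: number = Date.now()): Promise<Token> {
    if (this.token && this.token.expiresAt > now) {
      return Promise.resolve(this.token);
    }
    if (!this.tokenInflight) {
      this.tokenInflight = this.requestToken().then(
        (t) => {
          this.token = t;
          this.tokenInflight = null; // allow a later refresh
          return t;
        },
        (err) => {
          this.tokenInflight = null; // do not cache a failed request
          throw err;
        },
      );
    }
    return this.tokenInflight;
  }
}
```

Clearing `tokenInflight` on both success and failure keeps a transient auth error from being replayed to every later caller.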
parent 139ac08320
commit acd20ac248
41 changed files with 3610 additions and 6 deletions
|
|
@ -8,7 +8,7 @@ can also capture free-form text into **ktx** memory. Database connections build
|
|||
enriched context — schema plus AI-generated descriptions, embeddings, and
|
||||
relationship evidence — and require a configured model and embeddings.
|
||||
Context-source connections ingest metadata from tools such as dbt, Looker,
|
||||
-Metabase, MetricFlow, LookML, and Notion. Pass `--text` or `--file` to capture
+Metabase, MetricFlow, LookML, Notion, and Sigma. Pass `--text` or `--file` to capture
|
||||
inline text or text files into memory instead.
|
||||
|
||||
## Command signature
|
||||
|
|
|
|||
|
|
@ -193,7 +193,7 @@ sources. This is equivalent to passing `--skip-sources` in scripted setup.
|
|||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
-| `--source <type>` | Context-source connector type: `dbt`, `metricflow`, `metabase`, `looker`, `lookml`, or `notion` |
+| `--source <type>` | Context-source connector type: `dbt`, `metricflow`, `metabase`, `looker`, `lookml`, `notion`, or `sigma` |
|
||||
| `--source-connection-id <id>` | Connection id for context-source setup |
|
||||
| `--source-path <path>` | Local source path for dbt, MetricFlow, or LookML |
|
||||
| `--source-git-url <url>` | Git URL for dbt, MetricFlow, or LookML |
|
||||
|
|
@ -278,6 +278,13 @@ ktx setup \
|
|||
  --notion-crawl-mode selected_roots \
  --notion-root-page-id abc123def456

# Add a Sigma source
ktx setup \
  --source sigma \
  --source-connection-id sigma-main \
  --source-client-id your-client-id \
  --source-client-secret-ref env:SIGMA_CLIENT_SECRET
|
||||
|
||||
# Install project-scoped agent integration for Codex
|
||||
ktx setup --agents --target codex
|
||||
```
|
||||
|
|
|
|||
|
|
@ -119,6 +119,7 @@ context-source drivers share the map.
|
|||
| `dbt` | Context source | `driver`, one of `source_dir` or `repo_url` | `branch`, `path`, `profiles_path`, `target`, `project_name` |
|
||||
| `metricflow` | Context source | `driver`, `metricflow.repoUrl` | `metricflow.branch`, `metricflow.path`, `metricflow.auth_token_ref` |
|
||||
| `notion` | Context source | `driver`, `auth_token_ref` | `crawl_mode`, `root_*_ids`, `max_*_per_run` |
|
||||
| `sigma` | Context source | `driver`, `client_id`, `client_secret_ref` | `api_url` |
|
||||
|
||||
### Warehouse drivers
|
||||
|
||||
|
|
@ -345,6 +346,31 @@ connections:
|
|||
| `max_knowledge_creates_per_run` | Max new wiki pages created per run (0-25). |
|
||||
| `max_knowledge_updates_per_run` | Max existing wiki pages updated per run (0-100). |
|
||||
|
||||
### Sigma
|
||||
|
||||
```yaml
connections:
  sigma-main:
    driver: sigma
    api_url: https://api.sigmacomputing.com
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
    workbookFilter:
      includeArchived: false
      includeExplorations: false
      updatedSince: "2026-01-01T00:00:00Z"
```
|
||||
|
||||
| Field | Purpose |
|-------|---------|
| `api_url` | Sigma API base URL. Defaults to `https://api.sigmacomputing.com` (GCP US). Override for AWS US (`https://aws-api.sigmacomputing.com`) or other regions. |
| `client_id` | Sigma OAuth client ID. Required. |
| `client_secret` / `client_secret_ref` | Literal secret or reference. Prefer the `_ref`. |
| `connectionMappings` | Maps Sigma internal connection UUIDs to **ktx** warehouse connection IDs. Enables `sl_validate` for projected semantic-layer sources. |
| `workbookFilter.includeArchived` | Include archived workbooks during ingest. Default: `false`. |
| `workbookFilter.includeExplorations` | Include exploration workbooks during ingest. Default: `false`. |
| `workbookFilter.updatedSince` | ISO 8601 date string. Only workbooks updated on or after this date are fetched. Useful for limiting ingest scope at large scale. |
|
||||
|
||||
## `setup`
|
||||
|
||||
Captured by the setup wizard. The only field **ktx** still reads is
|
||||
|
|
|
|||
|
|
@ -102,6 +102,7 @@ Supported source types:
|
|||
| `looker` | Looker API | Explores, looks, dashboards, and model metadata |
|
||||
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
|
||||
| `notion` | Notion API | Wiki pages and business knowledge |
|
||||
| `sigma` | Sigma API | Data model specs, pages, element metadata, and workbook metadata |
|
||||
|
||||
Context-source ingest writes semantic source YAML and wiki Markdown, reconciling
|
||||
with local edits.
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
---
|
||||
title: Context Sources
|
||||
-description: Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, Notion, and Google Drive.
+description: Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, Notion, Sigma, and Google Drive.
|
||||
---
|
||||
|
||||
Context sources feed your existing analytics tooling into **ktx**. During ingestion, **ktx** extracts metadata from each source and uses a reconciliation agent to merge it with your existing semantic layer and knowledge base, preserving accepted edits rather than overwriting them.
|
||||
|
|
@ -27,7 +27,7 @@ LookML uses top-level `repoUrl`, and MetricFlow uses nested
|
|||
|
||||
| Field | Required | Description |
|
||||
|-------|----------|-------------|
|
||||
-| `driver` | Yes | Source connector: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, `notion`, or `gdrive` |
+| `driver` | Yes | Source connector: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, `notion`, `sigma`, or `gdrive` |
|
||||
| `source_dir` | For local file sources | Absolute or project-relative source directory |
|
||||
| `repo_url` | For Git-hosted dbt sources | Git repository URL |
|
||||
| `repoUrl` | For Git-hosted LookML sources | Git repository URL |
|
||||
|
|
@ -378,6 +378,101 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
|
|||
|
||||
---
|
||||
|
||||
## Sigma
|
||||
|
||||
Ingests data model definitions and workbook metadata from a Sigma workspace as semantic context. Uses the Sigma REST API to fetch data model specs and workbook summaries.
|
||||
|
||||
### What it provides
|
||||
|
||||
- Data model names, folder paths, and ownership metadata
- Page and element definitions within each data model
- Column identifiers and data types where available
- Workbook names, paths, descriptions, and version metadata
|
||||
|
||||
### Connection config
|
||||
|
||||
```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    api_url: https://api.sigmacomputing.com  # Omit for GCP US (default)
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
```
|
||||
|
||||
For the AWS US region, override `api_url`:
|
||||
|
||||
```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    api_url: https://aws-api.sigmacomputing.com
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
```
|
||||
|
||||
### Authentication
|
||||
|
||||
| Method | Config |
|--------|--------|
| OAuth client credentials | `client_id` + `client_secret_ref: env:SIGMA_CLIENT_SECRET` |
|
||||
|
||||
Generate a client in Sigma: **Administration → Developer Access → Add New Client**.
|
||||
|
||||
### What gets ingested
|
||||
|
||||
- Active data model specs, organized by folder into work units
- Workbook metadata (name, path, description, version); archived and exploration workbooks are excluded by default
- Models backed by CSV uploads or unsupported connector subtypes are listed in the manifest but skipped during spec fetch (a Sigma API limitation)
|
||||
|
||||
### Warehouse connection mapping
|
||||
|
||||
`connectionMappings` is optional. Without it, **ktx** produces wiki knowledge only — no semantic-layer sources are written and warehouse validation is skipped. To get semantic-layer output and enable `sl_validate`, map each Sigma internal connection UUID to a **ktx** warehouse connection ID:
|
||||
|
||||
```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
    connectionMappings:
      "<sigma-internal-uuid>": snowflake-prod  # data models using this connection get SL sources
```
|
||||
|
||||
Find the Sigma connection UUID in **Administration → Connections** or from the `source.connectionId` field in a fetched data model spec. Data model elements whose `connectionId` has no mapping are ingested as wiki-only.
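
The routing rule described above can be sketched as a small lookup. This is an illustration under assumptions, not the adapter's implementation: `SigmaElement`, `IngestTarget`, and `resolveTarget` are hypothetical names, and only `connectionMappings` and `connectionId` come from the documentation.

```typescript
// Sigma connection UUID -> ktx warehouse connection ID (mirrors `connectionMappings`).
type ConnectionMappings = Record<string, string>;

// Hypothetical, trimmed-down view of a data model element.
interface SigmaElement {
  name: string;
  connectionId?: string; // taken from `source.connectionId` in the fetched spec
}

type IngestTarget =
  | { kind: "semantic-layer"; element: string; warehouseConnectionId: string }
  | { kind: "wiki-only"; element: string };

// Elements with a mapped connection get a semantic-layer source;
// everything else falls back to wiki-only ingest.
function resolveTarget(
  el: SigmaElement,
  mappings: ConnectionMappings = {},
): IngestTarget {
  const hasMapping =
    el.connectionId !== undefined &&
    Object.prototype.hasOwnProperty.call(mappings, el.connectionId);
  if (hasMapping && el.connectionId !== undefined) {
    return {
      kind: "semantic-layer",
      element: el.name,
      warehouseConnectionId: mappings[el.connectionId],
    };
  }
  return { kind: "wiki-only", element: el.name };
}
```

Omitting `connectionMappings` entirely behaves like an empty map here, which matches the wiki-only default described above.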
|
||||
|
||||
### Workbook filter
|
||||
|
||||
At large scale, you can limit which workbooks are fetched during ingest using `workbookFilter`:
|
||||
|
||||
```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
    workbookFilter:
      includeArchived: false        # default
      includeExplorations: false    # default
      updatedSince: "2026-01-01T00:00:00Z"  # only recently updated workbooks
```
|
||||
|
||||
| Field | Default | Description |
|-------|---------|-------------|
| `includeArchived` | `false` | Include archived workbooks |
| `includeExplorations` | `false` | Include exploration workbooks |
| `updatedSince` | (none) | ISO 8601 date; only workbooks updated on or after this date are fetched |
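
As a rough illustration of how these three fields combine, the sketch below applies the filter client-side to an already fetched workbook list. The `WorkbookSummary` shape and its `isArchived` / `isExploration` flags are assumptions made for the example; the field names on `WorkbookFilter` follow the table above.

```typescript
// Hypothetical summary shape; the real Sigma payload may use other field names.
interface WorkbookSummary {
  id: string;
  updatedAt: string; // ISO 8601 timestamp
  isArchived?: boolean;
  isExploration?: boolean;
}

interface WorkbookFilter {
  includeArchived?: boolean;
  includeExplorations?: boolean;
  updatedSince?: string; // ISO 8601 date
}

function applyWorkbookFilter(
  all: WorkbookSummary[],
  filter: WorkbookFilter = {},
): WorkbookSummary[] {
  const since =
    filter.updatedSince !== undefined ? Date.parse(filter.updatedSince) : undefined;
  if (since !== undefined && isNaN(since)) {
    throw new Error("updatedSince is not a valid ISO 8601 date");
  }
  return all.filter((wb) => {
    if (wb.isArchived && !filter.includeArchived) return false;
    if (wb.isExploration && !filter.includeExplorations) return false;
    // "On or after": a workbook updated exactly at `updatedSince` is kept.
    if (since !== undefined && Date.parse(wb.updatedAt) < since) return false;
    return true;
  });
}
```

With no filter configured, archived and exploration workbooks are dropped and no date cutoff applies, which matches the documented defaults.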
|
||||
|
||||
### Notes
|
||||
|
||||
- `connectionMappings` is optional for wiki-only ingest; it is required to generate semantic-layer sources and run warehouse validation
- Context ingest (`ktx ingest sigma-main`) fetches from the Sigma API directly
- Ingest is incremental: items whose `updatedAt` timestamp is unchanged since the last run are skipped
- Models backed by CSV uploads or unsupported connector subtypes cannot have their spec exported; these are skipped with a warning (a Sigma API limitation)
- Joins are not projected from Sigma data models in this release; `joins: []` is always written by the projection step. Lookup relationships visible in data model specs are captured as wiki knowledge instead.
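
The incremental rule in the notes above amounts to comparing each item's `updatedAt` with the value recorded on the previous run. The sketch below shows that comparison only; `IngestItem`, `selectChanged`, and the `lastSeen` map are hypothetical, and the adapter's actual staging and bookkeeping are not shown here.

```typescript
// Hypothetical minimal item: a data model or workbook with its last-modified stamp.
interface IngestItem {
  id: string;
  updatedAt: string;
}

// `lastSeen` maps item id -> the `updatedAt` value stored by the previous run.
function selectChanged(
  items: IngestItem[],
  lastSeen: Record<string, string>,
): IngestItem[] {
  // New items have no recorded stamp, so they always count as changed.
  return items.filter((item) => lastSeen[item.id] !== item.updatedAt);
}

const changed = selectChanged(
  [
    { id: "dm-1", updatedAt: "2026-01-10T00:00:00Z" }, // unchanged, skipped
    { id: "dm-2", updatedAt: "2026-02-01T00:00:00Z" }, // updated since last run
    { id: "dm-3", updatedAt: "2026-02-05T00:00:00Z" }, // never seen before
  ],
  { "dm-1": "2026-01-10T00:00:00Z", "dm-2": "2026-01-15T00:00:00Z" },
);
```

Here `changed` contains `dm-2` and `dm-3`; `dm-1` is skipped because its timestamp matches the recorded one.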
|
||||
|
||||
---
|
||||
|
||||
## Google Drive
|
||||
|
||||
Ingests Google Docs from a shared Google Drive folder as wiki-ready knowledge content. This v1 implementation is knowledge-only and ingests Google Docs MIME types only.
|
||||
|
|
|
|||