mirror of
https://github.com/Kaelio/ktx.git
synced 2026-07-01 08:59:39 +02:00
feat(sigma): add Sigma Computing context-source adapter (#316)
* feat(sigma): add Sigma Computing context-source adapter

  Closes #168

  Adds a full ingest adapter for Sigma Computing so `ktx ingest` can pull data model specs and workbook summaries into the ktx context layer. The implementation follows the same fetch → chunk → project → LLM pattern used by the Looker, Metabase, and MetricFlow adapters.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sigma): address PR review comments

  - Remove manifest from rawFiles; moves to peerFileIndex so fetchedAt changes don't mark all work units dirty every run
  - Fix workbookFilter.updatedSince eviction bug: fetch full universe first, apply filter client-side, evict only on archived/deleted
  - Remove measure projection entirely; project() writes measures: [] and the sigma_ingest skill surfaces Lookup/aggregation formulas as wiki prose
  - Remove joins projection (v1 limitation); project() writes joins: [] and Lookup relationships are described in wiki prose instead
  - Remove write-back dead code: createDataModel, updateDataModel, SigmaDataModelPushResult, mutate/post/put
  - Fix emitBatches notes pluralization bug ('2 data modelss' → '2 data models')
  - Add tokenInflight dedup on ensureToken to coalesce concurrent auth requests
  - Retry spec fetch when existing staged spec is null (transient failure cache)
  - Drop unused WorkbookFilter import from client-port.ts
  - Note in docs that joins are not projected from Sigma data models in this release

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* updates

* fix(sigma): restore sigma in local adapter test + small cleanups

  The gdrive↔sigma merge dropped 'sigma' from the expected adapter source list in local-adapters.test.ts while keeping gdrive, so the slow TS suite failed even though the source registers both. Add 'sigma' back at its registration position (after metabase, before gdrive).

  Also:
  - Move the orphaned SigmaPullConfig docstring onto the schema it documents and drop the stale BullMQ reference (standalone ktx has no BullMQ; the config lives in the ingest job's bundleRef.config).
  - Drop an O(n^2) find() round-trip in fetch() when building the active data-model list; filter once and reuse for the eviction id set.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com>
parent 139ac08320
commit acd20ac248
41 changed files with 3610 additions and 6 deletions

@@ -1,6 +1,6 @@
 ---
 title: Context Sources
-description: Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, Notion, and Google Drive.
+description: Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, Notion, Sigma, and Google Drive.
 ---

 Context sources feed your existing analytics tooling into **ktx**. During ingestion, **ktx** extracts metadata from each source, and a reconciliation agent merges it with your existing semantic layer and knowledge base, preserving accepted edits rather than overwriting them.

@@ -27,7 +27,7 @@ LookML uses top-level `repoUrl`, and MetricFlow uses nested

 | Field | Required | Description |
 |-------|----------|-------------|
-| `driver` | Yes | Source connector: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, `notion`, or `gdrive` |
+| `driver` | Yes | Source connector: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, `notion`, `sigma`, or `gdrive` |
 | `source_dir` | For local file sources | Absolute or project-relative source directory |
 | `repo_url` | For Git-hosted dbt sources | Git repository URL |
 | `repoUrl` | For Git-hosted LookML sources | Git repository URL |

@@ -378,6 +378,101 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in

---

## Sigma

Ingests data model definitions and workbook metadata from a Sigma workspace as semantic context. Uses the Sigma REST API to fetch data model specs and workbook summaries.

### What it provides

- Data model names, folder paths, and ownership metadata
- Page and element definitions within each data model
- Column identifiers and data types where available
- Workbook names, paths, descriptions, and version metadata

### Connection config

```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    api_url: https://api.sigmacomputing.com # Omit for GCP US (default)
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
```

For the AWS US region, override `api_url`:

```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    api_url: https://aws-api.sigmacomputing.com
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
```

### Authentication

| Method | Config |
|--------|--------|
| OAuth client credentials | `client_id` + `client_secret_ref: env:SIGMA_CLIENT_SECRET` |

Generate a client in Sigma: **Administration → Developer Access → Add New Client**.

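
To make the client-credentials flow concrete, here is a minimal sketch of how a token request could be assembled from the connection config. It is not the adapter's actual code: `SigmaConnectionConfig`, `resolveSecretRef`, and `buildTokenRequest` are hypothetical names, and the `/v2/auth/token` path and form-encoded body are assumptions based on Sigma's public API documentation that you should verify for your region.

```typescript
// Hypothetical connection shape mirroring the YAML keys shown above.
interface SigmaConnectionConfig {
  api_url?: string;
  client_id: string;
  client_secret_ref: string; // e.g. "env:SIGMA_CLIENT_SECRET"
}

interface TokenRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Default base URL (GCP US), per the connection config section.
const DEFAULT_SIGMA_API_URL = "https://api.sigmacomputing.com";

// Resolve an `env:NAME` reference against a supplied environment map.
function resolveSecretRef(
  ref: string,
  env: Record<string, string | undefined>,
): string {
  const prefix = "env:";
  if (ref.indexOf(prefix) !== 0) {
    throw new Error(`Unsupported secret reference: ${ref}`);
  }
  const name = ref.slice(prefix.length);
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Environment variable ${name} is not set`);
  }
  return value;
}

// Build (but do not send) the OAuth client-credentials token request.
// ASSUMPTION: the token endpoint path and form fields follow Sigma's docs.
function buildTokenRequest(
  conn: SigmaConnectionConfig,
  env: Record<string, string | undefined>,
): TokenRequest {
  const base = (conn.api_url ?? DEFAULT_SIGMA_API_URL).replace(/\/+$/, "");
  const secret = resolveSecretRef(conn.client_secret_ref, env);
  const form: Array<[string, string]> = [
    ["grant_type", "client_credentials"],
    ["client_id", conn.client_id],
    ["client_secret", secret],
  ];
  const body = form
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return {
    url: `${base}/v2/auth/token`,
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}
```

The returned object can be passed to any HTTP client as a `POST`. Keeping request construction separate from the network call makes the region override and secret resolution easy to check in isolation.
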
### What gets ingested

- Active data model specs, organized by folder into work units
- Workbook metadata (name, path, description, version); archived and exploration workbooks are excluded by default
- Models backed by CSV uploads or unsupported connector subtypes are listed in the manifest but skipped during spec fetch (a Sigma API limitation)

### Warehouse connection mapping

`connectionMappings` is optional. Without it, **ktx** produces wiki knowledge only: no semantic-layer sources are written and warehouse validation is skipped. To get semantic-layer output and enable `sl_validate`, map each Sigma internal connection UUID to a **ktx** warehouse connection ID:

```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
    connectionMappings:
      "<sigma-internal-uuid>": snowflake-prod # data models using this connection get SL sources
```

Find the Sigma connection UUID in **Administration → Connections** or from the `source.connectionId` field in a fetched data model spec. Data model elements whose `connectionId` has no mapping are ingested as wiki-only.

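
The mapping rule above can be sketched as a small lookup. This is an illustration under assumptions, not the adapter's implementation: `SigmaElementRef`, `IngestTarget`, and `resolveIngestTarget` are hypothetical names, and only the behavior described in this section (mapped connection yields semantic-layer output, otherwise wiki-only) is modeled.

```typescript
// Sigma connection UUID -> ktx warehouse connection ID, as in `connectionMappings`.
type ConnectionMappings = Record<string, string>;

// Hypothetical minimal view of a data model element.
interface SigmaElementRef {
  name: string;
  connectionId?: string;
}

type IngestTarget =
  | { kind: "semantic-layer"; warehouse: string }
  | { kind: "wiki-only" };

// Decide what an element produces. Elements with no connectionId, or with a
// connectionId that has no mapping, fall back to wiki-only ingest.
function resolveIngestTarget(
  element: SigmaElementRef,
  mappings?: ConnectionMappings,
): IngestTarget {
  const warehouse =
    element.connectionId !== undefined
      ? mappings?.[element.connectionId]
      : undefined;
  return warehouse !== undefined && warehouse !== ""
    ? { kind: "semantic-layer", warehouse }
    : { kind: "wiki-only" };
}
```

With the config shown earlier, an element whose `connectionId` equals the mapped UUID would resolve to `snowflake-prod`, while every element resolves to wiki-only when `connectionMappings` is omitted.
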
### Workbook filter

For large workspaces, you can limit which workbooks are fetched during ingest using `workbookFilter`:

```yaml title="ktx.yaml"
connections:
  sigma-main:
    driver: sigma
    client_id: "<your-client-id>"
    client_secret_ref: env:SIGMA_CLIENT_SECRET
    workbookFilter:
      includeArchived: false # default
      includeExplorations: false # default
      updatedSince: "2026-01-01T00:00:00Z" # only recently updated workbooks
```

| Field | Default | Description |
|-------|---------|-------------|
| `includeArchived` | `false` | Include archived workbooks |
| `includeExplorations` | `false` | Include exploration workbooks |
| `updatedSince` | none | ISO 8601 timestamp; only workbooks updated at or after this time are fetched |

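
The three filter fields combine as a simple predicate over workbook summaries. The sketch below assumes the filter is applied client-side to an already-listed set of workbooks, as the commit message for this change describes; the `WorkbookSummary` field names (`isArchived`, `isExploration`, `updatedAt`) and the function name are hypothetical.

```typescript
// Mirrors the `workbookFilter` keys documented above.
interface WorkbookFilterOptions {
  includeArchived?: boolean;
  includeExplorations?: boolean;
  updatedSince?: string; // ISO 8601 timestamp
}

// Hypothetical summary shape; the real API field names may differ.
interface WorkbookSummary {
  id: string;
  name: string;
  updatedAt: string; // ISO 8601 timestamp
  isArchived?: boolean;
  isExploration?: boolean;
}

// Keep only the workbooks that pass every configured condition.
function applyWorkbookFilter(
  workbooks: WorkbookSummary[],
  filter: WorkbookFilterOptions = {},
): WorkbookSummary[] {
  const since =
    filter.updatedSince !== undefined
      ? Date.parse(filter.updatedSince)
      : undefined;
  if (since !== undefined && Number.isNaN(since)) {
    throw new Error(`Invalid updatedSince: ${filter.updatedSince}`);
  }
  return workbooks.filter((wb) => {
    if (wb.isArchived === true && filter.includeArchived !== true) return false;
    if (wb.isExploration === true && filter.includeExplorations !== true) return false;
    // "At or after": a workbook updated exactly at `updatedSince` is kept.
    if (since !== undefined && Date.parse(wb.updatedAt) < since) return false;
    return true;
  });
}
```

Calling `applyWorkbookFilter(list)` with no filter applies the documented defaults, dropping archived and exploration workbooks.
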
### Notes

- `connectionMappings` is optional for wiki-only ingest; it is required to generate semantic-layer sources and run warehouse validation
- Context ingest (`ktx ingest sigma-main`) fetches from the Sigma API directly
- Ingest is incremental: items whose `updatedAt` timestamp is unchanged since the last run are skipped
- Models backed by CSV uploads or unsupported connector subtypes cannot have their spec exported; these are skipped with a warning (a Sigma API limitation)
- Joins are not projected from Sigma data models in this release; `joins: []` is always written by the projection step. Lookup relationships visible in data model specs are captured as wiki knowledge instead.

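
The incremental rule in the notes above can be sketched as a comparison between the remote listing and what was staged on the previous run. This is a simplified illustration, not the adapter's code: `RemoteItem`, `StagedItem`, and `selectItemsToFetch` are hypothetical names. The retry on a null staged spec reflects the behavior described in this change's commit message.

```typescript
// Item as listed by the remote API on this run (hypothetical shape).
interface RemoteItem {
  id: string;
  updatedAt: string;
}

// What the previous run staged for an item (hypothetical shape).
interface StagedItem {
  updatedAt: string;
  spec: object | null; // null when the earlier spec fetch failed
}

// Return the items whose spec should be fetched on this run.
function selectItemsToFetch(
  remote: RemoteItem[],
  staged: Record<string, StagedItem | undefined>,
): RemoteItem[] {
  return remote.filter((item) => {
    const previous = staged[item.id];
    if (previous === undefined) return true; // never seen before
    if (previous.spec === null) return true; // earlier fetch failed, so retry
    return previous.updatedAt !== item.updatedAt; // skip when unchanged
  });
}
```

Comparing timestamps as opaque strings avoids parsing, on the assumption that the API returns `updatedAt` in a stable format between runs.
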
---

## Google Drive

Ingests Google Docs from a shared Google Drive folder as wiki-ready knowledge content. This v1 implementation is knowledge-only and ingests Google Docs MIME types only.