Commit graph

2 commits

Author SHA1 Message Date
Matt Senick (Sigma)
acd20ac248
feat(sigma): add Sigma Computing context-source adapter (#316)
* feat(sigma): add Sigma Computing context-source adapter

Closes #168

Adds a full ingest adapter for Sigma Computing so `ktx ingest` can pull
data model specs and workbook summaries into the ktx context layer. The
implementation follows the same fetch → chunk → project → LLM pattern
used by the Looker, Metabase, and MetricFlow adapters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sigma): address PR review comments

- Remove manifest from rawFiles; moves to peerFileIndex so fetchedAt
  changes don't mark all work units dirty every run
- Fix workbookFilter.updatedSince eviction bug: fetch full universe first,
  apply filter client-side, evict only on archived/deleted
- Remove measure projection entirely; project() writes measures: [] and
  the sigma_ingest skill surfaces Lookup/aggregation formulas as wiki prose
- Remove joins projection (v1 limitation); project() writes joins: [] and
  Lookup relationships are described in wiki prose instead
- Remove write-back dead code: createDataModel, updateDataModel,
  SigmaDataModelPushResult, mutate/post/put
- Fix emitBatches notes pluralization bug ('2 data modelss' → '2 data models')
- Add tokenInflight dedup on ensureToken to coalesce concurrent auth requests
- Retry spec fetch when existing staged spec is null (transient failure cache)
- Drop unused WorkbookFilter import from client-port.ts
- Note in docs that joins are not projected from Sigma data models in this release

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* updates

* fix(sigma): restore sigma in local adapter test + small cleanups

The gdrive↔sigma merge dropped 'sigma' from the expected adapter source
list in local-adapters.test.ts while keeping gdrive, so the slow TS suite
failed even though the source registers both. Add 'sigma' back at its
registration position (after metabase, before gdrive).

Also:
- Move the orphaned SigmaPullConfig docstring onto the schema it documents
  and drop the stale BullMQ reference (standalone ktx has no BullMQ; the
  config lives in the ingest job's bundleRef.config).
- Drop an O(n^2) find() round-trip in fetch() when building the active
  data-model list; filter once and reuse for the eviction id set.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com>
2026-07-01 01:14:57 +02:00
Andrey Avtomonov
56985b7e09
test: split cli tests from source tree (#216)
* feat(cli): define full warehouse dialect contract

* test(cli): keep dialect edge tests focused

* fix(cli): stabilize dialect contract foundation

* refactor(connectors): own read-only query preparation

* refactor(connectors): resolve dialects through registry

* refactor(connectors): keep concrete dialect classes internal

* chore(workspace): enforce dialect import boundary

* refactor(cli): resolve relationship dialect at scan boundary

* refactor(cli): use dialect display parsing for entity details

* refactor(cli): use dialect display parsing for warehouse catalog

* refactor(cli): use dialect SQL in relationship workflows

* test(cli): verify solid dialect scan workflow closure

* test: split cli tests from source tree

* refactor(cli): standardize BigQuery scope listing

* feat(sqlite): implement connector scope listing

* test(connectors): cover required table listing

* feat(cli): add warehouse driver registry

* refactor(setup): route scope discovery through driver registry

* refactor(cli): route local query execution through driver registry

* refactor(historic-sql): route dialect support through driver registry

* refactor(cli): test warehouse connections through driver registry

* fix(cli): close driver registry type export gaps

* Improve setup daemon diagnostics

* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback

Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.

* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match

The setup picker's KtxTableListEntry was a 2-level { schema, name }, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db })
— catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.

Align the picker boundary with the canonical 3-level KtxTableRef:

- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
  resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
  (resolveEnabledTables already accepts the 3-part shape) and
  schemasFromEnabledTables now goes through parseDottedTableEntry so it
  recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
  reuse.

Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).

* fix(cli): allow debug telemetry under opt-out env
2026-05-26 08:49:05 +02:00