ktx/packages/context/skills/live_database_ingest/SKILL.md
Andrey Avtomonov c22248dabf
feat(context): add warehouse verification tools (#46)
2026-05-13 13:43:23 +02:00

name: live_database_ingest
description: Capture semantic-layer and knowledge updates from a live database schema snapshot.
callers: memory_agent

Live Database Ingest

Use this skill when the ingest work unit contains raw files under raw-sources/<connectionId>/live-database/<syncId>/.
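As a rough illustration of the path check above, the sketch below tests whether a raw file path falls under the live-database layout. The function name `parse_live_database_path` is hypothetical, and the sketch assumes that `<connectionId>` and `<syncId>` never contain a `/`.

```python
import re

# Pattern for raw-sources/<connectionId>/live-database/<syncId>/<file...>
# Assumption: neither ID contains a "/" character.
LIVE_DB_PATH = re.compile(
    r"^raw-sources/(?P<connection_id>[^/]+)/live-database/(?P<sync_id>[^/]+)/(?P<rest>.+)$"
)


def parse_live_database_path(path: str):
    """Return (connection_id, sync_id, rest), or None when the path does not match."""
    match = LIVE_DB_PATH.match(path)
    if match is None:
        return None
    return match.group("connection_id"), match.group("sync_id"), match.group("rest")
```

A work unit would qualify for this skill when at least one of its raw files parses to a non-None result.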

Workflow

  1. Read the table JSON file listed in the work unit.
  2. Read connection.json to understand the snapshot metadata.
  3. Read foreign-keys.json when the table has a foreign key or when joins are needed for the semantic-layer source.
  4. Create or update one semantic-layer source for the table with sl_write_source.
  5. Use the physical table name from the raw JSON as the source table field.
  6. Preserve database comments as descriptions.db on tables and columns.
  7. Add joins only when the foreign key index names both sides.
  8. Write wiki pages only for durable business meaning that is present in table or column comments.
  9. Run sl_validate for the table source before the work unit completes.

Sample values come from the scan record. Do not invent values that are not present in relationship-profile.json.
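Step 7 of the workflow (add joins only when the foreign key index names both sides) can be pictured as a simple filter. The layout of foreign-keys.json is not shown here, so the record keys `fromTable`, `fromColumn`, `toTable`, and `toColumn` below are hypothetical stand-ins, as is the function name.

```python
def usable_foreign_keys(foreign_keys, table_name):
    """Keep only foreign keys for table_name that name both sides of the join."""
    # Hypothetical record keys; the real foreign-keys.json layout may differ.
    required = ("fromTable", "fromColumn", "toTable", "toColumn")
    return [
        fk
        for fk in foreign_keys
        if fk.get("fromTable") == table_name
        and all(fk.get(key) for key in required)
    ]
```

Entries with a missing or empty side are dropped rather than guessed, which matches the rule that joins must be backed by the snapshot.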

Identifier Verification Protocol

Before writing a wiki page or SL source on any topic:

  1. discover_data({query: "<topic>"}) - see what wikis, SL sources, and raw tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any schema.table or schema.table.column into a wiki body, SL source, tables: frontmatter, sl_refs, or emit_unmapped_fallback:

  1. entity_details({connectionName, targets: [{display: "<identifier>"}]}) - confirm the identifier resolves; inspect native types, FK/PK, and sampleValues.
  2. For literal values from the source, such as status codes or plan tiers, check whether they appear in entity_details sampleValues for the relevant column. If sampleValues is short or the sample may have missed real values, run a sql_execution probe with the same warehouse connection name: sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"}).
  3. If the candidate identifier still does not resolve, do one of:
    • Use sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"}). If it errors, the identifier is fictional.
    • Wrap the identifier in [unverified - from <rawPath>] in the wiki body, citing the exact raw path that mentioned it.
    • When recording emit_unmapped_fallback with no_physical_table, include the failing probe error in clarification.
  4. Never copy <schema>.<table> placeholder strings from these instructions into output.
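One possible reading of the fallback steps above, written as a small decision function. This is a sketch under stated assumptions, not the tools' actual interface: `entity_details` and `sql_execution` are passed in as plain callables, the `resolved` flag on the details response is hypothetical, and a failed probe is assumed to surface as a raised exception.

```python
from typing import Callable


def verify_identifier(
    table_ref: str,
    raw_path: str,
    connection_name: str,
    entity_details: Callable[[dict], dict],
    sql_execution: Callable[[dict], dict],
) -> dict:
    """Decide whether a table reference may be emitted as-is."""
    details = entity_details(
        {"connectionName": connection_name, "targets": [{"display": table_ref}]}
    )
    # Assumption: a resolved target is reported under a "resolved" flag.
    if details.get("resolved"):
        return {"status": "verified"}

    try:
        # Zero-row probe: it succeeds only if the reference exists.
        sql_execution(
            {"connectionName": connection_name, "sql": f"SELECT 1 FROM {table_ref} LIMIT 0"}
        )
    except Exception as error:
        # The probe failed, so treat the identifier as fictional.
        return {
            "status": "unverified",
            "marker": f"[unverified - from {raw_path}]",
            "clarification": str(error),
        }
    return {"status": "verified"}
```

The `marker` value is what would accompany the identifier in a wiki body, and `clarification` carries the failing probe error for an emit_unmapped_fallback record.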

Source shape

For a raw table with this shape:

{
  "name": "orders",
  "db": "public",
  "columns": [
    { "name": "id", "type": "integer", "nullable": false, "primaryKey": true }
  ]
}

Write a semantic-layer source with this shape:

name: orders
table: public.orders
grain: id
columns:
  - name: id
    type: number

Use string, number, time, or boolean for column types. When a database type is ambiguous, use string.
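The mapping from the raw table shape to the source shape can be sketched as follows. The `TYPE_MAP` entries are illustrative assumptions rather than an official list, and using a single primary-key column as the grain is inferred from the example above. Unknown or ambiguous database types fall back to string, as the rule states.

```python
# Hypothetical mapping from database type names to the four allowed column types.
TYPE_MAP = {
    "integer": "number",
    "bigint": "number",
    "numeric": "number",
    "boolean": "boolean",
    "date": "time",
    "timestamp": "time",
    "text": "string",
    "varchar": "string",
}


def to_source(raw_table: dict) -> dict:
    """Build a semantic-layer source dict from a raw table JSON record."""
    columns = raw_table["columns"]
    primary_keys = [col["name"] for col in columns if col.get("primaryKey")]
    source = {
        "name": raw_table["name"],
        # Physical table name, qualified by the raw "db" field.
        "table": f'{raw_table["db"]}.{raw_table["name"]}',
        "columns": [
            # Anything not in TYPE_MAP is treated as ambiguous and becomes string.
            {"name": col["name"], "type": TYPE_MAP.get(col["type"].lower(), "string")}
            for col in columns
        ],
    }
    # Assumption: set the grain only when exactly one primary-key column exists.
    if len(primary_keys) == 1:
        source["grain"] = primary_keys[0]
    return source
```

Applied to the orders table above, this yields name orders, table public.orders, grain id, and a single id column of type number.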

Boundaries

The raw snapshot is structural evidence. Do not invent measures, segments, business definitions, or joins that are not present in the snapshot files.