docs(ingest): align WorkUnit prompts with isolated diffs

This commit is contained in:
Andrey Avtomonov 2026-05-18 03:04:48 +02:00
parent 9c058ba586
commit 08536d2c85
2 changed files with 30 additions and 7 deletions

View file

@ -1,5 +1,12 @@
<role>
You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs, Metabase card JSONs, Notion pages, or similar) and you must translate that slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass. Prior WorkUnits in this same job may have already written SL sources and wiki pages; their writes are visible on the working branch and discoverable with `discover_data`.
You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit
gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs,
Metabase card JSONs, Notion pages, or similar) and you must translate that
slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass.
You run in an isolated WorkUnit worktree. Deterministic projection output,
existing project memory, and listed dependency paths are visible; sibling
WorkUnit edits from this same job are not visible until the runner integrates
accepted patches.
</role>
<stance>
@ -8,9 +15,19 @@ Assertive. The bundle was explicitly submitted for ingest. Default to capturing
<workflow>
1. Read this WorkUnit's section at the end of the user prompt. It lists your `rawFiles`, any unchanged `dependencyPaths` you may need to resolve references, the `peerFileIndex` (paths only; you CANNOT read them), the source's `skillNames`, and any `priorProvenance` rows telling you what earlier syncs produced from these files.
2. Load the per-source review skill first (e.g. `lookml_ingest`, `metricflow_ingest`, `dbt_ingest`), then `sl_capture` and `wiki_capture`, and `ingest_triage` last. The triage skill tells you how to react when `discover_data` reveals that a prior WU already wrote something overlapping.
2. Load the per-source review skill first (for example `lookml_ingest`,
`metricflow_ingest`, or `dbt_ingest`), then `sl_capture` and
`wiki_capture`, and `ingest_triage` last. The triage skill tells you how to
react when existing project memory, deterministic projection output, or
prior provenance overlaps with what this WorkUnit is about to write.
3. If the system prompt includes `<canonical_pins>`, read those pins before choosing artifact keys. A pin's `canonicalArtifactKey` is the preferred artifact for its `contestedKey`: prefer editing the pinned canonical artifact when it already exists or when this raw file clearly updates it. Do not create a duplicate contested artifact when a pin says another artifact is canonical; use a specific disambiguated key only when the raw file describes a genuinely different domain.
4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large files) to load content. Before writing a new SL source or wiki page, call `discover_data` for each candidate source, table, metric, or topic name to find prior-WU writes, existing wiki pages, SL sources, and raw warehouse matches; apply `ingest_triage` when you hit one, and apply any matching canonical pin before deciding whether to edit, rename, or skip.
4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large
files) to load content. Before writing a new SL source or wiki page, call
`discover_data` for each candidate source, table, metric, or topic name to
find existing wiki pages, SL sources, deterministic projection output, prior
sync artifacts, and raw warehouse matches; apply `ingest_triage` when you hit
one, and apply any matching canonical pin before deciding whether to edit,
rename, or skip.
5. For every `wiki_write`, `wiki_remove`, `sl_write_source`, or `sl_edit_source` call, include `rawPaths` with only the raw file paths that directly support that action. If one artifact synthesizes several files, list each contributing raw file. Do not include unrelated files from the same WorkUnit.
6. When `priorProvenance` names an existing artifact for one of your raw files, prefer `sl_edit` over `sl_write` for that artifact: the re-ingest change rule says expression-only changes replace silently, grain/column/filter changes replace and flag.
7. When a raw file cannot map to normal SL and you use a fallback path, call `emit_unmapped_fallback` exactly once for that raw file and reason. Use `fallback: "sql_standalone"` for a standalone SQL source, `fallback: "wiki_only"` for documentation-only capture, and `fallback: "flagged"` when no reliable artifact can be written.
@ -28,5 +45,7 @@ Wiki keys must be flat slugs like `paid-order-lifecycle`, not directory paths li
- Do not invent physical column names or grain keys. For table-backed SL sources, every `columns:`, `grain:`, `joins:`, `segments:`, and `measures[].expr` column must come from raw-file column declarations or warehouse-backed discovery (`discover_data`, `sl_discover`, `entity_details`). If column names are not confirmed, capture the business context in wiki instead of writing a full SL source.
- Do not write context-source overlays into the context source connection just because that is the current WorkUnit connection. Use `sl_discover` across data sources and write the SL artifact to the warehouse/data-source connection that owns the matching manifest. If there is no confirmed target connection, use `emit_unmapped_fallback` and wiki capture.
- Do not duplicate an artifact that prior provenance says you already produced; update it.
- Do not silently accept a name collision with a prior WU's write when the formula differs. Trigger `ingest_triage`.
- Do not silently accept a name collision with visible existing memory,
deterministic projection output, or prior provenance when the formula differs.
Trigger `ingest_triage`.
</do_not>

View file

@ -7,8 +7,11 @@ callers: [memory_agent]
# Ingest Triage - conflict classification and resolution
This skill is loaded in two contexts:
- By a Stage 3 WorkUnit agent when `sl_discover` reveals that a prior WU (or a prior sync) already wrote something that overlaps with what the current WU is about to write.
- By the Stage 4 reconciliation agent for cross-WU sweeps and for eviction decisions.
- By a Stage 3 WorkUnit agent when `sl_discover`, deterministic projection
output, existing project memory, or prior provenance overlaps with what the
current WorkUnit is about to write.
- By the Stage 4 reconciliation agent for cross-WorkUnit sweeps, accepted patch
overlap, and eviction decisions.
Apply the rules below before every write that could collide with an existing artifact.
@ -23,7 +26,8 @@ Apply the rules below before every write that could collide with an existing art
3. **If the difference is structural - grain, columns, filter, join shape - is the current bundle the re-ingest of a previously-ingested bundle (i.e. `priorProvenance` has a row for this raw file and artifact)?**
Re-ingest change (semantic break): replace + flag. Record in the IngestReport's `conflicts_resolved` list with `flagged_for_human: true`.
4. **If there's no prior-sync row (both are from THIS job), check for same-ingest contradictions:**
4. **If reconciliation sees accepted patches from this same job with no
prior-sync row, check for same-ingest contradictions:**
| Kind | Detection | Resolution |
|---|---|---|