Refine unified ingest UX design after adversarial review iteration 1

2026-07-25 12:01:03 +02:00 · 2026-05-13 17:28:08 +02:00 · 63f6d645e9
commit 63f6d645e9
parent bf12d51731
1 changed file with 103 additions and 14 deletions
--- a/docs/superpowers/specs/2026-05-13-unified-ingest-ux-design.md
+++ b/docs/superpowers/specs/2026-05-13-unified-ingest-ux-design.md
@@ -94,11 +94,32 @@ connections, not adapters.
 `ktx scan` is no longer a documented public command. Database schema scanning
 continues as an internal phase of database ingest.

+Stored report inspection is separate from live context-build control.
+`ktx ingest status [runId]`, `ktx ingest replay <runId>`, and `--report-file`
+remain valid report-viewing surfaces unless the implementation plan replaces
+them with an equivalent status command. `ktx ingest watch` is no longer a normal
+public verb because `watch` conflicts with the foreground-only model. If a
+stored-report visual replay remains useful, expose it as `replay` or hide it
+under an advanced/debug namespace.
+
 ## Database ingest depth

 Database ingest always includes a schema baseline. The depth controls how much
 extra work KTX may perform.

+Depth is the public abstraction over the current scan engine:
+
+- `fast` maps to `KtxScanMode: structural` with `detectRelationships: false`.
+- `deep` maps to `KtxScanMode: enriched` with `detectRelationships: true`.
+- The internal `relationships` scan mode remains an advanced implementation
+  detail. It is not a separate public depth in this v1.
+
+Deep mode includes relationship discovery when the project's
+`scan.relationships.enabled` setting is true. Relationship validation thresholds
+and budgets remain governed by the existing internal `scan.relationships`
+configuration; users do not get a separate public relationship flag in this
+surface.
+
 ### Fast

 `--fast` means KTX builds deterministic schema context quickly.
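
The depth mapping added in this hunk can be sketched in TypeScript as follows. This is a minimal illustration, not the project's code: only the mode names, `detectRelationships`, and `scan.relationships.enabled` come from the spec. `planForDepth`, `ScanPlan`, and the `runsRelationshipDiscovery` field are hypothetical names, and combining the deep flag with the project setting is one reading of the two paragraphs above.

```typescript
// Hypothetical types; only the mode names and `detectRelationships` appear in the spec.
type Depth = "fast" | "deep";
type KtxScanMode = "structural" | "enriched" | "relationships";

interface ScanPlan {
  mode: KtxScanMode;
  detectRelationships: boolean;
  // Assumption: discovery actually runs only when the project setting allows it.
  runsRelationshipDiscovery: boolean;
}

// `relationshipsEnabled` stands in for the project's `scan.relationships.enabled`.
function planForDepth(depth: Depth, relationshipsEnabled: boolean): ScanPlan {
  const detectRelationships = depth === "deep";
  return {
    // The internal "relationships" mode is never selected from a public depth.
    mode: depth === "deep" ? "enriched" : "structural",
    detectRelationships,
    runsRelationshipDiscovery: detectRelationships && relationshipsEnabled,
  };
}
```

Keeping the mapping in one function means no public depth can reach the internal `relationships` mode by accident.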
@@ -119,18 +140,24 @@ large unknown warehouses.

 ### Deep

-`--deep` means KTX builds richer database context and may use slower
-capabilities.
+`--deep` means KTX builds richer database context through the enriched scan path
+and uses slower capabilities.

-- May use LLMs and embeddings when configured.
+- Requires LLM, embedding, and scan-enrichment readiness before work starts.
+- Generates table and column descriptions.
+- Generates embeddings.
 - May sample or query data through read-only connector capabilities.
-- May generate table and column descriptions.
-- May discover and validate relationships.
+- Discovers and validates relationships when relationship discovery is enabled.
 - May process query history into usage patterns when query history is enabled.

 Deep mode is the best agent-readiness mode, but it can take longer and can
 require model, embedding, and database permissions.

+KTX must not silently downgrade an explicit or stored `deep` request to `fast`.
+If the project is missing the model, embedding, or scan-enrichment configuration
+required for deep ingest, KTX errors before starting the run and tells the user
+to run `ktx setup` or rerun with `--fast`.
+
 ### Flag rules

 `--fast` and `--deep` are mutually exclusive. Passing both is an error.
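
The flag rule and the no-silent-downgrade rule from this hunk could be enforced in one preflight step before any work starts. The sketch below assumes hypothetical names (`resolveDepth`, `DepthFlags`, `Readiness`) and invented error wording. It deliberately leaves out `--query-history`, whose interaction with `--fast` is not shown in this excerpt.

```typescript
type Depth = "fast" | "deep";

// Hypothetical shapes for parsed CLI flags and project readiness checks.
interface DepthFlags {
  fast?: boolean;
  deep?: boolean;
}

interface Readiness {
  model: boolean;
  embeddings: boolean;
  scanEnrichment: boolean;
}

// Resolve the effective depth, or throw before the run starts.
function resolveDepth(flags: DepthFlags, storedDepth: Depth, ready: Readiness): Depth {
  if (flags.fast && flags.deep) {
    throw new Error("--fast and --deep are mutually exclusive");
  }
  // An explicit flag wins; otherwise fall back to the depth stored by setup.
  const depth: Depth = flags.fast ? "fast" : flags.deep ? "deep" : storedDepth;
  if (depth === "deep") {
    const missing = (Object.keys(ready) as (keyof Readiness)[]).filter((key) => !ready[key]);
    if (missing.length > 0) {
      // Never downgrade to fast silently; surface the gap and the two ways out.
      throw new Error(
        "deep ingest is missing " + missing.join(", ") + "; run ktx setup or rerun with --fast",
      );
    }
  }
  return depth;
}
```

Because the function either returns a depth or throws, a caller cannot start a `fast` run when the user or the stored default asked for `deep`.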
@@ -218,15 +245,39 @@ connections:
        enabled: true
        windowDays: 90
        minExecutions: 5
-        serviceAccountPatterns:
-          - "^svc_"
+        filters:
+          dropTrivialProbes: true
+          serviceAccounts:
+            mode: exclude
+            patterns:
+              - "^svc_"
        redactionPatterns: []
 ```

-`ingest.adapters` is no longer normal user config. The implementation plan can
-remove it from generated config or keep it as an internal advanced override.
-KTX must not require users to list `live-database` to ingest a database
-connection.
+`context.queryHistory` is the canonical user-facing shape. Runtime code maps it
+to the existing historic-SQL pull config as follows:
+
+- `dialect` is derived from the database driver (`postgres`, `bigquery`, or
+  `snowflake`) and is not normally user-authored.
+- `windowDays`, `minExecutions`, and `redactionPatterns` copy through directly.
+- `filters.dropTrivialProbes` defaults to `true`.
+- `filters.serviceAccounts.patterns` and `filters.serviceAccounts.mode` map to
+  the existing service-account filter fields. The default mode is `exclude`.
+
+Existing `connection.historicSql` blocks are legacy cutover input. Setup or the
+config rewrite path must migrate them into `connection.context.queryHistory`
+while preserving `windowDays`, `minExecutions`, `redactionPatterns`,
+`filters.dropTrivialProbes`, and service-account `patterns` and `mode`. If both
+`context.queryHistory` and `historicSql` are present, `context.queryHistory`
+wins and KTX emits a config-cleanup warning instead of running both.
+
+`ingest.adapters` is no longer normal user config. Existing `ingest.adapters`
+entries load as advanced/internal overrides during the transition, but
+`live-database` and `historic-sql` entries must not be required for public
+`ktx ingest <connectionId>` behavior, must not be regenerated in normal
+`ktx.yaml`, and must not cause config-load failure solely because they are
+present. The implementation plan can remove adapter parsing after checked-in
+configs and examples no longer need it.

 ## Setup flow

@@ -240,7 +291,7 @@ connection is configured or when setup reaches the context-build step:
 How much database context should KTX build?

 Fast: schema only, no AI, quickest
-Deep: richer context, may use AI and take longer
+Deep: AI descriptions, embeddings, relationships, slower
 ```

 The recommended selection depends on readiness:
@@ -253,6 +304,22 @@ foreground context build uses that stored default. Setup can still expose a
 non-prominent automation flag later, such as `--context-depth fast`, if
 headless setup needs it, but the main product surface is guided.

+Setup readiness is depth-aware:
+
+- For `fast`, a database context is ready when the latest non-dry-run
+  structural scan for the connection completed and wrote schema manifest shards.
+  Model, embedding, description-enrichment, and scan-enrichment checks are
+  skipped for fast contexts.
+- For `deep`, a database context is ready only when the enriched scan completed
+  table descriptions, column descriptions, embeddings, and schema manifest
+  shards. Relationship artifacts are also required when relationship discovery
+  is enabled.
+
+The missing-input gate uses the same rule. Missing model, embedding, or
+scan-enrichment configuration must not block a user who selected `fast`. The
+same missing inputs must block `deep` before the foreground build starts, with a
+message that offers `fast` as the no-AI path.
+
 ## Foreground progress UX

 KTX keeps a rich foreground progress view. It removes detach and background
@@ -344,10 +411,22 @@ Warnings are non-fatal when KTX can still perform the requested ingest.
 - Ignored query-history flag on an unsupported database: warn and continue if
  schema ingest can run.
 - Both `--fast` and `--deep`: error before any work starts.
+- Explicit or stored `deep` without required model, embedding, or
+  scan-enrichment readiness: error before any work starts.
+- `--query-history` without required model, embedding, or scan-enrichment
+  readiness: error before any work starts because query history upgrades the
+  run to `deep`.
 - Query-history requested without required grants: fail that query-history
  facet and keep schema results when schema ingest succeeded.
 - Database schema ingest failure: fail that database target.

+`--all` isolates target failures. It runs all database targets first, then all
+source targets, even when one or more database targets fail. Source targets may
+therefore run against previously completed database context if the current
+database refresh failed. The final exit code is non-zero when any target or
+required facet fails, and the summary identifies partial failures by
+connection.
+
 Failure messages focus on the connection and user action:

 ```text
@@ -364,11 +443,23 @@ The implementation is complete when these conditions hold:
 - `ktx ingest <connectionId>` works for database and source connections.
 - `ktx ingest --all` runs database targets before source targets.
 - `--fast` and `--deep` control database depth and are mutually exclusive.
+- `--fast` maps to structural database ingest without relationship detection.
+- `--deep` maps to enriched database ingest with relationship detection enabled.
+- `--deep` and `--query-history` fail before work starts when required model,
+  embedding, or scan-enrichment configuration is missing.
+- `ktx ingest --all` continues independent targets after partial failures and
+  exits non-zero when any target or required facet fails.
 - `ktx setup` stores a database context depth without exposing top-level
  `--fast` or `--deep`.
+- `ktx setup` treats fast database context as ready after completed structural
+  schema ingest and does not require AI descriptions or embeddings for fast.
 - Generated `ktx.yaml` does not include `live-database` for normal projects.
+- Generated `ktx.yaml` uses `connections.<id>.context.queryHistory`, not
+  `connections.<id>.historicSql`, for query-history configuration.
 - Normal CLI help and output do not mention `live-database`.
 - Normal CLI help and output do not present `scan` as a public verb.
+- Normal CLI help and output do not present `ktx ingest watch` as live context
+  build control.
 - Query history is optional, connection-local, and overridable per ingest run.
 - Context build has no detach, attach, watch, resume, stop, or background
  execution path.
@@ -384,8 +475,6 @@ The implementation plan must decide these lower-level details:
  temporary undocumented debug command.
 - Whether old `ktx ingest run --connection-id ... --adapter ...` is removed,
  hidden, or moved to `ktx dev ingest run`.
-- Whether `ingest.adapters` is removed from config parsing or retained as an
-  advanced override.
 - Whether internal artifact paths keep `raw-sources/<connection>/live-database`
  for the first implementation.
 - Whether setup needs a headless `--context-depth fast|deep` flag for CI.
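
The `--all` failure-isolation rule added in the error-handling hunk (database targets first, then source targets, continue after failures, non-zero exit on any failure) can be sketched as below. `runAll`, `Target`, and `TargetResult` are hypothetical names. The sketch is synchronous for brevity, whereas a real ingest would likely be asynchronous, and it does not model per-facet failures.

```typescript
type TargetKind = "database" | "source";

interface Target {
  connectionId: string;
  kind: TargetKind;
}

interface TargetResult {
  connectionId: string;
  ok: boolean;
  error?: string;
}

// Run every target, databases before sources, isolating failures per connection.
function runAll(
  targets: Target[],
  ingest: (target: Target) => void,
): { results: TargetResult[]; exitCode: number } {
  const ordered = [
    ...targets.filter((t) => t.kind === "database"),
    ...targets.filter((t) => t.kind === "source"),
  ];
  const results: TargetResult[] = [];
  for (const target of ordered) {
    try {
      ingest(target);
      results.push({ connectionId: target.connectionId, ok: true });
    } catch (err) {
      // A failed target is recorded but does not stop the remaining targets.
      const error = err instanceof Error ? err.message : String(err);
      results.push({ connectionId: target.connectionId, ok: false, error });
    }
  }
  // Any failure makes the whole invocation exit non-zero.
  const exitCode = results.every((r) => r.ok) ? 0 : 1;
  return { results, exitCode };
}
```

The per-connection `results` array is what a final summary would use to identify partial failures.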