Merge origin/main into merge-scan-into-ingest-v1

2026-06-13 08:15:14 +02:00 · 2026-05-14 01:40:11 +02:00 · 2026-05-14 01:40:11 +02:00 · e501d1d81c
commit e501d1d81c
parent 3ee48e0752 1a472cf3ed
28 changed files with 432 additions and 71 deletions
--- a/docs-site/content/docs/integrations/context-sources.mdx
+++ b/docs-site/content/docs/integrations/context-sources.mdx
@@ -19,13 +19,19 @@ Agents must configure and ingest context sources in this order:
 5. Review generated `semantic-layer/` YAML and `wiki/` Markdown files in git.
 6. Validate changed semantic sources with `ktx sl validate`.

-## Shared source fields
+## Common source fields
+
+Git repository fields are source-specific. dbt uses top-level `repo_url`,
+LookML uses top-level `repoUrl`, and MetricFlow uses nested
+`metricflow.repoUrl`.

 | Field | Required | Description |
 |-------|----------|-------------|
 | `driver` | Yes | Source adapter: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, or `notion` |
 | `source_dir` | For local file sources | Absolute or project-relative source directory |
-| `repo_url` | For Git-hosted sources | Git repository URL |
+| `repo_url` | For Git-hosted dbt sources | Git repository URL |
+| `repoUrl` | For Git-hosted LookML sources | Git repository URL |
+| `metricflow.repoUrl` | For Git-hosted MetricFlow sources | Git repository URL |
 | `branch` | No | Git branch to read |
 | `path` | No | Subdirectory inside a monorepo |
 | `auth_token_ref` | For private APIs/repos | `env:NAME` or `file:/path/to/secret` token reference |
@@ -351,7 +357,7 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
 | `root_page_ids` | Page IDs to crawl from (for `selected_roots`) | `[]` |
 | `root_database_ids` | Database IDs to include | `[]` |
 | `max_pages_per_run` | Pages processed per sync | `1000` |
-| `max_knowledge_creates_per_run` | New pages created per sync | `5` |
+| `max_knowledge_creates_per_run` | New pages created per sync | `25` |
 | `max_knowledge_updates_per_run` | Pages updated per sync | `20` |

 ### What gets ingested
@@ -365,13 +371,14 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in

 - Notion is knowledge-only — it does not produce semantic layer sources
 - Rate limits apply; large workspaces may require multiple ingestion runs
-- `last_successful_cursor` is auto-managed for incremental sync
+- Incremental sync cursors are stored in `.ktx/db.sqlite`; don't add
+  `last_successful_cursor` to `ktx.yaml`

 ## Common errors

 | Error or symptom | Likely cause | Recovery |
 |------------------|--------------|----------|
-| Adapter cannot read source files | `source_dir`, `repo_url`, `branch`, or `path` is wrong | Verify the path locally or clone the repo manually with the same credentials |
+| Adapter cannot read source files | `source_dir`, `repo_url`, `repoUrl`, `metricflow.repoUrl`, `branch`, or `path` is wrong | Verify the path locally or clone the repo manually with the same credentials |
 | Private repo/API authentication fails | Token env var or secret file is missing | Export the env var or update `auth_token_ref` to a readable file |
 | Ingest creates duplicate context | Existing source names or wiki pages do not match imported terminology | Review the diff, rename duplicates, and add wiki pages with canonical names |
 | Notion ingest skips pages | Integration lacks access or root ids are missing | Share pages with the Notion integration and set `root_page_ids` or use `all_accessible` carefully |
--- a/docs-site/content/docs/integrations/primary-sources.mdx
+++ b/docs-site/content/docs/integrations/primary-sources.mdx
@@ -27,6 +27,9 @@ Agents should prefer environment or file references over literal secrets.
 | `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
 | `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it |
 | `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
+| `max_bytes_billed` | No | BigQuery | Maximum bytes billed per query job |
+| `job_timeout_ms` | No | BigQuery | BigQuery query job timeout in milliseconds |
+| `project_id` | No | BigQuery | Optional local descriptor and mapping metadata; not used for BigQuery authentication |

 ## PostgreSQL

@@ -216,6 +219,9 @@ For multiple datasets:
 | Environment variable | `credentials_json: env:BIGQUERY_CREDENTIALS_JSON` |

 The project ID is extracted automatically from the service account JSON file.
+If you set `project_id` in `ktx.yaml`, KTX treats it as local descriptor and
+mapping metadata. The BigQuery connector still authenticates with the
+`project_id` inside `credentials_json`.

 ### Features

@@ -254,7 +260,7 @@ staged artifact shape as Postgres and Snowflake.
 - Parameter binding uses named `@param` syntax
 - Arrays flattened to comma-separated strings in results
 - Location specified at query execution time
-- Supports `maxBytesBilled` and `jobTimeoutMs` limits
+- Supports `max_bytes_billed` and `job_timeout_ms` limits from `ktx.yaml`

 ---