---
title: Context Sources
description: Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, and Notion.
---

Context sources feed your existing analytics tooling into **ktx**. During ingestion, **ktx** extracts metadata from each source, and a reconciliation agent merges it with your existing semantic layer and knowledge base, preserving accepted edits rather than overwriting them.

Configure every context source in `ktx.yaml` under `connections`, and set `driver` to the connector for that source.

## Ingestion workflow

Agents must configure and ingest context sources in this order:

1. Add the context source connection in `ktx.yaml` or with `ktx setup`.
2. Store tokens as `env:NAME` or `file:/path/to/secret`.
3. Run `ktx ingest <connectionId>` for one source or `ktx ingest --all` for
   every configured source.
4. Review the foreground ingest output.
5. Review generated `semantic-layer/` YAML and `wiki/` Markdown files in git.
6. Validate changed semantic sources with `ktx sl validate`.
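
As an illustration of step 2, the sketch below references a Git token stored in a file, using the `file:` form. The connection ID `my-dbt`, the repository URL, and the secret path are placeholders, not values **ktx** requires.

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    repo_url: https://github.com/org/dbt-repo
    # Placeholder path: point this at a file the ktx process can read.
    auth_token_ref: file:/path/to/secret
```

With that connection in place, steps 3 and 6 would be `ktx ingest my-dbt` followed by `ktx sl validate`.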

## Common source fields

Git repository fields are source-specific. dbt uses top-level `repo_url`,
LookML uses top-level `repoUrl`, and MetricFlow uses nested
`metricflow.repoUrl`.

| Field | Required | Description |
|-------|----------|-------------|
| `driver` | Yes | Source connector: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, or `notion` |
| `source_dir` | For local file sources | Absolute or project-relative source directory |
| `repo_url` | For Git-hosted dbt sources | Git repository URL |
| `repoUrl` | For Git-hosted LookML sources | Git repository URL |
| `metricflow.repoUrl` | For Git-hosted MetricFlow sources | Git repository URL |
| `branch` | No | Git branch to read |
| `path` | No | Subdirectory inside a monorepo |
| `auth_token_ref` | For private APIs/repos | `env:NAME` or `file:/path/to/secret` token reference |

## dbt

Ingests schema definitions, model descriptions, column metadata, and test coverage from a dbt project.

### What it provides

- Model and source definitions from `schema.yml` files
- Column descriptions and types
- Test coverage signals
- Semantic model references (if using dbt semantic layer)
- Data lineage between models

### Connection config

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    source_dir: /path/to/dbt/project
```

For a Git-hosted project:

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    repo_url: https://github.com/org/dbt-repo
    branch: main
    path: analytics/dbt          # For monorepos
    auth_token_ref: env:GITHUB_TOKEN
```

### Authentication

| Method | Config |
|--------|--------|
| Local path | `source_dir: /absolute/path/to/dbt/project` |
| Public repo | `repo_url: https://github.com/org/repo` |
| Private repo | `repo_url` + `auth_token_ref: env:GITHUB_TOKEN` |

**Optional fields:**

| Field | Description |
|-------|-------------|
| `profiles_path` | Path to `profiles.yml` (if non-standard location) |
| `target` | dbt target name (e.g., `dev`, `prod`) |
| `project_name` | Override auto-detected project name |
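
A sketch of the optional fields combined with a local project. The paths, target, and project name are illustrative values, and placing these fields at the same level as `source_dir` is an assumption based on the other top-level dbt fields.

```yaml title="ktx.yaml"
connections:
  my-dbt:
    driver: dbt
    source_dir: /path/to/dbt/project
    profiles_path: /path/to/profiles.yml   # Only if profiles.yml is not in the standard location
    target: prod                           # dbt target name
    project_name: analytics                # Hypothetical override of the auto-detected name
```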

### What gets ingested

- YAML semantic sources generated from dbt schema files
- One work unit per semantic source for projects with more than 25 YAML files; smaller projects are processed all at once
- Column descriptions, tests, and relationships are preserved

---

## MetricFlow

Ingests MetricFlow semantic models and metric definitions. Useful when your team defines metrics in MetricFlow's YAML format.

### What it provides

- Semantic model definitions (entities, dimensions, measures)
- Cross-model metric definitions
- Dimension and entity relationships between models

### Connection config

```yaml title="ktx.yaml"
connections:
  my-metricflow:
    driver: metricflow
    metricflow:
      repoUrl: https://github.com/org/metricflow-repo
      branch: main
      path: dbt_metrics           # Subdirectory for monorepos
      auth_token_ref: env:GITHUB_TOKEN
```

For a local path:

```yaml
    metricflow:
      repoUrl: file:///absolute/path/to/project
```
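
In context, the fragment above sits inside a full connection entry. This sketch reuses the `my-metricflow` ID from the Git example and assumes no other fields are needed for a local project.

```yaml title="ktx.yaml"
connections:
  my-metricflow:
    driver: metricflow
    metricflow:
      # file:// URL pointing at a local checkout (placeholder path)
      repoUrl: file:///absolute/path/to/project
```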

### Authentication

| Method | Config |
|--------|--------|
| Public repo | `repoUrl: https://github.com/org/repo` |
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
| Local path | `repoUrl: file:///path/to/project` |

### What gets ingested

- Semantic models with their entities, dimensions, and measures
- Metric definitions with their expressions and filters
- Work units organized by connected component (metrics + related semantic models grouped together)

---

## LookML

Ingests LookML view and model definitions from a Git repository. Extracts field definitions, SQL table references, and join relationships.

### What it provides

- View definitions (dimensions, measures, derived tables)
- Model explore definitions and joins
- SQL table name references
- Field-level descriptions and labels

### Connection config

```yaml title="ktx.yaml"
connections:
  my-lookml:
    driver: lookml
    repoUrl: https://github.com/org/lookml-repo
    branch: main
    path: analytics                # Subdirectory for monorepos
    auth_token_ref: env:GITHUB_TOKEN
```

For a local path:

```yaml
    repoUrl: file:///absolute/path/to/lookml
```

### Authentication

| Method | Config |
|--------|--------|
| Public repo | `repoUrl: https://github.com/org/repo` |
| Private repo | `repoUrl` + `auth_token_ref: env:GITHUB_TOKEN` |
| Local path | `repoUrl: file:///path/to/project` |

### What gets ingested

- View and model definitions organized by connected component
- LookML field types mapped to semantic layer column types
- Join definitions and relationship cardinalities
- SQL table references for warehouse mapping validation

### Warehouse mapping

Optionally validate that LookML references match your expected Looker connection:

```yaml
    mappings:
      expectedLookerConnectionName: postgres_connection
```

Ingestion checks each LookML model's `connection:` declaration against this name and flags any mismatch.
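
Placed in a complete connection entry, the mapping might look like the sketch below. The nesting of `mappings` directly under the connection is inferred from the indentation of the fragment above and from the Metabase and Looker examples; `postgres_connection` is a placeholder for the connection name your LookML models declare.

```yaml title="ktx.yaml"
connections:
  my-lookml:
    driver: lookml
    repoUrl: https://github.com/org/lookml-repo
    branch: main
    auth_token_ref: env:GITHUB_TOKEN
    mappings:
      # Name expected in each model's `connection:` declaration (placeholder)
      expectedLookerConnectionName: postgres_connection
```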

---

## Metabase

Ingests dashboards, questions, and their underlying SQL queries from a Metabase instance. Maps Metabase databases to your **ktx** warehouse connections.

### What it provides

- Dashboard metadata and organization
- Question/query definitions (native SQL and structured queries)
- Table and column usage patterns from queries
- Database-to-warehouse relationship mapping

### Connection config

```yaml title="ktx.yaml"
connections:
  my-metabase:
    driver: metabase
    api_url: https://metabase.company.com
    api_key_ref: env:METABASE_API_KEY
    mappings:
      databaseMappings:
        "3": postgres-main         # Metabase DB ID → ktx connection
      syncEnabled:
        "3": true
      syncMode: ONLY               # Only ingest mapped databases
```

### Authentication

| Method | Config |
|--------|--------|
| API key | `api_key_ref: env:METABASE_API_KEY` |

Generate an API key in Metabase: **Admin > Settings > Authentication > API Keys**.

### What gets ingested

- Semantic sources generated from SQL queries in questions
- Wiki pages for dashboards (purpose, key metrics, relationships)
- Work units per dashboard and per question

### Warehouse mapping

Metabase databases must be mapped to **ktx** connections so ingested context links to the correct warehouse:

```yaml
mappings:
  databaseMappings:
    "<metabase_db_id>": "<ktx_connection_id>"
  syncEnabled:
    "<metabase_db_id>": true
  syncMode: ONLY    # ONLY = restrict to mapped DBs
```

Find Metabase database IDs in **Admin > Databases**; the ID appears in the URL when you edit a database.
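
For an instance with more than one database, the same keys repeat per database ID. In this sketch the IDs `3` and `7` and the **ktx** connection IDs `postgres-main` and `snowflake-analytics` are hypothetical; substitute the IDs from your own Metabase instance.

```yaml
mappings:
  databaseMappings:
    "3": postgres-main          # Hypothetical Metabase DB ID → ktx connection
    "7": snowflake-analytics    # Hypothetical second mapping
  syncEnabled:
    "3": true
    "7": true
  syncMode: ONLY                # Restrict ingestion to the mapped databases
```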

---

## Looker

Ingests explores, looks, and dashboards from a Looker instance via the Looker API. Maps Looker connections to your **ktx** warehouse connections.

### What it provides

- Explore definitions and field metadata
- Dashboard and look configurations
- Query patterns and usage signals
- Looker folder structure for organization context

### Connection config

```yaml title="ktx.yaml"
connections:
  my-looker:
    driver: looker
    base_url: https://looker.company.com
    client_id: your-looker-client-id
    client_secret_ref: env:LOOKER_CLIENT_SECRET
    mappings:
      connectionMappings:
        postgres_connection: postgres-main   # Looker conn → ktx conn
```

### Authentication

| Method | Config |
|--------|--------|
| OAuth client credentials | `client_id` + `client_secret_ref: env:LOOKER_CLIENT_SECRET` |

Generate API credentials in Looker: **Admin > Users > Edit > API Keys**.

### What gets ingested

- Semantic sources from explore field definitions
- Wiki pages for dashboards (purpose, audience, key metrics)
- Triage signals for automated content classification
- Work units per explore and per dashboard

### Warehouse mapping

Map Looker connection names to **ktx** connections so explores link to the correct warehouse:

```yaml
mappings:
  connectionMappings:
    "<looker_connection_name>": "<ktx_connection_id>"
```

Find Looker connection names in **Admin > Database > Connections**.

---

## Notion

Ingests pages and databases from a Notion workspace as wiki pages. Useful for capturing business definitions, data dictionaries, and team documentation that agents need for context.

### What it provides

- Wiki pages synthesized from Notion content
- Page hierarchy and relationships
- Database schemas (when Notion databases describe primary sources)
- Semantic clustering for organized ingestion

### Connection config

```yaml title="ktx.yaml"
connections:
  my-notion:
    driver: notion
    auth_token_ref: env:NOTION_TOKEN
    crawl_mode: selected_roots
    root_page_ids:
      - "abc123def456..."
```

For crawling all accessible pages:

```yaml title="ktx.yaml"
connections:
  my-notion:
    driver: notion
    auth_token_ref: env:NOTION_TOKEN
    crawl_mode: all_accessible
```

### Authentication

| Method | Config |
|--------|--------|
| Internal integration token | `auth_token_ref: env:NOTION_TOKEN` |

Create an integration at [notion.so/my-integrations](https://www.notion.so/my-integrations), then share target pages with the integration.

### Configuration options

| Field | Description | Default |
|-------|-------------|---------|
| `crawl_mode` | `all_accessible` or `selected_roots` | - |
| `root_page_ids` | Page IDs to crawl from (for `selected_roots`) | `[]` |
| `root_database_ids` | Database IDs to include | `[]` |
| `max_pages_per_run` | Pages processed per sync | `1000` |
| `max_knowledge_creates_per_run` | New pages created per sync | `25` |
| `max_knowledge_updates_per_run` | Pages updated per sync | `20` |
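
The options can be combined in one connection entry. The sketch below uses placeholder IDs and limit values chosen only for illustration (lower than the defaults, for a cautious first run); none of the numbers are recommendations from **ktx**.

```yaml title="ktx.yaml"
connections:
  my-notion:
    driver: notion
    auth_token_ref: env:NOTION_TOKEN
    crawl_mode: selected_roots
    root_page_ids:
      - "<notion_page_id>"              # Placeholder page ID
    root_database_ids:
      - "<notion_database_id>"          # Placeholder database ID
    max_pages_per_run: 200              # Default: 1000
    max_knowledge_creates_per_run: 10   # Default: 25
    max_knowledge_updates_per_run: 10   # Default: 20
```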

### What gets ingested

- Wiki pages synthesized from Notion content (not raw copies)
- Domain context extracted and organized by topic
- Triage signals for classifying page relevance
- Work units clustered by semantic similarity for efficient processing

### Notes

- Notion is knowledge-only: it does not produce semantic layer sources
- Rate limits apply; large workspaces may require multiple ingestion runs
- Incremental sync cursors are stored in `.ktx/db.sqlite`; don't add
  `last_successful_cursor` to `ktx.yaml`

## Common errors

| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| Connector cannot read source files | `source_dir`, `repo_url`, `repoUrl`, `metricflow.repoUrl`, `branch`, or `path` is wrong | Verify the path locally or clone the repo manually with the same credentials |
| Private repo/API authentication fails | Token env var or secret file is missing | Export the env var or update `auth_token_ref` to a readable file |
| Ingest creates duplicate context | Existing source names or wiki pages do not match imported terminology | Review the diff, rename duplicates, and add wiki pages with canonical names |
| Notion ingest skips pages | Integration lacks access or root IDs are missing | Share pages with the Notion integration, then set `root_page_ids`, or use `crawl_mode: all_accessible` with care, since it crawls every page the integration can reach |
| Generated semantic sources fail validation | Tool metadata does not match the live warehouse schema | Map BI/source databases to primary warehouse connections and rerun validation |