feat(connectors): external-source connector layer + GitHub Issues (#57)
Make Vestige a durable, local, semantically-searchable retrieval layer over an
external system of record (GitHub Issues first), citing back to the canonical
record. Unlike a live ticket-system MCP proxy, Vestige keeps a durable embedded
index: searchable offline, joinable with the rest of memory, temporally
versioned, and re-syncable idempotently with no duplication.
Phases 1-2 of #57 plus a GitHub reference connector and source-aware search:
- Source envelope on KnowledgeNode/IngestInput (source_system, source_id,
source_url, source_updated_at, content_hash, synced_at, source_project,
source_type, source_author). Migration V17: nullable columns (additive),
partial UNIQUE index on (source_system, source_id), connector_cursors table.
- Idempotent sync primitives in vestige-core: upsert_by_source (content-hash
change detection), connector cursor checkpoints, reconcile_source_tombstones
(invalidate-don't-delete via bitemporal valid_until).
- Connector contract + run_sync driver + GitHub Issues connector behind the
optional `connectors` feature (on by default in vestige-mcp, off in the core
library default so non-connector consumers link no HTTP client).
- source_sync MCP tool ({"repo": "owner/name"}); token from GITHUB_TOKEN env
only. Search results gain a sourceRecord citation for connector memories.
Adversarial review fixes: GitHub `since` Z-form (the `+00:00` offset corrupted
the cursor server-side), un-tombstone clears superseded_by too, cursor never
advances past a failing record, Link next-url host-pinned (token-leak guard),
records_seen counts new records only.
Verified: cargo check/test/clippy -D warnings green across the workspace
(default and connectors features); 483 core tests pass. Version bump to 2.1.27
and tag deferred to release.
Refs #57
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 01:21:59 -05:00
|
|
|
# External-Source Connectors
|
|
|
|
|
|
2026-06-19 02:21:25 -05:00
|
|
|
> Status: **v2.1.27** — GitHub Issues + Redmine reference connectors, plus
|
|
|
|
|
> source-aware investigation filters for search. Tracking issue:
|
feat(connectors): external-source connector layer + GitHub Issues (#57)
Make Vestige a durable, local, semantically-searchable retrieval layer over an
external system of record (GitHub Issues first), citing back to the canonical
record. Unlike a live ticket-system MCP proxy, Vestige keeps a durable embedded
index: searchable offline, joinable with the rest of memory, temporally
versioned, and re-syncable idempotently with no duplication.
Phases 1-2 of #57 plus a GitHub reference connector and source-aware search:
- Source envelope on KnowledgeNode/IngestInput (source_system, source_id,
source_url, source_updated_at, content_hash, synced_at, source_project,
source_type, source_author). Migration V17: nullable columns (additive),
partial UNIQUE index on (source_system, source_id), connector_cursors table.
- Idempotent sync primitives in vestige-core: upsert_by_source (content-hash
change detection), connector cursor checkpoints, reconcile_source_tombstones
(invalidate-don't-delete via bitemporal valid_until).
- Connector contract + run_sync driver + GitHub Issues connector behind the
optional `connectors` feature (on by default in vestige-mcp, off in the core
library default so non-connector consumers link no HTTP client).
- source_sync MCP tool ({"repo": "owner/name"}); token from GITHUB_TOKEN env
only. Search results gain a sourceRecord citation for connector memories.
Adversarial review fixes: GitHub `since` Z-form (the `+00:00` offset corrupted
the cursor server-side), un-tombstone clears superseded_by too, cursor never
advances past a failing record, Link next-url host-pinned (token-leak guard),
records_seen counts new records only.
Verified: cargo check/test/clippy -D warnings green across the workspace
(default and connectors features); 483 core tests pass. Version bump to 2.1.27
and tag deferred to release.
Refs #57
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 01:21:59 -05:00
|
|
|
> [#57](https://github.com/samvallad33/vestige/issues/57).
|
|
|
|
|
|
|
|
|
|
Connectors let Vestige act as a durable, local **retrieval and reasoning layer**
|
|
|
|
|
over a long-lived external system — a ticket tracker, an issue board, a support
|
|
|
|
|
queue — **without replacing it**. The external system stays the source of truth.
|
|
|
|
|
Vestige indexes its records, embeds them for semantic recall, links them into the
|
|
|
|
|
memory graph, and **cites back** to the canonical record.
|
|
|
|
|
|
|
|
|
|
## Why this is different from a ticket-system MCP
|
|
|
|
|
|
|
|
|
|
The official GitHub / Jira MCP servers are **live API proxies**: every query hits
|
|
|
|
|
the upstream API, is rate-limited, keyword-only, online-only, and has no memory
|
|
|
|
|
of past state. Vestige instead keeps a **durable local index** of the records, so
|
|
|
|
|
you can:
|
|
|
|
|
|
|
|
|
|
- search the history **offline** and **semantically** (embeddings, not just
|
|
|
|
|
keywords),
|
|
|
|
|
- **join** ticket history with the rest of your memory in one search,
|
|
|
|
|
- see a **point-in-time** view (records carry temporal validity),
|
|
|
|
|
- and re-sync **idempotently** — re-running never duplicates a record.
|
|
|
|
|
|
|
|
|
|
## Quick start (GitHub Issues)
|
|
|
|
|
|
|
|
|
|
1. (Optional but recommended) export a token so you get the authenticated rate
|
|
|
|
|
limit (5,000 req/hr vs 60 for anonymous) and access to private repos:
|
|
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
export GITHUB_TOKEN=ghp_xxx # or VESTIGE_GITHUB_TOKEN
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The token is read **only** from the environment — never passed as a tool
|
|
|
|
|
argument, never logged.
|
|
|
|
|
|
|
|
|
|
2. Ask your agent to run the `source_sync` MCP tool:
|
|
|
|
|
|
|
|
|
|
```json
|
|
|
|
|
{ "repo": "samvallad33/vestige" }
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
3. Search as normal. Connector-sourced results carry a `sourceRecord` object with
|
|
|
|
|
the canonical issue URL:
|
|
|
|
|
|
|
|
|
|
```json
|
|
|
|
|
{
|
|
|
|
|
"content": "[samvallad33/vestige#57] Roadmap: external source connectors …",
|
|
|
|
|
"sourceRecord": {
|
|
|
|
|
"system": "github",
|
|
|
|
|
"id": "57",
|
|
|
|
|
"url": "https://github.com/samvallad33/vestige/issues/57",
|
|
|
|
|
"project": "samvallad33/vestige",
|
|
|
|
|
"type": "issue",
|
|
|
|
|
"author": "samvallad33",
|
|
|
|
|
"tombstoned": false
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
2026-06-19 02:21:25 -05:00
|
|
|
## Quick start (Redmine)
|
|
|
|
|
|
|
|
|
|
Redmine stays the system of record; Vestige indexes a project's issues +
|
|
|
|
|
journals (comments and status/assignment history).
|
|
|
|
|
|
|
|
|
|
1. Point Vestige at the Redmine host and key (env only, never tool args):
|
|
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
export REDMINE_URL=https://redmine.example.com
|
|
|
|
|
export REDMINE_API_KEY=xxxxxxxx # or VESTIGE_REDMINE_API_KEY
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
The instance must have the REST API enabled (Administration → Settings → API)
|
|
|
|
|
or every call returns 401/403 even with a valid key.
|
|
|
|
|
|
|
|
|
|
2. Run `source_sync`:
|
|
|
|
|
|
|
|
|
|
```json
|
|
|
|
|
{ "source": "redmine", "project": "infra" }
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Results cite the canonical `https://redmine.example.com/issues/<id>` URL.
|
|
|
|
|
|
feat(connectors): external-source connector layer + GitHub Issues (#57)
Make Vestige a durable, local, semantically-searchable retrieval layer over an
external system of record (GitHub Issues first), citing back to the canonical
record. Unlike a live ticket-system MCP proxy, Vestige keeps a durable embedded
index: searchable offline, joinable with the rest of memory, temporally
versioned, and re-syncable idempotently with no duplication.
Phases 1-2 of #57 plus a GitHub reference connector and source-aware search:
- Source envelope on KnowledgeNode/IngestInput (source_system, source_id,
source_url, source_updated_at, content_hash, synced_at, source_project,
source_type, source_author). Migration V17: nullable columns (additive),
partial UNIQUE index on (source_system, source_id), connector_cursors table.
- Idempotent sync primitives in vestige-core: upsert_by_source (content-hash
change detection), connector cursor checkpoints, reconcile_source_tombstones
(invalidate-don't-delete via bitemporal valid_until).
- Connector contract + run_sync driver + GitHub Issues connector behind the
optional `connectors` feature (on by default in vestige-mcp, off in the core
library default so non-connector consumers link no HTTP client).
- source_sync MCP tool ({"repo": "owner/name"}); token from GITHUB_TOKEN env
only. Search results gain a sourceRecord citation for connector memories.
Adversarial review fixes: GitHub `since` Z-form (the `+00:00` offset corrupted
the cursor server-side), un-tombstone clears superseded_by too, cursor never
advances past a failing record, Link next-url host-pinned (token-leak guard),
records_seen counts new records only.
Verified: cargo check/test/clippy -D warnings green across the workspace
(default and connectors features); 483 core tests pass. Version bump to 2.1.27
and tag deferred to release.
Refs #57
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 01:21:59 -05:00
|
|
|
## The `source_sync` tool
|
|
|
|
|
|
|
|
|
|
| Field | Type | Default | Meaning |
|
|
|
|
|
|---|---|---|---|
|
2026-06-19 02:21:25 -05:00
|
|
|
| `source` | string | `github` | `github` or `redmine`. |
|
|
|
|
|
| `repo` | string | — | **GitHub:** `owner/name`, e.g. `samvallad33/vestige`. |
|
|
|
|
|
| `project` | string | — | **Redmine:** project identifier (host from `REDMINE_URL`). |
|
feat(connectors): external-source connector layer + GitHub Issues (#57)
Make Vestige a durable, local, semantically-searchable retrieval layer over an
external system of record (GitHub Issues first), citing back to the canonical
record. Unlike a live ticket-system MCP proxy, Vestige keeps a durable embedded
index: searchable offline, joinable with the rest of memory, temporally
versioned, and re-syncable idempotently with no duplication.
Phases 1-2 of #57 plus a GitHub reference connector and source-aware search:
- Source envelope on KnowledgeNode/IngestInput (source_system, source_id,
source_url, source_updated_at, content_hash, synced_at, source_project,
source_type, source_author). Migration V17: nullable columns (additive),
partial UNIQUE index on (source_system, source_id), connector_cursors table.
- Idempotent sync primitives in vestige-core: upsert_by_source (content-hash
change detection), connector cursor checkpoints, reconcile_source_tombstones
(invalidate-don't-delete via bitemporal valid_until).
- Connector contract + run_sync driver + GitHub Issues connector behind the
optional `connectors` feature (on by default in vestige-mcp, off in the core
library default so non-connector consumers link no HTTP client).
- source_sync MCP tool ({"repo": "owner/name"}); token from GITHUB_TOKEN env
only. Search results gain a sourceRecord citation for connector memories.
Adversarial review fixes: GitHub `since` Z-form (the `+00:00` offset corrupted
the cursor server-side), un-tombstone clears superseded_by too, cursor never
advances past a failing record, Link next-url host-pinned (token-leak guard),
records_seen counts new records only.
Verified: cargo check/test/clippy -D warnings green across the workspace
(default and connectors features); 483 core tests pass. Version bump to 2.1.27
and tag deferred to release.
Refs #57
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 01:21:59 -05:00
|
|
|
| `reconcile` | bool | `false` | Also tombstone local memories for issues no longer visible upstream (an extra full-enumeration pass). |
|
2026-06-19 02:21:25 -05:00
|
|
|
| `max_pages` | int | `10` | API pages to fetch this run (≤100 issues each). Lets a first sync of a large project resume across calls. |
|
feat(connectors): external-source connector layer + GitHub Issues (#57)
Make Vestige a durable, local, semantically-searchable retrieval layer over an
external system of record (GitHub Issues first), citing back to the canonical
record. Unlike a live ticket-system MCP proxy, Vestige keeps a durable embedded
index: searchable offline, joinable with the rest of memory, temporally
versioned, and re-syncable idempotently with no duplication.
Phases 1-2 of #57 plus a GitHub reference connector and source-aware search:
- Source envelope on KnowledgeNode/IngestInput (source_system, source_id,
source_url, source_updated_at, content_hash, synced_at, source_project,
source_type, source_author). Migration V17: nullable columns (additive),
partial UNIQUE index on (source_system, source_id), connector_cursors table.
- Idempotent sync primitives in vestige-core: upsert_by_source (content-hash
change detection), connector cursor checkpoints, reconcile_source_tombstones
(invalidate-don't-delete via bitemporal valid_until).
- Connector contract + run_sync driver + GitHub Issues connector behind the
optional `connectors` feature (on by default in vestige-mcp, off in the core
library default so non-connector consumers link no HTTP client).
- source_sync MCP tool ({"repo": "owner/name"}); token from GITHUB_TOKEN env
only. Search results gain a sourceRecord citation for connector memories.
Adversarial review fixes: GitHub `since` Z-form (the `+00:00` offset corrupted
the cursor server-side), un-tombstone clears superseded_by too, cursor never
advances past a failing record, Link next-url host-pinned (token-leak guard),
records_seen counts new records only.
Verified: cargo check/test/clippy -D warnings green across the workspace
(default and connectors features); 483 core tests pass. Version bump to 2.1.27
and tag deferred to release.
Refs #57
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 01:21:59 -05:00
|
|
|
|
|
|
|
|
The tool returns counts (`created` / `updated` / `unchanged` / `tombstoned`),
|
|
|
|
|
the saved `cursor`, whether it ran authenticated, and a `hint` for the next step.
|
|
|
|
|
|
2026-06-19 02:21:25 -05:00
|
|
|
## Investigation filters (Phase 4)
|
|
|
|
|
|
|
|
|
|
`search` accepts source-aware filters so an agent can scope a query to indexed
|
|
|
|
|
records. All are optional post-filters; combine with a larger `limit` if you
|
|
|
|
|
expect heavy thinning. A source-scoped query excludes non-connector memories.
|
|
|
|
|
|
|
|
|
|
| Filter | Matches |
|
|
|
|
|
|---|---|
|
|
|
|
|
| `source_system` | `github`, `redmine`, … |
|
|
|
|
|
| `source_project` | repo / project (exact) |
|
|
|
|
|
| `source_id` | a specific issue/ticket id |
|
|
|
|
|
| `source_type` | `issue`, `comment`, … |
|
|
|
|
|
| `source_author` | reporter/author (not assignee) |
|
|
|
|
|
| `source_updated_after` / `source_updated_before` | RFC3339 date range (inclusive) |
|
|
|
|
|
| `source_status` | `valid` (default `any`) or `tombstoned` |
|
|
|
|
|
|
|
|
|
|
Status, tracker, and priority are filterable through the existing `tag_prefix`
|
|
|
|
|
(the connectors emit lowercase `status:`, `tracker:`, `priority:`, and GitHub
|
|
|
|
|
`label:` / `state:` tags) — e.g. `tag_prefix: "status:open"`. Assignee and
|
|
|
|
|
linked-issue graph traversal are not yet exposed (see below).
|
|
|
|
|
|
feat(connectors): external-source connector layer + GitHub Issues (#57)
Make Vestige a durable, local, semantically-searchable retrieval layer over an
external system of record (GitHub Issues first), citing back to the canonical
record. Unlike a live ticket-system MCP proxy, Vestige keeps a durable embedded
index: searchable offline, joinable with the rest of memory, temporally
versioned, and re-syncable idempotently with no duplication.
Phases 1-2 of #57 plus a GitHub reference connector and source-aware search:
- Source envelope on KnowledgeNode/IngestInput (source_system, source_id,
source_url, source_updated_at, content_hash, synced_at, source_project,
source_type, source_author). Migration V17: nullable columns (additive),
partial UNIQUE index on (source_system, source_id), connector_cursors table.
- Idempotent sync primitives in vestige-core: upsert_by_source (content-hash
change detection), connector cursor checkpoints, reconcile_source_tombstones
(invalidate-don't-delete via bitemporal valid_until).
- Connector contract + run_sync driver + GitHub Issues connector behind the
optional `connectors` feature (on by default in vestige-mcp, off in the core
library default so non-connector consumers link no HTTP client).
- source_sync MCP tool ({"repo": "owner/name"}); token from GITHUB_TOKEN env
only. Search results gain a sourceRecord citation for connector memories.
Adversarial review fixes: GitHub `since` Z-form (the `+00:00` offset corrupted
the cursor server-side), un-tombstone clears superseded_by too, cursor never
advances past a failing record, Link next-url host-pinned (token-leak guard),
records_seen counts new records only.
Verified: cargo check/test/clippy -D warnings green across the workspace
(default and connectors features); 483 core tests pass. Version bump to 2.1.27
and tag deferred to release.
Refs #57
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 01:21:59 -05:00
|
|
|
### Idempotent, incremental sync
|
|
|
|
|
|
|
|
|
|
Each run:
|
|
|
|
|
|
|
|
|
|
1. resumes from the saved cursor (the high-water mark on the record's upstream
|
|
|
|
|
update time), minus a small overlap window so same-second / clock-skewed
|
|
|
|
|
updates are never missed;
|
|
|
|
|
2. pages issues in ascending update order (`state=all`, so closing an issue is
|
|
|
|
|
**not** mistaken for a deletion), folding each issue + its comments into one
|
|
|
|
|
memory;
|
|
|
|
|
3. routes each record through an **idempotent upsert** keyed on
|
|
|
|
|
`(source_system, source_id)`:
|
|
|
|
|
- unseen record → **insert**,
|
|
|
|
|
- changed content (by content hash) → **update in place** + re-embed,
|
|
|
|
|
- unchanged content → **no-op** (only the "last seen" time advances);
|
|
|
|
|
4. advances and persists the cursor only after the run, so an interruption
|
|
|
|
|
re-scans rather than skips.
|
|
|
|
|
|
|
|
|
|
Re-running `source_sync` on the same repo is therefore safe and cheap — it picks
|
|
|
|
|
up only what changed.
|
|
|
|
|
|
|
|
|
|
### Deletions (tombstoning)
|
|
|
|
|
|
|
|
|
|
Neither GitHub nor Redmine exposes a deletion feed, so an incremental sync can
|
|
|
|
|
never *see* a delete. Pass `reconcile: true` to run a reconciliation pass: Vestige
|
|
|
|
|
enumerates the currently-visible issue ids and **invalidates** (does not purge)
|
|
|
|
|
any local record no longer present. A tombstoned record keeps its content for
|
|
|
|
|
audit but drops out of "currently valid" retrieval (`sourceRecord.tombstoned` is
|
|
|
|
|
`true`). If the record reappears upstream, the next sync un-tombstones it.
|
|
|
|
|
|
|
|
|
|
## The source envelope
|
|
|
|
|
|
|
|
|
|
Every connector-ingested memory carries structured provenance, distinct from the
|
|
|
|
|
legacy free-form `source` label:
|
|
|
|
|
|
|
|
|
|
| Field | Purpose |
|
|
|
|
|
|---|---|
|
|
|
|
|
| `source_system` | `github`, `redmine`, … (namespaces ids). |
|
|
|
|
|
| `source_id` | Native id (issue number, ticket id). |
|
|
|
|
|
| `source_url` | Canonical link back — the citation. |
|
|
|
|
|
| `source_updated_at` | Upstream update time (the sync cursor field). |
|
|
|
|
|
| `content_hash` | Change detector → idempotency. |
|
|
|
|
|
| `synced_at` | When the connector last saw the record live. |
|
|
|
|
|
| `source_project` | Repo / project / space. |
|
|
|
|
|
| `source_type` | `issue`, `comment`, … |
|
|
|
|
|
| `source_author` | Reporter / author upstream. |
|
|
|
|
|
|
|
|
|
|
`(source_system, source_id)` is enforced unique, so there is exactly one memory
|
|
|
|
|
per external record. Legacy memories (agent- or user-authored) have no envelope
|
|
|
|
|
and are completely unaffected.
|
|
|
|
|
|
|
|
|
|
## Building
|
|
|
|
|
|
|
|
|
|
The connector HTTP client is behind the `connectors` cargo feature, which is
|
|
|
|
|
**on by default in the MCP server** (`vestige-mcp`). A build without it still
|
|
|
|
|
exposes the `source_sync` tool but returns a clear "rebuild with `--features
|
|
|
|
|
connectors`" message. The core library (`vestige-core`) leaves the feature
|
|
|
|
|
**off** by default, so library consumers that don't need connectors link no HTTP
|
|
|
|
|
client.
|
|
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
# default MCP build already includes connectors
|
|
|
|
|
cargo build -p vestige-mcp --release
|
|
|
|
|
|
|
|
|
|
# explicit, or for the core lib
|
|
|
|
|
cargo build -p vestige-core --features connectors
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
## Writing a new connector
|
|
|
|
|
|
|
|
|
|
Implement the `Connector` trait in `vestige_core::connectors` (fetch a window of
|
|
|
|
|
records updated since a cursor, page forward, and optionally enumerate live ids
|
|
|
|
|
for reconciliation), produce `NormalizedRecord`s with a filled
|
2026-06-19 02:21:25 -05:00
|
|
|
`SourceEnvelope`, and hand them to `run_sync`. Two reference connectors show the
|
|
|
|
|
shape — `crates/vestige-core/src/connectors/github.rs` (Link-header pagination,
|
|
|
|
|
opaque-url cursor) and `crates/vestige-core/src/connectors/redmine.rs`
|
|
|
|
|
(offset pagination, two-phase list-then-detail fetch). The sync driver,
|
|
|
|
|
idempotent upsert, cursor checkpointing, and tombstone reconciliation are all
|
|
|
|
|
reused for free.
|
|
|
|
|
|
|
|
|
|
## Not yet supported
|
|
|
|
|
|
|
|
|
|
- **Assignee filter** — the envelope stores `source_author` (reporter) only; no
|
|
|
|
|
assignee column yet.
|
|
|
|
|
- **Tracker / version dedicated filter params** — reachable today via
|
|
|
|
|
`tag_prefix` (`tracker:`, and `version:`/`category:` when emitted).
|
|
|
|
|
- **Linked-issue graph traversal** — connectors import relations into the memory
|
|
|
|
|
body, but issue-to-issue graph edges are not yet exposed in search.
|