8.3 KiB
External-Source Connectors
Status: v2.1.27 — GitHub Issues + Redmine reference connectors, plus source-aware investigation filters for search. Tracking issue: #57.
Connectors let Vestige act as a durable, local retrieval and reasoning layer over a long-lived external system — a ticket tracker, an issue board, a support queue — without replacing it. The external system stays the source of truth. Vestige indexes its records, embeds them for semantic recall, links them into the memory graph, and cites back to the canonical record.
Why this is different from a ticket-system MCP
The official GitHub / Jira MCP servers are live API proxies: every query hits the upstream API, is rate-limited, keyword-only, online-only, and has no memory of past state. Vestige instead keeps a durable local index of the records, so you can:
- search the history offline and semantically (embeddings, not just keywords),
- join ticket history with the rest of your memory in one search,
- see a point-in-time view (records carry temporal validity),
- and re-sync idempotently — re-running never duplicates a record.
Quick start (GitHub Issues)
-
(Optional but recommended) export a token so you get the authenticated rate limit (5,000 req/hr vs 60 for anonymous) and access to private repos:
export GITHUB_TOKEN=ghp_xxx # or VESTIGE_GITHUB_TOKENThe token is read only from the environment — never passed as a tool argument, never logged.
-
Ask your agent to run the
source_syncMCP tool:{ "repo": "samvallad33/vestige" } -
Search as normal. Connector-sourced results carry a
sourceRecordobject with the canonical issue URL:{ "content": "[samvallad33/vestige#57] Roadmap: external source connectors …", "sourceRecord": { "system": "github", "id": "57", "url": "https://github.com/samvallad33/vestige/issues/57", "project": "samvallad33/vestige", "type": "issue", "author": "samvallad33", "tombstoned": false } }
Quick start (Redmine)
Redmine stays the system of record; Vestige indexes a project's issues + journals (comments and status/assignment history).
-
Point Vestige at the Redmine host and key (env only, never tool args):
export REDMINE_URL=https://redmine.example.com export REDMINE_API_KEY=xxxxxxxx # or VESTIGE_REDMINE_API_KEYThe instance must have the REST API enabled (Administration → Settings → API) or every call returns 401/403 even with a valid key.
-
Run
source_sync:{ "source": "redmine", "project": "infra" }Results cite the canonical
https://redmine.example.com/issues/<id>URL.
The source_sync tool
| Field | Type | Default | Meaning |
|---|---|---|---|
source |
string | github |
github or redmine. |
repo |
string | — | GitHub: owner/name, e.g. samvallad33/vestige. |
project |
string | — | Redmine: project identifier (host from REDMINE_URL). |
reconcile |
bool | false |
Also tombstone local memories for issues no longer visible upstream (an extra full-enumeration pass). |
max_pages |
int | 10 |
API pages to fetch this run (≤100 issues each). Lets a first sync of a large project resume across calls. |
The tool returns counts (created / updated / unchanged / tombstoned),
the saved cursor, whether it ran authenticated, and a hint for the next step.
Investigation filters (Phase 4)
search accepts source-aware filters so an agent can scope a query to indexed
records. All are optional post-filters; combine with a larger limit if you
expect heavy thinning. A source-scoped query excludes non-connector memories.
| Filter | Matches |
|---|---|
source_system |
github, redmine, … |
source_project |
repo / project (exact) |
source_id |
a specific issue/ticket id |
source_type |
issue, comment, … |
source_author |
reporter/author (not assignee) |
source_updated_after / source_updated_before |
RFC3339 date range (inclusive) |
source_status |
valid (default any) or tombstoned |
Status, tracker, and priority are filterable through the existing tag_prefix
(the connectors emit lowercase status:, tracker:, priority:, and GitHub
label: / state: tags) — e.g. tag_prefix: "status:open". Assignee and
linked-issue graph traversal are not yet exposed (see below).
Idempotent, incremental sync
Each run:
- resumes from the saved cursor (the high-water mark on the record's upstream update time), minus a small overlap window so same-second / clock-skewed updates are never missed;
- pages issues in ascending update order (
state=all, so closing an issue is not mistaken for a deletion), folding each issue + its comments into one memory; - routes each record through an idempotent upsert keyed on
(source_system, source_id):- unseen record → insert,
- changed content (by content hash) → update in place + re-embed,
- unchanged content → no-op (only the "last seen" time advances);
- advances and persists the cursor only after the run, so an interruption re-scans rather than skips.
Re-running source_sync on the same repo is therefore safe and cheap — it picks
up only what changed.
Deletions (tombstoning)
Neither GitHub nor Redmine exposes a deletion feed, so an incremental sync can
never see a delete. Pass reconcile: true to run a reconciliation pass: Vestige
enumerates the currently-visible issue ids and invalidates (does not purge)
any local record no longer present. A tombstoned record keeps its content for
audit but drops out of "currently valid" retrieval (sourceRecord.tombstoned is
true). If the record reappears upstream, the next sync un-tombstones it.
The source envelope
Every connector-ingested memory carries structured provenance, distinct from the
legacy free-form source label:
| Field | Purpose |
|---|---|
source_system |
github, redmine, … (namespaces ids). |
source_id |
Native id (issue number, ticket id). |
source_url |
Canonical link back — the citation. |
source_updated_at |
Upstream update time (the sync cursor field). |
content_hash |
Change detector → idempotency. |
synced_at |
When the connector last saw the record live. |
source_project |
Repo / project / space. |
source_type |
issue, comment, … |
source_author |
Reporter / author upstream. |
(source_system, source_id) is enforced unique, so there is exactly one memory
per external record. Legacy memories (agent- or user-authored) have no envelope
and are completely unaffected.
Building
The connector HTTP client is behind the connectors cargo feature, which is
on by default in the MCP server (vestige-mcp). A build without it still
exposes the source_sync tool but returns a clear "rebuild with --features connectors" message. The core library (vestige-core) leaves the feature
off by default, so library consumers that don't need connectors link no HTTP
client.
# default MCP build already includes connectors
cargo build -p vestige-mcp --release
# explicit, or for the core lib
cargo build -p vestige-core --features connectors
Writing a new connector
Implement the Connector trait in vestige_core::connectors (fetch a window of
records updated since a cursor, page forward, and optionally enumerate live ids
for reconciliation), produce NormalizedRecords with a filled
SourceEnvelope, and hand them to run_sync. Two reference connectors show the
shape — crates/vestige-core/src/connectors/github.rs (Link-header pagination,
opaque-url cursor) and crates/vestige-core/src/connectors/redmine.rs
(offset pagination, two-phase list-then-detail fetch). The sync driver,
idempotent upsert, cursor checkpointing, and tombstone reconciliation are all
reused for free.
Not yet supported
- Assignee filter — the envelope stores
source_author(reporter) only; no assignee column yet. - Tracker / version dedicated filter params — reachable today via
tag_prefix(tracker:, andversion:/category:when emitted). - Linked-issue graph traversal — connectors import relations into the memory body, but issue-to-issue graph edges are not yet exposed in search.