vestige/docs/plans/0002-phase-2-postgres-backend.md
Jan De Landtsheer 9c633c172b
Added postgres admin
added amends to the postgres backend/phase2
2026-04-22 12:10:39 +02:00

# Phase 2 Plan: PostgreSQL Backend
**Status**: Draft
**Depends on**: Phase 1 (MemoryStore + Embedder traits, embedding_model registry, domain columns)
**Related**: docs/adr/0001-pluggable-storage-and-network-access.md (Phase 2), docs/prd/001-getting-centralized-vestige.md, docs/plans/local-dev-postgres-setup.md (local cluster provisioning)
---
## Scope
### In scope
- `PgMemoryStore` struct implementing the Phase 1 `MemoryStore` trait against `sqlx::PgPool`, including compile-time checked queries via `sqlx::query!` / `sqlx::query_as!`.
- First-class `pgvector` integration: typed `Vector` columns, HNSW index (`vector_cosine_ops`, `m = 16`, `ef_construction = 64`), and use of the cosine-distance operator `<=>`.
- First-class Postgres FTS: GENERATED `tsvector` column (`search_vec`) with `setweight` (A=content, B=node_type, C=tags), GIN index, and `websearch_to_tsquery` at query time.
- Hybrid search via Reciprocal Rank Fusion (RRF) expressed as a single SQL statement with CTEs for FTS and vector subqueries, with optional domain filter through array overlap (`&&`).
- sqlx migrations directory at `crates/vestige-core/migrations/postgres/`, numbered `{NNNN}_{name}.up.sql` / `{NNNN}_{name}.down.sql`, runnable by `sqlx::migrate!` at startup and by `sqlx-cli`.
- Offline query cache committed under `crates/vestige-core/.sqlx/` so a DATABASE_URL is not required at build time.
- Backend selection via `vestige.toml`: `[storage]` section with `backend = "sqlite" | "postgres"` plus the per-backend subsection (`[storage.sqlite]`, `[storage.postgres]`). Availability of the Postgres path is gated at compile time by the `postgres-backend` feature; the active backend is selected at runtime via the `StorageConfig` enum.
- CLI: `vestige migrate --from sqlite --to postgres --sqlite-path <p> --postgres-url <u>` -- streaming copy with progress output.
- CLI: `vestige migrate --reembed --model=<new>` -- O(n) re-embed under a new `Embedder`, registry update, HNSW rebuild.
- Testcontainer-based integration tests using the `pgvector/pgvector:pg16` image, behind the `postgres-backend` feature so SQLite-only builds remain untouched.
- `PgMemoryStore` parity with `SqliteMemoryStore` across every public `MemoryStore` method defined in Phase 1.
### Out of scope
- Phase 3 (network access): HTTP MCP transport, API key auth, `vestige keys` CLI. The `api_keys` DDL is declared by Phase 3; Phase 2 does not create it.
- Phase 4 (emergent domain classification): `DomainClassifier`, HDBSCAN, discover / rename / merge CLI. Phase 2 provisions the `domains` and `domain_scores` columns and the `domains` table structure so Phase 4 slots in without further migration, but does not compute or classify.
- Phase 5 (federation): cross-node sync. The `review_events` table is declared in Phase 1; Phase 2 only references it where FSRS writes happen.
- Changes to the cognitive engine, Phase 1 traits, or the embedding pipeline itself. Phase 2 only adds a backend.
- SQLCipher parity for Postgres. Operator responsibility (TLS to Postgres, pgcrypto, disk-level encryption) is out of scope for this phase.
---
## Prerequisites
### Expected Phase 1 artifacts (consumed, not produced)
Phase 2 treats all of the following as fixed interfaces. Each path is the expected Phase 1 location.
- `crates/vestige-core/src/storage/mod.rs` -- re-exports the trait and the two concrete backends.
- `crates/vestige-core/src/storage/memory_store.rs` -- defines the `MemoryStore` trait (generated by `trait_variant::make` from `LocalMemoryStore`) with the full CRUD, search, FSRS, graph, and domain surface from the PRD. Phase 2 implements every method here.
- `crates/vestige-core/src/storage/types.rs` -- shared value types: `MemoryRecord`, `SchedulingState`, `SearchQuery`, `SearchResult`, `MemoryEdge`, `Domain`, `StoreStats`, `HealthStatus`.
- `crates/vestige-core/src/storage/error.rs` -- `StoreError` enum plus `pub type StoreResult<T> = Result<T, StoreError>`. Phase 2 extends this with `StoreError::Postgres(sqlx::Error)` and `StoreError::Migrate(sqlx::migrate::MigrateError)` via `From` impls (the variants themselves MUST live behind `#[cfg(feature = "postgres-backend")]`).
- `crates/vestige-core/src/embedder/mod.rs` -- `Embedder` trait with `embed`, `model_name`, `dimension`, `model_hash`. Phase 2 calls `model_name()`, `dimension()`, and `model_hash()` for the registry.
- `crates/vestige-core/src/storage/sqlite.rs` -- `SqliteMemoryStore: MemoryStore`. Phase 2's `migrate --from sqlite --to postgres` uses this as the source.
- `crates/vestige-core/src/storage/registry.rs` -- `EmbeddingModelRegistry` abstraction that both backends implement. Phase 2 supplies a Postgres version writing to `embedding_model`.
- `crates/vestige-core/migrations/sqlite/` -- V12 (Phase 1) adds `domains TEXT` (JSON-encoded array), `domain_scores TEXT` (JSON), `embedding_model(name, dimension, hash, created_at)`, and `review_events(id, memory_id, timestamp, rating, prior_state, new_state)`. Phase 2 mirrors every column and table in Postgres.
If any of the above is missing when Phase 2 starts, the first action is to surface the gap back to Phase 1 -- do NOT backfill a partial trait in Phase 2.
### Required crates (declared in Phase 2, not installed by this doc)
The agent running Phase 2 uses `cargo add` in `crates/vestige-core/` for each dependency below. Exact versions and feature flags:
- `sqlx@0.8` with features `runtime-tokio`, `tls-rustls`, `postgres`, `uuid`, `chrono`, `json`, `migrate`, `macros`. Optional (gated by `postgres-backend`).
- `pgvector@0.4` with features `sqlx`. Optional (gated by `postgres-backend`).
- `deadpool` is NOT needed; `sqlx::PgPool` is the pool.
- `toml@0.8` (no features) for `vestige.toml` parsing. Moved to non-optional because both backends share the config surface.
- `figment@0.10` with features `toml`, `env` -- optional, only if Phase 1 has not already picked a config loader. If Phase 1 ships a loader, skip `figment` and reuse.
- `dirs@6` -- already a transitive `directories` dependency; reuse existing.
- `tokio-stream@0.1` (no features). Used by migrate commands for streamed iteration.
- `indicatif@0.17` (no features). Progress bars for the migrate CLI.
- `futures@0.3` with features `std`. Consumed by sqlx stream combinators.
Dev-only (under `[dev-dependencies]` in `crates/vestige-core/Cargo.toml`, gated by `postgres-backend`):
- `testcontainers@0.22` with default features (async runtime); the `blocking` feature stays off.
- `testcontainers-modules@0.10` with features `postgres`.
- `tokio@1` features `macros`, `rt-multi-thread` (already present for core tests).
- `criterion@0.5` already present; add a new `[[bench]]` entry.
Feature additions in `crates/vestige-core/Cargo.toml`:
```
[features]
postgres-backend = ["dep:sqlx", "dep:pgvector", "dep:tokio-stream", "dep:futures"]
```
`postgres-backend` is OFF by default. `default = ["embeddings", "vector-search", "bundled-sqlite"]` stays unchanged. `vestige-mcp` forwards a new feature `postgres-backend = ["vestige-core/postgres-backend"]`.
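The forwarded feature in `vestige-mcp` is a one-liner; a sketch of the relevant section:

```toml
# crates/vestige-mcp/Cargo.toml
[features]
postgres-backend = ["vestige-core/postgres-backend"]
```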
### External tooling
- PostgreSQL 16 or newer. `gen_random_uuid()` is core since PostgreSQL 13; migration 0001 still issues `CREATE EXTENSION IF NOT EXISTS pgcrypto` for older compatibility paths. HNSW indexes require pgvector 0.5+.
- The `pgvector` extension installed in the target database (our migration issues `CREATE EXTENSION IF NOT EXISTS vector`).
- `sqlx-cli@0.8` installed on the developer machine for `cargo sqlx prepare --workspace` and `cargo sqlx migrate add` (not a build-time requirement once `.sqlx/` is committed).
- Docker or Podman reachable by the test harness for `testcontainers-modules::postgres` to spin up `pgvector/pgvector:pg16`.
- A local Postgres cluster for `sqlx prepare`, manual migration work, and `vestige migrate --to postgres` smoke runs. The recipe for standing one up on Arch/CachyOS (install, initdb, role + db, pgvector, connection string at `~/.vestige_pg_pw`) lives in `docs/plans/local-dev-postgres-setup.md`. Postgres 18 from the Arch repo satisfies the "16 or newer" requirement above. Phase 2 work assumes `DATABASE_URL` points at that cluster once migrations are applied.
### Assumed Rust toolchain
- Rust 2024 edition.
- MSRV 1.91 (per `CLAUDE.md`). `sqlx 0.8` is compatible.
- `rustflags` unchanged. No `nightly`-only features.
---
## Deliverables
1. Feature gate `postgres-backend` in `crates/vestige-core/Cargo.toml` and `crates/vestige-mcp/Cargo.toml` that cleanly disables all Postgres code paths when off.
2. `crates/vestige-core/src/storage/postgres/mod.rs` -- `PgMemoryStore` struct and `MemoryStore` trait impl (public entry point).
3. `crates/vestige-core/src/storage/postgres/pool.rs` -- `PgMemoryStore::connect(config)` and pool configuration.
4. `crates/vestige-core/src/storage/postgres/search.rs` -- RRF hybrid search query builder and row -> `SearchResult` mapping.
5. `crates/vestige-core/src/storage/postgres/migrations.rs` -- wraps `sqlx::migrate!("./migrations/postgres")` and surfaces typed errors.
6. `crates/vestige-core/src/storage/postgres/registry.rs` -- Postgres `EmbeddingModelRegistry` implementation writing `embedding_model`.
7. `crates/vestige-core/migrations/postgres/0001_init.up.sql` + `0001_init.down.sql` -- extensions, `memories`, `scheduling`, `edges`, `domains`, `embedding_model`, `review_events`, all indexes.
8. `crates/vestige-core/migrations/postgres/0002_hnsw.up.sql` + `0002_hnsw.down.sql` -- HNSW index creation separated so it can be `CREATE INDEX CONCURRENTLY` during reembed.
9. `crates/vestige-core/src/config.rs` -- `VestigeConfig`, `StorageConfig`, `SqliteConfig`, `PostgresConfig`, `EmbeddingsConfig`, plus a single `VestigeConfig::load(path: Option<&Path>)` returning `Result<Self, ConfigError>`.
10. `crates/vestige-core/src/storage/postgres/migrate_cli.rs` -- streaming SQLite-to-Postgres copy, domain-aware, with `indicatif` progress.
11. `crates/vestige-core/src/storage/postgres/reembed.rs` -- `ReembedPlan` and its driver; re-encodes all memories via a supplied `Embedder`, updates `embedding_model`, rebuilds HNSW.
12. `crates/vestige-mcp/src/bin/cli.rs` -- new `clap` `Migrate` subcommand covering both the `--from/--to` copy and the `--reembed` flow (one subcommand or two, see Open Questions), wired to deliverables 10 and 11.
13. `crates/vestige-core/.sqlx/` -- offline query cache, committed.
14. `tests/phase_2/` -- six integration test files listed in the Test Plan.
15. `crates/vestige-core/benches/pg_hybrid_search.rs` -- Criterion benches for RRF search at 1k and 100k memories, gated by `postgres-backend`.
16. `docs/runbook/postgres.md` -- brief ops note covering extension install, `max_connections`, backup discipline, and rollback caveats. (Short; only required for the "rollback of migrate" deliverable.)
---
## Detailed Task Breakdown
### D1. `postgres-backend` feature gate
- **File**: `crates/vestige-core/Cargo.toml`, `crates/vestige-mcp/Cargo.toml`
- **Depends on**: nothing; this is the first change.
- **Rust snippets**:
```toml
# crates/vestige-core/Cargo.toml
[features]
default = ["embeddings", "vector-search", "bundled-sqlite"]
bundled-sqlite = ["rusqlite/bundled"]
encryption = ["rusqlite/bundled-sqlcipher"]
postgres-backend = [
"dep:sqlx",
"dep:pgvector",
"dep:tokio-stream",
"dep:futures",
]
[dependencies]
sqlx = { version = "0.8", default-features = false, features = [
"runtime-tokio", "tls-rustls", "postgres", "uuid", "chrono",
"json", "migrate", "macros",
], optional = true }
pgvector = { version = "0.4", features = ["sqlx"], optional = true }
tokio-stream = { version = "0.1", optional = true }
futures = { version = "0.3", optional = true }
toml = "0.8"
indicatif = "0.17"
```
- **Behavior notes**: keep the two backends mutually compilable per `CLAUDE.md`. Every `use sqlx::...` sits under `#[cfg(feature = "postgres-backend")]`. Every module under `crates/vestige-core/src/storage/postgres/` carries `#![cfg(feature = "postgres-backend")]` as its file-level attribute.
### D2. `PgMemoryStore` core struct
- **File**: `crates/vestige-core/src/storage/postgres/mod.rs`
- **Depends on**: D1, Phase 1 `MemoryStore` trait and value types.
- **Signatures**:
```rust
#![cfg(feature = "postgres-backend")]
use std::sync::Arc;
use std::time::Duration;
use chrono::{DateTime, Utc};
use pgvector::Vector;
use sqlx::postgres::{PgConnectOptions, PgPoolOptions};
use sqlx::PgPool;
use uuid::Uuid;
use crate::embedder::Embedder;
use crate::storage::error::{StoreError, StoreResult};
use crate::storage::types::{
Domain, HealthStatus, MemoryEdge, MemoryRecord, SchedulingState,
SearchQuery, SearchResult, StoreStats,
};
use crate::storage::memory_store::{LocalMemoryStore, MemoryStore};
pub mod migrations;
pub mod pool;
pub mod registry;
pub mod search;
pub mod migrate_cli;
pub mod reembed;
/// Postgres-backed implementation of `MemoryStore`.
///
/// Cheaply cloneable. Methods take `&self`; interior state lives inside
/// the `PgPool` (which already provides `Sync` via `Arc` internally).
#[derive(Clone)]
pub struct PgMemoryStore {
pool: PgPool,
embedding_dim: i32,
embedding_model: Arc<EmbeddingModelDescriptor>,
}
#[derive(Debug, Clone)]
pub struct EmbeddingModelDescriptor {
pub name: String,
pub dimension: i32,
pub hash: String,
}
impl PgMemoryStore {
/// Construct a new store. Runs migrations, reads the registry, validates
/// that the embedder matches the registered model.
pub async fn connect(
url: &str,
max_connections: u32,
embedder: &dyn Embedder,
) -> StoreResult<Self>;
/// Low-level constructor for tests: supply an existing pool, skip migrate.
pub async fn from_pool(
pool: PgPool,
embedder: &dyn Embedder,
) -> StoreResult<Self>;
/// Accessor used by migrate/reembed CLI.
pub fn pool(&self) -> &PgPool { &self.pool }
pub fn embedding_dim(&self) -> i32 { self.embedding_dim }
}
// `trait_variant::make` is applied to the trait definition in Phase 1, not to
// impl blocks; the backend implements the generated Send variant directly.
impl MemoryStore for PgMemoryStore {
async fn init(&self) -> StoreResult<()>;
async fn health_check(&self) -> StoreResult<HealthStatus>;
async fn insert(&self, record: &MemoryRecord) -> StoreResult<Uuid>;
async fn get(&self, id: Uuid) -> StoreResult<Option<MemoryRecord>>;
async fn update(&self, record: &MemoryRecord) -> StoreResult<()>;
async fn delete(&self, id: Uuid) -> StoreResult<()>;
async fn search(&self, query: &SearchQuery) -> StoreResult<Vec<SearchResult>>;
async fn fts_search(&self, text: &str, limit: usize) -> StoreResult<Vec<SearchResult>>;
async fn vector_search(&self, embedding: &[f32], limit: usize) -> StoreResult<Vec<SearchResult>>;
async fn get_scheduling(&self, memory_id: Uuid) -> StoreResult<Option<SchedulingState>>;
async fn update_scheduling(&self, state: &SchedulingState) -> StoreResult<()>;
async fn get_due_memories(
&self,
before: DateTime<Utc>,
limit: usize,
) -> StoreResult<Vec<(MemoryRecord, SchedulingState)>>;
async fn add_edge(&self, edge: &MemoryEdge) -> StoreResult<()>;
async fn get_edges(&self, node_id: Uuid, edge_type: Option<&str>) -> StoreResult<Vec<MemoryEdge>>;
async fn remove_edge(&self, source: Uuid, target: Uuid, edge_type: &str) -> StoreResult<()>;
async fn get_neighbors(&self, node_id: Uuid, depth: usize) -> StoreResult<Vec<(MemoryRecord, f64)>>;
async fn list_domains(&self) -> StoreResult<Vec<Domain>>;
async fn get_domain(&self, id: &str) -> StoreResult<Option<Domain>>;
async fn upsert_domain(&self, domain: &Domain) -> StoreResult<()>;
async fn delete_domain(&self, id: &str) -> StoreResult<()>;
async fn classify(&self, embedding: &[f32]) -> StoreResult<Vec<(String, f64)>>;
async fn count(&self) -> StoreResult<usize>;
async fn get_stats(&self) -> StoreResult<StoreStats>;
async fn vacuum(&self) -> StoreResult<()>;
}
```
- **SQL (inline within impl methods)**: every call uses `sqlx::query!` or `sqlx::query_as!` for compile-time validation. Examples:
```rust
// insert
sqlx::query!(
r#"
INSERT INTO memories (
id, domains, domain_scores, content, node_type, tags,
embedding, metadata, created_at, updated_at
) VALUES ($1, $2, $3, $4, $5, $6, $7::vector, $8, $9, $10)
"#,
record.id,
&record.domains as &[String],
serde_json::to_value(&record.domain_scores)?,
record.content,
record.node_type,
&record.tags as &[String],
record.embedding.as_ref().map(|v| Vector::from(v.clone())) as Option<Vector>,
record.metadata,
record.created_at,
record.updated_at,
)
.execute(&self.pool)
.await?;
```
- **Behavior notes**:
- `StoreError` gets two new variants behind the feature:
```rust
#[cfg(feature = "postgres-backend")]
#[error("postgres error: {0}")]
Postgres(#[from] sqlx::Error),
#[cfg(feature = "postgres-backend")]
#[error("postgres migration error: {0}")]
Migrate(#[from] sqlx::migrate::MigrateError),
```
- `classify()` on Postgres implements the PRD's cosine-similarity-to-centroid computation inside SQL using `1 - (centroid <=> $1::vector)` over the `domains` table and returns rows sorted descending. This mirrors the behavior a `DomainClassifier` in Phase 4 uses; Phase 2 ships the backend capability but does not call it.
- Connection pool defaults (see D3): `max_connections = 10`, `acquire_timeout = 30s`, `idle_timeout = 600s`, `test_before_acquire = false` (cheap queries; avoid per-acquire roundtrip).
- All methods are `async fn` and use sqlx's `tokio` runtime feature; no blocking `block_on`.
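To make the `classify()` note concrete, a hedged SQL sketch of the centroid query (column names follow migration 0001; the exact predicate may shift once Phase 4 lands):

```sql
SELECT id,
1 - (centroid <=> $1::vector) AS similarity
FROM domains
WHERE centroid IS NOT NULL
ORDER BY centroid <=> $1::vector;
```

Ordering by cosine distance ascending yields similarity descending, matching the `Vec<(String, f64)>` contract.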
### D3. Pool construction and config wiring
- **File**: `crates/vestige-core/src/storage/postgres/pool.rs`
- **Depends on**: D1, D2, D9.
- **Signatures**:
```rust
#![cfg(feature = "postgres-backend")]
use sqlx::postgres::{PgConnectOptions, PgPoolOptions};
use sqlx::{ConnectOptions, PgPool};
use std::str::FromStr;
use std::time::Duration;
use crate::config::PostgresConfig;
use crate::storage::error::{StoreError, StoreResult};
pub async fn build_pool(cfg: &PostgresConfig) -> StoreResult<PgPool> {
let mut opts = PgConnectOptions::from_str(&cfg.url)?;
opts = opts
.application_name("vestige")
.statement_cache_capacity(256)
.log_statements(tracing::log::LevelFilter::Debug);
let pool = PgPoolOptions::new()
.max_connections(cfg.max_connections.unwrap_or(10))
.min_connections(0)
.acquire_timeout(Duration::from_secs(cfg.acquire_timeout_secs.unwrap_or(30)))
.idle_timeout(Some(Duration::from_secs(600)))
.max_lifetime(Some(Duration::from_secs(1800)))
.test_before_acquire(false)
.connect_with(opts)
.await?;
Ok(pool)
}
```
- **Behavior notes**: acquire timeout chosen to exceed the 30-second testcontainer spin-up requirement. `application_name = "vestige"` makes `pg_stat_activity` readable from `psql` during debugging.
### D4. sqlx migrations directory
- **File**: `crates/vestige-core/migrations/postgres/0001_init.up.sql`, `0001_init.down.sql`, `0002_hnsw.up.sql`, `0002_hnsw.down.sql`.
- **Depends on**: none (pure SQL).
`0001_init.up.sql`:
```sql
-- Extensions
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS vector;
-- Embedding model registry
-- Mirrors the SQLite table created in Phase 1.
CREATE TABLE embedding_model (
id SMALLINT PRIMARY KEY DEFAULT 1 CHECK (id = 1),
name TEXT NOT NULL,
dimension INTEGER NOT NULL CHECK (dimension > 0),
hash TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Domains table (populated by Phase 4 DomainClassifier; Phase 2 only creates
-- the empty table so list/get/upsert/delete work against both backends).
CREATE TABLE domains (
id TEXT PRIMARY KEY,
label TEXT NOT NULL,
centroid vector,
top_terms TEXT[] NOT NULL DEFAULT '{}',
memory_count INTEGER NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
metadata JSONB NOT NULL DEFAULT '{}'::jsonb
);
-- Core memories table
CREATE TABLE memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
domains TEXT[] NOT NULL DEFAULT '{}',
domain_scores JSONB NOT NULL DEFAULT '{}'::jsonb,
content TEXT NOT NULL,
node_type TEXT NOT NULL DEFAULT 'general',
tags TEXT[] NOT NULL DEFAULT '{}',
embedding vector,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
search_vec TSVECTOR GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(content, '')), 'A') ||
setweight(to_tsvector('english', coalesce(node_type, '')), 'B') ||
setweight(to_tsvector('english', coalesce(array_to_string(tags, ' '), '')), 'C')
) STORED
);
-- FSRS scheduling state (1:1 with memories)
CREATE TABLE scheduling (
memory_id UUID PRIMARY KEY REFERENCES memories(id) ON DELETE CASCADE,
stability DOUBLE PRECISION NOT NULL DEFAULT 0.0,
difficulty DOUBLE PRECISION NOT NULL DEFAULT 0.0,
retrievability DOUBLE PRECISION NOT NULL DEFAULT 1.0,
last_review TIMESTAMPTZ,
next_review TIMESTAMPTZ,
reps INTEGER NOT NULL DEFAULT 0,
lapses INTEGER NOT NULL DEFAULT 0
);
-- Graph edges (spreading activation)
CREATE TABLE edges (
source_id UUID NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
target_id UUID NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
edge_type TEXT NOT NULL DEFAULT 'related',
weight DOUBLE PRECISION NOT NULL DEFAULT 1.0,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (source_id, target_id, edge_type)
);
-- FSRS review event log (Phase 1 creates this; Phase 2 mirrors it for Postgres).
-- Append-only. Used for future federation (Phase 5).
CREATE TABLE review_events (
id BIGSERIAL PRIMARY KEY,
memory_id UUID NOT NULL REFERENCES memories(id) ON DELETE CASCADE,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
rating SMALLINT NOT NULL,
prior_state JSONB NOT NULL,
new_state JSONB NOT NULL
);
-- Indexes on memories (vector index is declared separately in 0002_hnsw.up.sql)
CREATE INDEX idx_memories_fts ON memories USING GIN (search_vec);
CREATE INDEX idx_memories_domains ON memories USING GIN (domains);
CREATE INDEX idx_memories_tags ON memories USING GIN (tags);
CREATE INDEX idx_memories_node_type ON memories (node_type);
CREATE INDEX idx_memories_created ON memories (created_at);
CREATE INDEX idx_memories_updated ON memories (updated_at);
-- Indexes on scheduling
CREATE INDEX idx_scheduling_next_review ON scheduling (next_review);
CREATE INDEX idx_scheduling_last_review ON scheduling (last_review);
-- Indexes on edges
CREATE INDEX idx_edges_target ON edges (target_id);
CREATE INDEX idx_edges_source ON edges (source_id);
CREATE INDEX idx_edges_type ON edges (edge_type);
-- Indexes on review_events
CREATE INDEX idx_review_events_memory ON review_events (memory_id);
CREATE INDEX idx_review_events_ts ON review_events (timestamp);
-- Update trigger on memories.updated_at
CREATE OR REPLACE FUNCTION memories_set_updated_at() RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at := now();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_memories_updated_at
BEFORE UPDATE ON memories
FOR EACH ROW EXECUTE FUNCTION memories_set_updated_at();
```
`0001_init.down.sql`:
```sql
DROP TRIGGER IF EXISTS trg_memories_updated_at ON memories;
DROP FUNCTION IF EXISTS memories_set_updated_at();
DROP INDEX IF EXISTS idx_review_events_ts;
DROP INDEX IF EXISTS idx_review_events_memory;
DROP INDEX IF EXISTS idx_edges_type;
DROP INDEX IF EXISTS idx_edges_source;
DROP INDEX IF EXISTS idx_edges_target;
DROP INDEX IF EXISTS idx_scheduling_last_review;
DROP INDEX IF EXISTS idx_scheduling_next_review;
DROP INDEX IF EXISTS idx_memories_updated;
DROP INDEX IF EXISTS idx_memories_created;
DROP INDEX IF EXISTS idx_memories_node_type;
DROP INDEX IF EXISTS idx_memories_tags;
DROP INDEX IF EXISTS idx_memories_domains;
DROP INDEX IF EXISTS idx_memories_fts;
DROP TABLE IF EXISTS review_events;
DROP TABLE IF EXISTS edges;
DROP TABLE IF EXISTS scheduling;
DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS domains;
DROP TABLE IF EXISTS embedding_model;
```
`0002_hnsw.up.sql` (separated so reembed can drop-and-recreate without touching the rest of the schema):
```sql
-- HNSW index on memories.embedding.
-- pgvector requires the column to have a typmod (fixed dimension) for HNSW.
-- The dimension is stamped by the application at startup via ALTER TABLE
-- using the embedder's dimension() method (see PgMemoryStore::connect).
-- We express the index with the generic vector_cosine_ops operator class.
CREATE INDEX idx_memories_embedding_hnsw
ON memories USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
`0002_hnsw.down.sql`:
```sql
DROP INDEX IF EXISTS idx_memories_embedding_hnsw;
```
- **Behavior notes**:
- pgvector HNSW requires a typmod. `PgMemoryStore::connect` runs `ALTER TABLE memories ALTER COLUMN embedding TYPE vector($N)` with `$N = embedder.dimension()` exactly once, guarded by a check against `embedding_model` (first startup ever) or validated against it on subsequent starts. If `embedder.dimension()` differs from the stored one and `embedding_model` is non-empty, return `StoreError::EmbeddingDimensionMismatch` -- the user must run `vestige migrate --reembed`.
- `ALTER COLUMN ... TYPE vector($N)` on a populated column fails unless the data fits; that is the desired safety net.
- The `tsvector` GENERATED column uses `array_to_string(tags, ' ')` rather than `array_to_tsvector` from the PRD sketch: `array_to_tsvector` inserts the array elements as lexemes verbatim, skipping the normalization and stemming the `'english'` configuration applies, so joining the tags and running them through `to_tsvector` keeps weight C consistent with weights A and B.
- `gen_random_uuid()` comes from `pgcrypto`. In Postgres 13+ it is also available from core; we keep the extension for older compatibility paths.
- MVCC: all table writes are transactional; no explicit locks. `INSERT ... ON CONFLICT DO UPDATE` is used in `upsert_domain`, `update_scheduling`, and edge idempotency.
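As one instance of the `ON CONFLICT` pattern above, `upsert_domain` can be sketched as follows (column list per `0001_init.up.sql`; `created_at` keeps its default on first insert):

```sql
INSERT INTO domains (id, label, centroid, top_terms, memory_count, metadata)
VALUES ($1, $2, $3::vector, $4, $5, $6)
ON CONFLICT (id) DO UPDATE SET
label = EXCLUDED.label,
centroid = EXCLUDED.centroid,
top_terms = EXCLUDED.top_terms,
memory_count = EXCLUDED.memory_count,
metadata = EXCLUDED.metadata;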
### D5. Hybrid search via RRF
- **File**: `crates/vestige-core/src/storage/postgres/search.rs`
- **Depends on**: D2, D4.
- **Signatures**:
```rust
#![cfg(feature = "postgres-backend")]
use pgvector::Vector;
use sqlx::PgPool;
use uuid::Uuid;
use crate::storage::error::StoreResult;
use crate::storage::types::{SearchQuery, SearchResult};
const RRF_K: i32 = 60; // constant from Cormack et al. 2009
const OVERFETCH_MULT: i64 = 3; // matches Phase 1 SQLite overfetch
pub(crate) async fn rrf_search(
pool: &PgPool,
query: &SearchQuery,
) -> StoreResult<Vec<SearchResult>>;
```
SQL for the full hybrid RRF query. Placeholders:
- `$1` = text query (string, may be empty)
- `$2` = embedding (vector)
- `$3` = overfetch limit per branch (int)
- `$4` = final limit (int)
- `$5` = domain filter (text[] or NULL)
- `$6` = node_type filter (text[] or NULL)
- `$7` = tag filter (text[] or NULL)
```sql
WITH params AS (
SELECT
$1::text AS q_text,
$2::vector AS q_vec,
$3::int AS overfetch,
$4::int AS final_limit,
$5::text[] AS dom_filter,
$6::text[] AS nt_filter,
$7::text[] AS tag_filter
),
fts AS (
SELECT m.id,
ts_rank_cd(m.search_vec, websearch_to_tsquery('english', p.q_text)) AS score,
ROW_NUMBER() OVER (
ORDER BY ts_rank_cd(m.search_vec, websearch_to_tsquery('english', p.q_text)) DESC
) AS rank
FROM memories m, params p
WHERE p.q_text <> ''
AND m.search_vec @@ websearch_to_tsquery('english', p.q_text)
AND (p.dom_filter IS NULL OR m.domains && p.dom_filter)
AND (p.nt_filter IS NULL OR m.node_type = ANY(p.nt_filter))
AND (p.tag_filter IS NULL OR m.tags && p.tag_filter)
ORDER BY rank
LIMIT (SELECT overfetch FROM params)
),
vec AS (
SELECT m.id,
1 - (m.embedding <=> p.q_vec) AS score,
ROW_NUMBER() OVER (
ORDER BY m.embedding <=> p.q_vec
) AS rank
FROM memories m, params p
WHERE m.embedding IS NOT NULL
AND p.q_vec IS NOT NULL
AND (p.dom_filter IS NULL OR m.domains && p.dom_filter)
AND (p.nt_filter IS NULL OR m.node_type = ANY(p.nt_filter))
AND (p.tag_filter IS NULL OR m.tags && p.tag_filter)
ORDER BY m.embedding <=> p.q_vec
LIMIT (SELECT overfetch FROM params)
),
fused AS (
SELECT COALESCE(f.id, v.id) AS id,
COALESCE(1.0 / (60 + f.rank), 0.0) -- 60 = RRF_K
+ COALESCE(1.0 / (60 + v.rank), 0.0) AS rrf_score,
f.score AS fts_score,
v.score AS vector_score
FROM fts f FULL OUTER JOIN vec v ON f.id = v.id
)
SELECT m.id AS "id!: Uuid",
m.domains AS "domains!: Vec<String>",
m.domain_scores AS "domain_scores!: serde_json::Value",
m.content AS "content!",
m.node_type AS "node_type!",
m.tags AS "tags!: Vec<String>",
m.embedding AS "embedding?: Vector",
m.metadata AS "metadata!: serde_json::Value",
m.created_at AS "created_at!: chrono::DateTime<chrono::Utc>",
m.updated_at AS "updated_at!: chrono::DateTime<chrono::Utc>",
fused.rrf_score AS "rrf_score!: f64",
fused.fts_score AS "fts_score?: f64",
fused.vector_score AS "vector_score?: f64"
FROM fused
JOIN memories m ON m.id = fused.id
ORDER BY fused.rrf_score DESC
LIMIT (SELECT final_limit FROM params);
```
- **Behavior notes**:
- `OVERFETCH_MULT * query.limit` is passed as `$3`. Final `$4` is `query.limit`.
- Empty text query is allowed; the `fts` CTE returns zero rows (`p.q_text <> ''`) and the result degrades to pure vector search, which matches `vector_search` behavior.
- Null embedding is allowed; the `vec` CTE returns zero rows and the result degrades to pure FTS, which matches `fts_search` behavior.
- `fts_search` and `vector_search` are separate public methods on the trait. Each uses a simpler single-CTE query derived from the above by removing the other branch. Implementing them as thin wrappers over `rrf_search` with nullified inputs is acceptable but adds one extra plan per call; the explicit implementations win on latency.
- `min_retrievability` in `SearchQuery` is applied as a final filter by joining on `scheduling` in the outer `SELECT`. Adding that join unconditionally regresses simple searches; add it only when `query.min_retrievability.is_some()`.
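For intuition, the fusion arithmetic the `fused` CTE performs can be reproduced standalone (illustrative only; `rrf_fuse` is not part of the crate):

```rust
use std::collections::HashMap;

const RRF_K: f64 = 60.0;

/// Fuse two ranked id lists exactly as the SQL does: each branch contributes
/// 1 / (RRF_K + rank); absence from a branch contributes 0.
fn rrf_fuse(fts_ranked: &[&str], vec_ranked: &[&str]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranked in [fts_ranked, vec_ranked] {
        for (i, id) in ranked.iter().enumerate() {
            // ROW_NUMBER() is 1-based, hence i + 1.
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (RRF_K + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let fused = rrf_fuse(&["a", "b", "c"], &["b", "c", "d"]);
    // "b" (FTS rank 2 + vector rank 1) overtakes "a" (FTS rank 1, absent
    // from the vector branch): 1/62 + 1/61 > 1/61.
    assert_eq!(fused[0].0, "b");
    println!("{fused:?}");
}
```

Appearing in both branches roughly doubles a document's score, which is why a mid-rank hit in both lists beats a top hit in only one.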
### D6. `embedding_model` registry impl
- **File**: `crates/vestige-core/src/storage/postgres/registry.rs`
- **Depends on**: D1, D4 (table exists), Phase 1 `EmbeddingModelRegistry` trait.
- **Signatures**:
```rust
#![cfg(feature = "postgres-backend")]
use sqlx::PgPool;
use crate::embedder::Embedder;
use crate::storage::error::{StoreError, StoreResult};
pub(crate) async fn ensure_registry(
pool: &PgPool,
embedder: &dyn Embedder,
) -> StoreResult<()> {
let row = sqlx::query!(
r#"SELECT name, dimension, hash FROM embedding_model WHERE id = 1"#
)
.fetch_optional(pool)
.await?;
match row {
None => {
sqlx::query!(
r#"
INSERT INTO embedding_model (id, name, dimension, hash)
VALUES (1, $1, $2, $3)
"#,
embedder.model_name(),
embedder.dimension() as i32,
embedder.model_hash(),
)
.execute(pool)
.await?;
// First-ever run: stamp the vector column typmod.
let ddl = format!(
"ALTER TABLE memories ALTER COLUMN embedding TYPE vector({})",
embedder.dimension()
);
sqlx::query(&ddl).execute(pool).await?;
Ok(())
}
Some(r) if r.name == embedder.model_name()
&& r.dimension == embedder.dimension() as i32
&& r.hash == embedder.model_hash() => Ok(()),
Some(r) => Err(StoreError::EmbeddingMismatch {
expected: format!("{} ({}d, {})", r.name, r.dimension, r.hash),
got: format!(
"{} ({}d, {})",
embedder.model_name(),
embedder.dimension(),
embedder.model_hash()
),
}),
}
}
pub(crate) async fn update_registry(
pool: &PgPool,
embedder: &dyn Embedder,
) -> StoreResult<()> {
// Used only by `vestige migrate --reembed` after a full re-encode.
sqlx::query!(
r#"
UPDATE embedding_model
SET name = $1, dimension = $2, hash = $3, created_at = now()
WHERE id = 1
"#,
embedder.model_name(),
embedder.dimension() as i32,
embedder.model_hash(),
)
.execute(pool)
.await?;
Ok(())
}
```
- **Behavior notes**:
- `StoreError::EmbeddingMismatch { expected, got }` already exists in Phase 1; Phase 2 just constructs it.
- The `ALTER TABLE ... TYPE vector(N)` DDL is only issued on first init. On subsequent inits the existing typmod already matches.
- Re-embed flow also uses this module, but the DDL path is different -- see D11.
### D7. `VestigeConfig`: `vestige.toml` backend selection
- **File**: `crates/vestige-core/src/config.rs` (Phase 1 may already own this file; Phase 2 extends, not replaces)
- **Depends on**: D1.
- **Signatures**:
```rust
use std::path::{Path, PathBuf};
use serde::Deserialize;
#[derive(Debug, Clone, Deserialize)]
pub struct VestigeConfig {
// NOTE: `#[serde(default)]` on these fields requires `Default` impls for
// `EmbeddingsConfig` and `StorageConfig` (sensible defaults: fastembed
// provider, SQLite backend at the standard path).
#[serde(default)]
pub embeddings: EmbeddingsConfig,
#[serde(default)]
pub storage: StorageConfig,
#[serde(default)]
pub server: ServerConfig,
#[serde(default)]
pub auth: AuthConfig,
}
#[derive(Debug, Clone, Deserialize)]
pub struct EmbeddingsConfig {
pub provider: String, // "fastembed"
pub model: String, // "BAAI/bge-base-en-v1.5"
}
#[derive(Debug, Clone, Deserialize)]
#[serde(tag = "backend", rename_all = "lowercase")]
pub enum StorageConfig {
Sqlite(SqliteConfig),
#[cfg(feature = "postgres-backend")]
Postgres(PostgresConfig),
}
#[derive(Debug, Clone, Deserialize)]
pub struct SqliteConfig {
pub path: PathBuf,
}
#[cfg(feature = "postgres-backend")]
#[derive(Debug, Clone, Deserialize)]
pub struct PostgresConfig {
pub url: String,
#[serde(default)]
pub max_connections: Option<u32>,
#[serde(default)]
pub acquire_timeout_secs: Option<u64>,
}
#[derive(Debug, Clone, Default, Deserialize)]
pub struct ServerConfig { /* Phase 3 fills this in */ }
#[derive(Debug, Clone, Default, Deserialize)]
pub struct AuthConfig { /* Phase 3 fills this in */ }
impl VestigeConfig {
pub fn load(path: Option<&Path>) -> Result<Self, ConfigError>;
pub fn default_path() -> PathBuf; // ~/.vestige/vestige.toml
}
#[derive(Debug, thiserror::Error)]
pub enum ConfigError {
#[error("io: {0}")]
Io(#[from] std::io::Error),
#[error("toml: {0}")]
Toml(#[from] toml::de::Error),
#[error("invalid config: {0}")]
Invalid(String),
}
```
- **Behavior notes**:
- Because `StorageConfig` is internally tagged (`#[serde(tag = "backend")]`), the variant's fields sit directly under `[storage]` alongside `backend = "sqlite" | "postgres"`. The PRD's `[storage.sqlite]` / `[storage.postgres]` subsection layout does not deserialize through this enum as written; supporting it needs a small custom `Deserialize` (or an intermediate struct with `#[serde(flatten)]`) that lifts the subsection keys. Pick one spelling and document it.
- An unknown backend string returns a clear "unknown variant" error.
- If `postgres-backend` is compiled off and the user writes `backend = "postgres"`, deserialization returns "unknown variant `postgres`" -- loud failure. Phase 2 wraps this into `ConfigError::Invalid("postgres-backend feature not compiled in")`.
- `env`-override hooks (e.g., `VESTIGE_POSTGRES_URL`) are a Phase 3 concern; not added here.
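For reference, a minimal `vestige.toml` sketch in the shape the internally-tagged enum parses as written (values are illustrative, not defaults):

```toml
[embeddings]
provider = "fastembed"
model = "BAAI/bge-base-en-v1.5"

# Internally tagged: the backend-specific keys are siblings of `backend`.
[storage]
backend = "postgres"
url = "postgresql://localhost/vestige"
max_connections = 10
acquire_timeout_secs = 30
```

Swapping `backend = "sqlite"` replaces `url` and the pool knobs with `path = "..."`.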
### D8. `vestige migrate --from sqlite --to postgres`
- **File**: `crates/vestige-core/src/storage/postgres/migrate_cli.rs`
- **Depends on**: D2, D6, D7, Phase 1 `SqliteMemoryStore`.
- **Signatures**:
```rust
#![cfg(feature = "postgres-backend")]
use std::path::Path;
use std::sync::Arc;
use futures::{StreamExt, TryStreamExt};
use indicatif::{ProgressBar, ProgressStyle};
use uuid::Uuid;
use crate::embedder::Embedder;
use crate::storage::error::{StoreError, StoreResult};
use crate::storage::postgres::PgMemoryStore;
use crate::storage::sqlite::SqliteMemoryStore;
#[derive(Debug, Clone)]
pub struct SqliteToPostgresPlan {
pub sqlite_path: std::path::PathBuf,
pub postgres_url: String,
pub max_connections: u32,
pub batch_size: usize, // default 500
}
pub struct MigrationReport {
pub memories_copied: u64,
pub scheduling_rows: u64,
pub edges_copied: u64,
pub review_events_copied: u64,
pub domains_copied: u64,
pub errors: Vec<(Uuid, StoreError)>,
}
pub async fn run_sqlite_to_postgres(
plan: SqliteToPostgresPlan,
embedder: Arc<dyn Embedder>,
) -> StoreResult<MigrationReport>;
```
Algorithm:
1. Open source `SqliteMemoryStore` in read-only mode (`?mode=ro`).
2. Check source `embedding_model` registry; refuse if it disagrees with the supplied embedder unless the user also passed `--reembed`.
3. Open destination `PgMemoryStore` via `connect` (runs migrations, stamps dim).
4. Stream source rows in batches of `plan.batch_size` via a windowed query ordered by `created_at, id` (stable cursor; survives resume).
5. For each batch: begin a Postgres transaction, `INSERT INTO memories ... ON CONFLICT (id) DO NOTHING` for all rows, `INSERT INTO scheduling` likewise, commit. Copy domain assignments (`domains`, `domain_scores`) verbatim -- they are `[]` and `{}` for pre-Phase-4 SQLite data.
6. After memories finish, stream edges and review_events the same way.
7. Emit progress via `indicatif::ProgressBar` (one bar per table, multi-bar). Each 1000 rows log to tracing at INFO.
8. Return `MigrationReport` for the caller to print.
- **Behavior notes**:
- Memory-bounded: batch size 500 and sqlx streams mean memory usage stays O(batch * row_size), not O(total_rows).
- Idempotent: re-running replays only the rows not already present; `ON CONFLICT DO NOTHING` means partial runs recover.
- UUID strings from SQLite are parsed via `Uuid::parse_str` -- any mangled ID pushes to `errors` instead of aborting.
- The FTS `search_vec` is regenerated by Postgres via the GENERATED column; no data to copy.
- `review_events` may not exist in Phase 1 SQLite for pre-V12 databases. The migrator detects missing tables via `SELECT name FROM sqlite_master` and skips gracefully.
- A separate `--dry-run` flag prints the counts per table without writing.
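The batching and idempotency claims above can be sketched in plain Rust (hypothetical row type and in-memory destination; the real implementation streams from sqlx and relies on `INSERT ... ON CONFLICT (id) DO NOTHING`):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for a memory row and the destination table.
#[derive(Clone, PartialEq, Debug)]
struct Row {
    id: u64,
    content: String,
}

/// Copy rows in fixed-size batches, skipping ids already present --
/// the in-memory analogue of `ON CONFLICT (id) DO NOTHING`.
fn copy_batched(src: &[Row], dst: &mut HashMap<u64, Row>, batch_size: usize) -> u64 {
    let mut copied = 0;
    for batch in src.chunks(batch_size) {
        // One "transaction" per batch: all-or-nothing in the real store.
        for row in batch {
            if let std::collections::hash_map::Entry::Vacant(e) = dst.entry(row.id) {
                e.insert(row.clone());
                copied += 1;
            }
        }
    }
    copied
}

fn main() {
    let src: Vec<Row> = (0..1_200)
        .map(|i| Row { id: i, content: format!("memory {i}") })
        .collect();
    let mut dst = HashMap::new();
    let first = copy_batched(&src, &mut dst, 500);
    let second = copy_batched(&src, &mut dst, 500); // re-run after a partial copy: no-op
    println!("first={first} second={second} total={}", dst.len());
}
```

Re-running after an interrupted first pass copies only the missing rows, which is exactly why the CLI migration is safe to resume.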
### D9. `vestige migrate --reembed --model=<new>`
- **File**: `crates/vestige-core/src/storage/postgres/reembed.rs`
- **Depends on**: D2, D6, Phase 1 `Embedder`.
- **Signatures**:
```rust
#![cfg(feature = "postgres-backend")]
use std::sync::Arc;
use std::time::Instant;
use futures::TryStreamExt;
use indicatif::{ProgressBar, ProgressStyle};
use sqlx::PgPool;
use uuid::Uuid;
use crate::embedder::Embedder;
use crate::storage::error::{StoreError, StoreResult};
use crate::storage::postgres::PgMemoryStore;
#[derive(Debug, Clone)]
pub struct ReembedPlan {
pub batch_size: usize, // default 128 (embedder batch)
pub drop_hnsw_first: bool, // default true
pub concurrent_index: bool, // default false; use CREATE INDEX (not CONCURRENTLY)
}
pub struct ReembedReport {
pub rows_updated: u64,
pub duration_secs: f64,
pub index_rebuild_secs: f64,
}
pub async fn run_reembed(
store: &PgMemoryStore,
new_embedder: Arc<dyn Embedder>,
plan: ReembedPlan,
) -> StoreResult<ReembedReport>;
```
Algorithm:
1. Verify `new_embedder.dimension()` != stored dimension OR `new_embedder.model_hash()` != stored hash -- otherwise no-op and return `rows_updated = 0`.
2. If the dimension is changing, relax the typmod first: `ALTER TABLE memories ALTER COLUMN embedding TYPE vector` (no typmod), so rows with old- and new-dimension vectors can coexist during the update. (`DROP NOT NULL` is unnecessary; the column is already nullable.)
3. If `plan.drop_hnsw_first`, execute `DROP INDEX IF EXISTS idx_memories_embedding_hnsw;` so updates are not slowed by index maintenance. This is the recommended path; `REINDEX` is kept in the Open Questions as an alternative.
4. Stream all `id, content` from `memories` ordered by `id`.
5. For each batch of `plan.batch_size`: call `new_embedder.embed_batch(&texts)` (Phase 1 trait exposes batched embedding when available; otherwise loop single `embed`). Then pair ids with new vectors in a single statement. Note: a two-dimensional `real[][]` parameter does not work here -- multi-argument `UNNEST` flattens multidimensional arrays element by element -- so bind a one-dimensional `vector[]` instead:
```sql
UPDATE memories
SET embedding = v.embedding
FROM UNNEST($1::uuid[], $2::vector[]) AS v(id, embedding)
WHERE memories.id = v.id;
```
6. After all rows updated: run `ALTER TABLE memories ALTER COLUMN embedding TYPE vector($NEW_DIM)` if dimension changed.
7. Rebuild HNSW. If `plan.concurrent_index`, execute `CREATE INDEX CONCURRENTLY idx_memories_embedding_hnsw ...`; else `CREATE INDEX idx_memories_embedding_hnsw ...`.
8. `update_registry` with the new embedder.
9. Return `ReembedReport`.
- **Behavior notes**:
- Memory-bounded: batch_size * 2 (old + new texts) vectors in RAM at any time.
- The dimension change must happen AFTER all rows are updated (pgvector validates typmod on write when a typmod is present; we relax-then-tighten).
- `CONCURRENTLY` builds do not hold `AccessExclusiveLock`, but fail inside a transaction. That's why the outer driver runs index DDL as an autocommit statement (sqlx `execute` outside a pool transaction).
- For `--dry-run`, emit what *would* happen (row count, estimated embedder calls, estimated time from a 50-rows-per-second baseline for local fastembed) and exit.
### D10. CLI wiring in `vestige-mcp`
- **File**: `crates/vestige-mcp/src/bin/cli.rs`
- **Depends on**: D8, D9, D7. Requires `vestige-mcp` Cargo feature `postgres-backend`.
- **Signatures**:
```rust
#[derive(Subcommand)]
enum Commands {
// existing variants: Stats, Health, Consolidate, Restore, Backup,
// Export, Gc, Dashboard, Ingest, Serve ...
/// Migrate between backends or re-embed memories.
#[cfg(feature = "postgres-backend")]
Migrate(MigrateArgs),
}
#[derive(clap::Args)]
#[cfg(feature = "postgres-backend")]
struct MigrateArgs {
#[command(subcommand)]
action: MigrateAction,
}
#[derive(Subcommand)]
#[cfg(feature = "postgres-backend")]
enum MigrateAction {
/// Copy all memories from SQLite to Postgres.
#[command(name = "copy")]
Copy {
#[arg(long)]
from: String, // "sqlite"
#[arg(long)]
to: String, // "postgres"
#[arg(long)]
sqlite_path: PathBuf,
#[arg(long)]
postgres_url: String,
#[arg(long, default_value = "500")]
batch_size: usize,
#[arg(long)]
dry_run: bool,
},
/// Re-embed all memories with a new embedder.
#[command(name = "reembed")]
Reembed {
#[arg(long)]
model: String,
#[arg(long, default_value = "128")]
batch_size: usize,
#[arg(long, default_value_t = true)]
drop_hnsw_first: bool,
#[arg(long)]
concurrent_index: bool,
#[arg(long)]
dry_run: bool,
},
}
```
The user-facing invocation maps closely onto the string requested by the ADR:
```
vestige migrate copy --from sqlite --to postgres \
--sqlite-path ~/.vestige/vestige.db \
--postgres-url postgresql://localhost/vestige
vestige migrate reembed --model=BAAI/bge-large-en-v1.5
```
An alternate top-level layout (single `vestige migrate` with flags `--from`, `--to`, `--reembed`) is equivalent; the subcommand split is preferred because the two flag sets are disjoint (see Open Question 1).
- **Behavior notes**:
- `--from`/`--to` values are validated; the current Phase 2 build accepts only `sqlite` and `postgres`.
- For `reembed`, the `--model` string resolves to an `Embedder` via a factory already provided by Phase 1 (`Embedder::from_name(&str)`); Phase 2 does not invent new embedder constructors.
- Progress output on `stderr`; machine-readable summary on `stdout` as one-line JSON when `--json` is set (skipped for Phase 2 unless trivial).
### D11. Offline query cache (`.sqlx/`)
- **File**: `crates/vestige-core/.sqlx/` (committed directory of `query-*.json`)
- **Depends on**: all `sqlx::query!` call sites being final.
- **Procedure**: the developer runs `cargo sqlx prepare --workspace` with a live Postgres having the schema applied. Output goes into `crates/vestige-core/.sqlx/`. This directory is committed. CI enforces freshness by running `cargo sqlx prepare --workspace --check` against the same live Postgres; contributors can verify the offline build locally with `SQLX_OFFLINE=true cargo check`.
- **Behavior notes**: `SQLX_OFFLINE=true` (set in the environment, or via an `[env]` entry in `.cargo/config.toml`) is the default on CI and for downstream consumers. The `vestige-core` docs add a one-liner in README for contributors: "if you change any SQL in Phase 2 modules, rerun `cargo sqlx prepare` with a live DB."
### D12. Testcontainer harness (integration)
- **File**: `tests/phase_2/common/mod.rs` (the `common` convention used in `tests/phase_2/` crates)
- **Depends on**: D2 through D11.
- **Signatures**:
```rust
#![cfg(feature = "postgres-backend")]
use std::sync::Arc;
use testcontainers_modules::postgres::Postgres;
use testcontainers::{runners::AsyncRunner, ContainerAsync};
use vestige_core::embedder::Embedder;
use vestige_core::storage::postgres::PgMemoryStore;
pub struct PgHarness {
pub container: ContainerAsync<Postgres>,
pub store: PgMemoryStore,
}
impl PgHarness {
pub async fn start(embedder: Arc<dyn Embedder>) -> anyhow::Result<Self> {
let container = Postgres::default()
.with_tag("pg16")
.with_name("pgvector/pgvector")
.start()
.await?;
let port = container.get_host_port_ipv4(5432).await?;
let url = format!(
"postgresql://postgres:postgres@127.0.0.1:{}/postgres", port
);
let store = PgMemoryStore::connect(&url, 4, embedder.as_ref()).await?;
Ok(Self { container, store })
}
}
```
- **Behavior notes**:
- Image `pgvector/pgvector:pg16` bundles pgvector into the official postgres:16 image.
- Pool size 4 is enough for tests without starving the container's default `max_connections = 100`.
- `ContainerAsync` is held for the whole test scope; drop tears down the container.
- A fake `TestEmbedder` in `common/test_embedder.rs` provides a deterministic hash-based embedding (no ONNX dependency in CI).
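A deterministic hash-based embedding of the kind `TestEmbedder` needs can be sketched with nothing but the standard library (dimension and seeding scheme are illustrative, not the project's actual test helper):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic pseudo-embedding: hash the text to seed a tiny PRNG,
/// fill `dim` components, then L2-normalize so cosine math behaves.
fn test_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    let mut state = h.finish() | 1; // never zero, so xorshift cannot get stuck
    let mut v: Vec<f32> = (0..dim)
        .map(|_| {
            // xorshift64: cheap, deterministic, good enough for tests.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (state as f32 / u64::MAX as f32) - 0.5
        })
        .collect();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter_mut().for_each(|x| *x /= norm);
    v
}

fn main() {
    let a = test_embed("rust async trait", 768);
    let b = test_embed("rust async trait", 768);
    assert_eq!(a, b); // same input, same vector -- no ONNX runtime needed
    let norm: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("dim={} norm={norm:.3}", a.len());
}
```

Unit-norm output matters: cosine distance over unnormalized random vectors would make the parity tests flaky.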
---
## Test Plan
### Unit tests (colocated in `src/`)
Under `crates/vestige-core/src/storage/postgres/`:
- `pool.rs` -- one test per `build_pool` branch: defaults, explicit `max_connections`, invalid URL returns `StoreError::Postgres`.
- `registry.rs` -- three tests: first-init writes row and alters typmod, reopen with same embedder returns Ok, reopen with different dimension returns `EmbeddingMismatch`.
- `search.rs` -- query-builder unit tests for parameter packing: empty text, null embedding, all three filters null, all three filters populated.
- `migrate_cli.rs` -- `SqliteToPostgresPlan::default` returns sane defaults; plan validation rejects empty URL.
- `reembed.rs` -- `ReembedPlan::no_change` returns `rows_updated == 0` when embedder matches registry (no network call).
- `config.rs` -- five tests covering: valid postgres config, valid sqlite config, unknown backend string, missing subsection, feature-gated postgres without feature compiled in.
### Integration tests (in `tests/phase_2/`)
Each file is a full integration test crate (`[[test]]` in workspace root Cargo).
**`tests/phase_2/pg_trait_parity.rs`**
- Declares the same test matrix as Phase 1's SQLite trait tests, parameterized over `impl MemoryStore`.
- Runs every method: `insert`, `get`, `update`, `delete`, `search`, `fts_search`, `vector_search`, `get_scheduling`, `update_scheduling`, `get_due_memories`, `add_edge`, `get_edges`, `remove_edge`, `get_neighbors`, `list_domains`, `get_domain`, `upsert_domain`, `delete_domain`, `classify`, `count`, `get_stats`, `vacuum`, `health_check`.
- Each test is written once as `async fn roundtrip_<method>(store: &dyn MemoryStore)` and invoked from two wrappers, one for SQLite and one for Postgres.
- Acceptance: every method returns equal results (except for `Uuid` ordering in `list_domains` where the test sorts before comparing).
**`tests/phase_2/pg_hybrid_search_rrf.rs`**
- Inserts 20 memories with known content ("rust async trait", "postgres hnsw vector", "fastembed onnx model", ...).
- Case 1: pure FTS. `SearchQuery { text: Some("rust trait"), embedding: None, ... }` returns the three Rust-related rows in order; `fts_score` populated, `vector_score` null.
- Case 2: pure vector. `SearchQuery { text: None, embedding: Some(embed("rust trait")), ... }` returns the same three rows via cosine; `vector_score` populated, `fts_score` null.
- Case 3: hybrid. Both set -- top hit has both scores; `rrf_score >= 1/(60+1) + 1/(60+1) = 0.0328`.
- Case 4: domain filter. 10 memories tagged with `domains = ["dev"]`, 10 with `["home"]`. Query with `domains: Some(vec!["dev"])` returns only dev memories.
- Case 5: edge case -- empty FTS query plus an embedding behaves identically to `vector_search`; empty embedding plus FTS query behaves identically to `fts_search`.
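The `rrf_score` arithmetic Case 3 checks is easy to pin down in isolation. A sketch of the fusion (with `k = 60`; ranks 1-based, as in the SQL):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank_d).
fn rrf_fuse(k: f64, lists: &[Vec<&str>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            // rank is 1-based, so the best possible contribution is 1/(k+1).
            *scores.entry((*id).to_string()).or_default() += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let fts = vec!["m1", "m2", "m3"];
    let vector = vec!["m1", "m3", "m4"];
    let fused = rrf_fuse(60.0, &[fts, vector]);
    // "m1" is rank 1 in both lists: 1/61 + 1/61 ≈ 0.0328, matching Case 3's bound.
    println!("top = {} ({:.4})", fused[0].0, fused[0].1);
}
```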
**`tests/phase_2/pg_migration_sqlite_to_postgres.rs`**
- Populate a fresh SQLite with 10,000 memories (seeded RNG, deterministic content), 4,000 scheduling rows, 2,000 edges.
- Run `run_sqlite_to_postgres` with a test embedder.
- Assert: `count() == 10_000` on destination; spot-check 25 memories byte-for-byte (content, tags, metadata, domains, domain_scores).
- Assert: FSRS fields (`stability`, `difficulty`, `next_review`) preserved per memory.
- Assert: edges preserved by `(source_id, target_id, edge_type)`.
- Assert: re-running the migration is a no-op (`ON CONFLICT DO NOTHING` path); row count unchanged.
**`tests/phase_2/pg_migration_reembed.rs`**
- Start with a fresh store using `TestEmbedder768` (768-dim, hash `h1`). Insert 500 memories.
- Swap to `TestEmbedder1024` (1024-dim, hash `h2`). Run `run_reembed(store, Arc::new(TestEmbedder1024), ReembedPlan::default())`.
- Assert: `rows_updated == 500`; `embedding_model` now has `(name=TestEmbedder1024, dimension=1024, hash=h2)`.
- Assert: `SELECT DISTINCT vector_dims(embedding) FROM memories` returns only `1024`.
- Assert: HNSW index exists after reembed (`SELECT indexname FROM pg_indexes WHERE indexname = 'idx_memories_embedding_hnsw'` returns one row).
- Assert: memory IDs unchanged (compare pre/post id sets).
- Assert: a hybrid search using `TestEmbedder1024` returns results (post-reembed vectors are queryable).
**`tests/phase_2/pg_config_parsing.rs`**
- Parse six `vestige.toml` snippets:
- sqlite + fastembed -> `StorageConfig::Sqlite`.
- postgres + fastembed -> `StorageConfig::Postgres` with `max_connections = 10`.
- postgres with custom `max_connections = 25` and `acquire_timeout_secs = 60`.
- unknown backend `"mysql"` -> `ConfigError`.
- missing subsection `[storage.postgres]` while `backend = "postgres"` -> `ConfigError`.
- malformed URL (empty) -> `ConfigError::Invalid`.
**`tests/phase_2/pg_concurrency.rs`**
- Spawn 16 tasks, each inserting 100 memories in parallel for 1,600 total.
- Spawn 4 tasks concurrently running `search` queries; none should fail.
- Spawn 2 tasks concurrently running `update_scheduling` on overlapping IDs -- last write wins (MVCC), neither errors.
- Assert: all 1,600 rows present, no deadlocks, every task returns `Ok`.
- Run time < 10 seconds on a cold container.
### Compile-time query verification
- CI step: `cargo sqlx prepare --workspace --check` against a CI-provisioned Postgres (GitHub Actions / Forgejo Actions services block). Fails CI if any `query!` macro goes stale.
- Alternative offline run for contributors: `SQLX_OFFLINE=true cargo check -p vestige-core --features postgres-backend`. CI runs both forms to ensure `.sqlx/` is up to date.
- `.sqlx/` is committed to the repo. A `.gitattributes` entry marks it as `linguist-generated=true` so it doesn't inflate language stats.
### Benchmarks
Under `crates/vestige-core/benches/pg_hybrid_search.rs` (Criterion), gated by `postgres-backend`.
- `pg_search_1k` -- populate 1,000 memories once per bench suite, measure `rrf_search` p50/p99 over 500 iterations. Target: p50 < 10ms, p99 < 30ms on a local container.
- `pg_search_100k` -- 100,000 memories. Target: p50 < 50ms, p99 < 150ms. Validates HNSW scaling.
- Testcontainer shared across both benches via `once_cell`.
- Bench entry in `vestige-core/Cargo.toml`:
```
[[bench]]
name = "pg_hybrid_search"
harness = false
required-features = ["postgres-backend"]
```
---
## Acceptance Criteria
- [ ] `cargo build -p vestige-core --features postgres-backend` -- zero warnings.
- [ ] `cargo build -p vestige-core` (SQLite-only, default features) -- zero warnings; no Postgres symbols referenced.
- [ ] `cargo build -p vestige-mcp --features postgres-backend` -- zero warnings; `vestige` binary exposes the `migrate` subcommand.
- [ ] `cargo clippy --workspace --all-targets --all-features -- -D warnings` -- clean.
- [ ] `cargo sqlx prepare --workspace --check` -- returns success; `.sqlx/` is current.
- [ ] `cargo test -p vestige-core --features postgres-backend --test pg_trait_parity --test pg_hybrid_search_rrf --test pg_migration_sqlite_to_postgres --test pg_migration_reembed --test pg_config_parsing --test pg_concurrency` -- all green.
- [ ] Testcontainer spin-up p50 under 30 seconds on a developer laptop with a warm Docker daemon.
- [ ] `pg_search_100k` Criterion bench reports p50 < 50ms on reference hardware (logged in the ADR comment trail).
- [ ] `vestige migrate copy --from sqlite --to postgres` on a 10,000-memory corpus completes without data loss: row count parity, content byte-parity on a 1 percent sample, FSRS state preserved (stability, difficulty, reps, lapses, next_review), edge count parity.
- [ ] `vestige migrate reembed` with a dimension-changing embedder returns to a fully queryable state: HNSW present, `embedding_model` updated, no stale vectors, memory IDs untouched.
- [ ] Trait parity: every method on `MemoryStore` has at least one passing test against `PgMemoryStore`.
- [ ] Phase 1's existing SQLite suite continues to pass with zero changes required (Phase 2 is additive).
- [ ] The `postgres-backend` feature cannot be enabled together with the SQLCipher `encryption` feature (mutually exclusive at compile time, per project rule).
---
## Rollback Notes
- Every `*.up.sql` has a matching `*.down.sql` in `crates/vestige-core/migrations/postgres/`. `sqlx migrate revert` walks them in reverse order. Manual operator procedure: `sqlx migrate revert --database-url $URL --source crates/vestige-core/migrations/postgres`.
- `vestige migrate copy` is a one-way operation. The source SQLite DB is read-only during the run and untouched afterward; users retain their original file indefinitely. Recommended discipline: copy the SQLite file aside before starting, retain for 30 days.
- `vestige migrate reembed` is destructive to the `embedding` column. Recommended discipline: take a logical backup (`pg_dump --table=memories --table=embedding_model --table=scheduling`) before a reembed run. The tool prints that recommendation before starting and exits non-zero unless `--yes` is passed or the user is on a TTY that confirms.
- Feature-gate strategy: the default build remains SQLite-only. Downstream users pull `postgres-backend` explicitly: `cargo install --features postgres-backend vestige-mcp`. If the Postgres implementation fails in the field, users fall back to SQLite simply by flipping `vestige.toml`'s `[storage] backend = "sqlite"` and restarting. No data re-migration is needed if they retained their SQLite file.
- The `docs/runbook/postgres.md` deliverable (D16) captures this discipline as a one-page ops note.
---
## Open Implementation Questions
Each item has a recommendation. Ship that unless a reviewer objects.
### Q1. CLI shape: subcommand split vs flag union
- **Options**: (a) `vestige migrate copy --from sqlite --to postgres ...` and `vestige migrate reembed --model=...` (subcommand split); (b) `vestige migrate --from sqlite --to postgres ...` and `vestige migrate --reembed --model=...` under one `clap` command with disjoint flag groups (flag union).
- **RECOMMENDATION**: (a) subcommand split. The flag sets do not overlap and clap expresses the constraint more cleanly. The ADR string `vestige migrate --from sqlite --to postgres` can still be documented as a canonical alias by having `copy` accept it verbatim when `--from` is present.
### Q2. Feature flag name
- **Options**: `postgres-backend`, `postgres`, `backend-postgres`, `pg`.
- **RECOMMENDATION**: `postgres-backend`. Matches the ADR text and is explicit in `Cargo.toml` feature listings.
### Q3. sqlx offline mode strategy
- **Options**: (a) commit `.sqlx/` so downstream builds never need DATABASE_URL; (b) require `DATABASE_URL` at build time.
- **RECOMMENDATION**: (a). The repo already ships as a library; many downstream users will build from crates.io with no Postgres available. Committing `.sqlx/` costs ~100 kB.
### Q4. HNSW rebuild strategy during reembed
- **Options**: (a) `DROP INDEX; CREATE INDEX`; (b) `REINDEX INDEX CONCURRENTLY`; (c) `CREATE INDEX CONCURRENTLY` on a new name then swap.
- **RECOMMENDATION**: (a) by default for speed on empty / near-empty tables; expose `--concurrent-index` for large production corpora where locking the table is unacceptable. `REINDEX CONCURRENTLY` on pgvector HNSW is supported in pgvector 0.6+ but the community still reports edge cases with `maintenance_work_mem` -- skip unless a user explicitly opts in.
### Q5. Connection pool sizing default
- **Options**: 4, 10, 20, `cpus() * 2`.
- **RECOMMENDATION**: 10. Matches the PRD example, covers a single-operator load, and does not exhaust the default Postgres `max_connections = 100`. Configurable via `vestige.toml`.
### Q6. Testcontainer image pinning
- **Options**: (a) `pgvector/pgvector:pg16`; (b) `pgvector/pgvector:pg16.2-0.7.4` (exact tag); (c) maintain local Dockerfile.
- **RECOMMENDATION**: (b) pin exact. The float tag `pg16` has shipped breaking changes in the past (e.g., pg 16.0 to 16.1 interop). Pin to a specific pgvector minor and Postgres patch. CI bumps the tag via a single-line change.
### Q7. Empty-text and null-embedding behavior in `search`
- **Options**: (a) return an error if both are missing; (b) return an empty result; (c) return all memories sorted by `created_at DESC`.
- **RECOMMENDATION**: (a). A `search` call with no query is a bug in the caller; returning empty silently would hide the bug. The existing Phase 1 SQLite behavior (TBD but likely errors) is the tiebreaker.
### Q8. `classify()` SQL vs Rust
- **Options**: (a) compute cosine to all centroids in SQL (`SELECT id, 1 - (centroid <=> $1::vector) FROM domains ORDER BY ...`); (b) load centroids, compute in Rust.
- **RECOMMENDATION**: (a). Leverages pgvector's SIMD paths and avoids round-tripping centroid vectors. At Phase 4 scale (tens of centroids) the difference is marginal, but the SQL path is simpler and matches the rest of the backend.
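For intuition on option (b)'s cost, the Rust-side computation is a handful of lines (sketch; the centroid layout and names are hypothetical). At tens of centroids either path runs in microseconds:

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Pick the best-matching domain centroid -- the Rust analogue of
/// ranking by `1 - (centroid <=> $1::vector)` in SQL.
fn classify<'a>(
    embedding: &[f32],
    centroids: &'a [(String, Vec<f32>)],
) -> Option<(&'a str, f32)> {
    centroids
        .iter()
        .map(|(name, c)| (name.as_str(), cosine(embedding, c)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
}

fn main() {
    let centroids = vec![
        ("dev".to_string(), vec![1.0, 0.0, 0.0]),
        ("home".to_string(), vec![0.0, 1.0, 0.0]),
    ];
    let (name, score) = classify(&[0.9, 0.1, 0.0], &centroids).unwrap();
    println!("{name} {score:.3}");
}
```

The SQL path wins on simplicity (no centroid round-trip) rather than speed at this scale.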
### Q9. FSRS `review_events` writes: trait method vs implicit on `update_scheduling`
- **Options**: (a) add an explicit `record_review(memory_id, rating, prior, new)` method to the Phase 1 trait; (b) have `update_scheduling` write the event atomically.
- **RECOMMENDATION**: this is a Phase 1 question, not Phase 2. Phase 2 implements whichever Phase 1 chose. If Phase 1 missed it, Phase 2 raises a blocker rather than deciding alone.
### Q10. `tsvector` weight for tags -- PRD used `array_to_tsvector`, we used `array_to_string`
- **Options**: (a) `array_to_tsvector(tags)` (core Postgres, but it treats each element as an already-normalized lexeme -- no stemming, no stopword removal); (b) `to_tsvector('english', array_to_string(tags, ' '))` (normal English parsing and stemming).
- **RECOMMENDATION**: (b). Comparable ranking with proper stemming, zero extra machinery. If a future tag matches a stopword (`"the"`), it gets dropped, but that is correct behavior for ranking.
### Q11. `PgMemoryStore::connect` runs migrations automatically?
- **Options**: (a) always run `sqlx::migrate!` on connect; (b) require the user to run `vestige migrate-schema` explicitly before starting the server.
- **RECOMMENDATION**: (a) during Phase 2; revisit in Phase 3 when the server binary exists. Developer ergonomics win now, and the migrations are idempotent.
### Q12. Offline query cache freshness vs `sqlx-cli` version skew
- **Options**: (a) pin `sqlx-cli` version in CI `actions/cache` step; (b) let CI install whatever version `sqlx` depends on.
- **RECOMMENDATION**: (a) pin to the same 0.8.x as the crate. `sqlx prepare` output changes between 0.7 and 0.8 and must match the runtime.
---
## Sequencing
The Phase 2 agent executes deliverables in this order; deliverables not listed can run in any order relative to each other.
1. D1 (feature gate + Cargo deps) -- unblocks everything.
2. D7 (config) -- required to construct `PgMemoryStore`.
3. D4 (migrations SQL) -- required before any `query!` compiles.
4. D3 (pool) + D6 (registry) -- small, used by D2.
5. D2 (`PgMemoryStore` core + trait impl) -- the bulk of Phase 2.
6. D5 (RRF search) -- after D2; requires the trait to exist.
7. D12 (test harness) + parity and search tests -- validates D2 and D5 in isolation.
8. D8 (sqlite->pg migrate) + its integration test.
9. D9 (reembed) + its integration test.
10. D10 (CLI wiring).
11. D11 (`.sqlx/` offline cache) -- last, after SQL is frozen.
12. D15 (benches) + D16 (runbook) -- after acceptance tests pass.
Each deliverable PR includes its own tests; the final Phase 2 PR stacks them (or lands as a single branch if the Phase 1 trait is stable enough to avoid rebase churn).
### Critical Files for Implementation
- crates/vestige-core/src/storage/postgres/mod.rs
- crates/vestige-core/migrations/postgres/0001_init.up.sql
- crates/vestige-core/src/storage/postgres/search.rs
- crates/vestige-core/src/storage/postgres/migrate_cli.rs
- crates/vestige-mcp/src/bin/cli.rs