Pluggable storage backend, network access, and emergent domain classification. Introduces MemoryStore + Embedder traits, PgMemoryStore alongside SqliteMemoryStore, HTTP MCP + API key auth, and HDBSCAN-based domain clustering. Phase 5 federation deferred to a follow-up ADR. - docs/adr/0001-pluggable-storage-and-network-access.md -- Accepted - docs/plans/0001-phase-1-storage-trait-extraction.md - docs/plans/0002-phase-2-postgres-backend.md - docs/plans/0003-phase-3-network-access.md - docs/plans/0004-phase-4-emergent-domain-classification.md - docs/prd/001-getting-centralized-vestige.md -- source RFC
Phase 1 Plan: Storage Trait Extraction
Status: Draft
Depends on: none
Related: docs/adr/0001-pluggable-storage-and-network-access.md (Phase 1)
Scope
In scope
- Introduce a new module `crates/vestige-core/src/storage/memory_store.rs` defining:
  - `LocalMemoryStore` base trait (`Sync + 'static`).
  - `MemoryStore` `Send`-bound alias generated via `#[trait_variant::make(MemoryStore: Send)]`.
  - Supporting data types referenced by the trait: `MemoryRecord`, `SchedulingState`, `SearchQuery`, `SearchResult`, `MemoryEdge`, `Domain`, `ClassificationResult`, `StoreStats`, `HealthStatus`, `MemoryStoreError`.
- Introduce a new module `crates/vestige-core/src/embedder/` defining:
  - `Embedder` async trait with `embed`, `model_name`, `dimension` plus `model_hash` (for the registry) and optional `embed_batch` with a default implementation.
  - Move/adapt the existing `EmbeddingService` impl into a new struct `FastembedEmbedder` that implements `Embedder`.
- Refactor `Storage` (existing `crates/vestige-core/src/storage/sqlite.rs`) into `SqliteMemoryStore`:
  - Keep the struct, the `writer`/`reader` `Mutex<Connection>` pair, the `FSRSScheduler`, and the USearch `VectorIndex`.
  - Rename the type `Storage` to `SqliteMemoryStore`, with a `pub type Storage = SqliteMemoryStore;` alias for backward source compatibility during the transition. (The trait method surface is the new public contract.)
  - Implement `LocalMemoryStore` by wrapping existing synchronous `rusqlite` methods inside `async fn` bodies that call a small `spawn_blocking`-or-inline adapter. Bodies MAY block; the `async fn` signature exists because `LocalMemoryStore` is async.
- Add a `schema_version = 12` migration that introduces two schema additions:
  - `embedding_model` registry table (single row, pinned by `CHECK (id = 1)` and written only through `register_model`).
  - Two new TEXT columns on `knowledge_nodes`: `domains TEXT NOT NULL DEFAULT '[]'` and `domain_scores TEXT NOT NULL DEFAULT '{}'` (both JSON-encoded).
- Enforce model registry on every write path: on the first non-empty embedding write the model signature is recorded; subsequent writes whose `Embedder::model_name()` / `dimension()` / `model_hash()` disagree must fail with `MemoryStoreError::ModelMismatch` before touching the DB.
- Audit all 29 cognitive modules under `crates/vestige-core/src/neuroscience/` and `crates/vestige-core/src/advanced/` to confirm they hold no direct `rusqlite::Connection` references, no `Storage` struct field, and no SQL strings. Any that do get refactored to take `&dyn LocalMemoryStore` (local-only modules) or `&Arc<dyn MemoryStore>` (modules crossing `await` points).
- Add unit tests alongside each new trait method and integration tests in `tests/phase_1/`.
Out of scope
- Implementing `PgMemoryStore` on sqlx + pgvector -- that is Phase 2.
- `vestige migrate --from sqlite --to postgres` and `vestige migrate --reembed` -- Phase 2.
- MCP over Streamable HTTP, API key middleware, `api_keys` table, `vestige keys create|list|revoke` -- Phase 3.
- `DomainClassifier` module, HDBSCAN clustering, `vestige domains discover|list|rename|merge` CLI, incremental soft-assignment, cross-domain spreading activation decay -- Phase 4.
- Federation, mycelium/mDNS node discovery, review event log table -- Phase 5.
- Removing the `pub type Storage = SqliteMemoryStore;` compatibility alias -- that cleanup happens at the end of Phase 4 when no consumers still spell the old name.
Prerequisites
Current code state
- Single concrete type `Storage` in `crates/vestige-core/src/storage/sqlite.rs` (4592 lines, 216 public symbols on the impl blocks, approximately 85 public methods) is the only storage surface the crate exposes.
- `EmbeddingService` in `crates/vestige-core/src/embeddings/local.rs` holds the fastembed singleton. No trait exists; callers depend on the concrete type via `&EmbeddingService`.
- Migrations live in `crates/vestige-core/src/storage/migrations.rs`; the current head is v11.
- All cognitive modules in `neuroscience/` and `advanced/` are pure (verified by `grep rusqlite|Connection::|execute\(|prepare\(` returning no matches in those trees). They operate on `KnowledgeNode`, `Vec<f32>`, `ConnectionRecord`, etc. passed in by the caller.
- `vestige-mcp` consumes `Arc<Storage>` in `crates/vestige-mcp/src/server.rs` and every tool under `crates/vestige-mcp/src/tools/`. These call sites will type-check unchanged after the alias is introduced because every existing `pub fn` on `Storage` stays on `SqliteMemoryStore` as an inherent method with its exact signature (see D9).
- Test count reported in `CLAUDE.md`: 758 tests (406 mcp + 352 core). This is the no-regression target.
Required crates (add via cargo add under crates/vestige-core)
| Crate | Version | Why |
|---|---|---|
| `trait-variant` | 0.1 | Generates the `Send`-bound `MemoryStore` alias from `LocalMemoryStore` so `Arc<dyn MemoryStore>` works under tokio/axum without hand-writing two traits. Listed in PRD section "Crate Dependencies (new)" under Phase 1. |
| `blake3` | 1 | `Embedder::model_hash()` (a 64-char lowercase hex string, see D5) uses blake3 to stabilise the "model signature" stored in the `embedding_model` registry. Already slated for Phase 3 auth; pulling it forward costs nothing and avoids a second migration to add a hash column. |
| `async-trait` | 0.1 | Not strictly required with `trait-variant` on MSRV 1.91 (RPITIT is stable), but used for one utility trait (`EmbedderExt`) that carries a default `embed_batch` body. OPTIONAL; see Open Implementation Questions below. |
No changes to vestige-mcp/Cargo.toml are required for Phase 1 -- the new trait lives in vestige-core and the mcp crate continues to depend on the SqliteMemoryStore concrete type (via the Storage alias) until Phase 2 introduces backend selection.
Deliverables
- `crates/vestige-core/src/storage/memory_store.rs` -- `LocalMemoryStore` + `MemoryStore` traits and supporting types.
- `crates/vestige-core/src/storage/mod.rs` -- updated exports and module wiring.
- `crates/vestige-core/src/storage/sqlite.rs` -- `Storage` renamed to `SqliteMemoryStore`, `impl LocalMemoryStore for SqliteMemoryStore` block, enforcement hooks for the model registry, serde of `domains`/`domain_scores` columns.
- `crates/vestige-core/src/storage/migrations.rs` -- `MIGRATION_V12_UP` adding `embedding_model` table and `domains`, `domain_scores` columns.
- `crates/vestige-core/src/embedder/mod.rs` -- `Embedder` trait and re-exports.
- `crates/vestige-core/src/embedder/fastembed.rs` -- `FastembedEmbedder` implementation.
- `crates/vestige-core/src/embeddings/local.rs` -- retained; `EmbeddingService` kept as the underlying fastembed holder; `FastembedEmbedder` wraps it.
- `crates/vestige-core/src/lib.rs` -- new `pub mod embedder;` + re-exports for `MemoryStore`, `LocalMemoryStore`, `Embedder`, `FastembedEmbedder`, and the data types.
- `tests/phase_1/trait_round_trip.rs` -- integration test: round-trip of every trait method through `SqliteMemoryStore`.
- `tests/phase_1/embedding_model_registry.rs` -- integration test: first-write registers, mismatch refuses, dimension mismatch refuses.
- `tests/phase_1/domain_column_migration.rs` -- integration test: a v11 DB upgraded to v12 reads `domains=[]` and `domain_scores={}` for all existing rows.
- `tests/phase_1/cognitive_module_isolation.rs` -- integration test: every cognitive module compiles and executes against an `Arc<dyn MemoryStore>` without touching `SqliteMemoryStore` concretely.
- `tests/phase_1/send_bound_variant.rs` -- integration test: an `Arc<dyn MemoryStore>` can be moved across `tokio::spawn`.
- Updated `tests/phase_1/mod.rs` (if the dir already uses a module layout) or individual `[[test]]` entries in `tests/e2e/Cargo.toml` as needed -- see "Test Plan" for the exact layout.
Detailed Task Breakdown
D1. Trait + supporting types (memory_store.rs)
- File: `crates/vestige-core/src/storage/memory_store.rs` (new).
- Depends on: `trait-variant` crate added under vestige-core, `chrono`, `serde_json`, `uuid`, `thiserror` (all already in Cargo.toml).
- Signatures:
```rust
//! Backend-agnostic memory store trait.
//!
//! This is the single abstraction every cognitive module sits above. It is
//! intentionally flat: one trait, ~25 methods, no sub-traits.

use std::collections::HashMap;

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

// ----------------------------------------------------------------------------
// ERROR
// ----------------------------------------------------------------------------

/// Error returned by every `LocalMemoryStore` / `MemoryStore` method.
#[non_exhaustive]
#[derive(Debug, thiserror::Error)]
pub enum MemoryStoreError {
    #[error("not found: {0}")]
    NotFound(String),
    #[error("backend error: {0}")]
    Backend(String),
    #[error(
        "embedding model mismatch: store registered {registered_name} (dim {registered_dim}, \
         hash {registered_hash}), embedder is {actual_name} (dim {actual_dim}, hash {actual_hash})"
    )]
    ModelMismatch {
        registered_name: String,
        registered_dim: usize,
        registered_hash: String,
        actual_name: String,
        actual_dim: usize,
        actual_hash: String,
    },
    #[error("invalid input: {0}")]
    InvalidInput(String),
    #[error("initialization error: {0}")]
    Init(String),
}

impl From<crate::storage::StorageError> for MemoryStoreError {
    fn from(e: crate::storage::StorageError) -> Self {
        use crate::storage::StorageError as S;
        match e {
            S::NotFound(s) => MemoryStoreError::NotFound(s),
            S::Database(e) => MemoryStoreError::Backend(e.to_string()),
            S::Io(e) => MemoryStoreError::Backend(e.to_string()),
            S::InvalidTimestamp(s) => MemoryStoreError::Backend(format!("invalid timestamp: {s}")),
            S::Init(s) => MemoryStoreError::Init(s),
        }
    }
}

pub type MemoryStoreResult<T> = std::result::Result<T, MemoryStoreError>;

// ----------------------------------------------------------------------------
// DATA TYPES
// ----------------------------------------------------------------------------

/// Backend-agnostic memory record.
///
/// Phase 1 intentionally keeps this type independent of `KnowledgeNode` to
/// avoid dragging 30+ legacy fields through the trait surface. The SQLite
/// backend converts between `MemoryRecord` and `KnowledgeNode` at the
/// boundary.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryRecord {
    pub id: Uuid,
    /// Empty = unclassified. Populated in Phase 4.
    pub domains: Vec<String>,
    /// Raw similarity per domain centroid. Empty until Phase 4 runs clustering.
    pub domain_scores: HashMap<String, f64>,
    pub content: String,
    pub node_type: String,
    pub tags: Vec<String>,
    pub embedding: Option<Vec<f32>>,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub metadata: serde_json::Value,
}

/// FSRS-6 scheduling state, one row per memory.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SchedulingState {
    pub memory_id: Uuid,
    pub stability: f64,
    pub difficulty: f64,
    pub retrievability: f64,
    pub last_review: Option<DateTime<Utc>>,
    pub next_review: Option<DateTime<Utc>>,
    pub reps: u32,
    pub lapses: u32,
}

/// Hybrid search request.
#[derive(Debug, Clone, Default)]
pub struct SearchQuery {
    pub domains: Option<Vec<String>>,
    pub text: Option<String>,
    pub embedding: Option<Vec<f32>>,
    pub tags: Option<Vec<String>>,
    pub node_types: Option<Vec<String>>,
    pub limit: usize,
    pub min_retrievability: Option<f64>,
}

#[derive(Debug, Clone)]
pub struct SearchResult {
    pub record: MemoryRecord,
    pub score: f64,
    pub fts_score: Option<f64>,
    pub vector_score: Option<f64>,
}

/// Edge in the spreading-activation graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryEdge {
    pub source_id: Uuid,
    pub target_id: Uuid,
    pub edge_type: String,
    pub weight: f64,
    pub created_at: DateTime<Utc>,
}

/// A topical domain (populated in Phase 4). Phase 1 only needs the type to
/// shape the trait surface; discover/classify are Phase 4 work.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Domain {
    pub id: String,
    pub label: String,
    pub centroid: Vec<f32>,
    pub top_terms: Vec<String>,
    pub memory_count: usize,
    pub created_at: DateTime<Utc>,
}

/// Result of classifying one vector against all known domains.
#[derive(Debug, Clone)]
pub struct ClassificationResult {
    pub scores: HashMap<String, f64>,
    pub domains: Vec<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct StoreStats {
    pub total_memories: usize,
    pub memories_with_embeddings: usize,
    pub total_edges: usize,
    pub total_domains: usize,
    pub registered_model_name: Option<String>,
    pub registered_model_dim: Option<usize>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum HealthStatus {
    Healthy,
    Degraded { reason: String },
    Unavailable { reason: String },
}

// ----------------------------------------------------------------------------
// EMBEDDING MODEL SIGNATURE
// ----------------------------------------------------------------------------

/// Snapshot of the embedding model that was used to write vectors into the
/// store. Persisted in the `embedding_model` table; compared on every write
/// before the vector is accepted.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct ModelSignature {
    pub name: String,
    pub dimension: usize,
    /// Lowercase hex-encoded blake3 hash, 64 chars.
    pub hash: String,
}

// ----------------------------------------------------------------------------
// TRAIT
// ----------------------------------------------------------------------------

/// The single storage abstraction. `trait_variant::make` auto-generates a
/// `MemoryStore` alias with `Send`-bound return futures so `Arc<dyn MemoryStore>`
/// works in tokio/axum contexts.
#[trait_variant::make(MemoryStore: Send)]
pub trait LocalMemoryStore: Sync + 'static {
    // --- Lifecycle ---
    async fn init(&self) -> MemoryStoreResult<()>;
    async fn health_check(&self) -> MemoryStoreResult<HealthStatus>;

    // --- Embedding model registry ---
    async fn registered_model(&self) -> MemoryStoreResult<Option<ModelSignature>>;
    async fn register_model(&self, sig: &ModelSignature) -> MemoryStoreResult<()>;

    // --- CRUD ---
    async fn insert(&self, record: &MemoryRecord) -> MemoryStoreResult<Uuid>;
    async fn get(&self, id: Uuid) -> MemoryStoreResult<Option<MemoryRecord>>;
    async fn update(&self, record: &MemoryRecord) -> MemoryStoreResult<()>;
    async fn delete(&self, id: Uuid) -> MemoryStoreResult<()>;

    // --- Search ---
    async fn search(&self, query: &SearchQuery) -> MemoryStoreResult<Vec<SearchResult>>;
    async fn fts_search(&self, text: &str, limit: usize) -> MemoryStoreResult<Vec<SearchResult>>;
    async fn vector_search(
        &self,
        embedding: &[f32],
        limit: usize,
    ) -> MemoryStoreResult<Vec<SearchResult>>;

    // --- FSRS Scheduling ---
    async fn get_scheduling(
        &self,
        memory_id: Uuid,
    ) -> MemoryStoreResult<Option<SchedulingState>>;
    async fn update_scheduling(&self, state: &SchedulingState) -> MemoryStoreResult<()>;
    async fn get_due_memories(
        &self,
        before: DateTime<Utc>,
        limit: usize,
    ) -> MemoryStoreResult<Vec<(MemoryRecord, SchedulingState)>>;

    // --- Graph (spreading activation) ---
    async fn add_edge(&self, edge: &MemoryEdge) -> MemoryStoreResult<()>;
    async fn get_edges(
        &self,
        node_id: Uuid,
        edge_type: Option<&str>,
    ) -> MemoryStoreResult<Vec<MemoryEdge>>;
    async fn remove_edge(&self, source: Uuid, target: Uuid) -> MemoryStoreResult<()>;
    async fn get_neighbors(
        &self,
        node_id: Uuid,
        depth: usize,
    ) -> MemoryStoreResult<Vec<(MemoryRecord, f64)>>;

    // --- Domains (Phase 1: stubs return empty; full impl in Phase 4) ---
    async fn list_domains(&self) -> MemoryStoreResult<Vec<Domain>>;
    async fn get_domain(&self, id: &str) -> MemoryStoreResult<Option<Domain>>;
    async fn upsert_domain(&self, domain: &Domain) -> MemoryStoreResult<()>;
    async fn delete_domain(&self, id: &str) -> MemoryStoreResult<()>;
    /// Phase 1: returns `Ok(vec![])` since no centroids exist. Phase 4 wires
    /// the full soft-assignment pass.
    async fn classify(&self, embedding: &[f32]) -> MemoryStoreResult<Vec<(String, f64)>>;

    // --- Bulk / Maintenance ---
    async fn count(&self) -> MemoryStoreResult<usize>;
    async fn get_stats(&self) -> MemoryStoreResult<StoreStats>;
    async fn vacuum(&self) -> MemoryStoreResult<()>;
}
```
- Behavior notes:
  - Every method returns `MemoryStoreResult<T>`; the trait never exposes `rusqlite::Error`.
  - `LocalMemoryStore` requires `Sync + 'static` so `Arc<dyn LocalMemoryStore>` is usable. The auto-generated `MemoryStore` alias adds `Send` bounds on the returned `impl Future`.
  - `register_model` is idempotent: writing the same signature twice is `Ok(())`. Writing a different signature after one is registered returns `MemoryStoreError::ModelMismatch`.
  - `classify` on Phase 1 returns `Ok(vec![])` and MUST NOT error; cognitive modules call it and Phase 4 will flesh it out without changing the signature.
  - `upsert_domain` / `delete_domain` / `list_domains` / `get_domain` operate against a `domains` table that is empty until Phase 4 populates it. Phase 1 still exposes the methods so Phase 2 can implement them against Postgres in one shot.
  - `get_neighbors(node_id, depth)` with `depth == 0` returns just `(node, 1.0)` if the node exists, otherwise `NotFound`. `depth > 0` performs breadth-first expansion over edges, weight = product of edge weights along the shortest path discovered, capped at `max_neighbors = 256` to prevent runaway expansion.
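The `get_neighbors` traversal rule in the behavior notes above can be sketched without any storage backend. This is a minimal illustration under stated assumptions, not the planned implementation: node ids are plain `u32` instead of `Uuid`, edges are treated as directed, the start node is always reported first with weight `1.0`, and the function name `neighbors` and its in-memory inputs are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Cap taken from the behavior note (`max_neighbors = 256`).
const MAX_NEIGHBORS: usize = 256;

/// Breadth-first expansion from `start`, at most `depth` hops away.
/// `edges` maps a node id to its outgoing `(target, weight)` pairs.
/// Returns `None` for an unknown start node (the trait would return `NotFound`).
pub fn neighbors(
    edges: &HashMap<u32, Vec<(u32, f64)>>,
    known: &HashSet<u32>,
    start: u32,
    depth: usize,
) -> Option<Vec<(u32, f64)>> {
    if !known.contains(&start) {
        return None;
    }
    // Assumption: the start node itself is part of the result with weight 1.0.
    let mut out = vec![(start, 1.0)];
    let mut seen: HashSet<u32> = HashSet::from([start]);
    let mut queue: VecDeque<(u32, f64, usize)> = VecDeque::from([(start, 1.0, 0)]);

    while let Some((node, weight, hops)) = queue.pop_front() {
        if hops == depth {
            continue; // do not expand beyond the requested depth
        }
        for &(target, w) in edges.get(&node).map(Vec::as_slice).unwrap_or(&[]) {
            if out.len() >= MAX_NEIGHBORS {
                return Some(out); // cap reached: stop expanding
            }
            // BFS reaches each node first along a shortest path, so the
            // first weight recorded is the product along that path.
            if seen.insert(target) {
                let path_weight = weight * w;
                out.push((target, path_weight));
                queue.push_back((target, path_weight, hops + 1));
            }
        }
    }
    Some(out)
}
```

With edges `1 -> 2` (0.5) and `2 -> 3` (0.4), a depth-2 query from node 1 reports node 3 with weight 0.5 * 0.4, while a depth-0 query reports only the start node.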
D2. Storage module wiring (storage/mod.rs)
- File: `crates/vestige-core/src/storage/mod.rs`.
- Depends on: D1.
- Signatures / diff:
```rust
//! Storage Module
//!
//! Backend-agnostic memory store abstraction plus SQLite reference impl.

mod memory_store;
mod migrations;
mod sqlite;

pub use memory_store::{
    ClassificationResult, Domain, HealthStatus, LocalMemoryStore, MemoryEdge, MemoryRecord,
    MemoryStore, MemoryStoreError, MemoryStoreResult, ModelSignature, SchedulingState,
    SearchQuery, SearchResult, StoreStats,
};
pub use migrations::MIGRATIONS;
pub use sqlite::{
    ConnectionRecord, ConsolidationHistoryRecord, DreamHistoryRecord, InsightRecord,
    IntentionRecord, Result, SmartIngestResult, SqliteMemoryStore, StateTransitionRecord,
    StorageError,
};

/// Backwards-compatibility alias. Retained until Phase 4 completes so every
/// existing `Arc<Storage>` call site keeps compiling. Scheduled for removal
/// once no downstream source file references it.
pub type Storage = SqliteMemoryStore;
```
- Behavior notes:
  - The alias MUST be a `pub type` (not a re-export), because several tool files refer to `vestige_core::Storage` through `use` statements and we want to keep them compiling verbatim. This has zero runtime cost.
  - `StorageError` stays exported for the 29 existing inherent-method callers; the trait exposes `MemoryStoreError` and provides `From<StorageError>`.
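To illustrate why the alias keeps old call sites compiling, here is a small self-contained sketch. The struct body, `legacy_open`, and `describe` are hypothetical stand-ins, not code from the crate; the point is only that a `pub type` alias introduces a second name for the same type, so code spelled either way interoperates.

```rust
use std::sync::Arc;

/// Stand-in for the renamed concrete store (hypothetical, real fields omitted).
pub struct SqliteMemoryStore {
    path: String,
}

impl SqliteMemoryStore {
    pub fn new(path: &str) -> Self {
        Self { path: path.to_string() }
    }
}

/// The compatibility alias: a second name for the same type, not a new type.
pub type Storage = SqliteMemoryStore;

/// A legacy call site that still spells the old name.
pub fn legacy_open(path: &str) -> Arc<Storage> {
    Arc::new(Storage::new(path))
}

/// A new call site that spells the new name; it accepts values built by
/// legacy code because both names resolve to one type.
pub fn describe(store: &SqliteMemoryStore) -> String {
    format!("sqlite store at {}", store.path)
}
```

Because the alias resolves at compile time, `Arc<Storage>` and `Arc<SqliteMemoryStore>` are the same type and no conversion is needed between old and new call sites.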
D3. Rename + trait impl in sqlite.rs
- File: `crates/vestige-core/src/storage/sqlite.rs`.
- Depends on: D1, D2, D4 (for schema columns), D5/D6 (to have `Embedder` to accept on `insert`).
- Signatures (key excerpts):
```rust
pub struct SqliteMemoryStore {
    writer: Mutex<Connection>,
    reader: Mutex<Connection>,
    scheduler: Mutex<FSRSScheduler>,
    #[cfg(feature = "embeddings")]
    embedding_service: EmbeddingService,
    #[cfg(feature = "vector-search")]
    vector_index: Mutex<VectorIndex>,
    #[cfg(feature = "embeddings")]
    query_cache: Mutex<LruCache<String, Vec<f32>>>,
    /// Cached model signature. `None` until the first embedding is written.
    registered_model: std::sync::RwLock<Option<ModelSignature>>,
}

impl SqliteMemoryStore {
    pub fn new(db_path: Option<std::path::PathBuf>) -> MemoryStoreResult<Self> { /* existing body, Result converted */ }

    /// Internal: convert a row into a `MemoryRecord` (new mapping reading
    /// `domains` / `domain_scores` JSON columns).
    fn row_to_record(row: &rusqlite::Row) -> rusqlite::Result<MemoryRecord> { /* ... */ }

    /// Internal: given a `MemoryRecord` plus an optional embedding, enforce
    /// the registered model signature and return a `MemoryStoreError` if
    /// the embedder would produce a mismatched vector.
    fn enforce_model(
        &self,
        incoming: Option<&ModelSignature>,
    ) -> MemoryStoreResult<()> { /* ... */ }
}

impl crate::storage::memory_store::LocalMemoryStore for SqliteMemoryStore {
    async fn init(&self) -> MemoryStoreResult<()> { /* no-op; migrations run in `new` */ Ok(()) }

    async fn health_check(&self) -> MemoryStoreResult<HealthStatus> {
        // SELECT 1; check vector index loaded; check embedding_model presence.
    }

    async fn registered_model(&self) -> MemoryStoreResult<Option<ModelSignature>> {
        let cached = self
            .registered_model
            .read()
            .map_err(|_| MemoryStoreError::Init("registered_model rwlock poisoned".into()))?
            .clone();
        if cached.is_some() {
            return Ok(cached);
        }
        // Fall through to DB read...
    }

    async fn register_model(&self, sig: &ModelSignature) -> MemoryStoreResult<()> {
        // INSERT OR IGNORE; if a row exists and differs, return ModelMismatch.
    }

    async fn insert(&self, record: &MemoryRecord) -> MemoryStoreResult<Uuid> {
        if let Some(vec) = &record.embedding {
            // Caller is REQUIRED to have called register_model first (or the
            // store auto-registers on the first embedded write -- see
            // "embedding_model_registry.rs" test).
            let derived = ModelSignature { /* from cache or from record.metadata */ };
            self.enforce_model(Some(&derived))?;
            if vec.len() != derived.dimension {
                return Err(MemoryStoreError::InvalidInput(
                    format!("embedding length {} != registered dimension {}", vec.len(), derived.dimension),
                ));
            }
        }
        // Delegate to a private `insert_record_blocking` helper that is the
        // current `ingest`/`update_node_content` body, rewritten to accept a
        // `MemoryRecord` and to also write `domains` / `domain_scores` JSON.
    }

    // ... remaining 21 methods follow the same pattern: convert inputs,
    // call the existing synchronous body, convert outputs.
}
```
- SQL (covered in full in D4 below).
- Behavior notes:
  - The `async fn` bodies are allowed to be synchronous under the hood (rusqlite is blocking). We do NOT wrap in `spawn_blocking` for Phase 1 -- the current `Storage` is already used from synchronous code paths (CLI, MCP stdio handler) and forcing the tokio runtime is a Phase 2 concern when we also add sqlx. The trait simply lifts the synchronous body into an `async fn` so the signatures match the trait. `async fn` in traits has been stable in Rust since 1.75, so MSRV 1.91 supports it natively; `trait_variant::make` is only needed to generate the `Send`-bound variant.
  - `insert` preserves the current FSRS initialization logic (stability, difficulty, next_review, etc.) -- the new code path converts `MemoryRecord.metadata` back into `IngestInput`-equivalent fields when needed. All existing inherent methods (`ingest`, `smart_ingest`, `mark_reviewed`, ...) remain on `SqliteMemoryStore` untouched; the trait impl calls into them.
  - The `registered_model` cache is an `RwLock<Option<ModelSignature>>`. It is invalidated on schema reset and is never mutated after first population until an explicit `--reembed` migration (Phase 2) takes the RwLock exclusively and writes a new row.
  - `enforce_model` has three outcomes. It returns `Ok(())` if no model is registered yet AND `incoming.is_none()` (no-embedding write). If no model is registered and `incoming.is_some()`, it calls `register_model` and then returns `Ok(())`. It returns `Err(ModelMismatch)` if a model is registered and the incoming signature disagrees.
  - `domains` / `domain_scores` serialization uses `serde_json::to_string` on write and `serde_json::from_str` on read. Empty vec -> `"[]"`, empty map -> `"{}"`. `NULL` in the DB is treated as the empty value for pre-migration rows.
  - Every existing inherent method is kept verbatim. The trait impl dispatches to them. This is the "no behavior change" guarantee.
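The `enforce_model` decision rule described above can be sketched in isolation. This is a simplified, in-memory illustration only: `Registry` and `RegistryError` are hypothetical names, the DB write that the real `register_model` would perform is replaced by filling the cache, and a no-embedding write against an already-registered store is assumed to pass.

```rust
use std::sync::RwLock;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSignature {
    pub name: String,
    pub dimension: usize,
    pub hash: String,
}

/// Hypothetical error type standing in for `MemoryStoreError`.
#[derive(Debug, PartialEq, Eq)]
pub enum RegistryError {
    ModelMismatch { registered: ModelSignature, actual: ModelSignature },
    Poisoned,
}

/// Hypothetical holder for the cached signature (no database involved).
pub struct Registry {
    registered: RwLock<Option<ModelSignature>>,
}

impl Registry {
    pub fn new() -> Self {
        Self { registered: RwLock::new(None) }
    }

    pub fn enforce_model(&self, incoming: Option<&ModelSignature>) -> Result<(), RegistryError> {
        // A write without an embedding never touches the registry
        // (assumed to hold whether or not a model is registered).
        let Some(incoming) = incoming else {
            return Ok(());
        };
        let mut slot = self.registered.write().map_err(|_| RegistryError::Poisoned)?;
        match slot.as_ref() {
            // First embedded write: record the signature, then accept.
            None => {
                *slot = Some(incoming.clone());
                Ok(())
            }
            // Same signature again: idempotent accept.
            Some(registered) if registered == incoming => Ok(()),
            // Any disagreement in name, dimension, or hash is refused.
            Some(registered) => Err(RegistryError::ModelMismatch {
                registered: registered.clone(),
                actual: incoming.clone(),
            }),
        }
    }
}
```

Holding the write lock across the check and the first registration keeps two concurrent first writes from both registering; the real store would need the same property at the database level.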
D4. Schema migration V12
- File: `crates/vestige-core/src/storage/migrations.rs`.
- Depends on: D2.
- SQL:
```sql
-- Migration V12: embedding model registry + per-memory domain columns.

-- 1. Embedding model registry. Single logical row: CHECK (id = 1) pins the
--    table to one row, and `register_model` is the only writer and always
--    writes id = 1.
CREATE TABLE IF NOT EXISTS embedding_model (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    name TEXT NOT NULL,
    dimension INTEGER NOT NULL,
    hash TEXT NOT NULL, -- lowercase hex blake3
    created_at TEXT NOT NULL
);

-- 2. Per-memory domain columns (JSON TEXT; SQLite has no native arrays).
ALTER TABLE knowledge_nodes ADD COLUMN domains TEXT NOT NULL DEFAULT '[]';
ALTER TABLE knowledge_nodes ADD COLUMN domain_scores TEXT NOT NULL DEFAULT '{}';

-- 3. Indexes on the domain JSON columns for the Phase 4 domain filter.
--    Note that a `LIKE '%"dev"%'`-style filter has a leading wildcard,
--    which a B-tree index cannot serve. Kept lightweight here; Postgres
--    will use GIN.
CREATE INDEX IF NOT EXISTS idx_nodes_domains ON knowledge_nodes(domains);
CREATE INDEX IF NOT EXISTS idx_nodes_domain_scores ON knowledge_nodes(domain_scores);

-- 4. Domains catalogue (empty until Phase 4 populates).
CREATE TABLE IF NOT EXISTS domains (
    id TEXT PRIMARY KEY,
    label TEXT NOT NULL,
    centroid BLOB, -- f32 vector, raw bytes
    top_terms TEXT NOT NULL DEFAULT '[]',
    memory_count INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_domains_created_at ON domains(created_at);

UPDATE schema_version SET version = 12, applied_at = datetime('now');
```
- Rust changes to `migrations.rs`:
```rust
pub const MIGRATIONS: &[Migration] = &[
    // ... V1..V11 unchanged ...
    Migration {
        version: 12,
        description: "Phase 1: embedding_model registry, domains/domain_scores columns, domains table",
        up: MIGRATION_V12_UP,
    },
];

const MIGRATION_V12_UP: &str = r#"...SQL above..."#;
```
- Behavior notes:
  - Idempotency: `ALTER TABLE ... ADD COLUMN` on SQLite is not idempotent by default, but the `apply_migrations` driver only applies migrations whose version > current. A user who has already applied V12 never sees the SQL again.
  - The `CHECK (id = 1)` on `embedding_model` is the only one-row guardrail -- all inserts go through `register_model`, which uses `INSERT OR IGNORE INTO embedding_model (id, ...) VALUES (1, ...)` followed by a `SELECT` to detect mismatch.
  - `centroid BLOB` stores the f32 vector using the same `Embedding::to_bytes()` format used in `node_embeddings`, for consistency.
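The version gate that makes the non-idempotent `ALTER TABLE` safe can be sketched as a pure selection step. The following is an assumption-laden illustration, not the real `apply_migrations` driver: the `pending` helper, the placeholder SQL strings, and the V11 description are hypothetical, and only the `Migration` field names come from the excerpt above.

```rust
/// Mirrors the fields used in the `MIGRATIONS` excerpt above.
pub struct Migration {
    pub version: u32,
    pub description: &'static str,
    pub up: &'static str,
}

/// Hypothetical two-entry table; the SQL bodies are placeholders.
pub const MIGRATIONS: &[Migration] = &[
    Migration { version: 11, description: "previous head (placeholder)", up: "SELECT 1;" },
    Migration { version: 12, description: "Phase 1 additions (placeholder)", up: "SELECT 1;" },
];

/// Select the migrations a database at `current` still has to run,
/// in ascending version order. A database already at the head gets an
/// empty list, so its `ALTER TABLE` statements are never re-executed.
pub fn pending(all: &[Migration], current: u32) -> Vec<&Migration> {
    let mut todo: Vec<&Migration> = all.iter().filter(|m| m.version > current).collect();
    todo.sort_by_key(|m| m.version);
    todo
}
```

A v11 database selects only V12, and a v12 database selects nothing, which is the property the idempotency note relies on.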
D5. Embedder trait (embedder/mod.rs)
- File: `crates/vestige-core/src/embedder/mod.rs` (new).
- Depends on: `blake3` crate added to vestige-core.
- Signatures:
```rust
//! Text-to-vector encoding trait. Pluggable per-install.

use std::fmt::Debug;

mod fastembed;
pub use fastembed::FastembedEmbedder;

/// Error returned by every `Embedder` method.
#[non_exhaustive]
#[derive(Debug, thiserror::Error)]
pub enum EmbedderError {
    #[error("embedder initialization failed: {0}")]
    Init(String),
    #[error("embedding generation failed: {0}")]
    EmbedFailed(String),
    #[error("invalid input: {0}")]
    InvalidInput(String),
}

pub type EmbedderResult<T> = std::result::Result<T, EmbedderError>;

/// Pluggable embedder. The storage layer NEVER calls fastembed directly;
/// callers compute vectors via this trait and pass them into `MemoryStore`.
#[trait_variant::make(Embedder: Send)]
pub trait LocalEmbedder: Sync + 'static {
    async fn embed(&self, text: &str) -> EmbedderResult<Vec<f32>>;

    fn model_name(&self) -> &str;

    fn dimension(&self) -> usize;

    /// Stable blake3 hash of (model_name || dimension || optional weights
    /// digest if available). Lowercase hex, 64 chars.
    ///
    /// Used by `MemoryStore::register_model` to detect silent model drift
    /// (e.g. a fastembed minor upgrade that changes vector output).
    fn model_hash(&self) -> String;

    async fn embed_batch(&self, texts: &[&str]) -> EmbedderResult<Vec<Vec<f32>>> {
        // Default: sequential. Backends with native batching override this.
        let mut out = Vec::with_capacity(texts.len());
        for t in texts {
            out.push(self.embed(t).await?);
        }
        Ok(out)
    }

    /// Returns the `ModelSignature` describing this embedder. Convenience
    /// wrapper over the three accessors above.
    fn signature(&self) -> crate::storage::ModelSignature {
        crate::storage::ModelSignature {
            name: self.model_name().to_string(),
            dimension: self.dimension(),
            hash: self.model_hash(),
        }
    }
}
```
- Behavior notes:
  - The default `embed_batch` implementation is a sequential fallback; backends with genuine batching override it. The `FastembedEmbedder` overrides it to call `EmbeddingService::embed_batch`.
  - `model_hash()` is intentionally a function, not a constant, so backends with configurable weights (a future `OnnxEmbedder` that loads an arbitrary file) can hash the file bytes into the signature.
  - `Embedder` (the `Send` variant) is what cognitive modules bind against when they hold `Arc<dyn Embedder>`. `LocalEmbedder` is available for single-threaded callers (CLI, tests).
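The default-plus-override pattern for `embed_batch` can be shown with a synchronous simplification. This sketch deliberately drops `async` and `trait_variant` so it runs with the standard library alone; `SyncEmbedder`, `SequentialToy`, and `BatchingToy` are hypothetical names, and the toy "embedding" is just the text length.

```rust
use std::cell::Cell;

/// Synchronous stand-in for the embedder trait (assumption: the async
/// version has the same default-method shape).
pub trait SyncEmbedder {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;

    /// Default: call `embed` once per input, in order.
    fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        let mut out = Vec::with_capacity(texts.len());
        for t in texts {
            out.push(self.embed(t)?);
        }
        Ok(out)
    }
}

/// Backend without native batching: inherits the sequential default.
pub struct SequentialToy;

impl SyncEmbedder for SequentialToy {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        Ok(vec![text.len() as f32])
    }
}

/// Backend with native batching: overrides the default and counts how
/// many batch calls it served.
pub struct BatchingToy {
    pub batch_calls: Cell<usize>,
}

impl SyncEmbedder for BatchingToy {
    fn embed(&self, text: &str) -> Result<Vec<f32>, String> {
        Ok(vec![text.len() as f32])
    }

    fn embed_batch(&self, texts: &[&str]) -> Result<Vec<Vec<f32>>, String> {
        self.batch_calls.set(self.batch_calls.get() + 1);
        Ok(texts.iter().map(|t| vec![t.len() as f32]).collect())
    }
}
```

Both backends return the same vectors for the same inputs; only the overriding backend handles the whole slice in a single call, which is the behavior a fastembed-backed implementation would want.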
D6. FastembedEmbedder impl (embedder/fastembed.rs)
- File: `crates/vestige-core/src/embedder/fastembed.rs` (new).
- Depends on: D5, existing `crate::embeddings::local::EmbeddingService`.
- Signatures:
```rust
use super::{EmbedderError, EmbedderResult, LocalEmbedder};
use crate::embeddings::{EMBEDDING_DIMENSIONS, EmbeddingService};

pub struct FastembedEmbedder {
    inner: EmbeddingService,
    cached_hash: std::sync::OnceLock<String>,
}

impl FastembedEmbedder {
    pub fn new() -> Self {
        Self {
            inner: EmbeddingService::new(),
            cached_hash: std::sync::OnceLock::new(),
        }
    }

    fn compute_hash(name: &str, dim: usize) -> String {
        let mut hasher = blake3::Hasher::new();
        hasher.update(name.as_bytes());
        hasher.update(&(dim as u64).to_le_bytes());
        // fastembed's ONNX bytes are not directly accessible at runtime, so
        // the signature is `(name, dim, crate version)`. Note that
        // `CARGO_PKG_VERSION` expands to the version of the crate being
        // compiled (vestige-core), not fastembed's, so a fastembed upgrade
        // that changes vector output is detected only when it ships with a
        // vestige-core version bump.
        hasher.update(env!("CARGO_PKG_VERSION").as_bytes());
        hasher.finalize().to_hex().to_string()
    }
}

impl Default for FastembedEmbedder {
    fn default() -> Self { Self::new() }
}

impl LocalEmbedder for FastembedEmbedder {
    async fn embed(&self, text: &str) -> EmbedderResult<Vec<f32>> {
        let emb = self
            .inner
            .embed(text)
            .map_err(|e| EmbedderError::EmbedFailed(e.to_string()))?;
        Ok(emb.vector)
    }

    fn model_name(&self) -> &str { self.inner.model_name() }

    fn dimension(&self) -> usize { EMBEDDING_DIMENSIONS }

    fn model_hash(&self) -> String {
        self.cached_hash
            .get_or_init(|| Self::compute_hash(self.inner.model_name(), EMBEDDING_DIMENSIONS))
            .clone()
    }

    async fn embed_batch(&self, texts: &[&str]) -> EmbedderResult<Vec<Vec<f32>>> {
        let embs = self
            .inner
            .embed_batch(texts)
            .map_err(|e| EmbedderError::EmbedFailed(e.to_string()))?;
        Ok(embs.into_iter().map(|e| e.vector).collect())
    }
}
```
- Behavior notes:
  - `EmbeddingService` is kept as the fastembed singleton holder; `FastembedEmbedder` is a thin trait adapter. Existing callers of `EmbeddingService` continue to work during the transition.
  - `model_hash` is deterministic for a given `(model_name, EMBEDDING_DIMENSIONS, vestige-core version)` triple. This is the drift detector the ADR calls out under "Risks: Embedding model drift".
  - `matryoshka_truncate` is already applied inside `EmbeddingService::embed`, so the vectors returned here are the 256-dim Matryoshka-truncated L2-normalized vectors that the rest of the stack expects.
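For readers unfamiliar with the last note, Matryoshka truncation followed by L2 normalization amounts to keeping a prefix of the vector and rescaling it to unit length. The sketch below is a generic illustration of that technique; the source does not show the body of `matryoshka_truncate`, so the function name `truncate_and_normalize` and its exact behavior (including leaving an all-zero vector untouched) are assumptions.

```rust
/// Keep the first `dim` components of `v`, then rescale to unit L2 norm.
/// Vectors shorter than `dim` are normalized as-is; an all-zero vector is
/// returned unchanged to avoid dividing by zero.
pub fn truncate_and_normalize(mut v: Vec<f32>, dim: usize) -> Vec<f32> {
    v.truncate(dim);
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

For example, truncating `[3.0, 4.0, 12.0]` to two dimensions keeps `[3.0, 4.0]`, whose norm is 5, so the result is approximately `[0.6, 0.8]`. Re-normalizing after truncation matters because the prefix of a unit vector is generally not itself unit length.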
D7. lib.rs re-exports
- File: `crates/vestige-core/src/lib.rs`.
- Depends on: D1, D2, D5, D6.
- Diff (inserted alongside the existing `pub mod storage;` re-exports):
```rust
pub mod embedder;

pub use embedder::{Embedder, EmbedderError, EmbedderResult, FastembedEmbedder, LocalEmbedder};
pub use storage::{
    ClassificationResult, Domain, HealthStatus, LocalMemoryStore, MemoryEdge, MemoryRecord,
    MemoryStore, MemoryStoreError, MemoryStoreResult, ModelSignature, SchedulingState,
    SearchQuery, SearchResult, SqliteMemoryStore, Storage, StoreStats,
    // Existing re-exports retained:
    ConnectionRecord, ConsolidationHistoryRecord, DreamHistoryRecord, InsightRecord,
    IntentionRecord, Result, SmartIngestResult, StateTransitionRecord, StorageError,
};
```
- Behavior notes:
  - `Storage` remains a top-level re-export so `use vestige_core::Storage;` keeps working in `vestige-mcp` without changes. Post-Phase-4 cleanup will grep the downstream crates and replace.
D8. Cognitive module audit
- Files: all under `crates/vestige-core/src/neuroscience/*.rs` and `crates/vestige-core/src/advanced/*.rs` -- 21 source files.
- Depends on: D1..D7.
- Work: perform the following grep-gate BEFORE and AFTER the refactor:
```text
Grep pattern: "rusqlite|Connection::|execute\\(|prepare\\(|&Storage|SqliteMemoryStore"
Expected in neuroscience/ and advanced/ BEFORE: only a single comment-only hit in `neuroscience/active_forgetting.rs:54` referencing `Storage::suppress_memory` in a doc comment.
Expected AFTER: zero hits that reference `SqliteMemoryStore` concretely. References through `&dyn LocalMemoryStore` or `&Arc<dyn MemoryStore>` are acceptable.
```
- Behavior notes:
  - Current state: the 29 cognitive modules are already pure (they take nodes/vectors/connections as arguments, not a `&Storage`). No refactor is required for their bodies.
  - The only work is the `consolidation/sleep.rs` and `consolidation/phases.rs` path, which in the current codebase accepts `&Storage`. These get rewritten to accept `&dyn LocalMemoryStore` (callable from sync contexts) or `&Arc<dyn MemoryStore>` (callable from async contexts). See file inventory below.
  - Actual rewrites (expected number): 3-5 functions across `consolidation/sleep.rs` and `consolidation/mod.rs`. All trait-object refactors; no logic changes.
  - `cognitive.rs` in `vestige-mcp` uses `storage.get_all_connections()`. Because `SqliteMemoryStore` keeps `get_all_connections` as an inherent method AND implements `MemoryStore::get_edges`, both call styles keep compiling. `cognitive.rs` does not need to change in Phase 1.
D9. Backwards-compatible inherent methods on SqliteMemoryStore
- File: `crates/vestige-core/src/storage/sqlite.rs`.
- Depends on: D3.
- Behavior notes:
  - Every one of the 85 existing `pub fn` on `Storage` (e.g. `ingest`, `smart_ingest`, `mark_reviewed`, `hybrid_search_filtered`, `save_intention`, `save_insight`, `save_connection`, `apply_rac1_cascade`, ...) stays as an inherent method on `SqliteMemoryStore`. The Phase 1 refactor ONLY adds the trait impl; it does NOT remove any method, rename any field, or change any SQL.
  - Internal writes that previously embedded `INSERT INTO knowledge_nodes (...)` statements gain two more columns (`domains = '[]'`, `domain_scores = '{}'`) in the INSERT list. These are non-optional columns after migration V12, and their DEFAULT is `'[]'` / `'{}'` respectively, so ALTER behaves correctly for pre-existing rows, but INSERT statements need to either list the defaults explicitly or rely on the DB default. Plan: explicitly write `'[]'` and `'{}'` in every `INSERT INTO knowledge_nodes` statement to avoid surprises if a future migration drops the DEFAULT.
Test Plan
Unit tests (colocated, #[cfg(test)] mod tests at end of each source file)
Every public trait method on LocalMemoryStore gets at least one unit test, exercised through the SqliteMemoryStore impl. The unit test file is crates/vestige-core/src/storage/sqlite.rs (inside the existing mod tests).
- `vestige_core::storage::sqlite::tests::trait_init_is_idempotent` -- calling `LocalMemoryStore::init` twice returns `Ok(())` both times.
- `vestige_core::storage::sqlite::tests::trait_health_check_reports_healthy_on_fresh_db` -- asserts `HealthStatus::Healthy` on a fresh in-memory DB.
- `vestige_core::storage::sqlite::tests::trait_register_model_first_write_succeeds` -- after registering a signature, `registered_model()` returns it.
- `vestige_core::storage::sqlite::tests::trait_register_model_mismatched_write_refused` -- registering a second, different signature returns `MemoryStoreError::ModelMismatch`.
- `vestige_core::storage::sqlite::tests::trait_register_model_same_signature_idempotent` -- registering the same signature twice returns `Ok(())` both times.
- `vestige_core::storage::sqlite::tests::trait_insert_returns_uuid` -- `insert(record)` returns the UUID from the record.
- `vestige_core::storage::sqlite::tests::trait_insert_refuses_dimension_mismatch` -- inserting a record with a 512-dim vector into a store registered for 256 dims returns `MemoryStoreError::InvalidInput`.
- `vestige_core::storage::sqlite::tests::trait_get_missing_returns_none` -- `get(non_existent_uuid)` returns `Ok(None)`.
- `vestige_core::storage::sqlite::tests::trait_get_after_insert_round_trip` -- insert then get returns a record equal (by content/tags/type) to the input; `domains == []`, `domain_scores == {}`.
- `vestige_core::storage::sqlite::tests::trait_update_modifies_content` -- update with new content is reflected in a subsequent `get`.
- `vestige_core::storage::sqlite::tests::trait_delete_removes_record` -- `delete` then `get` returns `Ok(None)`.
- `vestige_core::storage::sqlite::tests::trait_search_combines_fts_and_vector` -- with one memory whose content matches by FTS and another by vector, `search` returns both, with the higher score for the exact content match.
- `vestige_core::storage::sqlite::tests::trait_fts_search_returns_tokens_match` -- verifies the FTS path.
- `vestige_core::storage::sqlite::tests::trait_vector_search_returns_cosine_order` -- verifies ordering.
- `vestige_core::storage::sqlite::tests::trait_scheduling_round_trip` -- `update_scheduling` then `get_scheduling` returns equivalent state.
- `vestige_core::storage::sqlite::tests::trait_get_scheduling_missing_returns_none`.
- `vestige_core::storage::sqlite::tests::trait_get_due_memories_returns_in_order` -- inserts 3 records with different `next_review`, asserts older-due records are listed first.
- `vestige_core::storage::sqlite::tests::trait_add_edge_is_idempotent` -- adding the same edge twice does not duplicate it.
- `vestige_core::storage::sqlite::tests::trait_get_edges_filters_by_type`.
- `vestige_core::storage::sqlite::tests::trait_remove_edge_deletes_single`.
- `vestige_core::storage::sqlite::tests::trait_get_neighbors_bfs_depth_zero_returns_self_only`.
- `vestige_core::storage::sqlite::tests::trait_get_neighbors_bfs_depth_two_expands` -- build A->B->C; `get_neighbors(A, 2)` returns {A, B, C}.
- `vestige_core::storage::sqlite::tests::trait_list_domains_empty_in_phase_1` -- Phase 1 has no clustering, so `list_domains()` returns `[]`.
- `vestige_core::storage::sqlite::tests::trait_upsert_then_get_domain_round_trip`.
- `vestige_core::storage::sqlite::tests::trait_delete_domain_idempotent`.
- `vestige_core::storage::sqlite::tests::trait_classify_with_no_domains_returns_empty` -- verifies Phase 1 stub behavior.
- `vestige_core::storage::sqlite::tests::trait_count_matches_insert_count`.
- `vestige_core::storage::sqlite::tests::trait_get_stats_reports_registered_model`.
- `vestige_core::storage::sqlite::tests::trait_vacuum_succeeds` -- runs and asserts no error.
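The two `get_neighbors` BFS tests above pin down depth semantics: depth 0 yields only the start node, and depth 2 over A->B->C yields all three. A minimal sketch of a depth-bounded breadth-first traversal with that behavior follows. The function name `neighbors_within`, the `u32` node ids, and the assumption that edges are followed in their outgoing direction only are illustrative choices, not the planned trait signature.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical helper: returns every node reachable from `start` in at most
/// `max_depth` hops, following directed edges (from, to). Includes `start`.
pub fn neighbors_within(edges: &[(u32, u32)], start: u32, max_depth: usize) -> HashSet<u32> {
    // Build an adjacency list once so each hop is a map lookup.
    let mut adjacency: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(from, to) in edges {
        adjacency.entry(from).or_default().push(to);
    }

    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0usize)]);

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand past the requested depth
        }
        if let Some(next) = adjacency.get(&node) {
            for &neighbor in next {
                // `insert` returns false for already-visited nodes, which
                // also protects against cycles in the graph.
                if seen.insert(neighbor) {
                    queue.push_back((neighbor, depth + 1));
                }
            }
        }
    }
    seen
}
```

With edges `[(1, 2), (2, 3)]` standing in for A->B->C, a depth of 0 returns `{1}` and a depth of 2 returns `{1, 2, 3}`.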
Every public method on LocalEmbedder gets at least one unit test under crates/vestige-core/src/embedder/fastembed.rs:
- `vestige_core::embedder::fastembed::tests::embedder_reports_correct_name` -- `model_name()` contains "nomic".
- `vestige_core::embedder::fastembed::tests::embedder_reports_256_dimension`.
- `vestige_core::embedder::fastembed::tests::embedder_hash_is_stable` -- `model_hash()` called twice returns an identical string.
- `vestige_core::embedder::fastembed::tests::embedder_hash_includes_crate_version` -- a synthetic test that asserts the hash contains the blake3 of `(name, 256, VERSION)`.
- `vestige_core::embedder::fastembed::tests::embedder_embed_smoke` -- gated on `#[cfg(feature = "embeddings")]`; asserts output length == 256.
- `vestige_core::embedder::fastembed::tests::embedder_embed_batch_matches_sequential` -- gated; asserts the batch result equals the sequential result.
- `vestige_core::embedder::fastembed::tests::embedder_signature_matches_accessors`.
Migration V12 unit tests under crates/vestige-core/src/storage/migrations.rs:
- `vestige_core::storage::migrations::tests::v12_adds_embedding_model_table` -- apply V12, then assert `SELECT count(*) FROM sqlite_master WHERE name='embedding_model'` == 1.
- `vestige_core::storage::migrations::tests::v12_adds_domains_columns` -- assert `PRAGMA table_info(knowledge_nodes)` includes `domains` and `domain_scores`.
- `vestige_core::storage::migrations::tests::v12_default_values_empty_json` -- insert a row via raw SQL, read back, assert `domains == '[]'` and `domain_scores == '{}'`.
- `vestige_core::storage::migrations::tests::v12_is_replayable` -- rewinds `schema_version` to 11, re-applies migrations, and asserts no error (MUST use `CREATE TABLE IF NOT EXISTS`; `ALTER TABLE ADD COLUMN` will be skipped because the driver only re-runs migrations whose version > current -- already covered by `apply_migrations`).
- `vestige_core::storage::migrations::tests::v12_preserves_existing_rows` -- insert rows under the V11 schema, upgrade to V12, assert `domains='[]'` on those rows.
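The "version > current" rule that the replay test leans on can be sketched as a small forward-only selection step. This is an illustration under assumptions, not the project's `apply_migrations`: the `Migration` struct shape and the `apply_pending` name are hypothetical, the list is assumed to be sorted ascending, and no SQL is executed.

```rust
/// Hypothetical, simplified migration descriptor (no SQL payload).
pub struct Migration {
    pub version: u32,
    pub description: &'static str,
}

/// Forward-only driver sketch: selects every migration whose version is
/// strictly greater than `current`, and reports the resulting schema version
/// together with the versions that ran. Assumes `migrations` is ascending.
pub fn apply_pending(current: u32, migrations: &[Migration]) -> (u32, Vec<u32>) {
    let mut applied = Vec::new();
    let mut version = current;
    for migration in migrations.iter().filter(|m| m.version > current) {
        // A real driver would execute the migration's SQL in a transaction here.
        applied.push(migration.version);
        version = migration.version;
    }
    (version, applied)
}

/// Hypothetical fixture covering the two most recent schema versions.
pub fn sample_migrations() -> Vec<Migration> {
    vec![
        Migration { version: 11, description: "pre-Phase-1 schema" },
        Migration { version: 12, description: "embedding_model + domain columns" },
    ]
}
```

Under this rule a database already at V12 skips V12 entirely, while a database whose recorded version was rewound to 11 runs V12 again, so the V12 statements themselves have to tolerate being replayed.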
Supporting-type unit tests under crates/vestige-core/src/storage/memory_store.rs:
- `vestige_core::storage::memory_store::tests::memory_store_error_from_storage_error` -- converts `StorageError::NotFound` to `MemoryStoreError::NotFound`.
- `vestige_core::storage::memory_store::tests::model_signature_serde_round_trip`.
- `vestige_core::storage::memory_store::tests::memory_record_serde_round_trip`.
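The error-conversion test above implies a `From<StorageError> for MemoryStoreError` impl, so that trait methods can use `?` on results from existing inherent methods. A minimal sketch of that pattern follows. Only the two `NotFound` variants come from this plan; the payload types, the `Database` and `Backend` variants, and the `legacy_lookup` / `trait_get` helpers are assumptions for illustration.

```rust
/// Simplified stand-in for the existing SQLite-level error type.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    NotFound(String),
    Database(String), // hypothetical catch-all variant
}

/// Simplified stand-in for the new backend-agnostic error type.
#[derive(Debug, PartialEq)]
pub enum MemoryStoreError {
    NotFound(String),
    Backend(String), // hypothetical catch-all variant
}

impl From<StorageError> for MemoryStoreError {
    fn from(err: StorageError) -> Self {
        match err {
            StorageError::NotFound(id) => MemoryStoreError::NotFound(id),
            StorageError::Database(msg) => MemoryStoreError::Backend(msg),
        }
    }
}

/// Hypothetical legacy method returning the old error type.
pub fn legacy_lookup(id: &str) -> Result<String, StorageError> {
    if id == "known" {
        Ok("content".to_owned())
    } else {
        Err(StorageError::NotFound(id.to_owned()))
    }
}

/// Hypothetical trait-side wrapper: `?` performs the conversion via `From`.
pub fn trait_get(id: &str) -> Result<String, MemoryStoreError> {
    let content = legacy_lookup(id)?;
    Ok(content)
}
```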
Integration tests (tests/phase_1/)
Each file is a standalone [[test]] target. The Cargo layout:
`tests/phase_1/Cargo.toml` with:
[package]
name = "vestige-phase-1-tests"
version = "0.0.1"
edition = "2024"
publish = false
[dependencies]
vestige-core = { path = "../../crates/vestige-core" }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
tempfile = "3"
uuid = { version = "1", features = ["v4"] }
chrono = "0.4"
serde_json = "1"
rusqlite = { version = "0.38", features = ["bundled"] }
The crate is also added to the workspace `Cargo.toml` members. Each `.rs` file below is an integration test built on `#[tokio::test]`.
tests/phase_1/trait_round_trip.rs
- `round_trip::insert_get_update_delete` -- exercises CRUD via the trait. Inserts a record with `domains=[]`, gets it, asserts equality, updates content, deletes, asserts not found.
- `round_trip::scheduling_upsert_and_due_scan` -- upserts FSRS state for three memories with different `next_review`, calls `get_due_memories(Utc::now(), 10)`, asserts only past-due ones appear.
- `round_trip::edge_crud` -- add edge, list edges, remove edge, assert gone.
- `round_trip::search_hybrid_returns_results` -- insert three memories (one that matches by content only, one by semantics only, one by both), search with both `text` and `embedding`, assert all three appear with `fts_score` / `vector_score` correctly populated.
- `round_trip::count_and_stats_track_inserts` -- after 10 inserts, `count()` == 10 and `get_stats().total_memories` == 10.
- `round_trip::vacuum_after_deletes_reclaims` -- insert 50, delete 40, call `vacuum`, assert disk file size decreased (informational; test is lenient if VACUUM was a no-op).
- `round_trip::list_domains_empty_then_upsert_then_delete` -- Phase 1 has no discovery, but manual upsert/delete must work so Phase 2's Postgres impl can share the test.
- `round_trip::classify_with_no_domains_returns_empty` -- calls `classify(embedding)` on a fresh store, asserts the returned `Vec<(String, f64)>` is empty.
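The due-scan behavior that `scheduling_upsert_and_due_scan` checks (only past-due memories, oldest first, capped by a limit) can be sketched without a database. The `Scheduled` struct, the use of integer timestamps in place of `chrono` values, and the `due_memories` name are assumptions for illustration rather than the real `get_due_memories` signature.

```rust
/// Hypothetical, simplified scheduling row: an id plus a next-review
/// timestamp expressed as seconds since an arbitrary epoch.
pub struct Scheduled {
    pub id: u32,
    pub next_review: i64,
}

/// Returns the ids of memories whose `next_review` is at or before `now`,
/// ordered oldest-due first and truncated to `limit` entries.
pub fn due_memories(all: &[Scheduled], now: i64, limit: usize) -> Vec<u32> {
    let mut due: Vec<&Scheduled> = all.iter().filter(|s| s.next_review <= now).collect();
    // Stable sort keeps insertion order for rows sharing the same timestamp.
    due.sort_by_key(|s| s.next_review);
    due.into_iter().take(limit).map(|s| s.id).collect()
}
```

Given three rows due at 300, 100, and 200 and a `now` of 250, the sketch returns the ids of the rows due at 100 and 200, in that order, and omits the one that is not yet due.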
tests/phase_1/embedding_model_registry.rs
- `model_registry::first_embedded_insert_auto_registers` -- fresh store; insert a record with a 256-dim vector using a `FastembedEmbedder`; a subsequent `registered_model()` returns a `Some(ModelSignature)` with dim=256.
- `model_registry::second_insert_with_same_signature_succeeds`.
- `model_registry::second_insert_with_different_dimension_refused` -- register a 256-dim signature, try to insert a 512-dim vector, expect `MemoryStoreError::InvalidInput` (because the dimension does not match the registered one).
- `model_registry::second_insert_with_different_model_name_refused` -- register signature A, call `register_model` with signature B (same dim, different name), expect `MemoryStoreError::ModelMismatch`.
- `model_registry::second_insert_with_different_hash_refused` -- register signature A, try to register signature A' with the same name and dim but a different hash, expect `MemoryStoreError::ModelMismatch`.
- `model_registry::no_embedding_insert_allowed_before_registration` -- a plain text memory without an embedding must insert successfully even when `registered_model()` is `None`.
- `model_registry::stats_reports_registered_model_after_first_write`.
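The registry rules these tests describe (first registration wins, an identical signature is a no-op, any differing signature is refused, and vector length is checked against the registered dimension) fit in a few lines. The sketch below is an in-memory model only; the field layout of `ModelSignature`, the payload-free error variants, and the `Registry` / `check_vector` names are assumptions, not the planned `SqliteMemoryStore` code.

```rust
/// Simplified stand-in for the signature stored in `embedding_model`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSignature {
    pub name: String,
    pub dimension: usize,
    pub hash: String,
}

/// Simplified stand-in: the real variants presumably carry detail.
#[derive(Debug, PartialEq, Eq)]
pub enum MemoryStoreError {
    ModelMismatch,
    InvalidInput,
}

/// Hypothetical in-memory registry holding at most one signature.
#[derive(Default)]
pub struct Registry {
    registered: Option<ModelSignature>,
}

impl Registry {
    /// First write succeeds, an identical signature is idempotent,
    /// and anything else is refused.
    pub fn register_model(&mut self, sig: ModelSignature) -> Result<(), MemoryStoreError> {
        if let Some(existing) = &self.registered {
            return if *existing == sig {
                Ok(())
            } else {
                Err(MemoryStoreError::ModelMismatch)
            };
        }
        self.registered = Some(sig);
        Ok(())
    }

    /// Rejects vectors whose length disagrees with the registered dimension.
    /// With nothing registered yet, every vector passes this check.
    pub fn check_vector(&self, vector: &[f32]) -> Result<(), MemoryStoreError> {
        match &self.registered {
            Some(sig) if sig.dimension != vector.len() => Err(MemoryStoreError::InvalidInput),
            _ => Ok(()),
        }
    }

    pub fn registered_model(&self) -> Option<&ModelSignature> {
        self.registered.as_ref()
    }
}
```

Comparing the whole signature by equality means a change in name, dimension, or hash alone is enough to trigger `ModelMismatch`, which covers the name and hash cases in the list above with a single code path.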
tests/phase_1/domain_column_migration.rs
- `domain_columns::fresh_db_has_v12_schema` -- open a fresh store, query `PRAGMA table_info(knowledge_nodes)`, assert the `domains` and `domain_scores` columns are present with the correct defaults.
- `domain_columns::v11_db_upgrades_cleanly` -- programmatically create a DB at V11 by running migrations up to V11 only, insert 5 rows, then invoke the V12 migration, assert all 5 rows now report `domains=='[]'` and `domain_scores=='{}'`.
- `domain_columns::empty_domains_serialize_as_brackets` -- insert a `MemoryRecord { domains: vec![], .. }`, then read the underlying SQLite row via a raw query, assert the stored value is `"[]"`, not `NULL`.
- `domain_columns::populated_domains_round_trip` -- insert a record with `domains=["dev","infra"]` and `domain_scores={"dev":0.82,"infra":0.71}`, read back via the trait, assert equality.
- `domain_columns::domains_table_exists` -- `SELECT name FROM sqlite_master WHERE name='domains'` returns one row.
tests/phase_1/cognitive_module_isolation.rs
- `cognitive_isolation::all_modules_compile_against_dyn_store` -- a test function that binds `let store: Arc<dyn MemoryStore> = Arc::new(SqliteMemoryStore::new(...)?);`, then invokes a representative method from every cognitive module, passing in records/vectors/edges it reads through the trait. The point is a compile-time gate: if any module were still typed against `SqliteMemoryStore`, this would fail to compile.
- `cognitive_isolation::spreading_activation_traverses_via_trait` -- exercise `ActivationNetwork` seeded from `store.get_edges(...)` results.
- `cognitive_isolation::synaptic_tagging_consumes_records_via_trait` -- build `CapturedMemory` from `store.get(uuid)` and let the tagger compute retroactive importance.
- `cognitive_isolation::hippocampal_index_built_from_store` -- load memories via `store.fts_search`, build `HippocampalIndex`, assert queries against the index work.
tests/phase_1/send_bound_variant.rs
- `send_bound::arc_dyn_memory_store_moves_across_tokio_tasks` -- wrap `SqliteMemoryStore` in `Arc<dyn MemoryStore>`, spawn 16 tokio tasks each inserting 10 memories, join all tasks, assert final `count() == 160`. This verifies the `#[trait_variant::make(MemoryStore: Send)]` emission actually produces a `Send`-bound future.
- `send_bound::concurrent_readers_one_writer` -- 32 concurrent readers calling `search` while one writer loops inserting; asserts no panics, no deadlocks, eventual consistency on `count`.
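The shape of the 16-task, 10-insert check can be previewed with a std-only analogue. The sketch below deliberately swaps tokio tasks for OS threads and the real store for a hypothetical `CountingStore`, so it demonstrates the `Send + Sync` requirement and the final-count assertion, not the `trait_variant` machinery itself.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Compile-time gate: this call fails to build if `T` cannot be shared
/// across threads, mirroring what a spawn-based test checks implicitly.
fn assert_send_sync<T: Send + Sync>() {}

/// Hypothetical stand-in for a store with interior locking.
#[derive(Default)]
pub struct CountingStore {
    rows: Mutex<Vec<String>>,
}

impl CountingStore {
    pub fn insert(&self, content: String) {
        self.rows.lock().unwrap().push(content);
    }

    pub fn count(&self) -> usize {
        self.rows.lock().unwrap().len()
    }
}

/// Spawns `workers` threads that each insert `per_worker` rows through a
/// shared `Arc`, joins them all, and returns the final row count.
pub fn concurrent_insert_count(workers: usize, per_worker: usize) -> usize {
    assert_send_sync::<Arc<CountingStore>>();
    let store = Arc::new(CountingStore::default());

    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for i in 0..per_worker {
                    store.insert(format!("worker-{w}-memory-{i}"));
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    store.count()
}
```

Calling `concurrent_insert_count(16, 10)` yields 160, the same total the tokio-based test asserts.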
tests/phase_1/embedder_trait.rs
- `embedder::fastembed_implements_embedder_trait` -- `let e: Box<dyn Embedder> = Box::new(FastembedEmbedder::new());` compiles and `e.dimension()` == 256.
- `embedder::signature_matches_memory_store_registry` -- take the signature from `Embedder::signature()`, register it via `MemoryStore::register_model`, assert `registered_model()` returns the same.
Regression verification
- `cargo build -p vestige-core` -- zero warnings.
- `cargo build -p vestige-mcp` -- zero warnings.
- `cargo clippy --workspace --all-targets -- -D warnings` -- green.
- `cargo test -p vestige-core --lib` -- existing 352 core lib tests remain green.
- `cargo test -p vestige-mcp --lib` -- existing 406 mcp tests remain green.
- `cargo test -p vestige-core --lib storage::migrations::tests` -- explicitly invokes the migration tests added in Phase 1.
- `cargo test -p vestige-core --lib storage::sqlite::tests` -- invokes the trait-method unit tests added in Phase 1.
- `cargo test -p vestige-core --lib embedder::fastembed::tests` -- invokes embedder unit tests.
- `cargo test -p vestige-phase-1-tests --test trait_round_trip` -- Phase 1 integration test file 1.
- `cargo test -p vestige-phase-1-tests --test embedding_model_registry` -- Phase 1 integration test file 2.
- `cargo test -p vestige-phase-1-tests --test domain_column_migration` -- Phase 1 integration test file 3.
- `cargo test -p vestige-phase-1-tests --test cognitive_module_isolation` -- Phase 1 integration test file 4.
- `cargo test -p vestige-phase-1-tests --test send_bound_variant` -- Phase 1 integration test file 5.
- `cargo test -p vestige-phase-1-tests --test embedder_trait` -- Phase 1 integration test file 6.
- `cargo test -p vestige-phase-1-tests` -- convenience: runs all integration test binaries in the Phase 1 crate.
- `cargo test -p vestige-e2e` -- existing e2e harness runs unchanged; no new tests here but existing ones must pass.
Acceptance Criteria
- `cargo build -p vestige-core` -- zero warnings.
- `cargo build -p vestige-mcp` -- zero warnings.
- `cargo build --workspace --all-targets` -- zero warnings.
- `cargo clippy --workspace --all-targets -- -D warnings` -- exits 0.
- `cargo test -p vestige-core` -- all 352 existing core tests plus new Phase 1 unit tests pass.
- `cargo test -p vestige-mcp` -- all 406 existing mcp tests pass, unchanged.
- `cargo test -p vestige-phase-1-tests` -- all Phase 1 integration tests pass.
- `cargo test -p vestige-e2e` -- existing e2e journey suite passes unchanged.
- Cumulative test count >= 758 (the pre-Phase-1 baseline) plus the new unit and integration additions.
- `git grep -n 'rusqlite::' crates/vestige-core/src/neuroscience/ crates/vestige-core/src/advanced/` -- zero hits (the single pre-existing doc-comment reference in `active_forgetting.rs` is acceptable and does not introduce a SQL dependency; code references must be zero).
- `git grep -n 'SqliteMemoryStore' crates/vestige-core/src/neuroscience/ crates/vestige-core/src/advanced/` -- zero hits.
- `git grep -n 'fastembed::' crates/vestige-core/src/storage/sqlite.rs` -- zero hits (Storage must never call fastembed directly; embedding goes through the `Embedder` trait held on the caller side).
- `SqliteMemoryStore::insert` refuses a vector whose dimension disagrees with the registered model (returns `MemoryStoreError::InvalidInput`).
- `SqliteMemoryStore::register_model` returns `MemoryStoreError::ModelMismatch` when a second, different signature is provided after a first was already registered.
- After upgrading a V11 database to V12, every pre-existing row has `domains == "[]"` and `domain_scores == "{}"` with no NULLs.
- `#[trait_variant::make(MemoryStore: Send)]` compiles; `Arc<dyn MemoryStore>` is movable across `tokio::spawn`.
- Migration V12 is idempotent on replay: `apply_migrations` rewound to V11, re-applied, succeeds without error.
- `vestige_core::storage::Storage` continues to resolve (via the `pub type` alias) at every current call site in `vestige-mcp`.
- The `embedding_model` table can only hold a single row (programmatic invariant -- verified by an integration test that attempts a second `INSERT INTO embedding_model (id = 1, ...)` and observes the CHECK-enforced uniqueness).
- `registered_model()` is cached on first read; no SELECT is issued against `embedding_model` after the first hit within the same process (verified by wrapping the reader in a counting proxy in a dedicated test).
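The last criterion, caching `registered_model()` after the first hit and verifying it with a counting proxy, can be sketched with `std::sync::OnceLock`. Everything named here (`CachedRegistry`, `select_model`, `selects_after_reads`, the `String` signature) is a hypothetical stand-in; the sketch also assumes that a miss is not cached, so a store with no registered model keeps querying until one appears.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical registry wrapper. `stored` plays the role of the single row
/// in `embedding_model`; `selects` is the counting proxy.
pub struct CachedRegistry {
    stored: Option<String>,
    cached: OnceLock<String>,
    selects: AtomicUsize,
}

impl CachedRegistry {
    pub fn new(stored: Option<&str>) -> Self {
        Self {
            stored: stored.map(str::to_owned),
            cached: OnceLock::new(),
            selects: AtomicUsize::new(0),
        }
    }

    /// Stand-in for the SELECT against `embedding_model`; counts each call.
    fn select_model(&self) -> Option<String> {
        self.selects.fetch_add(1, Ordering::Relaxed);
        self.stored.clone()
    }

    /// Serves from the cache after the first hit; misses are not cached.
    pub fn registered_model(&self) -> Option<String> {
        if let Some(hit) = self.cached.get() {
            return Some(hit.clone());
        }
        let found = self.select_model()?;
        Some(self.cached.get_or_init(|| found).clone())
    }

    pub fn select_count(&self) -> usize {
        self.selects.load(Ordering::Relaxed)
    }
}

/// Reads the model `reads` times and reports how many SELECTs were issued.
pub fn selects_after_reads(stored: Option<&str>, reads: usize) -> usize {
    let registry = CachedRegistry::new(stored);
    for _ in 0..reads {
        let _ = registry.registered_model();
    }
    registry.select_count()
}
```

With a registered model, any number of reads costs one SELECT; with none registered, each read issues its own.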
Rollback Notes
If Phase 1 fails mid-way, rollback granularity is per-deliverable and the DB can be downgraded by SQL.
- D1 (`memory_store.rs`): revert the new file. The trait has zero non-test consumers in Phase 1, so deletion is safe.
- D2 (`storage/mod.rs`): revert to the prior export list. The only forward-facing identifier is the `pub type Storage = SqliteMemoryStore;` alias, which becomes `pub use sqlite::Storage;` again once `SqliteMemoryStore` is renamed back to `Storage`.
- D3 (`sqlite.rs` rename + trait impl): revert the struct rename (`SqliteMemoryStore` -> `Storage`). The trait impl is a separate `impl` block and can be deleted wholesale. Inherent methods are unchanged and do not need to be touched. Net diff on revert: delete one `impl LocalMemoryStore for ...` block plus the two helper functions (`row_to_record`, `enforce_model`).
- D4 (Migration V12): DOWN migration script:
-- Phase 1 rollback: drop Phase 1 schema additions.
-- WARNING: this deletes any `domains` / `domain_scores` values stored under V12.
-- Execute ONLY when downgrading from V12 to V11 on a database where no Phase 4
-- work has happened yet (otherwise you lose domain classifications).
DROP TABLE IF EXISTS domains;
DROP INDEX IF EXISTS idx_nodes_domains;
DROP INDEX IF EXISTS idx_nodes_domain_scores;
-- SQLite does not support DROP COLUMN before 3.35; the project's bundled
-- rusqlite uses 3.45+ (see `bundled-sqlite` feature). So the DROP COLUMN
-- form below is safe on every target platform.
ALTER TABLE knowledge_nodes DROP COLUMN domains;
ALTER TABLE knowledge_nodes DROP COLUMN domain_scores;
DROP TABLE IF EXISTS embedding_model;
UPDATE schema_version SET version = 11, applied_at = datetime('now');
Operationally: the DOWN script is NOT included in the source migrations list (migrations are forward-only). If a rollback is required, it is applied manually via sqlite3 vestige.db < rollback_v12.sql. A backup via storage.backup_to(...) MUST be taken before the Phase 1 migration runs in production -- the Storage::backup_to method already exists (line 3903) and does not need changes.
- D5/D6 (`embedder/`): delete the module. `EmbeddingService` is untouched, so callers that still use it continue to work. The new `Embedder` trait has no pre-Phase-2 consumers.
- D7 (`lib.rs`): revert the re-export additions. Zero downstream impact since the new symbols have no pre-Phase-2 consumers.
- D8 (cognitive module audit): audit-only, no code changes. Nothing to roll back unless `consolidation/sleep.rs` was changed; if so, revert.
- Crate-level considerations:
  - `trait-variant` must remain in `Cargo.toml` until every consumer of the trait alias has been reverted. Safe to leave in `[dependencies]` indefinitely; it has no runtime cost.
  - `blake3` was going to be added in Phase 3 anyway; leaving it in on rollback is harmless.
  - `rusqlite` version stays pinned; no bump required for Phase 1.
Open Implementation Questions
Implementation-choice-only. Architectural questions are resolved in ADR 0001.
- `MemoryRecord` vs `KnowledgeNode` as the trait currency.
  - Candidate A: `MemoryRecord` (new, lean type matching the PRD) -- chosen.
  - Candidate B: use existing `KnowledgeNode` directly.
  - Recommendation: A. `KnowledgeNode` carries 30+ FSRS / dual-strength / sentiment / temporal fields that bind callers to the SQLite columns. `MemoryRecord` is what `PgMemoryStore` and future backends will want. The SQLite impl converts between the two at the boundary, which is a ~40-line `impl From<KnowledgeNode> for MemoryRecord` (and back) shim. Pays for itself in Phase 2.
- `async fn` in traits vs `Box<dyn Future>` via `async-trait`.
  - Candidate A: use `trait-variant` (RPITIT-based, MSRV 1.75+, our MSRV is 1.91).
  - Candidate B: use `async-trait` (allocates one Box per call).
  - Recommendation: A. `trait-variant` generates both the base `LocalMemoryStore` and the `Send`-bound `MemoryStore` from one definition, matches what the PRD explicitly calls out, and avoids the allocation overhead of boxed futures on every CRUD call.
- Blocking SQLite under async signatures: spawn_blocking vs inline.
  - Candidate A: bodies call the existing sync `self.writer.lock()...` inline inside the `async fn`.
  - Candidate B: bodies wrap in `tokio::task::spawn_blocking`.
  - Recommendation: A for Phase 1. The current call sites are a mix of sync (CLI, `bin/restore.rs`) and async (MCP handlers). Introducing `spawn_blocking` would force a tokio runtime even for CLI use. Inline blocking under `async fn` is a documented pattern that compiles and works; under Phase 2 the Postgres impl uses `sqlx`, which is natively async, and we can revisit the SQLite blocking policy at that point. Phase 1 priority is "no behavior change".
- Where does `register_model` get called from: storage-side auto-register, or caller-side explicit?
  - Candidate A: caller explicitly calls `store.register_model(embedder.signature())` once after `MemoryStore::init`.
  - Candidate B: first `insert` with a vector auto-registers.
  - Recommendation: B. The current code path (`Storage::ingest` -> `generate_embedding_for_node` -> INSERT into `node_embeddings`) has no explicit registration step and we want no behavior change. Auto-register on first embedded write preserves the exact current UX. Callers who care (migration tooling, Phase 2 `--reembed`) can still call `register_model` explicitly; it is a no-op when the same signature is already registered.
- `model_hash` content: fastembed ONNX bytes vs `(name, dim, crate_version)`.
  - Candidate A: hash the ONNX file bytes on disk (after model download).
  - Candidate B: hash `(name, dim, vestige-core CARGO_PKG_VERSION)`.
  - Recommendation: B. Fastembed caches ONNX files under `FASTEMBED_CACHE_PATH`; reading them from inside `FastembedEmbedder::new()` couples the embedder to fastembed's caching behavior and slows startup. Hashing `(name, dim, our crate version)` catches the "silent model drift between vestige versions" case the ADR calls out under Risks. Phase 2 can add a content-hashed `OnnxEmbedder` that loads any file and genuinely hashes it; the trait method signature stays the same.
- `LocalMemoryStore`: `Sync + 'static` or just `Sync`.
  - Candidate A: `Sync + 'static`.
  - Candidate B: `Sync`.
  - Recommendation: A. `'static` is required for `Arc<dyn LocalMemoryStore>`, which is the target call pattern (Axum, MCP server, cognitive engine). Every impl we have in mind -- `SqliteMemoryStore`, `PgMemoryStore` -- holds owned state (connection pool, vector index), so `'static` is free.
- Should trait methods appear on the SQLite impl instead of being separate?
  - Candidate A: keep the current ~85 inherent methods on `SqliteMemoryStore` AND add the `impl LocalMemoryStore` block.
  - Candidate B: move every inherent method into the trait.
  - Recommendation: A. Many inherent methods (e.g. `run_rac1_cascade_sweep`, `apply_rac1_cascade`, `save_insight`, `save_connection`, `preview_review`, `get_memory_subgraph`) have SQLite-specific semantics, transactional behavior, and call patterns that do not belong in a backend-agnostic trait. They will stay SQLite-only or be extracted into new traits in a post-Phase-4 cleanup. Phase 1's job is to expose the ~25-method contract the ADR specifies, not to retrofit the entire API.
- Where do `Domain` bytes (centroid) live?
  - Candidate A: `BLOB` column on the `domains` table.
  - Candidate B: JSON-encoded array of f32 in a `TEXT` column.
  - Recommendation: A. Consistent with how `node_embeddings.embedding` already stores vectors (little-endian f32 bytes via `Embedding::to_bytes`). JSON would triple the storage size and slow deserialization. The `Domain::centroid: Vec<f32>` field round-trips through the same codec.
- Migration numbering when Phase 2 also wants to add a migration.
  - Candidate A: Phase 1 takes V12, Phase 2 takes V13.
  - Candidate B: Phase 1 takes V12, Phase 2 re-shapes V12 to include its changes.
  - Recommendation: A. Migrations are forward-only and append-only in this project. Phase 2 adds V13 (for the `review_events` append-only table, if that lands in Phase 2 -- otherwise it is Phase 5 work).
- Integration test crate location: sibling to `tests/e2e/` or inside `crates/vestige-core/tests/`.
  - Candidate A: new workspace member at `tests/phase_1/` (sibling to `tests/e2e/`).
  - Candidate B: under `crates/vestige-core/tests/` (standard cargo integration-test layout).
  - Recommendation: A. Matches the existing pattern of `tests/e2e/`, which is already a workspace member with its own `Cargo.toml`. Keeps the Phase 1 test binary outputs in a predictable location (`target/debug/deps/trait_round_trip-*`). Also avoids the build-graph cycle where `crates/vestige-core/tests/` would re-link everything under `vestige-core` on each edit.
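The inline-blocking option recommended above for SQLite under async signatures can be illustrated with a std-only sketch: an `async fn` trait method whose body does synchronous work under a mutex, driven from a sync caller by a tiny polling loop instead of a tokio runtime. The `CounterStore` trait, `InlineStore`, and `block_on` helper are hypothetical; the polling loop is only appropriate because these futures complete on their first poll. The sketch is generic over the store type rather than using `dyn`, since traits with native `async fn` are not usable as trait objects on stable Rust without an additional boxing layer.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical two-method slice of an async store trait.
#[allow(async_fn_in_trait)]
pub trait CounterStore {
    async fn insert(&self, content: &str) -> usize;
    async fn count(&self) -> usize;
}

/// Stand-in for a store whose work is synchronous (a mutex-guarded Vec here,
/// a mutex-guarded connection in the plan).
pub struct InlineStore {
    rows: Mutex<Vec<String>>,
}

impl CounterStore for InlineStore {
    // The body blocks inline; there is no `.await`, so the guard is never
    // held across a suspension point.
    async fn insert(&self, content: &str) -> usize {
        let mut rows = self.rows.lock().unwrap();
        rows.push(content.to_owned());
        rows.len()
    }

    async fn count(&self) -> usize {
        self.rows.lock().unwrap().len()
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls until ready. Suitable only for futures that do
/// not depend on an external reactor to make progress.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
        std::thread::yield_now();
    }
}

/// Sync entry point, as a CLI might use: no async runtime is constructed.
pub fn insert_from_sync_context<S: CounterStore>(store: &S, contents: &[&str]) -> usize {
    for &content in contents {
        block_on(store.insert(content));
    }
    block_on(store.count())
}

pub fn new_inline_store() -> InlineStore {
    InlineStore { rows: Mutex::new(Vec::new()) }
}
```

The trade-off is visible in the `insert` body: an async caller that awaits it on a runtime thread is blocked for the duration of the lock and the write, which is the cost the plan accepts for Phase 1.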
Critical Files for Implementation
- /home/delandtj/prppl/vestige/crates/vestige-core/src/storage/memory_store.rs (new; contains the `LocalMemoryStore` / `MemoryStore` traits plus `MemoryRecord`, `SchedulingState`, `SearchQuery`, `SearchResult`, `MemoryEdge`, `Domain`, `ClassificationResult`, `StoreStats`, `HealthStatus`, `MemoryStoreError`, `ModelSignature`)
- /home/delandtj/prppl/vestige/crates/vestige-core/src/storage/sqlite.rs (rename `Storage` -> `SqliteMemoryStore`, add the `impl LocalMemoryStore` block and the `enforce_model` / `row_to_record` helpers; ~200 line diff on a 4592-line file)
- /home/delandtj/prppl/vestige/crates/vestige-core/src/storage/migrations.rs (append `Migration { version: 12, ... }` + `MIGRATION_V12_UP` constant; ~80 new lines)
- /home/delandtj/prppl/vestige/crates/vestige-core/src/embedder/mod.rs (new; `Embedder` + `LocalEmbedder` traits, `EmbedderError`, default `embed_batch`)
- /home/delandtj/prppl/vestige/crates/vestige-core/src/embedder/fastembed.rs (new; `FastembedEmbedder` implementation adapting the existing `EmbeddingService`)