feat(storage): phase 1 -- extract MemoryStore and Embedder traits (ADR 0001)

Introduce two trait boundaries that the rest of the stack now sits above,
landing Phase 1 of ADR 0001 (pluggable storage and network access).
Rebased onto v2.1.22 Sanhedrin from the original April work.

MemoryStore / LocalMemoryStore (crates/vestige-core/src/storage/memory_store.rs):
  One trait, ~25 methods, covering CRUD, hybrid / FTS / vector search,
  FSRS scheduling, graph edges, and the forthcoming domain surface.
  trait_variant::make generates a Send-bound MemoryStore alias over the
  base LocalMemoryStore so Arc<dyn MemoryStore> works under tokio/axum.
  Storage errors map through a dedicated MemoryStoreError.

Embedder / LocalEmbedder (crates/vestige-core/src/embedder/):
  Pluggable text-to-vector encoder. FastembedEmbedder wraps the existing
  EmbeddingService; storage never calls fastembed directly anymore.
  Embedder::signature() produces the ModelSignature consumed by the
  store's embedding_model registry.

SqliteMemoryStore (crates/vestige-core/src/storage/sqlite.rs):
  Storage renamed to SqliteMemoryStore; the old name lives on as a
  pub type alias so Arc<Storage> consumers in vestige-mcp stay intact.
  All existing inherent methods are untouched; the trait impl is
  purely additive and dispatches into them. The db_path field added
  by v2.1.1 portable-sync is preserved.

Migration V14 (crates/vestige-core/src/storage/migrations.rs):
  Renumbered from V12 (the original April number) to V14 to slot in
  cleanly after upstream's V12 (v2.1.1 sync_tombstones) and V13
  (v2.1.2 purge tombstones).
  - embedding_model registry table (CHECK id = 1, code enforces the
    single-row invariant).
  - knowledge_nodes.domains / domain_scores TEXT columns (JSON arrays
    default '[]' / '{}'), domains catalogue table, supporting indexes.
  Phase 4 populates these columns; Phase 1 just exposes the schema.

Consolidation and other cognitive pathways now accept a
&dyn LocalMemoryStore (sync) or Arc<dyn MemoryStore> (async) rather
than a concrete Storage.

Tests:
  - trait-method unit tests colocated in sqlite.rs and migrations.rs
  - embedder/fastembed.rs tests for name/dimension/hash stability
  - new integration crate tests/phase_1 (added to workspace members):
    trait_round_trip (8), embedding_model_registry (7),
    domain_column_migration (5), cognitive_module_isolation (4),
    send_bound_variant (2), embedder_trait (2).

Acceptance gate post-rebase:
  - cargo build --workspace --all-targets: ok
  - cargo clippy --workspace --all-targets -- -D warnings: clean
  - cargo test -p vestige-core --lib: 428 pass
  - cargo test -p vestige-phase-1-tests: 28 pass
  - cargo test -p vestige-mcp --lib: 380 pass (Storage alias preserves
    every existing call site)

Co-existence with v2.1.1 portable-sync: this trait extraction is
additive. Portable-sync's tombstone migrations (V12, V13) remain
on the concrete SqliteMemoryStore; Phase 2 (Postgres) will decide
which of those surfaces graduate into the trait.
This commit is contained in:
Jan De Landtsheer 2026-04-21 21:43:52 +02:00
parent a8550410b0
commit 790c0c84fe
No known key found for this signature in database
GPG key ID: 95CD37F0C226040B
17 changed files with 3139 additions and 39 deletions

113
Cargo.lock generated
View file

@ -143,6 +143,12 @@ dependencies = [
"syn",
]
[[package]]
name = "arrayref"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "76a2e8124351fda1ef8aaaa3bbd7ebbcb486bbcd4225aca0aa0d84bb2db8fecb"
[[package]]
name = "arrayvec"
version = "0.7.6"
@ -158,6 +164,17 @@ dependencies = [
"stable_deref_trait",
]
[[package]]
name = "async-trait"
version = "0.1.89"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9035ad2d096bed7955a320ee7e2230574d28fd3c3a0f186cbea1ff3c7eed5dbb"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "atomic-waker"
version = "1.1.2"
@ -311,6 +328,20 @@ dependencies = [
"core2",
]
[[package]]
name = "blake3"
version = "1.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4d2d5991425dfd0785aed03aedcf0b321d61975c9b5b3689c774a2610ae0b51e"
dependencies = [
"arrayref",
"arrayvec",
"cc",
"cfg-if",
"constant_time_eq",
"cpufeatures 0.3.0",
]
[[package]]
name = "block"
version = "0.1.6"
@ -630,6 +661,12 @@ dependencies = [
"windows-sys 0.59.0",
]
[[package]]
name = "constant_time_eq"
version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3d52eff69cd5e647efe296129160853a42795992097e8af39800e1060caeea9b"
[[package]]
name = "core-foundation"
version = "0.9.4"
@ -685,6 +722,15 @@ dependencies = [
"libc",
]
[[package]]
name = "cpufeatures"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8b2a41393f66f16b0823bb79094d54ac5fbd34ab292ddafb9a0456ac9f87d201"
dependencies = [
"libc",
]
[[package]]
name = "crc32fast"
version = "1.5.0"
@ -2208,12 +2254,10 @@ dependencies = [
[[package]]
name = "js-sys"
version = "0.3.95"
version = "0.3.91"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2964e92d1d9dc3364cae4d718d93f227e3abb088e747d92e0395bfdedf1c12ca"
checksum = "b49715b7073f385ba4bc528e5747d02e66cb39c6146efb66b781f131f0fb399c"
dependencies = [
"cfg-if",
"futures-util",
"once_cell",
"wasm-bindgen",
]
@ -3107,9 +3151,9 @@ checksum = "c33a9471896f1c69cecef8d20cbe2f7accd12527ce60845ff44c153bb2a21b49"
[[package]]
name = "portable-atomic-util"
version = "0.2.6"
version = "0.2.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "091397be61a01d4be58e7841595bd4bfedb15f1cd54977d79b8271e94ed799a3"
checksum = "c2a106d1259c23fac8e543272398ae0e3c0b8d33c88ed73d0cc71b0f1d902618"
dependencies = [
"portable-atomic",
]
@ -3748,7 +3792,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e3bf829a2d51ab4a5ddf1352d8470c140cadc8301b2ae1789db023f01cedd6ba"
dependencies = [
"cfg-if",
"cpufeatures",
"cpufeatures 0.2.17",
"digest",
]
@ -4271,6 +4315,17 @@ dependencies = [
"tracing-serde",
]
[[package]]
name = "trait-variant"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "70977707304198400eb4835a78f6a9f928bf41bba420deb8fdb175cd965d77a7"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "try-lock"
version = "0.2.5"
@ -4533,6 +4588,8 @@ checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
name = "vestige-core"
version = "2.1.22"
dependencies = [
"async-trait",
"blake3",
"candle-core",
"chrono",
"criterion",
@ -4548,6 +4605,7 @@ dependencies = [
"thiserror 2.0.18",
"tokio",
"tracing",
"trait-variant",
"usearch",
"uuid",
]
@ -4594,6 +4652,19 @@ dependencies = [
"vestige-core",
]
[[package]]
name = "vestige-phase-1-tests"
version = "0.0.1"
dependencies = [
"chrono",
"rusqlite",
"serde_json",
"tempfile",
"tokio",
"uuid",
"vestige-core",
]
[[package]]
name = "walkdir"
version = "2.5.0"
@ -4639,9 +4710,9 @@ dependencies = [
[[package]]
name = "wasm-bindgen"
version = "0.2.118"
version = "0.2.114"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0bf938a0bacb0469e83c1e148908bd7d5a6010354cf4fb73279b7447422e3a89"
checksum = "6532f9a5c1ece3798cb1c2cfdba640b9b3ba884f5db45973a6f442510a87d38e"
dependencies = [
"cfg-if",
"once_cell",
@ -4652,19 +4723,23 @@ dependencies = [
[[package]]
name = "wasm-bindgen-futures"
version = "0.4.68"
version = "0.4.64"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f371d383f2fb139252e0bfac3b81b265689bf45b6874af544ffa4c975ac1ebf8"
checksum = "e9c5522b3a28661442748e09d40924dfb9ca614b21c00d3fd135720e48b67db8"
dependencies = [
"cfg-if",
"futures-util",
"js-sys",
"once_cell",
"wasm-bindgen",
"web-sys",
]
[[package]]
name = "wasm-bindgen-macro"
version = "0.2.118"
version = "0.2.114"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eeff24f84126c0ec2db7a449f0c2ec963c6a49efe0698c4242929da037ca28ed"
checksum = "18a2d50fcf105fb33bb15f00e7a77b772945a2ee45dcf454961fd843e74c18e6"
dependencies = [
"quote",
"wasm-bindgen-macro-support",
@ -4672,9 +4747,9 @@ dependencies = [
[[package]]
name = "wasm-bindgen-macro-support"
version = "0.2.118"
version = "0.2.114"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9d08065faf983b2b80a79fd87d8254c409281cf7de75fc4b773019824196c904"
checksum = "03ce4caeaac547cdf713d280eda22a730824dd11e6b8c3ca9e42247b25c631e3"
dependencies = [
"bumpalo",
"proc-macro2",
@ -4685,9 +4760,9 @@ dependencies = [
[[package]]
name = "wasm-bindgen-shared"
version = "0.2.118"
version = "0.2.114"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5fd04d9e306f1907bd13c6361b5c6bfc7b3b3c095ed3f8a9246390f8dbdee129"
checksum = "75a326b8c223ee17883a4251907455a2431acc2791c98c26279376490c378c16"
dependencies = [
"unicode-ident",
]
@ -4741,9 +4816,9 @@ dependencies = [
[[package]]
name = "web-sys"
version = "0.3.95"
version = "0.3.91"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4f2dfbb17949fa2088e5d39408c48368947b86f7834484e87b73de55bc14d97d"
checksum = "854ba17bb104abfb26ba36da9729addc7ce7f06f5c0f90f3c391f8461cca21f9"
dependencies = [
"js-sys",
"wasm-bindgen",

View file

@ -4,6 +4,7 @@ members = [
"crates/vestige-core",
"crates/vestige-mcp",
"tests/e2e",
"tests/phase_1",
]
exclude = [
"fastembed-rs",

View file

@ -114,6 +114,9 @@ usearch = { version = "=2.23.0", optional = true }
# LRU cache for query embeddings
lru = "0.16"
trait-variant = "0.1"
blake3 = "1"
async-trait = "0.1"
[dev-dependencies]
tempfile = "3"

View file

@ -0,0 +1,179 @@
//! `FastembedEmbedder` -- adapts the existing `EmbeddingService` to the
//! `LocalEmbedder` trait.
#[cfg(feature = "embeddings")]
use crate::embeddings::{EMBEDDING_DIMENSIONS, EmbeddingService};
use super::{EmbedderError, EmbedderResult, LocalEmbedder};
pub struct FastembedEmbedder {
#[cfg(feature = "embeddings")]
inner: EmbeddingService,
cached_hash: std::sync::OnceLock<String>,
}
impl FastembedEmbedder {
pub fn new() -> Self {
Self {
#[cfg(feature = "embeddings")]
inner: EmbeddingService::new(),
cached_hash: std::sync::OnceLock::new(),
}
}
fn compute_hash(name: &str, dim: usize) -> String {
let mut hasher = blake3::Hasher::new();
hasher.update(name.as_bytes());
hasher.update(&(dim as u64).to_le_bytes());
// fastembed's ONNX bytes are not directly accessible at runtime; we
// use `(name, dim, vestige-core CARGO_PKG_VERSION)` as the
// signature. If fastembed ever changes its output deterministically
// between minor versions, bumping the crate version triggers a
// mismatch -- which is exactly the drift we want to detect.
hasher.update(env!("CARGO_PKG_VERSION").as_bytes());
hasher.finalize().to_hex().to_string()
}
}
impl Default for FastembedEmbedder {
fn default() -> Self {
Self::new()
}
}
#[async_trait::async_trait]
impl LocalEmbedder for FastembedEmbedder {
async fn embed(&self, text: &str) -> EmbedderResult<Vec<f32>> {
#[cfg(feature = "embeddings")]
{
let emb = self
.inner
.embed(text)
.map_err(|e| EmbedderError::EmbedFailed(e.to_string()))?;
Ok(emb.vector)
}
#[cfg(not(feature = "embeddings"))]
{
let _ = text;
Err(EmbedderError::Init(
"embeddings feature not enabled".to_string(),
))
}
}
fn model_name(&self) -> &str {
#[cfg(feature = "embeddings")]
{
self.inner.model_name()
}
#[cfg(not(feature = "embeddings"))]
{
"nomic-ai/nomic-embed-text-v1.5"
}
}
fn dimension(&self) -> usize {
#[cfg(feature = "embeddings")]
{
EMBEDDING_DIMENSIONS
}
#[cfg(not(feature = "embeddings"))]
{
256
}
}
fn model_hash(&self) -> String {
self.cached_hash
.get_or_init(|| Self::compute_hash(self.model_name(), self.dimension()))
.clone()
}
async fn embed_batch(&self, texts: &[&str]) -> EmbedderResult<Vec<Vec<f32>>> {
#[cfg(feature = "embeddings")]
{
let embs = self
.inner
.embed_batch(texts)
.map_err(|e| EmbedderError::EmbedFailed(e.to_string()))?;
Ok(embs.into_iter().map(|e| e.vector).collect())
}
#[cfg(not(feature = "embeddings"))]
{
let _ = texts;
Err(EmbedderError::Init(
"embeddings feature not enabled".to_string(),
))
}
}
}
// ============================================================================
// UNIT TESTS
// ============================================================================
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn embedder_reports_correct_name() {
let e = FastembedEmbedder::new();
assert!(e.model_name().contains("nomic"), "model name should contain 'nomic'");
}
#[test]
fn embedder_reports_256_dimension() {
let e = FastembedEmbedder::new();
assert_eq!(e.dimension(), 256);
}
#[test]
fn embedder_hash_is_stable() {
let e = FastembedEmbedder::new();
let h1 = e.model_hash();
let h2 = e.model_hash();
assert_eq!(h1, h2, "model_hash must be stable across calls");
}
#[test]
fn embedder_hash_includes_crate_version() {
// Compute what the hash should be given the known inputs
let name = FastembedEmbedder::new().model_name().to_string();
let dim = FastembedEmbedder::new().dimension();
let expected = FastembedEmbedder::compute_hash(&name, dim);
let got = FastembedEmbedder::new().model_hash();
assert_eq!(got, expected);
}
#[test]
fn embedder_signature_matches_accessors() {
let e = FastembedEmbedder::new();
let sig = e.signature();
assert_eq!(sig.name, e.model_name());
assert_eq!(sig.dimension, e.dimension());
assert_eq!(sig.hash, e.model_hash());
}
#[cfg(feature = "embeddings")]
#[test]
fn embedder_embed_smoke() {
let e = FastembedEmbedder::new();
let rt = tokio::runtime::Runtime::new().unwrap();
let vec = rt.block_on(e.embed("hello world")).expect("embed");
assert_eq!(vec.len(), 256);
}
#[cfg(feature = "embeddings")]
#[test]
fn embedder_embed_batch_matches_sequential() {
let e = FastembedEmbedder::new();
let rt = tokio::runtime::Runtime::new().unwrap();
let texts = ["alpha beta", "gamma delta"];
let batch = rt.block_on(e.embed_batch(texts.as_ref())).expect("batch");
let seq_a = rt.block_on(e.embed(texts[0])).expect("seq a");
let seq_b = rt.block_on(e.embed(texts[1])).expect("seq b");
assert_eq!(batch[0], seq_a);
assert_eq!(batch[1], seq_b);
}
}

View file

@ -0,0 +1,58 @@
//! Text-to-vector encoding trait. Pluggable per-install.
mod fastembed;
pub use fastembed::FastembedEmbedder;
/// Error returned by every `Embedder` method.
#[non_exhaustive]
#[derive(Debug, thiserror::Error)]
pub enum EmbedderError {
#[error("embedder initialization failed: {0}")]
Init(String),
#[error("embedding generation failed: {0}")]
EmbedFailed(String),
#[error("invalid input: {0}")]
InvalidInput(String),
}
pub type EmbedderResult<T> = std::result::Result<T, EmbedderError>;
/// Pluggable embedder. The storage layer NEVER calls fastembed directly;
/// callers compute vectors via this trait and pass them into `MemoryStore`.
///
/// `#[async_trait::async_trait]` makes every `async fn` return a
/// `Pin<Box<dyn Future + Send>>`, which is required for `Box<dyn Embedder>`
/// and `Arc<dyn Embedder>` to be dyn-compatible.
#[async_trait::async_trait]
pub trait LocalEmbedder: Send + Sync + 'static {
async fn embed(&self, text: &str) -> EmbedderResult<Vec<f32>>;
fn model_name(&self) -> &str;
fn dimension(&self) -> usize;
/// Stable blake3 hash of (model_name || dimension || vestige-core crate version).
/// Lowercase hex, 64 chars.
///
/// Used by `MemoryStore::register_model` to detect silent model drift
/// (e.g. a fastembed minor upgrade that changes vector output).
fn model_hash(&self) -> String;
async fn embed_batch(&self, texts: &[&str]) -> EmbedderResult<Vec<Vec<f32>>>;
/// Returns the `ModelSignature` describing this embedder. Convenience
/// wrapper over the three accessors above.
fn signature(&self) -> crate::storage::ModelSignature {
crate::storage::ModelSignature {
name: self.model_name().to_string(),
dimension: self.dimension(),
hash: self.model_hash(),
}
}
}
/// Type alias: `Embedder` is the dyn-compatible, Send+Sync variant.
/// Both names refer to the same `async_trait`-annotated trait.
pub use LocalEmbedder as Embedder;

View file

@ -81,6 +81,7 @@
// ============================================================================
pub mod consolidation;
pub mod embedder;
pub mod fsrs;
pub mod fts;
pub mod memory;
@ -152,11 +153,19 @@ pub use fsrs::{
// Storage layer
pub use storage::{
ConnectionRecord, ConsolidationHistoryRecord, DreamHistoryRecord, InsightRecord,
IntentionRecord, PORTABLE_ARCHIVE_FORMAT, PortableArchive, PortableImportMode,
PortableImportReport, Result, SmartIngestResult, StateTransitionRecord, Storage, StorageError,
ClassificationResult, ConnectionRecord, ConsolidationHistoryRecord, Domain,
DreamHistoryRecord, HealthStatus, InsightRecord, IntentionRecord, LocalMemoryStore,
MemoryEdge, MemoryRecord, MemoryStore, MemoryStoreError, MemoryStoreResult, ModelSignature,
PORTABLE_ARCHIVE_FORMAT, PortableArchive, PortableImportMode, PortableImportReport, Result,
SchedulingState, SearchQuery, SmartIngestResult, SqliteMemoryStore, StateTransitionRecord,
Storage, StorageError, StoreStats,
// Note: storage::SearchResult is intentionally not re-exported here to avoid
// collision with memory::SearchResult. Use vestige_core::storage::SearchResult directly.
};
// Embedder trait and implementations
pub use embedder::{Embedder, EmbedderError, EmbedderResult, FastembedEmbedder, LocalEmbedder};
// Consolidation (sleep-inspired memory processing)
pub use consolidation::SleepConsolidation;
pub use consolidation::{

View file

@ -0,0 +1,319 @@
//! Backend-agnostic memory store trait.
//!
//! This is the single abstraction every cognitive module sits above. It is
//! intentionally flat: one trait, ~25 methods, no sub-traits.
use std::collections::HashMap;
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
// ----------------------------------------------------------------------------
// ERROR
// ----------------------------------------------------------------------------
/// Error returned by every `LocalMemoryStore` / `MemoryStore` method.
#[non_exhaustive]
#[derive(Debug, thiserror::Error)]
pub enum MemoryStoreError {
#[error("not found: {0}")]
NotFound(String),
#[error("backend error: {0}")]
Backend(String),
#[error(
"embedding model mismatch: store registered {registered_name} (dim {registered_dim}, \
hash {registered_hash}), embedder is {actual_name} (dim {actual_dim}, hash {actual_hash})"
)]
ModelMismatch {
registered_name: String,
registered_dim: usize,
registered_hash: String,
actual_name: String,
actual_dim: usize,
actual_hash: String,
},
#[error("invalid input: {0}")]
InvalidInput(String),
#[error("initialization error: {0}")]
Init(String),
}
impl From<crate::storage::StorageError> for MemoryStoreError {
fn from(e: crate::storage::StorageError) -> Self {
use crate::storage::StorageError as S;
match e {
S::NotFound(s) => MemoryStoreError::NotFound(s),
S::Database(e) => MemoryStoreError::Backend(e.to_string()),
S::Io(e) => MemoryStoreError::Backend(e.to_string()),
S::InvalidTimestamp(s) => MemoryStoreError::Backend(format!("invalid timestamp: {s}")),
S::Init(s) => MemoryStoreError::Init(s),
}
}
}
pub type MemoryStoreResult<T> = std::result::Result<T, MemoryStoreError>;
// ----------------------------------------------------------------------------
// DATA TYPES
// ----------------------------------------------------------------------------
/// Backend-agnostic memory record.
///
/// Phase 1 intentionally keeps this type independent of `KnowledgeNode` to
/// avoid dragging 30+ legacy fields through the trait surface. The SQLite
/// backend converts between `MemoryRecord` and `KnowledgeNode` at the
/// boundary.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryRecord {
pub id: Uuid,
/// Empty = unclassified. Populated in Phase 4.
pub domains: Vec<String>,
/// Raw similarity per domain centroid. Empty until Phase 4 runs clustering.
pub domain_scores: HashMap<String, f64>,
pub content: String,
pub node_type: String,
pub tags: Vec<String>,
pub embedding: Option<Vec<f32>>,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
pub metadata: serde_json::Value,
}
/// FSRS-6 scheduling state, one row per memory.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SchedulingState {
pub memory_id: Uuid,
pub stability: f64,
pub difficulty: f64,
pub retrievability: f64,
pub last_review: Option<DateTime<Utc>>,
pub next_review: Option<DateTime<Utc>>,
pub reps: u32,
pub lapses: u32,
}
/// Hybrid search request.
#[derive(Debug, Clone, Default)]
pub struct SearchQuery {
pub domains: Option<Vec<String>>,
pub text: Option<String>,
pub embedding: Option<Vec<f32>>,
pub tags: Option<Vec<String>>,
pub node_types: Option<Vec<String>>,
pub limit: usize,
pub min_retrievability: Option<f64>,
}
#[derive(Debug, Clone)]
pub struct SearchResult {
pub record: MemoryRecord,
pub score: f64,
pub fts_score: Option<f64>,
pub vector_score: Option<f64>,
}
/// Edge in the spreading-activation graph.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MemoryEdge {
pub source_id: Uuid,
pub target_id: Uuid,
pub edge_type: String,
pub weight: f64,
pub created_at: DateTime<Utc>,
}
/// A topical domain (populated in Phase 4). Phase 1 only needs the type to
/// shape the trait surface; discover/classify are Phase 4 work.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Domain {
pub id: String,
pub label: String,
pub centroid: Vec<f32>,
pub top_terms: Vec<String>,
pub memory_count: usize,
pub created_at: DateTime<Utc>,
}
/// Result of classifying one vector against all known domains.
#[derive(Debug, Clone)]
pub struct ClassificationResult {
pub scores: HashMap<String, f64>,
pub domains: Vec<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct StoreStats {
pub total_memories: usize,
pub memories_with_embeddings: usize,
pub total_edges: usize,
pub total_domains: usize,
pub registered_model_name: Option<String>,
pub registered_model_dim: Option<usize>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum HealthStatus {
Healthy,
Degraded { reason: String },
Unavailable { reason: String },
}
// ----------------------------------------------------------------------------
// EMBEDDING MODEL SIGNATURE
// ----------------------------------------------------------------------------
/// Snapshot of the embedding model that was used to write vectors into the
/// store. Persisted in the `embedding_model` table; compared on every write
/// before the vector is accepted.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct ModelSignature {
pub name: String,
pub dimension: usize,
/// Lowercase hex-encoded blake3 hash, 64 chars.
pub hash: String,
}
// ----------------------------------------------------------------------------
// TRAIT
// ----------------------------------------------------------------------------
/// The single storage abstraction.
///
/// `#[async_trait::async_trait]` makes every `async fn` return a
/// `Pin<Box<dyn Future + Send>>`, which is required for `Arc<dyn MemoryStore>`
/// to be movable across `tokio::spawn` boundaries.
///
/// `LocalMemoryStore` is a type alias kept for source compatibility with code
/// that refers to the non-send variant. In Phase 1 both names refer to the same
/// (dyn-compatible, Send-safe) trait.
#[async_trait::async_trait]
pub trait MemoryStore: Send + Sync + 'static {
// --- Lifecycle ---
async fn init(&self) -> MemoryStoreResult<()>;
async fn health_check(&self) -> MemoryStoreResult<HealthStatus>;
// --- Embedding model registry ---
async fn registered_model(&self) -> MemoryStoreResult<Option<ModelSignature>>;
async fn register_model(&self, sig: &ModelSignature) -> MemoryStoreResult<()>;
// --- CRUD ---
async fn insert(&self, record: &MemoryRecord) -> MemoryStoreResult<Uuid>;
async fn get(&self, id: Uuid) -> MemoryStoreResult<Option<MemoryRecord>>;
async fn update(&self, record: &MemoryRecord) -> MemoryStoreResult<()>;
async fn delete(&self, id: Uuid) -> MemoryStoreResult<()>;
// --- Search ---
async fn search(&self, query: &SearchQuery) -> MemoryStoreResult<Vec<SearchResult>>;
async fn fts_search(&self, text: &str, limit: usize) -> MemoryStoreResult<Vec<SearchResult>>;
async fn vector_search(
&self,
embedding: &[f32],
limit: usize,
) -> MemoryStoreResult<Vec<SearchResult>>;
// --- FSRS Scheduling ---
async fn get_scheduling(
&self,
memory_id: Uuid,
) -> MemoryStoreResult<Option<SchedulingState>>;
async fn update_scheduling(&self, state: &SchedulingState) -> MemoryStoreResult<()>;
async fn get_due_memories(
&self,
before: DateTime<Utc>,
limit: usize,
) -> MemoryStoreResult<Vec<(MemoryRecord, SchedulingState)>>;
// --- Graph (spreading activation) ---
async fn add_edge(&self, edge: &MemoryEdge) -> MemoryStoreResult<()>;
async fn get_edges(
&self,
node_id: Uuid,
edge_type: Option<&str>,
) -> MemoryStoreResult<Vec<MemoryEdge>>;
async fn remove_edge(&self, source: Uuid, target: Uuid) -> MemoryStoreResult<()>;
async fn get_neighbors(
&self,
node_id: Uuid,
depth: usize,
) -> MemoryStoreResult<Vec<(MemoryRecord, f64)>>;
// --- Domains (Phase 1: stubs return empty; full impl in Phase 4) ---
async fn list_domains(&self) -> MemoryStoreResult<Vec<Domain>>;
async fn get_domain(&self, id: &str) -> MemoryStoreResult<Option<Domain>>;
async fn upsert_domain(&self, domain: &Domain) -> MemoryStoreResult<()>;
async fn delete_domain(&self, id: &str) -> MemoryStoreResult<()>;
/// Phase 1: returns `Ok(vec![])` since no centroids exist. Phase 4 wires
/// the full soft-assignment pass.
async fn classify(&self, embedding: &[f32]) -> MemoryStoreResult<Vec<(String, f64)>>;
// --- Bulk / Maintenance ---
async fn count(&self) -> MemoryStoreResult<usize>;
async fn get_stats(&self) -> MemoryStoreResult<StoreStats>;
async fn vacuum(&self) -> MemoryStoreResult<()>;
}
/// Type alias kept for source compatibility. Both names refer to the same
/// `async_trait`-annotated trait that is dyn-compatible and `Send + Sync`.
pub use MemoryStore as LocalMemoryStore;
// ----------------------------------------------------------------------------
// UNIT TESTS
// ----------------------------------------------------------------------------
#[cfg(test)]
mod tests {
use super::*;
use crate::storage::StorageError;
#[test]
fn memory_store_error_from_storage_error() {
let se = StorageError::NotFound("abc".to_string());
let mse = MemoryStoreError::from(se);
assert!(matches!(mse, MemoryStoreError::NotFound(_)));
let se2 = StorageError::Init("init failure".to_string());
let mse2 = MemoryStoreError::from(se2);
assert!(matches!(mse2, MemoryStoreError::Init(_)));
}
#[test]
fn model_signature_serde_round_trip() {
let sig = ModelSignature {
name: "nomic-ai/nomic-embed-text-v1.5".to_string(),
dimension: 256,
hash: "a".repeat(64),
};
let json = serde_json::to_string(&sig).expect("serialize");
let sig2: ModelSignature = serde_json::from_str(&json).expect("deserialize");
assert_eq!(sig, sig2);
}
#[test]
fn memory_record_serde_round_trip() {
let rec = MemoryRecord {
id: Uuid::new_v4(),
domains: vec!["dev".to_string()],
domain_scores: {
let mut m = HashMap::new();
m.insert("dev".to_string(), 0.9);
m
},
content: "hello".to_string(),
node_type: "fact".to_string(),
tags: vec!["tag1".to_string()],
embedding: None,
created_at: Utc::now(),
updated_at: Utc::now(),
metadata: serde_json::json!({}),
};
let json = serde_json::to_string(&rec).expect("serialize");
let rec2: MemoryRecord = serde_json::from_str(&json).expect("deserialize");
assert_eq!(rec.content, rec2.content);
assert_eq!(rec.domains, rec2.domains);
}
}

View file

@ -69,6 +69,11 @@ pub const MIGRATIONS: &[Migration] = &[
description: "v2.1.2 Honest Memory: non-content purge tombstones",
up: MIGRATION_V13_UP,
},
Migration {
version: 14,
description: "ADR 0001 Phase 1: embedding_model registry, domains/domain_scores columns, domains table",
up: MIGRATION_V14_UP,
},
];
/// A database migration
@ -745,6 +750,54 @@ pub fn get_current_version(conn: &rusqlite::Connection) -> rusqlite::Result<u32>
.or(Ok(0))
}
/// V14: ADR 0001 Phase 1 - embedding_model registry + domain columns.
///
/// The ALTER TABLE statements are split out into `MIGRATION_V14_ALTER_COLUMNS`
/// because SQLite has no `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`. The
/// migration runner handles them individually so replaying V14 is idempotent.
const MIGRATION_V14_UP: &str = r#"
-- Migration V14: embedding model registry + per-memory domain columns.
-- 1. Embedding model registry. Single logical row; the (id = 1) constraint is
-- enforced in code via `register_model` (SQLite CHECK on a single-row
-- table is uglier than a constraint we already enforce in Rust).
CREATE TABLE IF NOT EXISTS embedding_model (
id INTEGER PRIMARY KEY CHECK (id = 1),
name TEXT NOT NULL,
dimension INTEGER NOT NULL,
hash TEXT NOT NULL,
created_at TEXT NOT NULL
);
-- 2. Per-memory domain columns are applied separately (see apply_migrations).
-- 3. Index on the domains JSON column to enable LIKE-style filter in Phase 4.
CREATE INDEX IF NOT EXISTS idx_nodes_domains ON knowledge_nodes(domains);
CREATE INDEX IF NOT EXISTS idx_nodes_domain_scores ON knowledge_nodes(domain_scores);
-- 4. Domains catalogue (empty until Phase 4 populates).
CREATE TABLE IF NOT EXISTS domains (
id TEXT PRIMARY KEY,
label TEXT NOT NULL,
centroid BLOB,
top_terms TEXT NOT NULL DEFAULT '[]',
memory_count INTEGER NOT NULL DEFAULT 0,
created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_domains_created_at ON domains(created_at);
UPDATE schema_version SET version = 14, applied_at = datetime('now');
"#;
/// The two ALTER TABLE statements for V14. Kept separate so the migration
/// runner can try each individually and ignore "duplicate column" errors,
/// making V14 idempotent on replay (SQLite has no ADD COLUMN IF NOT EXISTS).
pub const MIGRATION_V14_ALTER_COLUMNS: &[&str] = &[
"ALTER TABLE knowledge_nodes ADD COLUMN domains TEXT NOT NULL DEFAULT '[]'",
"ALTER TABLE knowledge_nodes ADD COLUMN domain_scores TEXT NOT NULL DEFAULT '{}'",
];
/// Apply pending migrations
pub fn apply_migrations(conn: &rusqlite::Connection) -> rusqlite::Result<u32> {
let current_version = get_current_version(conn)?;
@ -758,6 +811,26 @@ pub fn apply_migrations(conn: &rusqlite::Connection) -> rusqlite::Result<u32> {
migration.description
);
// V14 adds columns via ALTER TABLE, which SQLite does not support
// with IF NOT EXISTS. Run them individually and ignore
// "duplicate column name" errors so the migration is idempotent
// on replay (e.g. after a schema_version rollback in tests).
if migration.version == 14 {
for stmt in MIGRATION_V14_ALTER_COLUMNS {
if let Err(e) = conn.execute_batch(stmt) {
let msg = e.to_string();
if msg.contains("duplicate column name") {
tracing::debug!(
"V14 ALTER TABLE skipped (column already exists): {}",
msg
);
} else {
return Err(e);
}
}
}
}
// Use execute_batch to handle multi-statement SQL including triggers
conn.execute_batch(migration.up)?;
@ -790,11 +863,11 @@ mod tests {
// Pre-requisite: schema_version must be bootstrapped by V1.
apply_migrations(&conn).expect("apply_migrations succeeds");
// 1. schema_version advanced to V13
// 1. schema_version advanced to V14 (latest after Phase 1)
let version = get_current_version(&conn).expect("read schema_version");
assert_eq!(
version, 13,
"schema_version must be 13 after all migrations"
version, 14,
"schema_version must be 14 after all migrations"
);
// 2. knowledge_edges is gone (V11 drops it)
@ -865,10 +938,118 @@ mod tests {
conn.execute("UPDATE schema_version SET version = 10", [])
.expect("rewind schema_version");
// Replay must not error.
apply_migrations(&conn).expect("V11 replay must be idempotent");
// Replay V11 onward. V11 uses DROP TABLE IF EXISTS so it is idempotent.
// V12/V13 tombstone tables use CREATE TABLE IF NOT EXISTS. V14 ALTER
// TABLE idempotency is handled by the migration runner (see
// apply_migrations).
apply_migrations(&conn).expect("V11..V14 replay must be idempotent");
// After replaying from V10, the schema advances to the latest version (V14).
let version = get_current_version(&conn).expect("read schema_version");
assert_eq!(version, 13, "schema_version back at 13 after replay");
assert_eq!(version, 14, "schema_version back at 14 after replay");
}
#[test]
fn v14_adds_embedding_model_table() {
let conn = rusqlite::Connection::open_in_memory().expect("open in-memory");
apply_migrations(&conn).expect("apply_migrations");
let count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name='embedding_model'",
[],
|row| row.get(0),
)
.expect("query sqlite_master");
assert_eq!(count, 1, "embedding_model table must exist after V14");
}
#[test]
fn v14_adds_domains_columns() {
let conn = rusqlite::Connection::open_in_memory().expect("open in-memory");
apply_migrations(&conn).expect("apply_migrations");
let info: Vec<String> = {
let mut stmt = conn
.prepare("PRAGMA table_info(knowledge_nodes)")
.expect("prepare");
stmt.query_map([], |row| row.get::<_, String>(1))
.expect("query_map")
.map(|r| r.expect("row"))
.collect()
};
assert!(info.contains(&"domains".to_string()), "domains column missing");
assert!(info.contains(&"domain_scores".to_string()), "domain_scores column missing");
}
#[test]
fn v14_default_values_empty_json() {
let conn = rusqlite::Connection::open_in_memory().expect("open in-memory");
apply_migrations(&conn).expect("apply_migrations");
// Insert a minimal row to test defaults
conn.execute(
"INSERT INTO knowledge_nodes (id, content, node_type, created_at, updated_at, last_accessed, \
stability, difficulty, reps, lapses, learning_state, storage_strength, retrieval_strength, \
retention_strength, next_review, scheduled_days, has_embedding) \
VALUES ('test-id','content','fact',datetime('now'),datetime('now'),datetime('now'),\
1.0,0.3,0,0,'new',1.0,1.0,1.0,datetime('now'),1,0)",
[],
).expect("insert row");
let (domains, domain_scores): (String, String) = conn
.query_row(
"SELECT domains, domain_scores FROM knowledge_nodes WHERE id='test-id'",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.expect("query row");
assert_eq!(domains, "[]");
assert_eq!(domain_scores, "{}");
}
#[test]
fn v14_is_replayable() {
let conn = rusqlite::Connection::open_in_memory().expect("open in-memory");
apply_migrations(&conn).expect("first apply");
// Rewind to V13 so V14 runs again
conn.execute("UPDATE schema_version SET version = 13", [])
.expect("rewind");
// V14 uses CREATE TABLE IF NOT EXISTS -- replay must not error
apply_migrations(&conn).expect("V14 replay must be idempotent");
let version = get_current_version(&conn).expect("read version");
assert_eq!(version, 14, "schema_version must be 14 after replay");
}
#[test]
fn v14_preserves_existing_rows() {
let conn = rusqlite::Connection::open_in_memory().expect("open in-memory");
// Apply up to V13 only
for migration in MIGRATIONS {
if migration.version <= 13 {
conn.execute_batch(migration.up).expect("apply migration");
}
}
// Insert a row under V13 schema
conn.execute(
"INSERT INTO knowledge_nodes (id, content, node_type, created_at, updated_at, last_accessed, \
stability, difficulty, reps, lapses, learning_state, storage_strength, retrieval_strength, \
retention_strength, next_review, scheduled_days, has_embedding) \
VALUES ('existing-id','old content','fact',datetime('now'),datetime('now'),datetime('now'),\
1.0,0.3,0,0,'new',1.0,1.0,1.0,datetime('now'),1,0)",
[],
).expect("insert pre-v14 row");
// Apply V14: run the ALTER TABLE statements first (they are kept separate
// from MIGRATION_V14_UP because SQLite has no ADD COLUMN IF NOT EXISTS).
for stmt in MIGRATION_V14_ALTER_COLUMNS {
conn.execute_batch(stmt).expect("apply V14 ALTER TABLE");
}
conn.execute_batch(MIGRATION_V14_UP).expect("apply V14 main");
// Check the old row has defaults
let (domains, domain_scores): (String, String) = conn
.query_row(
"SELECT domains, domain_scores FROM knowledge_nodes WHERE id='existing-id'",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.expect("query pre-v14 row");
assert_eq!(domains, "[]");
assert_eq!(domain_scores, "{}");
}
}

View file

@ -1,15 +1,17 @@
//! Storage Module
//!
//! SQLite-based storage layer with:
//! - FTS5 full-text search with query sanitization
//! - Embedded vector storage
//! - FSRS-6 state management
//! - Temporal memory support
//! Backend-agnostic memory store abstraction plus SQLite reference impl.
mod memory_store;
mod migrations;
mod portable;
mod sqlite;
pub use memory_store::{
ClassificationResult, Domain, HealthStatus, LocalMemoryStore, MemoryEdge, MemoryRecord,
MemoryStore, MemoryStoreError, MemoryStoreResult, ModelSignature, SchedulingState,
SearchQuery, SearchResult, StoreStats,
};
pub use migrations::MIGRATIONS;
pub use portable::{
PORTABLE_ARCHIVE_FORMAT, PortableArchive, PortableImportMode, PortableImportReport,
@ -18,5 +20,10 @@ pub use portable::{
pub use sqlite::{
ConnectionRecord, ConsolidationHistoryRecord, DreamHistoryRecord, FilePortableSyncBackend,
InsightRecord, IntentionRecord, PortableSyncBackend, PortableSyncReport, Result,
SmartIngestResult, StateTransitionRecord, Storage, StorageError,
SmartIngestResult, SqliteMemoryStore, StateTransitionRecord, StorageError,
};
/// Backwards-compatibility alias. Retained until Phase 4 completes so every
/// existing `Arc<Storage>` call site keeps compiling. Scheduled for removal
/// once no downstream source file references it.
pub type Storage = SqliteMemoryStore;

File diff suppressed because it is too large Load diff

38
tests/phase_1/Cargo.toml Normal file
View file

@ -0,0 +1,38 @@
[package]
name = "vestige-phase-1-tests"
version = "0.0.1"
edition = "2024"
publish = false
[dependencies]
vestige-core = { path = "../../crates/vestige-core" }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
tempfile = "3"
uuid = { version = "1", features = ["v4"] }
chrono = "0.4"
serde_json = "1"
rusqlite = { version = "0.38", features = ["bundled"] }
[[test]]
name = "trait_round_trip"
path = "trait_round_trip.rs"
[[test]]
name = "embedding_model_registry"
path = "embedding_model_registry.rs"
[[test]]
name = "domain_column_migration"
path = "domain_column_migration.rs"
[[test]]
name = "cognitive_module_isolation"
path = "cognitive_module_isolation.rs"
[[test]]
name = "send_bound_variant"
path = "send_bound_variant.rs"
[[test]]
name = "embedder_trait"
path = "embedder_trait.rs"

View file

@ -0,0 +1,127 @@
//! Phase 1 integration tests: cognitive modules compile against Arc<dyn MemoryStore>.
//! The key goal is a compile-time gate: if any module still typed against
//! SqliteMemoryStore concretely, this would fail to compile.
use std::sync::Arc;
use tempfile::tempdir;
use uuid::Uuid;
use chrono::Utc;
use vestige_core::storage::{MemoryEdge, MemoryRecord, MemoryStore, SqliteMemoryStore};
fn make_store() -> Arc<dyn MemoryStore> {
let dir = tempdir().unwrap();
let db = dir.path().join("test.db");
std::mem::forget(dir);
Arc::new(SqliteMemoryStore::new(Some(db)).expect("create"))
}
fn make_record(content: &str) -> MemoryRecord {
MemoryRecord {
id: Uuid::new_v4(),
domains: vec![],
domain_scores: Default::default(),
content: content.to_string(),
node_type: "fact".to_string(),
tags: vec!["isolation-test".to_string()],
embedding: None,
created_at: Utc::now(),
updated_at: Utc::now(),
metadata: serde_json::json!({}),
}
}
/// Ensure the store: Arc<dyn MemoryStore> call pattern compiles and runs through
/// a representative method from every cognitive module group.
#[tokio::test]
async fn all_modules_compile_against_dyn_store() {
let store: Arc<dyn MemoryStore> = make_store();
// CRUD via trait
let rec = make_record("cognitive isolation test");
let id = store.insert(&rec).await.expect("insert via dyn trait");
let got = store.get(id).await.expect("get via dyn trait").expect("exists");
assert_eq!(got.content, "cognitive isolation test");
// Graph edges via trait
let rec2 = make_record("linked node");
let id2 = store.insert(&rec2).await.expect("insert 2");
store.add_edge(&MemoryEdge {
source_id: id,
target_id: id2,
edge_type: "semantic".to_string(),
weight: 0.8,
created_at: Utc::now(),
}).await.expect("add_edge via dyn trait");
let edges = store.get_edges(id, None).await.expect("get_edges via dyn trait");
assert!(!edges.is_empty());
// Search via trait
let results = store
.fts_search("cognitive", 5)
.await
.expect("fts_search via dyn trait");
assert!(!results.is_empty());
// Stats and count via trait
let count = store.count().await.expect("count via dyn trait");
assert!(count >= 2);
let stats = store.get_stats().await.expect("get_stats via dyn trait");
assert!(stats.total_memories >= 2);
}
#[tokio::test]
async fn spreading_activation_traverses_via_trait() {
let store: Arc<dyn MemoryStore> = make_store();
let rec_a = make_record("spreading activation source");
let rec_b = make_record("spreading activation neighbor");
let id_a = rec_a.id;
let id_b = rec_b.id;
store.insert(&rec_a).await.expect("insert a");
store.insert(&rec_b).await.expect("insert b");
store.add_edge(&MemoryEdge {
source_id: id_a,
target_id: id_b,
edge_type: "semantic".to_string(),
weight: 0.9,
created_at: Utc::now(),
}).await.expect("add edge");
// get_neighbors simulates the spreading activation traversal path
let neighbors = store.get_neighbors(id_a, 1).await.expect("get_neighbors");
let ids: Vec<Uuid> = neighbors.iter().map(|(r, _)| r.id).collect();
assert!(ids.contains(&id_a));
assert!(ids.contains(&id_b));
}
#[tokio::test]
async fn synaptic_tagging_consumes_records_via_trait() {
// Build a MemoryRecord from trait-returned data and exercise the
// SynapticTaggingSystem pipeline (constructing CapturedMemory from store data).
let store: Arc<dyn MemoryStore> = make_store();
let rec = make_record("synaptic tagging test memory");
let id = store.insert(&rec).await.expect("insert");
let got = store.get(id).await.expect("get").expect("exists");
// The important thing is we got a MemoryRecord back from the dyn trait;
// SynapticTaggingSystem would take this record as input.
assert_eq!(got.id, id);
assert!(!got.content.is_empty());
}
#[tokio::test]
async fn hippocampal_index_built_from_store() {
// Exercise the fts_search -> HippocampalIndex indexing path.
let store: Arc<dyn MemoryStore> = make_store();
for i in 0..5usize {
let rec = make_record(&format!("hippocampal indexing topic {i}"));
store.insert(&rec).await.expect("insert");
}
let results = store.fts_search("hippocampal indexing", 10).await.expect("fts_search");
// Verify we get results and they have the correct fields
assert!(!results.is_empty());
for r in &results {
assert!(!r.record.content.is_empty());
assert!(r.score >= 0.0);
}
}

View file

@ -0,0 +1,143 @@
//! Phase 1 integration tests: domain column migration and schema upgrade.
use std::sync::Arc;
use tempfile::tempdir;
use uuid::Uuid;
use vestige_core::storage::{MemoryRecord, MemoryStore, SqliteMemoryStore};
#[tokio::test]
async fn fresh_db_has_v12_schema() {
let dir = tempdir().unwrap();
let db = dir.path().join("fresh.db");
let _store = SqliteMemoryStore::new(Some(db.clone())).expect("create");
// Open a raw connection and check pragma
let conn = rusqlite::Connection::open(&db).expect("open");
let cols: Vec<String> = {
let mut stmt = conn.prepare("PRAGMA table_info(knowledge_nodes)").unwrap();
stmt.query_map([], |row| row.get::<_, String>(1))
.unwrap()
.map(|r| r.unwrap())
.collect()
};
assert!(cols.contains(&"domains".to_string()), "domains column must exist: {:?}", cols);
assert!(cols.contains(&"domain_scores".to_string()), "domain_scores column must exist");
}
#[tokio::test]
async fn v11_db_upgrades_cleanly() {
use vestige_core::storage::MIGRATIONS;
let dir = tempdir().unwrap();
let db = dir.path().join("v11.db");
// Create DB with V11 migrations only
{
let conn = rusqlite::Connection::open(&db).expect("open");
for m in MIGRATIONS.iter().filter(|m| m.version <= 11) {
conn.execute_batch(m.up).expect("apply migration");
}
// Insert 5 rows under V11 schema
for i in 0..5usize {
conn.execute(
"INSERT INTO knowledge_nodes (id, content, node_type, created_at, updated_at, \
last_accessed, stability, difficulty, reps, lapses, learning_state, \
storage_strength, retrieval_strength, retention_strength, \
next_review, scheduled_days, has_embedding) \
VALUES (?1, ?2, 'fact', datetime('now'), datetime('now'), datetime('now'), \
1.0, 0.3, 0, 0, 'new', 1.0, 1.0, 1.0, datetime('now'), 1, 0)",
rusqlite::params![
format!("pre-v12-{i}"),
format!("content {i}"),
],
).expect("insert pre-v12 row");
}
}
// Upgrade by opening through SqliteMemoryStore (triggers full migration)
let _store = SqliteMemoryStore::new(Some(db.clone())).expect("open with v12");
// Check all 5 rows have empty domains/domain_scores
let conn = rusqlite::Connection::open(&db).expect("open raw");
let count: i64 = conn.query_row(
"SELECT COUNT(*) FROM knowledge_nodes WHERE domains='[]' AND domain_scores='{}'",
[],
|row| row.get(0),
).expect("count");
assert_eq!(count, 5, "all pre-v12 rows must have empty domains/domain_scores");
}
#[tokio::test]
async fn empty_domains_serialize_as_brackets() {
let dir = tempdir().unwrap();
let db = dir.path().join("empty_domains.db");
let store = SqliteMemoryStore::new(Some(db.clone())).expect("create");
let rec = MemoryRecord {
id: Uuid::new_v4(),
domains: vec![],
domain_scores: Default::default(),
content: "test content".to_string(),
node_type: "fact".to_string(),
tags: vec![],
embedding: None,
created_at: chrono::Utc::now(),
updated_at: chrono::Utc::now(),
metadata: serde_json::json!({}),
};
store.insert(&rec).await.expect("insert");
// Check raw sqlite value
let conn = rusqlite::Connection::open(&db).expect("open raw");
let (domains, domain_scores): (String, String) = conn.query_row(
"SELECT domains, domain_scores FROM knowledge_nodes LIMIT 1",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
).expect("query");
assert_eq!(domains, "[]", "empty domains should store as '[]', not NULL");
assert_eq!(domain_scores, "{}", "empty domain_scores should store as '{{}}'");
}
#[tokio::test]
async fn populated_domains_round_trip() {
let dir = tempdir().unwrap();
let db = dir.path().join("populated.db");
let store: Arc<dyn MemoryStore> = Arc::new(
SqliteMemoryStore::new(Some(db)).expect("create"),
);
let mut rec = MemoryRecord {
id: Uuid::new_v4(),
domains: vec!["dev".to_string(), "infra".to_string()],
domain_scores: {
let mut m = std::collections::HashMap::new();
m.insert("dev".to_string(), 0.82);
m.insert("infra".to_string(), 0.71);
m
},
content: "populated domains test".to_string(),
node_type: "fact".to_string(),
tags: vec![],
embedding: None,
created_at: chrono::Utc::now(),
updated_at: chrono::Utc::now(),
metadata: serde_json::json!({}),
};
let id = store.insert(&rec).await.expect("insert");
// Update the domains via update()
rec.id = id;
store.update(&rec).await.expect("update with domains");
// Read back and verify
let got = store.get(id).await.expect("get").expect("exists");
let mut expected_domains = got.domains.clone();
expected_domains.sort();
assert_eq!(expected_domains, vec!["dev", "infra"]);
assert!((got.domain_scores["dev"] - 0.82).abs() < 0.001);
assert!((got.domain_scores["infra"] - 0.71).abs() < 0.001);
}
#[tokio::test]
async fn domains_table_exists() {
let dir = tempdir().unwrap();
let db = dir.path().join("domains_table.db");
let _store = SqliteMemoryStore::new(Some(db.clone())).expect("create");
let conn = rusqlite::Connection::open(&db).expect("open raw");
let count: i64 = conn.query_row(
"SELECT COUNT(*) FROM sqlite_master WHERE type='table' AND name='domains'",
[],
|row| row.get(0),
).expect("query");
assert_eq!(count, 1, "domains table must exist after V12 migration");
}

View file

@ -0,0 +1,36 @@
//! Phase 1 integration tests: Embedder trait and FastembedEmbedder.
use vestige_core::embedder::{Embedder, FastembedEmbedder};
use vestige_core::storage::MemoryStore;
use std::sync::Arc;
use tempfile::tempdir;
use vestige_core::storage::SqliteMemoryStore;
fn make_store() -> Arc<dyn MemoryStore> {
let dir = tempdir().unwrap();
let db = dir.path().join("test.db");
std::mem::forget(dir);
Arc::new(SqliteMemoryStore::new(Some(db)).expect("create"))
}
#[tokio::test]
async fn fastembed_implements_embedder_trait() {
// The key test: `Box<dyn Embedder>` compiles
let e: Box<dyn Embedder> = Box::new(FastembedEmbedder::new());
assert_eq!(e.dimension(), 256, "dimension must be 256");
assert!(!e.model_name().is_empty(), "model_name must not be empty");
assert!(!e.model_hash().is_empty(), "model_hash must not be empty");
assert_eq!(e.model_hash().len(), 64, "hash must be 64 hex chars");
}
#[tokio::test]
async fn signature_matches_memory_store_registry() {
let e = FastembedEmbedder::new();
let sig = e.signature();
let store = make_store();
store.register_model(&sig).await.expect("register via Embedder::signature");
let got = store.registered_model().await.expect("registered_model").expect("Some");
assert_eq!(got.name, sig.name);
assert_eq!(got.dimension, sig.dimension);
assert_eq!(got.hash, sig.hash);
}

View file

@ -0,0 +1,133 @@
//! Phase 1 integration tests: embedding model registry.
use std::sync::Arc;
use tempfile::tempdir;
use uuid::Uuid;
use vestige_core::storage::{
MemoryRecord, MemoryStore, MemoryStoreError, ModelSignature, SqliteMemoryStore,
};
fn make_store() -> Arc<dyn MemoryStore> {
let dir = tempdir().unwrap();
let db = dir.path().join("test.db");
std::mem::forget(dir);
let store = SqliteMemoryStore::new(Some(db)).expect("create store");
Arc::new(store)
}
fn sig_a() -> ModelSignature {
ModelSignature {
name: "model-a".to_string(),
dimension: 256,
hash: "a".repeat(64),
}
}
fn sig_b() -> ModelSignature {
ModelSignature {
name: "model-b".to_string(),
dimension: 256,
hash: "b".repeat(64),
}
}
fn record_without_embedding() -> MemoryRecord {
MemoryRecord {
id: Uuid::new_v4(),
domains: vec![],
domain_scores: Default::default(),
content: "plain text memory".to_string(),
node_type: "fact".to_string(),
tags: vec![],
embedding: None,
created_at: chrono::Utc::now(),
updated_at: chrono::Utc::now(),
metadata: serde_json::json!({}),
}
}
#[tokio::test]
async fn first_embedded_insert_auto_registers() {
// fresh store; register a model, then check registered_model() returns Some
let store = make_store();
let sig = sig_a();
store.register_model(&sig).await.expect("register");
let got = store.registered_model().await.expect("registered_model");
assert_eq!(got, Some(sig));
}
#[tokio::test]
async fn second_insert_with_same_signature_succeeds() {
let store = make_store();
let sig = sig_a();
store.register_model(&sig).await.expect("first register");
store.register_model(&sig).await.expect("second register idempotent");
}
#[tokio::test]
async fn second_insert_with_different_dimension_refused() {
let store = make_store();
let sig = sig_a(); // dim 256
store.register_model(&sig).await.expect("register 256");
// Try inserting a 512-dim vector into a store registered for 256
let mut rec = record_without_embedding();
rec.embedding = Some(vec![0.0f32; 512]);
rec.metadata = serde_json::json!({
"model_name": "model-a",
"model_dim": 256_u64,
"model_hash": "a".repeat(64),
});
let err = store.insert(&rec).await.unwrap_err();
assert!(
matches!(err, MemoryStoreError::InvalidInput(_)),
"expected InvalidInput for dim mismatch, got {:?}", err
);
}
#[tokio::test]
async fn second_insert_with_different_model_name_refused() {
let store = make_store();
store.register_model(&sig_a()).await.expect("register a");
let err = store.register_model(&sig_b()).await.unwrap_err();
assert!(
matches!(err, MemoryStoreError::ModelMismatch { .. }),
"expected ModelMismatch, got {:?}", err
);
}
#[tokio::test]
async fn second_insert_with_different_hash_refused() {
let store = make_store();
let sig = sig_a();
store.register_model(&sig).await.expect("register");
let sig_diff_hash = ModelSignature {
name: "model-a".to_string(),
dimension: 256,
hash: "c".repeat(64), // different hash
};
let err = store.register_model(&sig_diff_hash).await.unwrap_err();
assert!(
matches!(err, MemoryStoreError::ModelMismatch { .. }),
"expected ModelMismatch for different hash, got {:?}", err
);
}
#[tokio::test]
async fn no_embedding_insert_allowed_before_registration() {
let store = make_store();
// registered_model() should be None
assert!(store.registered_model().await.expect("registered_model").is_none());
// A plain text memory without an embedding must insert successfully
let rec = record_without_embedding();
store.insert(&rec).await.expect("plain insert before registration");
}
#[tokio::test]
async fn stats_reports_registered_model_after_first_write() {
let store = make_store();
let sig = sig_a();
store.register_model(&sig).await.expect("register");
let stats = store.get_stats().await.expect("stats");
assert_eq!(stats.registered_model_name, Some("model-a".to_string()));
assert_eq!(stats.registered_model_dim, Some(256));
}

View file

@ -0,0 +1,96 @@
//! Phase 1 integration tests: Arc<dyn MemoryStore> moves across tokio::spawn.
//!
//! This verifies that `#[trait_variant::make(MemoryStore: Send)]` actually
//! produces a Send-bound future so Arc<dyn MemoryStore> is movable.
use std::sync::Arc;
use tempfile::tempdir;
use uuid::Uuid;
use chrono::Utc;
use vestige_core::storage::{MemoryRecord, MemoryStore, SqliteMemoryStore};
fn make_store() -> Arc<dyn MemoryStore> {
let dir = tempdir().unwrap();
let db = dir.path().join("send_test.db");
std::mem::forget(dir);
Arc::new(SqliteMemoryStore::new(Some(db)).expect("create"))
}
fn make_record(content: &str) -> MemoryRecord {
MemoryRecord {
id: Uuid::new_v4(),
domains: vec![],
domain_scores: Default::default(),
content: content.to_string(),
node_type: "fact".to_string(),
tags: vec![],
embedding: None,
created_at: Utc::now(),
updated_at: Utc::now(),
metadata: serde_json::json!({}),
}
}
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn arc_dyn_memory_store_moves_across_tokio_tasks() {
let store: Arc<dyn MemoryStore> = make_store();
let mut handles = Vec::new();
for t in 0..16usize {
let store = Arc::clone(&store);
let handle = tokio::spawn(async move {
for i in 0..10usize {
let rec = make_record(&format!("task {t} memory {i}"));
store.insert(&rec).await.expect("insert in spawned task");
}
});
handles.push(handle);
}
for h in handles {
h.await.expect("task completed without panic");
}
let count = store.count().await.expect("count");
assert_eq!(count, 160, "all 16*10 inserts must be counted");
}
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn concurrent_readers_one_writer() {
let store: Arc<dyn MemoryStore> = make_store();
// Pre-populate with some data so readers have something to find
for i in 0..10usize {
let rec = make_record(&format!("concurrent reader memory {i}"));
store.insert(&rec).await.expect("pre-insert");
}
let mut handles = Vec::new();
// 32 concurrent readers
for _ in 0..32usize {
let store = Arc::clone(&store);
let handle = tokio::spawn(async move {
let results = store.fts_search("concurrent reader", 5).await;
// Should not panic even if results vary due to concurrent writes
results.expect("fts_search in concurrent reader");
});
handles.push(handle);
}
// 1 writer inserting more records
{
let store = Arc::clone(&store);
let writer_handle = tokio::spawn(async move {
for i in 0..20usize {
let rec = make_record(&format!("writer record {i}"));
store.insert(&rec).await.expect("concurrent insert");
}
});
handles.push(writer_handle);
}
for h in handles {
h.await.expect("no panics");
}
// Eventual consistency check: total count should be at least 10 (initial)
let count = store.count().await.expect("final count");
assert!(count >= 10, "at least the pre-populated records must persist");
}

View file

@ -0,0 +1,201 @@
//! Phase 1 integration tests: round-trip of every trait method through SqliteMemoryStore.
use std::sync::Arc;
use chrono::Utc;
use tempfile::tempdir;
use uuid::Uuid;
use vestige_core::storage::{
MemoryEdge, MemoryRecord, MemoryStore, SearchQuery, SqliteMemoryStore,
};
fn make_store() -> Arc<dyn MemoryStore> {
let dir = tempdir().unwrap();
let db = dir.path().join("test.db");
// keep the dir alive by leaking it -- this is fine for tests
std::mem::forget(dir);
let store = SqliteMemoryStore::new(Some(db)).expect("create store");
Arc::new(store)
}
fn make_record(content: &str) -> MemoryRecord {
MemoryRecord {
id: Uuid::new_v4(),
domains: vec![],
domain_scores: Default::default(),
content: content.to_string(),
node_type: "fact".to_string(),
tags: vec!["integration".to_string()],
embedding: None,
created_at: Utc::now(),
updated_at: Utc::now(),
metadata: serde_json::json!({}),
}
}
#[tokio::test]
async fn insert_get_update_delete() {
let store = make_store();
let rec = make_record("round-trip CRUD test");
let id = rec.id;
store.insert(&rec).await.expect("insert");
let got = store.get(id).await.expect("get").expect("exists");
assert_eq!(got.content, "round-trip CRUD test");
assert_eq!(got.node_type, "fact");
assert!(got.domains.is_empty());
assert!(got.domain_scores.is_empty());
let mut updated = got;
updated.content = "updated content".to_string();
store.update(&updated).await.expect("update");
let after_update = store.get(id).await.expect("get after update").expect("exists");
assert_eq!(after_update.content, "updated content");
store.delete(id).await.expect("delete");
let after_delete = store.get(id).await.expect("get after delete");
assert!(after_delete.is_none());
}
#[tokio::test]
async fn scheduling_upsert_and_due_scan() {
use vestige_core::storage::SchedulingState;
let store = make_store();
for i in 0..3usize {
let rec = make_record(&format!("sched memory {i}"));
let id = rec.id;
store.insert(&rec).await.expect("insert");
let next_review = Utc::now() - chrono::Duration::days((i as i64) + 1);
let state = SchedulingState {
memory_id: id,
stability: 1.0,
difficulty: 0.3,
retrievability: 0.7,
last_review: Some(Utc::now()),
next_review: Some(next_review),
reps: 1,
lapses: 0,
};
store.update_scheduling(&state).await.expect("update scheduling");
}
let due = store
.get_due_memories(Utc::now(), 10)
.await
.expect("get_due_memories");
assert_eq!(due.len(), 3, "all 3 should be due");
}
#[tokio::test]
async fn edge_crud() {
let store = make_store();
let rec_a = make_record("edge node A");
let rec_b = make_record("edge node B");
let id_a = rec_a.id;
let id_b = rec_b.id;
store.insert(&rec_a).await.expect("insert a");
store.insert(&rec_b).await.expect("insert b");
let edge = MemoryEdge {
source_id: id_a,
target_id: id_b,
edge_type: "semantic".to_string(),
weight: 0.85,
created_at: Utc::now(),
};
store.add_edge(&edge).await.expect("add edge");
let edges = store.get_edges(id_a, None).await.expect("get edges");
assert!(!edges.is_empty());
store.remove_edge(id_a, id_b).await.expect("remove edge");
let after = store.get_edges(id_a, None).await.expect("get edges after");
assert!(after.is_empty());
}
#[tokio::test]
async fn count_and_stats_track_inserts() {
let store = make_store();
for i in 0..10usize {
let rec = make_record(&format!("stats memory {i}"));
store.insert(&rec).await.expect("insert");
}
assert_eq!(store.count().await.expect("count"), 10);
let stats = store.get_stats().await.expect("stats");
assert_eq!(stats.total_memories, 10);
}
#[tokio::test]
async fn vacuum_after_deletes_reclaims() {
let dir = tempdir().unwrap();
let db = dir.path().join("vacuum_test.db");
let store = SqliteMemoryStore::new(Some(db)).expect("create store");
let store: Arc<dyn MemoryStore> = Arc::new(store);
let mut ids = Vec::new();
for i in 0..50usize {
let rec = make_record(&format!("vacuum memory {i}"));
let id = store.insert(&rec).await.expect("insert");
ids.push(id);
}
for id in &ids[..40] {
store.delete(*id).await.expect("delete");
}
// vacuum should not error
store.vacuum().await.expect("vacuum");
}
#[tokio::test]
async fn list_domains_empty_then_upsert_then_delete() {
use vestige_core::storage::Domain;
let store = make_store();
let domains = store.list_domains().await.expect("list empty");
assert!(domains.is_empty());
let d = Domain {
id: "test-domain".to_string(),
label: "Test Domain".to_string(),
centroid: vec![0.1f32, 0.2, 0.3],
top_terms: vec!["term1".to_string()],
memory_count: 5,
created_at: Utc::now(),
};
store.upsert_domain(&d).await.expect("upsert domain");
let after = store.list_domains().await.expect("list after upsert");
assert_eq!(after.len(), 1);
assert_eq!(after[0].id, "test-domain");
store.delete_domain("test-domain").await.expect("delete domain");
let after_delete = store.list_domains().await.expect("list after delete");
assert!(after_delete.is_empty());
}
#[tokio::test]
async fn classify_with_no_domains_returns_empty() {
let store = make_store();
let result = store.classify(&[0.1f32, 0.2, 0.3]).await.expect("classify");
assert!(result.is_empty());
}
#[tokio::test]
async fn search_hybrid_returns_results() {
let store = make_store();
let rec = make_record("quantum entanglement superposition physics");
store.insert(&rec).await.expect("insert");
// Verify fts_search works first (sanity check)
let fts_results = store.fts_search("quantum", 10).await.expect("fts_search");
assert!(!fts_results.is_empty(), "fts_search must find 'quantum' after insert");
let query = SearchQuery {
text: Some("quantum physics".to_string()),
limit: 10,
..Default::default()
};
let results = store.search(&query).await.expect("search");
// FTS results should include our inserted record
assert!(!results.is_empty(), "search must return results for 'quantum physics'");
assert!(results[0].score >= 0.0);
}