mirror of
https://github.com/samvallad33/vestige.git
synced 2026-06-12 20:45:16 +02:00
feat(mcp/system_status): add optional schema_introspection flag (#69)
* feat(mcp/system_status): add optional schema_introspection flag
Adds an optional `schema_introspection: bool` parameter to the
`system_status` MCP tool. When set to true, the response gains a
`schema` block carrying:
- `schemaVersion` (u32) — highest applied migration, mirrors
`Storage::current_schema_version` (now exposed via a typed public
method).
- `schemaVersionAppliedAt` (RFC3339, optional) — timestamp the
current schema_version row was applied.
- `tables` ([{name, rows, columns}]) — per-table row count + column
list, walked over the canonical PORTABLE_USER_DATA_TABLES set so
the surface stays stable across migrations rather than enumerating
arbitrary sqlite_master rows.
- `embeddingNullCount` (i64) — count of knowledge_nodes with NO row
in node_embeddings. Distinct from MemoryStats.nodes_with_embeddings
(which keys off the `has_embedding` flag column), so audit scripts
can detect drift between the flag and the join-based truth.
- `activeEmbeddingModel` (string, optional) + `activeEmbeddingDimensions`
(u32, optional) — mirrors the existing MemoryStats active-model
fields, included here so audits get schema_version + active model
in a single round-trip.
Motivation: external consumers (audit scripts, migration guards,
downstream binary upgrade scripts) currently must read SQLite
directly to learn the schema shape, which couples them to internals
Vestige owns and breaks on every migration. This PR closes that gap
with a first-class MCP surface.
Implementation:
- New `pub fn schema_introspection() -> Result<SchemaIntrospection>`
inherent method on `Storage` (sqlite.rs). Inherent, not on a
trait — schema-walk is SQLite-specific by nature, so this stays
out of any future MemoryStore trait extraction.
- New typed structs `SchemaIntrospection` + `TableIntrospection` in
memory/mod.rs (canonical home alongside MemoryStats), re-exported
from the crate root.
- MCP layer (maintenance.rs) parses `SystemStatusArgs`, conditionally
extends the existing response object with a `schema` key — additive,
default off, response shape unchanged when omitted.
Coupling assessment vs PR #61 (storage-trait-phase1):
This PR adds ONE new public inherent method on `Storage` plus uses
three already-existing private helpers (`current_schema_version`,
`table_exists`, `table_row_count`, `table_columns`). It does NOT
touch the existing inherent method signatures, does NOT add anything
to the prospective `MemoryStore` trait surface, and does NOT modify
any of the ~25 methods #61 lifts into the trait. PR #61 is purely
additive on the trait surface (per its description, `pub type
Storage = SqliteMemoryStore;` preserves all existing call sites);
this PR is additive on the inherent surface. Two purely-additive
changes to disjoint surfaces should rebase cleanly.
Tests:
- system_status_schema_has_schema_introspection_flag (schema
introspection: property present, type=boolean, default=false,
not required)
- system_status_without_schema_flag_omits_schema_block
(backwards-compat: unset/false → no `schema` key)
- system_status_with_schema_flag_emits_schema_block (positive case:
schema block present, schemaVersion >= 13, tables non-empty,
knowledge_nodes row count + columns sane, convenience fields
present)
- system_status_camelcase_alias (#[serde(rename_all="camelCase")] +
alias works for both snake and camel input)
- storage_schema_introspection_method (Storage-layer method tested
directly, independent of MCP)
Closes the second of two gaps surfaced in the knowledge-mgmt-sota-uplift
initiative. Companion to PR #68 (search.tag_prefix). The two PRs are
deliberately decoupled — this one carries the storage-layer surface
extension; the other is MCP-layer-only.
* fix(memory): derive Default on SchemaIntrospection to satisfy clippy
The manual `impl Default for SchemaIntrospection` tripped
`clippy::derivable_impls` under the workspace's `-D warnings` CI gate.
All fields are types with `Default` impls (`u32`, `Option<T>`, `Vec<T>`,
`i64`), so deriving is equivalent and clippy-clean. Matches the existing
style of `ConsolidationResult` further down in the same file.
This commit is contained in:
parent
5aa261398d
commit
b01269db22
4 changed files with 367 additions and 4 deletions
|
|
@ -127,9 +127,11 @@ pub use memory::{
|
|||
MemorySystem,
|
||||
NodeType,
|
||||
RecallInput,
|
||||
SchemaIntrospection,
|
||||
SearchMode,
|
||||
SearchResult,
|
||||
SimilarityResult,
|
||||
TableIntrospection,
|
||||
TemporalRange,
|
||||
};
|
||||
|
||||
|
|
|
|||
|
|
@ -276,6 +276,59 @@ impl Default for MemoryStats {
|
|||
}
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// SCHEMA INTROSPECTION (v2.1.24+: surfaces DB shape to MCP consumers)
|
||||
// ============================================================================
|
||||
|
||||
/// A single SQLite table's introspected shape: name, row count, column list.
|
||||
///
|
||||
/// Returned as part of `SchemaIntrospection` from `Storage::schema_introspection()`.
|
||||
/// Consumers needing more depth (e.g. per-column NULL counts) should request
|
||||
/// targeted methods rather than expecting this struct to grow unboundedly —
|
||||
/// the row + column shape covered here is the 80% case for audit / migration
|
||||
/// guard scripts.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
#[serde(rename_all = "camelCase")]
|
||||
pub struct TableIntrospection {
|
||||
/// SQLite table name.
|
||||
pub name: String,
|
||||
/// Row count.
|
||||
pub rows: i64,
|
||||
/// Column names in declaration order.
|
||||
pub columns: Vec<String>,
|
||||
}
|
||||
|
||||
/// Result of `Storage::schema_introspection()`. Snapshots the schema version,
|
||||
/// migration timestamp, and a row/column view of every user-data table.
|
||||
///
|
||||
/// Motivation: external consumers (audit scripts, migration guards, downstream
|
||||
/// upgrade scripts) currently must read SQLite directly to learn the schema
|
||||
/// version and table shape, which couples them to internal layout. This struct
|
||||
/// gives them a first-class MCP-callable surface. The list of tables walked is
|
||||
/// intentionally the same canonical set used elsewhere in storage (the user-
|
||||
/// data tables) so the surface stays stable across migrations.
|
||||
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
|
||||
#[serde(rename_all = "camelCase")]
|
||||
pub struct SchemaIntrospection {
|
||||
/// Current schema version (highest applied migration; matches the
|
||||
/// `schema_version` table's MAX(version)).
|
||||
pub schema_version: u32,
|
||||
/// When the current schema version was applied (RFC3339), if known.
|
||||
pub schema_version_applied_at: Option<DateTime<Utc>>,
|
||||
/// Per-table introspection rows.
|
||||
pub tables: Vec<TableIntrospection>,
|
||||
/// Total number of nodes whose `embeddings.embedding` is NULL (i.e., have
|
||||
/// no embedding row). Convenience field for embedding-coverage audits;
|
||||
/// equivalent to (knowledge_nodes.rows − rows in `embeddings` joined to
|
||||
/// knowledge_nodes), so consumers don't have to compute it themselves.
|
||||
pub embedding_null_count: i64,
|
||||
/// Active embedding model name (mirrors `MemoryStats.active_embedding_model`).
|
||||
/// Useful when an audit script wants schema_version + active model in one call.
|
||||
pub active_embedding_model: Option<String>,
|
||||
/// Embedding dimensions for the active model, if known.
|
||||
pub active_embedding_dimensions: Option<u32>,
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// CONSOLIDATION RESULT
|
||||
// ============================================================================
|
||||
|
|
|
|||
|
|
@ -1890,6 +1890,103 @@ impl Storage {
|
|||
})
|
||||
}
|
||||
|
||||
/// Introspect the live SQLite schema: schema version + per-table row/column
|
||||
/// shape + embedding-coverage convenience fields.
|
||||
///
|
||||
/// This is the v2.1.24+ replacement for the direct-SQLite reads that
|
||||
/// audit scripts and migration guards previously had to perform. The set
|
||||
/// of tables walked matches `PORTABLE_USER_DATA_TABLES` — the same
|
||||
/// canonical set used by portable export / import — so the surface stays
|
||||
/// stable across migrations rather than chasing arbitrary
|
||||
/// `sqlite_master` rows.
|
||||
///
|
||||
/// Cost: O(N_tables) `COUNT(*)` queries + one PRAGMA per table. Negligible
|
||||
/// at the table cardinalities Vestige carries (~15 tables, all indexed).
|
||||
/// Safe to call on every MCP `system_status` invocation when the flag is
|
||||
/// set; callers wanting to limit cost should leave the flag off (default).
|
||||
pub fn schema_introspection(&self) -> Result<crate::SchemaIntrospection> {
|
||||
let reader = self
|
||||
.reader
|
||||
.lock()
|
||||
.map_err(|_| StorageError::Init("Reader lock poisoned".into()))?;
|
||||
|
||||
let schema_version = Self::current_schema_version(&reader)?;
|
||||
|
||||
// schema_version has the row (version PK + applied_at TEXT). Read the
|
||||
// applied_at for the current version row; tolerate failure (legacy
|
||||
// databases may have skipped the applied_at fill on early upgrades).
|
||||
let applied_at_str: Option<String> = reader
|
||||
.query_row(
|
||||
"SELECT applied_at FROM schema_version WHERE version = ?1",
|
||||
params![schema_version as i64],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.optional()?;
|
||||
let schema_version_applied_at = applied_at_str.and_then(|s| {
|
||||
// The migration scripts use `datetime('now')` which yields
|
||||
// SQLite's "YYYY-MM-DD HH:MM:SS" UTC form (NOT RFC3339).
|
||||
// Try the SQLite form first, fall back to RFC3339 for any
|
||||
// future migrations that switch.
|
||||
chrono::NaiveDateTime::parse_from_str(&s, "%Y-%m-%d %H:%M:%S")
|
||||
.map(|naive| naive.and_utc())
|
||||
.or_else(|_| {
|
||||
DateTime::parse_from_rfc3339(&s)
|
||||
.map(|dt| dt.with_timezone(&Utc))
|
||||
})
|
||||
.ok()
|
||||
});
|
||||
|
||||
let mut tables = Vec::with_capacity(PORTABLE_USER_DATA_TABLES.len());
|
||||
for table_name in PORTABLE_USER_DATA_TABLES {
|
||||
if Self::table_exists(&reader, table_name)? {
|
||||
let rows = Self::table_row_count(&reader, table_name)?;
|
||||
let columns = Self::table_columns(&reader, table_name)?;
|
||||
tables.push(crate::TableIntrospection {
|
||||
name: (*table_name).to_string(),
|
||||
rows,
|
||||
columns,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Convenience: embedding-coverage NULL count. Defined as the number
|
||||
// of knowledge_nodes with NO matching row in node_embeddings. This is
|
||||
// distinct from `nodes_with_embeddings` in MemoryStats (which uses
|
||||
// the `has_embedding` column flag); we compute the join-based truth
|
||||
// here so audit scripts can detect drift between the flag and the
|
||||
// actual embeddings table.
|
||||
let embedding_null_count: i64 = reader
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM knowledge_nodes kn
|
||||
WHERE NOT EXISTS (
|
||||
SELECT 1 FROM node_embeddings ne WHERE ne.node_id = kn.id
|
||||
)",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap_or(0);
|
||||
|
||||
#[cfg(feature = "embeddings")]
|
||||
let active_embedding_model = Some(self.embedding_service.model_name().to_string());
|
||||
#[cfg(not(feature = "embeddings"))]
|
||||
let active_embedding_model: Option<String> = None;
|
||||
|
||||
#[cfg(feature = "embeddings")]
|
||||
let active_embedding_dimensions: Option<u32> =
|
||||
Some(self.embedding_service.dimensions() as u32);
|
||||
#[cfg(not(feature = "embeddings"))]
|
||||
let active_embedding_dimensions: Option<u32> = None;
|
||||
|
||||
Ok(crate::SchemaIntrospection {
|
||||
schema_version,
|
||||
schema_version_applied_at,
|
||||
tables,
|
||||
embedding_null_count,
|
||||
active_embedding_model,
|
||||
active_embedding_dimensions,
|
||||
})
|
||||
}
|
||||
|
||||
/// Delete a node
|
||||
pub fn delete_node(&self, id: &str) -> Result<bool> {
|
||||
let mut writer = self
|
||||
|
|
|
|||
|
|
@ -105,10 +105,24 @@ pub fn gc_schema() -> Value {
|
|||
pub fn system_status_schema() -> Value {
|
||||
serde_json::json!({
|
||||
"type": "object",
|
||||
"properties": {}
|
||||
"properties": {
|
||||
"schema_introspection": {
|
||||
"type": "boolean",
|
||||
"description": "When true, extends the response with a 'schema' block carrying the SQLite schema version, per-table row counts + column lists, and embedding-coverage convenience fields. Default: false (response shape unchanged). Use this for audit / migration-guard / downstream-upgrade scripts that otherwise have to read SQLite directly.",
|
||||
"default": false
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
/// Arguments for the system_status tool. All optional.
|
||||
#[derive(Debug, Default, Deserialize)]
|
||||
#[serde(rename_all = "camelCase")]
|
||||
struct SystemStatusArgs {
|
||||
#[serde(alias = "schema_introspection")]
|
||||
schema_introspection: Option<bool>,
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// EXECUTE FUNCTIONS
|
||||
// ============================================================================
|
||||
|
|
@ -117,11 +131,24 @@ pub fn system_status_schema() -> Value {
|
|||
///
|
||||
/// Returns system health status, full statistics, FSRS preview,
|
||||
/// cognitive module health, state distribution, and actionable recommendations.
|
||||
///
|
||||
/// v2.1.24+: when `schema_introspection: true` is passed, the response
|
||||
/// additionally carries a `schema` block with the live SQLite schema version,
|
||||
/// per-table row counts + column lists, and embedding-coverage convenience
|
||||
/// fields. Default off; response shape unchanged when omitted.
|
||||
pub async fn execute_system_status(
|
||||
storage: &Arc<Storage>,
|
||||
cognitive: &Arc<Mutex<CognitiveEngine>>,
|
||||
_args: Option<Value>,
|
||||
args: Option<Value>,
|
||||
) -> Result<Value, String> {
|
||||
// Parse arguments (all optional, including the args envelope itself).
|
||||
let parsed: SystemStatusArgs = match args {
|
||||
Some(v) => serde_json::from_value(v)
|
||||
.map_err(|e| format!("Invalid arguments: {}", e))?,
|
||||
None => SystemStatusArgs::default(),
|
||||
};
|
||||
let include_schema = parsed.schema_introspection.unwrap_or(false);
|
||||
|
||||
let stats = storage.get_stats().map_err(|e| e.to_string())?;
|
||||
|
||||
// === Health assessment ===
|
||||
|
|
@ -259,7 +286,7 @@ pub async fn execute_system_status(
|
|||
};
|
||||
let last_backup = storage.last_backup_timestamp();
|
||||
|
||||
Ok(serde_json::json!({
|
||||
let mut response = serde_json::json!({
|
||||
"tool": "system_status",
|
||||
// Health
|
||||
"status": status,
|
||||
|
|
@ -299,7 +326,34 @@ pub async fn execute_system_status(
|
|||
"lastBackupTimestamp": last_backup.map(|dt| dt.to_rfc3339()),
|
||||
"lastConsolidationTimestamp": last_consolidation.map(|dt| dt.to_rfc3339()),
|
||||
},
|
||||
}))
|
||||
});
|
||||
|
||||
// v2.1.24+: optional schema introspection block. Default off; response
|
||||
// shape unchanged when omitted.
|
||||
if include_schema {
|
||||
let intro = storage.schema_introspection().map_err(|e| e.to_string())?;
|
||||
let tables_json: Vec<Value> = intro
|
||||
.tables
|
||||
.iter()
|
||||
.map(|t| {
|
||||
serde_json::json!({
|
||||
"name": t.name,
|
||||
"rows": t.rows,
|
||||
"columns": t.columns,
|
||||
})
|
||||
})
|
||||
.collect();
|
||||
response["schema"] = serde_json::json!({
|
||||
"schemaVersion": intro.schema_version,
|
||||
"schemaVersionAppliedAt": intro.schema_version_applied_at.map(|dt| dt.to_rfc3339()),
|
||||
"tables": tables_json,
|
||||
"embeddingNullCount": intro.embedding_null_count,
|
||||
"activeEmbeddingModel": intro.active_embedding_model,
|
||||
"activeEmbeddingDimensions": intro.active_embedding_dimensions,
|
||||
});
|
||||
}
|
||||
|
||||
Ok(response)
|
||||
}
|
||||
|
||||
/// Consolidate tool
|
||||
|
|
@ -792,6 +846,163 @@ mod tests {
|
|||
assert!(triggers["lastDreamTimestamp"].is_null());
|
||||
}
|
||||
|
||||
// ========================================================================
|
||||
// SCHEMA INTROSPECTION TESTS (PR2)
|
||||
// ========================================================================
|
||||
|
||||
#[test]
|
||||
fn test_system_status_schema_has_schema_introspection_flag() {
|
||||
let schema = system_status_schema();
|
||||
let props = &schema["properties"];
|
||||
let flag = &props["schema_introspection"];
|
||||
assert!(flag.is_object(), "schema_introspection property must exist");
|
||||
assert_eq!(flag["type"], "boolean");
|
||||
assert_eq!(flag["default"], false);
|
||||
// Top-level required must NOT include this — flag is opt-in.
|
||||
let required = schema.get("required");
|
||||
if let Some(req) = required {
|
||||
let req_arr = req.as_array().unwrap();
|
||||
assert!(!req_arr.contains(&serde_json::json!("schema_introspection")));
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_system_status_without_schema_flag_omits_schema_block() {
|
||||
// Backwards-compat: when the flag is not set (or false), the response
|
||||
// shape is unchanged — no `schema` key.
|
||||
let (storage, _dir) = test_storage().await;
|
||||
let result = execute_system_status(&storage, &test_cognitive(), None).await;
|
||||
assert!(result.is_ok());
|
||||
let value = result.unwrap();
|
||||
assert!(
|
||||
value.get("schema").is_none(),
|
||||
"schema block must NOT be present when flag is unset, got {:?}",
|
||||
value.get("schema")
|
||||
);
|
||||
|
||||
// Explicit false → still no schema block.
|
||||
let result = execute_system_status(
|
||||
&storage,
|
||||
&test_cognitive(),
|
||||
Some(serde_json::json!({ "schema_introspection": false })),
|
||||
)
|
||||
.await;
|
||||
assert!(result.is_ok());
|
||||
let value = result.unwrap();
|
||||
assert!(value.get("schema").is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_system_status_with_schema_flag_emits_schema_block() {
|
||||
let (storage, _dir) = test_storage().await;
|
||||
storage
|
||||
.ingest(vestige_core::IngestInput {
|
||||
content: "Schema introspection seed memory".to_string(),
|
||||
node_type: "fact".to_string(),
|
||||
source: None,
|
||||
sentiment_score: 0.0,
|
||||
sentiment_magnitude: 0.0,
|
||||
tags: vec!["schema-test".to_string()],
|
||||
valid_from: None,
|
||||
valid_until: None,
|
||||
})
|
||||
.unwrap();
|
||||
|
||||
let result = execute_system_status(
|
||||
&storage,
|
||||
&test_cognitive(),
|
||||
Some(serde_json::json!({ "schema_introspection": true })),
|
||||
)
|
||||
.await;
|
||||
assert!(result.is_ok(), "{:?}", result);
|
||||
let value = result.unwrap();
|
||||
|
||||
// Shape assertions.
|
||||
let schema_block = value
|
||||
.get("schema")
|
||||
.expect("schema block must be present when flag is true");
|
||||
assert!(schema_block.is_object());
|
||||
assert!(
|
||||
schema_block["schemaVersion"].is_number(),
|
||||
"schemaVersion must be a number, got {:?}",
|
||||
schema_block["schemaVersion"]
|
||||
);
|
||||
// Schema version should be >= 13 (V13 is the highest landed migration
|
||||
// at the time this PR was authored).
|
||||
let v = schema_block["schemaVersion"].as_u64().unwrap();
|
||||
assert!(v >= 13, "expected schema_version >= 13, got {}", v);
|
||||
|
||||
// tables should be a non-empty array of {name, rows, columns}.
|
||||
let tables = schema_block["tables"].as_array().unwrap();
|
||||
assert!(!tables.is_empty(), "expected at least one table");
|
||||
let kn = tables
|
||||
.iter()
|
||||
.find(|t| t["name"] == "knowledge_nodes")
|
||||
.expect("knowledge_nodes table must be present");
|
||||
assert_eq!(kn["rows"], 1, "ingested exactly one memory");
|
||||
let cols = kn["columns"].as_array().unwrap();
|
||||
assert!(!cols.is_empty(), "knowledge_nodes must have columns");
|
||||
// The id column is universally present.
|
||||
let col_names: Vec<&str> = cols.iter().filter_map(|c| c.as_str()).collect();
|
||||
assert!(
|
||||
col_names.contains(&"id"),
|
||||
"knowledge_nodes.id must be in columns list: {:?}",
|
||||
col_names
|
||||
);
|
||||
|
||||
// Convenience fields.
|
||||
assert!(schema_block["embeddingNullCount"].is_number());
|
||||
// activeEmbeddingModel may be null if the `embeddings` feature is
|
||||
// not enabled in the test build; just check the key exists.
|
||||
assert!(schema_block.get("activeEmbeddingModel").is_some());
|
||||
assert!(schema_block.get("activeEmbeddingDimensions").is_some());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_system_status_camelcase_alias() {
|
||||
// Accept both `schema_introspection` (snake) and `schemaIntrospection`
|
||||
// (camel) per the #[serde(rename_all = "camelCase")] + alias attr.
|
||||
let (storage, _dir) = test_storage().await;
|
||||
let result = execute_system_status(
|
||||
&storage,
|
||||
&test_cognitive(),
|
||||
Some(serde_json::json!({ "schemaIntrospection": true })),
|
||||
)
|
||||
.await;
|
||||
assert!(result.is_ok(), "{:?}", result);
|
||||
let value = result.unwrap();
|
||||
assert!(
|
||||
value.get("schema").is_some(),
|
||||
"camelCase form must also trigger schema block"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_storage_schema_introspection_method() {
|
||||
// Direct test on the Storage method, independent of the MCP layer.
|
||||
let dir = TempDir::new().unwrap();
|
||||
let storage = Storage::new(Some(dir.path().join("test.db"))).unwrap();
|
||||
let intro = storage
|
||||
.schema_introspection()
|
||||
.expect("schema_introspection must succeed on a fresh DB");
|
||||
|
||||
// Schema version pulled from the schema_version table.
|
||||
assert!(
|
||||
intro.schema_version >= 13,
|
||||
"fresh DB should be at schema_version >= 13, got {}",
|
||||
intro.schema_version
|
||||
);
|
||||
// At least one walked table should exist.
|
||||
assert!(
|
||||
!intro.tables.is_empty(),
|
||||
"expected at least one user-data table"
|
||||
);
|
||||
// Empty DB → no embeddings → embedding_null_count == 0 (no rows to
|
||||
// count). Once we ingest, it should be > 0 (no embeddings generated
|
||||
// in tests by default).
|
||||
assert_eq!(intro.embedding_null_count, 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_portable_export_writes_archive_to_storage_exports_dir() {
|
||||
let (storage, _dir) = test_storage().await;
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue