fix memory regression: jemalloc, debug endpoints, state eviction, stress tests

Switch brightstaff to jemalloc to fix glibc malloc fragmentation causing
OOM in prod routing service deployments. Add /debug/memstats and
/debug/state_size endpoints for runtime observability. Add TTL eviction
and max_entries cap to MemoryConversationalStorage. Cap tracing span
attributes/events. Include routing stress tests showing no per-request
leak, and a Python load test for end-to-end validation.
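The stress tests themselves are not part of this view. One common way to check a "no per-request leak" property in Rust is to wrap the global allocator, count live heap bytes, and compare the count before and after a burst of requests. The sketch below assumes that approach. `CountingAlloc`, `clean_handler`, `leaky_handler` and `leaked_bytes_after` are hypothetical names, and the wrapper forwards to `System` rather than jemalloc for simplicity:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicIsize, Ordering};

// Hypothetical test harness: counts live heap bytes by wrapping the system
// allocator. It is an illustration, not the allocator this commit installs.
struct CountingAlloc;

static LIVE_BYTES: AtomicIsize = AtomicIsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // SAFETY: the layout is forwarded unchanged to the system allocator.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size() as isize, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` came from `alloc` above with this same layout.
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size() as isize, Ordering::Relaxed);
    }
    // The default `realloc` and `alloc_zeroed` route through the two methods
    // above, so the counter stays consistent.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn live_bytes() -> isize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

/// Hypothetical well-behaved handler: every scratch allocation is freed on return.
fn clean_handler(id: usize) -> usize {
    let body = format!("request-{id}");
    let parts: Vec<&str> = body.split('-').collect();
    parts.len()
}

/// Hypothetical buggy handler: deliberately leaks 64 bytes per call.
/// `black_box` stops the optimizer from eliding the unused allocation.
fn leaky_handler(id: usize) -> usize {
    std::mem::forget(std::hint::black_box(vec![0u8; 64]));
    id
}

/// Runs `n` requests through `handler` and returns the net change in live heap bytes.
fn leaked_bytes_after(n: usize, handler: fn(usize) -> usize) -> isize {
    // Warm-up call so one-time lazy initialization is not counted as a leak.
    std::hint::black_box(handler(0));
    let before = live_bytes();
    for i in 0..n {
        std::hint::black_box(handler(i));
    }
    live_bytes() - before
}
```

With this harness, `leaked_bytes_after(1_000, clean_handler)` should stay at (or very near) zero, while the leaky handler grows by about 64 bytes per request. Measuring a net delta over many requests, rather than a single one, is what makes a per-request leak stand out from one-time setup cost.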
Adil Hafeez 2026-04-14 15:03:49 -07:00
parent 980faef6be
commit ec5d3660cd
13 changed files with 1550 additions and 1050 deletions


@@ -123,6 +123,12 @@ pub struct StateStorageConfig {
    #[serde(rename = "type")]
    pub storage_type: StateStorageType,
    pub connection_string: Option<String>,
    /// TTL in seconds for in-memory state entries (default: 1800 = 30 min).
    /// Only applies when type is `memory`.
    pub ttl_seconds: Option<u64>,
    /// Maximum number of in-memory state entries (default: 10000).
    /// Only applies when type is `memory`.
    pub max_entries: Option<usize>,
}
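This hunk only adds the two limits; the eviction logic in `MemoryConversationalStorage` is not shown. As a rough sketch of how a TTL and an entry cap can work together, assume a `HashMap` keyed by conversation ID that stores an insertion timestamp next to each value. `TtlStore` and its methods are hypothetical, not the crate's API. The defaults match the doc comments above:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Defaults taken from the doc comments on `StateStorageConfig`.
const DEFAULT_TTL_SECONDS: u64 = 1800;
const DEFAULT_MAX_ENTRIES: usize = 10_000;

/// Hypothetical in-memory store; a stand-in, not the real MemoryConversationalStorage.
struct TtlStore<V> {
    ttl: Duration,
    max_entries: usize,
    // key -> (insertion time, value)
    entries: HashMap<String, (Instant, V)>,
}

impl<V> TtlStore<V> {
    /// Resolves the optional config values, falling back to the documented defaults.
    fn new(ttl_seconds: Option<u64>, max_entries: Option<usize>) -> Self {
        TtlStore {
            ttl: Duration::from_secs(ttl_seconds.unwrap_or(DEFAULT_TTL_SECONDS)),
            max_entries: max_entries.unwrap_or(DEFAULT_MAX_ENTRIES),
            entries: HashMap::new(),
        }
    }

    /// Removes every entry whose age has reached the TTL.
    fn evict_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries
            .retain(|_, (inserted, _)| now.duration_since(*inserted) < ttl);
    }

    /// Inserts a value, evicting expired entries first and then, if the store
    /// is still full, the oldest entry (an O(n) scan, acceptable for a sketch).
    fn put(&mut self, key: String, value: V, now: Instant) {
        self.evict_expired(now);
        if self.entries.len() >= self.max_entries && !self.entries.contains_key(&key) {
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (inserted, _))| *inserted)
                .map(|(k, _)| k.clone());
            if let Some(k) = oldest {
                self.entries.remove(&k);
            }
        }
        self.entries.insert(key, (now, value));
    }

    /// Returns the value only if it has not yet expired.
    fn get(&self, key: &str, now: Instant) -> Option<&V> {
        self.entries
            .get(key)
            .filter(|(inserted, _)| now.duration_since(*inserted) < self.ttl)
            .map(|(_, value)| value)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Passing `now` explicitly keeps the sketch deterministic to test. A production store would more likely track recency with an ordered structure so that evicting the oldest entry is not a linear scan.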
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]