fix memory regression: jemalloc, debug endpoints, state eviction, stress tests

Switch brightstaff to jemalloc to fix glibc malloc fragmentation causing
OOM in prod routing service deployments. Add /debug/memstats and
/debug/state_size endpoints for runtime observability. Add TTL eviction
and max_entries cap to MemoryConversationalStorage. Cap tracing span
attributes/events. Include routing stress tests proving zero per-request
leak, and a Python load test for E2E validation.
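
A rough illustration of the TTL eviction and max_entries cap described above. This is a Python sketch, not the brightstaff code: the real MemoryConversationalStorage change is not shown on this page, so the class name `TtlCappedStore`, the injectable `clock`, and the drop-the-oldest-write policy when the cap is hit are all assumptions made for the example.

```python
import time
from collections import OrderedDict


class TtlCappedStore:
    """Hypothetical in-memory store with TTL expiry and an entry cap."""

    def __init__(self, ttl_seconds=1800, max_entries=10000, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = OrderedDict()  # key -> (stored_at, value), oldest write first

    def _evict_expired(self):
        now = self._clock()
        # Entries are ordered by write time, so expired ones sit at the front.
        while self._entries:
            key, (stored_at, _) = next(iter(self._entries.items()))
            if now - stored_at < self.ttl_seconds:
                break
            del self._entries[key]

    def put(self, key, value):
        self._evict_expired()
        self._entries.pop(key, None)  # re-inserting moves the key to the back
        while len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # assumed policy: drop the oldest write
        self._entries[key] = (self._clock(), value)

    def get(self, key):
        self._evict_expired()
        entry = self._entries.get(key)
        return None if entry is None else entry[1]

    def __len__(self):
        return len(self._entries)
```

Either limit alone leaves a gap: a TTL does not bound memory during a burst, and a cap alone keeps idle entries around indefinitely, which is presumably why the commit adds both.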
Adil Hafeez 2026-04-14 15:03:49 -07:00
parent 980faef6be
commit ec5d3660cd
13 changed files with 1550 additions and 1050 deletions

@@ -472,6 +472,14 @@ properties:
   connection_string:
     type: string
     description: Required when type is postgres. Supports environment variable substitution using $VAR or ${VAR} syntax.
+  ttl_seconds:
+    type: integer
+    minimum: 60
+    description: TTL in seconds for in-memory state entries. Only applies when type is memory. Default 1800 (30 min).
+  max_entries:
+    type: integer
+    minimum: 100
+    description: Maximum number of in-memory state entries. Only applies when type is memory. Default 10000.
 additionalProperties: false
 required:
   - type
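
For readers who want the constraints from the schema hunk in executable form, here is a small Python sketch that applies the stated defaults (1800 and 10000) and minimums (60 and 100). The function name `resolve_memory_limits` and the plain-dict input are assumptions for illustration; the project itself presumably enforces these through its schema validator, which is not shown here.

```python
DEFAULT_TTL_SECONDS = 1800   # default stated in the schema description (30 min)
DEFAULT_MAX_ENTRIES = 10000  # default stated in the schema description
MIN_TTL_SECONDS = 60         # schema "minimum" for ttl_seconds
MIN_MAX_ENTRIES = 100        # schema "minimum" for max_entries


def resolve_memory_limits(storage_config):
    """Return (ttl_seconds, max_entries) for a memory store, or None otherwise.

    Hypothetical helper: it mirrors the schema constraints, nothing more.
    """
    if storage_config.get("type") != "memory":
        return None  # both fields only apply when type is memory

    ttl = storage_config.get("ttl_seconds", DEFAULT_TTL_SECONDS)
    cap = storage_config.get("max_entries", DEFAULT_MAX_ENTRIES)

    checks = (
        ("ttl_seconds", ttl, MIN_TTL_SECONDS),
        ("max_entries", cap, MIN_MAX_ENTRIES),
    )
    for name, value, minimum in checks:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if value < minimum:
            raise ValueError(f"{name} must be >= {minimum}")
    return ttl, cap
```

For example, `resolve_memory_limits({"type": "memory", "ttl_seconds": 300})` keeps the custom TTL and falls back to the default entry cap, while a `ttl_seconds` of 30 is rejected because it is below the schema minimum.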