Mirror search_threads visibility in the referenced-chat resolver: a
search-space owner can now @-mention legacy threads that predate creator
tracking (null created_by_id), instead of those being silently dropped.
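
A minimal sketch of the legacy-thread rule described above. The `Thread` shape, the `can_reference` name, and the creator-only rule for non-legacy threads are assumptions made for illustration; the real resolver mirrors whatever search_threads enforces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    id: int
    # None for legacy threads that predate creator tracking.
    created_by_id: Optional[str]


def can_reference(thread: Thread, user_id: str, is_space_owner: bool) -> bool:
    """Fail closed: grant access only when a rule explicitly allows it."""
    if thread.created_by_id is None:
        # Legacy thread: no creator recorded, so only the
        # search-space owner may @-mention it.
        return is_space_owner
    # Assumption: a non-legacy thread is visible to its creator.
    return thread.created_by_id == user_id
```

With this shape, a legacy thread is no longer dropped for the owner, while every other caller still gets a denial.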
Add the referenced_chat_context slice: models for the data shapes, a
fail-closed resolver that fetches mentioned threads and their visible
turns under the same access rules as thread search, and a transcript
renderer that emits a budgeted <referenced_chat_context> block. When a
chat exceeds the per-reference character budget, recent turns are kept
and any leftover budget is filled with the overflowing turn's tail, with
truncation markers signalling the cut.
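
The budgeting behaviour could look roughly like the sketch below. The function name, the marker text, and the choice not to count the marker against the budget are all assumptions; the actual renderer may differ on each point.

```python
from typing import List

TRUNCATION_MARKER = "[...truncated...]"  # hypothetical marker text


def fit_turns(turns: List[str], budget: int) -> List[str]:
    """Keep the most recent turns within `budget` characters.

    Walks from newest to oldest. The first turn that does not fit is cut
    down to its tail so it uses up the remaining budget, and every older
    turn is dropped. Marker text is not counted against the budget here.
    """
    kept: List[str] = []
    remaining = budget
    for index in range(len(turns) - 1, -1, -1):
        turn = turns[index]
        if len(turn) <= remaining:
            kept.append(turn)
            remaining -= len(turn)
            continue
        # Overflowing turn: keep only its tail. Guard remaining == 0,
        # because turn[-0:] would return the whole string.
        tail = turn[-remaining:] if remaining > 0 else ""
        kept.append(TRUNCATION_MARKER + tail)
        break
    kept.reverse()  # restore chronological order
    return kept
```

For example, `fit_turns(["aaaaaaaaaa", "bbbbb", "cc"], 10)` keeps the two newest turns whole and spends the remaining three characters on the tail of the oldest one.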
Some OpenAI-compatible image backends (e.g. Xinference) return a relative
URL like /files/image.png in data[0].url instead of an absolute one.
Browsers resolve these against the page's own origin rather than the
backend's, so the images fail to load.
Track the provider's api_base after resolving model config via to_litellm().
When the returned URL starts with "/", extract the origin (scheme + host + port)
from api_base and prepend it to produce a full absolute URL.
No behaviour change for providers that return absolute URLs (OpenAI, Azure, etc.).
Closes #1496
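
A rough sketch of the URL fix-up, using only the standard library. The helper name `absolutize_image_url` is hypothetical, and the fallback of returning the URL unchanged when api_base is missing or malformed is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def absolutize_image_url(url: str, api_base: Optional[str]) -> str:
    """Prefix a root-relative URL with the origin taken from api_base."""
    if not url.startswith("/") or not api_base:
        # Absolute URL, or nothing to resolve against: leave untouched.
        return url
    parts = urlsplit(api_base)
    if not parts.scheme or not parts.netloc:
        # api_base has no usable origin; leave the URL as returned.
        return url
    # netloc is host[:port] for a typical api_base, so this drops any
    # path such as /v1 and keeps only scheme + host + port.
    return f"{parts.scheme}://{parts.netloc}{url}"
```

With an api_base of `http://localhost:9997/v1`, the relative `/files/image.png` becomes `http://localhost:9997/files/image.png`, while an already absolute URL passes through unchanged.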
Presentation and citation ordering moves off Chunk.id/created_at to the
explicit position column (id kept as tiebreaker). Vector and ts_rank
ranking order_by clauses are untouched.
document_converters, the github size-fallback chunker, revert_service
restores, and the kb-persistence middleware now write explicit positions
(the middleware read path also orders by position).
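
The new ordering can be illustrated in plain Python. The `Chunk` dataclass below is a stand-in with only the fields needed for the example, not the project's model, and `in_presentation_order` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Chunk:
    id: int
    position: int
    content: str


def in_presentation_order(chunks: Iterable[Chunk]) -> List[Chunk]:
    """Sort by the explicit position column, using id only to break ties."""
    return sorted(chunks, key=lambda chunk: (chunk.position, chunk.id))
```

In SQL terms this corresponds to `ORDER BY position, id`: insertion order (id) and timestamps no longer decide how chunks are presented, they only settle ties between equal positions.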
- Introduced lazy knowledge base retrieval mode, allowing the main agent to fetch KB content on demand via the `search_knowledge_base` tool, improving performance by skipping expensive pre-injection processes.
- Added cross-thread caching capability, enabling reuse of compiled graphs across different user chats, reducing latency for returning users.
- Updated middleware to support the new lazy retrieval and cross-thread caching modes.
- Enhanced logging for performance tracking during knowledge retrieval and agent interactions.
- Integrated performance logging in `OtelSpanMiddleware` to track model call durations even when OTel is disabled.
- Added detailed performance metrics in `KnowledgePriorityMiddleware` for database operations and embedding processes, improving visibility into query performance.
- Utilized `get_perf_logger` for consistent logging across middleware components.
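
One way to time a call independently of any tracing backend is a small context manager, sketched below. The `timed` helper and the `"perf"` logger name are hypothetical; the standard-library logger stands in for whatever `get_perf_logger` returns, whose signature is not shown here.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

# Stand-in for the project's perf logger (assumption).
perf_logger = logging.getLogger("perf")


@contextmanager
def timed(label: str, logger: logging.Logger = perf_logger) -> Iterator[None]:
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.info("%s took %.1f ms", label, elapsed_ms)
```

Wrapping a model call in `with timed("model_call"):` emits one log line per call, so durations stay visible whether or not spans are being exported.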