SurfSense

mirror of https://github.com/MODSetter/SurfSense.git synced 2026-05-29 19:35:20 +02:00

Author	SHA1	Message	Date
CREDO23	cf0085575c	refactor(chat): add streaming/flows/resume_chat/orchestrator + flows public API Slim composition root for the resume-chat streaming flow. Mirrors the new_chat orchestrator but specialized for resumed turns: * no fresh user turn, no title generation, no image-capability gate * persists a fresh assistant shell for the resumed turn * applies build_resume_routing to dispatch user decisions to the correct paused subagent before invoking the agent * shares the same stream_loop + flow-local _recover closure for in-stream provider rate-limit recovery Also lands flows/__init__.py, which becomes the public chat-flow API: from app.tasks.chat.streaming.flows import stream_new_chat, stream_resume_chat Existing wiring (routes, contract test) still imports from the legacy app.tasks.chat.stream_new_chat module. Cutover is the next phase.	2026-05-25 21:50:09 +02:00
CREDO23	885d4acda9	refactor(chat): add streaming/flows/resume_chat/ per-concern leaf modules Three focused modules used by the upcoming resume-chat orchestrator: * runtime_context: build_resume_chat_runtime_context assembles the SurfSenseContextSchema for a resume turn (handles empty mention lists, since resume requests do not carry fresh @-mentions). * assistant_shell: persist_resume_assistant_shell writes a fresh assistant row for the resumed turn so the post-stream finalize has a target. * resume_routing: build_resume_routing collects the pending interrupts across paused subagents and slices the flat list of ResumeDecision[] into the correct (thread, subagent) buckets so LangGraph routes each decision back to the right paused tool call. Add-only; no orchestrator yet (next commit).	2026-05-25 21:50:03 +02:00
CREDO23	b2a0888588	refactor(chat): add streaming/flows/new_chat/orchestrator.stream_new_chat Slim composition root for the new-chat streaming flow. Sequences: 1. validate inputs and load the LLM bundle (negative id => YAML) 2. open the OTEL chat_request span; set agent_mode tag 3. spawn the four pre-stream DB writes (set-ai-responding, persist user turn, persist assistant shell, first-assistant probe) 4. reserve premium quota (with free-fallback retry on denial) 5. build connector + checkpointer + agent + input_state 6. emit first frames (message-start, step-start, initial thinking step) 7. spawn the background title generator 8. run the shared stream_loop with a flow-local _recover closure that reroutes to the next auto-pin config on provider 429s 9. finalize: emit terminal title/token frames, shielded assistant finalize, release-or-finalize premium quota, close session, GC, record OTEL outcome Public entry-point flows/new_chat/__init__ re-exports stream_new_chat. Existing wiring (routes, tests) still imports the legacy function from app.tasks.chat.stream_new_chat. Cutover is a later commit.	2026-05-25 21:49:55 +02:00
CREDO23	927009745e	refactor(chat): add streaming/flows/new_chat/ per-concern leaf modules Seven focused modules that the upcoming new_chat orchestrator composes: * auto_pin: resolve_initial_auto_pin selects the initial config (with vision-capable filtering and error classification). * llm_capability: check_image_input_capability blocks routing an image-bearing turn to a known text-only model. * runtime_context: build_new_chat_runtime_context assembles the SurfSenseContextSchema for a new-chat turn. * persistence_spawn: spawn_set_ai_responding_bg, spawn_persist_user_task, spawn_persist_assistant_shell_task, and await_persist_task background the four pre-stream DB writes so they overlap with agent build. * initial_thinking_step: build_initial_thinking_step + iter_initial_thinking_step_frame produce the very first thinking-1 SSE step ("Understanding your request" / "Analyzing referenced content"). * title_gen: spawn_title_task + maybe_emit_title_update + await_pending_title_update background the thread-title generator and interleave its update into the stream when ready. * input_state: build_new_chat_input_state assembles the LangGraph input_state (history bootstrap, mentions resolution, context blocks, human-message construction). The heavy one. Add-only; no orchestrator yet (next commit).	2026-05-25 21:49:45 +02:00
CREDO23	21bddc73a7	refactor(chat): add streaming/flows/shared/assistant_finalize.py Extracts finalize_assistant_message: the post-stream server-side write of the final assistant message (with content parts + token usage) guarded by asyncio.shield + shielded_async_session so a client disconnect cannot abort the persist. Add-only; legacy stream_new_chat.py keeps its inline finalize block until cutover.	2026-05-25 21:49:31 +02:00
CREDO23	b54b803dc9	refactor(chat): add streaming/flows/shared/ rate-limit recovery + stream loop Two cooperating modules that wrap stream_agent_events with in-stream recovery from provider 429s: * rate_limit_recovery: can_recover_provider_rate_limit truth-table guard, reroute_to_next_auto_pin (selects the next eligible auto-pin config and reloads the LLM bundle), log_rate_limit_recovered. * stream_loop: run_stream_loop drives stream_agent_events in a while-True loop, delegating recovery to a flow-supplied RecoverFn callback so new_chat and resume_chat can share the same loop while keeping their own nonlocal state. Add-only; not yet wired into any orchestrator.	2026-05-25 21:49:27 +02:00
CREDO23	2c3edb7c84	refactor(chat): add streaming/flows/shared/terminal_error.py Extracts handle_terminal_exception: the shared except-branch behavior for the chat orchestrators. Classifies the raised exception, logs the structured chat_stream error event, and emits the terminal-error SSE frame + done sentinel via the streaming service. Add-only; nothing imports it yet.	2026-05-25 21:49:18 +02:00
CREDO23	40300d300a	refactor(chat): add streaming/flows/shared/premium_quota.py Centralizes the premium-credits lifecycle for chat turns: * needs_premium_quota: gate check (premium user + non-fallback config). * PremiumReservation: dataclass capturing reservation state + token totals. * reserve_premium / finalize_premium / release_premium: idempotent reservation, commit, and rollback used by the orchestrators. Add-only; legacy stream_new_chat.py keeps its inline quota handling until cutover.	2026-05-25 21:49:14 +02:00
CREDO23	e9a98ecafb	refactor(chat): add streaming/flows/shared/ base helpers Six small, single-purpose modules shared by the upcoming new_chat and resume_chat orchestrators: * llm_bundle: dispatches negative config_id to the YAML loader and non-negative config_id to the DB loader, returning (llm, AgentConfig). * pre_stream_setup: builds the connector service, resolves the Firecrawl API key, and returns the chat checkpointer. * first_frames: iter_initial_frames + iter_final_frames emit the canonical message-start / step-start / idle / finish / done SSE envelope. * finalize_emit: iter_token_usage_frame emits the per-turn usage frame from a TokenAccumulator summary. * finally_cleanup: close_session_and_clear_ai_responding and run_gc_pass centralize the finally-block bookkeeping. * span: open_chat_request_span / set_agent_mode / close_chat_request_span / record_outcome_attrs wrap the OpenTelemetry chat_request span. Add-only; these are not yet wired into stream_new_chat.py.	2026-05-25 21:49:09 +02:00
CREDO23	26c569467d	refactor(chat): add streaming/agent/event_loop.stream_agent_events Extracts the inner agent-streaming driver previously inlined as _stream_agent_events in stream_new_chat.py. stream_agent_events drives graph_stream.event_stream.stream_output and, after the agent finishes, performs the post-stream safety-net work: * commit any pending content the agent never explicitly finished * evaluate file-operation contract outcomes and emit the appropriate contract verdict for desktop_local_folder turns This unit is what flows/shared/stream_loop.py wraps in the rate-limit recovery while-loop. Add-only; no existing wiring uses it yet.	2026-05-25 21:48:26 +02:00
CREDO23	94bc827252	refactor(chat): add streaming/agent/ package with build_main_agent_for_thread Extracts the agent-construction wrapper that the chat streamers call to materialize the LangGraph agent for a given thread. Centralizes how we pass the agent factory plus checkpointer, runtime context, and the in-memory content builder. Add-only; pre-existing inline equivalent in stream_new_chat.py stays until cutover.	2026-05-25 21:48:20 +02:00
CREDO23	88a58f6aff	refactor(chat): add streaming/contract/ for file-write contract enforcement Extracts the desktop_local_folder file-operation contract helpers: * contract_enforcement_active: gates the contract on filesystem mode. * evaluate_file_contract_outcome: scores tool outputs as success/no-op. * log_file_contract: structured logging of contract verdicts. This is the unit responsible for catching agents that claim to have written/edited a file without actually invoking the filesystem tool. Add-only; stream_new_chat.py keeps its inline duplicates until cutover.	2026-05-25 21:48:14 +02:00
CREDO23	c13beae1ce	refactor(chat): add streaming/context/ for mentioned-docs and deep-agents todos Extracts two pure context helpers used during input-state assembly: * mentioned_docs.format_mentioned_surfsense_docs_as_context: renders the user's @-mentioned SurfSense docs into the LLM context block. * deepagents_todos.extract_todos_from_deepagents: pulls the in-progress todo list from a deep-agents state snapshot for the title generator. Add-only; existing call sites in stream_new_chat.py remain untouched until cutover.	2026-05-25 21:48:08 +02:00
CREDO23	4910263c93	refactor(chat): add streaming/shared/ package for StreamResult and utils Foundation layer for the parallel refactor of stream_new_chat.py. Extracts the StreamResult dataclass (tracks per-turn streaming state) and a small set of shared utilities (resume_step_prefix, safe_float). Add-only; no existing code imports from this package yet. Existing stream_new_chat.py keeps its inline equivalents until cutover.	2026-05-25 21:48:04 +02:00
Anish Sarkar	dc893281ba	feat(chat): add model retry and stream lifecycle events	2026-05-22 17:48:43 +05:30
Anish Sarkar	5a6b92c2b6	feat(chat): instrument streamed chat request telemetry	2026-05-22 13:48:19 +05:30
CREDO23	49da7a57df	Merge remote-tracking branch 'upstream/dev' into improvement-agent-speed Resolves: surfsense_backend/app/agents/new_chat/middleware/memory_injection.py - Took both imports: upstream moved MEMORY_HARD_LIMIT/SOFT_LIMIT to app.services.memory; kept our perf-logger import for timing. Pulls in upstream changes: - Memory document feature (services/memory refactor, removal of app.agents.new_chat.memory_extraction and background extraction in stream_new_chat — agent now drives memory via update_memory tool). - BACKEND_URL env refactor across web tool-ui/editor/chat/dashboard/lib. - GitHub Actions backend test workflow + pre-commit biome bump. - Token-display polish in MessageInfoDropdown; save_memory no-update sentinel. Verified: 1723 unit tests pass, ruff clean. No semantic regression in stream_new_chat (their memory-extraction deletion and our preflight removal touch different functions).	2026-05-20 21:23:48 +02:00
CREDO23	d5ee8cc4cd	Merge remote-tracking branch 'upstream/dev' into improvement-agent-speed	2026-05-20 19:22:49 +02:00
CREDO23	c3db25302b	perf(chat): kill auto-pin preflight + speculative build, rely on reactive 429 recovery The preflight pattern probed the LLM with a 1-token ping before each cold turn (when requested_llm_config_id==0, llm_config_id<0, and the 45s healthy TTL had expired) to detect 429s before fanning out into planner/classifier/title-gen. To absorb its ~1-5s RTT cost we built the agent speculatively in parallel; on 429 we discarded the build and repinned. Three problems with that design: 1. False security. Provider rate limits are token-bucket. A 1-token ping consumes ~5 tokens; the real request consumes 10-50K. The probe can return 200 while the real call still 429s. 2. Pure overhead in the common case. On warm-agent-cache turns the probe dominates wall time: ~2.5s of TTFT pure tax for ~99% of users who never see a 429. 3. The in-stream recovery loop (catch of _is_provider_rate_limited gated by not _first_event_logged) already does the right thing reactively: mark_runtime_cooldown -> resolve_or_get_pinned_llm_config_id with exclude_config_ids={previous} -> rebuild agent -> retry the stream. Preflight was never the only safety net; it was a redundant probe in front of one. Changes: - Delete _preflight_llm, _settle_speculative_agent_build, and the _PREFLIGHT_TIMEOUT_SEC / _PREFLIGHT_MAX_TOKENS constants. - Drop the parallel agent_build_task / preflight_task plumbing in both stream_new_chat and stream_resume_chat; build the agent inline with await _build_main_agent_for_thread(...). - Drop the unused is_recently_healthy / mark_healthy imports here (still exported from auto_model_pin_service since OpenRouter catalogue refresh and a few tests reference clear_healthy). - Remove the obsolete preflight + settle-speculative tests from test_stream_new_chat_contract.py. Net: -447 LOC. ~2.5s removed from TTFT on every cold preflight-eligible turn. 429 recovery path is unchanged - same repin/rebuild/retry, just not paid in advance on the healthy path.	2026-05-20 11:03:08 +02:00
Anish Sarkar	132e7b3c44	refactor: remove memory extraction functions and related components from the new chat agent	2026-05-20 14:03:28 +05:30
Anish Sarkar	87caa4b6d0	Merge remote-tracking branch 'upstream/dev' into feat/ui-revamp	2026-05-18 09:39:35 +05:30
Anish Sarkar	af1d2fa430	Merge remote-tracking branch 'upstream/dev' into fix/zero-cache-stale-replica-1355	2026-05-16 19:30:09 +05:30
Anish Sarkar	f65bc81509	Merge remote-tracking branch 'upstream/dev' into feat/ui-revamp	2026-05-16 19:26:36 +05:30
Anish Sarkar	01d7379914	refactor: add public URL handling for SurfSense documents across various components and schemas	2026-05-15 02:05:11 +05:30
CREDO23	f2495092da	chat/stream_resume: salt thinking-step prefix with turn_id to avoid duplicate React keys	2026-05-13 21:15:51 +02:00
CREDO23	0fd87ccb7f	chat/stream_resume: key Command(resume=...) by Interrupt.id for parallel HITL	2026-05-13 20:59:57 +02:00
CREDO23	c06dd6e8ba	chat/stream_new_chat: emit one SSE frame per pending interrupt	2026-05-13 20:59:48 +02:00
CREDO23	03cf1466d3	chat/stream_resume: route a flat decisions list per paused subagent	2026-05-13 19:58:13 +02:00
Anish Sarkar	8ea042e88c	refactor(chat): improve user query handling and mention chip functionality	2026-05-12 20:57:15 +05:30
$DESKTOP-RTLN3BA\$punk$ DESKTOP-RTLN3BA\$punk	c8374e6c5b	feat: improved document, folder mentions rendering	2026-05-09 22:15:51 -07:00
CREDO23	1761b60c16	Carry thinkingStepId on tool output and extend builder and parity tests.	2026-05-08 23:17:12 +02:00
CREDO23	32092c0b65	Pass thinkingStepId through tool-input start and available metadata.	2026-05-08 23:17:05 +02:00
CREDO23	a309e830d3	Document thinkingStepId on tool-call parts and first-key metadata merge.	2026-05-08 23:17:01 +02:00
CREDO23	d136fcd054	Add tool_activity_metadata to merge spanId and thinkingStepId for tools.	2026-05-08 23:16:44 +02:00
CREDO23	3dbcac4b9d	Merge span metadata into persisted tool-call and thinking parts.	2026-05-08 22:48:07 +02:00
CREDO23	f1d80ffe5d	Forward span metadata from report_progress thinking updates.	2026-05-08 22:47:50 +02:00
CREDO23	1dcb08e925	Attach active span metadata to thinking-step SSE and completion.	2026-05-08 22:47:46 +02:00
CREDO23	3ed09bdd90	Clear spans after task completion and pass span id on tool output.	2026-05-08 22:47:38 +02:00
CREDO23	2c1b219c6c	Open task spans at tool start and tag unmatched tool-input SSE.	2026-05-08 22:47:32 +02:00
CREDO23	695f9ded2c	Mint pending span id when the task tool registers from chunks.	2026-05-08 22:47:08 +02:00
CREDO23	f944cdacb7	Add helpers to open and close task delegation span ids.	2026-05-08 22:47:03 +02:00
CREDO23	f0f87107f2	Track active task span id on the agent event relay state.	2026-05-08 22:46:58 +02:00
CREDO23	78f4747382	refactor(chat): stream agent events via stream_output and remove parity v2 flag	2026-05-07 19:40:10 +02:00
CREDO23	7e07092f67	refactor(chat): drop alternate streaming entry path; use graph_stream	2026-05-07 19:25:20 +02:00
CREDO23	52895e37e9	build streaming contexts for chat resume and regenerate paths	2026-05-07 17:57:27 +02:00
CREDO23	a04b2e88bd	wire orchestrator streaming context path and align event relay outputs	2026-05-07 17:06:17 +02:00
CREDO23	0f40279d95	Expand orchestration gate coverage to resume and regenerate flows.	2026-05-07 16:18:29 +02:00
CREDO23	52593d88db	Reorganize streaming orchestration modules into relay and orchestration folders.	2026-05-07 16:00:15 +02:00
CREDO23	f8754a9dab	Rename streaming runtime modules for clearer SRP boundaries.	2026-05-07 15:41:33 +02:00
CREDO23	4e664652a8	Add streaming runtime helpers with behavior-focused unit tests.	2026-05-07 15:13:22 +02:00
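Commit b54b803dc9 describes a shared stream loop that delegates 429 recovery to a flow-supplied RecoverFn callback, and commit c3db25302b notes that recovery is only attempted before the first event has been emitted. The following is a minimal sketch of that shape, not the repository's code: ProviderRateLimited, the make_stream factory, the RecoverFn signature, and the demo configs are all hypothetical stand-ins.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable


class ProviderRateLimited(Exception):
    """Hypothetical stand-in for a provider 429 error."""


# Assumed signature: returns True when the flow has repinned to another
# config and the loop should retry, False when the error must propagate.
RecoverFn = Callable[[Exception, bool], Awaitable[bool]]


async def run_stream_loop(
    make_stream: Callable[[], AsyncIterator[str]],
    recover: RecoverFn,
) -> AsyncIterator[str]:
    """Drive a stream, retrying through `recover` on rate limits."""
    first_event_sent = False
    while True:
        try:
            async for frame in make_stream():
                first_event_sent = True
                yield frame
            return  # stream finished normally
        except ProviderRateLimited as exc:
            # The flow owns its state, so it decides whether a retry is possible.
            if not await recover(exc, first_event_sent):
                raise


async def demo() -> list[str]:
    configs = ["config-a", "config-b"]  # hypothetical auto-pin candidates
    state = {"index": 0}                # flow-local state mutated by recover

    async def fake_agent_stream() -> AsyncIterator[str]:
        if state["index"] == 0:
            raise ProviderRateLimited("429 from config-a")
        yield f"text from {configs[state['index']]}"
        yield "done"

    async def recover(exc: Exception, first_event_sent: bool) -> bool:
        # Never retry once frames reached the client, or when no config is left.
        if first_event_sent or state["index"] + 1 >= len(configs):
            return False
        state["index"] += 1
        return True

    return [frame async for frame in run_stream_loop(fake_agent_stream, recover)]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the recovery step in as a callback lets two flows reuse one loop while each keeps its own mutable state, which is the design the commit message attributes to new_chat and resume_chat.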

221 commits
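Commit 21bddc73a7 says the final assistant write is guarded by asyncio.shield so a client disconnect cannot abort the persist. A minimal sketch of that pattern follows, assuming a simulated write in place of the real database session; the function bodies, the in-memory store, and the _background set are illustrative and not taken from the repository.

```python
import asyncio

# Strong references keep shielded tasks alive after their caller is cancelled.
_background: set[asyncio.Task] = set()


async def finalize_assistant_message(store: list[str], content: str) -> None:
    """Simulated post-stream persist (stands in for a real DB write)."""
    await asyncio.sleep(0.05)
    store.append(content)


async def handle_turn(store: list[str], content: str) -> None:
    """Streaming is assumed finished here; only the final persist remains."""
    inner = asyncio.ensure_future(finalize_assistant_message(store, content))
    _background.add(inner)
    inner.add_done_callback(_background.discard)
    # Cancelling handle_turn cancels this await, but not `inner` itself.
    await asyncio.shield(inner)


async def demo() -> list[str]:
    store: list[str] = []
    turn = asyncio.create_task(handle_turn(store, "final answer"))
    await asyncio.sleep(0.01)   # let the turn reach the shielded write
    turn.cancel()               # simulate the client disconnecting
    try:
        await turn
    except asyncio.CancelledError:
        pass                    # the request handler itself was cancelled
    await asyncio.gather(*_background)  # the shielded write still completes
    return store


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the shield, the cancellation would propagate into the write and the message would be lost. With it, only the outer await is interrupted.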