Commit graph

5674 commits

Author SHA1 Message Date
DESKTOP-RTLN3BA\$punk
9bcd50164d feat(evals): publish multimodal_doc parser_compare benchmark + n=171 report
Adds the full parser_compare experiment for the multimodal_doc suite:
six arms compared on 30 PDFs / 171 questions from MMLongBench-Doc, with
anthropic/claude-sonnet-4.5 used as the model in every arm.

Source code:
- core/parsers/{azure_di,llamacloud,pdf_pages}.py: direct parser SDK
  callers (Azure Document Intelligence prebuilt-read/layout, LlamaParse
  parse_page_with_llm/parse_page_with_agent) used by the LC arms,
  bypassing the SurfSense backend so each (basic/premium) extraction
  is a clean A/B independent of backend ETL routing.
- suites/multimodal_doc/parser_compare/{ingest,runner,prompt}.py:
  six-arm benchmark (native_pdf, azure_basic_lc, azure_premium_lc,
  llamacloud_basic_lc, llamacloud_premium_lc, surfsense_agentic) with
  byte-identical prompts per question, deterministic grader, Wilson
  CIs, and the per-page preprocessing tariff cost overlay.
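For reference, here is a minimal Python sketch of the Wilson score interval for one arm's accuracy (k correct out of n questions). It illustrates the standard formula only and is not the code in runner.py. The function name and the z=1.96 default (about 95%) are assumptions.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper).

    successes: number of correct answers for one arm
    n: number of graded questions
    z: normal quantile; 1.96 gives roughly a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # The center is pulled toward 0.5, which keeps the interval sane near 0% and 100%.
    center = (p_hat + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)
```

At p = 0 or p = 1, a normal-approximation (Wald) interval collapses to zero width. The Wilson interval does not, so it behaves better for small per-format subsets.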

Reproducibility:
- pyproject.toml + uv.lock pin pypdf, azure-ai-documentintelligence,
  llama-cloud-services as new deps.
- .env.example documents the AZURE_DI_* and LLAMA_CLOUD_API_KEY env
  vars now required for parser_compare.
- 12 analysis scripts under scripts/: retry pass with exponential
  backoff, post-retry accuracy merge, McNemar / latency / per-PDF
  stats, context-overflow hypothesis test, etc. Each produces one
  number cited by the blog report.
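The snippet below is a minimal sketch of a retry pass with capped exponential backoff and full jitter. The function name, its defaults, and the injectable `sleep` parameter are illustrative assumptions and not the code under scripts/.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or max_attempts is reached (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles per failure (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            # Full jitter: sleep a random fraction so parallel retries spread out.
            sleep(random.uniform(0.0, delay))
```

The `sleep` parameter is injectable so that tests can record delays instead of waiting.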

Citation surface:
- reports/blog/multimodal_doc_parser_compare_n171_report.md: 1219-line
  technical writeup (16 sections) covering headline accuracy, per-format
  accuracy, McNemar pairwise significance, latency / token / per-PDF
  distributions, error analysis, retry experiment, post-retry final
  accuracy, cost amortization model with closed-form derivation, threats
  to validity, and reproducibility appendix.
- data/multimodal_doc/runs/2026-05-14T00-53-19Z/parser_compare/{raw,
  raw_retries,raw_post_retry}.jsonl + run_artifact.json + retry summary
  whitelisted via data/.gitignore as the verifiable numbers source.
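The following is a minimal sketch of a pairwise McNemar comparison between two arms, computed from per-question correctness on the same questions. It assumes the exact (binomial) form of the test, while the report's scripts may use the chi-squared approximation instead. The helper names are hypothetical.

```python
import math
from typing import Sequence


def discordant_counts(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> tuple[int, int]:
    """Count questions where exactly one of the two arms is correct (hypothetical helper)."""
    if len(a_correct) != len(b_correct):
        raise ValueError("arms must be graded on the same questions")
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)  # A wrong, B right
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: Binomial(b + c, 0.5) on the discordant pairs."""
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5); Python ints keep math.comb exact.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the discordant questions enter the test, and questions that both arms got right (or both got wrong) carry no information about which arm is better.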

Gitignore:
- ignore logs_*.txt and retry_run.log: the structured artifacts already
  cover the citation surface, and the debug logs are noise.
- data/.gitignore default-ignores everything, whitelists the n=171 run
  artifacts only (parser manifest left ignored to avoid leaking local
  Windows usernames in absolute paths; manifest is fully regenerable
  via 'ingest multimodal_doc parser_compare').
- reports/.gitignore now whitelists hand-curated reports/blog/.

Also retires the abandoned CRAG Task 3 implementation (download script,
streaming Task 3 ingest, CragTask3Benchmark + tests) and trims the
runner / ingest module APIs to match.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 19:54:41 -07:00
DESKTOP-RTLN3BA\$punk
3737118050 chore: evals 2026-05-13 14:02:26 -07:00
DESKTOP-RTLN3BA\$punk
2402b730fa chore: untrack accidentally embedded hermes-agent repo
It was committed as a gitlink (mode 160000) in 81583ef3 despite being
listed in .gitignore, because ignore rules don't apply to already-tracked
paths. Remove it from the index and add a slash-less pattern as a guard
against the gitlink form being re-added.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 12:50:13 -07:00
DESKTOP-RTLN3BA\$punk
ec957e6fae Merge commit 'd6618b8357' into dev 2026-05-11 12:35:04 -07:00
Rohan Verma
d6618b8357
Merge pull request #1384 from MODSetter/chore/hide-blog-nav
chore: hide blog from navbar until published
2026-05-11 11:20:45 -07:00
DESKTOP-RTLN3BA\$punk
b7e31f2974 chore: update .gitignore 2026-05-11 11:12:06 -07:00
DESKTOP-RTLN3BA\$punk
81583ef382 chore: hide blog until published 2026-05-11 11:08:42 -07:00
Rohan Verma
cb46da3525
Merge pull request #1381 from xclear-cast/codex/centralize-redirect-path
fix(auth): centralize redirect path storage
2026-05-10 16:47:54 -07:00
Rohan Verma
a51755c512
Merge pull request #1380 from xclear-cast/codex/drop-tokenhandler-storagekey
fix(auth): remove redundant token storage write
2026-05-10 16:46:58 -07:00
너이름
fb0c13911d fix(auth): centralize redirect path storage 2026-05-11 06:30:26 +09:00
너이름
935cd7b7c9 fix(auth): remove redundant token storage write 2026-05-11 06:25:40 +09:00
DESKTOP-RTLN3BA\$punk
c8374e6c5b feat: improved document, folder mentions rendering
2026-05-09 22:15:51 -07:00
Rohan Verma
28a02a9143
Merge pull request #1357 from CREDO23/feature/multi-agent
[Feature] Multi-agent chat: hierarchical timeline, live subagent streaming, and inline HITL approvals
2026-05-09 16:13:04 -07:00
Rohan Verma
316a90079c
Merge pull request #1356 from mvanhorn/osc/913-aria-label-clickable-cards
feat(a11y): add aria-label to clickable media cards
2026-05-09 16:11:54 -07:00
Rohan Verma
350ab2f60c
Merge pull request #1353 from AnishSarkar22/feat/e2e-testing
feat: add E2E test suite
2026-05-09 16:11:00 -07:00
Rohan Verma
fa31da9937
Merge branch 'dev' into feat/e2e-testing 2026-05-09 16:10:45 -07:00
Anish Sarkar
0487703106 chore: remove E2E tests workflow configuration 2026-05-10 04:29:16 +05:30
Anish Sarkar
822ffb2429 chore: remove Vue and Svelte examples from component testing documentation 2026-05-10 04:24:44 +05:30
Anish Sarkar
d52225c18d chore: add playwright cursor skill 2026-05-10 04:19:55 +05:30
Anish Sarkar
25aad38ca4 chore: update gitignore 2026-05-10 03:49:59 +05:30
CREDO23
2ab6b1c757 Merge upstream/dev into feature/multi-agent. 2026-05-09 23:00:56 +02:00
CREDO23
5e7d41f3e8 chat-messages: drop feature module architecture doc. 2026-05-09 23:00:18 +02:00
CREDO23
932bf22a34 chat: fix mixed-decision HITL crash and fold resumed assistant messages into the interrupted bubble. 2026-05-09 22:54:07 +02:00
CREDO23
2e132513be chat: unify HITL approval UX behind a single paginated card and harden timeline supersede. 2026-05-09 21:44:54 +02:00
CREDO23
89e4953800 chat: suppress stale step separator emitted during resume rehydration. 2026-05-09 18:36:00 +02:00
CREDO23
ba0e1e70a0 chat: drop legacy thinking-steps, tool-fallback, hitl modules, and span-indent helper. 2026-05-09 18:35:52 +02:00
CREDO23
9c5a178468 chat: switch dashboard chat page to slice and drop superseded aborted rows on resume. 2026-05-09 18:35:39 +02:00
CREDO23
d96f966c8f chat: switch consumer chat shells to slice TimelineDataUI and HITL exports. 2026-05-09 18:32:12 +02:00
CREDO23
aafeee0516 assistant-message: render only deliverable tools and delegate process tools to slice timeline. 2026-05-09 18:32:03 +02:00
CREDO23
a32d089199 tool-ui: route HITL imports through chat-messages slice. 2026-05-09 18:31:52 +02:00
CREDO23
97a7626179 chat-messages: add timeline tool registry with HITL-aware fallback. 2026-05-09 18:31:45 +02:00
CREDO23
48c4df822a chat-messages: add timeline module with builder, grouping, items, and rendering. 2026-05-09 18:31:33 +02:00
CREDO23
9e451a5907 chat-messages: add hitl module with types, hooks, bundle, approval cards, and edit panel. 2026-05-09 18:31:23 +02:00
CREDO23
d9ad9ca5cb chat-messages: refresh feature module architecture doc. 2026-05-09 18:31:16 +02:00
Matt Van Horn
790a6f8c37
feat(a11y): add aria-label to clickable media cards
The hero carousel video card and use-cases grid image card already had
role="button", tabIndex={0}, and onKeyDown handlers. Adds the missing
aria-label so screen readers announce what each clickable card does.

Both cards now use aria-label={`Expand ${title}`}, matching the example
in the issue.

Fixes #913
2026-05-09 07:05:57 -07:00
CREDO23
5c1f5edd75 Add chat-messages feature module architecture doc. 2026-05-09 14:39:44 +02:00
CREDO23
a8417e3c45 Render HITL approval cards inline in the thinking-steps timeline. 2026-05-09 14:37:06 +02:00
DESKTOP-RTLN3BA\$punk
de87a55a1f Merge commit '83ee58016e' into dev 2026-05-09 04:58:57 -07:00
DESKTOP-RTLN3BA\$punk
c603b46ea4 feat: added architecture improvement skill 2026-05-09 04:31:53 -07:00
Anish Sarkar
de6fc80dbd chore: ran linting 2026-05-09 05:28:09 +05:30
Anish Sarkar
f7bac59a4b test(integration): enhance Drive indexer credential resolution tests for Composio and native connectors 2026-05-09 05:26:36 +05:30
Anish Sarkar
dbf575fbd0 chore: ran linting 2026-05-09 05:16:20 +05:30
Anish Sarkar
2f540ee065 refactor(tests): simplify logging messages and enhance manual upload journey tests 2026-05-09 05:10:05 +05:30
Anish Sarkar
66eebf614f test(e2e): cover connector PDF docling indexing in journeys 2026-05-09 05:04:00 +05:30
Anish Sarkar
fbfde74cdc test(e2e): route connector PDF canary responses in chat fake & add connector PDF canaries 2026-05-09 05:03:38 +05:30
Anish Sarkar
5a2357b981 test(e2e): serve binary PDF bytes from storage connector fakes 2026-05-09 05:03:07 +05:30
Anish Sarkar
ad226853e5 test(e2e): wire PDF metadata into connector fixture JSONs 2026-05-09 05:02:40 +05:30
Anish Sarkar
fc32ab0cf3 test(e2e): add shared binary fixture loader for connector fakes 2026-05-09 05:02:23 +05:30
Anish Sarkar
523563b948 test(e2e): add canary PDFs and reproducer for connector docling coverage 2026-05-09 05:02:04 +05:30
Anish Sarkar
03ce8c1b81 test(e2e): cover manual file upload journey 2026-05-09 04:41:07 +05:30