Commit graph

5709 commits

Author SHA1 Message Date
Rohan Verma
4ab9544a66
Merge pull request #1382 from mvanhorn/osc/1372-use-canonical-log-types
refactor(use-logs): use canonical log types from contracts/types/log.types
2026-05-15 04:49:21 -07:00
Rohan Verma
4db3cf7fd5
Merge pull request #1377 from AnishSarkar22/feat/e2e-testing-ci
feat: add E2E CI and harden Docker build migrations
2026-05-15 04:47:26 -07:00
DESKTOP-RTLN3BA\$punk
e8aad48ddf refactor(report): enhance citations and clarify implementation details
Updated the multimodal_doc_parser_compare_n171_report.md to include detailed code citations for preprocessing costs and retry logic. Improved clarity on the implementation of the retry mechanism and its impact on failure rates. Added a new section for a code citations index to ensure reproducibility of technical claims.

This enhances the report's transparency and allows readers to trace the source of each claim back to the codebase.
2026-05-14 20:07:14 -07:00
DESKTOP-RTLN3BA\$punk
9bcd50164d feat(evals): publish multimodal_doc parser_compare benchmark + n=171 report
Adds the full parser_compare experiment for the multimodal_doc suite:
six arms compared on 30 PDFs / 171 questions from MMLongBench-Doc with
anthropic/claude-sonnet-4.5 across the board.

Source code:
- core/parsers/{azure_di,llamacloud,pdf_pages}.py: direct parser SDK
  callers (Azure Document Intelligence prebuilt-read/layout, LlamaParse
  parse_page_with_llm/parse_page_with_agent) used by the LC arms,
  bypassing the SurfSense backend so each (basic/premium) extraction
  is a clean A/B independent of backend ETL routing.
- suites/multimodal_doc/parser_compare/{ingest,runner,prompt}.py:
  six-arm benchmark (native_pdf, azure_basic_lc, azure_premium_lc,
  llamacloud_basic_lc, llamacloud_premium_lc, surfsense_agentic) with
  byte-identical prompts per question, deterministic grader, Wilson
  CIs, and the per-page preprocessing tariff cost overlay.
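
The Wilson CIs mentioned above can be sketched as follows. This is a minimal illustration of the standard Wilson score interval for a binomial proportion, not the code from the benchmark itself; the function name `wilson_interval` and the default z = 1.96 (roughly a 95% interval) are assumptions.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Hypothetical helper: `successes` correct answers out of `n` questions.
    z = 1.96 corresponds to an approximately 95% two-sided interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p_hat = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    # The interval is centered on a shrunken estimate, not on p_hat itself.
    center = (p_hat + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Unlike the plain normal-approximation interval, this one stays inside [0, 1] and remains usable when an arm scores near 0% or 100%.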

Reproducibility:
- pyproject.toml + uv.lock pin pypdf, azure-ai-documentintelligence,
  llama-cloud-services as new deps.
- .env.example documents the AZURE_DI_* and LLAMA_CLOUD_API_KEY env
  vars now required for parser_compare.
- 12 analysis scripts under scripts/: retry pass with exponential
  backoff, post-retry accuracy merge, McNemar / latency / per-PDF
  stats, context-overflow hypothesis test, etc. Each produces one
  number cited by the blog report.
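
A retry pass with exponential backoff, as named in the list above, might look roughly like this. It is a sketch under assumptions: the name `retry_with_backoff`, its parameters, and the choice to retry on any exception are illustrative and are not taken from the scripts in the repository.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() until it succeeds, doubling the wait after each failure.

    All names and defaults here are hypothetical. `sleep` is injectable so
    the wait schedule can be inspected without actually pausing.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Waits grow as base_delay * 1, 2, 4, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

A production version would usually retry only on errors known to be transient (rate limits, timeouts) and add random jitter to the delay.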

Citation surface:
- reports/blog/multimodal_doc_parser_compare_n171_report.md: 1219-line
  technical writeup (16 sections) covering headline accuracy, per-format
  accuracy, McNemar pairwise significance, latency / token / per-PDF
  distributions, error analysis, retry experiment, post-retry final
  accuracy, cost amortization model with closed-form derivation, threats
  to validity, and reproducibility appendix.
- data/multimodal_doc/runs/2026-05-14T00-53-19Z/parser_compare/{raw,
  raw_retries,raw_post_retry}.jsonl + run_artifact.json + retry summary
  whitelisted via data/.gitignore as the verifiable numbers source.
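
For the McNemar pairwise significance mentioned above, a minimal sketch of the exact (binomial) form of the test is shown below. Whether the report uses the exact or the chi-squared variant is not stated here, so the exact form is an assumption, as are the names `discordant_counts` and `mcnemar_exact_p`.

```python
from math import comb
from typing import Sequence


def discordant_counts(arm_a: Sequence[bool], arm_b: Sequence[bool]) -> tuple[int, int]:
    """Count questions where exactly one of two arms answered correctly.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    """
    if len(arm_a) != len(arm_b):
        raise ValueError("both arms must be graded on the same questions")
    b = sum(1 for x, y in zip(arm_a, arm_b) if x and not y)
    c = sum(1 for x, y in zip(arm_a, arm_b) if y and not x)
    return b, c


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis each discordant pair is equally likely to
    favor either arm, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / (2 ** n)
    return min(1.0, 2.0 * tail)
```

Only the discordant pairs enter the statistic; questions that both arms get right or both get wrong carry no information about which arm is better.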

Gitignore:
- ignore logs_*.txt + retry_run.log; structured artifacts cover the
  citation surface, debug logs are noise.
- data/.gitignore default-ignores everything, whitelists the n=171 run
  artifacts only (parser manifest left ignored to avoid leaking local
  Windows usernames in absolute paths; manifest is fully regenerable
  via 'ingest multimodal_doc parser_compare').
- reports/.gitignore now whitelists hand-curated reports/blog/.

Also retires the abandoned CRAG Task 3 implementation (download script,
streaming Task 3 ingest, CragTask3Benchmark + tests) and trims the
runner / ingest module APIs to match.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 19:54:41 -07:00
DESKTOP-RTLN3BA\$punk
3737118050 chore: evals 2026-05-13 14:02:26 -07:00
Anish Sarkar
883c72396c chore: add minimumReleaseAge configuration to pnpm workspace for dependency management 2026-05-13 03:38:04 +05:30
Anish Sarkar
6eb900cb0f chore: update packageManager version to pnpm@10.26.0 in both desktop and web projects 2026-05-12 23:59:58 +05:30
Anish Sarkar
275e2c9e83 chore: fix linting 2026-05-12 04:00:04 +05:30
Anish Sarkar
4dbadbf159 chore: update .gitignore and biome.json to include additional test-related directories and files for improved E2E testing 2026-05-12 03:59:52 +05:30
Anish Sarkar
bed2041a1b chore: modify E2E test configuration by updating global LLM model IDs to negative values for improved test isolation 2026-05-12 03:30:01 +05:30
Anish Sarkar
0b9fc00663 chore: update global LLM config fixture to include both premium and free models for comprehensive E2E testing 2026-05-12 03:00:35 +05:30
Anish Sarkar
650b691a39 chore: enhance E2E tests by adding synthetic global LLM config and updating environment variables for Google OAuth 2026-05-12 02:37:39 +05:30
Anish Sarkar
315329f344 chore: update E2E tests workflow to capture logs on cancellation and add shared volume for backend services 2026-05-12 01:35:33 +05:30
DESKTOP-RTLN3BA\$punk
2402b730fa chore: untrack accidentally embedded hermes-agent repo
It was committed as a gitlink (mode 160000) in 81583ef3 despite being
listed in .gitignore, because ignore rules don't apply to already-tracked
paths. Remove it from the index and add a slash-less pattern as a guard
against the gitlink form being re-added.
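
One way to spot such entries is to scan the output of `git ls-files --stage`, where each line has the form `<mode> <object> <stage><TAB><path>` and a gitlink carries mode 160000. The sketch below parses that text only; the helper name `find_gitlinks`, the sample path, and the sample object IDs are illustrative assumptions.

```python
def find_gitlinks(ls_files_stage_output: str) -> list[str]:
    """Return the paths of gitlink (mode 160000) entries in an index listing.

    Expects text in the format printed by `git ls-files --stage`:
    "<mode> <object> <stage>\t<path>", one entry per line.
    """
    paths = []
    for line in ls_files_stage_output.splitlines():
        if not line.strip():
            continue
        # Metadata and path are separated by a single tab.
        meta, _, path = line.partition("\t")
        fields = meta.split()
        if fields and fields[0] == "160000":
            paths.append(path)
    return paths


# Made-up sample listing; the path and object IDs are placeholders.
SAMPLE = (
    "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tREADME.md\n"
    "160000 0123456789abcdef0123456789abcdef01234567 0\thermes-agent\n"
)

if __name__ == "__main__":
    print(find_gitlinks(SAMPLE))
```

Each path reported this way can then be dropped from the index with `git rm --cached <path>`, which leaves the working-tree directory in place.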

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 12:50:13 -07:00
DESKTOP-RTLN3BA\$punk
ec957e6fae Merge commit 'd6618b8357' into dev 2026-05-11 12:35:04 -07:00
Anish Sarkar
c052fc9304 chore: add fake DoclingService for E2E tests and integrate into runtime fakes 2026-05-12 00:30:16 +05:30
Rohan Verma
d6618b8357
Merge pull request #1384 from MODSetter/chore/hide-blog-nav
chore: hide blog from navbar until published
2026-05-11 11:20:45 -07:00
DESKTOP-RTLN3BA\$punk
b7e31f2974 chore: update .gitignore 2026-05-11 11:12:06 -07:00
DESKTOP-RTLN3BA\$punk
81583ef382 chore: hide blog until published 2026-05-11 11:08:42 -07:00
Anish Sarkar
b247ff37df chore: implement test-only token mint endpoint and update E2E test authentication flow 2026-05-11 19:48:18 +05:30
Anish Sarkar
741d6e7eea chore: update pnpm workspace configuration to allow builds for specified dependencies 2026-05-11 17:02:06 +05:30
Anish Sarkar
6501e32b4f chore: bump pinned pnpm version to 10.26.0 2026-05-11 16:27:35 +05:30
Anish Sarkar
83e40c5aea chore: update Docker configuration to include pnpm workspace and refine dependency management 2026-05-11 15:31:24 +05:30
Anish Sarkar
3b345e7091 chore: add pnpm configuration for only built dependencies in package.json 2026-05-11 13:41:38 +05:30
Anish Sarkar
99e667f3f9 chore: refine E2E tests workflow by removing pnpm version specification and updating Docker Compose for backend build reference 2026-05-11 13:01:20 +05:30
Matt Van Horn
b92cc963ce
refactor(use-logs): use canonical log types from contracts/types/log.types
Removes duplicated LogLevel, LogStatus, Log, LogFilters and LogSummary
definitions from surfsense_web/hooks/use-logs.ts. These shapes already
live as Zod-derived types in contracts/types/log.types.ts, which is the
source of truth used by logs-api.service.ts and log-mutation.atoms.ts.

Adds LogLevel and LogStatus aliases for LogLevelEnum/LogStatusEnum in
log.types.ts so the existing public surface from use-logs is preserved
without per-hook re-exports. The hook re-exports the canonical names so
callers (app/dashboard/[search_space_id]/logs/(manage)/page.tsx) do not
need to change.

Closes #1372
2026-05-11 00:11:05 -07:00
Anish Sarkar
242925d8e5 chore: update Docker configurations to streamline backend build and enhance E2E testing environment 2026-05-11 12:31:15 +05:30
Rohan Verma
cb46da3525
Merge pull request #1381 from xclear-cast/codex/centralize-redirect-path
fix(auth): centralize redirect path storage
2026-05-10 16:47:54 -07:00
Rohan Verma
a51755c512
Merge pull request #1380 from xclear-cast/codex/drop-tokenhandler-storagekey
fix(auth): remove redundant token storage write
2026-05-10 16:46:58 -07:00
Anish Sarkar
efff7ab2a2 chore: enhance Dockerfile and config to support conditional static ffmpeg import 2026-05-11 04:51:19 +05:30
Anish Sarkar
18de0136bc chore: add ffmpeg to Dockerfile for audio processing capabilities 2026-05-11 04:02:39 +05:30
Anish Sarkar
65fecb3337 chore: update Docker Buildx action to version 4 in E2E tests workflow 2026-05-11 03:53:47 +05:30
Anish Sarkar
f091182b94 chore: update GitHub Actions workflows and Dockerfile to use latest action versions and improve build targets 2026-05-11 03:52:22 +05:30
Anish Sarkar
5344fa47e6 chore: update E2E test documentation for clarity and local setup instructions 2026-05-11 03:29:32 +05:30
Anish Sarkar
68f45335bc chore: implement E2E testing setup with Docker Compose and update workflow for backend and Redis services 2026-05-11 03:09:01 +05:30
너이름
fb0c13911d fix(auth): centralize redirect path storage 2026-05-11 06:30:26 +09:00
너이름
935cd7b7c9 fix(auth): remove redundant token storage write 2026-05-11 06:25:40 +09:00
Anish Sarkar
2c8828f60c fix: ensure idempotency in alembic migrations by checking for existing columns and indexes before creation 2026-05-10 22:45:26 +05:30
Anish Sarkar
319923fb40 fix: add checks for existing tables and indexes before creating them in alembic migrations for idempotency 2026-05-10 22:40:29 +05:30
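
The check-before-create idea behind the two migration fixes above can be illustrated without Alembic. The sketch below uses the standard-library `sqlite3` module instead, which is a deliberate substitution; the helper names and the SQLite-specific `PRAGMA table_info` lookup are assumptions and do not reflect the project's migration code.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` already has a column named `column`.

    PRAGMA table_info yields one row per column; index 1 is the column name.
    Identifiers are interpolated directly, so pass trusted names only.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, ddl_type: str
) -> bool:
    """Add the column only when it is absent, so re-running is harmless.

    Returns True if the column was added and False if it already existed.
    """
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


def create_index_if_missing(
    conn: sqlite3.Connection, index: str, table: str, column: str
) -> None:
    """SQLite supports IF NOT EXISTS for indexes, which gives the same guarantee."""
    conn.execute(f"CREATE INDEX IF NOT EXISTS {index} ON {table} ({column})")
```

Running such a step twice leaves the schema unchanged the second time, which is what lets a migration be re-applied against a database that is already partly migrated.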
Anish Sarkar
292b4d70ac chore: enhance E2E tests workflow by adding caching for Next.js build and updating test command 2026-05-10 22:21:06 +05:30
Anish Sarkar
548e574f1a chore: refactor E2E tests workflow to start Postgres as a container and add readiness check 2026-05-10 21:47:59 +05:30
Anish Sarkar
288c18bdf7 chore: update E2E tests workflow to include scoped proxy settings for backend and Celery worker 2026-05-10 21:34:07 +05:30
Anish Sarkar
21d3be14c9 chore: update E2E tests workflow name and adjust video recording settings 2026-05-10 21:13:57 +05:30
Anish Sarkar
3520877d80 Merge remote-tracking branch 'upstream/dev' into feat/e2e-testing-ci 2026-05-10 13:10:13 +05:30
Anish Sarkar
cf9e702bee chore: refine E2E tests workflow by updating Redis configuration and adding fake API keys for various services 2026-05-10 13:09:50 +05:30
DESKTOP-RTLN3BA\$punk
c8374e6c5b feat: improved document, folder mentions rendering
2026-05-09 22:15:51 -07:00
Anish Sarkar
dec06e0e18 chore: add E2E tests workflow configuration 2026-05-10 04:50:38 +05:30
Rohan Verma
28a02a9143
Merge pull request #1357 from CREDO23/feature/multi-agent
[Feature] Multi-agent chat: hierarchical timeline, live subagent streaming, and inline HITL approvals
2026-05-09 16:13:04 -07:00
Rohan Verma
316a90079c
Merge pull request #1356 from mvanhorn/osc/913-aria-label-clickable-cards
feat(a11y): add aria-label to clickable media cards
2026-05-09 16:11:54 -07:00
Rohan Verma
350ab2f60c
Merge pull request #1353 from AnishSarkar22/feat/e2e-testing
feat: add E2E test suite
2026-05-09 16:11:00 -07:00