refactor(tests): Update tests to remove summary references and adjust for embedding errors

2026-07-22 23:31:12 +02:00 · 2026-06-04 01:51:21 +05:30 · e588782a9b
commit e588782a9b
parent e4d7b01b09
17 changed files with 69 additions and 148 deletions
--- a/surfsense_evals/README.md
+++ b/surfsense_evals/README.md
@@ -137,15 +137,14 @@ Notes:
 - `--skip-unanswerable` (run) — drop unanswerable questions
 - `--docs <a.pdf>,<b.pdf>` (run) — scope to specific docs

-## Ingestion knobs (vision LLM, processing mode, summarize)
+## Ingestion knobs (vision LLM, processing mode)

-The harness exposes `POST /api/v1/documents/fileupload`'s three knobs on every `ingest` subcommand:
+The harness exposes `POST /api/v1/documents/fileupload`'s ingest knobs on every `ingest` subcommand:

 | Flag pair                                  | Effect                                                                                  |
 |--------------------------------------------|-----------------------------------------------------------------------------------------|
 | `--use-vision-llm` / `--no-vision-llm`     | Walk every embedded image in the PDF and inline image-derived text at the image's position (see below). |
 | `--processing-mode {basic,premium}`        | `premium` carries a 10× page multiplier and routes to a stronger ETL (e.g. LlamaCloud). |
-| `--should-summarize` / `--no-summarize`    | Generate a per-document summary at ingest.                                              |

 The "Default ingest" column in the benchmarks table is what runs if you don't pass any flag. Whatever was actually used is recorded as a `__settings__` header in the doc map (`data/<suite>/maps/<benchmark>_*_map.jsonl`) and as `extra.ingest_settings` in `run_artifact.json`, then surfaced in the report — no need to hunt through CLI history.
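
Reviewer note: the paragraph above says the effective ingest settings are recorded in two places. A minimal sketch of reading them back might look like the following. The helper names are hypothetical, and the exact shape of the `__settings__` header is an assumption (taken here to be a JSON object with a top-level `__settings__` key on one line of the JSONL map); only the `extra.ingest_settings` path in `run_artifact.json` comes from the README text.

```python
import json
from pathlib import Path
from typing import Any, Optional


def settings_from_artifact(path: Path) -> Optional[dict[str, Any]]:
    """Return `extra.ingest_settings` from a run_artifact.json, or None if absent."""
    artifact = json.loads(path.read_text(encoding="utf-8"))
    return artifact.get("extra", {}).get("ingest_settings")


def settings_from_doc_map(path: Path) -> Optional[dict[str, Any]]:
    """Return the first `__settings__` record found in a doc-map JSONL file.

    Assumption: the header is a JSON object carrying a top-level
    "__settings__" key. The real layout may differ.
    """
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if isinstance(record, dict) and "__settings__" in record:
                return record["__settings__"]
    return None
```

If the two sources ever disagree for the same run, that would be worth flagging in the report rather than silently preferring one of them.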