PageIndex/pageindex
Shreyansh Dubey f413c66fee
fix: prevent KeyError crash and context exhaustion in TOC processing (#188)
* fix: prevent KeyError crash and context exhaustion in TOC processing

- Use .get() with safe defaults for all LLM response dict accesses
- Optimize extract_toc_content retry loop to grow chat_history
  incrementally instead of rebuilding with full accumulated response
- Optimize toc_transformer retry loop to use chat_history instead of
  re-embedding the entire raw TOC and incomplete JSON in each prompt
- Return best-effort results on max retries instead of raising
- Add 14 mock-based tests covering all fix scenarios

Closes #163

* fix: address review feedback on retry behavior and None guard

- Restore explicit Exception on max retries instead of silent warning
- Move truncation logic before the retry loop so it only runs once
  on the initial incomplete response, not on every iteration
- Add explicit None guard for physical_index before passing to
  convert_physical_index_to_int to prevent potential TypeError
- Update test to expect Exception on max retries

---------

Co-authored-by: Your Name <you@example.com>
2026-07-03 20:20:31 +08:00
..
__init__.py Add PageIndexClient with agent-based retrieval via OpenAI Agents SDK (#125) 2026-03-26 23:19:50 +08:00
client.py Disable agent tracing and auto-add litellm/ prefix for retrieve_model 2026-03-29 00:55:57 +08:00
config.yaml Disable agent tracing and auto-add litellm/ prefix for retrieve_model 2026-03-29 00:55:57 +08:00
page_index.py fix: prevent KeyError crash and context exhaustion in TOC processing (#188) 2026-07-03 20:20:31 +08:00
page_index_md.py Restructure examples directory and improve document storage (#189) 2026-03-28 04:28:59 +08:00
retrieve.py Restructure examples directory and improve document storage (#189) 2026-03-28 04:28:59 +08:00
utils.py Adds missing re import (#281) 2026-07-03 16:14:58 +08:00