mirror of
https://github.com/VectifyAI/PageIndex.git
synced 2026-06-09 19:45:15 +02:00
extract_nodes_from_markdown now matches `**Title**` lines as level-1
headings (alongside ATX `#` headings) and attaches the heading level
on the producer side. extract_node_text_content reads the level from
the node instead of re-running a `^#{1,6}` regex on the source line,
which was silently dropping bold-heading nodes from OCR / MinerU output.
Bold maps to level 1 even when mixed with `#` / `##` / `###`: bold-as-heading
is a courtesy heuristic for non-ATX markdown sources, and
CommonMark has no concept of bold heading depth.
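
A minimal sketch of the producer-side idea described above, not the actual code in page_index_md.py. The function name `extract_heading_nodes`, the regexes, and the node keys (`title`, `level`, `line_num`) are assumptions made for illustration; the real `extract_nodes_from_markdown` may differ in all of them. The point it shows is that the level is decided once, when the node is created, so later stages never need to re-parse the source line.

```python
import re

# Hypothetical patterns; the real ones in the repository may differ.
ATX_HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# A line consisting only of **Title** (no other asterisks inside).
BOLD_HEADING = re.compile(r"^\*\*([^*]+)\*\*\s*$")
FENCE = re.compile(r"^\s*(```|~~~)")


def extract_heading_nodes(markdown_text):
    """Return heading nodes with the level attached at extraction time."""
    nodes = []
    in_code_block = False
    for line_num, raw in enumerate(markdown_text.split("\n"), start=1):
        # Skip fenced code so that '# comment' lines are not read as headings.
        if FENCE.match(raw):
            in_code_block = not in_code_block
            continue
        if in_code_block:
            continue

        line = raw.strip()
        atx = ATX_HEADING.match(line)
        if atx:
            nodes.append({
                "title": atx.group(2),
                "level": len(atx.group(1)),  # number of '#' characters
                "line_num": line_num,
            })
            continue

        bold = BOLD_HEADING.match(line)
        if bold:
            nodes.append({
                "title": bold.group(1).strip(),
                "level": 1,  # bold headings always map to level 1
                "line_num": line_num,
            })
    return nodes
```

A consumer can then read `node["level"]` directly. Under this sketch, a bold line that follows a `##` heading still gets level 1, matching the behavior the commit message describes, and bold text embedded in a longer sentence is not treated as a heading.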
| File |
|---|
| `__init__.py` |
| `client.py` |
| `config.yaml` |
| `page_index.py` |
| `page_index_md.py` |
| `retrieve.py` |
| `utils.py` |