fix: parallelization effort, progressbar w/ bit-identical file write compared to sequential path

Apunkt 2026-05-21 09:14:20 +02:00
parent 35e49044f1
commit 8ca5371a98
9 changed files with 407 additions and 387 deletions


@@ -1,10 +1,10 @@
# LibRay — Agent Quick Reference
# Important
-Make sure to use the virtual environment in .venv and not global pip.
+Make sure to use the virtual environment in `~/.venv/libray` (activate with `source ~/.venv/libray/bin/activate`) and not global pip.
## Repo
- Python 3 CLI tool for decrypting/encrypting/examining PS3 Blu-Ray ISOs
-- Entry point: `libray/libray` (also `libray/libray.py`, identical copy)
+- Entry point: `libray/libray.py` (defines `main()`); installed console script is `libray=libray.libray:main` (see `setup.py`). `libray/libray` is a symlink to `libray.py` for running from a source checkout.
- Package: `libray/` — modules: `core.py` (main logic), `iso.py` (ISO parsing), `ird.py` (IRD parsing), `sfo.py` (PARAM.SFO)
- Tests: `tests/`: `test_iso.py`, `test_interface.py` (interface test is currently skipped/broken)
- Tools: `tools/keys2db.py` (builds `libray/data/keys.db` from redump keys), `tools/rpcs3.py` (fetches compat data)
@@ -16,12 +16,14 @@ Make sure to use the virtual environment in .venv and not global pip.
- Publish: `twine upload dist/*`
## Parallelization
-- Decrypt and re-encrypt support multi-threading via `-p`/`--threads` CLI argument
+- Decrypt and re-encrypt support multiprocessing via `-p`/`--threads` CLI argument
- Default: auto-detects CPU core count via `os.cpu_count()`
- Each sector is independently decrypted (per-sector IV in AES-CBC), making it embarrassingly parallel
-- Uses `concurrent.futures.ThreadPoolExecutor` (threads, not processes, since pycryptodome releases the GIL)
+- Uses `concurrent.futures.ProcessPoolExecutor` — true multi-core parallelism by spawning OS processes
+- Each process gets its own Python interpreter (own GIL), so pycryptodome's partial GIL release is irrelevant
- Unencrypted regions are always copied sequentially (no crypto needed)
-- Sector data is read upfront into memory, then processed in parallel, then written in order
+- Module-level `_process_sector_chunk_mp()` function is picklable for use with ProcessPoolExecutor; it processes a contiguous run of sectors per task to amortise IPC cost
+- A bounded window of chunks is kept in flight (memory stays bounded for large ISOs); results are written back to their absolute offsets in the output file as they complete
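
The bounded-window scheme described in the Parallelization notes can be sketched roughly as follows. This is a minimal illustration, not the code in `core.py`: `transform_chunk`, `process_region`, `sectors_per_chunk`, `max_in_flight` and the 2048-byte `SECTOR_SIZE` are all hypothetical names and values, and a per-sector XOR stands in for the AES-CBC step so the sketch needs only the standard library. The executor is passed in, so any `concurrent.futures` executor can be used.

```python
from concurrent.futures import FIRST_COMPLETED, wait

SECTOR_SIZE = 2048  # assumption: 2048-byte sectors


def transform_chunk(first_sector, data):
    """Stand-in for the per-sector crypto (NOT AES): XOR each sector with a
    byte derived from its sector number, so every sector is independent.

    Defined at module level so a ProcessPoolExecutor can pickle it.
    """
    out = bytearray(len(data))
    for i in range(0, len(data), SECTOR_SIZE):
        key = (first_sector + i // SECTOR_SIZE) & 0xFF
        out[i:i + SECTOR_SIZE] = bytes(b ^ key for b in data[i:i + SECTOR_SIZE])
    return first_sector, bytes(out)


def process_region(executor, src, dst, start_sector, num_sectors,
                   sectors_per_chunk=64, max_in_flight=8):
    """Transform sectors [start_sector, start_sector + num_sectors) of src
    into dst, keeping at most max_in_flight chunks submitted at once."""
    pending = set()
    next_sector = start_sector
    end_sector = start_sector + num_sectors
    while next_sector < end_sector or pending:
        # Top up the window: read and submit chunks until the bound is hit.
        while next_sector < end_sector and len(pending) < max_in_flight:
            count = min(sectors_per_chunk, end_sector - next_sector)
            src.seek(next_sector * SECTOR_SIZE)
            data = src.read(count * SECTOR_SIZE)
            pending.add(executor.submit(transform_chunk, next_sector, data))
            next_sector += count
        # Block until at least one chunk is done, then write every finished
        # chunk at its own absolute offset (completion order does not matter).
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            first_sector, result = future.result()
            dst.seek(first_sector * SECTOR_SIZE)
            dst.write(result)
```

To get multi-core behaviour, `executor` would be a `concurrent.futures.ProcessPoolExecutor`, created under an `if __name__ == "__main__":` guard so that worker start-up does not re-run the script body. Because each result carries its starting sector and is written by seeking to that offset, the output should match a sequential pass byte for byte regardless of the order in which chunks complete.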
## Gotchas
- **Crypto package conflict**: `pycrypto`/`crypto` will break `pycryptodome`. If `ImportError: No module named Crypto.Cipher`, run:
@@ -29,7 +31,7 @@ Make sure to use the virtual environment in .venv and not global pip.
pip uninstall crypto pycrypto && pip install pycryptodome
```
- **keys.db is generated**, not committed. Build it with `python3 tools/keys2db.py` (requires keys in `tools/keys/`). It's listed in `.gitignore` via `libray/data/*.db`.
-- **`libray/__init__.py`** dynamically imports all submodules via `pkgutil.walk_packages` — don't expect explicit imports.
+- **`libray/__init__.py`** dynamically imports submodules via `pkgutil.walk_packages` + `importlib.import_module` (skips the `libray` entry-point module to avoid shadowing the package) — don't expect explicit imports.
- **`test_interface.py`** is skipped (`@unittest.skip('currently broken')`) — the interface test won't run.
- `.editorconfig` enforces 4-space indent for `.py`, 2-space for `.yml`/`.yaml`.
- No linting/typechecking config exists — plain `unittest`, no pytest, no pre-commit.
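
For the dynamic-import gotcha above, the general pattern of `pkgutil.walk_packages` combined with `importlib.import_module` looks roughly like the sketch below. The actual contents of `libray/__init__.py` are not shown in this diff, so this is only an illustration: the function name `import_submodules` and its `skip` parameter are hypothetical.

```python
import importlib
import pkgutil


def import_submodules(package_name, skip=()):
    """Import every submodule of an importable package, except those whose
    short name is listed in `skip`. Returns {dotted_name: module}."""
    package = importlib.import_module(package_name)
    imported = {}
    # walk_packages yields (finder, name, ispkg) for each module it finds
    # under the package's search path; names carry the given prefix.
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        short_name = info.name.rsplit(".", 1)[-1]
        if short_name in skip:
            continue  # e.g. an entry-point module that should not be imported
        imported[info.name] = importlib.import_module(info.name)
    return imported
```

A call such as `import_submodules("libray", skip={"libray"})` would mirror the behaviour the note describes, assuming the entry-point module is the only one that needs skipping.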