inject_check_kernel now respects MultiGpuInjectMode from SolverConfig instead of
hardcoding OneIsland. HalfIslands uses LCG-based random island selection.
Also fixes stale write_async calls in test_multi_gpu_b3.cu.
Verified on 2×V100S: all 5 B3 tests pass; e5 (12 problem types) all optimal.
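
For reviewers unfamiliar with the HalfIslands mode, the selection can be sketched on the host as follows. This is a minimal illustration, not the kernel code: `select_islands`, `lcg_next`, `count_selected`, the LCG constants (the common Numerical Recipes pair), the choice of island 0 for OneIsland, and the "at least one island" rule are all assumptions, since the commit only states that the selection is LCG-based.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Mirrors the two modes named in this commit; the real enum may have more.
enum class MultiGpuInjectMode { OneIsland, HalfIslands };

// One LCG step. Constants are an assumption (Numerical Recipes values).
inline std::uint32_t lcg_next(std::uint32_t& state) {
    state = state * 1664525u + 1013904223u;
    return state;
}

// Returns one flag per island: true if the island receives the injected solution.
std::vector<bool> select_islands(MultiGpuInjectMode mode, int num_islands,
                                 std::uint32_t seed) {
    assert(num_islands > 0);
    std::vector<bool> chosen(num_islands, false);
    if (mode == MultiGpuInjectMode::OneIsland) {
        chosen[0] = true;  // assumption: the fixed target is island 0
        return chosen;
    }
    // HalfIslands: partial Fisher-Yates shuffle driven by the LCG,
    // so each selected island is distinct.
    std::vector<int> idx(num_islands);
    for (int i = 0; i < num_islands; ++i) idx[i] = i;
    const int target = (num_islands / 2 > 0) ? num_islands / 2 : 1;
    std::uint32_t state = seed;
    for (int i = 0; i < target; ++i) {
        // Use the high bits; the low bits of a power-of-two LCG cycle quickly.
        const std::uint32_t r = lcg_next(state) >> 16;
        const int j = i + static_cast<int>(r % static_cast<std::uint32_t>(num_islands - i));
        std::swap(idx[i], idx[j]);
        chosen[idx[i]] = true;
    }
    return chosen;
}

// Counts how many islands were selected.
inline int count_selected(const std::vector<bool>& flags) {
    int n = 0;
    for (bool f : flags) n += f ? 1 : 0;
    return n;
}
```

With 8 islands, HalfIslands marks exactly 4 distinct islands, and the same seed always reproduces the same set, which keeps multi-GPU runs repeatable.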
Safety fixes (4 critical, 4 warning) from code review:
- qap.cuh: fix clone_to_device cross-device D2H by retaining host matrices
- types.cuh: add CUDA_CHECK to InjectBuffer, track owner_gpu for safe destroy
- types.cuh: add bounds check on lexicographic priority index
- solver.cuh: cap migrate_kernel islands to MAX_ISLANDS=64 to prevent stack overflow
- multi_gpu_solver.cuh: guard against 0 GPUs, propagate stop_reason from best GPU
- types.cuh: warn on SeqRegistry overflow
- solver.cuh: warn when constraint_directed/phased_search disabled without AOS
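
The island cap and the priority bounds check listed above follow the same pattern: validate a caller-supplied size before it indexes fixed-size storage. A host-side sketch of that pattern, assuming hypothetical helper names (`clamp_island_count`, `priority_index_in_bounds`, `best_island`) that do not appear in the source; only MAX_ISLANDS=64 is taken from this commit:

```cpp
#include <algorithm>
#include <cstdio>

constexpr int MAX_ISLANDS = 64;  // value stated in this commit

// Hypothetical helper: clamp a requested island count into [0, MAX_ISLANDS],
// warning when the request had to be reduced.
inline int clamp_island_count(int requested) {
    if (requested > MAX_ISLANDS) {
        std::fprintf(stderr, "warning: %d islands requested, capping to %d\n",
                     requested, MAX_ISLANDS);
        return MAX_ISLANDS;
    }
    return std::max(requested, 0);
}

// Hypothetical helper: true if idx is a valid lexicographic priority index.
inline bool priority_index_in_bounds(int idx, int num_objectives) {
    return idx >= 0 && idx < num_objectives;
}

// Illustrates why the cap matters: the scratch array has a fixed size
// (as per-thread storage in a kernel would), so an unclamped count
// would write past its end.
inline int best_island(const float* fitness, int num_islands) {
    const int n = clamp_island_count(num_islands);
    float scratch[MAX_ISLANDS];
    for (int i = 0; i < n; ++i) scratch[i] = fitness[i];
    int best = -1;  // -1 signals "no islands"
    for (int i = 0; i < n; ++i) {
        if (best < 0 || scratch[i] < scratch[best]) best = i;  // minimization assumed
    }
    return best;
}
```

Clamping with a warning, rather than aborting, matches the "warn" style of the other fixes in this list; whether the solver itself warns on the cap is not stated in the commit.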
Translate all Chinese comments to English across 25+ source files
(core/*.cuh, problems/*.cuh, Makefile, multi-GPU tests).
Verified on 2×V100S (sm_70, CUDA 12.8): e5 (12 problem types) all optimal;
e13 (multi-objective + multi-GPU, 9 configs) all passed.