doc-to-lora/scripts/main_exp/README.md
2026-02-27 03:47:04 +00:00

D2L pipeline

Data

You can either download the pre-generated data (recommended, ~100 GB per model) or generate it yourself. See 0-download_data.sh for how to download the data for a specific model only.

# download training data for all three models (328 GB)
uv run bash scripts/main_exp/0-download_data.sh
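
Because the full download is large, it may be worth checking free disk space first. The following is a minimal sketch, not part of the repository: `has_free_space_gb` is a hypothetical helper, and the directory passed to it is an assumption, since where 0-download_data.sh stores the data is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: has_free_space_gb is a hypothetical helper, not a repo script.
# Succeeds (exit 0) if the filesystem holding <dir> has at least <required_gb> GB free.
has_free_space_gb() {
    local dir="$1" required_gb="$2"
    local avail_kb
    # df -P forces the POSIX one-line-per-filesystem format; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}
```

For example, `has_free_space_gb . 328 || echo "not enough free space for all three models"` checks the current directory's filesystem against the 328 GB figure above; replace `.` with the actual data directory.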

Generating the data from scratch can take a very long time unless it is parallelized across multiple GPUs.

# generate training data (takes a very long time unless parallelized across multiple GPUs)
# optional: use the command below for generating data from scratch
# uv run bash scripts/main_exp/gen_data.sh

Training

Once the data is ready, run the training script.

# train
uv run bash scripts/main_exp/1-train.sh
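
To run the download and training steps back to back, one option is a small wrapper that stops at the first failing step. This is a sketch under assumptions: `run_step`, `run_pipeline`, and the `DRY_RUN` variable are hypothetical names introduced here, and only the two script paths come from this README.

```shell
#!/usr/bin/env bash
# Sketch only: run_step, run_pipeline and DRY_RUN are hypothetical, not part of the repo.

# Print a command, then execute it unless DRY_RUN=1 is set.
run_step() {
    echo ">>> $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    "$@"
}

# Run the documented steps in order; return non-zero as soon as one fails.
run_pipeline() {
    run_step uv run bash scripts/main_exp/0-download_data.sh || return 1
    run_step uv run bash scripts/main_exp/1-train.sh || return 1
}
```

Calling `DRY_RUN=1 run_pipeline` only prints the commands, which is a cheap way to confirm the order before starting the real run with `run_pipeline`.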

Evaluation

All evaluation scripts for reproducing the main results in the paper are in the eval directory.