diff --git a/README.md b/README.md
index 8094c3d..c4994d5 100644
--- a/README.md
+++ b/README.md
@@ -74,11 +74,11 @@ Should run without errors.
 If using option 2 (existing build), ensure it was compiled with shared libraries:
 
 ```bash
-cmake -B build -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON # or -DGGML_HIP=ON / -DGGML_VULKAN=1
+cmake -B build -DBUILD_SHARED_LIBS=ON # Add your custom build options
 cmake --build build --config Release -j$(nproc)
 ```
 
-The build must contain `libllama.so` (typically at `build/libllama.so`).
+The build must contain `libllama.so` (typically at `build/bin/libllama.so`).
 
 ## Scripts
 
@@ -128,7 +128,7 @@ Fine-tunes a model using Unsloth with LoRA adapters. Saves LoRA adapter to `./mo
 | `LEARNING_RATE` | Training learning rate | `2e-4` |
 | `MAX_LENGTH` | Maximum sequence length | `4096` |
 | `TRAIN_EPOCHS` | Number of training epochs | `1` |
-| `model_name` (line 74) | Base model to fine-tune | `"unsloth/Llama-3.2-3B-Instruct"` |
+| `model_name` (line 74) | Base model to fine-tune | `"Qwen/Qwen3.5-2B"` |
 
 ```bash
 bash scripts/finetune.sh
@@ -190,10 +190,11 @@ Common issues:
 
 ### Out of memory during training
 
-- Reduce `BATCH_SIZE` in `finetune.py`
-- Increase `GRADIENT_ACCUMULATION_STEPS` to compensate
+- Reduce `BATCH_SIZE` in `finetune.py` (lower = less VRAM usage)
+- Increase `GRADIENT_ACCUMULATION_STEPS` to compensate (higher = longer finetuning time)
+- `EFFECTIVE_BATCH_SIZE` = `BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS` = 16+
 - Reduce `MAX_LENGTH` to fit shorter sequences
-- Set `load_in_4bit=True` in `finetune.py` (line 77)
+- Set `load_in_4bit=True` in `finetune.py` (line 77) for QLoRA
 
 ### llama-cpp-python install fails
 
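
Review note on the out-of-memory hunk: the relationship between `BATCH_SIZE`, `GRADIENT_ACCUMULATION_STEPS`, and the effective batch size can be illustrated with a short sketch. This is not code from `finetune.py`; the helper name `accumulation_steps_for` is hypothetical, and the default target of 16 is an assumption taken from the `16+` hint in the new bullet.

```python
import math


def accumulation_steps_for(batch_size: int, target_effective: int = 16) -> int:
    """Return the smallest number of accumulation steps such that
    batch_size * steps >= target_effective.

    Hypothetical helper for illustration; the target of 16 is assumed
    from the README's "16+" guidance.
    """
    if batch_size < 1 or target_effective < 1:
        raise ValueError("batch_size and target_effective must be positive")
    return math.ceil(target_effective / batch_size)


# Lowering BATCH_SIZE saves VRAM; raising the accumulation steps keeps
# the effective batch size at or above the target, at the cost of more
# forward/backward passes per optimizer step.
for batch_size in (8, 4, 2, 1):
    steps = accumulation_steps_for(batch_size)
    print(f"BATCH_SIZE={batch_size} GRADIENT_ACCUMULATION_STEPS={steps} "
          f"EFFECTIVE_BATCH_SIZE={batch_size * steps}")
```

When the batch size does not divide the target evenly, the helper rounds up, so the effective batch size lands slightly above the target and never below it.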