# Unsloth Fine-Tune Template

> **Linux only:** This template is designed for Linux systems with NVIDIA GPU (CUDA), AMD GPU (ROCm), or Vulkan support.

A template for fine-tuning LLMs using [Unsloth](https://github.com/unslothai/unsloth) and converting to GGUF format with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Prerequisites

- Linux OS
- Python 3.10+
- NVIDIA GPU (CUDA) or AMD GPU (ROCm) or Vulkan-compatible GPU
- [cmake](https://cmake.org/)
- [git](https://git-scm.com/)

## Quick Start

```bash
# 1. Setup (clones llama.cpp, builds it, installs dependencies)
bash setup.sh

# 2. Configure scripts (see variables below)

# 3. Run full pipeline
bash run-pipeline.sh
```

## Workflow

```
scripts/generate-data.sh     → Generate synthetic training data (optional)
scripts/finetune.sh          → Fine-tune model with LoRA adapters
scripts/merge-and-convert.sh → Merge LoRA into base model and convert to GGUF
scripts/run-model.sh         → Run the converted GGUF model
run-pipeline.sh              → Run finetune → merge/convert → run in sequence
```

## Setup

`setup.sh` will:

1. Create a Python virtual environment and install Python dependencies
2. Clone [llama.cpp](https://github.com/ggml-org/llama.cpp) (fresh build) or symlink an existing build
3. Build llama.cpp with shared libraries (`-DBUILD_SHARED_LIBS=ON`)
4. Install llama-cpp-python bindings linked against the shared library (`-DLLAMA_BUILD=OFF`)

**Using an existing llama.cpp build:** Choose option 2 and provide the absolute path. The build must have been created with `-DBUILD_SHARED_LIBS=ON` and contain `libllama.so`. Setup will create a symlink at `./llama.cpp`.

### Backend Selection

Backend is only prompted when building llama.cpp from scratch.
Choose based on your GPU:

| Choice | Backend | Requirements |
|---|---|---|
| 1 | CUDA (NVIDIA) | Systemwide CUDA installation (NVIDIA drivers + CUDA toolkit) |
| 2 | ROCm (AMD) | Systemwide ROCm installation (AMD drivers + ROCm toolkit) |
| 3 | Vulkan | Vulkan drivers + `libvulkan-dev`, `glslc`, `spirv-headers` |
| 4 | CPU only | None |

**Vulkan dependencies (Ubuntu/Debian):**

```bash
sudo apt-get install libvulkan-dev glslc spirv-headers
```

Verify Vulkan is correctly installed:

```bash
vulkaninfo
```

The command should run without errors.

### Existing llama.cpp Build

If using option 2 (existing build), ensure it was compiled with shared libraries:

```bash
cmake -B build -DBUILD_SHARED_LIBS=ON  # Add your custom build options
cmake --build build --config Release -j$(nproc)
```

The build must contain `libllama.so` (typically at `build/bin/libllama.so`).

## Scripts

### 1. scripts/generate-data.sh

Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.

**Edit `synthetic-data.py`:**

| Variable | Description | Example |
|---|---|---|
| `GGUF_MODEL_PATH` | Path to the GGUF model used for generation | `./path/to/model.gguf` |
| `INPUT_PARQUET_PATH` | Path to existing training data to extend | `./data/train.parquet` |
| `OUTPUT_PARQUET_PATH` | Path to save the combined dataset | `./data/output.parquet` |
| `NEW_ROWS_COUNT` | Number of synthetic records to generate | `100` |
| `User prompt` (line 66) | Replace `"YOUR PROMPT GOES HERE"` with generation instructions | `Generate questions about machine learning...` |
| `System message` (line 62) | Controls the model's role | `"You are a data generator. Output ONLY the format below..."` |
| `max_tokens` | Max tokens per response | `200` |
| `temperature` | Creativity of generation | `0.7` |
| `top_p` | Nucleus sampling threshold | `0.95` |
| `top_k` | Top-k sampling threshold | `50` |
| `min_p` | Minimum probability threshold | `0.05` |

The script expects output in the format:

```
Question:
Answer:
```

```bash
bash scripts/generate-data.sh
```

### 2. scripts/finetune.sh

Fine-tunes a model using Unsloth with LoRA adapters. Saves the LoRA adapter to `./model/`.

**Edit `finetune.py`:**

| Variable | Description | Example |
|---|---|---|
| `DATA_PATH` | Path to training Parquet file | `./data/output.parquet` |
| `OUTPUT_DIR` | Directory to save LoRA adapters (leave at default) | `./model` |
| `BATCH_SIZE` | Per-device batch size | `2` |
| `GRADIENT_ACCUMULATION_STEPS` | Gradient accumulation steps | `8` |
| `LEARNING_RATE` | Training learning rate | `2e-4` |
| `MAX_LENGTH` | Maximum sequence length | `4096` |
| `TRAIN_EPOCHS` | Number of training epochs | `1` |
| `model_name` (line 74) | Base model to fine-tune | `"Qwen/Qwen3.5-2B"` |

```bash
bash scripts/finetune.sh
```

### 3. scripts/merge-and-convert.sh

Merges LoRA adapters into the base model, saves the merged model, then converts to GGUF format using llama.cpp.

**Edit `merge.py`:**

| Variable | Description | Example |
|---|---|---|
| `BASE_MODEL_PATH` | Path to the base model (same as `model_name` in `finetune.py`) | `"Qwen/Qwen3.5-2B"` |
| `LORA_DIR` | Path to LoRA adapters (leave at default) | `./model` |
| `MERGED_MODEL_PATH` | Output directory for merged model (leave at default) | `./merged_model` |

```bash
bash scripts/merge-and-convert.sh
```

### 4. scripts/run-model.sh

Runs the converted GGUF model for inference using llama.cpp's CLI.
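Because the GGUF file name depends on the base model, it can help to list what `merge-and-convert.sh` actually wrote before editing the model path. A minimal Python sketch, assuming the converted file sits directly in `./merged_model` with a `.gguf` extension; the helper name `find_gguf_files` is hypothetical and not part of this template:

```python
from pathlib import Path


def find_gguf_files(directory="./merged_model"):
    """Return the .gguf files directly inside `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []  # the directory does not exist yet, so nothing to list
    return sorted(p for p in root.glob("*.gguf") if p.is_file())


if __name__ == "__main__":
    # Print each candidate path so it can be copied into run-model.sh.
    for path in find_gguf_files():
        print(path)
```

If the list is empty, the merge and convert step has probably not finished or wrote its output somewhere else.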
**Edit `run-model.sh`:**

| Variable | Description | Example |
|---|---|---|
| Model path | Path to the GGUF file (the file name varies with the base model) | `./merged_model/model.gguf` |

```bash
bash scripts/run-model.sh
```

## Output Structure

```
./model/          ← LoRA adapters (from finetune.sh)
./merged_model/   ← Merged HF model + GGUF file (from merge-and-convert.sh)
llama.cpp/        ← llama.cpp repository (created by setup.sh)
scripts/          ← Individual pipeline step scripts
setup.sh          ← Setup script (venv + llama.cpp build/symlink)
run-pipeline.sh   ← Run full pipeline (finetune → merge/convert → run)
```

## Troubleshooting

### llama.cpp build fails

See the official build guide: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

Common issues:

- **CUDA**: Requires a systemwide CUDA installation (NVIDIA drivers + CUDA toolkit)
- **ROCm**: Requires a systemwide ROCm installation (AMD drivers + ROCm toolkit)
- **Vulkan**: Requires Vulkan drivers + `libvulkan-dev`, `glslc`, `spirv-headers`
- **cmake**: Install via `sudo apt install cmake` (Debian/Ubuntu)

### Out of memory during training

- Reduce `BATCH_SIZE` in `finetune.py` (lower = less VRAM usage)
- Increase `GRADIENT_ACCUMULATION_STEPS` to compensate (higher = longer fine-tuning time)
- Keep the effective batch size, `BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS`, at 16 or higher
- Reduce `MAX_LENGTH` to fit shorter sequences
- Set `load_in_4bit=True` in `finetune.py` (line 77) for QLoRA

### llama-cpp-python install fails

- Ensure llama.cpp has been built successfully first (or build it yourself if you want to use a backend other than CUDA, ROCm or Vulkan)
- Try a CPU-only install first to verify: `pip install llama-cpp-python`
- Check the [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/)

## Project Structure

```
├── finetune.py           ← Training script
├── merge.py              ← Merge LoRA into base model
├── synthetic-data.py     ← Generate synthetic training data
├── requirements.txt      ← Python dependencies
├── setup.sh              ← One-time setup
├── run-pipeline.sh       ← Run full pipeline
├── scripts/
│   ├── generate-data.sh
│   ├── finetune.sh
│   ├── merge-and-convert.sh
│   └── run-model.sh
└── README.md
```
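
The out-of-memory advice under Troubleshooting trades `BATCH_SIZE` against `GRADIENT_ACCUMULATION_STEPS` while holding their product steady. A small Python sketch of that arithmetic, assuming a target effective batch size of 16 (the example values 2 × 8 from the `finetune.py` table); the function names are hypothetical and not part of this template:

```python
import math


def effective_batch_size(batch_size, grad_accum_steps):
    """Samples contributing to each optimizer step on one device."""
    return batch_size * grad_accum_steps


def accumulation_steps_for(target_effective, batch_size):
    """Smallest accumulation step count that reaches the target."""
    if batch_size < 1 or target_effective < 1:
        raise ValueError("batch_size and target_effective must be >= 1")
    return math.ceil(target_effective / batch_size)


if __name__ == "__main__":
    # Halving BATCH_SIZE from 2 to 1 means doubling the accumulation steps.
    for batch_size in (2, 1):
        steps = accumulation_steps_for(16, batch_size)
        print(batch_size, steps, effective_batch_size(batch_size, steps))
```

The rounding up matters when the target is not a multiple of the batch size: the result then slightly overshoots the target instead of falling short of it.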