Unsloth Fine-Tune Template
Linux only: this template is designed for Linux systems with an NVIDIA GPU (CUDA), an AMD GPU (ROCm), or Vulkan support. A CPU-only build is also offered during setup.
A template for fine-tuning LLMs using Unsloth and converting to GGUF format with llama.cpp.
Prerequisites
Quick Start
```bash
# 1. Setup (clones llama.cpp, builds it, installs dependencies)
bash setup.sh
# 2. Configure scripts (see variables below)
# 3. Run full pipeline
bash run-pipeline.sh
```
Workflow
```
scripts/generate-data.sh     → Generate synthetic training data (optional)
scripts/finetune.sh          → Fine-tune model with LoRA adapters
scripts/merge-and-convert.sh → Merge LoRA into base model and convert to GGUF
scripts/run-model.sh         → Run the converted GGUF model
run-pipeline.sh              → Run finetune → merge/convert → run in sequence
```
Setup
setup.sh will:
- Create a Python virtual environment and install Python dependencies
- Clone llama.cpp
- Build llama.cpp with your selected GPU backend
- Install llama-cpp-python bindings with matching backend flags
Backend Selection
| Choice | Backend | Requirements |
|---|---|---|
| 1 | CUDA (NVIDIA) | NVIDIA drivers, CUDA toolkit |
| 2 | ROCm (AMD) | AMD drivers, HIP toolkit |
| 3 | Vulkan | Vulkan drivers |
| 4 | CPU only | None |
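Each backend choice ends up as CMake options for the llama.cpp build. The sketch below shows one way the four menu choices could map to configure and build commands. It assumes `setup.sh` uses the `GGML_CUDA`, `GGML_HIP`, and `GGML_VULKAN` options from the llama.cpp build guide. The names `BACKEND_FLAGS` and `llama_cpp_build_commands` are hypothetical and not part of the template. ROCm builds usually need extra settings, such as the GPU target architecture, and those are omitted here.

```python
import shlex

# Hypothetical mapping from the setup.sh menu choice to llama.cpp CMake options.
# Option names follow the llama.cpp build guide; verify them for your llama.cpp version.
BACKEND_FLAGS: dict[int, tuple[str, list[str]]] = {
    1: ("CUDA (NVIDIA)", ["-DGGML_CUDA=ON"]),
    2: ("ROCm (AMD)", ["-DGGML_HIP=ON"]),  # real ROCm builds usually also need a GPU target
    3: ("Vulkan", ["-DGGML_VULKAN=ON"]),
    4: ("CPU only", []),
}


def llama_cpp_build_commands(choice: int, source_dir: str = "llama.cpp") -> list[list[str]]:
    """Return [configure, build] argv lists for the chosen backend (hypothetical helper)."""
    if choice not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend choice: {choice}")
    _, flags = BACKEND_FLAGS[choice]
    build_dir = f"{source_dir}/build"
    configure = ["cmake", "-S", source_dir, "-B", build_dir, "-DCMAKE_BUILD_TYPE=Release", *flags]
    build = ["cmake", "--build", build_dir, "--config", "Release", "-j"]
    return [configure, build]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in llama_cpp_build_commands(1):
        print(shlex.join(cmd))
```

Returning argv lists rather than shell strings lets you print the commands for review first. You can then hand each list to `subprocess.run(..., check=True)` without any quoting concerns.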
Scripts
1. scripts/generate-data.sh
Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.
Edit synthetic-data.py:
| Variable | Description | Example |
|---|---|---|
| `GGUF_MODEL_PATH` | Path to the GGUF model used for generation | `./path/to/model.gguf` |
| `INPUT_PARQUET_PATH` | Path to existing training data to extend | `./data/train.parquet` |
| `OUTPUT_PARQUET_PATH` | Path to save the combined dataset | `./data/output.parquet` |
| `NEW_ROWS_COUNT` | Number of synthetic records to generate | `100` |
```bash
bash scripts/generate-data.sh
```
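Generation runs can be slow, so it can pay to sanity-check the four variables before starting one. Below is a minimal pre-flight check that uses only `pathlib`. `validate_synthetic_config` is hypothetical and not part of the template. The check also assumes the input Parquet file must already exist, which is what the table's "existing training data" wording suggests.

```python
from pathlib import Path


def validate_synthetic_config(
    gguf_model_path: str,
    input_parquet_path: str,
    output_parquet_path: str,
    new_rows_count: int,
) -> list[str]:
    """Hypothetical pre-flight check; an empty list means nothing obvious is wrong."""
    problems: list[str] = []

    model = Path(gguf_model_path)
    if model.suffix != ".gguf":
        problems.append(f"GGUF_MODEL_PATH should end in .gguf: {model}")
    if not model.is_file():
        problems.append(f"GGUF_MODEL_PATH does not exist: {model}")

    source = Path(input_parquet_path)
    if not source.is_file():
        problems.append(f"INPUT_PARQUET_PATH does not exist: {source}")

    if Path(output_parquet_path).suffix != ".parquet":
        problems.append(f"OUTPUT_PARQUET_PATH should end in .parquet: {output_parquet_path}")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(new_rows_count, bool) or not isinstance(new_rows_count, int) or new_rows_count <= 0:
        problems.append(f"NEW_ROWS_COUNT must be a positive integer: {new_rows_count!r}")

    return problems


if __name__ == "__main__":
    # Example values from the table above.
    for issue in validate_synthetic_config(
        "./path/to/model.gguf", "./data/train.parquet", "./data/output.parquet", 100
    ):
        print(issue)
```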
2. scripts/finetune.sh
Fine-tunes a model using Unsloth with LoRA adapters. Saves LoRA weights to ./model/.
Edit finetune.py:
| Variable | Description | Example |
|---|---|---|
| `DATA_PATH` | Path to training Parquet file | `./data/output.parquet` |
| `OUTPUT_DIR` | Directory to save LoRA adapters | `./model` |
| `BATCH_SIZE` | Per-device batch size | `2` |
| `GRADIENT_ACCUMULATION_STEPS` | Gradient accumulation steps | `8` |
| `LEARNING_RATE` | Training learning rate | `2e-4` |
| `MAX_LENGTH` | Maximum sequence length | `4096` |
| `TRAIN_EPOCHS` | Number of training epochs | `1` |
| `model_name` (line 74) | Base model to fine-tune | `"unsloth/Llama-3.2-3B-Instruct"` |
```bash
bash scripts/finetune.sh
```
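`BATCH_SIZE` and `GRADIENT_ACCUMULATION_STEPS` together set how many sequences feed each optimizer update. The sketch below works through that arithmetic using the example values from the table and a hypothetical 1,000-row dataset. The exact step count depends on how the trainer rounds a final partial accumulation window, so treat the result as approximate. All function names are illustrative only.

```python
import math

# Example values from the finetune.py table.
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
MAX_LENGTH = 4096
TRAIN_EPOCHS = 1


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences whose gradients are combined into one optimizer update."""
    return batch_size * grad_accum_steps * num_devices


def approx_optimizer_steps(num_rows: int, batch_size: int, grad_accum_steps: int,
                           epochs: int, num_devices: int = 1) -> int:
    """Approximate optimizer updates for the whole run (trainer rounding may differ)."""
    per_epoch = math.ceil(num_rows / effective_batch_size(batch_size, grad_accum_steps, num_devices))
    return per_epoch * epochs


def max_tokens_per_step(batch_size: int, grad_accum_steps: int, max_length: int,
                        num_devices: int = 1) -> int:
    """Upper bound on tokens per update, reached only if every sequence hits max_length."""
    return effective_batch_size(batch_size, grad_accum_steps, num_devices) * max_length


if __name__ == "__main__":
    rows = 1000  # hypothetical dataset size
    print(effective_batch_size(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 16
    print(approx_optimizer_steps(rows, BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, TRAIN_EPOCHS))  # 63
    print(max_tokens_per_step(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, MAX_LENGTH))  # 65536
```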
3. scripts/merge-and-convert.sh
Merges LoRA adapters into the base model, saves the merged model, then converts to GGUF format using llama.cpp.
Edit merge.py:
| Variable | Description | Example |
|---|---|---|
| `BASE_MODEL_PATH` | Path to the base model | `""` (leave empty to load from Hugging Face) |
| `LORA_DIR` | Path to LoRA adapters | `./model` |
| `MERGED_MODEL_PATH` | Output directory for merged model | `./merged_model` |
```bash
bash scripts/merge-and-convert.sh
```
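The conversion half of this step uses llama.cpp's `convert_hf_to_gguf.py`. The sketch below shows how that command could be assembled. It assumes the script sits at the root of the cloned `llama.cpp/` directory and that the output path matches the `run-model.sh` example (`./merged_model/model.gguf`). `convert_command` is a hypothetical helper. Its accepted `--outtype` values are a deliberately conservative subset, so run the script with `--help` to see what your version supports.

```python
import shlex
import sys
from pathlib import Path

# Conservative subset of convert_hf_to_gguf.py --outtype values (check --help for more).
ALLOWED_OUTTYPES = {"f32", "f16", "bf16", "q8_0", "auto"}


def convert_command(
    merged_model_dir: str = "./merged_model",
    outfile: str = "./merged_model/model.gguf",
    outtype: str = "f16",
    llama_cpp_dir: str = "llama.cpp",
) -> list[str]:
    """Hypothetical helper: argv for converting a merged HF model directory to GGUF."""
    if outtype not in ALLOWED_OUTTYPES:
        raise ValueError(f"unsupported outtype: {outtype}")
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return [
        sys.executable,  # use the current (virtual-env) Python interpreter
        str(script),
        merged_model_dir,
        "--outfile", outfile,
        "--outtype", outtype,
    ]


if __name__ == "__main__":
    # Print instead of executing; pass the list to subprocess.run(..., check=True) to run it.
    print(shlex.join(convert_command()))
```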
4. scripts/run-model.sh
Runs the converted GGUF model for inference with llama.cpp's CLI.
Edit run-model.sh:
| Variable | Description | Example |
|---|---|---|
| Model path | Path to the GGUF file | ./merged_model/model.gguf |
```bash
bash scripts/run-model.sh
```
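The sketch below shows the kind of command `run-model.sh` might issue. It assumes a CMake build that places binaries in `llama.cpp/build/bin/`. It uses the long-standing `llama-cli` flags `-m` (model file), `-p` (prompt), `-n` (tokens to predict), and `-ngl` (layers to offload to the GPU). Binary names and defaults have changed between llama.cpp releases, so confirm them with `llama-cli --help`. `llama_cli_command` is a hypothetical helper.

```python
import shlex
from pathlib import Path


def llama_cli_command(
    model_path: str = "./merged_model/model.gguf",
    prompt: str = "",
    n_predict: int = 256,
    n_gpu_layers: int = 99,
    llama_cpp_dir: str = "llama.cpp",
) -> list[str]:
    """Hypothetical helper: argv for llama-cli. Use n_gpu_layers=0 for a CPU-only build."""
    binary = Path(llama_cpp_dir) / "build" / "bin" / "llama-cli"
    cmd = [str(binary), "-m", model_path, "-n", str(n_predict)]
    if n_gpu_layers > 0:
        cmd += ["-ngl", str(n_gpu_layers)]  # offload up to this many layers to the GPU
    if prompt:
        cmd += ["-p", prompt]  # initial prompt text
    return cmd


if __name__ == "__main__":
    print(shlex.join(llama_cli_command(prompt="Hello")))
```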
Output Structure
```
./model/          ← LoRA adapters (from finetune.sh)
./merged_model/   ← Merged HF model + GGUF file (from merge-and-convert.sh)
llama.cpp/        ← llama.cpp repository (created by setup.sh)
scripts/          ← Individual pipeline step scripts
setup.sh          ← Setup script (venv + llama.cpp build)
run-pipeline.sh   ← Run full pipeline (finetune → merge/convert → run)
```
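Each step reads the previous step's output, so the paths set in the separate scripts have to agree. The check below is hypothetical. `check_pipeline_paths`, the flat config dict, and the `MODEL_PATH`/`MODEL_DIR` keys standing in for the "Model path" in `run-model.sh` are all assumptions. It uses the example values from the tables above.

```python
from pathlib import Path

# (producer, consumer) variable pairs that should point at the same location.
# OUTPUT_PARQUET_PATH -> DATA_PATH only matters if you run the optional data-generation step.
LINKS = [
    ("OUTPUT_PARQUET_PATH", "DATA_PATH"),  # synthetic-data.py -> finetune.py
    ("OUTPUT_DIR", "LORA_DIR"),            # finetune.py -> merge.py
    ("MERGED_MODEL_PATH", "MODEL_DIR"),    # merge.py -> run-model.sh (folder holding the GGUF)
]


def check_pipeline_paths(config: dict[str, str]) -> list[str]:
    """Hypothetical check: report mismatched producer/consumer paths, skipping unset pairs."""
    values = dict(config)
    if "MODEL_PATH" in values:
        values["MODEL_DIR"] = str(Path(values["MODEL_PATH"]).parent)
    problems = []
    for producer, consumer in LINKS:
        if producer in values and consumer in values:
            # Path() treats "./model", "model", and "model/" as equal.
            if Path(values[producer]) != Path(values[consumer]):
                problems.append(f"{producer}={values[producer]} but {consumer}={values[consumer]}")
    return problems


example = {
    "OUTPUT_PARQUET_PATH": "./data/output.parquet",
    "DATA_PATH": "./data/output.parquet",
    "OUTPUT_DIR": "./model",
    "LORA_DIR": "./model",
    "MERGED_MODEL_PATH": "./merged_model",
    "MODEL_PATH": "./merged_model/model.gguf",
}
print(check_pipeline_paths(example))  # → []
```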
Troubleshooting
llama.cpp build fails
See the official build guide: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
Common issues:
- CUDA: Ensure NVIDIA drivers and CUDA toolkit are installed
- ROCm: Ensure AMD drivers and HIP toolkit are installed
- Vulkan: Ensure Vulkan drivers and SDK are installed
- cmake: Install via `sudo apt install cmake` (Debian/Ubuntu)
Out of memory during training
- Reduce `BATCH_SIZE` in `finetune.py`
- Increase `GRADIENT_ACCUMULATION_STEPS` to compensate
- Reduce `MAX_LENGTH` to fit shorter sequences
- Set `load_in_4bit=True` in `finetune.py` (line 77)
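When you lower `BATCH_SIZE`, raising `GRADIENT_ACCUMULATION_STEPS` keeps the number of sequences per optimizer update roughly constant. Below is a minimal sketch of that rebalancing. `rebalance_accumulation` is a hypothetical helper. When the old effective batch is not divisible by the new batch size, it rounds up, so the effective batch grows slightly rather than shrinking.

```python
import math


def rebalance_accumulation(old_batch_size: int, old_grad_accum: int,
                           new_batch_size: int) -> tuple[int, int]:
    """Hypothetical helper returning (new GRADIENT_ACCUMULATION_STEPS, new effective batch).

    Rounds up so the effective batch never drops below the original one.
    """
    if new_batch_size < 1:
        raise ValueError("new_batch_size must be at least 1")
    target = old_batch_size * old_grad_accum
    new_accum = math.ceil(target / new_batch_size)
    return new_accum, new_batch_size * new_accum


if __name__ == "__main__":
    # Halving the example BATCH_SIZE from 2 to 1 with GRADIENT_ACCUMULATION_STEPS = 8:
    print(rebalance_accumulation(2, 8, 1))  # (16, 16)
```

Smaller per-device batches lower peak memory, but each optimizer update now takes more forward/backward passes, so expect training to run slower.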
llama-cpp-python install fails
- Ensure llama.cpp is built successfully first
- Try a CPU-only install first to verify: `pip install llama-cpp-python`
- Check the llama-cpp-python docs for other backends
Project Structure
```
├── finetune.py          ← Training script
├── merge.py             ← Merge LoRA into base model
├── synthetic-data.py    ← Generate synthetic training data
├── requirements.txt     ← Python dependencies
├── setup.sh             ← One-time setup
├── run-pipeline.sh      ← Run full pipeline
├── scripts/
│   ├── generate-data.sh
│   ├── finetune.sh
│   ├── merge-and-convert.sh
│   └── run-model.sh
└── README.md
```