Unsloth-Finetune-Template/README.md

6.3 KiB

Unsloth Fine-Tune Template

Linux only — This template is designed for Linux systems with NVIDIA GPU (CUDA), AMD GPU (ROCm), or Vulkan support.

A template for fine-tuning LLMs using Unsloth and converting to GGUF format with llama.cpp.

Prerequisites

  • Linux OS
  • Python 3.10+
  • NVIDIA GPU (CUDA) or AMD GPU (ROCm) or Vulkan-compatible GPU
  • cmake
  • git

Quick Start

# 1. Setup (clones llama.cpp, builds it, installs dependencies)
bash setup.sh

# 2. Configure scripts (see variables below)

# 3. Run full pipeline
bash run-pipeline.sh

Workflow

scripts/generate-data.sh   → Generate synthetic training data (optional)
scripts/finetune.sh        → Fine-tune model with LoRA adapters
scripts/merge-and-convert.sh → Merge LoRA into base model and convert to GGUF
scripts/run-model.sh       → Run the converted GGUF model
run-pipeline.sh            → Run finetune → merge/convert → run in sequence

Setup

setup.sh will:

  1. Create a Python virtual environment and install Python dependencies
  2. Clone llama.cpp or symlink an existing build
  3. Build llama.cpp with your selected GPU backend (skip if using existing)
  4. Install llama-cpp-python bindings with matching backend flags

Using an existing llama.cpp build: Choose option 2 and provide the absolute path to your existing build. Setup will create a symlink at ./llama.cpp.

Backend Selection

Choice Backend Requirements
1 CUDA (NVIDIA) Systemwide CUDA installation (NVIDIA drivers + CUDA toolkit)
2 ROCm (AMD) Systemwide ROCm installation (AMD drivers + ROCm toolkit)
3 Vulkan Vulkan drivers + libvulkan-dev, glslc, spirv-headers
4 CPU only None

Vulkan dependencies (Ubuntu/Debian):

sudo apt-get install libvulkan-dev glslc spirv-headers

Verify Vulkan is correctly installed:

vulkaninfo

Should run without errors.

Scripts

1. scripts/generate-data.sh

Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.

Edit synthetic-data.py:

Variable Description Example
GGUF_MODEL_PATH Path to the GGUF model used for generation ./path/to/model.gguf
INPUT_PARQUET_PATH Path to existing training data to extend ./data/train.parquet
OUTPUT_PARQUET_PATH Path to save the combined dataset ./data/output.parquet
NEW_ROWS_COUNT Number of synthetic records to generate 100
User prompt (line 67) Replace "YOUR PROMPT GOES HERE" with generation instructions Generate questions about machine learning...
System message (line 63) Controls the model's role "You are a data generator. Output ONLY the format below..."
bash scripts/generate-data.sh

2. scripts/finetune.sh

Fine-tunes a model using Unsloth with LoRA adapters. Saves LoRA weights to ./model/.

Edit finetune.py:

Variable Description Example
DATA_PATH Path to training Parquet file ./data/output.parquet
OUTPUT_DIR Directory to save LoRA adapters ./model
BATCH_SIZE Per-device batch size 2
GRADIENT_ACCUMULATION_STEPS Gradient accumulation steps 8
LEARNING_RATE Training learning rate 2e-4
MAX_LENGTH Maximum sequence length 4096
TRAIN_EPOCHS Number of training epochs 1
model_name (line 74) Base model to fine-tune "unsloth/Llama-3.2-3B-Instruct"
bash scripts/finetune.sh

3. scripts/merge-and-convert.sh

Merges LoRA adapters into the base model, saves the merged model, then converts to GGUF format using llama.cpp.

Edit merge.py:

Variable Description Example
BASE_MODEL_PATH Path to the base model "" (empty to load from HuggingFace)
LORA_DIR Path to LoRA adapters ./model
MERGED_MODEL_PATH Output directory for merged model ./merged_model
bash scripts/merge-and-convert.sh

4. scripts/run-model.sh

Runs the converted GGUF model using llama.cpp's CLI interface for inference.

Edit run-model.sh:

Variable Description Example
Model path Path to the GGUF file ./merged_model/model.gguf
bash scripts/run-model.sh

Output Structure

./model/                  ← LoRA adapters (from finetune.sh)
./merged_model/           ← Merged HF model + GGUF file (from merge-and-convert.sh)
llama.cpp/                ← llama.cpp repository (created by setup.sh)
scripts/                  ← Individual pipeline step scripts
setup.sh                  ← Setup script (venv + llama.cpp build)
run-pipeline.sh           ← Run full pipeline (finetune → merge/convert → run)

Troubleshooting

llama.cpp build fails

See the official build guide: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

Common issues:

  • CUDA: Requires a systemwide CUDA installation (NVIDIA drivers + CUDA toolkit)
  • ROCm: Requires a systemwide ROCm installation (AMD drivers + ROCm toolkit)
  • Vulkan: Requires Vulkan drivers + libvulkan-dev, glslc, spirv-headers
  • cmake: Install via sudo apt install cmake (Debian/Ubuntu)

Out of memory during training

  • Reduce BATCH_SIZE in finetune.py
  • Increase GRADIENT_ACCUMULATION_STEPS to compensate
  • Reduce MAX_LENGTH to fit shorter sequences
  • Set load_in_4bit=True in finetune.py (line 77)

llama-cpp-python install fails

  • Ensure llama.cpp is built successfully first
  • Try CPU-only install first to verify: pip install llama-cpp-python
  • Check llama-cpp-python docs for other backends

Project Structure

├── finetune.py           ← Training script
├── merge.py              ← Merge LoRA into base model
├── synthetic-data.py     ← Generate synthetic training data
├── requirements.txt      ← Python dependencies
├── setup.sh              ← One-time setup
├── run-pipeline.sh       ← Run full pipeline
├── scripts/
│   ├── generate-data.sh
│   ├── finetune.sh
│   ├── merge-and-convert.sh
│   └── run-model.sh
└── README.md