A template pipeline that takes a base model and input data and produces a fully fine-tuned GGUF model.

Unsloth Fine-Tune Template

Linux only: this template targets Linux systems with an NVIDIA GPU (CUDA), an AMD GPU (ROCm), or a Vulkan-compatible GPU. A CPU-only build can also be selected during setup.

A template for fine-tuning LLMs with Unsloth and converting the result to GGUF format with llama.cpp.

Prerequisites

  • Linux OS
  • Python 3.10+
  • NVIDIA GPU (CUDA), AMD GPU (ROCm), or Vulkan-compatible GPU
  • cmake
  • git
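
The list above can be checked before running setup.sh. The following is a minimal sketch, not part of the template: the helper name missing_prerequisites is hypothetical, and it only covers the Python version and the cmake and git executables, not the GPU drivers.

```python
import shutil
import sys

# Hypothetical helper, not shipped with the template.
# It checks only the Python version and the two command-line tools.
REQUIRED_TOOLS = ("cmake", "git")
MIN_PYTHON = (3, 10)


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return human-readable descriptions of unmet prerequisites."""
    problems = []
    major_minor = tuple(version_info[:2])
    if major_minor < MIN_PYTHON:
        problems.append("Python 3.10+ required, found %d.%d" % major_minor)
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

An empty result means the checked items are present; GPU drivers and toolkits still need to be verified separately.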

Quick Start

# 1. Setup (clones llama.cpp, builds it, installs dependencies)
bash setup.sh

# 2. Configure scripts (see variables below)

# 3. Run full pipeline
bash run-pipeline.sh

Workflow

scripts/generate-data.sh     → Generate synthetic training data (optional)
scripts/finetune.sh          → Fine-tune model with LoRA adapters
scripts/merge-and-convert.sh → Merge LoRA into base model and convert to GGUF
scripts/run-model.sh         → Run the converted GGUF model
run-pipeline.sh              → Run finetune → merge/convert → run in sequence
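
The sequence that run-pipeline.sh is described as running can be sketched in Python as follows. This is an illustration under assumptions, not the script's actual contents: the function name run_pipeline is hypothetical, and stopping at the first failing step is an assumed behavior.

```python
import subprocess

# Order taken from the workflow list above. The optional data-generation
# step is left out, matching the description of run-pipeline.sh.
PIPELINE_STEPS = [
    "scripts/finetune.sh",
    "scripts/merge-and-convert.sh",
    "scripts/run-model.sh",
]


def run_pipeline(steps=PIPELINE_STEPS, run=subprocess.run):
    """Run each step with bash and stop at the first non-zero exit code.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in steps:
        result = run(["bash", step])
        if result.returncode != 0:
            break  # assumed behavior: later steps depend on earlier ones
        completed.append(step)
    return completed
```

Stopping early matters here because each step consumes the output of the previous one: merging needs the LoRA adapters, and running needs the GGUF file.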

Setup

setup.sh will:

  1. Create a Python virtual environment and install Python dependencies
  2. Clone llama.cpp
  3. Build llama.cpp with your selected GPU backend
  4. Install llama-cpp-python bindings with matching backend flags

Backend Selection

Choice  Backend        Requirements
1       CUDA (NVIDIA)  Systemwide CUDA installation (NVIDIA drivers + CUDA toolkit)
2       ROCm (AMD)     Systemwide ROCm installation (AMD drivers + ROCm toolkit)
3       Vulkan         Vulkan drivers + libvulkan-dev, glslc, spirv-headers
4       CPU only       None
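
As a rough illustration of how a menu choice could translate into a llama.cpp build configuration, the sketch below maps each choice to a CMake option. The mapping and the function name cmake_configure_command are assumptions; setup.sh may use different flags, and option names have changed between llama.cpp releases (the ROCm option in particular), so check the llama.cpp build guide for your version.

```python
# Assumed mapping from the menu choice to a llama.cpp CMake option.
# These names follow recent llama.cpp build documentation and are not
# taken from setup.sh.
BACKEND_FLAGS = {
    1: ("CUDA", "-DGGML_CUDA=ON"),
    2: ("ROCm", "-DGGML_HIP=ON"),
    3: ("Vulkan", "-DGGML_VULKAN=ON"),
    4: ("CPU only", None),
}


def cmake_configure_command(choice, source_dir="llama.cpp", build_dir="llama.cpp/build"):
    """Build the cmake configure command for the selected backend."""
    _name, flag = BACKEND_FLAGS[choice]
    command = ["cmake", "-S", source_dir, "-B", build_dir]
    if flag is not None:
        command.append(flag)  # CPU-only builds need no extra option
    return command


if __name__ == "__main__":
    print(" ".join(cmake_configure_command(3)))
```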

Vulkan dependencies (Ubuntu/Debian):

sudo apt-get install libvulkan-dev glslc spirv-headers

Verify Vulkan is correctly installed:

vulkaninfo

The command should complete without errors and list your GPU.
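
The same check can be scripted. The sketch below is an illustration only; command_succeeds is a hypothetical helper that treats a missing executable and a non-zero exit code the same way.

```python
import subprocess


def command_succeeds(command):
    """Return True if the command can be started and exits with status 0."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # The executable is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if command_succeeds(["vulkaninfo"]):
        print("vulkaninfo ran without errors")
    else:
        print("vulkaninfo failed or is not installed")
```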

Scripts

1. scripts/generate-data.sh

Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.

Edit synthetic-data.py:

Variable             Description                                 Example
GGUF_MODEL_PATH      Path to the GGUF model used for generation  ./path/to/model.gguf
INPUT_PARQUET_PATH   Path to existing training data to extend    ./data/train.parquet
OUTPUT_PARQUET_PATH  Path to save the combined dataset           ./data/output.parquet
NEW_ROWS_COUNT       Number of synthetic records to generate     100

bash scripts/generate-data.sh
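
Before starting a long generation run, it can help to sanity-check these four values. The sketch below uses the example values from the table; config_problems is a hypothetical helper and is not part of synthetic-data.py.

```python
from pathlib import Path

# Example values from the table above; replace them with your own.
GGUF_MODEL_PATH = "./path/to/model.gguf"
INPUT_PARQUET_PATH = "./data/train.parquet"
OUTPUT_PARQUET_PATH = "./data/output.parquet"
NEW_ROWS_COUNT = 100


def config_problems(gguf_model_path, input_parquet_path, output_parquet_path, new_rows_count):
    """Return a list of problems found in the configuration values."""
    problems = []
    if not Path(gguf_model_path).is_file():
        problems.append(f"GGUF model not found: {gguf_model_path}")
    if not Path(input_parquet_path).is_file():
        problems.append(f"input Parquet file not found: {input_parquet_path}")
    if not Path(output_parquet_path).parent.is_dir():
        problems.append(f"output directory does not exist: {Path(output_parquet_path).parent}")
    if new_rows_count <= 0:
        problems.append("NEW_ROWS_COUNT must be a positive integer")
    return problems


if __name__ == "__main__":
    for problem in config_problems(
        GGUF_MODEL_PATH, INPUT_PARQUET_PATH, OUTPUT_PARQUET_PATH, NEW_ROWS_COUNT
    ):
        print("config problem:", problem)
```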

2. scripts/finetune.sh

Fine-tunes a model using Unsloth with LoRA adapters. Saves LoRA weights to ./model/.

Edit finetune.py:

Variable                     Description                      Example
DATA_PATH                    Path to training Parquet file    ./data/output.parquet
OUTPUT_DIR                   Directory to save LoRA adapters  ./model
BATCH_SIZE                   Per-device batch size            2
GRADIENT_ACCUMULATION_STEPS  Gradient accumulation steps      8
LEARNING_RATE                Training learning rate           2e-4
MAX_LENGTH                   Maximum sequence length          4096
TRAIN_EPOCHS                 Number of training epochs        1
model_name (line 74)         Base model to fine-tune          "unsloth/Llama-3.2-3B-Instruct"

bash scripts/finetune.sh
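
BATCH_SIZE and GRADIENT_ACCUMULATION_STEPS together determine how many examples contribute to each optimizer update. The sketch below works through the arithmetic for the example values in the table. The function names and the 1000-row dataset are illustrative assumptions, and the step count is approximate because trainers differ in how they handle the final partial batch.

```python
import math

# Example values from the table above.
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
TRAIN_EPOCHS = 1


def effective_batch_size(batch_size, accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return batch_size * accumulation_steps * num_devices


def approximate_optimizer_steps(num_rows, batch_size, accumulation_steps, epochs, num_devices=1):
    """Approximate number of optimizer updates, rounding partial batches up."""
    per_update = effective_batch_size(batch_size, accumulation_steps, num_devices)
    return math.ceil(num_rows / per_update) * epochs


if __name__ == "__main__":
    print(effective_batch_size(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 2 * 8 = 16
    # Hypothetical dataset of 1000 rows: ceil(1000 / 16) = 63 updates per epoch.
    print(approximate_optimizer_steps(1000, BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, TRAIN_EPOCHS))
```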

3. scripts/merge-and-convert.sh

Merges LoRA adapters into the base model, saves the merged model, then converts to GGUF format using llama.cpp.

Edit merge.py:

Variable           Description                        Example
BASE_MODEL_PATH    Path to the base model             "" (empty to load from HuggingFace)
LORA_DIR           Path to LoRA adapters              ./model
MERGED_MODEL_PATH  Output directory for merged model  ./merged_model

bash scripts/merge-and-convert.sh
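
For readers unfamiliar with what merging a LoRA adapter means numerically, the sketch below shows the standard LoRA formulation on a tiny matrix: the merged weight is W + (alpha / r) * B A, where r is the adapter rank. This is a pure-Python illustration with made-up numbers; merge.py most likely delegates this to a library routine, and variants such as rsLoRA use a different scaling factor.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Fold a LoRA update into a base weight matrix.

    weight: (d_out, d_in), lora_a: (rank, d_in), lora_b: (d_out, rank).
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (d_out, d_in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]   # illustrative 2x2 base weight
    a = [[1.0, 2.0]]                  # rank-1 adapter, shape (1, 2)
    b = [[0.5], [1.0]]                # shape (2, 1)
    print(merge_lora(base, a, b, alpha=2.0))
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be converted to a single GGUF file.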

4. scripts/run-model.sh

Runs the converted GGUF model using llama.cpp's CLI interface for inference.

Edit run-model.sh:

Variable    Description            Example
Model path  Path to the GGUF file  ./merged_model/model.gguf

bash scripts/run-model.sh
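
The sketch below shows what a llama.cpp CLI invocation for the converted model might look like when assembled from Python. The binary location, the prompt, and the helper name run_model_command are assumptions and may not match what run-model.sh does; -m, -p, and -n are the llama-cli options for model path, prompt, and number of tokens to generate.

```python
# Assumed location of the binary after a default CMake build of llama.cpp.
LLAMA_CLI = "llama.cpp/build/bin/llama-cli"


def run_model_command(model_path, prompt, max_tokens=128, llama_cli=LLAMA_CLI):
    """Build a llama-cli command line as a list suitable for subprocess.run."""
    return [llama_cli, "-m", model_path, "-p", prompt, "-n", str(max_tokens)]


if __name__ == "__main__":
    command = run_model_command("./merged_model/model.gguf", "Hello")
    print(" ".join(command))
```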

Output Structure

./model/                  ← LoRA adapters (from finetune.sh)
./merged_model/           ← Merged HF model + GGUF file (from merge-and-convert.sh)
llama.cpp/                ← llama.cpp repository (created by setup.sh)
scripts/                  ← Individual pipeline step scripts
setup.sh                  ← Setup script (venv + llama.cpp build)
run-pipeline.sh           ← Run full pipeline (finetune → merge/convert → run)

Troubleshooting

llama.cpp build fails

See the official build guide: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

Common issues:

  • CUDA: Requires a systemwide CUDA installation (NVIDIA drivers + CUDA toolkit)
  • ROCm: Requires a systemwide ROCm installation (AMD drivers + ROCm toolkit)
  • Vulkan: Requires Vulkan drivers + libvulkan-dev, glslc, spirv-headers
  • cmake: Install via sudo apt install cmake (Debian/Ubuntu)

Out of memory during training

  • Reduce BATCH_SIZE in finetune.py
  • Increase GRADIENT_ACCUMULATION_STEPS to compensate
  • Reduce MAX_LENGTH so training sequences are capped at a shorter length
  • Set load_in_4bit=True in finetune.py (line 77)
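
The first two suggestions work together: lowering BATCH_SIZE while raising GRADIENT_ACCUMULATION_STEPS keeps the effective batch size unchanged. The sketch below works through that arithmetic and gives a rough, weights-only memory estimate for 16-bit versus 4-bit loading. Both helper names are hypothetical, the parameter count is a round 3 billion, and real usage is higher because of activations, optimizer state, and quantization overhead.

```python
def rebalanced_accumulation_steps(batch_size, accumulation_steps, new_batch_size):
    """Accumulation steps that keep the effective batch size unchanged."""
    effective = batch_size * accumulation_steps
    if effective % new_batch_size != 0:
        raise ValueError("new batch size must divide the effective batch size")
    return effective // new_batch_size


def weight_memory_gib(num_parameters, bits_per_weight):
    """Lower-bound estimate of the memory taken by model weights alone, in GiB."""
    return num_parameters * bits_per_weight / 8 / 1024**3


if __name__ == "__main__":
    # Halving BATCH_SIZE from 2 to 1 requires 16 accumulation steps instead of 8.
    print(rebalanced_accumulation_steps(2, 8, 1))
    # Roughly 3 billion parameters, 16-bit versus 4-bit weights.
    print(round(weight_memory_gib(3_000_000_000, 16), 2))
    print(round(weight_memory_gib(3_000_000_000, 4), 2))
```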

llama-cpp-python install fails

  • Ensure llama.cpp has built successfully first
  • Try a CPU-only install first to verify that the package builds at all: pip install llama-cpp-python
  • Check the llama-cpp-python documentation for the install flags of other backends
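
After an install attempt, a quick way to confirm that the bindings are visible to the active virtual environment is to look up the package's import name, llama_cpp. The sketch below is a minimal check; module_available is a hypothetical helper, and it only confirms that the package can be found, not that the GPU backend works.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("llama_cpp"):
        print("llama-cpp-python is installed in this environment")
    else:
        print("llama-cpp-python is not installed in this environment")
```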

Project Structure

├── finetune.py           ← Training script
├── merge.py              ← Merge LoRA into base model
├── synthetic-data.py     ← Generate synthetic training data
├── requirements.txt      ← Python dependencies
├── setup.sh              ← One-time setup
├── run-pipeline.sh       ← Run full pipeline
├── scripts/
│   ├── generate-data.sh
│   ├── finetune.sh
│   ├── merge-and-convert.sh
│   └── run-model.sh
└── README.md