Template with a pipeline that goes from a base model and input data to a fully fine-tuned GGUF model

Latest commit: da2c8e636c "Initial commit" by Oracle, 2026-06-02 15:45:59 +02:00

Unsloth Fine-Tune Template

Linux only: this template is designed for Linux systems with an NVIDIA GPU (CUDA), an AMD GPU (ROCm), or a Vulkan-capable GPU. llama.cpp can also be built CPU-only.

A template for fine-tuning LLMs with Unsloth and converting the result to GGUF format with llama.cpp.

Prerequisites

Linux OS
Python 3.10+
NVIDIA GPU (CUDA) or AMD GPU (ROCm) or Vulkan-compatible GPU
cmake
git
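Before running setup.sh, you can confirm the non-GPU prerequisites with a small standalone check like the one below. It is a sketch, not part of the template: it only looks for `cmake` and `git` on `PATH` and does not verify GPU drivers.

```python
import platform
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which of the README's non-GPU prerequisites are satisfied."""
    return {
        # The template targets Linux only.
        "linux": platform.system() == "Linux",
        # Python 3.10 or newer.
        "python>=3.10": sys.version_info >= (3, 10),
        # Tools needed to clone and build llama.cpp.
        "cmake": shutil.which("cmake") is not None,
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```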

Quick Start

# 1. Setup (clones llama.cpp, builds it, installs dependencies)
bash setup.sh

# 2. Edit the configuration variables in the scripts (see Scripts below)

# 3. Run full pipeline
bash run-pipeline.sh

Workflow

scripts/generate-data.sh     → Generate synthetic training data (optional)
scripts/finetune.sh          → Fine-tune the model with LoRA adapters
scripts/merge-and-convert.sh → Merge the LoRA adapters into the base model and convert to GGUF
scripts/run-model.sh         → Run the converted GGUF model
run-pipeline.sh              → Run finetune → merge/convert → run-model in sequence
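For illustration only, the ordering that run-pipeline.sh follows can be written as a short Python driver. It assumes the scripts are launched with `bash` from the repository root, and it stops at the first failing step; whether the real run-pipeline.sh also stops there is an assumption.

```python
import subprocess
from typing import Callable, Sequence

# Steps in the order given by the Workflow list; generate-data.sh is optional
# and not part of run-pipeline.sh, so it is left out here.
PIPELINE_STEPS = [
    "scripts/finetune.sh",
    "scripts/merge-and-convert.sh",
    "scripts/run-model.sh",
]


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS,
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run each step with bash, in order, stopping at the first failure.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit, so later steps never start.
    """
    for step in steps:
        runner(["bash", step], check=True)


if __name__ == "__main__":
    run_pipeline()
```

The `runner` parameter exists only so the sequencing can be exercised without actually training a model.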

Setup

setup.sh will:

Create a Python virtual environment and install Python dependencies
Clone llama.cpp
Build llama.cpp with the backend you select (a GPU backend or CPU only)
Install llama-cpp-python bindings with matching backend flags
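The four setup steps correspond roughly to the command sequence below. This is a reconstruction under assumptions, not the contents of setup.sh: it assumes a `.venv` directory, a `llama.cpp/build` build directory, and that llama-cpp-python picks up backend flags from the `CMAKE_ARGS` environment variable. The sketch only builds and prints the commands.

```python
import shlex

LLAMA_CPP_REPO = "https://github.com/ggml-org/llama.cpp"


def setup_commands(cmake_flags: list[str], venv_dir: str = ".venv") -> list[list[str]]:
    """Return the setup commands in order (nothing is executed)."""
    pip = f"{venv_dir}/bin/pip"
    return [
        # 1. Virtual environment and Python dependencies.
        ["python3", "-m", "venv", venv_dir],
        [pip, "install", "-r", "requirements.txt"],
        # 2. Clone llama.cpp.
        ["git", "clone", LLAMA_CPP_REPO],
        # 3. Configure and build llama.cpp with the chosen backend flags.
        ["cmake", "-S", "llama.cpp", "-B", "llama.cpp/build", *cmake_flags],
        ["cmake", "--build", "llama.cpp/build", "--config", "Release"],
        # 4. Build llama-cpp-python with matching flags passed via CMAKE_ARGS.
        ["env", f"CMAKE_ARGS={' '.join(cmake_flags)}", pip, "install", "llama-cpp-python"],
    ]


if __name__ == "__main__":
    for cmd in setup_commands(["-DGGML_CUDA=ON"]):
        print(shlex.join(cmd))
```

For some backends the flag names accepted by llama-cpp-python have differed from llama.cpp's, so check its documentation before reusing the same flags in step 4.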

Backend Selection

Choice	Backend	Requirements
1	CUDA (NVIDIA)	NVIDIA drivers, CUDA toolkit
2	ROCm (AMD)	AMD drivers, HIP toolkit
3	Vulkan	Vulkan drivers, Vulkan SDK
4	CPU only	None
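In llama.cpp's build guide, each GPU backend is enabled with a CMake option. The sketch below maps the menu numbers above to those options, assuming setup.sh follows the build guide. ROCm builds usually also need the HIP compiler and GPU target settings described in the guide, which are omitted here.

```python
# llama.cpp CMake options per backend, following llama.cpp's build guide.
# The mapping from menu number to option is an assumption about setup.sh.
BACKEND_FLAGS = {
    1: ("CUDA (NVIDIA)", ["-DGGML_CUDA=ON"]),
    2: ("ROCm (AMD)", ["-DGGML_HIP=ON"]),
    3: ("Vulkan", ["-DGGML_VULKAN=ON"]),
    4: ("CPU only", []),  # the CPU backend is built by default
}


def cmake_configure_command(choice: int, source_dir: str = "llama.cpp") -> list[str]:
    """Build the llama.cpp CMake configure command for a backend menu choice."""
    if choice not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend choice {choice}; expected 1-4")
    _name, flags = BACKEND_FLAGS[choice]
    return ["cmake", "-S", source_dir, "-B", f"{source_dir}/build", *flags]
```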

Scripts

1. scripts/generate-data.sh

Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.

Edit synthetic-data.py:

Variable	Description	Example
`GGUF_MODEL_PATH`	Path to the GGUF model used for generation	`./path/to/model.gguf`
`INPUT_PARQUET_PATH`	Path to existing training data to extend	`./data/train.parquet`
`OUTPUT_PARQUET_PATH`	Path to save the combined dataset	`./data/output.parquet`
`NEW_ROWS_COUNT`	Number of synthetic records to generate	`100`

bash scripts/generate-data.sh
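Judging from the variable descriptions, OUTPUT_PARQUET_PATH receives the rows from INPUT_PARQUET_PATH plus NEW_ROWS_COUNT generated rows. The sketch below shows that combine step on plain Python lists. The real script works on Parquet files, and its row schema is not documented here, so `generate_row` and the `text` column in the usage are hypothetical.

```python
from typing import Callable


def extend_dataset(
    existing_rows: list[dict],
    generate_row: Callable[[list[dict]], dict],
    new_rows_count: int,
) -> list[dict]:
    """Append `new_rows_count` generated rows to a copy of `existing_rows`.

    `generate_row` is a hypothetical stand-in for the llama.cpp generation
    call; it receives the rows collected so far (e.g. to pick prompt examples).
    """
    if new_rows_count < 0:
        raise ValueError("new_rows_count must be non-negative")
    combined = list(existing_rows)  # leave the input data unchanged
    for _ in range(new_rows_count):
        combined.append(generate_row(combined))
    return combined
```

Usage with a dummy generator: `extend_dataset([{"text": "a"}], lambda rows: {"text": f"gen{len(rows)}"}, 3)` returns four rows, the original one followed by three generated ones.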

2. scripts/finetune.sh

Fine-tunes a model using Unsloth with LoRA adapters and saves the adapter weights to OUTPUT_DIR (./model/ in the example below).

Edit finetune.py:

Variable	Description	Example
`DATA_PATH`	Path to training Parquet file	`./data/output.parquet`
`OUTPUT_DIR`	Directory to save LoRA adapters	`./model`
`BATCH_SIZE`	Per-device batch size	`2`
`GRADIENT_ACCUMULATION_STEPS`	Gradient accumulation steps	`8`
`LEARNING_RATE`	Training learning rate	`2e-4`
`MAX_LENGTH`	Maximum sequence length	`4096`
`TRAIN_EPOCHS`	Number of training epochs	`1`
`model_name` (line 74)	Base model to fine-tune	`"unsloth/Llama-3.2-3B-Instruct"`

bash scripts/finetune.sh
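The batching variables interact: each optimizer update sees BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS examples per GPU, so the example values give an effective batch of 16. The helper below turns the table's example values into rough figures. `num_rows` and `num_gpus` are inputs you supply, and the step count is an approximation because the trainer's exact rounding may differ.

```python
import math


def training_budget(
    num_rows: int,
    batch_size: int = 2,
    grad_accum_steps: int = 8,
    max_length: int = 4096,
    epochs: int = 1,
    num_gpus: int = 1,
) -> dict[str, int]:
    """Rough training-size figures derived from the finetune.py variables."""
    effective_batch = batch_size * grad_accum_steps * num_gpus
    return {
        # Examples contributing to each optimizer update.
        "effective_batch": effective_batch,
        # Upper bound on tokens per update (each example is at most max_length).
        "max_tokens_per_step": effective_batch * max_length,
        # Approximate optimizer steps; may differ by one from the trainer's count.
        "approx_optimizer_steps": math.ceil(num_rows / effective_batch) * epochs,
    }
```

For a 1,000-row dataset with the example values, this gives an effective batch of 16, at most 65,536 tokens per update, and about 63 optimizer steps.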

3. scripts/merge-and-convert.sh

Merges the LoRA adapters into the base model, saves the merged model, then converts it to GGUF format using llama.cpp.

Edit merge.py:

Variable	Description	Example
`BASE_MODEL_PATH`	Path to the base model	`""` (an empty string loads it from Hugging Face)
`LORA_DIR`	Path to LoRA adapters	`./model`
`MERGED_MODEL_PATH`	Output directory for merged model	`./merged_model`
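The merge has to use the same base weights the adapters were trained on. The sketch below shows one way the BASE_MODEL_PATH rule could be resolved: an empty value falls back to a Hugging Face model ID, otherwise the local directory must exist. Passing finetune.py's `model_name` as the fallback ID is an assumption, not confirmed behavior of merge.py.

```python
from pathlib import Path


def resolve_base_model(base_model_path: str, hub_model_id: str) -> str:
    """Pick the base model source for merging.

    An empty BASE_MODEL_PATH means "load from Hugging Face", so the hub ID
    (hypothetically the same as finetune.py's model_name) is returned instead.
    """
    if not base_model_path:
        return hub_model_id
    path = Path(base_model_path)
    if not path.is_dir():
        raise FileNotFoundError(f"BASE_MODEL_PATH does not exist: {path}")
    return str(path)
```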

bash scripts/merge-and-convert.sh
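The conversion half is typically done with llama.cpp's `convert_hf_to_gguf.py`. Assuming merge-and-convert.sh calls it on MERGED_MODEL_PATH and writes the `model.gguf` path that run-model.sh uses, the command looks roughly like this. The `f16` output type is an assumption; lower-bit formats such as Q4_K_M come from a separate `llama-quantize` step.

```python
import sys


def convert_command(
    merged_model_dir: str = "./merged_model",
    outfile: str = "./merged_model/model.gguf",
    outtype: str = "f16",
    llama_cpp_dir: str = "llama.cpp",
) -> list[str]:
    """Build the llama.cpp HF-to-GGUF conversion command (not executed)."""
    return [
        # Current interpreter; assumed to be the venv Python with the converter's requirements.
        sys.executable,
        f"{llama_cpp_dir}/convert_hf_to_gguf.py",
        merged_model_dir,
        "--outfile", outfile,
        "--outtype", outtype,  # e.g. f16, bf16 or q8_0
    ]
```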

4. scripts/run-model.sh

Runs the converted GGUF model for inference using llama.cpp's command-line interface.

Edit scripts/run-model.sh:

Variable	Description	Example
Model path	Path to the GGUF file	`./merged_model/model.gguf`

bash scripts/run-model.sh
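If you prefer not to hard-code the model path, the sketch below locates the single GGUF file in ./merged_model and builds a `llama-cli` command with the standard `-m`, `-p` and `-n` options. The `llama.cpp/build/bin/` location assumes a default CMake build. Recent llama-cli builds may start an interactive chat by default for models with a chat template, so check `llama-cli --help` for your build.

```python
from pathlib import Path


def find_gguf(model_dir: str = "./merged_model") -> Path:
    """Return the single .gguf file in model_dir, or raise if there isn't exactly one."""
    matches = sorted(Path(model_dir).glob("*.gguf"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one .gguf in {model_dir}, found {len(matches)}")
    return matches[0]


def llama_cli_command(model_path: Path, prompt: str, n_predict: int = 256,
                      llama_cpp_dir: str = "llama.cpp") -> list[str]:
    """Build a llama-cli invocation (binary path assumes a CMake build)."""
    return [
        f"{llama_cpp_dir}/build/bin/llama-cli",
        "-m", str(model_path),  # GGUF model to load
        "-p", prompt,           # initial prompt
        "-n", str(n_predict),   # maximum number of tokens to generate
    ]
```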

Output Structure

./model/                  ← LoRA adapters (from finetune.sh)
./merged_model/           ← Merged HF model + GGUF file (from merge-and-convert.sh)
llama.cpp/                ← llama.cpp repository (created by setup.sh)
scripts/                  ← Individual pipeline step scripts
setup.sh                  ← Setup script (venv + llama.cpp build)
run-pipeline.sh           ← Run full pipeline (finetune → merge/convert → run)

Troubleshooting

llama.cpp build fails

See the official build guide: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

Common issues:

CUDA: Ensure NVIDIA drivers and CUDA toolkit are installed
ROCm: Ensure AMD drivers and HIP toolkit are installed
Vulkan: Ensure Vulkan drivers and SDK are installed
cmake: Install via sudo apt install cmake (Debian/Ubuntu)
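A quick way to narrow down a failed build is to check whether the expected toolchain executables are on `PATH`. The tool list per backend below is an assumption chosen as a sanity check (for example, `glslc` is the shader compiler shipped with the Vulkan SDK), not an exhaustive test of a working installation.

```python
import shutil

# Executables whose presence suggests each backend's toolchain is installed.
# This mapping is an assumption for a quick check, not a full verification.
BACKEND_TOOLS = {
    "CUDA": ["nvcc", "nvidia-smi"],
    "ROCm": ["hipcc", "rocminfo"],
    "Vulkan": ["vulkaninfo", "glslc"],
    "CPU": [],
}


def missing_tools(backend: str) -> list[str]:
    """Return cmake plus the backend's tools that are not found on PATH."""
    if backend not in BACKEND_TOOLS:
        raise ValueError(f"unknown backend {backend!r}")
    tools = ["cmake", *BACKEND_TOOLS[backend]]
    return [tool for tool in tools if shutil.which(tool) is None]
```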

Out of memory during training

Reduce BATCH_SIZE in finetune.py
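Increase GRADIENT_ACCUMULATION_STEPS to keep the effective batch size (BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS) unchanged
Reduce MAX_LENGTH to cap the length, and memory cost, of each training sequence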
Set load_in_4bit=True in finetune.py (line 77)
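The first two tips work together: dividing BATCH_SIZE by some factor and multiplying GRADIENT_ACCUMULATION_STEPS by the same factor keeps the effective batch size constant while lowering per-step activation memory. A minimal sketch of that rebalancing:

```python
def rebalance_for_memory(batch_size: int, grad_accum_steps: int,
                         factor: int = 2) -> tuple[int, int]:
    """Divide the per-device batch by `factor` and multiply accumulation by it.

    The effective batch (batch_size * grad_accum_steps) stays the same, while
    activation memory per forward/backward pass drops roughly with batch_size.
    """
    if factor < 1 or batch_size % factor != 0:
        raise ValueError("factor must be >= 1 and divide batch_size evenly")
    return batch_size // factor, grad_accum_steps * factor
```

With the example values, `rebalance_for_memory(2, 8)` returns `(1, 16)`: still 16 examples per optimizer update, but only one in memory at a time.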

llama-cpp-python install fails

Ensure llama.cpp is built successfully first
Try CPU-only install first to verify: pip install llama-cpp-python
Check the llama-cpp-python documentation for the install flags of other backends

Project Structure

├── finetune.py           ← Training script
├── merge.py              ← Merge LoRA into base model
├── synthetic-data.py     ← Generate synthetic training data
├── requirements.txt      ← Python dependencies
├── setup.sh              ← One-time setup
├── run-pipeline.sh       ← Run full pipeline
├── scripts/
│   ├── generate-data.sh
│   ├── finetune.sh
│   ├── merge-and-convert.sh
│   └── run-model.sh
└── README.md