Oracle/Unsloth-Finetune-Template


Oracle b7b30c9181

Fix typos and small mistakes in README.md

2026-06-02 17:47:54 +02:00

7.5 KiB


Unsloth Fine-Tune Template

Linux only: this template is designed for Linux systems with an NVIDIA GPU (CUDA), an AMD GPU (ROCm), or a Vulkan-compatible GPU.

A template for fine-tuning LLMs using Unsloth and converting to GGUF format with llama.cpp.

Prerequisites

Linux OS
Python 3.10+
NVIDIA GPU (CUDA) or AMD GPU (ROCm) or Vulkan-compatible GPU
cmake
git
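You can check the software prerequisites above with a short Python script before running setup.sh. This is only a sketch. `missing_prerequisites` is a hypothetical helper that is not part of this template, and it does not check for a GPU or its drivers.

```python
from __future__ import annotations

import platform
import shutil
import sys


def missing_prerequisites() -> list[str]:
    """Return the unmet software prerequisites (an empty list means all are met)."""
    missing = []
    # The template targets Linux only.
    if platform.system() != "Linux":
        missing.append(f"Linux OS (found {platform.system()})")
    # Python 3.10 or newer is required.
    if sys.version_info < (3, 10):
        missing.append(f"Python 3.10+ (found {platform.python_version()})")
    # setup.sh needs cmake and git on PATH to clone and build llama.cpp.
    for tool in ("cmake", "git"):
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found." if not problems else "Missing: " + ", ".join(problems))
```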

Quick Start

# 1. Setup (clones llama.cpp, builds it, installs dependencies)
bash setup.sh

# 2. Configure the variables in finetune.py, merge.py and scripts/run-model.sh (see Scripts below)

# 3. Run full pipeline
bash run-pipeline.sh

Workflow

scripts/generate-data.sh     → Generate synthetic training data (optional)
scripts/finetune.sh          → Fine-tune the model with LoRA adapters
scripts/merge-and-convert.sh → Merge LoRA into the base model and convert to GGUF
scripts/run-model.sh         → Run the converted GGUF model
run-pipeline.sh              → Run finetune → merge/convert → run in sequence
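The sequence that run-pipeline.sh performs can be approximated in Python. This is for illustration only. The real script is written in shell, and the names (`PIPELINE_STEPS`, `run_pipeline`) and the stop-on-first-failure behavior are assumptions.

```python
import subprocess
from typing import Callable, Sequence

# Assumed order, mirroring run-pipeline.sh: finetune → merge/convert → run.
# Paths are relative, so this should be run from the repository root.
PIPELINE_STEPS = [
    ["bash", "scripts/finetune.sh"],
    ["bash", "scripts/merge-and-convert.sh"],
    ["bash", "scripts/run-model.sh"],
]


def run_pipeline(steps: Sequence[Sequence[str]],
                 runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
    """Run each step in order and stop at the first failing one.

    Returns the number of steps that completed successfully.
    """
    for done, cmd in enumerate(steps):
        result = runner(list(cmd))
        if result.returncode != 0:
            print(f"Step failed (exit {result.returncode}): {' '.join(cmd)}")
            return done
    return len(steps)
```

Passing `runner` as a parameter lets you test the control flow without launching the real scripts.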

Setup

setup.sh will:

Create a Python virtual environment and install Python dependencies
Clone llama.cpp (fresh build) or symlink an existing build
Build llama.cpp with shared libraries (-DBUILD_SHARED_LIBS=ON)
Install llama-cpp-python bindings linked against the shared library (-DLLAMA_BUILD=OFF)

Using an existing llama.cpp build: when setup.sh asks how to obtain llama.cpp, choose option 2 and provide the absolute path to your llama.cpp directory. The build must have been created with -DBUILD_SHARED_LIBS=ON and contain libllama.so. Setup will create a symlink to it at ./llama.cpp.
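Option 2 amounts to validating the path and then creating a symlink. The following is a minimal Python sketch that makes these checks: the path is absolute, it is an existing directory, and it contains a libllama.so somewhere inside. `link_existing_build` is a hypothetical helper and does not reproduce setup.sh's actual code.

```python
from pathlib import Path


def link_existing_build(build_path: str, link_path: str = "./llama.cpp") -> Path:
    """Validate an existing llama.cpp directory and symlink it to link_path."""
    src = Path(build_path)
    # Setup asks for an absolute path, so reject relative ones early.
    if not src.is_absolute():
        raise ValueError(f"expected an absolute path, got {build_path!r}")
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    # The Python bindings need the shared library, so require libllama.so somewhere inside.
    if not any(src.rglob("libllama.so")):
        raise FileNotFoundError(f"no libllama.so under {src}; rebuild with -DBUILD_SHARED_LIBS=ON")
    link = Path(link_path)
    # Never overwrite an existing file, directory or symlink.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists")
    link.symlink_to(src, target_is_directory=True)
    return link
```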

Backend Selection

You are only prompted for a backend when building llama.cpp from scratch. Choose based on your GPU:

Choice	Backend	Requirements
1	CUDA (NVIDIA)	System-wide CUDA installation (NVIDIA drivers + CUDA toolkit)
2	ROCm (AMD)	System-wide ROCm installation (AMD drivers + ROCm toolkit)
3	Vulkan	Vulkan drivers + `libvulkan-dev`, `glslc`, `spirv-headers`
4	CPU only	None
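As a reference point, the backend choices roughly correspond to llama.cpp CMake options. The following Python sketch is an assumption about what setup.sh passes to CMake, not its actual code. `BACKEND_FLAGS` and `cmake_configure_args` are hypothetical names. ROCm builds usually need extra settings, such as the GPU target, which are described in the llama.cpp build guide.

```python
from __future__ import annotations

# Hypothetical mapping from the setup menu choice to llama.cpp CMake options.
# GGML_CUDA / GGML_HIP / GGML_VULKAN are the backend switches in current llama.cpp;
# older checkouts used different names, so check docs/build.md for your version.
BACKEND_FLAGS = {
    "1": ["-DGGML_CUDA=ON"],    # CUDA (NVIDIA)
    "2": ["-DGGML_HIP=ON"],     # ROCm (AMD)
    "3": ["-DGGML_VULKAN=ON"],  # Vulkan
    "4": [],                    # CPU only: no extra backend flag
}


def cmake_configure_args(choice: str) -> list[str]:
    """Build the `cmake -B build ...` argument list for a backend choice."""
    if choice not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend choice: {choice!r}")
    # Shared libraries are always required so llama-cpp-python can link against libllama.so.
    return ["cmake", "-B", "build", "-DBUILD_SHARED_LIBS=ON", *BACKEND_FLAGS[choice]]
```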

Vulkan dependencies (Ubuntu/Debian):

sudo apt-get install libvulkan-dev glslc spirv-headers

Verify Vulkan is correctly installed:

vulkaninfo

It should run without errors. (On Debian/Ubuntu, vulkaninfo is provided by the vulkan-tools package.)
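You can also script this check, for example in a setup or CI step. The following minimal Python sketch assumes that a non-zero exit status from vulkaninfo means Vulkan is not usable. `command_runs` is a hypothetical helper.

```python
from __future__ import annotations

import shutil
import subprocess


def command_runs(cmd: list[str], timeout: float = 30.0) -> bool:
    """Return True if cmd[0] is on PATH and the command exits with status 0."""
    if shutil.which(cmd[0]) is None:
        return False
    try:
        # Discard output; only the exit status matters here.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Vulkan OK" if command_runs(["vulkaninfo"]) else "vulkaninfo missing or failed")
```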

Existing llama.cpp Build

If using option 2 (existing build), ensure it was compiled with shared libraries:

cmake -B build -DBUILD_SHARED_LIBS=ON # Add your custom build options
cmake --build build --config Release -j$(nproc)

The build must contain libllama.so (typically at build/bin/libllama.so).
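Custom build layouts may not place the library at build/bin/libllama.so. The following Python sketch finds every copy and lists the usual location first. `find_libllama` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def find_libllama(llama_cpp_dir: str) -> list[Path]:
    """List every libllama.so under a llama.cpp directory, usual location first."""
    root = Path(llama_cpp_dir)
    if not root.is_dir():
        return []
    preferred = root / "build" / "bin" / "libllama.so"  # typical location for CMake builds
    others = sorted(p for p in root.rglob("libllama.so") if p != preferred)
    return ([preferred] if preferred.is_file() else []) + others


if __name__ == "__main__":
    hits = find_libllama("./llama.cpp")
    print(hits[0] if hits else "libllama.so not found: rebuild with -DBUILD_SHARED_LIBS=ON")
```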

Scripts

1. scripts/generate-data.sh

Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.

Edit synthetic-data.py:

Variable	Description	Example
`GGUF_MODEL_PATH`	Path to the GGUF model used for generation	`./path/to/model.gguf`
`INPUT_PARQUET_PATH`	Path to existing training data to extend	`./data/train.parquet`
`OUTPUT_PARQUET_PATH`	Path to save the combined dataset	`./data/output.parquet`
`NEW_ROWS_COUNT`	Number of synthetic records to generate	`100`
`System message` (line 62)	Controls the model's role	`"You are a data generator. Output ONLY the format below..."`
`User prompt` (line 66)	Replace `"YOUR PROMPT GOES HERE"` with your generation instructions	`"Generate questions about machine learning..."`
`max_tokens`	Maximum number of tokens per response	`200`
`temperature`	Sampling temperature (higher = more varied output)	`0.7`
`top_p`	Nucleus sampling: sample from the most likely tokens up to this cumulative probability	`0.95`
`top_k`	Sample only from the k most likely tokens	`50`
`min_p`	Discard tokens whose probability is below this fraction of the most likely token's probability	`0.05`
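The three filtering parameters interact, so the following Python sketch shows what they do to a toy next-token distribution. This is for illustration only. The fixed order (top-k, then top-p, then min-p) and the name `filter_candidates` are assumptions. llama.cpp's sampler chain is configurable, and temperature (not modeled here) also rescales the probabilities before the final draw.

```python
from __future__ import annotations


def filter_candidates(probs: dict[str, float], top_k: int = 50,
                      top_p: float = 0.95, min_p: float = 0.05) -> dict[str, float]:
    """Illustrative top-k, then top-p, then min-p filtering of next-token probabilities.

    Returns the surviving tokens with probabilities renormalized to sum to 1.
    """
    if not probs:
        return {}
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # top_k: keep only the k most likely tokens (top_k <= 0 disables this filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # min_p: drop tokens whose probability is below min_p times the top token's.
    threshold = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= threshold]
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}
```

For example, with probabilities 0.5/0.3/0.15/0.04/0.01 and `top_p=0.9`, the first three tokens already reach the threshold (0.95), so the last two are never sampled.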

The script expects the model to respond in this format:

Question: <generated question>
Answer: <generated answer>
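The following Python sketch shows one way to validate a response in this format. `parse_qa` is hypothetical, and synthetic-data.py may parse responses differently. Rejecting responses that contain extra text around the question/answer pair is a design choice that matches the system message's "Output ONLY" instruction.

```python
from __future__ import annotations

import re

# "Question: ..." followed on a later line by "Answer: ..."; the answer may span several lines.
QA_PATTERN = re.compile(
    r"^\s*Question:\s*(?P<q>.+?)\s*\n\s*Answer:\s*(?P<a>.+?)\s*$", re.DOTALL
)


def parse_qa(text: str) -> tuple[str, str] | None:
    """Return (question, answer), or None if the response does not follow the format."""
    match = QA_PATTERN.match(text)
    if match is None:
        return None
    return match.group("q"), match.group("a")
```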

bash scripts/generate-data.sh

2. scripts/finetune.sh

Fine-tunes a model using Unsloth with LoRA adapters and saves the adapters to ./model/.

Edit finetune.py:

Variable	Description	Example
`DATA_PATH`	Path to training Parquet file	`./data/output.parquet`
`OUTPUT_DIR`	Directory to save LoRA adapters (leave at default)	`./model`
`BATCH_SIZE`	Per-device batch size	`2`
`GRADIENT_ACCUMULATION_STEPS`	Gradient accumulation steps	`8`
`LEARNING_RATE`	Training learning rate	`2e-4`
`MAX_LENGTH`	Maximum sequence length	`4096`
`TRAIN_EPOCHS`	Number of training epochs	`1`
`model_name` (line 74)	Base model to fine-tune	`"Qwen/Qwen3.5-2B"`
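The batch settings determine how many optimizer updates a run performs. The following is a rough Python sketch for a single GPU. It assumes that every row fits within MAX_LENGTH and that no sequence packing is used. `optimizer_steps` is a hypothetical helper, and the trainer's own rounding may differ by one step.

```python
import math


def optimizer_steps(num_rows: int, batch_size: int = 2,
                    grad_accum: int = 8, epochs: int = 1) -> int:
    """Approximate number of optimizer updates for a single-GPU run."""
    effective_batch = batch_size * grad_accum  # rows consumed per optimizer update
    return math.ceil(num_rows / effective_batch) * epochs


# With the example values (BATCH_SIZE=2, GRADIENT_ACCUMULATION_STEPS=8, TRAIN_EPOCHS=1),
# a 1,000-row dataset gives ceil(1000 / 16) = 63 updates.
print(optimizer_steps(1000))
```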

bash scripts/finetune.sh

3. scripts/merge-and-convert.sh

Merges the LoRA adapters into the base model, saves the merged model, and then converts it to GGUF format using llama.cpp.

Edit merge.py:

Variable	Description	Example
`BASE_MODEL_PATH`	Path to the base model (must match model_name in finetune.py)	`"Qwen/Qwen3.5-2B"`
`LORA_DIR`	Path to LoRA adapters (leave at default)	`./model`
`MERGED_MODEL_PATH`	Output directory for merged model (leave at default)	`./merged_model`
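A mismatched BASE_MODEL_PATH is easy to miss. PEFT-style adapters record their base model in `adapter_config.json` under `base_model_name_or_path`, so a quick comparison can catch it. The following is a hedged Python sketch with hypothetical helper names. Unsloth may record its own repository ID (for example, a pre-quantized variant) instead of the name you set, so treat a mismatch as a warning rather than a hard error.

```python
from __future__ import annotations

import json
from pathlib import Path


def adapter_base_model(lora_dir: str = "./model") -> str | None:
    """Read the base model recorded in adapter_config.json, if that file exists."""
    config_path = Path(lora_dir) / "adapter_config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("base_model_name_or_path")


def check_base_model(base_model_path: str, lora_dir: str = "./model") -> bool:
    """True if BASE_MODEL_PATH matches the recorded base model (or nothing is recorded)."""
    recorded = adapter_base_model(lora_dir)
    return recorded is None or recorded == base_model_path
```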

bash scripts/merge-and-convert.sh

4. scripts/run-model.sh

Runs the converted GGUF model for inference using the llama.cpp CLI.

Edit scripts/run-model.sh:

Variable	Description	Example
Model path	Path to the GGUF file (the file name depends on the base model)	`./merged_model/model.gguf`
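Because the GGUF file name varies, you can discover the path instead of hard-coding it. The following is a hedged Python sketch. `find_gguf` is hypothetical. It picks the most recently modified file, which is an assumption meant to handle files from earlier runs that may remain in the directory.

```python
from pathlib import Path


def find_gguf(merged_dir: str = "./merged_model") -> Path:
    """Return the most recently modified .gguf file in merged_dir."""
    candidates = sorted(Path(merged_dir).glob("*.gguf"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no .gguf file in {merged_dir}")
    return candidates[-1]
```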

bash scripts/run-model.sh

Output Structure

./model/                  ← LoRA adapters (from finetune.sh)
./merged_model/           ← Merged HF model + GGUF file (from merge-and-convert.sh)
llama.cpp/                ← llama.cpp repository (created by setup.sh)
scripts/                  ← Individual pipeline step scripts
setup.sh                  ← Setup script (venv + llama.cpp build/symlink)
run-pipeline.sh           ← Run full pipeline (finetune → merge/convert → run)

Troubleshooting

llama.cpp build fails

See the official build guide: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

Common issues:

CUDA: Requires a system-wide CUDA installation (NVIDIA drivers + CUDA toolkit)
ROCm: Requires a system-wide ROCm installation (AMD drivers + ROCm toolkit)
Vulkan: Requires Vulkan drivers + libvulkan-dev, glslc, spirv-headers
cmake: Install via sudo apt install cmake (Debian/Ubuntu)

Out of memory during training

Reduce BATCH_SIZE in finetune.py (lower = less VRAM usage)
Increase GRADIENT_ACCUMULATION_STEPS to compensate (higher = longer fine-tuning time)
Keep the effective batch size at 16 or more: EFFECTIVE_BATCH_SIZE = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS (the example values give 2 * 8 = 16)
Reduce MAX_LENGTH to train on shorter sequences
Set load_in_4bit=True in finetune.py (line 77) to train with QLoRA (4-bit quantized base model)
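When you lower BATCH_SIZE, the accumulation steps needed to keep the effective batch size at 16 or more follow directly from the formula. The following is a small Python sketch. `accumulation_for_target` is a hypothetical helper.

```python
import math


def accumulation_for_target(batch_size: int, target_effective: int = 16) -> int:
    """Smallest GRADIENT_ACCUMULATION_STEPS with BATCH_SIZE * steps >= target_effective."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(target_effective / batch_size)


# Dropping BATCH_SIZE from 2 to 1 doubles the accumulation needed: 8 -> 16.
print(accumulation_for_target(2), accumulation_for_target(1))
```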

llama-cpp-python install fails

Make sure llama.cpp built successfully first (if you need a backend other than CUDA, ROCm, Vulkan or CPU, build llama.cpp yourself and point setup.sh at it as an existing build)
Try a CPU-only install first to verify: pip install llama-cpp-python
Check the llama-cpp-python documentation
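After pip finishes, a quick import check distinguishes a failed install from a library that fails to load at runtime. The following is a hedged Python sketch. `llama_cpp_status` is hypothetical, and `llama_supports_gpu_offload` is looked up defensively because its availability may vary between llama-cpp-python versions.

```python
def llama_cpp_status() -> dict:
    """Report whether llama-cpp-python imports and, if so, its version and GPU offload support."""
    try:
        import llama_cpp  # provided by the llama-cpp-python package
    except (ImportError, OSError, RuntimeError) as exc:  # load errors for libllama.so land here
        return {"installed": False, "error": str(exc)}
    # Guarded lookup: the binding may be absent in some versions.
    supports_gpu = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    return {
        "installed": True,
        "version": getattr(llama_cpp, "__version__", "unknown"),
        "gpu_offload": supports_gpu() if callable(supports_gpu) else None,
    }


if __name__ == "__main__":
    print(llama_cpp_status())
```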

Project Structure

├── finetune.py           ← Training script
├── merge.py              ← Merge LoRA into base model
├── synthetic-data.py     ← Generate synthetic training data
├── requirements.txt      ← Python dependencies
├── setup.sh              ← One-time setup
├── run-pipeline.sh       ← Run full pipeline
├── scripts/
│   ├── generate-data.sh
│   ├── finetune.sh
│   ├── merge-and-convert.sh
│   └── run-model.sh
└── README.md
