Initial commit

2026-06-02 15:45:59 +02:00 · da2c8e636c
commit da2c8e636c
11 changed files with 755 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,176 @@
+# Unsloth Fine-Tune Template
+
+> **Linux only.** This template targets Linux systems with an NVIDIA GPU (CUDA), an AMD GPU (ROCm), or a Vulkan-capable GPU.
+
+A template for fine-tuning LLMs with [Unsloth](https://github.com/unslothai/unsloth) and converting the result to GGUF format with [llama.cpp](https://github.com/ggml-org/llama.cpp).
+
+## Prerequisites
+
+- Linux OS
+- Python 3.10+
+- NVIDIA GPU (CUDA), AMD GPU (ROCm), or Vulkan-compatible GPU
+- [cmake](https://cmake.org/)
+- [git](https://git-scm.com/)
+
+## Quick Start
+
+```bash
+# 1. Setup (clones llama.cpp, builds it, installs dependencies)
+bash setup.sh
+
+# 2. Configure scripts (see variables below)
+
+# 3. Run full pipeline
+bash run-pipeline.sh
+```
+
+## Workflow
+
+```
+scripts/generate-data.sh   → Generate synthetic training data (optional)
+scripts/finetune.sh        → Fine-tune model with LoRA adapters
+scripts/merge-and-convert.sh → Merge LoRA into base model and convert to GGUF
+scripts/run-model.sh       → Run the converted GGUF model
+run-pipeline.sh            → Run finetune → merge/convert → run in sequence
+```
+
+## Setup
+
+`setup.sh` will:
+1. Create a Python virtual environment and install Python dependencies
+2. Clone [llama.cpp](https://github.com/ggml-org/llama.cpp)
+3. Build llama.cpp with your selected GPU backend
+4. Install llama-cpp-python bindings with matching backend flags
+
+### Backend Selection
+
+| Choice | Backend | Requirements |
+|---|---|---|
+| 1 | CUDA (NVIDIA) | NVIDIA drivers, CUDA toolkit |
+| 2 | ROCm (AMD) | AMD drivers, HIP toolkit |
+| 3 | Vulkan | Vulkan drivers |
+| 4 | CPU only | None |
+
+## Scripts
+
+### 1. scripts/generate-data.sh
+
+Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.
+
+**Edit `synthetic-data.py`:**
+
+| Variable | Description | Example |
+|---|---|---|
+| `GGUF_MODEL_PATH` | Path to the GGUF model used for generation | `./path/to/model.gguf` |
+| `INPUT_PARQUET_PATH` | Path to existing training data to extend | `./data/train.parquet` |
+| `OUTPUT_PARQUET_PATH` | Path to save the combined dataset | `./data/output.parquet` |
+| `NEW_ROWS_COUNT` | Number of synthetic records to generate | `100` |
+
+```bash
+bash scripts/generate-data.sh
+```
+
+### 2. scripts/finetune.sh
+
+Fine-tunes a model using Unsloth with LoRA adapters. Saves LoRA weights to `./model/`.
+
+**Edit `finetune.py`:**
+
+| Variable | Description | Example |
+|---|---|---|
+| `DATA_PATH` | Path to training Parquet file | `./data/output.parquet` |
+| `OUTPUT_DIR` | Directory to save LoRA adapters | `./model` |
+| `BATCH_SIZE` | Per-device batch size | `2` |
+| `GRADIENT_ACCUMULATION_STEPS` | Gradient accumulation steps | `8` |
+| `LEARNING_RATE` | Training learning rate | `2e-4` |
+| `MAX_LENGTH` | Maximum sequence length | `4096` |
+| `TRAIN_EPOCHS` | Number of training epochs | `1` |
+| `model_name` (line 74) | Base model to fine-tune | `"unsloth/Llama-3.2-3B-Instruct"` |
+
+```bash
+bash scripts/finetune.sh
+```
+
+### 3. scripts/merge-and-convert.sh
+
+Merges LoRA adapters into the base model, saves the merged model, then converts to GGUF format using llama.cpp.
+
+**Edit `merge.py`:**
+
+| Variable | Description | Example |
+|---|---|---|
+| `BASE_MODEL_PATH` | Path to the base model | `""` (leave empty to load from Hugging Face) |
+| `LORA_DIR` | Path to LoRA adapters | `./model` |
+| `MERGED_MODEL_PATH` | Output directory for merged model | `./merged_model` |
+
+```bash
+bash scripts/merge-and-convert.sh
+```
+
+### 4. scripts/run-model.sh
+
+Runs the converted GGUF model for inference using llama.cpp's command-line interface.
+
+**Edit `run-model.sh`:**
+
+| Variable | Description | Example |
+|---|---|---|
+| Model path | Path to the GGUF file | `./merged_model/model.gguf` |
+
+```bash
+bash scripts/run-model.sh
+```
+
+## Output Structure
+
+```
+./model/                  ← LoRA adapters (from finetune.sh)
+./merged_model/           ← Merged HF model + GGUF file (from merge-and-convert.sh)
+llama.cpp/                ← llama.cpp repository (created by setup.sh)
+scripts/                  ← Individual pipeline step scripts
+setup.sh                  ← Setup script (venv + llama.cpp build)
+run-pipeline.sh           ← Run full pipeline (finetune → merge/convert → run)
+```
+
+## Troubleshooting
+
+### llama.cpp build fails
+
+See the official build guide:
+https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md
+
+Common issues:
+- **CUDA**: Ensure NVIDIA drivers and CUDA toolkit are installed
+- **ROCm**: Ensure AMD drivers and HIP toolkit are installed
+- **Vulkan**: Ensure Vulkan drivers and SDK are installed
+- **cmake**: Install via `sudo apt install cmake` (Debian/Ubuntu)
+
+### Out of memory during training
+
+- Reduce `BATCH_SIZE` in `finetune.py`
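+- Increase `GRADIENT_ACCUMULATION_STEPS` proportionally so the effective batch size (`BATCH_SIZE` × `GRADIENT_ACCUMULATION_STEPS`) stays the same
+- Reduce `MAX_LENGTH` to cap sequences at a shorter length, which lowers memory use per step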
+- Set `load_in_4bit=True` in `finetune.py` (line 77)
+
+### llama-cpp-python install fails
+
+- Ensure llama.cpp has built successfully first
+- Try a CPU-only install first to verify that the package itself installs: `pip install llama-cpp-python`
+- Check [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/) for other backends
+
+## Project Structure
+
+```
+├── finetune.py           ← Training script
+├── merge.py              ← Merge LoRA into base model
+├── synthetic-data.py     ← Generate synthetic training data
+├── requirements.txt      ← Python dependencies
+├── setup.sh              ← One-time setup
+├── run-pipeline.sh       ← Run full pipeline
+├── scripts/
+│   ├── generate-data.sh
+│   ├── finetune.sh
+│   ├── merge-and-convert.sh
+│   └── run-model.sh
+└── README.md
+```
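
The out-of-memory advice in the README's Troubleshooting section pairs a smaller `BATCH_SIZE` with a larger `GRADIENT_ACCUMULATION_STEPS`. The following is a minimal Python sketch of that arithmetic, assuming single-GPU training where the effective batch size is the product of the two values. The helper names are hypothetical and are not part of `finetune.py`; the numbers are the example values from the README tables.

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Samples that contribute to one optimizer step on a single device."""
    return batch_size * grad_accum_steps


def rebalance_accumulation(batch_size: int, grad_accum_steps: int,
                           new_batch_size: int) -> int:
    """Return the accumulation steps that keep the effective batch size
    unchanged after lowering the per-device batch size."""
    if new_batch_size <= 0:
        raise ValueError("new_batch_size must be positive")
    target = effective_batch_size(batch_size, grad_accum_steps)
    if target % new_batch_size != 0:
        raise ValueError("new_batch_size must divide the effective batch size")
    return target // new_batch_size


# Example values taken from the README tables (assumed, not read from finetune.py).
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8

print(effective_batch_size(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 16
# Halving the batch size to 1 requires doubling the accumulation steps.
print(rebalance_accumulation(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, 1))  # 16
```

Keeping the product constant means the optimizer still sees the same number of samples per update, so only peak memory and wall-clock time per step should change.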