Initial commit

commit da2c8e636c
11 changed files with 755 additions and 0 deletions

README.md (new file, 176 lines)

# Unsloth Fine-Tune Template

> **Linux only.** This template is designed for Linux systems with an NVIDIA GPU (CUDA), an AMD GPU (ROCm), or Vulkan support.

A template for fine-tuning LLMs with [Unsloth](https://github.com/unslothai/unsloth) and converting the result to GGUF format with [llama.cpp](https://github.com/ggml-org/llama.cpp).

## Prerequisites

- Linux OS
- Python 3.10+
- NVIDIA GPU (CUDA), AMD GPU (ROCm), or Vulkan-compatible GPU
- [cmake](https://cmake.org/)
- [git](https://git-scm.com/)

## Quick Start

```bash
# 1. Setup (clones llama.cpp, builds it, installs dependencies)
bash setup.sh

# 2. Configure scripts (see variables below)

# 3. Run full pipeline
bash run-pipeline.sh
```

## Workflow

```
scripts/generate-data.sh      → Generate synthetic training data (optional)
scripts/finetune.sh           → Fine-tune model with LoRA adapters
scripts/merge-and-convert.sh  → Merge LoRA into base model and convert to GGUF
scripts/run-model.sh          → Run the converted GGUF model
run-pipeline.sh               → Run finetune → merge/convert → run in sequence
```

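The sequence that `run-pipeline.sh` is described as running can be sketched in Python. This is only an illustration of the step order, not the contents of the script: stopping at the first failing step is an assumption, and `PIPELINE_STEPS`, `run_step`, and `run_pipeline` are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence

# Step order taken from the Workflow listing. generate-data.sh is optional
# and is therefore left out of the default sequence.
PIPELINE_STEPS = (
    "scripts/finetune.sh",
    "scripts/merge-and-convert.sh",
    "scripts/run-model.sh",
)


def run_step(script: str) -> int:
    """Run one step with bash and return its exit code."""
    return subprocess.run(["bash", script]).returncode


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS,
    runner: Callable[[str], int] = run_step,
) -> list[str]:
    """Run the steps in order and stop at the first non-zero exit code.

    Returns the steps that completed successfully.
    """
    completed = []
    for script in steps:
        if runner(script) != 0:
            break  # assumption: a failed step aborts the rest of the pipeline
        completed.append(script)
    return completed
```

Passing a different `runner` makes it possible to dry-run the order without launching a training job.
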
## Setup

`setup.sh` will:

1. Create a Python virtual environment and install Python dependencies
2. Clone [llama.cpp](https://github.com/ggml-org/llama.cpp)
3. Build llama.cpp with your selected GPU backend
4. Install llama-cpp-python bindings with matching backend flags

### Backend Selection

| Choice | Backend | Requirements |
|---|---|---|
| 1 | CUDA (NVIDIA) | NVIDIA drivers, CUDA toolkit |
| 2 | ROCm (AMD) | AMD drivers, HIP toolkit |
| 3 | Vulkan | Vulkan drivers |
| 4 | CPU only | None |

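As a rough illustration, the backend choices in the table can be mapped to llama.cpp CMake options. This is a minimal sketch, not the contents of `setup.sh`: the flag names are the ones used in llama.cpp's build guide at the time of writing and have changed between versions, and `BACKEND_CMAKE_FLAGS` and `cmake_configure_command` are hypothetical names.

```python
# Backend choices from the table, mapped to llama.cpp CMake options.
# Assumption: setup.sh may pass different or additional flags.
BACKEND_CMAKE_FLAGS = {
    1: ["-DGGML_CUDA=ON"],    # CUDA (NVIDIA)
    2: ["-DGGML_HIP=ON"],     # ROCm (AMD)
    3: ["-DGGML_VULKAN=ON"],  # Vulkan
    4: [],                    # CPU only, no extra flags
}


def cmake_configure_command(choice: int, source_dir: str = "llama.cpp") -> list[str]:
    """Build the cmake configure command for a backend choice (1 to 4)."""
    if choice not in BACKEND_CMAKE_FLAGS:
        raise ValueError(f"unknown backend choice: {choice}")
    return [
        "cmake",
        "-S", source_dir,
        "-B", f"{source_dir}/build",
        *BACKEND_CMAKE_FLAGS[choice],
    ]
```

If the build fails with an unknown-option warning, the checked-out llama.cpp version may expect an older flag name; the build guide linked under Troubleshooting lists the current ones.
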
## Scripts

### 1. scripts/generate-data.sh

Generates synthetic training data using a GGUF model via llama.cpp. Run this if you need to create or extend a training dataset.

**Edit `synthetic-data.py`:**

| Variable | Description | Example |
|---|---|---|
| `GGUF_MODEL_PATH` | Path to the GGUF model used for generation | `./path/to/model.gguf` |
| `INPUT_PARQUET_PATH` | Path to existing training data to extend | `./data/train.parquet` |
| `OUTPUT_PARQUET_PATH` | Path to save the combined dataset | `./data/output.parquet` |
| `NEW_ROWS_COUNT` | Number of synthetic records to generate | `100` |

```bash
bash scripts/generate-data.sh
```

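The relationship between the input data, `NEW_ROWS_COUNT`, and the combined output can be sketched with plain Python lists. This is a simplified illustration under stated assumptions: the real script reads and writes Parquet and calls a GGUF model, whereas here rows are dictionaries, and `extend_dataset` and `generate_row` are hypothetical stand-ins with an invented `text` column.

```python
from typing import Callable


def extend_dataset(
    existing_rows: list[dict],
    new_rows_count: int,
    generate_row: Callable[[int], dict],
) -> list[dict]:
    """Return the existing rows followed by `new_rows_count` generated rows."""
    new_rows = [generate_row(i) for i in range(new_rows_count)]
    return existing_rows + new_rows


# Stand-in for a model call; the real generator would prompt the GGUF model.
def fake_generate_row(index: int) -> dict:
    return {"text": f"synthetic example {index}"}


existing = [{"text": "real example 0"}, {"text": "real example 1"}]
combined = extend_dataset(existing, new_rows_count=3, generate_row=fake_generate_row)
```

Under this reading, the output dataset holds the original rows plus `NEW_ROWS_COUNT` new ones.
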
### 2. scripts/finetune.sh

Fine-tunes a model using Unsloth with LoRA adapters. Saves LoRA weights to `./model/`.

**Edit `finetune.py`:**

| Variable | Description | Example |
|---|---|---|
| `DATA_PATH` | Path to training Parquet file | `./data/output.parquet` |
| `OUTPUT_DIR` | Directory to save LoRA adapters | `./model` |
| `BATCH_SIZE` | Per-device batch size | `2` |
| `GRADIENT_ACCUMULATION_STEPS` | Gradient accumulation steps | `8` |
| `LEARNING_RATE` | Training learning rate | `2e-4` |
| `MAX_LENGTH` | Maximum sequence length | `4096` |
| `TRAIN_EPOCHS` | Number of training epochs | `1` |
| `model_name` (line 74) | Base model to fine-tune | `"unsloth/Llama-3.2-3B-Instruct"` |

```bash
bash scripts/finetune.sh
```

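`BATCH_SIZE` and `GRADIENT_ACCUMULATION_STEPS` together determine how many examples contribute to each optimizer update. The sketch below works through the example values from the table, assuming a single GPU; the function names are hypothetical and the step count is an approximation, since trainers differ in how they round a final partial batch.

```python
import math

# Example values from the table above.
BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer update."""
    return batch_size * grad_accum_steps * num_devices


def approx_updates_per_epoch(num_examples: int, batch_size: int, grad_accum_steps: int) -> int:
    """Approximate optimizer updates in one pass over the data (single device)."""
    return math.ceil(num_examples / effective_batch_size(batch_size, grad_accum_steps))


print(effective_batch_size(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))            # 16
print(approx_updates_per_epoch(1000, BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 63
```

With the defaults, each update averages gradients over 16 examples, so a hypothetical 1,000-row dataset yields about 63 updates per epoch.
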
### 3. scripts/merge-and-convert.sh

Merges LoRA adapters into the base model, saves the merged model, then converts to GGUF format using llama.cpp.

**Edit `merge.py`:**

| Variable | Description | Example |
|---|---|---|
| `BASE_MODEL_PATH` | Path to the base model | `""` (empty to load from Hugging Face) |
| `LORA_DIR` | Path to LoRA adapters | `./model` |
| `MERGED_MODEL_PATH` | Output directory for merged model | `./merged_model` |

```bash
bash scripts/merge-and-convert.sh
```

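The conversion half of this step can be pictured as a call to llama.cpp's `convert_hf_to_gguf.py` on the merged model directory. The sketch below only builds that command. It is an assumption about how `merge-and-convert.sh` invokes the converter, and the `f16` output type and the `gguf_convert_command` helper are illustrative choices rather than the script's actual settings.

```python
from pathlib import Path


def gguf_convert_command(
    merged_model_path: str = "./merged_model",
    llama_cpp_dir: str = "llama.cpp",
    outtype: str = "f16",
) -> list[str]:
    """Build the command that converts a merged Hugging Face model to GGUF."""
    merged = Path(merged_model_path)
    return [
        "python",
        str(Path(llama_cpp_dir) / "convert_hf_to_gguf.py"),
        str(merged),                               # directory holding the merged model
        "--outfile", str(merged / "model.gguf"),   # GGUF written next to the merged weights
        "--outtype", outtype,
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from inside the virtual environment created by `setup.sh`.
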
### 4. scripts/run-model.sh

Runs the converted GGUF model for inference using llama.cpp's CLI.

**Edit `run-model.sh`:**

| Variable | Description | Example |
|---|---|---|
| Model path | Path to the GGUF file | `./merged_model/model.gguf` |

```bash
bash scripts/run-model.sh
```

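For orientation, a llama.cpp CLI invocation on the converted file might look like the command built below. This is a sketch under assumptions: the binary location `llama.cpp/build/bin/llama-cli` presumes a default CMake build, and the prompt, token limit, and `llama_cli_command` helper are illustrative, not taken from `run-model.sh`.

```python
def llama_cli_command(
    model_path: str = "./merged_model/model.gguf",
    prompt: str = "Hello",
    max_tokens: int = 128,
    llama_cpp_dir: str = "llama.cpp",
) -> list[str]:
    """Build a llama-cli command for a one-off generation."""
    return [
        f"{llama_cpp_dir}/build/bin/llama-cli",  # assumed default build output path
        "-m", model_path,        # GGUF model file
        "-p", prompt,            # prompt text
        "-n", str(max_tokens),   # number of tokens to generate
    ]
```
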
## Output Structure

```
./model/          ← LoRA adapters (from finetune.sh)
./merged_model/   ← Merged HF model + GGUF file (from merge-and-convert.sh)
llama.cpp/        ← llama.cpp repository (created by setup.sh)
scripts/          ← Individual pipeline step scripts
setup.sh          ← Setup script (venv + llama.cpp build)
run-pipeline.sh   ← Run full pipeline (finetune → merge/convert → run)
```

## Troubleshooting

### llama.cpp build fails

See the official build guide:
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

Common issues:
- **CUDA**: Ensure NVIDIA drivers and CUDA toolkit are installed
- **ROCm**: Ensure AMD drivers and HIP toolkit are installed
- **Vulkan**: Ensure Vulkan drivers and SDK are installed
- **cmake**: Install via `sudo apt install cmake` (Debian/Ubuntu)

### Out of memory during training

- Reduce `BATCH_SIZE` in `finetune.py`
- Increase `GRADIENT_ACCUMULATION_STEPS` to compensate, so the effective batch size stays the same
- Reduce `MAX_LENGTH` so that training uses shorter sequences
- Set `load_in_4bit=True` in `finetune.py` (line 77)

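The first two suggestions work as a pair: halving the per-device batch while doubling the accumulation steps lowers peak memory per forward/backward pass without changing how many examples feed each optimizer update. A minimal sketch, with `halve_batch_size` as a hypothetical helper:

```python
def halve_batch_size(batch_size: int, grad_accum_steps: int) -> tuple[int, int]:
    """Halve the per-device batch and double accumulation steps.

    The product batch_size * grad_accum_steps is unchanged.
    """
    if batch_size < 2 or batch_size % 2 != 0:
        raise ValueError("batch_size must be an even number of at least 2")
    return batch_size // 2, grad_accum_steps * 2


# Starting from the example values BATCH_SIZE=2, GRADIENT_ACCUMULATION_STEPS=8.
new_batch_size, new_grad_accum = halve_batch_size(2, 8)
print(new_batch_size, new_grad_accum)  # 1 16
```

Training takes more passes per update after this change, so each step runs slower, but the optimization behaviour should stay close to the original configuration.
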
### llama-cpp-python install fails

- Ensure llama.cpp has been built successfully first
- Try a CPU-only install first to verify the toolchain: `pip install llama-cpp-python`
- Check the [llama-cpp-python docs](https://llama-cpp-python.readthedocs.io/en/latest/) for other backends

## Project Structure

```
├── finetune.py           ← Training script
├── merge.py              ← Merge LoRA into base model
├── synthetic-data.py     ← Generate synthetic training data
├── requirements.txt      ← Python dependencies
├── setup.sh              ← One-time setup
├── run-pipeline.sh       ← Run full pipeline
├── scripts/
│   ├── generate-data.sh
│   ├── finetune.sh
│   ├── merge-and-convert.sh
│   └── run-model.sh
└── README.md
```