Use existing llama.cpp build for python bindings if possible

This commit is contained in:
Oracle 2026-06-02 17:08:39 +02:00
parent 96e6e141da
commit 815c9aadec
Signed by: Oracle
SSH key fingerprint: SHA256:x4/RtnjUyuHkdvmwNDsWSfcfF1V5PNr3OpriZqOvCX8
5 changed files with 92 additions and 78 deletions

View file

@ -38,14 +38,16 @@ run-pipeline.sh → Run finetune → merge/convert → run in sequenc
`setup.sh` will:
1. Create a Python virtual environment and install Python dependencies
2. Clone [llama.cpp](https://github.com/ggml-org/llama.cpp) or symlink an existing build
3. Build llama.cpp with your selected GPU backend (skip if using existing)
4. Install llama-cpp-python bindings with matching backend flags
2. Clone [llama.cpp](https://github.com/ggml-org/llama.cpp) (fresh build) or symlink an existing build
3. Build llama.cpp with shared libraries (`-DBUILD_SHARED_LIBS=ON`)
4. Install llama-cpp-python bindings linked against the shared library (`-DLLAMA_BUILD=OFF`)
**Using an existing llama.cpp build:** Choose option 2 and provide the absolute path to your existing build. Setup will create a symlink at `./llama.cpp`.
**Using an existing llama.cpp build:** Choose option 2 and provide the absolute path. The build must have been created with `-DBUILD_SHARED_LIBS=ON` and contain `libllama.so`. Setup will create a symlink at `./llama.cpp`.
### Backend Selection
Backend is only prompted when building llama.cpp from scratch. Choose based on your GPU:
| Choice | Backend | Requirements |
|---|---|---|
| 1 | CUDA (NVIDIA) | Systemwide CUDA installation (NVIDIA drivers + CUDA toolkit) |
@ -67,6 +69,17 @@ vulkaninfo
Should run without errors.
### Existing llama.cpp Build
If using option 2 (existing build), ensure it was compiled with shared libraries:
```bash
cmake -B build -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON # or -DGGML_HIP=ON / -DGGML_VULKAN=1
cmake --build build --config Release -j$(nproc)
```
The build must contain `libllama.so` (typically at `build/libllama.so`).
## Scripts
### 1. scripts/generate-data.sh
@ -81,8 +94,20 @@ Generates synthetic training data using a GGUF model via llama.cpp. Run this if
| `INPUT_PARQUET_PATH` | Path to existing training data to extend | `./data/train.parquet` |
| `OUTPUT_PARQUET_PATH` | Path to save the combined dataset | `./data/output.parquet` |
| `NEW_ROWS_COUNT` | Number of synthetic records to generate | `100` |
| User prompt (line 67) | Replace `"YOUR PROMPT GOES HERE"` with generation instructions | `Generate questions about machine learning...` |
| System message (line 63) | Controls the model's role | `"You are a data generator. Output ONLY the format below..."` |
| `User prompt` (line 66) | Replace `"YOUR PROMPT GOES HERE"` with generation instructions | `Generate questions about machine learning...` |
| `System message` (line 62) | Controls the model's role | `"You are a data generator. Output ONLY the format below..."` |
| `max_tokens` | Max tokens per response | `200` |
| `temperature` | Creativity of generation | `0.7` |
| `top_p` | Nucleus sampling threshold | `0.95` |
| `top_k` | Top-k sampling threshold | `50` |
| `min_p` | Minimum probability threshold | `0.05` |
The model expects output in the format:
```
Question: <generated question>
Answer: <generated answer>
```
```bash
bash scripts/generate-data.sh

View file

@ -16,7 +16,7 @@ warnings.filterwarnings("ignore")
# ==========================================
# Update these paths
DATA_PATH = "YOUR_PAQUET_FILE_PATH"
DATA_PATH = "YOUR_PARQUET_FILE_PATH"
OUTPUT_DIR = "./model"
# Training params, change these to fit your hardware
BATCH_SIZE = 2
@ -37,7 +37,7 @@ print("Loading data...")
df = pd.read_parquet(DATA_PATH)
# Check required columns
required_cols = ["question", "answer", "label"]
required_cols = ["question", "answer"]
missing_cols = [c for c in required_cols if c not in df.columns]
if missing_cols:
raise ValueError(f"Missing columns in Parquet file: {missing_cols}")

View file

@ -1,2 +1,2 @@
#!/bin/bash
./llama.cpp/build/bin/llama-cli -m ./merged_model/Merged_Model.gguf
./llama.cpp/build/bin/llama-cli -m ./merged_model/model.gguf

View file

@ -11,25 +11,26 @@ python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Select backend for llama-cpp-python binding
echo ""
echo "Select llama.cpp backend:"
echo " 1) CUDA (NVIDIA GPU)"
echo " 2) ROCm (AMD GPU)"
echo " 3) Vulkan (Cross-vendor GPU)"
echo " 4) CPU only"
echo ""
read -p "Enter choice (1-4): " BACKEND
# Ask if fresh build or existing
echo ""
echo "Would you like to:"
echo " 1) Clone and build a fresh copy of llama.cpp"
echo " 2) Use an existing llama.cpp build (symlink)"
echo " 2) Use an existing llama.cpp build"
echo ""
read -p "Enter choice (1-2): " BUILD_CHOICE
LLAMA_CPP_PATH=""
if [ "$BUILD_CHOICE" = "1" ]; then
echo ""
echo "Select llama.cpp backend:"
echo " 1) CUDA (NVIDIA GPU)"
echo " 2) ROCm (AMD GPU)"
echo " 3) Vulkan (Cross-vendor GPU)"
echo " 4) CPU only"
echo ""
read -p "Enter choice (1-4): " BACKEND
echo ""
echo "Cloning llama.cpp..."
git clone https://github.com/ggml-org/llama.cpp.git
@ -44,32 +45,35 @@ if [ "$BUILD_CHOICE" = "1" ]; then
case $BACKEND in
1)
echo "Building with CUDA support..."
cmake -B build -DGGML_CUDA=ON || BUILD_FAILED=1
cmake -B build -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON || BUILD_FAILED=1
[ $BUILD_FAILED -eq 0 ] && cmake --build build --config Release -j$(nproc) || BUILD_FAILED=1
;;
2)
echo "Building with ROCm support..."
read -p "Enter GPU target (e.g., gfx1030, gfx942, gfx1100): " ROCM_TARGET
[ -z "$ROCM_TARGET" ] && ROCM_TARGET="gfx1030"
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release || BUILD_FAILED=1
cmake -S . -B build -DBUILD_SHARED_LIBS=ON -DGGML_HIP=ON -DGPU_TARGETS="$ROCM_TARGET" -DCMAKE_BUILD_TYPE=Release || BUILD_FAILED=1
[ $BUILD_FAILED -eq 0 ] && cmake --build build --config Release -j$(nproc) || BUILD_FAILED=1
;;
3)
echo "Building with Vulkan support..."
cmake -B build -DGGML_VULKAN=1 || BUILD_FAILED=1
cmake -B build -DBUILD_SHARED_LIBS=ON -DGGML_VULKAN=1 || BUILD_FAILED=1
[ $BUILD_FAILED -eq 0 ] && cmake --build build --config Release -j$(nproc) || BUILD_FAILED=1
;;
4)
echo "Building CPU-only..."
cmake -B build || BUILD_FAILED=1
cmake -B build -DBUILD_SHARED_LIBS=ON || BUILD_FAILED=1
[ $BUILD_FAILED -eq 0 ] && cmake --build build --config Release -j$(nproc) || BUILD_FAILED=1
;;
*)
echo "Invalid choice. Building CPU-only."
cmake -B build || BUILD_FAILED=1
cmake -B build -DBUILD_SHARED_LIBS=ON || BUILD_FAILED=1
[ $BUILD_FAILED -eq 0 ] && cmake --build build --config Release -j$(nproc) || BUILD_FAILED=1
;;
esac
LLAMA_CPP_PATH=$(pwd)
cd ..
else
read -p "Enter absolute path to existing llama.cpp build: " LLAMA_CPP_PATH
@ -83,24 +87,37 @@ else
# Resolve to absolute path
LLAMA_CPP_PATH=$(realpath "$LLAMA_CPP_PATH")
# Check for shared library
if [ ! -f "$LLAMA_CPP_PATH/build/libllama.so" ] && [ ! -f "$LLAMA_CPP_PATH/libllama.so" ]; then
echo "Error: Could not find libllama.so in $LLAMA_CPP_PATH"
echo "Make sure llama.cpp was built with -DBUILD_SHARED_LIBS=ON"
exit 1
fi
echo ""
echo "Creating symlink: ./llama.cpp -> $LLAMA_CPP_PATH"
echo "Using existing build: $LLAMA_CPP_PATH"
ln -sfn "$LLAMA_CPP_PATH" llama.cpp
fi
# Install llama-cpp-python in main venv
# Find the shared library
if [ -f "$LLAMA_CPP_PATH/build/libllama.so" ]; then
LLAMA_CPP_LIB="$LLAMA_CPP_PATH/build/libllama.so"
elif [ -f "$LLAMA_CPP_PATH/libllama.so" ]; then
LLAMA_CPP_LIB="$LLAMA_CPP_PATH/libllama.so"
else
echo "Error: Could not locate libllama.so"
exit 1
fi
echo ""
echo "Using shared library: $LLAMA_CPP_LIB"
# Install llama-cpp-python with existing build
echo ""
echo "Installing llama-cpp-python..."
case $BACKEND in
1) CMAKE_ARGS="-DGGML_CUDA=on" ;;
2) CMAKE_ARGS="-DGGML_HIP=on" ;;
3) CMAKE_ARGS="-DGGML_VULKAN=on" ;;
*) CMAKE_ARGS="" ;;
esac
source venv/bin/activate
eval "CMAKE_ARGS=\"$CMAKE_ARGS\" pip install llama-cpp-python"
LLAMA_CPP_LIB="$LLAMA_CPP_LIB" LLAMA_CPP_LIB_PATH="$LLAMA_CPP_LIB" CMAKE_ARGS="-DLLAMA_BUILD=OFF" pip install llama-cpp-python
# Create convertgguf_venv for llama.cpp Python tools
echo ""

View file

@ -9,6 +9,11 @@ GGUF_MODEL_PATH = "./path/to/model.gguf"
INPUT_PARQUET_PATH = "./path/to/input.parquet"
OUTPUT_PARQUET_PATH = "./path/to/output.parquet"
NEW_ROWS_COUNT = 100
MAX_TOKENS = 200
TEMPERATURE = 0.7
TOP_P = 0.95
TOP_K = 50
MIN_P = 0.05
# Check if files exist
if not os.path.exists(GGUF_MODEL_PATH):
@ -18,7 +23,7 @@ if not os.path.exists(INPUT_PARQUET_PATH):
print(f"❌ Error: Input Parquet file not found at {INPUT_PARQUET_PATH}")
exit()
# 2. LOAD GGUF MODEL - GPU (Vulkan) ONLY
# 2. LOAD GGUF MODEL - GPU
print("Loading llama.cpp model...")
try:
model = Llama(
@ -31,7 +36,7 @@ try:
use_mmap=True,
use_mlock=False,
)
print("✅ llama.cpp model loaded with Vulkan GPU.")
print("✅ llama.cpp model loaded.")
except Exception as e:
print(f"❌ Error loading model: {e}")
exit()
@ -48,8 +53,6 @@ except Exception as e:
print(f"❌ Error loading dataset: {e}")
exit()
existing_labels = list(set(original_ds["label"]))
# 4. GENERATE SYNTHETIC DATA - STRUCTURED OUTPUT
print(f"Generating {NEW_ROWS_COUNT} synthetic records...")
synthetic_data = []
@ -70,11 +73,11 @@ for i in range(NEW_ROWS_COUNT):
# Generate with sampling parameters
response = model.create_chat_completion(
messages=messages,
max_tokens=200,
temperature=1.0,
top_p=0.95,
top_k=20,
min_p=0.0,
max_tokens=MAX_TOKENS,
temperature=TEMPERATURE,
top_p=TOP_P,
top_k=TOP_K,
min_p=MIN_P,
)
# Get response text
@ -86,67 +89,36 @@ for i in range(NEW_ROWS_COUNT):
question = None
answer = None
label = None
found_question = False
found_answer = False
found_label = False
for line in lines:
line = line.strip()
# Extract Question
if "Question:" in line and "Answer:" not in line:
match = re.search(
r"Question:\s*(.+?)(?:\nAnswer|\nLabel|$)", line, re.IGNORECASE
r"Question:\s*(.+?)(?:\nAnswer|$)", line, re.IGNORECASE
)
if match:
question = match.group(1).strip()
found_question = True
# Extract Answer
elif "Answer:" in line:
match = re.search(r"Answer:\s*(.+?)(?:\nLabel|$)", line, re.IGNORECASE)
match = re.search(r"Answer:\s*(.+)", line, re.IGNORECASE)
if match:
answer = match.group(1).strip()
found_answer = True
# Extract Label
elif "Label:" in line:
match = re.search(r"Label:\s*(.+)", line, re.IGNORECASE)
if match:
label = match.group(1).strip()
found_label = True
# VALIDATION
if not all([question, answer]):
print(f"⚠️ Row {i + 1}: Incomplete output. Skipping.")
for line in lines:
print(line)
continue
if not label:
label = "unbiased"
else:
# Normalize label
label = (
label.lower().strip('"').strip("'").replace("[", "").replace("]", "")
)
if label not in existing_labels:
print(f"⚠️ Row {i + 1}: Invalid label '{label}'. Skipping.")
continue
# Clean up
question = re.sub(r"```.*?```", "", question).strip()
answer = re.sub(r"```.*?```", "", answer).strip()
parsed_row = {"question": question, "answer": answer, "label": label}
parsed_row = {"question": question, "answer": answer}
# PRINT PARSED DATA IN TERMINAL
print(f"✅ ROW {i + 1} PARSED:")
print(f" Question: {question}")
print(f" Answer: {answer}")
print(f" Label: {label}")
print()
synthetic_data.append(parsed_row)