mirror of
https://github.com/L-yang-yang/cugenopt.git
synced 2026-04-25 12:16:21 +02:00
cuGenOpt Python
GPU-accelerated general-purpose metaheuristic solver for combinatorial optimization.
All problems (built-in and custom) go through the same JIT compilation pipeline. The first call for each problem type takes ~9s to compile; subsequent calls reuse the cached binary (~0.1s).
Requirements
- NVIDIA GPU with driver installed
- nvcc compiler, either:
  - CUDA Toolkit installed on the system, or
  - pip install nvidia-cuda-nvcc-cu12
- Python >= 3.8
Installation
pip install cugenopt
pip install nvidia-cuda-nvcc-cu12 # if no system CUDA Toolkit
Quick Start
import numpy as np
import cugenopt
# TSP: 20 cities
n = 20
coords = np.random.rand(n, 2).astype(np.float32)
dist = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(axis=2))
result = cugenopt.solve_tsp(dist, time_limit=5.0, seed=42)
print(f"Best distance: {result['objective']:.2f}")
print(f"Route: {result['solution'][0]}")
print(f"Time: {result['elapsed_ms']:.0f}ms, Generations: {result['generations']}")
# 0-1 Knapsack
weights = np.array([2, 3, 4, 5], dtype=np.float32)
values = np.array([3, 4, 5, 6], dtype=np.float32)
result = cugenopt.solve_knapsack(weights, values, capacity=10.0, max_gen=2000)
print(f"Best value: {result['objective']:.0f}")
# GPU info
info = cugenopt.gpu_info()
print(f"GPU: {info['name']}, Compute: {info['compute_capability']}")
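A returned route can be sanity-checked independently of the solver. The sketch below is not part of cugenopt: check_tour is a hypothetical helper that verifies the route is a permutation and recomputes its length as a closed tour (assuming the TSP objective includes the leg back to the start city, as the custom example later in this README does). It accepts nested lists or NumPy arrays.

```python
import math

def check_tour(dist, route):
    """Return the closed-tour length of `route`; raise if it is not a permutation."""
    n = len(dist)
    order = [int(c) for c in route]
    if sorted(order) != list(range(n)):
        raise ValueError("route must visit every city exactly once")
    # Sum each leg, wrapping from the last city back to the first.
    return sum(float(dist[order[i]][order[(i + 1) % n]]) for i in range(n))

# Four cities on the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(a, b) for b in coords] for a in coords]

perimeter = check_tour(dist, [0, 1, 2, 3])  # walks the square's edges
crossing = check_tour(dist, [0, 2, 1, 3])   # uses both diagonals
```

With a real result, check_tour(dist, result['solution'][0]) should agree with result['objective'] up to float32 rounding if the assumption about the closing leg holds.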
Built-in Problems
| Function | Problem | Encoding |
|---|---|---|
| solve_tsp | Traveling Salesman | Permutation |
| solve_knapsack | 0-1 Knapsack | Binary |
| solve_qap | Quadratic Assignment | Permutation |
| solve_assignment | Assignment | Permutation |
| solve_vrp | Capacitated VRP | Perm-Partition |
| solve_vrptw | VRP with Time Windows | Perm-Partition |
| solve_graph_color | Graph Coloring | Integer |
| solve_bin_packing | Bin Packing | Integer |
| solve_load_balance | Load Balancing | Integer |
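To make the Binary encoding concrete, the sketch below enumerates every 0/1 vector for the four-item knapsack from the Quick Start and keeps the best feasible one. It assumes a Binary solution is one 0/1 flag per item. knapsack_brute_force is a hypothetical reference helper, not a cugenopt function, and is only practical for very small instances.

```python
from itertools import product

def knapsack_brute_force(weights, values, capacity):
    """Try all 2**n pick vectors and return (best value, best pick)."""
    best_value, best_pick = 0.0, tuple(0 for _ in weights)
    for pick in product((0, 1), repeat=len(weights)):
        weight = sum(w for w, x in zip(weights, pick) if x)
        if weight > capacity:
            continue  # infeasible: over capacity
        value = sum(v for v, x in zip(values, pick) if x)
        if value > best_value:
            best_value, best_pick = value, pick
    return best_value, best_pick

# Same instance as the Quick Start knapsack example.
best_value, best_pick = knapsack_brute_force([2, 3, 4, 5], [3, 4, 5, 6], 10)
```

For this instance the exhaustive search selects items 0, 1 and 3 (total weight 10, total value 13), which gives a reference value to compare a solver result against.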
Solver Parameters
All solve_* functions accept keyword arguments:
| Parameter | Default | Description |
|---|---|---|
| pop_size | 0 (auto) | Population size (0 = auto-detect from GPU) |
| max_gen | 1000 | Maximum generations |
| time_limit | 0 (none) | Time limit in seconds |
| seed | 42 | Random seed |
| use_aos | False | Enable Adaptive Operator Selection |
| sa_temp_init | 0 | Simulated annealing initial temperature |
| verbose | False | Print progress |
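When several solve_* calls share settings, it can help to build the keyword arguments once and unpack them with **. The sketch below is a hypothetical convenience layer, not something cugenopt provides: DEFAULTS copies the values from the table above, and solver_options rejects misspelled names before they reach the solver.

```python
# Defaults copied from the parameter table; this dict is a hypothetical
# convenience and is not exported by cugenopt.
DEFAULTS = {
    "pop_size": 0,
    "max_gen": 1000,
    "time_limit": 0,
    "seed": 42,
    "use_aos": False,
    "sa_temp_init": 0,
    "verbose": False,
}

def solver_options(**overrides):
    """Merge overrides into DEFAULTS, rejecting unknown parameter names."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise TypeError(f"unknown solver parameter(s): {', '.join(unknown)}")
    return {**DEFAULTS, **overrides}

opts = solver_options(time_limit=5.0, use_aos=True)
# Intended use, assuming cugenopt and a GPU are available:
#   result = cugenopt.solve_tsp(dist, **opts)
```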
Return Value
All functions return a dict:
{
    "objective": float,      # best objective value
    "penalty": float,        # constraint violation (0 = feasible)
    "solution": [np.array],  # list of row arrays
    "elapsed_ms": float,     # wall-clock time
    "generations": int,      # generations completed
    "stop_reason": str,      # "max_gen" | "time_limit" | "stagnation"
    "objectives": [float],   # all objective values
}
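A small helper can turn this dict into a one-line report. The sketch below uses only the keys listed above; summarize and its tol argument are hypothetical, and the sample dict is a hand-written stand-in with made-up values rather than real solver output.

```python
def summarize(result, tol=1e-6):
    """Format a result dict as a one-line status string."""
    # A penalty of 0 means feasible; tol absorbs floating-point noise.
    if result["penalty"] <= tol:
        status = "feasible"
    else:
        status = f"infeasible (penalty {result['penalty']:.3g})"
    seconds = result["elapsed_ms"] / 1000.0
    return (
        f"{status}: objective {result['objective']:.2f} after "
        f"{result['generations']} generations in {seconds:.2f}s "
        f"[{result['stop_reason']}]"
    )

# Stand-in result with made-up values, shaped like the dict described above.
fake = {
    "objective": 4.0,
    "penalty": 0.0,
    "solution": [[0, 1, 2, 3]],
    "elapsed_ms": 1500.0,
    "generations": 1000,
    "stop_reason": "max_gen",
    "objectives": [4.0],
}
line = summarize(fake)
```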
Custom Problems (JIT)
For problems not covered by the built-in solvers, use solve_custom() to define
your own objective function in CUDA:
import numpy as np
import cugenopt
n = 30
coords = np.random.rand(n, 2).astype(np.float32)
dist = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(axis=2))
result = cugenopt.solve_custom(
    compute_obj="""
        if (idx != 0) return 0.0f;
        float total = 0.0f;
        const int* route = sol.data[0];
        int size = sol.dim2_sizes[0];
        for (int i = 0; i < size; i++)
            total += d_dist[route[i] * _n + route[(i+1) % size]];
        return total;
    """,
    data={"d_dist": dist},
    encoding="permutation",
    dim2=64,
    n=n,
    time_limit=10.0,
)
print(f"Best: {result['objective']:.2f}")
The first call compiles the CUDA code (~9s). Subsequent calls with the same code use the cached binary (~0.1s).
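The CUDA body above reads d_dist as a flat array, indexing entry (i, j) as i * _n + j. The sketch below mirrors that loop in plain Python over a row-major flattening of the NumPy matrix, which can serve as a CPU reference when debugging a custom objective. It assumes the array reaches the GPU in row-major (C) order, which the indexing in the example implies. reference_objective is a hypothetical helper.

```python
import numpy as np

def reference_objective(d_dist, route, n):
    """Python mirror of the CUDA loop: closed-tour length over a flat matrix."""
    size = len(route)
    total = 0.0
    for i in range(size):
        # Same flat index as the CUDA code: row * n + column.
        total += float(d_dist[route[i] * n + route[(i + 1) % size]])
    return total

rng = np.random.default_rng(0)
n = 6
coords = rng.random((n, 2)).astype(np.float32)
dist = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(axis=2))

flat = dist.ravel()  # C order: flat[i * n + j] == dist[i, j]
route = rng.permutation(n)
flat_total = reference_objective(flat, route, n)
# Same tour length computed with ordinary 2-D indexing.
direct_total = sum(float(dist[route[i], route[(i + 1) % n]]) for i in range(n))
```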
solve_custom() Parameters
| Parameter | Description |
|---|---|
| compute_obj | CUDA code for objective function body |
| compute_penalty | CUDA code for penalty function body (default: return 0.0f;) |
| data | Dict of name → numpy float32 array |
| int_data | Dict of name → numpy int32 array |
| encoding | "permutation", "binary", or "integer" |
| dim1, dim2 | Solution dimensions |
| n | Problem size |
| objectives | List of (direction, weight) tuples |
| value_lower, value_upper | Bounds for integer encoding |
| row_mode | "single", "fixed", or "partition" |
Use cugenopt.clear_cache() to remove cached compilations.
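To observe the difference between a cold (compiling) call and a warm (cached) call, a generic timing helper is enough. The sketch below is hypothetical and runs without a GPU: time_calls times any zero-argument callable, and fake_solve is a stand-in that is slow once and fast afterwards.

```python
import time

def time_calls(fn, repeats=2):
    """Call fn() `repeats` times and return each call's wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in workload: the first call sleeps to imitate compilation.
_state = {"compiled": False}

def fake_solve():
    if not _state["compiled"]:
        time.sleep(0.05)
        _state["compiled"] = True

cold, warm = time_calls(fake_solve)
# With cugenopt installed, the same helper could wrap a real call, e.g.
#   cugenopt.clear_cache()
#   cold, warm = time_calls(lambda: cugenopt.solve_tsp(dist, max_gen=100))
```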