cuGenOpt
A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization
Paper: cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization (Coming soon)
Overview
cuGenOpt is a high-performance, problem-agnostic GPU metaheuristic framework designed for combinatorial optimization. It provides:
- Generic Solution Encodings: Permutation, Binary, Integer, and Partition representations
- Adaptive Operator Selection (AOS): Runtime weight adjustment via exponential moving average
- Three-Layer Adaptive Architecture: Static priors (L1) + Runtime AOS (L3) for cold-start avoidance
- GPU Memory Hierarchy Optimization: L2 cache-aware population sizing and adaptive shared memory management
- Multi-GPU Support: Independent parallel solving with automatic device management
- Python API + CUDA C++: High-level interface with JIT compilation for custom problems
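The Adaptive Operator Selection item above can be pictured with a small CPU-side sketch. This is not cuGenOpt's implementation: the class name `EmaOperatorSelector`, the smoothing factor `alpha`, the weight `floor`, and the 0/1 reward are all assumptions chosen for illustration. Only the idea (operator weights updated at runtime by an exponential moving average) and the operator names come from this README.

```python
import random


class EmaOperatorSelector:
    """Roulette-wheel operator selection with EMA-updated weights (illustrative only)."""

    def __init__(self, operators, alpha=0.2, floor=0.05, seed=None):
        self.operators = list(operators)
        self.alpha = alpha    # EMA smoothing factor (assumed value)
        self.floor = floor    # minimum weight so no operator is starved (assumed)
        self.weights = {op: 1.0 for op in self.operators}
        self._rng = random.Random(seed)

    def select(self):
        # Sample one operator with probability proportional to its current weight.
        weights = [self.weights[op] for op in self.operators]
        return self._rng.choices(self.operators, weights=weights, k=1)[0]

    def update(self, op, reward):
        # Exponential moving average: new = (1 - alpha) * old + alpha * reward.
        blended = (1.0 - self.alpha) * self.weights[op] + self.alpha * reward
        self.weights[op] = max(self.floor, blended)


selector = EmaOperatorSelector(["swap", "reverse", "insert"], seed=0)
for _ in range(50):
    selector.update("reverse", 1.0)  # pretend "reverse" always improves the solution
    selector.update("swap", 0.0)     # pretend "swap" never helps
chosen = selector.select()
```

In this toy run the weight of `swap` decays to the floor while `reverse` keeps its weight, so `swap` is sampled rarely but never disabled.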
Key Features
| Feature | Description |
|---|---|
| 12+ Problem Types | TSP, VRP, VRPTW, Knapsack, QAP, JSP, Assignment, Graph Coloring, Bin Packing, and more |
| Adaptive Search | EMA-driven operator weight adjustment during runtime |
| Problem Profiling | Automatic initial strategy selection based on problem characteristics |
| Memory-Aware | Automatic population sizing based on GPU L2 cache capacity |
| Multi-Objective | Weighted sum and lexicographic optimization modes |
| Cross-Platform | Unified workflow on Linux and Windows |
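To make the two multi-objective modes in the table concrete, here is a minimal sketch of how weighted-sum and lexicographic comparison differ for minimization. The function names, the weights, and the two candidate scores are hypothetical and are not taken from the cuGenOpt API.

```python
def weighted_sum(objectives, weights):
    """Collapse several objectives into one scalar (lower is better)."""
    return sum(w * f for w, f in zip(weights, objectives))


def lexicographic_better(a, b, tol=0.0):
    """True if `a` beats `b` when objectives are compared in priority order."""
    for fa, fb in zip(a, b):
        if fa < fb - tol:
            return True
        if fa > fb + tol:
            return False
    return False  # equal on every objective


# Two hypothetical routing candidates scored as (vehicles used, total distance).
cand_a = (10, 900.0)
cand_b = (11, 850.0)

score_a = weighted_sum(cand_a, (1.0, 1.0))
score_b = weighted_sum(cand_b, (1.0, 1.0))
a_wins_lexicographically = lexicographic_better(cand_a, cand_b)
```

With equal weights the weighted sum prefers `cand_b` (shorter distance outweighs the extra vehicle), while the lexicographic order prefers `cand_a` because the first objective takes strict priority.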
Quick Start
Option 1: Python API (Recommended)
pip install cugenopt
pip install nvidia-cuda-nvcc-cu12 # Only needed if no system CUDA Toolkit is available
Solve Built-in Problems:
import numpy as np
import cugenopt
# Solve TSP
dist = np.random.rand(50, 50).astype(np.float32)
dist = (dist + dist.T) / 2 # Make symmetric
result = cugenopt.solve_tsp(dist, time_limit=10.0)
print(f"Best tour length: {result['best_obj']}")
print(f"Tour: {result['best_solution']}")
Define Custom Problems with JIT:
result = cugenopt.solve_custom(
    compute_obj="""
        if (idx != 0) return 0.0f;
        float total = 0.0f;
        const int* route = sol.data[0];
        int size = sol.dim2_sizes[0];
        for (int i = 0; i < size; i++)
            total += d_dist[route[i] * _n + route[(i+1) % size]];
        return total;
    """,
    data={"d_dist": dist},
    encoding="permutation",
    dim2=50,
    n=50,
    time_limit=10.0
)
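The `compute_obj` snippet above sums `d_dist[route[i] * _n + route[(i+1) % size]]` over a closed tour. A NumPy reference of the same formula can be handy for checking a reported objective on the CPU. The helper name `tour_length` is hypothetical, and using it against `result['best_solution']` assumes that field holds a flat sequence of city indices, which this README does not state.

```python
import numpy as np


def tour_length(dist, tour):
    """Closed-tour length: sum of dist[tour[i], tour[(i + 1) % n]] over all i."""
    tour = np.asarray(tour)
    nxt = np.roll(tour, -1)  # successor of each city; the last wraps to the first
    return float(dist[tour, nxt].sum())


# Tiny hand-checkable instance: 0 -> 1 -> 2 -> 0 costs 1 + 3 + 2.
small = np.array([[0, 1, 2],
                  [1, 0, 3],
                  [2, 3, 0]], dtype=np.float32)
small_length = tour_length(small, [0, 1, 2])
```

A call such as `tour_length(dist, result['best_solution'])` could then be compared with `result['best_obj']`, allowing for small float32 rounding differences.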
Option 2: CUDA C++ Direct Usage
cd prototype
make tsp
./tsp
Define your own problem by inheriting from ProblemBase and implementing compute_obj and compute_penalty.
Architecture
┌─────────────────────────────────────────────────────────┐
│ Python API Layer │
│ (Built-in Problems + JIT Compiler for Custom Problems) │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────┐
│ Core Framework (CUDA C++) │
│ • Adaptive Solver (L1 Priors + L3 Runtime AOS) │
│ • Operator Registry (Swap, Reverse, Insert, LNS, ...) │
│ • Population Management (Elite + Diversity) │
│ • Multi-GPU Coordinator │
└─────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────┐
│ GPU Execution Engine │
│ • L2 Cache-Aware Memory Management │
│ • Adaptive Shared Memory Allocation │
│ • CUDA Kernels (Population-level + Neighborhood-level) │
└─────────────────────────────────────────────────────────┘
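The "L2 Cache-Aware Memory Management" box suggests that the population is sized so its working set stays in L2 cache. The sketch below shows one plausible way to do that; it is an assumption, not the framework's actual heuristic. The function name, the `fraction` of cache budgeted for solutions, and rounding down to a multiple of 32 (one warp) are all hypothetical choices.

```python
def l2_aware_population_size(l2_bytes, solution_bytes,
                             fraction=0.5, multiple=32, min_size=32):
    """Largest population (a multiple of `multiple`) whose solutions fit the L2 budget."""
    budget = int(l2_bytes * fraction)        # leave the rest of L2 for problem data
    fit = budget // solution_bytes           # how many solutions fit in the budget
    rounded = (fit // multiple) * multiple   # round down to a whole number of warps
    return max(min_size, rounded)            # never go below a minimum population


# Hypothetical device with a 4 MiB L2 cache; a permutation of 1000 int32 values.
pop = l2_aware_population_size(l2_bytes=4 * 1024 * 1024, solution_bytes=1000 * 4)
```

Here half of the cache (2 MiB) holds 524 solutions of 4000 bytes each, which rounds down to 512.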
Project Structure
generic_solver/
├── prototype/ # Core framework (header-only .cuh files)
│ ├── core/ # Solver, operators, population, types
│ └── problems/ # 12+ problem implementations
├── python/ # Python wrapper (pip install cugenopt)
│ ├── cugenopt/ # Python package (built-ins + JIT compiler)
│ └── tests/ # Test suite
├── benchmark/ # Experiments and benchmarks
│ ├── experiments/ # E0-E13: 14 experiment groups
│ ├── data/ # Standard instances (TSPLIB, Solomon, QAPLIB)
│ └── results/ # Experimental reports
├── paper_v3_en/ # Paper source (LaTeX)
├── STATUS.md # Project status and roadmap
└── README.md # This file
Performance Highlights
Benchmark Results
| Problem | Instance | cuGenOpt | Best Known | Gap |
|---|---|---|---|---|
| TSP | kroA100 | 21,282 | 21,282 | 0.00% |
| TSP | kroA200 | 29,368 | 29,368 | 0.00% |
| QAP | nug12 | 578 | 578 | 0.00% (Optimal) |
| VRPTW | C101 | 828.94 | 828.94 | 0.00% |
| VRPTW | R101 | 1,650.80 | 1,645.79 | 0.30% |
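The Gap column is presumably the relative distance to the best known value, expressed as a percentage. A short sketch of that calculation, using rows from the table (the helper name `gap_percent` is hypothetical):

```python
def gap_percent(obj, best_known):
    """Relative gap to the best known objective, in percent (minimization)."""
    return 100.0 * (obj - best_known) / best_known


rows = [
    ("kroA100", 21282, 21282),
    ("C101", 828.94, 828.94),
    ("R101", 1650.80, 1645.79),
]
gaps = {name: round(gap_percent(obj, best), 2) for name, obj, best in rows}
```

For R101 this gives (1650.80 - 1645.79) / 1645.79, about 0.30%, matching the table.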
GPU Scalability
| GPU | Memory Bandwidth | TSP n=1000 Speedup |
|---|---|---|
| T4 | 300 GB/s | 1.0× (baseline) |
| V100 | 900 GB/s | 1.6× |
| A800 | 1,935 GB/s | 3.6× |
Memory-bound workload: speedup grows with memory bandwidth, though sublinearly in this table (a 6.45× bandwidth increase from T4 to A800 yields a 3.6× speedup).
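One way to quantify how closely speedup tracks bandwidth is to divide the speedup gain by the bandwidth gain, both relative to the T4 baseline. The numbers below are copied from the table; the helper name `scaling_efficiency` is hypothetical.

```python
# (memory bandwidth in GB/s, TSP n=1000 speedup relative to T4), from the table above
GPUS = {"T4": (300.0, 1.0), "V100": (900.0, 1.6), "A800": (1935.0, 3.6)}


def scaling_efficiency(bandwidth, speedup, base_bandwidth, base_speedup=1.0):
    """Speedup gain divided by bandwidth gain; 1.0 means exactly proportional."""
    return (speedup / base_speedup) / (bandwidth / base_bandwidth)


base_bw, base_sp = GPUS["T4"]
efficiency = {
    name: scaling_efficiency(bw, sp, base_bw, base_sp)
    for name, (bw, sp) in GPUS.items()
}
for name, eff in efficiency.items():
    print(f"{name}: {eff:.2f}")
```

A value of 1.0 would mean speedup exactly proportional to bandwidth; the V100 and A800 rows come out at roughly 0.53 and 0.56.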
Multi-GPU Effectiveness
| Problem | Single GPU | 2× GPU | 4× GPU | Improvement (4× vs. single) |
|---|---|---|---|---|
| TSP n=1000 | 7,542,668 | 7,277,989 | 7,236,344 | 4.06% |
| QAP n=100 | 1,520,516 | 1,502,084 | 1,498,404 | 1.45% |
Results were obtained with CUDA Graph enabled. In these runs, the larger instance (TSP n=1000) benefits more from parallel exploration than the smaller one (QAP n=100).
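The multi-GPU mode is described in this README as independent parallel solving, so the final answer is simply the best of several independent runs. The CPU-only sketch below illustrates that pattern with a toy swap-based hill climber standing in for each device's solver. Everything here (`hill_climb`, `solve_independently`, the seeds, the iteration count) is hypothetical and unrelated to the actual GPU kernels.

```python
import numpy as np


def tour_length(dist, tour):
    """Closed-tour length for a permutation of city indices."""
    return float(dist[tour, np.roll(tour, -1)].sum())


def hill_climb(dist, seed, iters=500):
    """Toy stand-in for one device: accept random 2-city swaps that shorten the tour."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    tour = rng.permutation(n)
    best = tour_length(dist, tour)
    for _ in range(iters):
        i, j = rng.integers(0, n, size=2)
        tour[i], tour[j] = tour[j], tour[i]
        cand = tour_length(dist, tour)
        if cand < best:
            best = cand
        else:
            tour[i], tour[j] = tour[j], tour[i]  # undo the non-improving swap
    return best, tour.copy()


def solve_independently(dist, seeds):
    """Run one independent search per seed (one per device) and keep the best."""
    results = [hill_climb(dist, s) for s in seeds]
    return min(results, key=lambda r: r[0])


rng = np.random.default_rng(42)
dist = rng.random((20, 20))
dist = (dist + dist.T) / 2  # make symmetric, as in the Quick Start example

single_obj, _ = hill_climb(dist, seed=0)
multi_obj, multi_tour = solve_independently(dist, seeds=[0, 1, 2, 3])
```

Because the single run (seed 0) is among the four, the best-of-four result can never be worse than it. That is the sense in which more devices help in this mode: they widen exploration rather than speed up a single search.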
Requirements
Hardware
- NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)
- Recommended: 8 GB+ GPU memory for large-scale problems
Software
- CUDA Toolkit 11.0+
- Python 3.8+ (for Python API)
- GCC 7.5+ or MSVC 2019+ (for C++ compilation)
Installation
Python Package
pip install cugenopt
Build from Source
git clone https://github.com/L-yang-yang/cugenopt.git
cd cugenopt/python
pip install -e .
CUDA C++ Only
cd prototype
make all
Documentation
| Document | Description |
|---|---|
| STATUS.md | Project status, roadmap, and design decisions |
| Python API Guide | Detailed Python API documentation |
| Benchmark Design | Experimental methodology |
| Paper | Full technical details and evaluation |
Citation
If you use cuGenOpt in your research, please cite:
@article{liu2026cugenopt,
title={cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization},
author={Liu, Yuyang},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2026}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Contact
Yuyang Liu
Independent Researcher, Shenzhen, China
Email: 15251858055@163.com
Acknowledgments
This work was conducted as independent research. Special thanks to the open-source community for providing excellent tools and libraries that made this project possible.