# cuGenOpt > **A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization** [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![CUDA](https://img.shields.io/badge/CUDA-11.0%2B-green.svg)](https://developer.nvidia.com/cuda-toolkit) [![Python](https://img.shields.io/badge/Python-3.8%2B-blue.svg)](https://www.python.org/) **Paper**: [cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization](https://arxiv.org/abs/2603.19163) --- ## Overview cuGenOpt is a high-performance, problem-agnostic GPU metaheuristic framework designed for combinatorial optimization. It provides: - **Generic Solution Encodings**: Permutation, Binary, Integer, and Partition representations - **Adaptive Operator Selection (AOS)**: Runtime weight adjustment via exponential moving average - **Three-Layer Adaptive Architecture**: Static priors (L1) + Runtime AOS (L3) for cold-start avoidance - **GPU Memory Hierarchy Optimization**: L2 cache-aware population sizing and adaptive shared memory management - **Multi-GPU Support**: Independent parallel solving with automatic device management - **Python API + CUDA C++**: High-level interface with JIT compilation for custom problems ### Key Features | Feature | Description | |---------|-------------| | **12+ Problem Types** | TSP, VRP, VRPTW, Knapsack, QAP, JSP, Assignment, Graph Coloring, Bin Packing, and more | | **Adaptive Search** | EMA-driven operator weight adjustment during runtime | | **Problem Profiling** | Automatic initial strategy selection based on problem characteristics | | **Memory-Aware** | Automatic population sizing based on GPU L2 cache capacity | | **Multi-Objective** | Weighted sum and lexicographic optimization modes | | **Cross-Platform** | Unified workflow on Linux and Windows | --- ## Quick Start ### Option 1: Python API (Recommended) ```bash pip install cugenopt pip install nvidia-cuda-nvcc-cu12 # If system CUDA Toolkit not available ``` **Solve Built-in Problems:** ```python import numpy as np import cugenopt # Solve TSP dist = np.random.rand(50, 50).astype(np.float32) dist = (dist + dist.T) / 2 # Make symmetric result = cugenopt.solve_tsp(dist, time_limit=10.0) print(f"Best tour length: {result['best_obj']}") print(f"Tour: {result['best_solution']}") ``` **Define Custom Problems with JIT:** ```python result = cugenopt.solve_custom( compute_obj=""" if (idx != 0) return 0.0f; float total = 0.0f; const int* route = sol.data[0]; int size = sol.dim2_sizes[0]; for (int i = 0; i < size; i++) total += d_dist[route[i] * _n + route[(i+1) % size]]; return total; """, data={"d_dist": dist}, encoding="permutation", dim2=50, n=50, time_limit=10.0 ) ``` ### Option 2: CUDA C++ Direct Usage ```bash cd prototype make tsp ./tsp ``` Define your own problem by inheriting `ProblemBase` and implementing `compute_obj` / `compute_penalty`. --- ## Architecture ``` ┌─────────────────────────────────────────────────────────┐ │ Python API Layer │ │ (Built-in Problems + JIT Compiler for Custom Problems) │ └─────────────────────────────────────────────────────────┘ │ ┌─────────────────────────────────────────────────────────┐ │ Core Framework (CUDA C++) │ │ • Adaptive Solver (L1 Priors + L3 Runtime AOS) │ │ • Operator Registry (Swap, Reverse, Insert, LNS, ...) │ │ • Population Management (Elite + Diversity) │ │ • Multi-GPU Coordinator │ └─────────────────────────────────────────────────────────┘ │ ┌─────────────────────────────────────────────────────────┐ │ GPU Execution Engine │ │ • L2 Cache-Aware Memory Management │ │ • Adaptive Shared Memory Allocation │ │ • CUDA Kernels (Population-level + Neighborhood-level) │ └─────────────────────────────────────────────────────────┘ ``` ## Performance Highlights ### Benchmark Results | Problem | Instance | cuGenOpt | Best Known | Gap | |---------|----------|----------|------------|-----| | TSP | kroA100 | 21,282 | 21,282 | 0.00% | | TSP | kroA200 | 29,368 | 29,368 | 0.00% | | QAP | nug12 | 578 | 578 | **0.00%** (Optimal) | | VRPTW | C101 | 828.94 | 828.94 | 0.00% | | VRPTW | R101 | 1,650.80 | 1,645.79 | 0.30% | ### GPU Scalability | GPU | Memory Bandwidth | TSP n=1000 Speedup | |-----|------------------|-------------------| | T4 | 300 GB/s | 1.0× (baseline) | | V100 | 900 GB/s | 1.6× | | A800 | 1,935 GB/s | 3.6× | *Memory-bound workload: performance scales linearly with bandwidth.* ### Multi-GPU Effectiveness | Problem | Single GPU | 2× GPU | 4× GPU | Improvement | |---------|-----------|--------|--------|-------------| | TSP n=1000 | 7,542,668 | 7,277,989 | 7,236,344 | **3.51%** | | QAP n=100 | 1,520,516 | 1,502,084 | 1,498,404 | **1.45%** | *With CUDA Graph enabled. Larger problems benefit more from parallel exploration.* --- ## Requirements ### Hardware - NVIDIA GPU with Compute Capability 7.0+ (Volta or newer) - Recommended: 8GB+ GPU memory for large-scale problems ### Software - CUDA Toolkit 11.0+ - Python 3.8+ (for Python API) - GCC 7.5+ or MSVC 2019+ (for C++ compilation) --- ## Installation ### Python Package coming soon~ ```bash pip install cugenopt ``` ### Build from Source ```bash git clone https://github.com/L-yang-yang/cugenopt.git cd cugenopt/python pip install -e . ``` ### CUDA C++ Only ```bash cd prototype make all ``` ## Citation If you use cuGenOpt in your research, please cite: ```bibtex @article{liu2026cugenopt, title={cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization}, author={Liu, Yuyang}, journal={arXiv preprint arXiv:2603.19163}, year={2026} } ``` --- ## License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. --- ## Contributing Contributions are welcome! Please feel free to submit a Pull Request. --- ## Contact **Yuyang Liu** Independent Researcher, Shenzhen, China Email: 15251858055@163.com --- ## Acknowledgments This work was conducted as independent research. Special thanks to the open-source community for providing excellent tools and libraries that made this project possible.