mirror of https://github.com/L-yang-yang/cugenopt.git synced 2026-04-25 12:16:21 +02:00

L-yang-yang a848730459 fix: harden CUDA safety checks and translate comments to English

Safety fixes (4 critical, 4 warning) from code review:

- qap.cuh: fix clone_to_device cross-device D2H by retaining host matrices
- types.cuh: add CUDA_CHECK to InjectBuffer, track owner_gpu for safe destroy
- types.cuh: add bounds check on lexicographic priority index
- solver.cuh: cap migrate_kernel islands to MAX_ISLANDS=64 to prevent stack overflow
- multi_gpu_solver.cuh: guard against 0 GPUs, propagate stop_reason from best GPU
- types.cuh: warn on SeqRegistry overflow
- solver.cuh: warn when constraint_directed/phased_search disabled without AOS

Translate all Chinese comments to English across 25+ source files
(core/*.cuh, problems/*.cuh, Makefile, multi-GPU tests).

Verified on V100S×2 (sm_70, CUDA 12.8): e5 (12 problem types, all optimal),
e13 (multi-objective + multi-GPU, 9 configs, all passed).

2026-03-25 11:52:50 +08:00

7.1 KiB

Raw Blame History

cuGenOpt

A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization

Paper: cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization

Overview

cuGenOpt is a high-performance, problem-agnostic GPU metaheuristic framework designed for combinatorial optimization. It provides:

Generic Solution Encodings: Permutation, Binary, Integer, and Partition representations
Adaptive Operator Selection (AOS): Runtime weight adjustment via exponential moving average
Three-Layer Adaptive Architecture: Static priors (L1) + Runtime AOS (L3) for cold-start avoidance
GPU Memory Hierarchy Optimization: L2 cache-aware population sizing and adaptive shared memory management
Multi-GPU Support: Independent parallel solving with automatic device management
Python API + CUDA C++: High-level interface with JIT compilation for custom problems

Key Features

Feature	Description
12+ Problem Types	TSP, VRP, VRPTW, Knapsack, QAP, JSP, Assignment, Graph Coloring, Bin Packing, and more
Adaptive Search	EMA-driven operator weight adjustment during runtime
Problem Profiling	Automatic initial strategy selection based on problem characteristics
Memory-Aware	Automatic population sizing based on GPU L2 cache capacity
Multi-Objective	Weighted sum and lexicographic optimization modes
Cross-Platform	Unified workflow on Linux and Windows

Quick Start

Option 1: Python API (Recommended)

pip install cugenopt
pip install nvidia-cuda-nvcc-cu12  # If system CUDA Toolkit not available

Solve Built-in Problems:

import numpy as np
import cugenopt

# Solve TSP
dist = np.random.rand(50, 50).astype(np.float32)
dist = (dist + dist.T) / 2  # Make symmetric
result = cugenopt.solve_tsp(dist, time_limit=10.0)
print(f"Best tour length: {result['best_obj']}")
print(f"Tour: {result['best_solution']}")

Define Custom Problems with JIT:

result = cugenopt.solve_custom(
    compute_obj="""
        if (idx != 0) return 0.0f;
        float total = 0.0f;
        const int* route = sol.data[0];
        int size = sol.dim2_sizes[0];
        for (int i = 0; i < size; i++)
            total += d_dist[route[i] * _n + route[(i+1) % size]];
        return total;
    """,
    data={"d_dist": dist},
    encoding="permutation",
    dim2=50,
    n=50,
    time_limit=10.0
)

Option 2: CUDA C++ Direct Usage

cd prototype
make tsp
./tsp

Define your own problem by inheriting ProblemBase and implementing compute_obj / compute_penalty.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Python API Layer                     │
│  (Built-in Problems + JIT Compiler for Custom Problems) │
└─────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────┐
│                 Core Framework (CUDA C++)               │
│  • Adaptive Solver (L1 Priors + L3 Runtime AOS)        │
│  • Operator Registry (Swap, Reverse, Insert, LNS, ...)  │
│  • Population Management (Elite + Diversity)            │
│  • Multi-GPU Coordinator                                │
└─────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────┐
│              GPU Execution Engine                       │
│  • L2 Cache-Aware Memory Management                     │
│  • Adaptive Shared Memory Allocation                    │
│  • CUDA Kernels (Population-level + Neighborhood-level) │
└─────────────────────────────────────────────────────────┘

Performance Highlights

Benchmark Results

Problem	Instance	cuGenOpt	Best Known	Gap
TSP	kroA100	21,282	21,282	0.00%
TSP	kroA200	29,368	29,368	0.00%
QAP	nug12	578	578	0.00% (Optimal)
VRPTW	C101	828.94	828.94	0.00%
VRPTW	R101	1,650.80	1,645.79	0.30%

GPU Scalability

GPU	Memory Bandwidth	TSP n=1000 Speedup
T4	300 GB/s	1.0× (baseline)
V100	900 GB/s	1.6×
A800	1,935 GB/s	3.6×

Memory-bound workload: performance scales linearly with bandwidth.

Multi-GPU Effectiveness

Problem	Single GPU	2× GPU	4× GPU	Improvement
TSP n=1000	7,542,668	7,277,989	7,236,344	3.51%
QAP n=100	1,520,516	1,502,084	1,498,404	1.45%

With CUDA Graph enabled. Larger problems benefit more from parallel exploration.

Requirements

Hardware

NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)
Recommended: 8GB+ GPU memory for large-scale problems

Software

CUDA Toolkit 11.0+
Python 3.8+ (for Python API)
GCC 7.5+ or MSVC 2019+ (for C++ compilation)

Installation

Python Package

come soon

pip install cugenopt

Build from Source

git clone https://github.com/L-yang-yang/cugenopt.git
cd cugenopt/python
pip install -e .

CUDA C++ Only

cd prototype
make all

Citation

If you use cuGenOpt in your research, please cite:

@article{liu2026cugenopt,
  title={cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization},
  author={Liu, Yuyang},
  journal={arXiv preprint arXiv:2603.19163},
  year={2026}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Contact

Yuyang Liu
Independent Researcher, Shenzhen, China
Email: 15251858055@163.com

Acknowledgments

This work was conducted as independent research. Special thanks to the open-source community for providing excellent tools and libraries that made this project possible.

7.1 KiB Raw Blame History Unescape Escape