mirror of https://github.com/L-yang-yang/cugenopt.git synced 2026-04-24 12:06:22 +02:00

L-yang-yang fc5a0ff4af Initial commit: cuGenOpt GPU optimization solver

2026-03-20 00:33:45 +08:00

8.4 KiB

Raw Blame History

cuGenOpt

A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization

Paper: cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization (Coming soon)

Overview

cuGenOpt is a high-performance, problem-agnostic GPU metaheuristic framework designed for combinatorial optimization. It provides:

Generic Solution Encodings: Permutation, Binary, Integer, and Partition representations
Adaptive Operator Selection (AOS): Runtime weight adjustment via exponential moving average
Three-Layer Adaptive Architecture: Static priors (L1) + Runtime AOS (L3) for cold-start avoidance
GPU Memory Hierarchy Optimization: L2 cache-aware population sizing and adaptive shared memory management
Multi-GPU Support: Independent parallel solving with automatic device management
Python API + CUDA C++: High-level interface with JIT compilation for custom problems

Key Features

Feature	Description
12+ Problem Types	TSP, VRP, VRPTW, Knapsack, QAP, JSP, Assignment, Graph Coloring, Bin Packing, and more
Adaptive Search	EMA-driven operator weight adjustment during runtime
Problem Profiling	Automatic initial strategy selection based on problem characteristics
Memory-Aware	Automatic population sizing based on GPU L2 cache capacity
Multi-Objective	Weighted sum and lexicographic optimization modes
Cross-Platform	Unified workflow on Linux and Windows

Quick Start

Option 1: Python API (Recommended)

pip install cugenopt
pip install nvidia-cuda-nvcc-cu12  # If system CUDA Toolkit not available

Solve Built-in Problems:

import numpy as np
import cugenopt

# Solve TSP
dist = np.random.rand(50, 50).astype(np.float32)
dist = (dist + dist.T) / 2  # Make symmetric
result = cugenopt.solve_tsp(dist, time_limit=10.0)
print(f"Best tour length: {result['best_obj']}")
print(f"Tour: {result['best_solution']}")

Define Custom Problems with JIT:

result = cugenopt.solve_custom(
    compute_obj="""
        if (idx != 0) return 0.0f;
        float total = 0.0f;
        const int* route = sol.data[0];
        int size = sol.dim2_sizes[0];
        for (int i = 0; i < size; i++)
            total += d_dist[route[i] * _n + route[(i+1) % size]];
        return total;
    """,
    data={"d_dist": dist},
    encoding="permutation",
    dim2=50,
    n=50,
    time_limit=10.0
)

Option 2: CUDA C++ Direct Usage

cd prototype
make tsp
./tsp

Define your own problem by inheriting ProblemBase and implementing compute_obj / compute_penalty.

Architecture

┌─────────────────────────────────────────────────────────┐
│                    Python API Layer                     │
│  (Built-in Problems + JIT Compiler for Custom Problems) │
└─────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────┐
│                 Core Framework (CUDA C++)               │
│  • Adaptive Solver (L1 Priors + L3 Runtime AOS)        │
│  • Operator Registry (Swap, Reverse, Insert, LNS, ...)  │
│  • Population Management (Elite + Diversity)            │
│  • Multi-GPU Coordinator                                │
└─────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────┐
│              GPU Execution Engine                       │
│  • L2 Cache-Aware Memory Management                     │
│  • Adaptive Shared Memory Allocation                    │
│  • CUDA Kernels (Population-level + Neighborhood-level) │
└─────────────────────────────────────────────────────────┘

Project Structure

generic_solver/
├── prototype/              # Core framework (header-only .cuh files)
│   ├── core/              #   Solver, operators, population, types
│   └── problems/          #   12+ problem implementations
├── python/                 # Python wrapper (pip install cugenopt)
│   ├── cugenopt/          #   Python package (built-ins + JIT compiler)
│   └── tests/             #   Test suite
├── benchmark/              # Experiments and benchmarks
│   ├── experiments/       #   E0-E13: 14 experiment groups
│   ├── data/              #   Standard instances (TSPLIB, Solomon, QAPLIB)
│   └── results/           #   Experimental reports
├── paper_v3_en/            # Paper source (LaTeX)
├── STATUS.md               # Project status and roadmap
└── README.md               # This file

Performance Highlights

Benchmark Results

Problem	Instance	cuGenOpt	Best Known	Gap
TSP	kroA100	21,282	21,282	0.00%
TSP	kroA200	29,368	29,368	0.00%
QAP	nug12	578	578	0.00% (Optimal)
VRPTW	C101	828.94	828.94	0.00%
VRPTW	R101	1,650.80	1,645.79	0.30%

GPU Scalability

GPU	Memory Bandwidth	TSP n=1000 Speedup
T4	300 GB/s	1.0× (baseline)
V100	900 GB/s	1.6×
A800	1,935 GB/s	3.6×

Memory-bound workload: performance scales linearly with bandwidth.

Multi-GPU Effectiveness

Problem	Single GPU	2× GPU	4× GPU	Improvement
TSP n=1000	7,542,668	7,277,989	7,236,344	3.51%
QAP n=100	1,520,516	1,502,084	1,498,404	1.45%

With CUDA Graph enabled. Larger problems benefit more from parallel exploration.

Requirements

Hardware

NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)
Recommended: 8GB+ GPU memory for large-scale problems

Software

CUDA Toolkit 11.0+
Python 3.8+ (for Python API)
GCC 7.5+ or MSVC 2019+ (for C++ compilation)

Installation

Python Package

pip install cugenopt

Build from Source

git clone https://github.com/L-yang-yang/cugenopt.git
cd cugenopt/python
pip install -e .

CUDA C++ Only

cd prototype
make all

Documentation

Document	Description
STATUS.md	Project status, roadmap, and design decisions
Python API Guide	Detailed Python API documentation
Benchmark Design	Experimental methodology
Paper	Full technical details and evaluation

Citation

If you use cuGenOpt in your research, please cite:

@article{liu2026cugenopt,
  title={cuGenOpt: A GPU-Accelerated General-Purpose Metaheuristic Framework for Combinatorial Optimization},
  author={Liu, Yuyang},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Contact

Yuyang Liu
Independent Researcher, Shenzhen, China
Email: 15251858055@163.com

Acknowledgments

This work was conducted as independent research. Special thanks to the open-source community for providing excellent tools and libraries that made this project possible.

8.4 KiB Raw Blame History Unescape Escape