mirror of
https://github.com/L-yang-yang/cugenopt.git
synced 2026-04-24 12:06:22 +02:00
Initial commit: cuGenOpt GPU optimization solver
commit fc5a0ff4af
117 changed files with 25545 additions and 0 deletions
skills/cugenopt-problem-gen/reference/encoding-guide.md (new file, 155 lines)
# Encoding Selection & Dimension Guide

## Encoding Types

cuGenOpt supports three encoding types. Choose based on the nature of the decision variables.

### Permutation

**Use when**: each element appears exactly once (ordering/assignment).

| Scenario | RowMode | D1 | D2 | dim2_default | total_elements |
|----------|---------|----|----|-------------|----------------|
| TSP (n cities) | Single | 1 | next_pow2(n) | n | — |
| QAP (n facilities) | Single | 1 | next_pow2(n) | n | — |
| Assignment (n tasks) | Single | 1 | next_pow2(n) | n | — |
| JSP (m machines, j jobs) | Fixed | next_pow2(m) | next_pow2(j) | j | — |
| VRP (k vehicles, n customers) | Partition | next_pow2(k) | max(next_pow2(n/k*2), 64) | 0 | n |
| VRPTW (k vehicles, n customers) | Partition | next_pow2(k) | max(next_pow2(n/k*2), 64) | 0 | n |

**Partition specifics**:
- `dim2_default = 0` tells the framework to distribute elements across rows
- `total_elements = n` is the count of elements to distribute
- `cross_row_prob` controls how often cross-row operators fire (typically 0.2–0.4)
- Elements are customer/job indices `0..n-1`; the depot/source is implicit (not in the solution)

### Binary

**Use when**: each position is a yes/no decision.

| Scenario | RowMode | D1 | D2 | dim2_default |
|----------|---------|----|----|-------------|
| 0-1 Knapsack (n items) | Single | 1 | next_pow2(n) | n |
| Scheduling (n shifts) | Single | 1 | next_pow2(n) | n |
| Subset selection (n candidates) | Single | 1 | next_pow2(n) | n |
| Multi-row scheduling (m workers, n shifts) | Fixed | next_pow2(m) | next_pow2(n) | n |

**Solution values**: `sol.data[row][col]` is 0 or 1.

### Integer

**Use when**: each position takes a bounded integer value.

| Scenario | RowMode | D1 | D2 | dim2_default | lower_bound | upper_bound |
|----------|---------|----|----|-------------|-------------|-------------|
| Graph coloring (n nodes, c colors) | Single | 1 | next_pow2(n) | n | 0 | c-1 |
| Load balancing (n tasks, m machines) | Single | 1 | next_pow2(n) | n | 0 | m-1 |
| Multi-machine scheduling | Fixed | next_pow2(m) | next_pow2(j) | j | 0 | max_time |

**Solution values**: `sol.data[row][col]` is in `[value_lower_bound, value_upper_bound]`.

Set bounds in config:
```cuda
cfg.value_lower_bound = 0;
cfg.value_upper_bound = num_colors - 1;
```

## Dimension Calculation Rules

### D1 and D2 (Template Parameters)

These are **compile-time constants** and define the maximum capacity:
- Must be sufficient for the largest instance you plan to solve
- A power of 2 is recommended for memory alignment
- Larger values waste registers/memory; keep them as small as possible

```
next_pow2(x):
  1→1, 2→2, 3→4, 5→8, 9→16, 17→32, 33→64, 65→128, ...
```

### dim1 and dim2_default (Runtime Parameters)

Set in `config()` to the actual problem size:
- `dim1 ≤ D1`: actual number of rows used
- `dim2_default ≤ D2`: actual number of columns per row
- For Partition mode: `dim2_default = 0` (framework handles distribution)

### Choosing D2 for Partition Mode

Since rows have variable length, D2 must accommodate the longest possible row:
```
D2 = max(next_pow2(total_elements / D1 * 2), 64)
```
The `*2` factor provides headroom for unbalanced distributions.

## Shared Memory Sizing

### When to Use Shared Memory

Shared memory provides ~10x faster access than global memory. Use it when:
- The problem has a data matrix (distance, cost, weight)
- The matrix is accessed repeatedly during objective/penalty evaluation

### How to Size

Report the **actual** data size. The framework handles the rest:

```cuda
size_t shared_mem_bytes() const {
    // Distance matrix + demand array
    return (size_t)stride * stride * sizeof(float) + (size_t)n * sizeof(float);
}
```

The framework then chooses automatically:
1. If ≤ 48KB: uses default shared memory
2. If 48KB–max_smem: calls `cudaFuncSetAttribute` to extend (GPU-dependent max: T4=64KB, V100=96KB, A100/A800=164KB, H100=228KB)
3. If > max_smem: falls back to global memory and uses `working_set_bytes()` to size the population for the L2 cache

### working_set_bytes

Always return the actual data size, regardless of whether it fits in shared memory:

```cuda
size_t working_set_bytes() const {
    return (size_t)n * n * sizeof(float);
}
```

This is used by the framework to auto-calculate population size based on L2 cache capacity.

## RowMode Details

### Single (default)
- `dim1 = 1`, single row of elements
- No cross-row operators
- Simplest and most common

### Fixed
- `dim1 > 1`, all rows have the same length (`dim2_default`)
- Cross-row operators: ROW_SWAP, ROW_REVERSE
- No SPLIT/MERGE (rows cannot change length)
- Use for: JSP (machines × jobs), multi-worker scheduling

### Partition
- `dim1 > 1`, rows have variable length
- Elements are distributed across rows (total count = `total_elements`)
- Cross-row operators: CROSS_RELOCATE, CROSS_SWAP, SEG_RELOCATE, SEG_SWAP, CROSS_EXCHANGE, SPLIT, MERGE
- `cross_row_prob` controls the probability of selecting cross-row operators
- Use for: VRP (vehicles × customers), any partitioning problem

## Quick Reference: Problem → Config

| Problem | Encoding | RowMode | D1 | D2 | cross_row_prob |
|---------|----------|---------|----|----|---------------|
| TSP-50 | Perm | Single | 1 | 64 | 0 |
| TSP-500 | Perm | Single | 1 | 512 | 0 |
| QAP-15 | Perm | Single | 1 | 16 | 0 |
| Assignment-12 | Perm | Single | 1 | 16 | 0 |
| VRP-30-4v | Perm | Partition | 4 | 64 | 0.3 |
| VRPTW-100-25v | Perm | Partition | 32 | 64 | 0.3 |
| Knapsack-100 | Binary | Single | 1 | 128 | 0 |
| Scheduling-20 | Binary | Single | 1 | 32 | 0 |
| Graph Color-50 | Integer | Single | 1 | 64 | 0 |
| JSP-6m-6j | Perm | Fixed | 8 | 8 | 0.2 |
skills/cugenopt-problem-gen/reference/examples.md (new file, 621 lines)

# End-to-End Examples

Four complete examples from natural language description to generated code.

---

## Example 1: 0-1 Knapsack (Low Complexity)

### User Input
> "I have 8 items with weights [2,3,4,5,9,7,8,6] and values [3,4,5,8,10,7,9,6]. Knapsack capacity is 20. Maximize total value."

### Analysis
- **Decision**: select or not → **Binary**
- **RowMode**: Single (D1=1)
- **D2**: next_pow2(8) = 8
- **Objective**: Maximize total value
- **Constraint**: total weight ≤ 20
- **Complexity**: Low (standard knapsack, direct reference)

### Generated: problem.cuh

```cuda
#pragma once
#include "core/types.cuh"
#include "core/cuda_utils.cuh"
#include "core/operators.cuh"

struct Knapsack8 : ProblemBase<Knapsack8, 1, 8> {
    const float* d_weights;
    const float* d_values;
    float capacity;
    int n;

    __device__ float calc_total_value(const Sol& sol) const {
        float tv = 0.0f;
        const int* sel = sol.data[0];
        for (int i = 0; i < n; i++)
            if (sel[i]) tv += d_values[i];
        return tv;
    }

    static constexpr ObjDef OBJ_DEFS[] = {
        {ObjDir::Maximize, 1.0f, 0.0f},
    };

    __device__ float compute_obj(int idx, const Sol& sol) const {
        switch (idx) {
            case 0: return calc_total_value(sol);
            default: return 0.0f;
        }
    }

    __device__ float compute_penalty(const Sol& sol) const {
        float tw = 0.0f;
        const int* sel = sol.data[0];
        for (int i = 0; i < n; i++)
            if (sel[i]) tw += d_weights[i];
        float over = tw - capacity;
        return (over > 0.0f) ? over * 50.0f : 0.0f;
    }

    ProblemConfig config() const {
        ProblemConfig cfg;
        cfg.encoding = EncodingType::Binary;
        cfg.dim1 = 1;
        cfg.dim2_default = n;
        fill_obj_config(cfg);
        return cfg;
    }

    size_t shared_mem_bytes() const {
        return 2 * (size_t)n * sizeof(float);
    }

    __device__ void load_shared(char* smem, int tid, int bsz) {
        float* sw = reinterpret_cast<float*>(smem);
        float* sv = sw + n;
        for (int i = tid; i < n; i += bsz) {
            sw[i] = d_weights[i];
            sv[i] = d_values[i];
        }
        d_weights = sw;
        d_values = sv;
    }

    static Knapsack8 create(const float* hw, const float* hv, int n, float cap) {
        Knapsack8 prob;
        prob.n = n;
        prob.capacity = cap;
        float *dw, *dv;
        CUDA_CHECK(cudaMalloc(&dw, sizeof(float) * n));
        CUDA_CHECK(cudaMalloc(&dv, sizeof(float) * n));
        CUDA_CHECK(cudaMemcpy(dw, hw, sizeof(float) * n, cudaMemcpyHostToDevice));
        CUDA_CHECK(cudaMemcpy(dv, hv, sizeof(float) * n, cudaMemcpyHostToDevice));
        prob.d_weights = dw;
        prob.d_values = dv;
        return prob;
    }

    void destroy() {
        if (d_weights) cudaFree(const_cast<float*>(d_weights));
        if (d_values) cudaFree(const_cast<float*>(d_values));
        d_weights = nullptr;
        d_values = nullptr;
    }
};
```

### Generated: main.cu

```cuda
#include "core/solver.cuh"
#include "problem.cuh"
#include <cstdio>

int main() {
    const int n = 8;
    float weights[] = {2, 3, 4, 5, 9, 7, 8, 6};
    float values[] = {3, 4, 5, 8, 10, 7, 9, 6};
    float capacity = 20.0f;

    auto prob = Knapsack8::create(weights, values, n, capacity);

    SolverConfig scfg;
    scfg.time_limit_sec = 5.0f;
    scfg.use_aos = true;
    scfg.verbose = true;

    auto result = solve(prob, scfg);

    printf("Best value: %.2f\n", result.best_solution.objectives[0]);
    printf("Penalty: %.2f\n", result.best_solution.penalty);
    printf("Selected items: ");
    for (int i = 0; i < n; i++)
        if (result.best_solution.data[0][i]) printf("%d ", i);
    printf("\n");

    prob.destroy();
    return 0;
}
```

---

## Example 2: Assignment Problem (Low Complexity)

### User Input
> "Assign 10 workers to 10 tasks. Cost matrix is in a file `cost_10x10.txt`. Minimize total cost."

### Analysis
- **Decision**: assign each worker to a unique task → **Permutation**
- **RowMode**: Single (D1=1)
- **D2**: next_pow2(10) = 16
- **Objective**: Minimize total cost
- **Constraint**: none (permutation encoding guarantees one-to-one)
- **Data**: read from file
- **Complexity**: Low (standard assignment)

### Generated: problem.cuh

```cuda
#pragma once
#include "core/types.cuh"
#include "core/cuda_utils.cuh"
#include "core/operators.cuh"

struct Assignment10 : ProblemBase<Assignment10, 1, 16> {
    const float* d_cost;
    int n;

    __device__ float calc_total_cost(const Sol& sol) const {
        float total = 0.0f;
        const int* assign = sol.data[0];
        for (int i = 0; i < n; i++)
            total += d_cost[i * n + assign[i]];
        return total;
    }

    static constexpr ObjDef OBJ_DEFS[] = {
        {ObjDir::Minimize, 1.0f, 0.0f},
    };

    __device__ float compute_obj(int idx, const Sol& sol) const {
        switch (idx) {
            case 0: return calc_total_cost(sol);
            default: return 0.0f;
        }
    }

    __device__ float compute_penalty(const Sol& sol) const {
        return 0.0f;
    }

    ProblemConfig config() const {
        ProblemConfig cfg;
        cfg.encoding = EncodingType::Permutation;
        cfg.dim1 = 1;
        cfg.dim2_default = n;
        fill_obj_config(cfg);
        return cfg;
    }

    size_t shared_mem_bytes() const {
        return (size_t)n * n * sizeof(float);
    }

    size_t working_set_bytes() const {
        return (size_t)n * n * sizeof(float);
    }

    __device__ void load_shared(char* smem, int tid, int bsz) {
        float* sc = reinterpret_cast<float*>(smem);
        int total = n * n;
        for (int i = tid; i < total; i += bsz) sc[i] = d_cost[i];
        d_cost = sc;
    }

    static Assignment10 create(const float* hc, int n) {
        Assignment10 prob;
        prob.n = n;
        float* dc;
        CUDA_CHECK(cudaMalloc(&dc, sizeof(float) * n * n));
        CUDA_CHECK(cudaMemcpy(dc, hc, sizeof(float) * n * n, cudaMemcpyHostToDevice));
        prob.d_cost = dc;
        return prob;
    }

    void destroy() {
        if (d_cost) { cudaFree(const_cast<float*>(d_cost)); d_cost = nullptr; }
    }
};
```

### Generated: main.cu

```cuda
#include "core/solver.cuh"
#include "problem.cuh"
#include <cstdio>
#include <cstdlib>

int main() {
    const int n = 10;
    float cost[n * n];

    FILE* f = fopen("cost_10x10.txt", "r");
    if (!f) { fprintf(stderr, "Cannot open cost_10x10.txt\n"); return 1; }
    for (int i = 0; i < n * n; i++) fscanf(f, "%f", &cost[i]);
    fclose(f);

    auto prob = Assignment10::create(cost, n);

    SolverConfig scfg;
    scfg.time_limit_sec = 10.0f;
    scfg.use_aos = true;
    scfg.verbose = true;

    auto result = solve(prob, scfg);

    printf("Best cost: %.2f\n", result.best_solution.objectives[0]);
    printf("Assignment: ");
    for (int i = 0; i < n; i++)
        printf("worker %d → task %d ", i, result.best_solution.data[0][i]);
    printf("\n");

    prob.destroy();
    return 0;
}
```

---

## Example 3: Vehicle Routing with Capacity (Medium Complexity)

### User Input
> "I have 1 depot and 30 customers. 4 trucks, each with capacity 100. Customer coordinates and demands are in `customers.csv` (columns: id, x, y, demand). Minimize total travel distance."

### Analysis
- **Decision**: assign customers to trucks and determine visit order → **Permutation**
- **RowMode**: Partition (variable-length routes)
- **D1**: next_pow2(4) = 4
- **D2**: max(next_pow2(30/4*2), 64) = 64
- **Objective**: Minimize total distance (depot → customers → depot for each truck)
- **Constraint**: each truck's total demand ≤ 100
- **Data**: CSV with coordinates → compute distance matrix
- **Complexity**: Medium (custom constraint, Partition encoding)

### Logic Summary (for user confirmation)
> "Objective: minimize total travel distance across all trucks. Each truck starts and ends at depot (id=0). Constraint: total demand per truck ≤ 100, penalty = 100 × excess. Encoding: Permutation with Partition, 4 trucks, 30 customers."

### Generated: problem.cuh

```cuda
#pragma once
#include "core/types.cuh"
#include "core/cuda_utils.cuh"
#include "core/operators.cuh"
#include <cmath>

struct VRP30 : ProblemBase<VRP30, 4, 64> {
    const float* d_dist;    // (n+1)×(n+1) distance matrix including depot
    const float* d_demand;  // n customer demands
    int n;                  // number of customers (excluding depot)
    int stride;             // n+1
    float capacity;
    int num_vehicles;

    __device__ float compute_route_dist(const int* route, int size) const {
        if (size == 0) return 0.0f;
        float dist = 0.0f;
        int prev = 0;  // depot
        for (int j = 0; j < size; j++) {
            int node = route[j] + 1;  // customer indices are 0-based, node indices 1-based
            dist += d_dist[prev * stride + node];
            prev = node;
        }
        dist += d_dist[prev * stride + 0];  // return to depot
        return dist;
    }

    __device__ float calc_total_distance(const Sol& sol) const {
        float total = 0.0f;
        for (int r = 0; r < num_vehicles; r++)
            total += compute_route_dist(sol.data[r], sol.dim2_sizes[r]);
        return total;
    }

    static constexpr ObjDef OBJ_DEFS[] = {
        {ObjDir::Minimize, 1.0f, 0.0f},
    };

    __device__ float compute_obj(int idx, const Sol& sol) const {
        switch (idx) {
            case 0: return calc_total_distance(sol);
            default: return 0.0f;
        }
    }

    __device__ float compute_penalty(const Sol& sol) const {
        float penalty = 0.0f;
        for (int r = 0; r < num_vehicles; r++) {
            float load = 0.0f;
            for (int j = 0; j < sol.dim2_sizes[r]; j++)
                load += d_demand[sol.data[r][j]];
            if (load > capacity)
                penalty += (load - capacity) * 100.0f;
        }
        return penalty;
    }

    ProblemConfig config() const {
        ProblemConfig cfg;
        cfg.encoding = EncodingType::Permutation;
        cfg.dim1 = num_vehicles;
        cfg.dim2_default = 0;
        fill_obj_config(cfg);
        cfg.row_mode = RowMode::Partition;
        cfg.cross_row_prob = 0.3f;
        cfg.total_elements = n;
        return cfg;
    }

    size_t shared_mem_bytes() const {
        return (size_t)stride * stride * sizeof(float) + (size_t)n * sizeof(float);
    }

    size_t working_set_bytes() const {
        return (size_t)stride * stride * sizeof(float) + (size_t)n * sizeof(float);
    }

    __device__ void load_shared(char* smem, int tid, int bsz) {
        float* sd = reinterpret_cast<float*>(smem);
        int dist_size = stride * stride;
        for (int i = tid; i < dist_size; i += bsz) sd[i] = d_dist[i];
        d_dist = sd;
        float* sdem = sd + dist_size;
        for (int i = tid; i < n; i += bsz) sdem[i] = d_demand[i];
        d_demand = sdem;
    }

    static VRP30 create(const float* h_dist, const float* h_demand,
                        int n, float capacity, int num_vehicles) {
        VRP30 prob;
        prob.n = n;
        prob.stride = n + 1;
        prob.capacity = capacity;
        prob.num_vehicles = num_vehicles;

        int nodes = n + 1;
        float* dd;
        CUDA_CHECK(cudaMalloc(&dd, sizeof(float) * nodes * nodes));
        CUDA_CHECK(cudaMemcpy(dd, h_dist, sizeof(float) * nodes * nodes, cudaMemcpyHostToDevice));
        prob.d_dist = dd;

        float* ddem;
        CUDA_CHECK(cudaMalloc(&ddem, sizeof(float) * n));
        CUDA_CHECK(cudaMemcpy(ddem, h_demand, sizeof(float) * n, cudaMemcpyHostToDevice));
        prob.d_demand = ddem;

        return prob;
    }

    void destroy() {
        if (d_dist) { cudaFree(const_cast<float*>(d_dist)); d_dist = nullptr; }
        if (d_demand) { cudaFree(const_cast<float*>(d_demand)); d_demand = nullptr; }
    }
};
```

### Generated: main.cu

```cuda
#include "core/solver.cuh"
#include "problem.cuh"
#include <cstdio>
#include <cmath>

int main() {
    const int n = 30;
    const int num_vehicles = 4;
    const float capacity = 100.0f;

    float x[n + 1], y[n + 1], demand[n];

    FILE* f = fopen("customers.csv", "r");
    if (!f) { fprintf(stderr, "Cannot open customers.csv\n"); return 1; }

    char header[256];
    fgets(header, sizeof(header), f);  // skip header

    // Read depot (id=0)
    int id;
    fscanf(f, "%d,%f,%f,%*f", &id, &x[0], &y[0]);  // %*f skips the depot's demand field

    // Read customers
    for (int i = 0; i < n; i++) {
        fscanf(f, "%d,%f,%f,%f", &id, &x[i + 1], &y[i + 1], &demand[i]);
    }
    fclose(f);

    // Compute distance matrix
    const int nodes = n + 1;
    float dist[nodes * nodes];
    for (int i = 0; i < nodes; i++)
        for (int j = 0; j < nodes; j++) {
            float dx = x[i] - x[j], dy = y[i] - y[j];
            dist[i * nodes + j] = sqrtf(dx * dx + dy * dy);
        }

    auto prob = VRP30::create(dist, demand, n, capacity, num_vehicles);

    SolverConfig scfg;
    scfg.time_limit_sec = 30.0f;
    scfg.use_aos = true;
    scfg.verbose = true;

    auto result = solve(prob, scfg);

    printf("Best distance: %.2f\n", result.best_solution.objectives[0]);
    printf("Penalty: %.2f\n", result.best_solution.penalty);
    for (int r = 0; r < num_vehicles; r++) {
        printf("Truck %d: depot", r);
        for (int j = 0; j < result.best_solution.dim2_sizes[r]; j++)
            printf(" → %d", result.best_solution.data[r][j] + 1);
        printf(" → depot\n");
    }

    prob.destroy();
    return 0;
}
```

---

## Example 4: Graph Coloring (Low Complexity)

### User Input
> "Color a graph with 20 nodes using at most 4 colors. Edges: (0,1),(0,2),(1,3),(2,3),(3,4),... Minimize the number of colors used, with no two adjacent nodes sharing a color."

### Analysis
- **Decision**: assign a color (0–3) to each node → **Integer**
- **RowMode**: Single (D1=1)
- **D2**: next_pow2(20) = 32
- **Objective**: Minimize number of distinct colors used
- **Constraint**: adjacent nodes must have different colors
- **Complexity**: Low (standard graph coloring)

### Generated: problem.cuh

```cuda
#pragma once
#include "core/types.cuh"
#include "core/cuda_utils.cuh"
#include "core/operators.cuh"

struct GraphColor20 : ProblemBase<GraphColor20, 1, 32> {
    const int* d_adj;  // adjacency matrix n×n (1=edge, 0=no edge)
    int n;
    int max_colors;

    __device__ float calc_num_colors(const Sol& sol) const {
        int used[4] = {0, 0, 0, 0};  // sized for max_colors ≤ 4
        const int* colors = sol.data[0];
        for (int i = 0; i < n; i++) {
            int c = colors[i];
            if (c >= 0 && c < max_colors) used[c] = 1;
        }
        float count = 0.0f;
        for (int c = 0; c < max_colors; c++) count += used[c];
        return count;
    }

    static constexpr ObjDef OBJ_DEFS[] = {
        {ObjDir::Minimize, 1.0f, 0.0f},
    };

    __device__ float compute_obj(int idx, const Sol& sol) const {
        switch (idx) {
            case 0: return calc_num_colors(sol);
            default: return 0.0f;
        }
    }

    __device__ float compute_penalty(const Sol& sol) const {
        float conflicts = 0.0f;
        const int* colors = sol.data[0];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (d_adj[i * n + j] && colors[i] == colors[j])
                    conflicts += 1.0f;
        return conflicts * 10.0f;
    }

    ProblemConfig config() const {
        ProblemConfig cfg;
        cfg.encoding = EncodingType::Integer;
        cfg.dim1 = 1;
        cfg.dim2_default = n;
        cfg.value_lower_bound = 0;
        cfg.value_upper_bound = max_colors - 1;
        fill_obj_config(cfg);
        return cfg;
    }

    size_t shared_mem_bytes() const {
        return (size_t)n * n * sizeof(int);
    }

    size_t working_set_bytes() const {
        return (size_t)n * n * sizeof(int);
    }

    __device__ void load_shared(char* smem, int tid, int bsz) {
        int* sa = reinterpret_cast<int*>(smem);
        int total = n * n;
        for (int i = tid; i < total; i += bsz) sa[i] = d_adj[i];
        d_adj = sa;
    }

    static GraphColor20 create(const int* h_adj, int n, int max_colors) {
        GraphColor20 prob;
        prob.n = n;
        prob.max_colors = max_colors;
        int* da;
        CUDA_CHECK(cudaMalloc(&da, sizeof(int) * n * n));
        CUDA_CHECK(cudaMemcpy(da, h_adj, sizeof(int) * n * n, cudaMemcpyHostToDevice));
        prob.d_adj = da;
        return prob;
    }

    void destroy() {
        if (d_adj) { cudaFree(const_cast<int*>(d_adj)); d_adj = nullptr; }
    }
};
```

### Generated: main.cu

```cuda
#include "core/solver.cuh"
#include "problem.cuh"
#include <cstdio>

int main() {
    const int n = 20;
    const int max_colors = 4;

    int adj[n * n] = {0};
    // Define edges
    int edges[][2] = {{0,1},{0,2},{1,3},{2,3},{3,4},
                      {4,5},{5,6},{6,7},{7,8},{8,9},
                      {9,10},{10,11},{11,12},{12,13},{13,14},
                      {14,15},{15,16},{16,17},{17,18},{18,19},
                      {0,19},{1,4},{2,5},{6,9},{7,10}};
    int num_edges = sizeof(edges) / sizeof(edges[0]);
    for (int e = 0; e < num_edges; e++) {
        int u = edges[e][0], v = edges[e][1];
        adj[u * n + v] = 1;
        adj[v * n + u] = 1;
    }

    auto prob = GraphColor20::create(adj, n, max_colors);

    SolverConfig scfg;
    scfg.time_limit_sec = 10.0f;
    scfg.use_aos = true;
    scfg.verbose = true;

    auto result = solve(prob, scfg);

    printf("Colors used: %.0f\n", result.best_solution.objectives[0]);
    printf("Conflicts (penalty): %.2f\n", result.best_solution.penalty);
    printf("Coloring: ");
    for (int i = 0; i < n; i++)
        printf("node%d=%d ", i, result.best_solution.data[0][i]);
    printf("\n");

    prob.destroy();
    return 0;
}
```
skills/cugenopt-problem-gen/reference/problem-api.md (new file, 280 lines)

# ProblemBase API Reference

Complete interface specification for `ProblemBase<Derived, D1, D2>` (defined in `core/types.cuh`).

## Template Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `Derived` | struct | The concrete problem type (CRTP pattern) |
| `D1` | int | Maximum number of rows (compile-time constant, power of 2 recommended) |
| `D2` | int | Maximum columns per row (compile-time constant, power of 2 recommended) |

The base class provides:
- `using Sol = Solution<D1, D2>;` — the solution type
- `static constexpr int NUM_OBJ` — auto-derived from `Derived::OBJ_DEFS`
- `evaluate(Sol&)` — calls `compute_obj` for each objective + `compute_penalty`
- `fill_obj_config(ProblemConfig&)` — populates objective fields from `OBJ_DEFS`
- `obj_config()` — returns `ObjConfig` for the solver

## Required Interface

### 1. `OBJ_DEFS` — Objective Definitions (static constexpr)

```cuda
static constexpr ObjDef OBJ_DEFS[] = {
    {ObjDir::Minimize, 1.0f, 0.0f},     // index 0
    // {ObjDir::Maximize, 0.5f, 0.0f},  // index 1 (multi-objective)
};
```

Each `ObjDef`:
- `dir`: `ObjDir::Minimize` or `ObjDir::Maximize`
- `weight`: importance weight for `CompareMode::Weighted` (default mode)
- `tolerance`: tolerance for `CompareMode::Lexicographic`

Most problems have a single objective. Multi-objective (up to 4) is supported.
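
For instance, a hypothetical bi-objective problem (minimize cost while maximizing service level; the weights and the helper names `calc_total_cost` / `calc_service_level` are illustrative, not framework API) would declare:

```cuda
static constexpr ObjDef OBJ_DEFS[] = {
    {ObjDir::Minimize, 1.0f, 0.0f},   // index 0: total cost
    {ObjDir::Maximize, 0.5f, 0.0f},   // index 1: service level
};

__device__ float compute_obj(int idx, const Sol& sol) const {
    switch (idx) {
        case 0: return calc_total_cost(sol);      // hypothetical helpers
        case 1: return calc_service_level(sol);
        default: return 0.0f;
    }
}
```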

### 2. `compute_obj` — Objective Calculation

```cuda
__device__ float compute_obj(int idx, const Sol& sol) const;
```

- Runs on GPU (`__device__`)
- `idx` corresponds to `OBJ_DEFS[idx]`
- Use a `switch` statement dispatching to helper functions
- Access solution data via `sol.data[row][col]` and `sol.dim2_sizes[row]`

**Pattern**:
```cuda
__device__ float compute_obj(int idx, const Sol& sol) const {
    switch (idx) {
        case 0: return calc_total_cost(sol);
        default: return 0.0f;
    }
}
```

### 3. `compute_penalty` — Constraint Violation

```cuda
__device__ float compute_penalty(const Sol& sol) const;
```

- Returns `0.0f` for feasible solutions
- Returns a positive value proportional to violation magnitude for infeasible solutions
- The solver always prefers feasible solutions (penalty=0) over infeasible ones
- For multiple constraints, sum all violations

**Guidelines**:
- Scale penalty to be comparable to objective magnitude
- Example: capacity overflow → `(excess_load) * 100.0f`
- Example: vehicle count exceeded → `(excess_vehicles) * 1000.0f`
|
||||
|
||||
### 4. `config` — Problem Configuration
|
||||
|
||||
```cuda
|
||||
ProblemConfig config() const;
|
||||
```
|
||||
|
||||
Returns runtime metadata. Must set:
|
||||
|
||||
```cuda
|
||||
ProblemConfig config() const {
|
||||
ProblemConfig cfg;
|
||||
cfg.encoding = EncodingType::Permutation; // or Binary, Integer
|
||||
cfg.dim1 = /* actual rows used */;
|
||||
cfg.dim2_default = /* actual columns */;
|
||||
fill_obj_config(cfg); // auto-fills objectives from OBJ_DEFS
|
||||
|
||||
// Multi-row problems:
|
||||
// cfg.row_mode = RowMode::Fixed; // equal-length rows
|
||||
// cfg.row_mode = RowMode::Partition; // variable-length rows
|
||||
// cfg.cross_row_prob = 0.3f; // cross-row operator probability
|
||||
// cfg.total_elements = n; // Partition: total elements across all rows
|
||||
|
||||
// Integer encoding:
|
||||
// cfg.value_lower_bound = 0;
|
||||
// cfg.value_upper_bound = num_colors - 1;
|
||||
|
||||
return cfg;
|
||||
}
|
||||
```
|
||||
|
||||
### 5. `create` / `destroy` — Factory Methods
|
||||
|
||||
```cuda
|
||||
static MyProblem create(/* host-side data */) {
|
||||
MyProblem prob;
|
||||
prob.n = n;
|
||||
// Allocate GPU memory and copy data
|
||||
float* d_ptr;
|
||||
CUDA_CHECK(cudaMalloc(&d_ptr, sizeof(float) * n * n));
|
||||
CUDA_CHECK(cudaMemcpy(d_ptr, h_ptr, sizeof(float) * n * n, cudaMemcpyHostToDevice));
|
||||
prob.d_data = d_ptr;
|
||||
return prob;
|
||||
}
|
||||
|
||||
void destroy() {
|
||||
if (d_data) { cudaFree(const_cast<float*>(d_data)); d_data = nullptr; }
|
||||
}
|
||||
```
|
||||
|
||||
**Rules**:
|
||||
- All GPU memory allocated in `create()`, freed in `destroy()`
|
||||
- Use `CUDA_CHECK()` for every CUDA API call
|
||||
- Store both `d_` (device) and optionally `h_` (host) pointers
|
||||
- `const_cast` needed in `destroy()` because pointers are `const float*`

## Optional Interface

### 6. `shared_mem_bytes` — Shared Memory Requirement

```cuda
size_t shared_mem_bytes() const;
```

- Returns the total bytes of problem data to cache in shared memory
- Return the **actual** data size; the framework handles overflow:
  - ≤ 48 KB: fits in default shared memory
  - 48–164 KB: the framework calls `cudaFuncSetAttribute` to extend the limit (GPU-dependent)
  - Larger than that: the framework falls back to global memory automatically
- Default (from the base class): returns 0

**Example** (distance matrix):

```cuda
size_t shared_mem_bytes() const {
    return (size_t)n * n * sizeof(float);  // report the actual need
}
```

### 7. `working_set_bytes` — Global Memory Working Set

```cuda
size_t working_set_bytes() const;
```

- Returns the per-block hot data size in global memory
- Used by the framework to estimate L2 cache pressure and auto-size the population
- Default: returns `shared_mem_bytes()`
- **Override when** `shared_mem_bytes()` returns 0 (data doesn't fit in shared memory); return the actual data size so population sizing works correctly

**Example**:

```cuda
size_t working_set_bytes() const {
    return (size_t)n * n * sizeof(float) + (size_t)n * sizeof(float);
}
```

### 8. `load_shared` — Load Data into Shared Memory

```cuda
__device__ void load_shared(char* smem, int tid, int bsz);
```

- Called by the framework when `shared_mem_bytes() > 0`
- Copies data from global memory to shared memory using cooperative loading
- **Redirect the device pointer** to shared memory after loading

**Pattern**:

```cuda
__device__ void load_shared(char* smem, int tid, int bsz) {
    float* s_data = reinterpret_cast<float*>(smem);
    int total = n * n;
    for (int i = tid; i < total; i += bsz)
        s_data[i] = d_data[i];
    d_data = s_data;  // redirect pointer to shared memory
}
```

For multiple arrays, lay them out sequentially in `smem`:

```cuda
__device__ void load_shared(char* smem, int tid, int bsz) {
    float* s_dist = reinterpret_cast<float*>(smem);
    int dist_size = stride * stride;
    for (int i = tid; i < dist_size; i += bsz) s_dist[i] = d_dist[i];
    d_dist = s_dist;

    float* s_demand = s_dist + dist_size;
    for (int i = tid; i < n; i += bsz) s_demand[i] = d_demand[i];
    d_demand = s_demand;
}
```

### 9. `heuristic_matrices` — Data for Heuristic Initialization

```cuda
int heuristic_matrices(HeuristicMatrix* out, int max_count) const;
```

- Returns host-side matrices for constructing heuristic initial solutions
- The framework sorts elements by row/column sums to generate better-than-random starting points
- Return value: the number of matrices provided (0 = no heuristic init)

**Example** (distance matrix for TSP):

```cuda
int heuristic_matrices(HeuristicMatrix* out, int max_count) const {
    if (max_count < 1 || !h_dist) return 0;
    out[0] = {h_dist, n};
    return 1;
}
```

### 10. `init_relation_matrix` — G/O Matrix for Guided Rebuild

```cuda
void init_relation_matrix(float* h_G, float* h_O, int N) const;
```

- Provides prior knowledge for the LNS guided rebuild operator
- `G[i*N+j]`: grouping tendency (symmetric; higher = more likely to be in the same group)
- `O[i*N+j]`: ordering tendency (asymmetric; higher = i before j)
- Values are in [0, 1], typically scaled from problem data (e.g., distance proximity)
- Default: does nothing (the matrices stay zero and are learned from search history)

## Solution Data Access

```cuda
sol.data[row][col]    // element value at (row, col)
sol.dim2_sizes[row]   // actual length of row (may be < D2)
sol.objectives[idx]   // objective value (set by evaluate())
sol.penalty           // penalty value (set by evaluate())
```

- **Permutation (Single)**: `sol.data[0][0..n-1]` contains a permutation of `0..n-1`
- **Permutation (Partition)**: `sol.data[r][0..sol.dim2_sizes[r]-1]` holds each route/partition
- **Binary**: `sol.data[0][i]` is 0 or 1
- **Integer**: `sol.data[0][i]` is in `[value_lower_bound, value_upper_bound]`

## Key Types Reference

```cuda
enum class EncodingType { Permutation, Binary, Integer };
enum class RowMode { Single, Fixed, Partition };
enum class ObjDir { Minimize, Maximize };
enum class CompareMode { Weighted, Lexicographic };

struct ObjDef { ObjDir dir; float weight; float tolerance; };
struct HeuristicMatrix { const float* data; int N; };

struct ProblemConfig {
    EncodingType encoding;
    int dim1, dim2_default, num_objectives;
    ObjDir obj_dirs[4]; float obj_weights[4];
    CompareMode compare_mode;
    RowMode row_mode;
    float cross_row_prob;
    int total_elements;
    int value_lower_bound, value_upper_bound;
};

struct SolverConfig {
    int pop_size;          // 0 = auto
    int max_gen;           // max generations
    float time_limit_sec;  // 0 = no limit
    bool use_aos;          // adaptive operator selection
    bool verbose;
    unsigned seed;
};
```