cuGenOpt/skills/cugenopt-problem-gen/SKILL.md
2026-03-20 00:33:45 +08:00

11 KiB
Raw Blame History

name description
cugenopt-problem-gen Generate cuGenOpt GPU-accelerated optimization Problem definitions from natural language descriptions. Use when the user describes a combinatorial optimization problem (TSP, VRP, knapsack, scheduling, assignment, graph coloring, bin packing, etc.) and wants to solve it with cuGenOpt, or asks to generate, create, model, or run an optimization problem on GPU.

cuGenOpt Problem Generator

Generate cuGenOpt Problem definitions from natural language descriptions. Users describe their optimization problem in plain language; this Skill produces compilable .cuh + main.cu files and optionally compiles/runs them.

When to Use

Activate this Skill when the user:

  • Describes a combinatorial optimization problem and wants to solve it with cuGenOpt
  • Asks to "generate", "create", "write", or "define" a cuGenOpt Problem
  • Asks to "solve", "run", or "try" an optimization problem on GPU
  • Mentions problem types like TSP, VRP, knapsack, scheduling, assignment, bin packing, graph coloring, etc.
  • Asks "how to express / model [problem] in cuGenOpt"

Prerequisites

The cuGenOpt framework source must be accessible. Locate it by searching for prototype/core/solver.cuh:

FRAMEWORK_ROOT = directory containing prototype/core/solver.cuh

If not found, ask the user for the framework location.

Workflow

Step 0: Determine Execution Depth

Infer from the user's wording how far to go:

Signal Intent Depth
"generate" / "write" / "define" Code only Phase 1 → 2
"solve" / "run" / "try" / "see results" Full run Phase 1 → 2 → 3
Provides data (CSV, array, file path) Full run Phase 1 → 2 → 3
"model" / "express" / "how to" Code + explanation Phase 1 → 2
Ambiguous Code, then ask Phase 1 → 2, ask

Principle: when in doubt, ask rather than silently consuming GPU.

Step 1: Analyze the Problem (Phase 1)

Extract from the user's description:

  1. Decision variables — what is being decided?
  2. Encoding type — use the decision tree below
  3. Dimensions — D1, D2, dim1, dim2_default, total_elements
  4. Objective(s) — minimize/maximize what?
  5. Constraints — what makes a solution infeasible?
  6. Data — what input data is needed?

Encoding Decision Tree

What are the decision variables?
├── Ordering / assignment (no repeats) → Permutation
│   ├── Single sequence (TSP, QAP, assignment) → RowMode::Single, D1=1
│   ├── Multiple equal-length sequences (JSP) → RowMode::Fixed, D1=num_machines
│   └── Variable-length partitions (VRP, VRPTW) → RowMode::Partition, D1=max_vehicles
├── Select / not-select (0-1) → Binary
│   └── Knapsack, scheduling, subset selection → RowMode::Single, D1=1
└── Bounded integer (multi-level) → Integer
    ├── Single row (graph coloring, load balance) → RowMode::Single, D1=1
    └── Multiple equal rows (multi-machine) → RowMode::Fixed, D1=num_machines

Dimension Calculation

Parameter Rule
D1 Template max rows. Single=1; VRP=max_vehicles; JSP=num_machines. Round up to power of 2 if > 1.
D2 Template max columns. Single=num_elements; Partition=max(num_elements/D1*2, 64). Round up to power of 2.
dim1 Actual rows at runtime. ≤ D1.
dim2_default Single/Fixed: num_elements; Partition: 0 (framework allocates).
total_elements Partition only: total number of elements to distribute across rows.

Confirmation Strategy

Complexity Criteria Action
Low Standard problem with direct reference (TSP, knapsack, assignment) Proceed without pause
Medium Custom constraints (prioritized VRP, skill-matched scheduling) Generate code, then summarize logic in natural language for user confirmation
High Multi-objective, ambiguous description, non-standard encoding Output problem spec first for confirmation, then generate code

For medium/high complexity, summarize like:

"Objective: minimize total distance. Constraints: each vehicle ≤ 100 capacity, penalty = 100 × excess. Encoding: Permutation with Partition (5 vehicles, 30 customers)."

The user confirms the logic summary, not the code.

Step 2: Generate Code (Phase 2)

Generate two files in the user's chosen directory (default: a new folder under the workspace):

problem.cuh

#pragma once
#include "core/types.cuh"
#include "core/cuda_utils.cuh"
#include "core/operators.cuh"

struct MyProblem : ProblemBase<MyProblem, D1, D2> {
    // GPU data pointers
    const float* d_data;
    int n;

    // === Objective definition ===
    static constexpr ObjDef OBJ_DEFS[] = {
        {ObjDir::Minimize, 1.0f, 0.0f},
    };

    __device__ float compute_obj(int idx, const Sol& sol) const {
        switch (idx) {
            case 0: return /* objective calculation */;
            default: return 0.0f;
        }
    }

    __device__ float compute_penalty(const Sol& sol) const {
        return /* 0.0f if no constraints, else penalty value */;
    }

    ProblemConfig config() const {
        ProblemConfig cfg;
        cfg.encoding = EncodingType::/* Permutation | Binary | Integer */;
        cfg.dim1 = /* actual rows */;
        cfg.dim2_default = /* actual columns */;
        fill_obj_config(cfg);
        // For multi-row:
        // cfg.row_mode = RowMode::/* Fixed | Partition */;
        // cfg.cross_row_prob = 0.3f;
        // cfg.total_elements = n;  // Partition only
        return cfg;
    }

    // === Shared memory (when data fits) ===
    size_t shared_mem_bytes() const {
        return /* total bytes of data to cache, or 0 if too large */;
    }

    size_t working_set_bytes() const {
        return /* actual data size in bytes (for L2 cache estimation) */;
    }

    __device__ void load_shared(char* smem, int tid, int bsz) {
        // Copy d_data into shared memory, then redirect pointer
    }

    // === Factory ===
    static MyProblem create(/* host data params */) {
        MyProblem prob;
        // cudaMalloc + cudaMemcpy for each data array
        return prob;
    }

    void destroy() {
        // cudaFree each GPU pointer
    }
};

main.cu

#include "core/solver.cuh"
#include "problem.cuh"
#include <cstdio>

int main() {
    // 1. Prepare data (hardcoded or read from file)

    // 2. Create problem
    auto prob = MyProblem::create(/* ... */);

    // 3. Configure solver
    SolverConfig scfg;
    scfg.time_limit_sec = 30.0f;
    scfg.use_aos = true;
    scfg.verbose = true;

    // 4. Solve
    auto result = solve(prob, scfg);

    // 5. Print results
    printf("Best objective: %.4f\n", result.best_solution.objectives[0]);
    printf("Penalty: %.4f\n", result.best_solution.penalty);
    printf("Generations: %d, Time: %.1f ms\n", result.generations, result.elapsed_ms);

    // 6. Print solution details
    // ...

    prob.destroy();
    return 0;
}

Code generation rules:

  • Read reference/problem-api.md for the full ProblemBase interface specification
  • Read reference/encoding-guide.md for encoding selection and dimension calculation details
  • Read reference/examples.md for end-to-end examples of natural language → code
  • shared_mem_bytes() should return the actual data size; the framework handles overflow automatically via cudaFuncSetAttribute
  • working_set_bytes() should always return the actual data size (used for population sizing)
  • For Partition encoding, set cfg.dim2_default = 0 and cfg.total_elements = n
  • Use CUDA_CHECK() macro for all CUDA API calls
  • Penalty should be proportional to constraint violation magnitude

Step 3: Validate & Run (Phase 3)

Only enter this phase if execution depth requires it.

3a. Environment Detection

1. Run: nvidia-smi
   ├── Success → local/direct mode, go to 3b
   └── Failure → no local GPU
       └── Ask: "No GPU detected locally. Do you have a remote GPU server?"
           ├── User provides SSH info → remote mode
           │   - scp files to remote
           │   - Run all commands via ssh <host> "..."
           ├── User says "I use Cursor Remote SSH" → suggest switching to remote window
           └── No GPU → deliver code only, print compile command

2. Detect compute capability:
   nvidia-smi --query-gpu=compute_cap --format=csv,noheader → sm_XX

3b. Compile

nvcc -O2 -std=c++17 --extended-lambda \
     -arch=sm_XX \
     -I FRAMEWORK_ROOT/prototype \
     -I FRAMEWORK_ROOT/prototype/core \
     main.cu -o solve

The two -I paths are both needed: prototype/ for #include "core/solver.cuh" in user code, and prototype/core/ for #include "types.cuh" inside framework headers.

If nvcc not found:

  • Check ls /usr/local/cuda*/bin/nvcc — if found, fix PATH
  • Otherwise ask: "nvcc not found. Want me to help configure CUDA?"
    • Check and guide: CUDA Toolkit install, g++ install, driver compatibility

If compilation fails: read error, fix code, retry (up to 3 attempts).

3c. Run & Verify

./solve

Verification checklist:

Check Method Pass
No crash exit code 0
penalty = 0 check output Feasible solution
Objective reasonable compare to known/estimated optimum Within expected range

If penalty > 0: report constraint violations, suggest adjusting penalty weights or constraint logic.

3d. Remote Mode

When using a remote GPU server:

  1. Check if framework exists on remote: ssh <host> "ls <path>/prototype/core/solver.cuh"
    • Not found → scp -r FRAMEWORK_ROOT/prototype <host>:<remote_path>/
  2. Upload generated files: scp problem.cuh main.cu <host>:<remote_path>/
  3. Compile and run via SSH: ssh <host> "cd <remote_path> && nvcc ... && ./solve"
  4. Capture stdout for results

Common Errors

Error Cause Fix
nvcc: command not found CUDA not installed or not in PATH Trigger environment setup
no supported gcc/g++ host compiler g++ version incompatible Install compatible version
D2 too small Elements exceed template parameter Increase D2 (power of 2)
penalty always > 0 Constraints too tight or penalty logic wrong Review compute_penalty
Objective is 0 or NaN Data not loaded to GPU Check cudaMalloc/cudaMemcpy in create()
Very low gens/s Data matrix too large for shared memory Check shared_mem_bytes / working_set_bytes

Reference Files

For detailed specifications, read these files in the reference/ directory alongside this SKILL.md:

  • problem-api.md — Complete ProblemBase interface: every method, its signature, when to implement it, and gotchas
  • encoding-guide.md — Encoding type selection, dimension calculation rules, RowMode details, shared memory sizing
  • examples.md — 4 end-to-end examples: natural language input → analysis → generated code