nomyo-js/doc/rate-limits.md
2026-04-16 16:44:26 +02:00

Rate Limits

The NOMYO API (api.nomyo.ai) enforces rate limits to ensure fair usage and service stability for all users.

Default Rate Limit

By default, each API key is limited to 2 requests per second.

Burst Allowance

Short bursts above the default limit are permitted: you may send up to 4 requests per second, provided you have not already used the burst allowance in the current 10-second window.

Burst capacity is granted once per 10-second window. If you consume the burst allowance, you must wait for the window to reset before burst is available again.

Rate Limit Summary

| Mode    | Limit             | Condition                 |
|---------|-------------------|---------------------------|
| Default | 2 requests/second | Always active             |
| Burst   | 4 requests/second | Once per 10-second window |
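
The limits in the table can be mirrored client-side to decide whether a request fits before sending it. The following is a minimal sketch, not part of nomyo-js: `createLimitModel` and `tryAcquire` are hypothetical names, and the model assumes fixed, aligned 1-second and 10-second windows, which may not match how the server actually counts.

```javascript
// Local model of the documented limits (hypothetical helper, not a nomyo-js API).
// Assumption: fixed, aligned windows; the server's real accounting may differ.
function createLimitModel({ baseLimit = 2, burstLimit = 4, burstWindowMs = 10000 } = {}) {
  let second = -1;        // index of the current 1-second slot
  let sentThisSecond = 0; // requests admitted in that slot
  let burstWindow = -1;   // index of the 10-second window whose burst was used
  let burstSecond = -1;   // the 1-second slot in which that burst was taken

  // Returns true and records the request if it fits the model at time `nowMs`.
  return function tryAcquire(nowMs) {
    const s = Math.floor(nowMs / 1000);
    const w = Math.floor(nowMs / burstWindowMs);
    if (s !== second) {
      second = s;
      sentThisSecond = 0;
    }
    if (sentThisSecond < baseLimit) {
      sentThisSecond += 1;
      return true;
    }
    // Above the base limit: only allowed in the single burst second of this window.
    const burstAvailable = burstWindow !== w || burstSecond === s;
    if (burstAvailable && sentThisSecond < burstLimit) {
      burstWindow = w;
      burstSecond = s;
      sentThisSecond += 1;
      return true;
    }
    return false;
  };
}
```

Passing the timestamp in (for example `tryAcquire(Date.now())`) keeps the model deterministic and easy to test. A request that gets `false` should be delayed rather than sent.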

Error Responses

429 Too Many Requests

Returned when your request rate exceeds the allowed limit.

The client retries automatically (see below). If all retries are exhausted, RateLimitError is thrown:

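import { SecureChatCompletion, RateLimitError } from 'nomyo-js';

const client = new SecureChatCompletion({ apiKey: process.env.NOMYO_API_KEY });

try {
  const response = await client.create({
    model: 'Qwen/Qwen3-0.6B',
    messages: [{ role: 'user', content: 'Hello' }],
  });
} catch (err) {
  if (err instanceof RateLimitError) {
    // All retries exhausted: back off manually before trying again
    console.error('Rate limit exceeded:', err.message);
  } else {
    // Not a rate-limit problem: let the caller handle it
    throw err;
  }
}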
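
Backing off manually after a `RateLimitError` could look like the sketch below. `retryAfterPause` is a hypothetical helper, not something nomyo-js provides, and the 10-second default pause is an assumption chosen to match one burst window. The task and the error check are passed in, so the helper itself does not depend on the library.

```javascript
// Hypothetical helper: runs `task`; if it fails with an error that `isRateLimit`
// accepts, waits `pauseMs` and tries again, up to `rounds` additional times.
async function retryAfterPause(task, isRateLimit, { rounds = 2, pauseMs = 10000, sleep } = {}) {
  // `sleep` can be injected for tests; the default waits on a real timer.
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let round = 0; ; round++) {
    try {
      return await task();
    } catch (err) {
      // Give up on other errors, or once the extra rounds are used up.
      if (!isRateLimit(err) || round >= rounds) throw err;
      await wait(pauseMs);
    }
  }
}

// Possible usage with the client from the example above:
// const response = await retryAfterPause(
//   () => client.create({ model: 'Qwen/Qwen3-0.6B', messages }),
//   (err) => err instanceof RateLimitError,
// );
```

Each round here sits on top of the client's own automatic retries, so the pause should be long compared with the built-in backoff delays.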

503 Service Unavailable (Cool-down)

Returned when burst limits are abused repeatedly. A 30-minute cool-down is applied to the offending API key.

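What to do: Wait 30 minutes before retrying. With the default settings, automatic retries wait only 1 to 2 seconds between attempts, so they will not outlast the cool-down. Review your request patterns to ensure you stay within the permitted limits.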
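
One way to honour the cool-down is to remember when a 503 was seen and hold back further requests until 30 minutes have passed. The sketch below is illustrative only: `createCoolDownGuard` and its methods are hypothetical names, and it assumes the caller has already decided that the 503 it saw was a cool-down response.

```javascript
const COOL_DOWN_MS = 30 * 60 * 1000; // 30 minutes, as documented above

// Hypothetical guard (not a nomyo-js API) that tracks a cool-down locally.
function createCoolDownGuard(coolDownMs = COOL_DOWN_MS) {
  let blockedUntil = 0; // timestamp (ms) before which no request should be sent

  return {
    // Call when a request fails with a cool-down 503.
    record503(nowMs) {
      blockedUntil = nowMs + coolDownMs;
    },
    // True once the cool-down has elapsed (or none was recorded).
    canSend(nowMs) {
      return nowMs >= blockedUntil;
    },
    // Milliseconds left to wait; 0 when sending is allowed.
    remainingMs(nowMs) {
      return Math.max(0, blockedUntil - nowMs);
    },
  };
}
```

A caller would check `guard.canSend(Date.now())` before each request and call `guard.record503(Date.now())` when the cool-down response arrives.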

Automatic Retry Behaviour

The client retries automatically on 429, 500, 502, 503, 504, and network errors using exponential backoff:

| Attempt       | Delay before attempt |
|---------------|----------------------|
| 1st (initial) | none                 |
| 2nd           | 1 second             |
| 3rd           | 2 seconds            |

The default is 2 retries (3 total attempts). Adjust per client:

import { SecureChatCompletion } from 'nomyo-js';

// More retries for background jobs that can tolerate extra latency
const client = new SecureChatCompletion({
  apiKey: process.env.NOMYO_API_KEY,
  maxRetries: 5,
});

// Disable retries entirely (fail fast)
const client2 = new SecureChatCompletion({
  apiKey: process.env.NOMYO_API_KEY,
  maxRetries: 0,
});
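
To estimate how much latency a given `maxRetries` value can add, the backoff schedule can be written as a small function. This is a sketch, not the client's actual implementation: the first three attempts match the table above, while continued doubling for later attempts is an assumption (the client may cap or jitter its delays). Both function names are hypothetical.

```javascript
// Delay in milliseconds before the given attempt (1-based).
// Attempts 1 to 3 match the documented table; doubling beyond that is an assumption.
function delayBeforeAttempt(attempt, baseMs = 1000) {
  if (attempt <= 1) return 0; // the initial attempt is sent immediately
  return baseMs * 2 ** (attempt - 2);
}

// Total time spent waiting between attempts if every attempt fails.
function totalBackoffMs(maxRetries, baseMs = 1000) {
  let total = 0;
  for (let attempt = 2; attempt <= maxRetries + 1; attempt++) {
    total += delayBeforeAttempt(attempt, baseMs);
  }
  return total;
}
```

Under these assumptions the default of 2 retries adds at most 3 seconds of waiting, while `maxRetries: 5` could add about 31 seconds, not counting the time the requests themselves take.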

Best Practices

  • Throttle requests client-side to stay at or below 2 requests/second under normal load.
  • Use burst sparingly — it is intended for occasional spikes, not sustained high-throughput usage.
  • Increase maxRetries for background jobs that can tolerate extra latency.
  • Monitor for 503 responses — repeated occurrences indicate your usage pattern is triggering the abuse threshold.
  • Parallel requests (e.g. Promise.all) count against the same rate limit — be careful with large batches.
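
For the `Promise.all` case, one option is to split a batch into groups no larger than the per-second limit and start one group per second. The sketch below assumes that pacing groups one second apart is enough to stay under the limit; `chunk` and `pacedAll` are hypothetical helpers, and `worker` stands in for whatever function issues a single request.

```javascript
// Split `items` into consecutive groups of at most `size` elements.
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Run `worker` over `items`, at most `perSecond` in parallel, one group per second.
async function pacedAll(items, worker, { perSecond = 2, sleep } = {}) {
  // `sleep` can be injected for tests; the default waits on a real timer.
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  const groups = chunk(items, perSecond);
  const results = [];
  for (let g = 0; g < groups.length; g++) {
    const started = Date.now();
    results.push(...(await Promise.all(groups[g].map((item) => worker(item)))));
    // Wait out the rest of the second before starting the next group.
    const remaining = 1000 - (Date.now() - started);
    if (g < groups.length - 1 && remaining > 0) await wait(remaining);
  }
  return results;
}
```

Automatic retries also count as requests, so a group whose calls are being retried can still exceed the limit. Lowering `perSecond` leaves headroom for that.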

Batch Processing Example

Process a batch one request at a time, pacing the requests so they stay within the rate limit:

import { SecureChatCompletion } from 'nomyo-js';

const client = new SecureChatCompletion({ apiKey: process.env.NOMYO_API_KEY });

async function throttledBatch(queries, requestsPerSecond = 2) {
  const results = [];
  const delayMs = 1000 / requestsPerSecond;

  for (const query of queries) {
    const start = Date.now();

    const response = await client.create({
      model: 'Qwen/Qwen3-0.6B',
      messages: [{ role: 'user', content: query }],
    });
    results.push(response.choices[0].message.content);

    // Throttle: wait for the remainder of the time slot
    const elapsed = Date.now() - start;
    if (elapsed < delayMs) {
      await new Promise(resolve => setTimeout(resolve, delayMs - elapsed));
    }
  }

  client.dispose();
  return results;
}