Rate Limits
The NOMYO API (api.nomyo.ai) enforces rate limits to ensure fair usage and service stability for all users.
Default Rate Limit
By default, each API key is limited to 2 requests per second.
Burst Allowance
Short bursts above the default limit are permitted. You may send up to 4 requests per second in burst mode, provided you have not already used the burst allowance in the current 10-second window.
Burst capacity is granted once per 10-second window. If you consume the burst allowance, you must wait for the window to reset before burst is available again.
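The following is a minimal sketch of a client-side model of the default limit and the burst allowance. `RateLimitModel` and `tryAcquire` are hypothetical names and are not part of nomyo-js. The model also rests on assumptions the text above does not confirm. It treats the 1-second limit as a sliding window, aligns 10-second windows to multiples of 10 seconds, starts a burst when a third request lands within one second, and lets that burst last one second. The server's actual accounting may differ.

```javascript
// Hypothetical model of the documented limits; not part of nomyo-js.
// Assumptions: sliding 1-second window, 10-second windows aligned to
// multiples of 10 s, and a burst (lasting 1 s) starts when a request
// would exceed the default rate.
class RateLimitModel {
  constructor({ defaultRps = 2, burstRps = 4, burstWindowMs = 10_000 } = {}) {
    this.defaultRps = defaultRps;
    this.burstRps = burstRps;
    this.burstWindowMs = burstWindowMs;
    this.sent = [];          // timestamps (ms) of accepted requests in the last second
    this.burstStart = null;  // when the current or most recent burst began
    this.burstWindow = null; // index of the 10-second window whose burst was used
  }

  // Returns true (and records the request) if a request at `nowMs` fits the model.
  tryAcquire(nowMs) {
    // Keep only requests from the last 1000 ms
    this.sent = this.sent.filter((t) => nowMs - t < 1000);

    // Below the default limit: always allowed
    if (this.sent.length < this.defaultRps) {
      this.sent.push(nowMs);
      return true;
    }

    // Between the default and burst limits: allowed only during an active
    // burst, or if this 10-second window's burst has not been spent yet
    if (this.sent.length < this.burstRps) {
      const windowIndex = Math.floor(nowMs / this.burstWindowMs);
      const inActiveBurst = this.burstStart !== null && nowMs - this.burstStart < 1000;
      const burstAvailable = this.burstWindow !== windowIndex;
      if (inActiveBurst || burstAvailable) {
        if (!inActiveBurst) {
          // Spend this window's burst allowance
          this.burstStart = nowMs;
          this.burstWindow = windowIndex;
        }
        this.sent.push(nowMs);
        return true;
      }
    }
    return false;
  }
}
```

Under these assumptions, requests at 0, 100, 200, 300 and 400 ms give four acceptances followed by one rejection, because the third request starts the burst. A third request in a later second of the same 10-second window is also rejected, because that window's burst has already been spent.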
Rate Limit Summary
| Mode | Limit | Condition |
|---|---|---|
| Default | 2 requests/second | Always active |
| Burst | 4 requests/second | Once per 10-second window |
Error Responses
429 Too Many Requests
Returned when your request rate exceeds the allowed limit.
The client retries automatically (see below). If all retries are exhausted, RateLimitError is thrown:
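```javascript
import { SecureChatCompletion, RateLimitError } from 'nomyo-js';

const client = new SecureChatCompletion({ apiKey: process.env.NOMYO_API_KEY });

try {
  const response = await client.create({ ... });
} catch (err) {
  if (err instanceof RateLimitError) {
    // All retries exhausted: back off manually before trying again
    console.error('Rate limit exceeded:', err.message);
  } else {
    throw err; // not a rate-limit problem; let it propagate
  }
}
```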
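To automate the manual back-off, you can use a small wrapper that retries the whole call with longer delays. This is a sketch, not a nomyo-js API. `withManualBackoff`, its options and the 5-second base delay are hypothetical choices. Each `client.create()` call already performs its own retries internally, so the outer delays are deliberately longer than the client's 1 s / 2 s schedule.

```javascript
// Hypothetical helper (not part of nomyo-js): retries `fn` with exponential
// backoff while `shouldRetry(err)` returns true, up to `attempts` calls total.
async function withManualBackoff(fn, { shouldRetry, attempts = 3, baseDelayMs = 5000 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-retryable errors or once the attempt budget is spent
      if (!shouldRetry(err) || attempt >= attempts) throw err;
      // 5 s, 10 s, 20 s, ... with the default base delay
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage with nomyo-js (request options elided):
// const response = await withManualBackoff(
//   () => client.create({ ... }),
//   { shouldRetry: (err) => err instanceof RateLimitError },
// );
```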
503 Service Unavailable (Cool-down)
Returned when burst limits are abused repeatedly. A 30-minute cool-down is applied to the offending API key.
What to do: Wait 30 minutes before retrying. Review your request patterns to ensure you stay within the permitted limits.
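A client can respect the cool-down by refusing to send anything until 30 minutes have passed. The sketch below assumes you can obtain the HTTP status of a failed request, which this page does not specify for nomyo-js. `CooldownGuard` and its methods are hypothetical. The client also retries 503 automatically, so treating every 503 that survives those retries as a cool-down signal is a conservative assumption.

```javascript
// Hypothetical client-side guard (not part of nomyo-js): after a 503
// cool-down response, refuse to send further requests for 30 minutes.
const COOLDOWN_MS = 30 * 60 * 1000;

class CooldownGuard {
  constructor(cooldownMs = COOLDOWN_MS) {
    this.cooldownMs = cooldownMs;
    this.blockedUntil = 0; // epoch ms before which no requests should be sent
  }

  // Record the HTTP status of a response; a 503 starts the cool-down.
  recordStatus(status, nowMs = Date.now()) {
    if (status === 503) this.blockedUntil = nowMs + this.cooldownMs;
  }

  // True if the cool-down (if any) has elapsed.
  canSend(nowMs = Date.now()) {
    return nowMs >= this.blockedUntil;
  }

  // Milliseconds left until requests may resume (0 if not cooling down).
  remainingMs(nowMs = Date.now()) {
    return Math.max(0, this.blockedUntil - nowMs);
  }
}
```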
Automatic Retry Behaviour
The client retries automatically on 429, 500, 502, 503, 504, and network errors using exponential backoff:
| Attempt | Delay before attempt |
|---|---|
| 1st (initial) | — |
| 2nd | 1 second |
| 3rd | 2 seconds |
The default is 2 retries (3 total attempts). Adjust per client:
```javascript
import { SecureChatCompletion } from 'nomyo-js';

// More retries for high-throughput workloads
const client = new SecureChatCompletion({
  apiKey: process.env.NOMYO_API_KEY,
  maxRetries: 5,
});

// Disable retries entirely (fail fast)
const client2 = new SecureChatCompletion({
  apiKey: process.env.NOMYO_API_KEY,
  maxRetries: 0,
});
```
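The delays in the retry table double with each attempt, so raising `maxRetries` increases worst-case latency quickly. The helper below reproduces the documented 1 s and 2 s delays. For retries beyond the second, it assumes the doubling continues, which the table does not confirm. `backoffDelaysMs` and `totalBackoffMs` are hypothetical names. The calculation ignores the time spent on the requests themselves and any jitter the client may add.

```javascript
// Reproduces the documented delay table: 1 s before the 2nd attempt,
// 2 s before the 3rd. Assumption: later retries keep doubling
// (4 s, 8 s, ...); the document only lists the first two.
function backoffDelaysMs(maxRetries, baseDelayMs = 1000) {
  const delays = [];
  for (let retry = 0; retry < maxRetries; retry++) {
    delays.push(baseDelayMs * 2 ** retry);
  }
  return delays;
}

// Worst-case time spent waiting between attempts (excludes request time).
function totalBackoffMs(maxRetries, baseDelayMs = 1000) {
  return backoffDelaysMs(maxRetries, baseDelayMs).reduce((sum, d) => sum + d, 0);
}
```

Under that assumption, `totalBackoffMs(2)` is 3000 (the default configuration) and `totalBackoffMs(5)` is 31000. A client created with `maxRetries: 5` may therefore wait roughly 31 seconds before giving up.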
Best Practices
- Throttle requests client-side to stay at or below 2 requests/second under normal load.
- Use burst sparingly: it is intended for occasional spikes, not sustained high-throughput usage.
- Increase `maxRetries` for background jobs that can tolerate extra latency.
- Monitor for `503` responses. Repeated occurrences indicate that your usage pattern is triggering the abuse threshold.
- Parallel requests (e.g. `Promise.all`) count against the same rate limit, so be careful with large batches.
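Because `Promise.all` starts every request at once, large batches can exceed 2 requests/second immediately. One way to keep concurrent code while respecting the limit is to space out when each task starts. The limiter below is a hypothetical helper, not part of nomyo-js. It reserves a start slot synchronously for every call, so tasks launched together still begin at least `1000 / requestsPerSecond` milliseconds apart.

```javascript
// Hypothetical helper (not part of nomyo-js): spaces out the start of
// tasks so that at most `requestsPerSecond` begin per second, even when
// they are launched together with Promise.all.
function createRateLimiter(requestsPerSecond = 2) {
  const intervalMs = 1000 / requestsPerSecond;
  let nextSlot = 0; // earliest time (ms) the next task may start

  return async function schedule(task) {
    const now = Date.now();
    const startAt = Math.max(now, nextSlot);
    // Reserve the slot before the first await, so concurrent callers
    // each receive a distinct, later slot
    nextSlot = startAt + intervalMs;
    if (startAt > now) {
      await new Promise((resolve) => setTimeout(resolve, startAt - now));
    }
    return task();
  };
}

// Usage with nomyo-js (request options elided):
// const limit = createRateLimiter(2);
// const responses = await Promise.all(
//   queries.map((q) => limit(() => client.create({ ... }))),
// );
```

This controls how often requests start, not how many are in flight. Slow responses can still overlap, which should be compatible with a requests-per-second limit.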
Batch Processing Example
Process a batch of queries sequentially, spacing request starts so the rate stays within the limit:

```javascript
import { SecureChatCompletion } from 'nomyo-js';

async function throttledBatch(queries, requestsPerSecond = 2) {
  // One client per batch, released in `finally` even if a request fails
  const client = new SecureChatCompletion({ apiKey: process.env.NOMYO_API_KEY });
  const results = [];
  const delayMs = 1000 / requestsPerSecond; // minimum time slot per request

  try {
    for (const query of queries) {
      const start = Date.now();
      const response = await client.create({
        model: 'Qwen/Qwen3-0.6B',
        messages: [{ role: 'user', content: query }],
      });
      results.push(response.choices[0].message.content);

      // Throttle: wait for the remainder of the time slot
      const elapsed = Date.now() - start;
      if (elapsed < delayMs) {
        await new Promise((resolve) => setTimeout(resolve, delayMs - elapsed));
      }
    }
  } finally {
    client.dispose();
  }

  return results;
}
```