Rate Limits
The NOMYO API (api.nomyo.ai) enforces rate limits to ensure fair usage and service stability for all users.
Default Rate Limit
By default, each API key is limited to 2 requests per second.
Burst Allowance
Short bursts above the default limit are permitted. You may send up to 4 requests per second in burst mode, provided you have not already used the burst allowance in the current 10-second window.
Burst capacity is granted once per 10-second window. If you consume the burst allowance, you must wait for the window to reset before burst is available again.
Rate Limit Summary
| Mode | Limit | Condition |
|---|---|---|
| Default | 2 requests/second | Always active |
| Burst | 4 requests/second | Once per 10-second window |
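The limits in the table can be mirrored on the client to estimate how many requests fit into a given second. The sketch below is illustrative only: `BurstBudget` and `allowance` are hypothetical names, and it assumes that windows are aligned to multiples of 10 seconds and that a burst lasts one second. Neither detail is specified above, so the server's actual accounting may differ.

```python
from dataclasses import dataclass, field

DEFAULT_RPS = 2       # default limit from the summary table
BURST_RPS = 4         # burst limit from the summary table
WINDOW_SECONDS = 10   # burst is granted once per window


@dataclass
class BurstBudget:
    """Client-side estimate of the per-second request allowance.

    Assumption: windows start at multiples of WINDOW_SECONDS and a burst
    covers a single second. This is a guess at the server's behaviour.
    """
    burst_windows_used: set = field(default_factory=set)

    def allowance(self, second: int, want_burst: bool = False) -> int:
        """Return how many requests to send during the given second."""
        window = second // WINDOW_SECONDS
        if want_burst and window not in self.burst_windows_used:
            # First burst in this window: record it and allow the higher rate.
            self.burst_windows_used.add(window)
            return BURST_RPS
        # Burst not requested, or already spent in this window.
        return DEFAULT_RPS
```

Under these assumptions, a burst requested at second 0 is granted, a second burst at second 3 falls back to the default rate, and burst becomes available again from second 10.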
Error Responses
429 Too Many Requests
Returned when your request rate exceeds the allowed limit.
HTTP/1.1 429 Too Many Requests
What to do: Back off and retry after a short delay. Implement exponential backoff in your client to avoid repeated limit hits.
503 Service Unavailable (Cool-down)
Returned when burst limits are abused repeatedly. A 30-minute cool-down is applied to the offending API key.
HTTP/1.1 503 Service Unavailable
What to do: Wait 30 minutes before retrying. Review your request patterns to ensure you stay within the permitted limits.
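The two error responses call for very different waits, so a client may want to decide the delay in one place. The following is a minimal sketch, not an official client feature: `retry_delay` is a hypothetical helper, the 0.5-second base and 30-second cap are illustrative defaults, and it assumes every 503 from this API signals the cool-down described above.

```python
from typing import Optional

COOL_DOWN_SECONDS = 30 * 60  # 30-minute cool-down applied to the API key


def retry_delay(status_code: int, attempt: int,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if no retry is needed.

    attempt is the zero-based count of 429 responses seen so far.
    """
    if status_code == 429:
        # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
        return min(base * 2 ** attempt, cap)
    if status_code == 503:
        # Assumed to be the cool-down; retrying earlier is pointless.
        return float(COOL_DOWN_SECONDS)
    return None
```

A long-running job would typically sleep for the returned value, while an interactive client would more likely surface a 503 to the user than block for 30 minutes.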
Best Practices
- Throttle your requests client-side to stay at or below 2 requests/second under normal load.
- Use burst sparingly — it is intended for occasional spikes, not sustained high-throughput usage.
- Implement exponential backoff when you receive a 429 response. Start with a short delay (e.g. 500 ms) and double it on each subsequent failure, up to a reasonable maximum.
- Monitor for 503 responses. Repeated occurrences indicate that your usage pattern is triggering the abuse threshold. Refactor your request logic before the cool-down expires.
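Client-side throttling, the first practice above, can be sketched as a small pacing helper for asyncio code. `Throttle` is a hypothetical class, not part of any NOMYO client; the `clock` and `sleep` parameters exist only so the pacing can be tested without real waiting.

```python
import asyncio
import time


class Throttle:
    """Spaces calls at least 1/rate seconds apart (rate in requests/second)."""

    def __init__(self, rate: float = 2.0, clock=time.monotonic, sleep=asyncio.sleep):
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0  # earliest time the next call may proceed

    async def wait(self) -> None:
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        # Reserve the following slot before sleeping so that concurrent
        # callers on the same event loop queue up behind one another.
        self._next_slot = max(now, self._next_slot) + self._interval
        if delay > 0:
            await self._sleep(delay)
```

Calling `await throttle.wait()` immediately before each API request keeps a single process at or below the configured rate. It does not coordinate across processes or machines that share one API key.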
Example: Exponential Backoff
import asyncio
import httpx

async def request_with_backoff(client, *args, max_retries=5, **kwargs):
    """Call client.create, retrying with exponential backoff on HTTP 429."""
    delay = 0.5  # initial delay in seconds
    for _ in range(max_retries):
        response = await client.create(*args, **kwargs)
        if response.status_code == 429:
            # Rate limited: wait, then double the delay (capped at 30 s).
            await asyncio.sleep(delay)
            delay = min(delay * 2, 30)
            continue
        return response
    raise RuntimeError("Rate limit exceeded after maximum retries")