Rate Limits

The NOMYO API (api.nomyo.ai) enforces rate limits to ensure fair usage and service stability for all users.

Default Rate Limit

By default, each API key is limited to 2 requests per second.

Burst Allowance

Short bursts above the default limit are permitted. You may send up to 4 requests per second in burst mode, provided you have not already used the burst allowance in the current 10-second window.

Burst capacity is granted once per 10-second window. If you consume the burst allowance, you must wait for the window to reset before burst is available again.
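To make the once-per-window rule concrete, here is a minimal client-side sketch that tracks whether the burst allowance has already been spent. It assumes fixed 10-second windows that start when the tracker is created; how the server aligns its windows is not stated here, so treat this as an approximation. `BurstTracker` and its methods are hypothetical names, not part of the NOMYO client.

```python
import time

WINDOW_SECONDS = 10.0  # burst window length described above


class BurstTracker:
    """Hypothetical helper: tracks burst usage per 10-second window.

    Assumption: windows are fixed-length and begin when the tracker is
    created. The server may align its windows differently.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._used_window = None  # index of the window in which burst was last used

    def _current_window(self):
        # Integer index of the window the current time falls into.
        return int((self._clock() - self._start) // WINDOW_SECONDS)

    def burst_available(self):
        return self._used_window != self._current_window()

    def use_burst(self):
        """Mark burst as consumed; return False if already used this window."""
        if not self.burst_available():
            return False
        self._used_window = self._current_window()
        return True
```

A caller would check `use_burst()` before sending above 2 requests per second and fall back to the default rate when it returns `False`.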

Rate Limit Summary

| Mode    | Limit             | Condition                 |
|---------|-------------------|---------------------------|
| Default | 2 requests/second | Always active             |
| Burst   | 4 requests/second | Once per 10-second window |

Error Responses

429 Too Many Requests

Returned when your request rate exceeds the allowed limit.

HTTP/1.1 429 Too Many Requests

What to do: Back off and retry after a short delay. Implement exponential backoff in your client to avoid repeated limit hits.

503 Service Unavailable (Cool-down)

Returned when burst limits are abused repeatedly. A 30-minute cool-down is applied to the offending API key.

HTTP/1.1 503 Service Unavailable

What to do: Wait 30 minutes before retrying. Review your request patterns to ensure you stay within the permitted limits.
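One way to honor the cool-down on the client is to remember when a 503 was seen and hold off sending until 30 minutes have passed. The sketch below is illustrative only: `CooldownGuard` is a hypothetical helper, and it assumes every 503 from the API signals the cool-down, which may not hold if the service is unavailable for other reasons.

```python
import time

COOLDOWN_SECONDS = 30 * 60  # 30-minute cool-down described above


class CooldownGuard:
    """Hypothetical helper: blocks sending while a cool-down is assumed active."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._blocked_until = clock()  # not blocked initially

    def record_response(self, status_code):
        # Assumption: any 503 means the cool-down has started (or restarted).
        if status_code == 503:
            self._blocked_until = self._clock() + COOLDOWN_SECONDS

    def seconds_remaining(self):
        return max(0.0, self._blocked_until - self._clock())

    def can_send(self):
        return self.seconds_remaining() == 0.0
```

Call `record_response(response.status_code)` after each request and check `can_send()` before the next one. Retrying during the cool-down gains nothing, so the guard lets the client fail fast locally instead.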

Best Practices

  • Throttle your requests client-side to stay at or below 2 requests/second under normal load.
  • Use burst sparingly — it is intended for occasional spikes, not sustained high-throughput usage.
  • Implement exponential backoff when you receive a 429 response. Start with a short delay (e.g. 500 ms) and double it on each subsequent failure, up to a reasonable maximum.
  • Monitor for 503 responses — repeated occurrences indicate that your usage pattern is triggering the abuse threshold. Refactor your request logic before the cool-down expires.
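The first practice above, client-side throttling, can be sketched as a small pacing helper that spaces requests at least 0.5 seconds apart. This is a minimal sketch for a single asyncio event loop, not a full token bucket, and it does not use the burst allowance. `Throttle` is a hypothetical name, not part of the NOMYO client.

```python
import asyncio
import time


class Throttle:
    """Hypothetical helper: spaces calls at least 1 / rate seconds apart."""

    def __init__(self, rate_per_second=2.0):
        self._interval = 1.0 / rate_per_second
        self._next_slot = 0.0  # earliest monotonic time for the next request

    async def wait(self):
        # There is no await between reading and updating _next_slot, so
        # concurrent tasks on one event loop cannot interleave here.
        now = time.monotonic()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        if slot > now:
            await asyncio.sleep(slot - now)
```

Calling `await throttle.wait()` immediately before each request keeps the steady-state rate at or below the configured value, even when several tasks share one `Throttle` instance.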

Example: Exponential Backoff

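import asyncio

async def request_with_backoff(client, *args, max_retries=5, **kwargs):
    """Call client.create(), retrying on 429 with exponential backoff."""
    delay = 0.5  # initial delay in seconds
    for attempt in range(max_retries):
        response = await client.create(*args, **kwargs)
        if response.status_code != 429:
            return response
        # Rate limited: wait before the next attempt, but not after the last one.
        if attempt < max_retries - 1:
            await asyncio.sleep(delay)
            delay = min(delay * 2, 30)  # double the delay, capped at 30 seconds
    raise RuntimeError("Rate limit exceeded after maximum retries")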