Rate Limits

The NOMYO API (api.nomyo.ai) enforces rate limits to ensure fair usage and service stability for all users.

Default Rate Limit

By default, each API key is limited to 2 requests per second.
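
To stay under this limit, a client can space its requests at least 0.5 s apart. The sketch below is a minimal, hypothetical helper (Throttle is not part of the NOMYO client). It assumes a single process, and the clock and sleep functions are injectable only so the behaviour is easy to test.

```python
import threading
import time


class Throttle:
    """Hypothetical helper: spaces calls to wait() at least 1/rate_per_second apart."""

    def __init__(self, rate_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second  # 0.5 s for 2 requests/second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()  # serialises callers across threads
        self._next_allowed = float("-inf")  # earliest time the next request may start

    def wait(self):
        """Block until the next request slot is free, then reserve it."""
        with self._lock:
            now = self._clock()
            delay = self._next_allowed - now
            if delay > 0:
                self._sleep(delay)
                now = self._next_allowed
            self._next_allowed = now + self.min_interval
```

Call throttle.wait() immediately before each API request. In a tight loop, the fifth call proceeds about 2 s after the first.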

Burst Allowance

Short bursts above the default limit are permitted: you may send up to 4 requests per second in burst mode, provided you have not already used the burst allowance in the current 10-second window.

Burst capacity is granted once per 10-second window. If you consume the burst allowance, you must wait for the window to reset before burst is available again.
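
One way to read these rules is as a check on a per-second request pattern. The function below is an illustrative sketch, not the server's algorithm. It assumes windows aligned at seconds 0, 10, 20, … and treats any second with 3 or 4 requests as one use of the burst allowance.

```python
from typing import Sequence

DEFAULT_LIMIT = 2    # requests/second, always allowed
BURST_LIMIT = 4      # requests/second, allowed in burst mode
WINDOW_SECONDS = 10  # burst may be used once per window


def fits_rate_limit(counts: Sequence[int]) -> bool:
    """Return True if a per-second request pattern stays within the documented limits.

    counts[i] is the number of requests sent during second i.
    Assumption: windows start at second 0, 10, 20, ...
    """
    for start in range(0, len(counts), WINDOW_SECONDS):
        window = counts[start:start + WINDOW_SECONDS]
        if any(c > BURST_LIMIT for c in window):
            return False  # above the burst ceiling
        burst_seconds = sum(1 for c in window if c > DEFAULT_LIMIT)
        if burst_seconds > 1:
            return False  # burst used more than once in this window
    return True
```

For example, a pattern of one 4-request second followed by nine 2-request seconds fits, while two seconds above 2 requests in the same window do not.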

Rate Limit Summary

| Mode    | Limit             | Condition                 |
|---------|-------------------|---------------------------|
| Default | 2 requests/second | Always active             |
| Burst   | 4 requests/second | Once per 10-second window |
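
Under the assumptions of aligned 10-second windows, one burst second per window, and an observation period starting at a window boundary, the table implies an upper bound of 9 × 2 + 4 = 22 requests per window, or 132 per minute. The helper below is a hypothetical capacity-planning sketch, not part of any NOMYO tooling.

```python
DEFAULT_RPS = 2  # requests/second outside burst
BURST_RPS = 4    # requests/second during the single burst second
WINDOW_S = 10    # window length in seconds


def max_requests(seconds: int) -> int:
    """Upper bound on requests in `seconds`, assuming one burst second per aligned window."""
    full, rest = divmod(seconds, WINDOW_S)
    total = full * ((WINDOW_S - 1) * DEFAULT_RPS + BURST_RPS)
    if rest:
        # Partial final window: it may still contain its one burst second.
        total += (rest - 1) * DEFAULT_RPS + BURST_RPS
    return total
```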

Error Responses

429 Too Many Requests

Returned when your request rate exceeds the allowed limit.

HTTP/1.1 429 Too Many Requests

What to do: Back off and retry after a short delay. Implement exponential backoff in your client to avoid repeated limit hits.

503 Service Unavailable (Cool-down)

Returned when an API key repeatedly exceeds the burst allowance. A 30-minute cool-down is then applied to that key.

HTTP/1.1 503 Service Unavailable

What to do: Wait 30 minutes before retrying. Review your request patterns to ensure you stay within the permitted limits.
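
A client can avoid adding load during a cool-down by remembering when it received a 503. The guard below is a hypothetical sketch: it treats every 503 as the start of a 30-minute cool-down, which may be overly cautious if the API also returns 503 for ordinary outages.

```python
import time
from typing import Callable, Optional


class CoolDownGuard:
    """Hypothetical helper: blocks sending for 30 minutes after a 503 response."""

    COOL_DOWN_SECONDS = 30 * 60  # documented cool-down length

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._blocked_until: Optional[float] = None

    def record_status(self, status: int) -> None:
        """Start (or restart) the cool-down when the API answers 503."""
        if status == 503:
            self._blocked_until = self._clock() + self.COOL_DOWN_SECONDS

    def seconds_remaining(self) -> float:
        """Seconds left in the assumed cool-down, or 0.0 if none is active."""
        if self._blocked_until is None:
            return 0.0
        return max(0.0, self._blocked_until - self._clock())

    def may_send(self) -> bool:
        """True when no cool-down is believed to be active."""
        return self.seconds_remaining() == 0.0
```

Pass each response status to record_status() and check may_send() before sending the next request.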

Best Practices

  • Throttle your requests client-side to stay at or below 2 requests/second under normal load.
  • Use burst sparingly — it is intended for occasional spikes, not sustained high-throughput usage.
  • Implement exponential backoff when you receive a 429 response. Start with a short delay (e.g. 500 ms) and double it on each subsequent failure, up to a reasonable maximum.
  • Monitor for 503 responses. Repeated occurrences indicate that your usage pattern is triggering the abuse threshold; adjust your request logic before the cool-down expires so the pattern does not recur.
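
The backoff advice in these practices can be sketched as a generic retry wrapper. RateLimited is a hypothetical exception that your own request code would raise on a 429; the 0.5 s starting delay and 8 s cap are example values, not documented limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception your request code raises when the API answers 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on RateLimited with exponentially growing delays.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, ... capped at max_delay.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable for max_retries >= 0")
```

Adding random jitter to each delay is a common refinement when many clients share the same limits.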

Retry Behaviour

The SecureChatCompletion client retries automatically on HTTP 429, 500, 502, 503 and 504 responses, and on network errors, using exponential backoff (1 s, 2 s, …). By default it retries up to 2 times. You can raise or disable this per client:

# More retries for high-throughput workloads
client = SecureChatCompletion(api_key="...", max_retries=5)

# Disable retries entirely
client = SecureChatCompletion(api_key="...", max_retries=0)
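
The documented retry policy can be modelled to estimate how long a call may block before it gives up. This sketch does not reproduce the client's internals: it assumes the delays keep doubling after 1 s, 2 s, as the sequence suggests, and uses None to stand for a network error.

```python
from typing import List, Optional

RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def is_retryable(status: Optional[int]) -> bool:
    """True for the documented retryable cases; None stands for a network error."""
    return status is None or status in RETRYABLE_STATUSES


def documented_delays(max_retries: int = 2) -> List[float]:
    """Assumed sleep before each retry: 1 s, 2 s, 4 s, ... (one entry per retry)."""
    return [float(2 ** i) for i in range(max_retries)]


def worst_case_wait(max_retries: int = 2) -> float:
    """Total time spent sleeping if every attempt fails."""
    return sum(documented_delays(max_retries))
```

With max_retries=5, a call that fails on every attempt would spend about 31 s sleeping under this assumption. If a 503 signals the 30-minute cool-down, these short retries will not outlast it.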