fix: base_url

doc: created
2026-04-16 16:44:26 +02:00 · 2026-04-16 16:44:26 +02:00 · 43165f86f2
commit 43165f86f2
parent 6e02559f4e
17 changed files with 2151 additions and 293 deletions
--- a/doc/rate-limits.md
+++ b/doc/rate-limits.md
@@ -0,0 +1,115 @@
+# Rate Limits
+
+The NOMYO API (`api.nomyo.ai`) enforces rate limits to ensure fair usage and service stability for all users.
+
+## Default Rate Limit
+
+By default, each API key is limited to **2 requests per second**.
+
+## Burst Allowance
+
+Short bursts above the default limit are permitted: you may send up to **4 requests per second** in burst mode, provided you have not already used your burst allowance in the current **10-second window**.
+
+Burst capacity is granted once per 10-second window. Once you have used it, you must wait for the next window before you can burst again.
+
+## Rate Limit Summary
+
+| Mode | Limit | Condition |
+|------|-------|-----------|
+| Default | 2 requests/second | Always active |
+| Burst | 4 requests/second | Once per 10-second window |
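+
+To make the interaction between the default and burst limits concrete, the rules above can be simulated client-side. This is a sketch under stated assumptions, not the server's algorithm: `RateLimitModel` and `tryAcquire` are hypothetical names, and the model assumes fixed 1-second and 10-second windows aligned to the Unix epoch, which the API may not use.
+
+```javascript
+// Hypothetical client-side model of the documented limits (not the server's code).
+// Assumption: 1-second and 10-second windows are fixed and aligned to the epoch.
+class RateLimitModel {
+  constructor({ baseRps = 2, burstRps = 4, burstWindowMs = 10_000 } = {}) {
+    this.baseRps = baseRps;             // default limit per second
+    this.burstRps = burstRps;           // burst limit per second
+    this.burstWindowMs = burstWindowMs; // burst may be used once per this window
+    this.second = -1;                   // index of the current 1-second window
+    this.count = 0;                     // requests accepted in the current second
+    this.burstSecond = -1;              // second in which burst was last used
+    this.burstWindow = -1;              // 10-second window in which burst was last used
+  }
+
+  // Returns true if a request at `nowMs` fits the model, false if it would get a 429.
+  tryAcquire(nowMs) {
+    const second = Math.floor(nowMs / 1000);
+    const windowIdx = Math.floor(nowMs / this.burstWindowMs);
+    if (second !== this.second) {
+      this.second = second;
+      this.count = 0;
+    }
+    if (this.count < this.baseRps) {
+      this.count++;
+      return true;
+    }
+    // Above the default rate: allowed only up to burstRps, and only if burst is
+    // already in progress this second or unused in the current 10-second window.
+    const burstInProgress = this.burstSecond === second;
+    const burstUnused = this.burstWindow !== windowIdx;
+    if (this.count < this.burstRps && (burstInProgress || burstUnused)) {
+      this.burstSecond = second;
+      this.burstWindow = windowIdx;
+      this.count++;
+      return true;
+    }
+    return false;
+  }
+}
+```
+
+Under this model, four requests in the first second are all accepted (the third and fourth use the burst), a fifth in that second is rejected, and a third request in any later second of the same 10-second window is also rejected because the burst has already been consumed.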
+
+## Error Responses
+
+### 429 Too Many Requests
+
+Returned when your request rate exceeds the allowed limit.
+
+The client retries automatically (see below). If all retries are exhausted, `RateLimitError` is thrown:
+
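+```javascript
+import { SecureChatCompletion, RateLimitError } from 'nomyo-js';
+
+const client = new SecureChatCompletion({ apiKey: process.env.NOMYO_API_KEY });
+
+try {
+  const response = await client.create({
+    model: 'Qwen/Qwen3-0.6B',
+    messages: [{ role: 'user', content: 'Hello!' }],
+  });
+  console.log(response.choices[0].message.content);
+} catch (err) {
+  if (err instanceof RateLimitError) {
+    // All retries exhausted: back off manually before trying again
+    console.error('Rate limit exceeded:', err.message);
+  } else {
+    throw err; // Not a rate-limit problem; let it propagate
+  }
+}
+```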
+
+### 503 Service Unavailable (Cool-down)
+
+Returned when burst limits are abused repeatedly. A **30-minute cool-down** is applied to the offending API key.
+
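+**What to do:** Wait 30 minutes before retrying, and review your request patterns to make sure you stay within the permitted limits. The client's automatic retries (see below) also cover `503`, but with the default settings they finish within a few seconds, so they will not outlast a cool-down.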
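+
+If your code can recognise a cool-down `503` (how the client surfaces the status code is not covered here), a small gate can stop it from sending further requests until the 30 minutes have passed. The sketch below is hypothetical: `CooldownGate` and its methods are illustrative names, not part of `nomyo-js`, and timestamps can be passed in explicitly so the logic is easy to test.
+
+```javascript
+// Hypothetical helper: blocks requests for a fixed period after a cool-down 503.
+class CooldownGate {
+  constructor(cooldownMs = 30 * 60 * 1000) { // 30 minutes, as documented above
+    this.cooldownMs = cooldownMs;
+    this.blockedUntil = 0; // epoch ms before which no request should be sent
+  }
+
+  // Call this when a request fails with a cool-down 503.
+  recordCooldown(nowMs = Date.now()) {
+    this.blockedUntil = nowMs + this.cooldownMs;
+  }
+
+  // Milliseconds left before requests may resume (0 if not blocked).
+  remainingMs(nowMs = Date.now()) {
+    return Math.max(0, this.blockedUntil - nowMs);
+  }
+
+  canSend(nowMs = Date.now()) {
+    return this.remainingMs(nowMs) === 0;
+  }
+}
+```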
+
+## Automatic Retry Behaviour
+
+The client retries automatically on `429`, `500`, `502`, `503`, `504`, and network errors using exponential backoff:
+
+| Attempt | Delay before attempt |
+|---------|----------------------|
+| 1st (initial) | — |
+| 2nd | 1 second |
+| 3rd | 2 seconds |
+
+The default is **2 retries** (3 attempts in total). You can change this per client with the `maxRetries` option:
+
+```javascript
+import { SecureChatCompletion } from 'nomyo-js';
+
+// More retries for background jobs that can tolerate extra latency
+const client = new SecureChatCompletion({
+  apiKey: process.env.NOMYO_API_KEY,
+  maxRetries: 5,
+});
+
+// Disable retries entirely (fail fast)
+const client2 = new SecureChatCompletion({
+  apiKey: process.env.NOMYO_API_KEY,
+  maxRetries: 0,
+});
+```
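+
+The retry policy described above can be sketched as a generic loop. This is an illustration, not the client's internal code: `retryWithBackoff` is a hypothetical name, it assumes errors expose an HTTP `status` property (and treats errors without one as network errors), and delays beyond the third attempt simply continue the doubling pattern from the table (1 s, 2 s, 4 s, ...), which the table itself does not confirm.
+
+```javascript
+// Hypothetical sketch of the documented retry policy (not the client's internals).
+// Assumption: errors expose an HTTP `status`; errors without one count as network errors.
+const RETRYABLE_STATUS = new Set([429, 500, 502, 503, 504]);
+
+function defaultIsRetryable(err) {
+  return err.status === undefined || RETRYABLE_STATUS.has(err.status);
+}
+
+const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
+
+async function retryWithBackoff(fn, {
+  maxRetries = 2,                   // 2 retries = 3 total attempts
+  baseDelayMs = 1000,               // delay before the 2nd attempt
+  isRetryable = defaultIsRetryable,
+  wait = sleep,                     // injectable so tests need not really sleep
+} = {}) {
+  for (let retry = 0; ; retry++) {
+    try {
+      return await fn();
+    } catch (err) {
+      if (retry >= maxRetries || !isRetryable(err)) throw err;
+      // Exponential backoff: 1 s, 2 s, 4 s, ...
+      await wait(baseDelayMs * 2 ** retry);
+    }
+  }
+}
+```
+
+Because the client already retries internally, wrapping `client.create` in a loop like this multiplies the number of attempts; if you use such a wrapper, consider setting `maxRetries: 0` on the client.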
+
+## Best Practices
+
+- **Throttle requests client-side** to stay at or below 2 requests/second under normal load.
+- **Use burst sparingly** — it is intended for occasional spikes, not sustained high-throughput usage.
+- **Increase `maxRetries`** for background jobs that can tolerate extra latency.
+- **Monitor for `503` responses** — repeated occurrences indicate your usage pattern is triggering the abuse threshold.
+- **Parallel requests** (e.g. `Promise.all`) all count against the same per-key limit, so cap how many you start at once when processing large batches.
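+
+Because parallel requests share one limit, a throttle has to coordinate every in-flight caller, not just sequential ones. One way to do this, sketched below, is to hand out evenly spaced start times: each call to `acquire()` reserves the next free slot before it waits, so concurrent callers never start closer together than `1000 / requestsPerSecond` ms. `createRateLimiter` is a hypothetical helper, not part of `nomyo-js`.
+
+```javascript
+// Hypothetical helper (not part of nomyo-js): spaces request starts evenly,
+// even when many callers await acquire() concurrently.
+function createRateLimiter(requestsPerSecond = 2, {
+  now = () => Date.now(),
+  sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
+} = {}) {
+  const intervalMs = 1000 / requestsPerSecond;
+  let nextSlot = 0; // earliest time (ms) at which the next request may start
+
+  return async function acquire() {
+    const t = now();
+    const slot = Math.max(t, nextSlot);
+    nextSlot = slot + intervalMs; // reserve the slot synchronously, before awaiting
+    if (slot > t) await sleep(slot - t);
+  };
+}
+
+// Usage with a client (sketch):
+// const acquire = createRateLimiter(2);
+// const results = await Promise.all(queries.map(async query => {
+//   await acquire();
+//   return client.create({ model: 'Qwen/Qwen3-0.6B', messages: [{ role: 'user', content: query }] });
+// }));
+```
+
+Since slots are spaced at the default rate, this limiter never relies on the burst allowance, which keeps burst available for genuine spikes.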
+
+## Batch Processing Example
+
+Process queries one at a time, spacing requests so the batch stays within the default rate limit:
+
+```javascript
+import { SecureChatCompletion } from 'nomyo-js';
+
+async function throttledBatch(queries, requestsPerSecond = 2) {
+  const client = new SecureChatCompletion({ apiKey: process.env.NOMYO_API_KEY });
+  const results = [];
+  const delayMs = 1000 / requestsPerSecond; // minimum time between request starts
+
+  try {
+    for (const query of queries) {
+      const start = Date.now();
+
+      const response = await client.create({
+        model: 'Qwen/Qwen3-0.6B',
+        messages: [{ role: 'user', content: query }],
+      });
+      results.push(response.choices[0].message.content);
+
+      // Throttle: wait for the remainder of the time slot
+      const elapsed = Date.now() - start;
+      if (elapsed < delayMs) {
+        await new Promise(resolve => setTimeout(resolve, delayMs - elapsed));
+      }
+    }
+  } finally {
+    // Release the client even if a request throws
+    client.dispose();
+  }
+
+  return results;
+}
+```