feat: add automatic client retry logic with exponential backoff

2026-04-15 12:08:21 +02:00
commit 93adb6c45c
parent 5040d181d2
7 changed files with 87 additions and 66 deletions
--- a/doc/api-reference.md
+++ b/doc/api-reference.md
@@ -11,7 +11,8 @@ SecureChatCompletion(
    base_url: str = "https://api.nomyo.ai",
    allow_http: bool = False,
    api_key: Optional[str] = None,
-    secure_memory: bool = True
+    secure_memory: bool = True,
+    max_retries: int = 2
 )
 ```

@@ -21,6 +22,7 @@ SecureChatCompletion(
 - `allow_http` (bool): Allow HTTP connections (ONLY for local development, never in production)
 - `api_key` (Optional[str]): Optional API key for bearer authentication
 - `secure_memory` (bool): Enable secure memory protection (default: True)
+- `max_retries` (int): Number of retries on retryable errors (429, 500, 502, 503, 504, network errors). Uses exponential backoff. Default: 2

 ### Methods

@@ -92,13 +94,18 @@ The `SecureCompletionClient` class handles the underlying encryption, key manage
 ### Constructor

 ```python
-SecureCompletionClient(router_url: str = "https://api.nomyo.ai", allow_http: bool = False)
+SecureCompletionClient(
+    router_url: str = "https://api.nomyo.ai",
+    allow_http: bool = False,
+    max_retries: int = 2
+)
 ```

 **Parameters:**

 - `router_url` (str): Base URL of the NOMYO Router (must use HTTPS for production)
 - `allow_http` (bool): Allow HTTP connections (ONLY for local development, never in production)
+- `max_retries` (int): Number of retries on retryable errors (429, 500, 502, 503, 504, network errors). Uses exponential backoff. Default: 2

 ### Methods

--- a/doc/rate-limits.md
+++ b/doc/rate-limits.md
@@ -48,20 +48,14 @@ HTTP/1.1 503 Service Unavailable
 - **Implement exponential backoff** when you receive a `429` response. Start with a short delay (e.g. 500 ms) and double it on each subsequent failure, up to a reasonable maximum.
 - **Monitor for `503` responses** — repeated occurrences indicate that your usage pattern is triggering the abuse threshold. Refactor your request logic before the cool-down expires.

-## Example: Exponential Backoff
+## Retry Behaviour
+
+The client retries automatically on `429`, `500`, `502`, `503`, `504`, and network errors using exponential backoff (1 s, 2 s, …). The default is **2 retries**. You can raise or disable this per client:

 ```python
-import asyncio
-import httpx
+# More retries for high-throughput workloads
+client = SecureChatCompletion(api_key="...", max_retries=5)

-async def request_with_backoff(client, *args, max_retries=5, **kwargs):
-    delay = 0.5
-    for attempt in range(max_retries):
-        response = await client.create(*args, **kwargs)
-        if response.status_code == 429:
-            await asyncio.sleep(delay)
-            delay = min(delay * 2, 30)
-            continue
-        return response
-    raise RuntimeError("Rate limit exceeded after maximum retries")
+# Disable retries entirely
+client = SecureChatCompletion(api_key="...", max_retries=0)
 ```
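
For reviewers, the retry behaviour documented in `doc/rate-limits.md` above can be sketched roughly as follows. This is an illustrative sketch only, not the client's actual implementation: `backoff_delays`, `call_with_retries`, and the `send` callable are hypothetical names, and the strict doubling schedule is an assumption read from the "1 s, 2 s, …" wording.

```python
import time
from typing import Callable, List, Tuple

# Status codes the documentation lists as retryable.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delays(max_retries: int, base: float = 1.0) -> List[float]:
    """Delay before each retry: base, 2*base, 4*base, ... (doubling is assumed)."""
    return [base * (2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` once, then retry up to `max_retries` times on retryable outcomes.

    `send` is a hypothetical callable returning (status_code, body) and raising
    ConnectionError on network failure.
    """
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            status, body = send()
        except ConnectionError:
            # Network error: re-raise once the retry budget is used up.
            if attempt == max_retries:
                raise
        else:
            # Return on success, on a non-retryable status, or on the last attempt.
            if status not in RETRYABLE_STATUS or attempt == max_retries:
                return status, body
        sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With `max_retries=2` this makes at most three attempts in total, and `max_retries=0` makes exactly one attempt with no sleeping, which matches the "disable retries entirely" case in the docs.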