doc: correction

parent b33bb415dd
commit 3d8e5044d6
2 changed files with 31 additions and 26 deletions

README.md (54)
@@ -2,7 +2,6 @@

**Async semantic caching for LLM API calls — reduce costs with one decorator.**

[](https://pypi.org/project/semantic-llm-cache/)
[](LICENSE)
[](https://pypi.org/project/semantic-llm-cache/)
@@ -21,16 +20,17 @@ LLM API calls are expensive and slow. In production applications, **20-40% of pr

## What changed from the original

| Area                 | Original                  | This fork                                                           |
| -------------------- | ------------------------- | ------------------------------------------------------------------- |
| Backends             | sync (`sqlite3`, `redis`) | async (`aiosqlite`, `redis.asyncio`)                                |
| `@cache` decorator   | sync only                 | auto-detects async/sync                                             |
| `EmbeddingCache`     | sync `encode()`           | adds `async aencode()` via `asyncio.to_thread`                      |
| `CacheContext`       | sync only                 | supports both `with` and `async with`                               |
| `CachedLLM`          | `chat()`                  | adds `achat()`                                                      |
| Utility functions    | sync                      | `clear_cache`, `invalidate`, `warm_cache`, `export_cache` all async |
| `StorageBackend` ABC | sync abstract methods     | all abstract methods are `async def`                                |
| Min Python           | 3.9                       | 3.10 (uses `X \| Y` union syntax)                                   |
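The "auto-detects async/sync" behavior can be made concrete with a minimal sketch. This is a toy dict-backed decorator, not the library's implementation: `inspect.iscoroutinefunction` decides whether to return an `async` wrapper or a plain one.

```python
import asyncio
import functools
import inspect

def cache(func=None, *, enabled=True):
    """Toy decorator: picks an async or sync wrapper based on the
    decorated function. Illustrative only — uses a plain dict store."""
    store = {}

    def decorate(fn):
        if inspect.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def awrapper(*args):
                if enabled and args in store:
                    return store[args]          # cache hit
                store[args] = result = await fn(*args)
                return result
            return awrapper

        @functools.wraps(fn)
        def swrapper(*args):
            if enabled and args in store:
                return store[args]              # cache hit
            store[args] = result = fn(*args)
            return result
        return swrapper

    # Support both @cache and @cache(enabled=...)
    return decorate(func) if func is not None else decorate

@cache
async def ask(prompt: str) -> str:
    return f"answer to {prompt!r}"

print(asyncio.run(ask("hi")))  # → answer to 'hi'; a second call hits the dict
```

The real decorator layers embeddings, TTLs, and backends on top, but the dispatch idea is the same: one decorator, both calling conventions.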
## Installation
@@ -198,14 +198,15 @@ async def my_llm_function(prompt: str) -> str:

### Parameters

| Parameter    | Type          | Default     | Description                                               |
| ------------ | ------------- | ----------- | --------------------------------------------------------- |
| `similarity` | `float`       | `1.0`       | Cosine similarity threshold (1.0 = exact, 0.9 = semantic) |
| `ttl`        | `int \| None` | `3600`      | Time-to-live in seconds (None = never expires)            |
| `backend`    | `Backend`     | `None`      | Storage backend (None = in-memory)                        |
| `namespace`  | `str`         | `"default"` | Isolate different use cases                               |
| `enabled`    | `bool`        | `True`      | Enable/disable caching                                    |
| `key_func`   | `Callable`    | `None`      | Custom cache key function                                 |
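What the `similarity` threshold compares can be sketched with plain cosine similarity over two hypothetical embedding vectors (illustrative values, not the library's internals):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical embeddings of two near-duplicate prompts.
emb_a = [0.9, 0.1, 0.4]
emb_b = [0.8, 0.2, 0.4]

score = cosine_similarity(emb_a, emb_b)
# At similarity=1.0 only an identical embedding is a hit;
# at similarity=0.9 this pair (score ≈ 0.99) would be served from cache.
print(round(score, 3))
```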

### Utility Functions
@@ -221,19 +222,21 @@ from semantic_llm_cache.stats import (

## Backends

| Backend         | Description                          | I/O                       |
| --------------- | ------------------------------------ | ------------------------- |
| `MemoryBackend` | In-memory LRU (default)              | none — runs in event loop |
| `SQLiteBackend` | Persistent, file-based (`aiosqlite`) | async non-blocking        |
| `RedisBackend`  | Distributed (`redis.asyncio`)        | async non-blocking        |

## Embedding Providers

| Provider                      | Quality                      | Notes                       |
| ----------------------------- | ---------------------------- | --------------------------- |
| `DummyEmbeddingProvider`      | hash-only, no semantic match | zero deps, default          |
| `SentenceTransformerProvider` | high (local model)           | requires `[semantic]` extra |
| `OpenAIEmbeddingProvider`     | high (API)                   | requires `[openai]` extra   |

Embedding inference is offloaded via `asyncio.to_thread` — model loading is blocking and should be done at application startup, not on first request.
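That offload pattern can be sketched with a hypothetical `SlowModel` (not one of the provider classes above): the blocking `encode` runs in a worker thread, and the model is constructed once at import/startup time.

```python
import asyncio
import time

class SlowModel:
    """Hypothetical stand-in for a blocking embedding model."""

    def encode(self, text: str) -> list[float]:
        time.sleep(0.01)                    # simulate blocking inference
        return [float(ord(c)) for c in text[:4]]

# Load once at startup, not on first request — construction blocks too.
MODEL = SlowModel()

async def aencode(text: str) -> list[float]:
    # to_thread keeps the event loop responsive during the blocking call.
    return await asyncio.to_thread(MODEL.encode, text)

print(asyncio.run(aencode("hi")))  # → [104.0, 105.0]
```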
@@ -250,8 +253,9 @@ embedding = await embedding_cache.aencode("my prompt")

## Performance

| Metric                     | Value                                    |
| -------------------------- | ---------------------------------------- |
| Cache hit latency          | <10ms                                    |
| Embedding overhead on miss | ~50ms (sentence-transformers, offloaded) |
| Typical hit rate           | 25-40%                                   |
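A back-of-the-envelope view of what the typical hit-rate range means for spend, using hypothetical request volume and pricing (not measured figures):

```python
# Hypothetical workload: 100,000 LLM calls/day at $0.002 per call.
calls_per_day = 100_000
cost_per_call = 0.002

for hit_rate in (0.25, 0.40):  # the table's typical hit-rate range
    saved = calls_per_day * cost_per_call * hit_rate
    print(f"hit rate {hit_rate:.0%}: ${saved:.2f}/day saved")
# → hit rate 25%: $50.00/day saved
# → hit rate 40%: $80.00/day saved
```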
@@ -20,6 +20,7 @@ keywords = [
    "openai",
    "anthropic",
    "ollama",
    "llama.cpp",
    "prompt",
    "optimization",
    "cost-reduction",