doc: correction
This commit is contained in:
parent b33bb415dd
commit 3d8e5044d6
2 changed files with 31 additions and 26 deletions
README.md
@@ -2,7 +2,6 @@
**Async semantic caching for LLM API calls — reduce costs with one decorator.**
[](https://pypi.org/project/semantic-llm-cache/)
[](LICENSE)
[](https://pypi.org/project/semantic-llm-cache/)
@@ -21,8 +20,9 @@ LLM API calls are expensive and slow. In production applications, **20-40% of pr
## What changed from the original
| Area                 | Original                  | This fork                                                           |
| -------------------- | ------------------------- | ------------------------------------------------------------------- |
| Backends             | sync (`sqlite3`, `redis`) | async (`aiosqlite`, `redis.asyncio`)                                 |
| `@cache` decorator   | sync only                 | auto-detects async/sync                                              |
| `EmbeddingCache`     | sync `encode()`           | adds `async aencode()` via `asyncio.to_thread`                       |
| `CachedLLM`          | `chat()`                  | adds `achat()`                                                       |
| Utility functions    | sync                      | `clear_cache`, `invalidate`, `warm_cache`, `export_cache` all async  |
| `StorageBackend` ABC | sync abstract methods     | all abstract methods are `async def`                                 |
| Min Python           | 3.9                       | 3.10 (uses `X \| Y` union syntax)                                    |
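The "auto-detects async/sync" row can be illustrated with `asyncio.iscoroutinefunction`. This is only a minimal sketch: it uses a toy exact-match dict instead of the package's semantic lookup, and the internals are assumptions, not the fork's real implementation.

```python
import asyncio
import functools

def cache(func=None, *, similarity=1.0):
    # Sketch only: the real decorator embeds prompts and compares cosine
    # similarity; here a plain dict keyed on the call arguments stands in.
    def decorator(fn):
        store = {}
        if asyncio.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def awrapper(*args, **kwargs):
                key = (args, tuple(sorted(kwargs.items())))
                if key not in store:
                    store[key] = await fn(*args, **kwargs)
                return store[key]
            return awrapper

        @functools.wraps(fn)
        def swrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            if key not in store:
                store[key] = fn(*args, **kwargs)
            return store[key]
        return swrapper

    # Support both bare @cache and parameterized @cache(similarity=...).
    return decorator if func is None else decorator(func)
```

The same decorator object then works on both `def` and `async def` functions without a separate `@acache` variant.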
## Installation
@@ -198,10 +198,11 @@ async def my_llm_function(prompt: str) -> str:
### Parameters
| Parameter    | Type          | Default     | Description                                               |
| ------------ | ------------- | ----------- | --------------------------------------------------------- |
| `similarity` | `float`       | `1.0`       | Cosine similarity threshold (1.0 = exact, 0.9 = semantic) |
| `ttl`        | `int \| None` | `3600`      | Time-to-live in seconds (None = never expires)            |
| `backend`    | `Backend`     | `None`      | Storage backend (None = in-memory)                        |
| `namespace`  | `str`         | `"default"` | Isolate different use cases                               |
| `enabled`    | `bool`        | `True`      | Enable/disable caching                                    |
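For intuition on the `similarity` parameter: a cached entry counts as a hit when the cosine similarity between the stored prompt embedding and the incoming one meets the threshold. A minimal sketch of that comparison (the function names here are illustrative, not the package API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|), the standard cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_cache_hit(stored: list[float], query: list[float],
                 threshold: float = 1.0) -> bool:
    # threshold 1.0 -> effectively an exact embedding match;
    # 0.9 -> near-duplicate prompts also hit the cache.
    return cosine_similarity(stored, query) >= threshold
```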
@@ -221,16 +222,18 @@ from semantic_llm_cache.stats import (
## Backends
| Backend         | Description                          | I/O                       |
| --------------- | ------------------------------------ | ------------------------- |
| `MemoryBackend` | In-memory LRU (default)              | none — runs in event loop |
| `SQLiteBackend` | Persistent, file-based (`aiosqlite`) | async non-blocking        |
| `RedisBackend`  | Distributed (`redis.asyncio`)        | async non-blocking        |
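Rough shape of a custom backend against an all-`async def` ABC, with an in-memory LRU like `MemoryBackend`. The `get`/`set` method names and signatures are assumptions for illustration, not necessarily the fork's exact interface:

```python
import abc
import asyncio
from collections import OrderedDict
from typing import Any

class StorageBackend(abc.ABC):
    # Per the fork's changes: every abstract method is `async def`.
    @abc.abstractmethod
    async def get(self, key: str) -> Any: ...

    @abc.abstractmethod
    async def set(self, key: str, value: Any) -> None: ...

class MemoryBackend(StorageBackend):
    """Toy in-memory LRU: no real I/O, so it runs inside the event loop."""

    def __init__(self, max_size: int = 128) -> None:
        self._data: OrderedDict[str, Any] = OrderedDict()
        self._max_size = max_size

    async def get(self, key: str) -> Any:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        return None

    async def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used
```

Because the memory backend never awaits real I/O, its methods complete without leaving the event loop, matching the "none" I/O column above.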
## Embedding Providers
| Provider                      | Quality                      | Notes                       |
| ----------------------------- | ---------------------------- | --------------------------- |
| `DummyEmbeddingProvider`      | hash-only, no semantic match | zero deps, default          |
| `SentenceTransformerProvider` | high (local model)           | requires `[semantic]` extra |
| `OpenAIEmbeddingProvider`     | high (API)                   | requires `[openai]` extra   |
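The fork's `aencode()` follows the standard `asyncio.to_thread` pattern for offloading CPU-bound work. This sketch fakes the embedding step (the real provider calls a model such as sentence-transformers), so treat the body of `encode()` as a placeholder:

```python
import asyncio

class EmbeddingCache:
    """Sketch: wrap a blocking encode() so it doesn't stall the event loop."""

    def encode(self, text: str) -> list[float]:
        # Stand-in for a CPU-bound embedding call.
        return [float(ord(ch)) for ch in text]

    async def aencode(self, text: str) -> list[float]:
        # Runs encode() in a worker thread; the loop stays responsive.
        return await asyncio.to_thread(self.encode, text)
```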
@@ -250,8 +253,9 @@ embedding = await embedding_cache.aencode("my prompt")
## Performance
| Metric                     | Value                                    |
| -------------------------- | ---------------------------------------- |
| Cache hit latency          | <10ms                                    |
| Embedding overhead on miss | ~50ms (sentence-transformers, offloaded) |
| Typical hit rate           | 25-40%                                   |
@@ -20,6 +20,7 @@ keywords = [
    "openai",
    "anthropic",
    "ollama",
    "llama.cpp",
    "prompt",
    "optimization",
    "cost-reduction",