nomyo-router

Author	SHA1	Message	Date
alpha-nerd-nomyo	ca773d6ddb	Merge pull request #35 from nomyo-ai/dev-v0.7.x-semcache Dev v0.7.x semcache -> dev-v0.7.x	2026-03-11 09:40:55 +01:00
alpha-nerd-nomyo	46da392a53	fix: semcache version pinned	2026-03-11 09:40:00 +01:00
alpha-nerd-nomyo	fbdc73eebb	fix: improvements, fixes and opt-in cache doc: semantic-cache.md added with detailed write-up	2026-03-10 15:19:37 +01:00
alpha-nerd-nomyo	a5108486e3	conf: clean default conf	2026-03-08 09:35:40 +01:00
alpha-nerd-nomyo	e8b8981421	doc: updated usage.md	2026-03-08 09:26:53 +01:00
alpha-nerd-nomyo	dd4b12da6a	feat: adding a semantic cache layer	2026-03-08 09:12:09 +01:00
alpha-nerd-nomyo	c3d47c7ffe	docs: adding ghcr docker pull instructions	2026-03-05 11:54:42 +01:00
alpha-nerd-nomyo	b951cc82e3	bump version	2026-03-05 11:09:20 +01:00
alpha-nerd-nomyo	00a06dca51	feat: add docker publish workflow	2026-03-05 11:09:16 +01:00
alpha-nerd-nomyo	8037706f0b	fix(db.py): remove full table scans with proper where clauses for dashboard statistics and calc in db rather than python	2026-03-03 17:20:33 +01:00
alpha-nerd-nomyo	45315790d1	fix(router.py): - added global for orphaned token_worker_task and flust_task - fixed a regex to effectively _mask_secrets - fixed several Type and KeyErrors - fixed model deduplication for llama_server_endpoints	2026-03-03 16:34:16 +01:00
alpha-nerd-nomyo	e96e890511	refactor: make choose_endpoint use cache incrementer for atomic updates	2026-03-03 14:57:37 +01:00
alpha-nerd-nomyo	e7196146ad	feat: add uvloop to requirements.txt as optional dependency to improve performance in high concurrent scenarios	2026-03-03 10:31:10 +01:00
alpha-nerd-nomyo	10c83c3e1e	fix(router): treat missing status as loaded for llama model check Add check for `status is None` in `_is_llama_model_loaded`. Models without a status field (e.g., single-model servers) are assumed to be always loaded rather than failing the check. Also updated docstring to clarify this behavior.	2026-03-02 08:54:46 +01:00
alpha-nerd-nomyo	cac0580eec	feat: adding /v1/rerank endpoint with cohere,jina,llama.cpp compatibility	2026-02-28 09:31:25 +01:00
alpha-nerd-nomyo	ad4a1d07b2	fix(/v1/embeddings): returning the async_gen forced FastAPI serialization which caused Pydantic Errors. Also sanitized nan/inf values to floats (0.0). Use try - finally to properly decrement usage counters in case of error.	2026-02-27 16:39:27 +01:00
alpha-nerd-nomyo	d2ea65f74a	fix(router): use normalized model keys for endpoint selection Refactor endpoint selection logic to consistently use tracking model keys (normalized via `get_tracking_model`) instead of raw model names, ensuring usage counts are accurately compared with how increment/decrement operations store them. This fixes inconsistent load balancing and model affinity behavior caused by mismatches between raw and tracked model identifiers.	2026-02-19 17:32:54 +01:00
alpha-nerd-nomyo	07751ddd3b	fix: endpoint selection logic again	2026-02-19 10:11:53 +01:00
alpha-nerd-nomyo	7cba67cce0	feat(router): normalize model names for usage tracking across endpoints (continued) Introduce `get_tracking_model()` to standardize model names for consistent usage tracking in Prometheus metrics. This ensures llama-server models are stripped of HF prefixes and quantization suffixes, Ollama models append `:latest` when versionless, and external OpenAI models remain unchanged—aligning all tracking keys with the PS table.	2026-02-18 11:45:37 +01:00
alpha-nerd-nomyo	b2980a7d24	fix(router): handle invalid version responses with 503 error Filter out non-string version responses (e.g., empty lists from failed requests) and return a 503 Service Unavailable error if no valid versions are received from any endpoint.	2026-02-17 15:56:09 +01:00
alpha-nerd-nomyo	836c5f41ea	fix(router): normalize model names for usage tracking across endpoints	2026-02-17 11:35:53 +01:00
alpha-nerd-nomyo	372fe9fb72	feat(router): parallelize llama-server props fetch and add reasoning/tool call support - Fetch `/props` endpoints in parallel to get context length and auto-unload sleeping models - Add support for reasoning content and tool calls in streaming openai chat/completions responses	2026-02-15 17:05:35 +01:00
alpha-nerd-nomyo	4d40048fd2	fix: loaded_models_cache timing restored	2026-02-15 12:15:36 +01:00
alpha-nerd-nomyo	0bad604b02	feat: deduplicate background refresh tasks and extend cache TTL Adds lock-protected dictionaries to track running background refresh tasks, preventing duplicate executions per endpoint. Increases cache freshness thresholds from 30s to 300s to reduce blocking behavior. fix: /v1 endpoints use correct media_types and usage information with proper logging	2026-02-14 14:51:44 +01:00
alpha-nerd-nomyo	c9ff384bb2	fix(router): /v1/models endpoint Shows now all available models	2026-02-13 16:27:06 +01:00
alpha-nerd-nomyo	4d80dc5e7c	feat: adding logprobs to /v1/chat/completion	2026-02-13 14:43:10 +01:00
alpha-nerd-nomyo	eda48562da	feat(router): add logprob support in /api/chat Add logprob support to the OpenAI-to-Ollama proxy by converting OpenAI logprob formats to Ollama types. Also update the ollama dependency.	2026-02-13 13:29:45 +01:00
alpha-nerd-nomyo	07af6e2e36	fix: better sample config	2026-02-13 10:52:14 +01:00
alpha-nerd-nomyo	9ef1b770ba	Merge pull request #25 from nomyo-ai/dev-v0.6 - updated reasoning handling - improved model and error caches - fixed openai tool calling incl. ollama translations - direct support for llama.cpp's llama_server via llama_server_endpoint config - basic llama_server model info in dashboard - improved endpoint info fetching behaviour in error cases	2026-02-13 10:34:42 +01:00
alpha-nerd-nomyo	1b355d8435	Merge branch 'main' into dev-v0.6	2026-02-13 10:33:36 +01:00
alpha-nerd-nomyo	c545f413a5	Merge pull request #23 from JTHesse/main Fix for SSL verification and SQL Bug	2026-02-13 10:19:07 +01:00
alpha-nerd-nomyo	08b77428b8	refactor(router): bump cache TTLs and skip error cache for health checks - Increased error and loaded model cache freshness thresholds from 10s to 30s. - Added `skip_error_cache` parameter to `endpoint_details` to prevent cached failures from blocking health checks. - Implemented automatic error recording in `_available_error_cache` on API request failures.	2026-02-13 10:11:41 +01:00
alpha-nerd-nomyo	f7ef413090	replays `3af166c8a4` to grant merge into main	2026-02-12 16:28:40 +01:00
alpha-nerd-nomyo	b649dcd8d6	proposal: use global truststore ctx for all connections	2026-02-12 16:15:39 +01:00
alpha-nerd-nomyo	5c4e1e81a6	Merge pull request #24 from nomyo-ai:dependabot/pip/pillow-12.1.1 Bump pillow from 11.3.0 to 12.1.1	2026-02-12 15:58:33 +01:00
dependabot[bot]	99f5a3bc91	Bump pillow from 11.3.0 to 12.1.1 Bumps [pillow](https://github.com/python-pillow/Pillow) from 11.3.0 to 12.1.1. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/11.3.0...12.1.1) --- updated-dependencies: - dependency-name: pillow dependency-version: 12.1.1 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-02-11 17:43:19 +00:00
Jan-Timo	dd30ab9422	fix SSL: CERTIFICATE_VERIFY_FAILED	2026-02-11 13:47:11 +01:00
Jan-Timo	3af166c8a4	fix sqlite3.OperationalError: no such table: main.token_time_series	2026-02-11 13:46:37 +01:00
alpha-nerd-nomyo	9875eb977a	feat: Add tool call normalization and streaming delta accumulation Adds support for correctly handling tool calls in chat requests. Normalizes tool call data (ensuring IDs, types, and JSON arguments) in non-streaming mode and accumulates OpenAI-style deltas during streaming to build the final Ollama response.	2026-02-10 20:21:46 +01:00
alpha-nerd-nomyo	4892998abc	feat(router): Add llama-server endpoints support and model parsing Add `llama_server_endpoints` configuration field to support llama_server OpenAI-compatible endpoints for status checks. Implement helper functions to parse model names and quantization levels from llama-server responses (best effort). Update `is_ext_openai_endpoint` to properly distinguish these endpoints from external OpenAI services. Update sample configuration documentation.	2026-02-10 16:46:51 +01:00
alpha-nerd-nomyo	1f81e69ce1	refactor(router.py): correctly implement OpenAI tool_calls to Ollama format conversion	2026-02-09 11:04:14 +01:00
alpha-nerd-nomyo	7deb088c6a	refactor(cache): split error cache and add stale-while-revalidate Refactor error tracking to use separate caches for 'available' and 'loaded' models, preventing cross-contamination of transient errors. Implement background refresh for available models to prevent blocking requests, and use stale-while-revalidate (300-600s) to serve stale data immediately when the cache is between 300s and 600s old.	2026-02-08 16:46:40 +01:00
alpha-nerd-nomyo	92cea1dead	feat: update reasoning handling Updated reasoning content handling in router.py to check for both "reasoning_content" and "reasoning" attributes.	2026-02-08 11:29:47 +01:00
alpha-nerd-nomyo	bd0d210b2a	feat: enforce api key authentication and update table header - Added proper API key validation in router.py with 401 response when key is missing - Implemented CORS headers for authentication requests - Updated table header from "Until" to "Unload" in static/index.html - Improved security by preventing API key leakage in access logs	2026-02-01 10:05:46 +01:00
alpha-nerd-nomyo	b718d575b7	Merge pull request #22 from nomyo-ai/dev-v0.5.x Dev v0.5.x	2026-01-30 18:18:42 +01:00
alpha-nerd-nomyo	d80b29e4f2	feat: enhance code quality and documentation - Renamed Feedback class to follow PascalCase convention - Fixed candidate enumeration start index from 0 to 1 - Simplified candidate content access by removing .message.content - Updated CONFIG_PATH environment variable name to CONFIG_PATH_ARG - Bumped version from 0.5 to 0.6 - Removed unnecessary return statement and trailing newline	2026-01-29 19:59:08 +01:00
alpha-nerd-nomyo	0ebfa7c519	security: bump orjson to >=3.11.5 preventing a recursive DOS attack	2026-01-29 18:12:05 +01:00
alpha-nerd-nomyo	4ca1a5667e	feat(router): implement in-flight request tracking to prevent cache stampede in high concurrency scenarios Added in-flight request tracking mechanism to prevent cache stampede when multiple concurrent requests arrive for the same endpoint. This introduces new dictionaries to track ongoing requests and a lock to coordinate access. The available_models method was refactored to use an internal helper function and includes request coalescing logic to ensure only one HTTP request is made per endpoint when cache entries expire. The loaded_models method was also updated to use the new caching and coalescing pattern.	2026-01-29 18:00:33 +01:00
alpha-nerd-nomyo	7c25ffafb2	Merge pull request #21 from YetheSamartaka/model-ps-improvements Add endpoint differentiation for models ps board	2026-01-29 10:57:22 +01:00
alpha-nerd-nomyo	efdf14a207	fix: optimize table column widths and improve time formatting for responsive layout - Reduced min-width of model columns from 340px to 200px with max-width of 300px - Added specific styling for narrow columns (3rd-5th) with fixed width and center alignment - Removed "Instance count" as it has redundant information - Enhanced time formatting logic to show relative time instead of absolute dates - Simplified digest display to show last 6 characters instead of truncated format - Added proper handling for various time value types (number, string, null)	2026-01-29 10:54:43 +01:00


213 commits