alpha-nerd-nomyo 067cdf641a feat: add timestamp index and improve cache concurrency

- Added index on token_time_series timestamp for faster queries
- Introduced cache locks to prevent race conditions

2026-01-16 16:47:24 +01:00

5.5 KiB

Raw Blame History

Race Condition Fixes in router.py

This document summarizes all the race condition fixes that have been implemented in the router.py file.

Summary of Issues Found and Fixed

1. Cache Access Race Conditions

Problem: The _models_cache, _loaded_models_cache, and _error_cache dictionaries were being accessed concurrently without proper synchronization, leading to potential data corruption and inconsistent state.

Solution: Added three new asyncio.Lock objects to protect cache access:

_models_cache_lock - protects _models_cache
_loaded_models_cache_lock - protects _loaded_models_cache
_error_cache_lock - protects _error_cache

Implementation:

All cache reads now acquire the appropriate lock before accessing the cache
All cache writes now acquire the appropriate lock before modifying the cache
Cache entries are checked for freshness while holding the lock
Stale entries are removed while holding the lock

2. Timestamp Calculation Race Condition

Problem: In the token_worker() function, the timestamp calculation was performed inside the lock, which could lead to:

Inconsistent timestamps across multiple token entries
Potential delays in processing due to lock contention
Race conditions if the datetime object was modified between calculation and use

Solution: Moved timestamp calculation outside the lock:

Calculate timestamp before acquiring the buffer lock
Pass the pre-calculated timestamp into the critical section
This ensures consistent timestamps while minimizing lock duration

Implementation:

# Calculate timestamp once before acquiring lock
now = datetime.now(tz=timezone.utc)
timestamp = int(datetime(now.year, now.month, now.day, now.hour, now.minute, tzinfo=timezone.utc).timestamp())

# Then use it inside the lock
async with buffer_lock:
    time_series_buffer.append({
        'endpoint': endpoint,
        'model': model,
        'input_tokens': prompt,
        'output_tokens': comp,
        'total_tokens': prompt + comp,
        'timestamp': timestamp  # Use pre-calculated timestamp
    })

3. Buffer Access Race Conditions

Problem: The token_buffer and time_series_buffer were accessed concurrently from multiple sources (token_worker, flush_buffer, flush_remaining_buffers) without proper synchronization.

Solution: The existing buffer_lock was correctly used, but we ensured all access patterns properly acquire the lock.

Implementation:

All reads and writes to token_buffer are protected by buffer_lock
All reads and writes to time_series_buffer are protected by buffer_lock
Buffer copies are made while holding the lock, then released before DB operations

4. Usage Count Race Conditions

Problem: The usage_counts and token_usage_counts dictionaries were accessed concurrently without proper synchronization.

Solution: The existing usage_lock and token_usage_lock were correctly used throughout the codebase.

Implementation:

All increments/decrements of usage_counts are protected by usage_lock
All increments/decrements of token_usage_counts are protected by token_usage_lock
The publish_snapshot() function acquires usage_lock before creating snapshots

5. Subscriber Management Race Conditions

Problem: The _subscribers set was accessed concurrently without proper synchronization.

Solution: The existing _subscribers_lock was correctly used throughout the codebase.

Implementation:

All additions/removals from _subscribers are protected by _subscribers_lock
The subscribe() function acquires the lock before adding new subscribers
The unsubscribe() function acquires the lock before removing subscribers
The publish_snapshot() function acquires the lock before iterating over subscribers

Testing Recommendations

To verify the race condition fixes:

Concurrent Request Testing:
- Send multiple simultaneous requests to the same endpoint/model
- Verify that usage counts remain consistent
- Verify that token counts are accurately tracked
Cache Consistency Testing:
- Rapidly query /api/tags and /api/ps endpoints
- Verify that cached responses remain consistent
- Verify that stale cache entries are properly invalidated
Timestamp Consistency Testing:
- Send multiple requests in quick succession
- Verify that timestamps in time_series_buffer are consistent and accurate
- Verify that timestamps are properly rounded to minute boundaries
Load Testing:
- Simulate high load with many concurrent connections
- Verify that the system remains stable under load
- Verify that no deadlocks occur

Performance Considerations

The race condition fixes introduce additional locking, which may have performance implications:

Lock Granularity: The locks are fine-grained (one per cache type), minimizing contention.
Lock Duration: Locks are held for minimal durations - typically just for cache reads/writes.
Concurrent Operations: Multiple endpoints/models can be processed concurrently as long as they don't contend for the same cache entry.
Monitoring: Consider adding metrics to track lock contention and wait times.

Future Improvements

Read-Write Locks: Consider using read-write locks if read-heavy workloads are identified.
Cache Invalidation: Implement more sophisticated cache invalidation strategies.
Lock Timeouts: Add timeout mechanisms to prevent deadlocks.
Performance Monitoring: Add instrumentation to track lock contention and performance.

5.5 KiB Raw Blame History

Race Condition Fixes in router.py

Summary of Issues Found and Fixed

1. Cache Access Race Conditions

2. Timestamp Calculation Race Condition

3. Buffer Access Race Conditions

4. Usage Count Race Conditions

5. Subscriber Management Race Conditions

Testing Recommendations

Performance Considerations

Future Improvements

5.5 KiB

Raw Blame History