Race Condition Fixes in router.py
This document summarizes the race condition fixes implemented in router.py.
Summary of Issues Found and Fixed
1. Cache Access Race Conditions
Problem: The _models_cache, _loaded_models_cache, and _error_cache dictionaries were being accessed concurrently without proper synchronization, leading to potential data corruption and inconsistent state.
Solution: Added three new asyncio.Lock objects to protect cache access:
- _models_cache_lock: protects _models_cache
- _loaded_models_cache_lock: protects _loaded_models_cache
- _error_cache_lock: protects _error_cache
Implementation:
- All cache reads now acquire the appropriate lock before accessing the cache
- All cache writes now acquire the appropriate lock before modifying the cache
- Cache entries are checked for freshness while holding the lock
- Stale entries are removed while holding the lock
2. Timestamp Calculation Race Condition
Problem: In the token_worker() function, the timestamp calculation was performed inside the lock, which could lead to:
- Inconsistent timestamps across multiple token entries
- Potential delays in processing due to lock contention
- Mismatched values if the timestamp was recomputed between calculation and use (datetime objects are immutable, so the risk is recomputation, not in-place modification)
Solution: Moved timestamp calculation outside the lock:
- Calculate timestamp before acquiring the buffer lock
- Pass the pre-calculated timestamp into the critical section
- This ensures consistent timestamps while minimizing lock duration
Implementation:
from datetime import datetime, timezone

# Calculate the minute-rounded timestamp once, before acquiring the lock
now = datetime.now(tz=timezone.utc)
timestamp = int(datetime(now.year, now.month, now.day, now.hour, now.minute, tzinfo=timezone.utc).timestamp())
# Then use it inside the lock
async with buffer_lock:
    time_series_buffer.append({
        'endpoint': endpoint,
        'model': model,
        'input_tokens': prompt,
        'output_tokens': comp,
        'total_tokens': prompt + comp,
        'timestamp': timestamp  # Use pre-calculated timestamp
    })
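The minute rounding in the snippet above can also be written with datetime.replace, which yields the same value for a timezone-aware datetime. The helper name minute_bucket is hypothetical and is shown only to make the rounding easy to test in isolation.

```python
from datetime import datetime, timezone


def minute_bucket(now: datetime) -> int:
    """Floor an aware datetime to the start of its minute, as a Unix timestamp."""
    # Zero out seconds and microseconds; tzinfo is preserved by replace().
    return int(now.replace(second=0, microsecond=0).timestamp())


# Usage: compute once, outside any lock, then pass the integer into the critical section.
bucket = minute_bucket(datetime.now(tz=timezone.utc))
```

Because the result is a plain integer, every entry appended in the same pass shares the same value regardless of how long the lock takes to acquire.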
3. Buffer Access Race Conditions
Problem: The token_buffer and time_series_buffer were accessed concurrently from multiple sources (token_worker, flush_buffer, flush_remaining_buffers) without proper synchronization.
Solution: The existing buffer_lock was already used correctly; we verified that every access path acquires it.
Implementation:
- All reads and writes to token_buffer are protected by buffer_lock
- All reads and writes to time_series_buffer are protected by buffer_lock
- Buffer copies are made while holding the lock, which is released before any DB operations
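A minimal sketch of the copy-then-release pattern from the last point, assuming a list-based buffer. The write_rows coroutine is a hypothetical placeholder for the real database write, and this flush_buffer is an illustration rather than the function in router.py.

```python
import asyncio
from typing import Dict, List

# Hypothetical stand-ins for the buffer and lock named above.
time_series_buffer: List[Dict] = []
buffer_lock = asyncio.Lock()


async def write_rows(rows: List[Dict]) -> None:
    """Placeholder for the real database write (assumption)."""
    await asyncio.sleep(0)


async def flush_buffer() -> int:
    """Drain the buffer and persist its contents; return the number of rows."""
    # Copy and clear under the lock so no entry is lost or flushed twice.
    async with buffer_lock:
        pending = list(time_series_buffer)
        time_series_buffer.clear()
    # The lock is released here, so producers are not blocked during the write.
    if pending:
        await write_rows(pending)
    return len(pending)
```

Holding the lock only for the copy keeps slow I/O out of the critical section.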
4. Usage Count Race Conditions
Problem: The usage_counts and token_usage_counts dictionaries were accessed concurrently without proper synchronization.
Solution: The existing usage_lock and token_usage_lock were correctly used throughout the codebase.
Implementation:
- All increments/decrements of usage_counts are protected by usage_lock
- All increments/decrements of token_usage_counts are protected by token_usage_lock
- The publish_snapshot() function acquires usage_lock before creating snapshots
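The locking described above might look like the sketch below. The nested shape of usage_counts (endpoint, then model, then count) and the helper names increment_usage, decrement_usage, and snapshot_usage are assumptions made for illustration.

```python
import asyncio
from collections import defaultdict
from typing import Dict

# Assumed shape: endpoint -> model -> number of in-flight requests.
usage_counts = defaultdict(lambda: defaultdict(int))
usage_lock = asyncio.Lock()


async def increment_usage(endpoint: str, model: str) -> None:
    async with usage_lock:
        usage_counts[endpoint][model] += 1


async def decrement_usage(endpoint: str, model: str) -> None:
    async with usage_lock:
        # Clamp at zero so a stray decrement cannot produce a negative count.
        usage_counts[endpoint][model] = max(0, usage_counts[endpoint][model] - 1)


async def snapshot_usage() -> Dict[str, Dict[str, int]]:
    """Copy the counters under the lock so the snapshot is internally consistent."""
    async with usage_lock:
        return {ep: dict(models) for ep, models in usage_counts.items()}
```

Returning a copy lets the caller serialize or publish the snapshot after the lock has been released.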
5. Subscriber Management Race Conditions
Problem: The _subscribers set was accessed concurrently without proper synchronization.
Solution: The existing _subscribers_lock was correctly used throughout the codebase.
Implementation:
- All additions/removals from _subscribers are protected by _subscribers_lock
- The subscribe() function acquires the lock before adding new subscribers
- The unsubscribe() function acquires the lock before removing subscribers
- The publish_snapshot() function acquires the lock before iterating over subscribers
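As a rough sketch of the three functions above, assuming each subscriber is an asyncio.Queue (the source does not say what a subscriber is) and that publish_snapshot receives an already-built snapshot string:

```python
import asyncio
from typing import Set

# Assumption: subscribers are bounded queues that consumers read from.
_subscribers: Set[asyncio.Queue] = set()
_subscribers_lock = asyncio.Lock()


async def subscribe() -> asyncio.Queue:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    async with _subscribers_lock:
        _subscribers.add(queue)
    return queue


async def unsubscribe(queue: asyncio.Queue) -> None:
    async with _subscribers_lock:
        _subscribers.discard(queue)


async def publish_snapshot(snapshot: str) -> int:
    """Deliver a snapshot to every subscriber; return how many received it."""
    delivered = 0
    async with _subscribers_lock:
        # Iterating under the lock prevents the set changing size mid-iteration.
        for queue in _subscribers:
            try:
                queue.put_nowait(snapshot)
                delivered += 1
            except asyncio.QueueFull:
                pass  # slow consumer: drop this snapshot instead of blocking
    return delivered
```

Using put_nowait keeps the critical section free of awaits, so one slow subscriber cannot stall the others while the lock is held.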
Testing Recommendations
To verify the race condition fixes:
- Concurrent Request Testing:
  - Send multiple simultaneous requests to the same endpoint/model
  - Verify that usage counts remain consistent
  - Verify that token counts are accurately tracked
- Cache Consistency Testing:
  - Rapidly query the /api/tags and /api/ps endpoints
  - Verify that cached responses remain consistent
  - Verify that stale cache entries are properly invalidated
- Timestamp Consistency Testing:
  - Send multiple requests in quick succession
  - Verify that timestamps in time_series_buffer are consistent and accurate
  - Verify that timestamps are properly rounded to minute boundaries
- Load Testing:
  - Simulate high load with many concurrent connections
  - Verify that the system remains stable under load
  - Verify that no deadlocks occur
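A concurrent-request check could start from an in-process harness like the one below before moving to real HTTP traffic. Everything here is hypothetical: fake_request stands in for a request handler, and active_requests stands in for the usage counters.

```python
import asyncio
from typing import Dict

# Hypothetical counter and lock used only by this test harness.
active_requests: Dict[str, int] = {}
active_requests_lock = asyncio.Lock()


async def fake_request(key: str) -> None:
    """Simulate one request: count up, do some work, count down."""
    async with active_requests_lock:
        active_requests[key] = active_requests.get(key, 0) + 1
    await asyncio.sleep(0.001)  # stands in for the upstream call
    async with active_requests_lock:
        active_requests[key] -= 1


async def run_concurrent_requests(n: int, key: str = "endpoint/model") -> int:
    """Run n simulated requests at once and return the final in-flight count."""
    await asyncio.gather(*(fake_request(key) for _ in range(n)))
    async with active_requests_lock:
        return active_requests.get(key, 0)
```

If every increment is paired with a decrement under the lock, the final count returns to zero no matter how the requests interleave.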
Performance Considerations
The race condition fixes introduce additional locking, which may have performance implications:
- Lock Granularity: The locks are fine-grained (one per cache type), minimizing contention.
- Lock Duration: Locks are held for minimal durations, typically just for cache reads/writes.
- Concurrent Operations: Multiple endpoints/models can be processed concurrently as long as they don't contend for the same cache.
- Monitoring: Consider adding metrics to track lock contention and wait times.
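One way to collect wait-time metrics is a small wrapper around lock acquisition. This is a sketch under assumptions: timed_lock and the in-memory lock_wait_seconds store are hypothetical names, and a real deployment would more likely export to a metrics system.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from typing import Dict, List

# Hypothetical in-memory store: lock name -> observed wait times in seconds.
lock_wait_seconds: Dict[str, List[float]] = {}


@asynccontextmanager
async def timed_lock(name: str, lock: asyncio.Lock):
    """Acquire the lock, record how long the acquisition took, release on exit."""
    start = time.perf_counter()
    await lock.acquire()
    lock_wait_seconds.setdefault(name, []).append(time.perf_counter() - start)
    try:
        yield
    finally:
        lock.release()
```

It is used in place of a bare lock, for example `async with timed_lock("models_cache", _models_cache_lock): ...`, so wait times accumulate per lock name.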
Future Improvements
- Read-Write Locks: Consider using read-write locks if read-heavy workloads are identified.
- Cache Invalidation: Implement more sophisticated cache invalidation strategies.
- Lock Timeouts: Add timeouts on lock acquisition so that a deadlock surfaces as an error instead of hanging indefinitely.
- Performance Monitoring: Add instrumentation to track lock contention and performance.
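For the lock timeout idea, one possible approach is to wrap acquisition in asyncio.wait_for. The helper name acquire_with_timeout is hypothetical, and the caller remains responsible for releasing the lock when the helper returns True.

```python
import asyncio


async def acquire_with_timeout(lock: asyncio.Lock, timeout: float) -> bool:
    """Try to acquire the lock; return False if it is not free within timeout seconds."""
    try:
        await asyncio.wait_for(lock.acquire(), timeout=timeout)
    except asyncio.TimeoutError:
        return False
    return True


async def guarded_update(lock: asyncio.Lock, cache: dict, key: str, value: str) -> bool:
    """Example caller: update the cache only if the lock was obtained in time."""
    if not await acquire_with_timeout(lock, timeout=1.0):
        return False  # surface the stall to the caller instead of hanging
    try:
        cache[key] = value
        return True
    finally:
        lock.release()
```

A timeout does not remove the underlying deadlock; it only turns an indefinite hang into a failure that can be logged and handled.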