* Tidy up duplicate tech specs in doc directory
* Streaming LLM text-completion service tech spec.
* text-completion and prompt interfaces
* streaming change applied to all LLMs, so far tested with VertexAI
* Skip Pinecone unit tests, upstream module issue is affecting things, tests are passing again
* Added agent streaming, not working and has broken tests
- Keeps processing in different flows separate so that data can go to different stores / collections etc.
- Potentially supports different processing flows
- Tidies the processing API with common base-classes for e.g. LLMs, and automatic configuration of 'clients' to use the right queue names in a flow
* - Refactored retry for rate limits into the base class
- ConsumerProducer is derived from Consumer to simplify code
- Added rate_limit_count metrics for rate limit events
* Add rate limit events to VertexAI and Google AI Studio
* Added Grafana rate limit dashboard
* Add rate limit handling to all LLMs
- Change templates to interpolate environment variables in docker compose
- Change templates to invoke secrets for environment variable credentials in K8s configuration
- Update LLMs to pull in credentials from environment variables if not specified