mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-04 13:22:41 +02:00
feat: modify architecture examples and scaling checklist to system architecture skill
parent b22fe012d5
commit 056d3c456b
3 changed files with 302 additions and 183 deletions
76
.cursor/skills/system-architecture/scaling-checklist.md
Normal file
@ -0,0 +1,76 @@
# Scaling Checklist

Concrete techniques for when the complexity checklist in SKILL.md confirms scale is a real problem. Apply in order: each level solves the previous level's bottleneck.

---

## Level 0: Optimize First

Before adding infrastructure, exhaust these:

- [ ] Database queries have proper indexes
- [ ] N+1 queries eliminated
- [ ] Connection pooling configured
- [ ] Slow endpoints profiled and optimized
- [ ] Static assets served via CDN
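
To illustrate the indexing and N+1 items above, here is a minimal sketch using Python's built-in `sqlite3`. The `authors`/`posts` schema and the function names are hypothetical and not taken from this repository; the point is only the contrast between one query per parent row and a single join.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # In-memory database with a hypothetical authors/posts schema.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        -- Index the foreign key so lookups by author avoid a full scan.
        CREATE INDEX idx_posts_author_id ON posts(author_id);
        INSERT INTO authors (id, name) VALUES (1, 'ada'), (2, 'linus');
        INSERT INTO posts (author_id, title) VALUES
            (1, 'first'), (1, 'second'), (2, 'third');
        """
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> dict:
    # Anti-pattern: one query for the authors, then one more per author.
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn: sqlite3.Connection) -> dict:
    # Same data in a single round trip, grouped in application code.
    result = {}
    rows = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors a
        LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # authors without posts keep an empty list
            titles.append(title)
    return result
```

Both functions return the same mapping; the second issues one query regardless of how many authors exist.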

## Level 1: Read-Heavy

**Symptom**: Database reads are the bottleneck.

| Technique | When | Trade-off |
|-----------|------|-----------|
| Application cache (in-memory) | Small, frequently accessed data | Stale data, memory pressure |
| Redis/Memcached | Shared cache across instances | Network hop, cache invalidation complexity |
| Read replicas | High read volume, slight staleness OK | Replication lag, eventual consistency |
| CDN | Static or semi-static content | Cache invalidation delay |
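
As a rough sketch of the first row, an in-memory application cache with a time-to-live could look like the following. `TTLCache`, `get_or_load`, and the injectable `clock` are illustrative names, not an existing library API; the sketch has no size bound, which is exactly the memory-pressure trade-off named in the table.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache; entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: may be stale relative to the database
        value = loader(key)  # miss or expired: go to the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call after a write so readers do not wait for the TTL to expire.
        self._entries.pop(key, None)
```

A shared cache such as Redis follows the same read-through shape, with the dictionary replaced by a network call.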

## Level 2: Write-Heavy

**Symptom**: Database writes or processing are the bottleneck.

| Technique | When | Trade-off |
|-----------|------|-----------|
| Async task queue (Celery, SQS) | Work can be deferred | Eventual consistency, failure handling |
| Write-behind cache | Batch frequent writes | Data loss risk on crash |
| Event streaming (Kafka) | Multiple consumers of same data | Operational complexity, ordering guarantees |
| CQRS | Reads and writes have diverged significantly | Two models to maintain |
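
The write-behind row can be sketched as a small buffer that coalesces writes in memory and hands them to the database in batches. `WriteBehindBuffer` and its `flush` callback are hypothetical names for illustration; a real implementation would also flush on a timer and on shutdown.

```python
from typing import Callable, Dict, List, Tuple

Batch = List[Tuple[str, int]]


class WriteBehindBuffer:
    """Coalesces writes per key and flushes them as one batch."""

    def __init__(self, flush: Callable[[Batch], None], max_pending: int = 100) -> None:
        self._flush = flush  # e.g. a function performing one bulk UPSERT
        self._max_pending = max_pending
        self._pending: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        # A later write to the same key replaces the earlier one in memory.
        self._pending[key] = value
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        batch = list(self._pending.items())
        self._flush(batch)  # if this raises, pending writes are kept for a retry
        self._pending.clear()
        # Anything still in _pending when the process crashes is lost:
        # that is the "data loss risk on crash" trade-off from the table.
```

Three writes to two keys below result in a single batched flush once `max_pending` is reached.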

## Level 3: Traffic Spikes

**Symptom**: Individual instances can't handle peak load.

| Technique | When | Trade-off |
|-----------|------|-----------|
| Horizontal scaling + load balancer | Stateless services | Session management, deploy complexity |
| Auto-scaling | Unpredictable traffic patterns | Cold start latency, cost spikes |
| Rate limiting | Protect against abuse/spikes | Legitimate users may be throttled |
| Circuit breakers | Downstream services degrade | Partial functionality during failures |
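
One common way to implement the rate-limiting row is a token bucket; the sketch below is a single-process version under that assumption. `TokenBucket` and its parameters are illustrative names; a multi-instance deployment would need shared state (for example in Redis), which this does not cover.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_second
        self._capacity = capacity
        self._clock = clock  # injectable for deterministic tests
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller would typically respond with HTTP 429
```

The `False` branch is where legitimate users get throttled, so `capacity` is usually sized from measured peak traffic rather than guessed.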

## Level 4: Data Growth

**Symptom**: Single database can't hold or query all the data efficiently.

| Technique | When | Trade-off |
|-----------|------|-----------|
| Table partitioning | Time-series or naturally partitioned data | Query complexity, partition management |
| Archival / cold storage | Old data rarely accessed | Access latency for archived data |
| Database sharding | Partitioning insufficient, clear shard key exists | Cross-shard queries, operational burden |
| Search index (Elasticsearch) | Full-text or complex queries on large datasets | Index lag, another system to operate |
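
For the sharding row, the core routing step can be sketched as a stable hash of the shard key. `shard_for` is a hypothetical helper; it uses simple modulo routing, which remaps most keys when `shard_count` changes, so treat it as an illustration of the idea rather than a production scheme.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a shard key to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a cryptographic digest instead of hash(): Python's built-in
    # string hash is randomized per process and would route inconsistently.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Every query that cannot be expressed in terms of a single shard key has to fan out to all shards, which is the cross-shard cost named in the table.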

## Level 5: Multi-Region

**Symptom**: Users are geographically distributed and latency matters.

| Technique | When | Trade-off |
|-----------|------|-----------|
| CDN + edge caching | Static/semi-static content | Cache invalidation |
| Read replicas per region | Read-heavy, slight staleness OK | Replication lag |
| Active-passive failover | Disaster recovery | Failover time, cost of standby |
| Active-active multi-region | True global low-latency required | Conflict resolution, extreme complexity |
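
To make the conflict-resolution trade-off in the last row concrete, here is a sketch of last-writer-wins, one of the simplest strategies (chosen here for illustration, not prescribed by this checklist). `Versioned` and `merge_lww` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time of the write; subject to clock skew
    region: str  # used only as a deterministic tie-breaker


def merge_lww(a: Versioned, b: Versioned) -> Versioned:
    """Pick the newer write; ties are broken by region name.

    The losing write is silently discarded, so concurrent updates in two
    regions can lose data. That is the price of this simple strategy.
    """
    return max(a, b, key=lambda v: (v.timestamp, v.region))
```

The merge is order-independent, so every region converges on the same value, but only by dropping one side of each conflict.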

---

## Decision Rule

Always start at Level 0. Move to the next level only when you have **measured evidence** that the current level is insufficient. Skipping levels is how you end up with Kafka for a TODO app.
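
One possible reading of "measured evidence" is a latency percentile compared against an agreed budget. The sketch below assumes that interpretation; `should_escalate`, `budget_ms`, and `min_samples` are hypothetical names and thresholds, not part of the skill.

```python
import statistics
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


def should_escalate(
    samples_ms: Sequence[float],
    budget_ms: float,
    min_samples: int = 100,
) -> bool:
    """True only when enough measurements show p95 latency over budget."""
    if len(samples_ms) < min_samples:
        return False  # too little data counts as "no evidence yet"
    return p95(samples_ms) > budget_ms
```

The same shape works for other signals, such as replication lag or queue depth, as long as the threshold is agreed on before measuring.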