trustgraph/ts/deploy/prometheus/prometheus.yml
elpresidank 3a80872482 fix: comprehensive QA — resolve 13 bugs, add UX improvements across all services
Client SDK: add .catch() to graphRagStreaming/documentRagStreaming (silent timeout),
null-guard JSON.parse in getPrompts/getSystemPrompt/getPrompt.

Backend: implement "getvalues" config operation for token costs, null-check
createTerm() in FalkorDB triples query, add knowledge-cores service entrypoint
and Docker entry, return proper HTTP 400/404 for gateway error responses.

Workbench: cancel button + elapsed timer for chat, clear agent spinner on error,
flow dialog inline validation, responsive header wrapping, knowledge cores
loading timeout, sidebar/page naming consistency, theme toggle indicator.

Infrastructure: enable Grafana Explore for viewers, add gateway Prometheus
scrape target, fix RAG pipeline dashboard layout (6 panels visible),
filter Service Health to configured targets only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 05:20:10 -05:00


global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: "trustgraph-ts"

scrape_configs:
  # Prometheus self-monitoring
  - job_name: "prometheus"
    scrape_interval: 15s
    static_configs:
      - targets:
          - "prometheus:9090"

  # NATS monitoring (uses nats-prometheus-exporter format)
  # NATS exposes JSON at /varz, not Prometheus format.
  # To get proper Prometheus metrics, deploy nats-exporter sidecar.
  # For now, we rely on NATS healthcheck and JetStream monitoring via /jsz.

  # OpenTelemetry Collector (exposes Prometheus metrics from OTLP pipeline)
  - job_name: "otel-collector"
    scrape_interval: 15s
    static_configs:
      - targets:
          - "otel-collector:8889"

  # TrustGraph gateway metrics (prom-client)
  - job_name: "gateway"
    scrape_interval: 15s
    metrics_path: "/api/v1/metrics"
    static_configs:
      - targets:
          - "gateway:8088"
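  # Sketch only: a possible NATS scrape job, following the NATS note above.
  # It assumes a prometheus-nats-exporter sidecar has been deployed and is
  # reachable as "nats-exporter" on the exporter's default port 7777. The
  # hostname and job name are hypothetical and not part of this deployment.
  # Left commented out so Prometheus does not report a missing target.
  # - job_name: "nats"
  #   scrape_interval: 15s
  #   static_configs:
  #     - targets:
  #         - "nats-exporter:7777"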