# Monitoring

Prometheus metrics, Grafana dashboards, Loki log aggregation, and alerting.
## Overview

The monitoring stack is included in Docker Compose and activated with a single profile flag:

```bash
docker compose --profile monitoring up -d
```
This starts Prometheus, Grafana, Loki, Promtail, Node Exporter, and the OpenTelemetry Collector — all pre-configured to scrape PiSovereign metrics and collect logs.
```
┌─────────────────┐
│   PiSovereign   │
│    /metrics/    │
│   prometheus    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌─────────────────┐
│   Prometheus    │────▶│     Grafana     │
│   (Metrics)     │     │  (Dashboards)   │
└─────────────────┘     └─────────────────┘

┌─────────────────┐     ┌─────────────────┐
│    Promtail     │────▶│      Loki       │
│  (Log Shipper)  │     │  (Log Storage)  │
└─────────────────┘     └─────────────────┘
```
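A Prometheus scrape job matching this layout might look like the sketch below. This is a hypothetical fragment, not the shipped `docker/prometheus/prometheus.yml`; the service hostname and port are assumptions (the port matches the `localhost:3000` check in the Troubleshooting section).

```yaml
# Hypothetical scrape job; service name and port are assumptions
scrape_configs:
  - job_name: pisovereign            # must match up{job="pisovereign"} in the alert rules
    metrics_path: /metrics/prometheus
    static_configs:
      - targets: ["pisovereign:3000"]
```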
## Resource Usage (Raspberry Pi 5)
| Component | Memory | Storage/Day |
|---|---|---|
| Prometheus | ~100 MB | ~50 MB |
| Grafana | ~150 MB | Minimal |
| Loki | ~200 MB | ~100 MB |
| Promtail | ~30 MB | — |
| **Total** | ~480 MB | ~150 MB |
## Accessing Dashboards
After enabling the monitoring profile:
| Service | URL |
|---|---|
| Grafana | http://localhost/grafana (via Traefik) |
| Prometheus | http://localhost:9090 |
Default Grafana credentials are `admin` / `admin` (change them on first login).
Dashboards and data sources are auto-provisioned — no manual setup required.
## Prometheus Metrics

PiSovereign exposes metrics at `/metrics/prometheus`:
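The endpoint returns the standard Prometheus text exposition format. As an illustration only (not part of PiSovereign), a minimal parser for the common `name{labels} value` line shape:

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse simple Prometheus text-format lines into {series: value}.

    Skips comments (# HELP / # TYPE) and blank lines; keeps the full
    series name, including any {label="..."} part, as the key. Ignores
    optional trailing timestamps for simplicity.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics


sample = """\
# TYPE http_requests_total counter
http_requests_total 1024
inference_healthy 1
"""
print(parse_prometheus_text(sample)["http_requests_total"])  # 1024.0
```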
### Application Metrics
| Metric | Type | Description |
|---|---|---|
| `app_uptime_seconds` | Counter | Application uptime |
| `app_version_info` | Gauge | Version information |
### HTTP Metrics
| Metric | Type | Description |
|---|---|---|
| `http_requests_total` | Counter | Total HTTP requests |
| `http_requests_success_total` | Counter | 2xx responses |
| `http_requests_client_error_total` | Counter | 4xx responses |
| `http_requests_server_error_total` | Counter | 5xx responses |
| `http_requests_active` | Gauge | Active requests |
| `http_response_time_avg_ms` | Gauge | Average response time |
| `http_response_time_ms_bucket` | Histogram | Response time distribution |
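These counters and the histogram feed standard PromQL patterns; for example (standard PromQL, assuming only the metric names listed above):

```promql
# Requests per second, averaged over 5 minutes
rate(http_requests_total[5m])

# 5xx responses as a share of all requests
rate(http_requests_server_error_total[5m]) / rate(http_requests_total[5m])

# 99th-percentile response time from the histogram buckets
histogram_quantile(0.99, sum by (le) (rate(http_response_time_ms_bucket[5m])))
```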
### Inference Metrics
| Metric | Type | Description |
|---|---|---|
| `inference_requests_total` | Counter | Total inference requests |
| `inference_requests_success_total` | Counter | Successful inferences |
| `inference_requests_failed_total` | Counter | Failed inferences |
| `inference_time_avg_ms` | Gauge | Average inference time |
| `inference_time_ms_bucket` | Histogram | Inference time distribution |
| `inference_tokens_total` | Counter | Total tokens generated |
| `inference_healthy` | Gauge | Health status (0/1) |
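Useful derived quantities follow the same pattern (standard PromQL over the metric names above):

```promql
# Tokens generated per second
rate(inference_tokens_total[5m])

# Inference failure ratio
rate(inference_requests_failed_total[5m]) / rate(inference_requests_total[5m])
```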
### Cache Metrics
| Metric | Type | Description |
|---|---|---|
| `cache_hits_total` | Counter | Cache hits |
| `cache_misses_total` | Counter | Cache misses |
| `cache_size` | Gauge | Current cache size |
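The hit ratio is the usual two-counter quotient (standard PromQL, assuming the names above):

```promql
# Cache hit ratio over the last 5 minutes
rate(cache_hits_total[5m])
  / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))
```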
### Model Routing Metrics
These metrics are only present when `[model_routing]` is enabled.
| Metric | Type | Description |
|---|---|---|
| `model_routing_requests_total{tier="..."}` | Counter | Requests per tier (trivial/simple/moderate/complex) |
| `model_routing_template_hits_total` | Counter | Trivial queries answered by template |
| `model_routing_upgrades_total` | Counter | Tier upgrades due to low confidence |
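The `tier` label supports a per-tier breakdown (standard PromQL over the labeled counter above):

```promql
# Request rate per routing tier
sum by (tier) (rate(model_routing_requests_total[5m]))
```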
## Grafana Dashboard Panels
The pre-built PiSovereign dashboard includes:
### Overview Row
| Panel | Description |
|---|---|
| Uptime | Application uptime counter |
| Inference Status | Health indicator |
| Total Requests | Cumulative request count |
| Active Requests | Current in-flight requests |
| Avg Response Time | Mean latency |
| Total Tokens | LLM tokens generated |
### HTTP Requests Row
| Panel | Visualization | Description |
|---|---|---|
| Request Rate | Time series | Requests/second over time |
| Status Distribution | Pie chart | Success/error breakdown |
| Response Time P50/P90/P99 | Stat | Latency percentiles |
### Inference Row
| Panel | Visualization | Description |
|---|---|---|
| Inference Rate | Time series | Inferences/second |
| Inference Latency | Gauge | Current avg latency |
| Token Rate | Time series | Tokens/second |
| Model Usage | Table | Per-model statistics |
### System Row
| Panel | Description |
|---|---|
| CPU Usage | System CPU utilization |
| Memory Usage | RAM usage |
| Disk I/O | Storage throughput |
| Network I/O | Network traffic |
## Alerting
Alert rules are pre-configured in `docker/prometheus/rules/` (if present) or can be added:
```yaml
# prometheus/rules/pisovereign.yml
groups:
  - name: pisovereign
    rules:
      - alert: PiSovereignDown
        expr: up{job="pisovereign"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PiSovereign is down"

      - alert: InferenceEngineUnhealthy
        expr: inference_healthy == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Inference engine is unhealthy"

      - alert: HighResponseTime
        expr: http_response_time_avg_ms > 5000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average response time is {{ $value }}ms"

      - alert: HighErrorRate
        expr: rate(http_requests_server_error_total[5m]) / rate(http_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Server error rate is {{ $value | humanizePercentage }}"

      - alert: InferenceFailures
        expr: rate(inference_requests_failed_total[5m]) / rate(inference_requests_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Inference failure rate is {{ $value | humanizePercentage }}"
```
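For these rules to load, `prometheus.yml` must reference the rules directory. The fragment below is a sketch; the in-container path is an assumption based on the default Prometheus image layout:

```yaml
# In docker/prometheus/prometheus.yml — container path is an assumption
rule_files:
  - /etc/prometheus/rules/*.yml
```

Rule files can be validated offline with `promtool check rules <file>` before reloading Prometheus.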
## Log Aggregation
Loki and Promtail are included in the monitoring profile. Logs from all Docker containers are automatically collected and available in Grafana under the Loki data source.
To query logs in Grafana:

- Go to **Explore** → select the **Loki** data source
- Use LogQL queries:

```logql
{container="pisovereign"} |= "error"
{container="ollama"} | json | level="error"
```
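LogQL can also turn log lines into metrics, which is handy for dashboards and alerts. For example, the error-line rate for the app container (standard LogQL, assuming the same `container` label as above):

```logql
rate({container="pisovereign"} |= "error" [5m])
```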
## Resource Optimization
If running on constrained hardware, tune these settings:
```yaml
# In docker/prometheus/prometheus.yml
global:
  scrape_interval: 30s   # Increase from 15s to reduce load
```

```bash
# Prometheus storage flags (in compose.yml command)
--storage.tsdb.retention.time=3d     # Reduce from 7d
--storage.tsdb.retention.size=500MB  # Cap storage
```

```yaml
# In docker/loki/loki.yml
limits_config:
  retention_period: 72h  # 3 days instead of 7
```
## Troubleshooting

### Metrics not appearing

```bash
# Check that PiSovereign exposes metrics
curl http://localhost:3000/metrics/prometheus

# Check Prometheus scrape targets
curl http://localhost:9090/api/v1/targets
```
### Grafana dashboard empty

- Verify the time range includes recent data
- Check the Prometheus data source is connected (**Settings → Data Sources**)
- Query Prometheus directly at http://localhost:9090/graph
## Next Steps
- Backup & Restore — Protect your data
- Security Hardening — Secure monitoring endpoints