Monitoring

Prometheus metrics, Grafana dashboards, Loki log aggregation, and alerting

Overview

The monitoring stack is included in Docker Compose and activated with a single profile flag:

docker compose --profile monitoring up -d

This starts Prometheus, Grafana, Loki, Promtail, Node Exporter, and the OpenTelemetry Collector — all pre-configured to scrape PiSovereign metrics and collect logs.
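Profiles are plain Compose configuration; a hypothetical fragment (service names and images are illustrative — the actual file may differ) shows how services are gated behind the `monitoring` profile:

```yaml
services:
  prometheus:
    image: prom/prometheus
    profiles: ["monitoring"]   # only started with --profile monitoring
  grafana:
    image: grafana/grafana
    profiles: ["monitoring"]
```

Services without a `profiles` key start on every `docker compose up`; services with one start only when their profile is requested.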

┌─────────────────┐
│   PiSovereign   │
│  /metrics/      │
│  prometheus     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌─────────────────┐
│   Prometheus    │────▶│    Grafana      │
│   (Metrics)     │     │  (Dashboards)   │
└─────────────────┘     └─────────────────┘

┌─────────────────┐     ┌─────────────────┐
│    Promtail     │────▶│      Loki       │
│  (Log Shipper)  │     │  (Log Storage)  │
└─────────────────┘     └─────────────────┘

Resource Usage (Raspberry Pi 5)

| Component  | Memory  | Storage/Day |
|------------|---------|-------------|
| Prometheus | ~100 MB | ~50 MB      |
| Grafana    | ~150 MB | Minimal     |
| Loki       | ~200 MB | ~100 MB     |
| Promtail   | ~30 MB  | Minimal     |
| **Total**  | ~480 MB | ~150 MB     |

Accessing Dashboards

After enabling the monitoring profile:

| Service    | URL |
|------------|-----|
| Grafana    | http://localhost/grafana (via Traefik) |
| Prometheus | http://localhost:9090 |

Default Grafana credentials are admin / admin (change on first login). Dashboards and data sources are auto-provisioned — no manual setup required.


Prometheus Metrics

PiSovereign exposes metrics at /metrics/prometheus:

Application Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `app_uptime_seconds` | Counter | Application uptime |
| `app_version_info` | Gauge | Version information |

HTTP Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `http_requests_total` | Counter | Total HTTP requests |
| `http_requests_success_total` | Counter | 2xx responses |
| `http_requests_client_error_total` | Counter | 4xx responses |
| `http_requests_server_error_total` | Counter | 5xx responses |
| `http_requests_active` | Gauge | Active requests |
| `http_response_time_avg_ms` | Gauge | Average response time |
| `http_response_time_ms_bucket` | Histogram | Response time distribution |
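The histogram and counter metrics above support the usual PromQL recipes; the sketches below are standard query shapes, not queries taken from the shipped dashboards:

```promql
# P99 latency from the histogram buckets (5-minute window)
histogram_quantile(0.99, sum by (le) (rate(http_response_time_ms_bucket[5m])))

# Request rate and server-error ratio
rate(http_requests_total[5m])
rate(http_requests_server_error_total[5m]) / rate(http_requests_total[5m])
```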

Inference Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `inference_requests_total` | Counter | Total inference requests |
| `inference_requests_success_total` | Counter | Successful inferences |
| `inference_requests_failed_total` | Counter | Failed inferences |
| `inference_time_avg_ms` | Gauge | Average inference time |
| `inference_time_ms_bucket` | Histogram | Inference time distribution |
| `inference_tokens_total` | Counter | Total tokens generated |
| `inference_healthy` | Gauge | Health status (0/1) |
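Two hedged PromQL sketches over these counters (the same shapes appear in the alert rules further down):

```promql
# Inference failure ratio over the last 5 minutes
rate(inference_requests_failed_total[5m]) / rate(inference_requests_total[5m])

# Token throughput (tokens/second)
rate(inference_tokens_total[5m])
```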

Cache Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `cache_hits_total` | Counter | Cache hits |
| `cache_misses_total` | Counter | Cache misses |
| `cache_size` | Gauge | Current cache size |
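The hit ratio follows directly from the two counters; a minimal Python sketch of the arithmetic (function name is illustrative, not part of the codebase):

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Return the cache hit ratio, guarding against division by zero
    when the cache has not been exercised yet."""
    total = hits + misses
    return hits / total if total else 0.0

print(cache_hit_ratio(900, 100))  # 0.9
```

In PromQL the equivalent is `cache_hits_total / (cache_hits_total + cache_misses_total)`.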

Model Routing Metrics

These metrics are only present when [model_routing] is enabled.

| Metric | Type | Description |
|--------|------|-------------|
| `model_routing_requests_total{tier="..."}` | Counter | Requests per tier (trivial/simple/moderate/complex) |
| `model_routing_template_hits_total` | Counter | Trivial queries answered by template |
| `model_routing_upgrades_total` | Counter | Tier upgrades due to low confidence |
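All of the metrics above are served in the Prometheus text exposition format. A rough Python sketch of reading that format (this simplified parser ignores timestamps and assumes label values contain no spaces — real clients should use an exposition-format library):

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition output into {series: value}.
    Comment lines (# HELP / # TYPE) are skipped; labels stay part of
    the series key, matching how PromQL displays series."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

example = """\
# HELP app_uptime_seconds Application uptime
# TYPE app_uptime_seconds counter
app_uptime_seconds 12345
http_requests_total{method="GET"} 42
"""
metrics = parse_prometheus_text(example)
print(metrics["app_uptime_seconds"])  # 12345.0
```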

Grafana Dashboard Panels

The pre-built PiSovereign dashboard includes:

Overview Row

| Panel | Description |
|-------|-------------|
| Uptime | Application uptime counter |
| Inference Status | Health indicator |
| Total Requests | Cumulative request count |
| Active Requests | Current in-flight requests |
| Avg Response Time | Mean latency |
| Total Tokens | LLM tokens generated |

HTTP Requests Row

| Panel | Visualization | Description |
|-------|---------------|-------------|
| Request Rate | Time series | Requests/second over time |
| Status Distribution | Pie chart | Success/error breakdown |
| Response Time P50/P90/P99 | Stat | Latency percentiles |

Inference Row

| Panel | Visualization | Description |
|-------|---------------|-------------|
| Inference Rate | Time series | Inferences/second |
| Inference Latency | Gauge | Current avg latency |
| Token Rate | Time series | Tokens/second |
| Model Usage | Table | Per-model statistics |

System Row

| Panel | Description |
|-------|-------------|
| CPU Usage | System CPU utilization |
| Memory Usage | RAM usage |
| Disk I/O | Storage throughput |
| Network I/O | Network traffic |

Alerting

Alert rules ship pre-configured in docker/prometheus/rules/; if that directory is absent, add them yourself:

# prometheus/rules/pisovereign.yml
groups:
  - name: pisovereign
    rules:
      - alert: PiSovereignDown
        expr: up{job="pisovereign"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PiSovereign is down"

      - alert: InferenceEngineUnhealthy
        expr: inference_healthy == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Inference engine is unhealthy"

      - alert: HighResponseTime
        expr: http_response_time_avg_ms > 5000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average response time is {{ $value }}ms"

      - alert: HighErrorRate
        expr: rate(http_requests_server_error_total[5m]) / rate(http_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Server error rate is {{ $value | humanizePercentage }}"

      - alert: InferenceFailures
        expr: rate(inference_requests_failed_total[5m]) / rate(inference_requests_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Inference failure rate is {{ $value | humanizePercentage }}"

Log Aggregation

Loki and Promtail are included in the monitoring profile. Logs from all Docker containers are automatically collected and available in Grafana under the Loki data source.

To query logs in Grafana:

  1. Go to Explore → select Loki data source
  2. Use LogQL queries:
{container="pisovereign"} |= "error"
{container="ollama"} | json | level="error"
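Beyond simple filtering, LogQL can turn logs into metrics. Two hedged sketches using the same container labels as above:

```logql
# Error lines per second across PiSovereign logs
rate({container="pisovereign"} |= "error" [5m])

# Log volume broken down by level (for JSON-structured logs)
sum by (level) (count_over_time({container="ollama"} | json [5m]))
```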

Resource Optimization

If running on constrained hardware, tune these settings:

# In docker/prometheus/prometheus.yml
global:
  scrape_interval: 30s  # Increase from 15s to reduce load

# Prometheus storage flags (in compose.yml command)
--storage.tsdb.retention.time=3d    # Reduce from 7d
--storage.tsdb.retention.size=500MB # Cap storage

# In docker/loki/loki.yml
limits_config:
  retention_period: 72h  # 3 days instead of 7
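A quick sanity check on these numbers, assuming the ~50 MB/day Prometheus ingest from the resource-usage table: the 500 MB size cap alone would hold about ten days of data, so with `retention.time=3d` the time limit binds first, at roughly 150 MB of TSDB on disk:

```python
prometheus_mb_per_day = 50   # estimate from the resource-usage table
size_cap_mb = 500            # --storage.tsdb.retention.size
retention_days = 3           # --storage.tsdb.retention.time

days_until_cap = size_cap_mb / prometheus_mb_per_day      # 10.0
steady_state_mb = retention_days * prometheus_mb_per_day  # 150
print(days_until_cap, steady_state_mb)
```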

Troubleshooting

Metrics not appearing

# Check PiSovereign exposes metrics
curl http://localhost:3000/metrics/prometheus

# Check Prometheus scrape targets
curl http://localhost:9090/api/v1/targets
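The targets endpoint answers with JSON. A small Python sketch for listing unhealthy scrape targets (the sample payload below mirrors the shape of the real `/api/v1/targets` response; the hostnames are illustrative):

```python
import json

def down_targets(payload: dict) -> list[str]:
    """Return scrape URLs of targets Prometheus reports as not 'up'."""
    return [
        t["scrapeUrl"]
        for t in payload["data"]["activeTargets"]
        if t["health"] != "up"
    ]

sample = json.loads("""
{"status": "success",
 "data": {"activeTargets": [
   {"scrapeUrl": "http://pisovereign:3000/metrics/prometheus", "health": "up"},
   {"scrapeUrl": "http://node-exporter:9100/metrics", "health": "down"}]}}
""")
print(down_targets(sample))  # ['http://node-exporter:9100/metrics']
```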

Grafana dashboard empty

  1. Verify time range includes recent data
  2. Check Prometheus data source is connected (Settings → Data Sources)
  3. Query Prometheus directly at http://localhost:9090/graph

Next Steps