PiSovereign Documentation
A self-hosted, privacy-first AI assistant platform — deploy anywhere with Docker Compose.
Welcome to the official PiSovereign documentation. This guide covers everything from first deployment to production operations and development.
Introduction
PiSovereign runs a complete AI assistant stack on your own hardware. All inference stays local via Ollama — no data ever leaves your network. It deploys as a set of Docker containers on any Linux or macOS host and is optimized for the Raspberry Pi 5 with Hailo-10H NPU.
Core Principles:
- Privacy First — All processing happens locally on your hardware
- GDPR Compliant — No data leaves your network
- Open Source — MIT licensed, fully auditable, `#![forbid(unsafe_code)]`
- Extensible — Clean Architecture with Ports & Adapters
Key Features
| Feature | Description |
|---|---|
| Local LLM Inference | Ollama with dynamic model routing by task complexity |
| Signal & WhatsApp | Bidirectional messaging with voice message support |
| Voice Processing | Local STT (whisper.cpp) and TTS (Piper), optional OpenAI fallback |
| Calendar & Contacts | CalDAV/CardDAV (Baïkal, Radicale, Nextcloud) |
| Email | IMAP/SMTP with any provider (Gmail, Outlook, Proton Mail) |
| Weather & Transit | Open-Meteo forecasts, German public transit via HAFAS |
| Web Search | Brave Search with automatic DuckDuckGo fallback |
| Persistent Memory | RAG with embeddings, decay, deduplication, XChaCha20 encryption |
| Reminders | Natural language scheduling with morning briefings |
| Agentic Mode | Multi-agent orchestration for complex tasks with parallel sub-agents |
| Secret Management | HashiCorp Vault with AppRole authentication |
| Observability | Prometheus, Grafana, Loki, OpenTelemetry |
| Docker Compose | Single-command deployment with optional monitoring and CalDAV profiles |
Quick Links
User Guide
| Document | Description |
|---|---|
| Getting Started | 5-minute Docker deployment |
| Hardware Setup | Raspberry Pi 5 + Hailo-10H assembly |
| Docker Setup | Detailed deployment and operations guide |
| Vault Setup | Secret management with HashiCorp Vault |
| Configuration | All config.toml options |
| External Services | WhatsApp, email, CalDAV, search setup |
| Signal Setup | Signal messenger registration |
| Reminder System | Reminders and morning briefings |
| Troubleshooting | Common issues and solutions |
Developer Guide
| Document | Description |
|---|---|
| Architecture | Clean Architecture overview |
| Memory System | RAG pipeline and encryption |
| Contributing | Development setup and workflow |
| Crate Reference | All 16 workspace crates documented |
| API Reference | REST API with OpenAPI spec |
Operations & Security
| Document | Description |
|---|---|
| Production Deployment | TLS, production config, multi-arch builds |
| Monitoring | Prometheus, Grafana, Loki, alerting |
| Backup & Restore | Data protection and recovery |
| Security Hardening | Application, network, and Vault security |
Getting Help
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share ideas
- Security Issues: Report vulnerabilities privately via GitHub Security Advisories
Features at a Glance
A teenager-friendly guide to PiSovereign’s architecture and features
What is PiSovereign? It’s your own private AI assistant that runs on your computer (or a Raspberry Pi) instead of sending your data to the cloud. Think of it as having ChatGPT, but it lives in your house and keeps all your conversations private.
This page explains all the cool stuff PiSovereign can do using simple terms and real-world comparisons.
How It’s Built (Architecture)
PiSovereign is organized like a well-run school where each department has clear responsibilities and rules about who talks to whom.
The Layer Cake
┌─────────────────────────────────────────────────────────────┐
│ 🖥️ PRESENTATION (What you see and interact with) │
│ Web UI, REST API, Command Line │
├─────────────────────────────────────────────────────────────┤
│ 🔌 INFRASTRUCTURE (The plumbing) │
│ Database, Cache, Secrets, Metrics │
├─────────────────────────────────────────────────────────────┤
│ 🔗 INTEGRATION (Connections to the outside world) │
│ WhatsApp, Signal, Email, Calendar, Weather, Transit │
├─────────────────────────────────────────────────────────────┤
│ 🧠 AI (The smart stuff) │
│ LLM Inference, Speech-to-Text, Text-to-Speech │
├─────────────────────────────────────────────────────────────┤
│ ⚙️ APPLICATION (Business logic) │
│ Services, Use Cases, Rules │
├─────────────────────────────────────────────────────────────┤
│ 💎 DOMAIN (Core rules and data) │
│ Entities, Value Objects, Commands │
└─────────────────────────────────────────────────────────────┘
The Golden Rule: Inner layers never depend on outer layers. The Domain layer doesn’t care if you’re using WhatsApp or a CLI — it just knows about messages and conversations.
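The Golden Rule can be sketched in a few lines of Rust. This is an illustrative example, not PiSovereign's actual API: the `MessageSender` port, `ConsoleSender` adapter, and `greet` function are hypothetical names chosen to show the dependency direction — application code depends on a trait the inner layers define, and concrete transports plug in from outside.

```rust
/// Port: the inner layers only know this contract.
trait MessageSender {
    fn send(&self, to: &str, body: &str) -> Result<(), String>;
}

/// Adapter: one concrete transport among many (WhatsApp, Signal, CLI...).
struct ConsoleSender;

impl MessageSender for ConsoleSender {
    fn send(&self, to: &str, body: &str) -> Result<(), String> {
        println!("to {to}: {body}");
        Ok(())
    }
}

/// Application logic depends on the port, never on a concrete adapter.
fn greet(sender: &dyn MessageSender, user: &str) -> Result<(), String> {
    sender.send(user, "Hello from the domain!")
}

fn main() {
    // Swapping ConsoleSender for a WhatsApp adapter would not change greet().
    assert!(greet(&ConsoleSender, "alice").is_ok());
}
```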
Architecture Patterns Explained
| Pattern | Real-World Analogy | What It Does |
|---|---|---|
| Clean Architecture | A school with separate buildings for classes, admin, and sports | Keeps code organized so the AI brain doesn’t need to know about databases |
| Ports & Adapters | Universal phone charger that fits any outlet | Different services (WhatsApp, Email) plug in without changing the core code |
| Decorator Chain | Matryoshka (Russian nesting) dolls | Each layer wraps the previous one, adding features like caching or sanitization |
| Dependency Injection | LEGO bricks that snap together | Easy to swap real services for test versions without rewriting code |
| Event-Driven | Waiter who takes your order while the kitchen cooks | Background tasks run without making you wait for responses |
| Circuit Breaker | Electrical fuse that prevents house fires | When a service fails repeatedly, stop trying and use a backup plan |
| Multi-Layer Cache | Sticky notes (fast) + notebook (permanent) | Frequently used data stays in memory; everything else on disk |
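To make the Circuit Breaker row concrete, here is a minimal sketch of the pattern — a toy counter, not PiSovereign's actual implementation, and the threshold and error messages are invented for illustration: after enough consecutive failures the breaker "opens" and short-circuits further calls so a fallback can take over.

```rust
// Toy circuit breaker: `threshold` consecutive failures open the circuit.
struct CircuitBreaker {
    failures: u32,
    threshold: u32,
}

impl CircuitBreaker {
    fn new(threshold: u32) -> Self {
        Self { failures: 0, threshold }
    }

    fn call<T>(&mut self, f: impl FnOnce() -> Result<T, String>) -> Result<T, String> {
        if self.failures >= self.threshold {
            // Open: don't even try the failing service.
            return Err("circuit open: use fallback".to_string());
        }
        match f() {
            Ok(v) => {
                self.failures = 0; // success resets the counter
                Ok(v)
            }
            Err(e) => {
                self.failures += 1;
                Err(e)
            }
        }
    }
}

fn main() {
    let mut breaker = CircuitBreaker::new(2);
    // Two failures trip the breaker; the third call never reaches the service.
    assert!(breaker.call(|| Err::<(), String>("timeout".to_string())).is_err());
    assert!(breaker.call(|| Err::<(), String>("timeout".to_string())).is_err());
    let blocked = breaker.call(|| Ok::<(), String>(()));
    assert_eq!(blocked.unwrap_err(), "circuit open: use fallback");
}
```

A production breaker would also add a timeout after which it "half-opens" and probes the service again; this sketch omits that for brevity.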
Feature Quick Reference
Here’s everything PiSovereign can do, explained simply:
⚡ Performance Features
| Feature | What It Does | Real-World Analogy | Why It’s Cool |
|---|---|---|---|
| Adaptive Model Routing | Sends easy questions to small, fast AI; hard questions to bigger AI | Express checkout vs. full-service lane at the grocery store | 4× faster for simple questions |
| Semantic Caching | Remembers similar questions you asked before, even if worded differently | A teacher who remembers “What’s 2+2?” and “Two plus two equals?” are the same question | No waiting for repeat questions |
| Multi-Layer Cache | Stores answers in fast memory + disk backup | Sticky notes on your desk (fast) + a notebook in your drawer (permanent) | Under 1ms for cached answers |
| In-Process Event Bus | Handles background work (saving memories, logging) without slowing your reply | A restaurant where the waiter takes your order while another waiter clears tables | 100-500ms saved per message |
| Proactive Pre-Computation | Prepares common answers before you ask | A friend who checks the weather before your camping trip | Instant morning briefings |
| Template Responder | Answers trivial questions instantly without using AI | Automated phone menu for simple requests | Under 10ms for “Hello!” |
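The Multi-Layer Cache row above can be sketched as a toy two-layer lookup — a small "hot" map backed by a larger "cold" map, standing in for Moka (memory) and Redb (disk). This is an illustrative sketch; the type and method names are hypothetical, not PiSovereign's real cache API.

```rust
use std::collections::HashMap;

// Toy two-layer cache: L1 ("hot", in-memory) backed by L2 ("cold", persistent).
struct TwoLayerCache {
    hot: HashMap<String, String>,
    cold: HashMap<String, String>,
}

impl TwoLayerCache {
    fn new() -> Self {
        Self { hot: HashMap::new(), cold: HashMap::new() }
    }

    fn put(&mut self, key: &str, val: &str) {
        // Write-through: both layers get the value.
        self.hot.insert(key.to_string(), val.to_string());
        self.cold.insert(key.to_string(), val.to_string());
    }

    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(v) = self.hot.get(key) {
            return Some(v.clone()); // L1 hit: the sub-millisecond path
        }
        // L1 miss: fall back to the persistent layer and promote the entry.
        let v = self.cold.get(key).cloned()?;
        self.hot.insert(key.to_string(), v.clone());
        Some(v)
    }
}

fn main() {
    let mut cache = TwoLayerCache::new();
    cache.put("greeting", "Hello!");
    cache.hot.clear(); // simulate memory eviction; the disk copy survives
    assert_eq!(cache.get("greeting").as_deref(), Some("Hello!"));
    assert!(cache.hot.contains_key("greeting")); // promoted back into L1
}
```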
🧠 AI Features
| Feature | What It Does | Real-World Analogy | Why It’s Cool |
|---|---|---|---|
| ReAct Agent (Tool Calling) | AI can use 18 tools: check weather, search web, read calendar, send emails | An assistant who can look things up instead of just guessing | AI acts, not just talks |
| Multi-Agent Orchestration | Multiple AIs work together on complex tasks | A group project where each person handles their specialty | Parallel work = faster results |
| RAG Memory System | Remembers your preferences, name, and past conversations | A personal diary that the AI actually reads | “Hey, I remember you like dark mode!” |
| Fact Extraction | Automatically pulls important facts from conversations | Highlighting key points in a textbook | Never forgets important stuff |
| Complexity Classification | Figures out how hard a question is before answering | A teacher deciding if it’s a pop quiz or a final exam | Right-sized AI for every question |
🔒 Security Features
| Feature | What It Does | Real-World Analogy | Why It’s Cool |
|---|---|---|---|
| Prompt Injection Defense | Blocks 60+ patterns of attempts to trick the AI | A bouncer checking IDs at a club entrance | Stops “ignore your instructions” attacks |
| Output Sanitization | Hides sensitive info (passwords, credit cards, emails) from responses | A TV censor bleeping out swear words | PII protection with 17 detection patterns |
| Context Sanitization | Cleans external data (web results, tool outputs) before feeding to AI | Airport security scanning luggage | Blocks hidden malicious instructions |
| Secret Management | Stores API keys and passwords in a secure vault | A safe with a combination lock | Secrets never appear in logs |
| Encryption at Rest | Encrypts your memories and conversations on disk | A locked diary with a key only you have | XChaCha20-Poly1305 encryption |
| Rate Limiting | Prevents abuse by limiting requests per minute | A “take a number” system at a deli counter | Auto-cleanup of old entries |
| HMAC Tool Receipts | Signs tool results to detect tampering | A wax seal on a medieval letter | Cryptographic proof nothing was changed |
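The Rate Limiting row (including its "auto-cleanup of old entries" detail) can be sketched as a fixed-window counter. This is a toy illustration, not PiSovereign's actual limiter; the names and limits are invented.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Toy fixed-window rate limiter: at most `limit` requests per `window`
// per client, with stale windows pruned so the map doesn't grow unbounded.
struct RateLimiter {
    limit: u32,
    window: Duration,
    hits: HashMap<String, (Instant, u32)>,
}

impl RateLimiter {
    fn new(limit: u32, window: Duration) -> Self {
        Self { limit, window, hits: HashMap::new() }
    }

    fn allow(&mut self, client: &str) -> bool {
        let now = Instant::now();
        // Auto-cleanup: drop entries whose window has already expired.
        self.hits
            .retain(|_, (start, _)| now.duration_since(*start) < self.window);
        let entry = self.hits.entry(client.to_string()).or_insert((now, 0));
        if entry.1 < self.limit {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut rl = RateLimiter::new(2, Duration::from_secs(60));
    assert!(rl.allow("alice"));
    assert!(rl.allow("alice"));
    assert!(!rl.allow("alice")); // third request in the window is rejected
    assert!(rl.allow("bob"));    // other clients are counted separately
}
```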
🔊 Speech Features
| Feature | What It Does | Real-World Analogy | Why It’s Cool |
|---|---|---|---|
| Speech-to-Text (STT) | Converts voice messages to text | A court stenographer | Local processing via Whisper |
| Text-to-Speech (TTS) | Reads responses aloud | An audiobook narrator | Piper voices for natural speech |
| Hybrid Provider | Falls back to OpenAI if local processing fails | Having a backup phone charger | 99.9% uptime even when hardware struggles |
Integrations (External Services)
PiSovereign connects to 8 external services. Each one plugs in via the Ports & Adapters pattern, so adding new ones is easy.
| Service | What You Can Do | Example Commands |
|---|---|---|
| WhatsApp | Send and receive messages via WhatsApp Cloud API | “Send Mom: Don’t forget the groceries!” |
| Signal | Private encrypted messaging via signal-cli | “Message my Signal group: meeting at 5pm” |
| Calendar (CalDAV) | View, create, and manage events on any CalDAV server | “What’s on my calendar this week?” |
| Contacts (CardDAV) | Look up phone numbers and emails | “What’s Sarah’s email address?” |
| Email (IMAP/SMTP) | Read inbox, search, draft, and send emails | “Any new emails from GitHub?” |
| Weather (Open-Meteo) | Current conditions and 7-day forecast | “Will it rain tomorrow in Berlin?” |
| Web Search (Brave/DDG) | Search the internet privately | “Search for vegan pasta recipes” |
| Transit (HAFAS) | German public transport schedules | “Next train from Munich to Hamburg?” |
How a Request Flows Through the System
Here’s what happens when you ask “What’s the weather tomorrow?”:
1. 📱 You send message via Web UI / WhatsApp / Signal
│
▼
2. 🚦 Adaptive Model Routing classifies complexity
│ → "weather question" = Simple tier
│
▼
3. 💾 Check Semantic Cache
│ → Similar question asked before? Return cached answer!
│ → No hit? Continue...
│
▼
4. 🧠 ReAct Agent decides to use the weather tool
│ → Calls Open-Meteo API
│ → Sanitizes the result (removes any hidden tricks)
│
▼
5. 🤖 Small AI model (gemma3:1b) formats the response
│
▼
6. 📤 Output Sanitizer checks for leaked secrets
│
▼
7. 💬 Response sent back to you: "Tomorrow: 18°C, partly cloudy"
│
▼
8. 📝 Event Bus (background): Save to cache, extract facts, log metrics
Total time: ~500ms (vs. 5-8 seconds without optimizations)
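Step 2 of the flow — classifying complexity before picking a model — can be sketched as a simple tiering function. The word-count thresholds here are invented for illustration; PiSovereign's real classifier is more sophisticated. The model names follow the tiers mentioned elsewhere on this page.

```rust
// Toy complexity router: short prompts go to a small, fast model,
// longer ones to progressively larger models.
fn route_model(prompt: &str) -> &'static str {
    match prompt.split_whitespace().count() {
        0..=10 => "gemma3:1b",    // simple tier: greetings, weather, lookups
        11..=40 => "qwen2.5:7b",  // medium tier
        _ => "qwen2.5:14b",       // complex tier: reasoning, multi-step tasks
    }
}

fn main() {
    assert_eq!(route_model("What's the weather tomorrow?"), "gemma3:1b");
    let essay = "word ".repeat(50);
    assert_eq!(route_model(&essay), "qwen2.5:14b");
}
```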
The 16 Crates (Code Modules)
PiSovereign is split into 16 Rust crates (think of them as LEGO sets that snap together):
| Layer | Crate | One-Line Description |
|---|---|---|
| Domain | domain | Core rules: what is a message, user, conversation? |
| Application | application | Business logic: how do we handle a chat request? |
| AI | ai_core | Ollama LLM inference and model routing |
| AI | ai_speech | Speech-to-text and text-to-speech |
| Infrastructure | infrastructure | Database, cache, secrets, metrics adapters |
| Integration | integration_whatsapp | WhatsApp Cloud API connector |
| Integration | integration_signal | Signal messenger via signal-cli |
| Integration | integration_caldav | CalDAV calendar protocol |
| Integration | integration_carddav | CardDAV contacts protocol |
| Integration | integration_email | IMAP/SMTP email |
| Integration | integration_weather | Open-Meteo weather API |
| Integration | integration_websearch | Brave Search + DuckDuckGo fallback |
| Integration | integration_transit | German public transit (HAFAS) |
| Presentation | presentation_http | REST API with Axum web framework |
| Presentation | presentation_cli | Command-line interface |
| Presentation | presentation_web | SolidJS web frontend |
Technology Stack
| Category | Technology | Why We Use It |
|---|---|---|
| Language | Rust 2024 | Fast, safe, no garbage collector pauses |
| Async Runtime | Tokio | Handle thousands of requests concurrently |
| Web Framework | Axum | Type-safe, fast HTTP handling |
| Frontend | SolidJS + Tailwind CSS | Reactive UI without React’s overhead |
| Database | PostgreSQL + pgvector | SQL + vector similarity search |
| Cache | Moka (memory) + Redb (disk) | Multi-layer for speed + persistence |
| LLM | Ollama | Run AI models locally, no cloud needed |
| Secrets | HashiCorp Vault | Enterprise-grade secret storage |
| Containers | Docker Compose | Easy deployment with profiles |
| Observability | Prometheus + Grafana + Loki | Metrics, dashboards, logs |
Glossary
| Term | Simple Definition |
|---|---|
| LLM | Large Language Model — the AI brain that generates text |
| RAG | Retrieval-Augmented Generation — giving the AI context from your memories |
| Embedding | Converting text to numbers so computers can measure similarity |
| Inference | The process of the AI generating a response |
| Port | An interface (contract) that says “I need X capability” |
| Adapter | A concrete implementation that fulfills a port’s contract |
| Decorator | A wrapper that adds behavior to something without changing it |
| Circuit Breaker | Pattern that stops calling a failing service to let it recover |
| Event Bus | A message highway where components publish and subscribe to events |
| STT/TTS | Speech-to-Text / Text-to-Speech |
| CalDAV/CardDAV | Calendar / Contact protocols (like HTTP for calendars) |
| HAFAS | German public transit API standard |
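The Embedding entry above deserves one concrete formula: once text is converted to a vector, similarity is usually measured as the cosine of the angle between two vectors — the basic operation behind RAG retrieval and semantic caching. A minimal sketch (real embeddings have hundreds of dimensions, not two):

```rust
// Cosine similarity: dot product of the vectors divided by the
// product of their lengths. 1.0 = same direction, 0.0 = unrelated.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm(a) * norm(b))
}

fn main() {
    let a = [1.0, 0.0];
    let b = [1.0, 0.0];
    let c = [0.0, 1.0];
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6); // identical direction
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);         // orthogonal
}
```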
Learn More
Want the full technical details? Check out these pages:
- Architecture Deep Dive — The complete system design
- Tool Calling (ReAct Agent) — How the AI uses tools
- Memory System — RAG and long-term memory
- Model Routing — Complexity-based AI selection
- Security Hardening — All security measures
- API Reference — REST API documentation
Getting Started
Get PiSovereign running in under 5 minutes
PiSovereign is deployed as a set of Docker containers using Docker Compose. This is the only supported installation method.
Prerequisites
- Docker Engine 24+ with Docker Compose v2
- 8 GB RAM recommended (4 GB minimum)
- 20 GB disk space (models + data)
- A domain name with DNS pointing to your server (for HTTPS)
Quick Start
# Clone the repository
git clone https://github.com/twohreichel/PiSovereign.git
cd PiSovereign/docker
# Create your environment file
cp .env.example .env
nano .env # Set PISOVEREIGN_DOMAIN and TRAEFIK_ACME_EMAIL
# Start all core services
docker compose up -d
# Initialize Vault (first run only — save the output!)
docker compose exec vault /vault/init.sh
# Wait for model download to complete
docker compose logs -f ollama-init
What Gets Deployed
| Service | Description |
|---|---|
| PiSovereign | AI assistant application |
| Traefik | HTTPS reverse proxy with Let’s Encrypt |
| Vault | Secret management (API keys, passwords) |
| Ollama | LLM inference engine |
| Signal-CLI | Signal messenger integration |
| Whisper | Speech-to-text processing |
| Piper | Text-to-speech synthesis |
Post-Setup
- Store secrets in Vault — See Vault Setup
- Register Signal number — See Signal Setup
- Configure integrations — See External Services
- Enable monitoring (optional) —
docker compose --profile monitoring up -d
Verify Installation
# Check all services are running
docker compose ps
# Test the health endpoint
curl https://your-domain.example.com/health
# Check individual services
curl https://your-domain.example.com/health/inference
curl https://your-domain.example.com/health/vault
Next Steps
- Docker Setup — Full deployment reference and operations
- Configuration Reference — All available settings
- Troubleshooting — Common issues and solutions
Hardware Setup
Hardware assembly guide for Raspberry Pi 5 with Hailo-10H AI HAT+
This guide covers the physical hardware setup. For software installation, see the Docker Setup guide.
Required Components
| Component | Recommended Model | Notes |
|---|---|---|
| Raspberry Pi 5 | 8 GB RAM variant | 4 GB works but limits concurrent operations |
| Hailo AI HAT+ 2 | Hailo-10H (26 TOPS) | Mounts via 40-pin GPIO + PCIe |
| Power Supply | Official 27W USB-C | Required for HAT+ power delivery |
| Cooling | Active Cooler for Pi 5 | Essential for sustained AI inference |
| Storage | NVMe SSD (256 GB+) | Via Hailo HAT+ PCIe or separate HAT |
| MicroSD Card | 32 GB+ Class 10 | For boot (if not using NVMe boot) |
| Case | Official Pi 5 Case (tall) | Must accommodate HAT+ height |
Assembly Instructions
Important: Always work on a static-free surface and handle boards by edges only.
Step 1: Prepare the Raspberry Pi
- Unbox the Raspberry Pi 5
- Attach the Active Cooler:
- Remove the protective film from the thermal pad
- Align with the CPU and press firmly
- Connect the 4-pin fan connector to the FAN header
Step 2: Install the Hailo AI HAT+
- Locate the 40-pin GPIO header on the Pi
- Align the Hailo HAT+ with the GPIO pins
- Gently press down until fully seated (approximately 3mm gap)
- Connect the PCIe FPC cable:
- Open the Pi 5’s PCIe connector latch
- Insert the flat cable (contacts facing down)
- Close the latch to secure
Step 3: Install Storage (Optional NVMe)
If using the Hailo HAT+ built-in M.2 slot:
- Insert NVMe SSD into M.2 slot (M key, 2242/2280)
- Secure with the provided screw
Step 4: Enclose and Power
- Place assembly in case
- Connect Ethernet cable (recommended over WiFi for production)
- Connect power supply
OS Installation
Flash Raspberry Pi OS
1. Install Raspberry Pi Imager on your computer
2. Choose Device: Raspberry Pi 5
3. Choose OS: Raspberry Pi OS Lite (64-bit)
4. Click Edit Settings:
   - Set hostname: `pisovereign`
   - Set username and strong password
   - Enable SSH with public-key authentication
   - Set your timezone
5. Flash to SD card / NVMe
First Boot
# SSH into the Pi
ssh pi@pisovereign.local
# Update system
sudo apt update && sudo apt full-upgrade -y
# Install Docker (required for PiSovereign)
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
# Log out and back in for group change
exit
Configure Boot (Optional NVMe)
sudo raspi-config
- Advanced Options → Boot Order → NVMe/USB Boot
Next Steps
Once hardware is assembled and Docker is installed, proceed to the Docker Setup guide for PiSovereign deployment.
Docker Setup
Production deployment guide using Docker Compose
PiSovereign runs as a set of Docker containers orchestrated by Docker Compose. This is the recommended and only supported deployment method.
Prerequisites
- Docker Engine 24+ and Docker Compose v2
- 4 GB+ RAM (8 GB recommended)
- 20 GB+ free disk space
Install Docker if not already installed:
# Raspberry Pi / Debian / Ubuntu
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
# Log out and back in
# macOS
brew install --cask docker
Quick Start
# 1. Clone the repository
git clone https://github.com/twohreichel/PiSovereign.git
cd PiSovereign/docker
# 2. Configure environment
cp .env.example .env
# Edit .env with your domain and email for TLS certificates
nano .env
# 3. Start core services
docker compose up -d
# 4. Initialize Vault (first time only)
docker compose exec vault /vault/init.sh
# Save the unseal key and root token printed to stdout!
# 5. Wait for Ollama model download
docker compose logs -f ollama-init
PiSovereign is now running at https://your-domain.example.com.
Architecture
The deployment consists of these core services:
| Service | Purpose | Port | URL |
|---|---|---|---|
| pisovereign | Main application server | 3000 (internal) | http://localhost/ via Traefik |
| traefik | Reverse proxy + TLS | 80, 443 | http://localhost:80 |
| vault | Secret management | 8200 (internal) | Internal only |
| ollama | LLM inference engine | 11434 (internal) | Internal only |
| signal-cli | Signal messenger daemon | Unix socket | Internal only |
| whisper | Speech-to-text (STT) | 8081 (internal) | Internal only |
| piper | Text-to-speech (TTS) | 8082 (internal) | Internal only |
Monitoring Stack (profile: monitoring)
| Service | Purpose | Port | URL |
|---|---|---|---|
| prometheus | Metrics collection & alerting | 9090 | http://localhost:9090 |
| grafana | Dashboards & visualization | 3000 (internal) | http://localhost/grafana via Traefik |
| loki | Log aggregation | 3100 (internal) | Internal only |
| promtail | Log shipping agent | — | Internal only |
| node-exporter | Host metrics exporter | 9100 (internal) | Internal only |
| otel-collector | OpenTelemetry Collector | 4317/4318 (internal) | Internal only |
CalDAV Server (profile: caldav)
| Service | Purpose | Port | URL |
|---|---|---|---|
| baikal | CalDAV/CardDAV server | 80 (internal) | http://localhost/caldav via Traefik |
Key Endpoints
| Endpoint | Description |
|---|---|
| http://localhost/health | Application health check |
| http://localhost/metrics/prometheus | Prometheus metrics scrape target |
| http://localhost/grafana | Grafana dashboards (monitoring profile) |
| http://localhost/caldav | Baïkal CalDAV web UI (caldav profile) |
| http://localhost:9090 | Prometheus web UI (monitoring profile) |
| http://localhost:9090/targets | Prometheus scrape target status |
Configuration
Environment Variables
Edit docker/.env before starting:
# Your domain (required for TLS)
PISOVEREIGN_DOMAIN=pi.example.com
# Email for Let's Encrypt certificates
TRAEFIK_ACME_EMAIL=you@example.com
# Vault root token (set after vault init)
VAULT_TOKEN=hvs.xxxxx
# Container image version
PISOVEREIGN_VERSION=latest
# Email provider preset: proton (default), gmail, or custom
EMAIL_PROVIDER=proton
Note: On first startup, PiSovereign automatically populates 32 default system commands and validates Vault credentials, logging warnings for any missing or invalid secrets. Check the container logs after first startup to verify all integrations are configured correctly.
Application Config
The main application config is at docker/config/config.toml.
All service hostnames use Docker network names (e.g., ollama:11434).
See Configuration Reference for all options.
Storing Secrets in Vault
After Vault initialization, store your integration secrets:
# Enter Vault container
docker compose exec vault sh
# Store WhatsApp credentials
vault kv put secret/pisovereign/whatsapp \
access_token="your-meta-token" \
app_secret="your-app-secret"
# Store Brave Search API key
vault kv put secret/pisovereign/websearch \
api_key="your-brave-api-key"
# Store CalDAV credentials
vault kv put secret/pisovereign/caldav \
password="your-caldav-password"
# Store email credentials (IMAP/SMTP)
vault kv put secret/pisovereign/email \
password="your-email-password"
Docker Compose Profiles
Additional services are available via profiles (see tables above for URLs):
Monitoring Stack
docker compose --profile monitoring up -d
CalDAV Server
docker compose --profile caldav up -d
All Profiles
docker compose --profile monitoring --profile caldav up -d
Signal Registration (Docker)
Signal requires a one-time registration before messages can be sent/received.
1. Set your phone number
Edit docker/.env and set your phone number in E.164 format:
SIGNAL_CLI_NUMBER=+491701234567
This automatically configures both the PiSovereign application and the signal-cli daemon; the number can also be stored in Vault for secure persistence.
2. Register with Signal
# Register via SMS
docker compose exec signal-cli signal-cli -a +491701234567 register
# Or register via voice call
docker compose exec signal-cli signal-cli -a +491701234567 register --voice
3. Verify the code
# Enter the verification code received via SMS/voice
docker compose exec signal-cli signal-cli -a +491701234567 verify 123-456
4. Store in Vault (optional)
For production, store the phone number in Vault so it’s managed centrally:
docker compose exec vault vault kv put secret/pisovereign/signal \
phone_number="+491701234567"
The application loads the phone number in this priority order:
1. `config.toml` — `[signal] phone_number = "..."`
2. Environment variable — `PISOVEREIGN_SIGNAL__PHONE_NUMBER` (set via `.env`)
3. Vault — `secret/pisovereign/signal` → `phone_number`
5. Restart and verify
docker compose restart pisovereign
docker compose logs pisovereign | grep -i signal
For the full Signal setup guide, see Signal Setup.
Operations
Updating
cd docker
# Pull latest images
docker compose pull
# Recreate containers with new images
docker compose up -d
Vault Management
# Check Vault status
docker compose exec vault vault status
# Unseal after restart (use key from init)
docker compose exec vault vault operator unseal <UNSEAL_KEY>
# Read a secret
docker compose exec vault vault kv get secret/pisovereign/whatsapp
Logs
# Follow all logs
docker compose logs -f
# Specific service
docker compose logs -f pisovereign
# Last 100 lines
docker compose logs --tail=100 pisovereign
Backup
# Stop services
docker compose down
# Backup volumes
docker run --rm -v pisovereign-data:/data -v $(pwd):/backup \
alpine tar czf /backup/pisovereign-backup-$(date +%Y%m%d).tar.gz /data
# Restart
docker compose up -d
Troubleshooting
See the Troubleshooting guide for common issues.
GPU Acceleration
By default, Ollama runs CPU-only inside Docker. For GPU-accelerated inference:
- macOS (Metal): Run Ollama natively and set `OLLAMA_BASE_URL` in `.env`
- Linux (NVIDIA): Use `docker compose -f compose.yml -f compose.gpu-nvidia.yml up -d`
- Linux (AMD/ROCm): Create a `compose.override.yml` with the ROCm image
See the full GPU Acceleration guide for setup instructions.
GPU Acceleration
Run Ollama with GPU acceleration for faster LLM inference
By default, PiSovereign runs Ollama inside a Docker container using CPU-only
inference. With GPU acceleration, inference speed improves dramatically —
especially for larger models like qwen2.5:14b or qwen2.5:32b.
Platform Overview
| Platform | GPU Access | Method |
|---|---|---|
| macOS (Apple Silicon / Intel) | Metal | Native Ollama (hybrid mode) |
| Linux + NVIDIA GPU | CUDA | Compose override file |
| Linux + AMD GPU | ROCm | Manual compose override |
| Raspberry Pi + Hailo | NPU | See Hardware Setup |
macOS — Native Ollama with Metal GPU
Docker Desktop on macOS runs containers inside a Linux VM and cannot pass through the Metal GPU. To use GPU acceleration, run Ollama natively on the host and point PiSovereign’s Docker container at it.
1. Install Ollama
brew install ollama
2. Start Ollama
ollama serve
Ollama will listen on http://localhost:11434 and automatically use Metal for
GPU-accelerated inference on Apple Silicon (M1/M2/M3/M4) or Intel Macs.
3. Pull the inference model
# Default model (recommended for 16 GB+ RAM)
ollama pull qwen2.5:14b
# Embedding model (required)
ollama pull nomic-embed-text
4. Configure Docker environment
Edit docker/.env and set:
OLLAMA_BASE_URL=http://host.docker.internal:11434
This tells the PiSovereign container to connect to the native Ollama instance
via Docker’s host.docker.internal bridge (already configured in
compose.yml via extra_hosts).
5. Start PiSovereign
# From the repository root
just docker-up
# Or directly
cd docker && docker compose up -d
Note: The Ollama Docker container will still start but is unused. It runs idle with minimal resource consumption. The PiSovereign container connects to native Ollama via the configured
OLLAMA_BASE_URL.
Verify GPU is active
# Check Ollama is using Metal
ollama ps
# Should show "metal" in the processor column
# Test inference
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5:14b",
"prompt": "Hello",
"stream": false
}'
Linux — NVIDIA GPU
On Linux with an NVIDIA GPU, Ollama runs inside Docker with full GPU passthrough via the NVIDIA Container Toolkit.
1. Install NVIDIA Container Toolkit
# Add the NVIDIA repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
2. Verify GPU is visible to Docker
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
This should display your GPU model, driver version, and CUDA version.
3. Start with GPU override
# From the repository root
just docker-up-gpu
# Or directly
cd docker && docker compose -f compose.yml -f compose.gpu-nvidia.yml up -d
This merges compose.gpu-nvidia.yml into the Ollama service, adding NVIDIA GPU
device reservations and higher resource limits. The same ollama service is
used — only the resource configuration is overridden.
4. Verify GPU inference
# Check GPU layers are loaded
docker compose -f compose.yml -f compose.gpu-nvidia.yml exec ollama ollama ps
# Should show GPU layers in the "processor" column
# Check NVIDIA GPU usage
docker compose -f compose.yml -f compose.gpu-nvidia.yml exec ollama nvidia-smi
GPU Resource Limits
The GPU override file (compose.gpu-nvidia.yml) configures higher resource
limits than CPU-only:
| Setting | CPU-only | GPU (NVIDIA) |
|---|---|---|
| Memory limit | 12 GB | 24 GB |
| CPU limit | 4.0 | 8.0 |
| Parallel requests | 1 | 2 |
| Loaded models | 1 | 2 |
Adjust these in docker/compose.gpu-nvidia.yml to match your hardware.
Linux — AMD GPU (ROCm)
AMD GPU support requires the ROCm-specific Ollama image and device mappings. This is not provided as a built-in profile due to the different base image, but can be configured manually:
1. Install ROCm drivers
Follow the AMD ROCm installation guide.
2. Create a compose override
Create docker/compose.override.yml:
services:
ollama:
image: ollama/ollama:rocm
devices:
- /dev/kfd:/dev/kfd
- /dev/dri:/dev/dri
group_add:
- video
- render
deploy:
resources:
limits:
memory: 24G
cpus: "8.0"
environment:
- OLLAMA_NUM_PARALLEL=2
- OLLAMA_MAX_LOADED_MODELS=2
- OLLAMA_FLASH_ATTENTION=1
3. Start services
cd docker && docker compose up -d
Docker Compose automatically merges compose.yml with compose.override.yml.
Model Configuration
The inference model is configurable via the OLLAMA_MODEL environment variable
in docker/.env. The ollama-init container pulls this model on first start.
Recommended models by VRAM / RAM
| VRAM / RAM | Model | Parameter |
|---|---|---|
| 8 GB | qwen2.5:7b | OLLAMA_MODEL=qwen2.5:7b |
| 16 GB | qwen2.5:14b | OLLAMA_MODEL=qwen2.5:14b (default) |
| 24 GB+ | qwen2.5:32b | OLLAMA_MODEL=qwen2.5:32b |
To change the model:
# Edit docker/.env
OLLAMA_MODEL=qwen2.5:32b
# Restart ollama-init to pull the new model
cd docker && docker compose restart ollama-init
# Or pull manually
just docker-model-pull qwen2.5:32b
The embedding model (nomic-embed-text) is always pulled regardless of the
OLLAMA_MODEL setting.
Troubleshooting
macOS: Ollama not reachable from Docker
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Verify Docker can reach the host
docker run --rm --add-host=host.docker.internal:host-gateway \
  curlimages/curl -s http://host.docker.internal:11434/api/tags
# Check .env is correct
grep OLLAMA_BASE_URL docker/.env
# Should show: OLLAMA_BASE_URL=http://host.docker.internal:11434
NVIDIA: GPU not visible in container
# Check NVIDIA driver is loaded
nvidia-smi
# Check Container Toolkit is installed
nvidia-ctk --version
# Check Docker runtime
docker info | grep -i nvidia
# Test GPU access
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Model download fails
# Check ollama-init logs
docker compose logs ollama-init
# Pull manually
docker compose exec ollama ollama pull qwen2.5:14b
# Or via Justfile
just docker-model-pull qwen2.5:14b
Performance is slow despite GPU
# Verify GPU layers are being used
ollama ps
# The "processor" column should show "gpu" or "metal", not "cpu"
# Check if model fits in VRAM — if it spills to RAM, inference slows down
# Reduce model size if VRAM is insufficient
HashiCorp Vault Setup
Secure secret management for PiSovereign using HashiCorp Vault
Vault is included in the Docker Compose stack and initialized automatically on first run. This guide covers how secrets are structured, how to store them, and how Vault integrates with PiSovereign.
Overview
HashiCorp Vault provides centralized secret management with encryption at rest and in transit, fine-grained access control, audit logging, and secret rotation. PiSovereign’s Docker Compose setup includes Vault with automatic initialization via the vault-init sidecar container.
How It Works
┌──────────────────────────────────────────────────────┐
│                     PiSovereign                      │
│  ┌────────────────────────────────────────────────┐  │
│  │               ChainedSecretStore               │  │
│  │  ┌──────────────┐      ┌────────────────────┐  │  │
│  │  │ VaultSecret  │  →   │ EnvironmentSecret  │  │  │
│  │  │ Store        │      │ Store              │  │  │
│  │  └──────────────┘      └────────────────────┘  │  │
│  └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────┐
│                   HashiCorp Vault                    │
│  ┌──────────────┐  ┌─────────────┐  ┌────────────┐   │
│  │ KV v2 Engine │  │ AppRole     │  │ Audit      │   │
│  │              │  │ Auth        │  │ Log        │   │
│  └──────────────┘  └─────────────┘  └────────────┘   │
└──────────────────────────────────────────────────────┘
PiSovereign uses a ChainedSecretStore that tries multiple backends in order:
- Vault (primary) — Production secrets stored securely
- Environment variables (fallback) — Overrides for development or CI
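The fallback order can be sketched in shell (illustrative only; the real chain is implemented inside PiSovereign, and `get_secret` is a hypothetical helper):

```shell
# get_secret: try Vault first, then fall back to an environment
# variable, mirroring the ChainedSecretStore lookup order above.
get_secret() {
  vault_path="$1"; key="$2"; env_name="$3"
  vault kv get -field="$key" "$vault_path" 2>/dev/null \
    || printenv "$env_name"
}
```

If Vault is unreachable or the path is missing, the environment variable wins, which matches the development/CI override behaviour described above.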
Initialization
Vault is initialized automatically on first deployment by the vault-init container. To run the initialization manually:
cd docker
docker compose exec vault /vault/init.sh
Important: Save the unseal key and root token printed to stdout. Loss of the unseal key means loss of access to secrets.
After a container restart, Vault may need to be unsealed:
docker compose exec vault vault operator unseal <UNSEAL_KEY>
Storing Secrets
Store integration credentials in Vault after initialization:
# Enter the Vault container
docker compose exec vault sh
# WhatsApp credentials
vault kv put secret/pisovereign/whatsapp \
access_token="your-meta-access-token" \
app_secret="your-app-secret"
# Email credentials (IMAP/SMTP password or Bridge password)
vault kv put secret/pisovereign/email \
password="your-email-password"
# CalDAV credentials
vault kv put secret/pisovereign/caldav \
username="your-username" \
password="your-password"
# OpenAI API key (for speech fallback)
vault kv put secret/pisovereign/openai \
api_key="sk-your-openai-key"
# Brave Search API key
vault kv put secret/pisovereign/websearch \
brave_api_key="BSA-your-key"
# Signal phone number
vault kv put secret/pisovereign/signal \
phone_number="+491701234567"
# Verify a secret
vault kv get secret/pisovereign/whatsapp
Secret Paths
PiSovereign expects secrets at these paths:
| Secret | Vault Path | Environment Variable Fallback |
|---|---|---|
| WhatsApp Access Token | secret/pisovereign/whatsapp → access_token | PISOVEREIGN_WHATSAPP_ACCESS_TOKEN |
| WhatsApp App Secret | secret/pisovereign/whatsapp → app_secret | PISOVEREIGN_WHATSAPP_APP_SECRET |
| Email Password | secret/pisovereign/email → password | PISOVEREIGN_EMAIL_PASSWORD |
| CalDAV Username | secret/pisovereign/caldav → username | PISOVEREIGN_CALDAV_USERNAME |
| CalDAV Password | secret/pisovereign/caldav → password | PISOVEREIGN_CALDAV_PASSWORD |
| OpenAI API Key | secret/pisovereign/openai → api_key | PISOVEREIGN_OPENAI_API_KEY |
| Brave Search Key | secret/pisovereign/websearch → brave_api_key | PISOVEREIGN_WEBSEARCH_BRAVE_API_KEY |
| Signal Phone Number | secret/pisovereign/signal → phone_number | PISOVEREIGN_SIGNAL__PHONE_NUMBER |
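The fallback naming convention is mechanical for most rows. As an illustration (hypothetical helper; note the Signal row uses a double underscore and does not follow this simple rule):

```shell
# env_fallback: derive the environment-variable fallback name from a
# Vault path and key, per the table above.
env_fallback() {
  integration="${1##*/}"   # e.g. "whatsapp" from "secret/pisovereign/whatsapp"
  printf 'PISOVEREIGN_%s_%s\n' \
    "$(echo "$integration" | tr '[:lower:]' '[:upper:]')" \
    "$(echo "$2" | tr '[:lower:]' '[:upper:]')"
}

env_fallback secret/pisovereign/whatsapp access_token
# prints: PISOVEREIGN_WHATSAPP_ACCESS_TOKEN
```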
AppRole Authentication
For production, use AppRole instead of the root token. AppRole provides short-lived tokens with scoped permissions.
Create Policy
docker compose exec vault sh
vault policy write pisovereign - <<EOF
path "secret/data/pisovereign/*" {
capabilities = ["read"]
}
path "secret/metadata/pisovereign/*" {
capabilities = ["list"]
}
path "auth/token/renew-self" {
capabilities = ["update"]
}
EOF
Configure AppRole
vault auth enable approle
vault write auth/approle/role/pisovereign \
token_policies="pisovereign" \
token_ttl=1h \
token_max_ttl=4h \
secret_id_ttl=720h \
secret_id_num_uses=0
# Get Role ID
vault read auth/approle/role/pisovereign/role-id
# Generate Secret ID
vault write -f auth/approle/role/pisovereign/secret-id
Then configure PiSovereign to use AppRole in config.toml:
[vault]
address = "http://vault:8200"
role_id = "12345678-1234-1234-1234-123456789012"
secret_id = "abcd1234-abcd-1234-abcd-abcd12345678"
mount_path = "secret"
timeout_secs = 5
Tip: Store secret_id as an environment variable rather than in the config file:
export PISOVEREIGN_VAULT_SECRET_ID="abcd1234-..."
Operations
Secret Rotation
Update a secret without downtime — PiSovereign reads the latest version automatically:
vault kv put secret/pisovereign/whatsapp \
access_token="new-access-token" \
app_secret="same-app-secret"
View secret versions or rollback:
vault kv metadata get secret/pisovereign/whatsapp
vault kv rollback -version=2 secret/pisovereign/whatsapp
Backup
# Backup Vault data volume
docker run --rm -v docker_vault-data:/data -v $(pwd):/backup \
alpine tar czf /backup/vault-backup-$(date +%Y%m%d).tar.gz /data
For disaster recovery, ensure you have the unseal key and root token stored securely in a separate location.
Troubleshooting
Cannot connect to Vault
docker compose exec vault vault status
docker compose logs vault
Permission denied
# Verify the token has the correct policy
docker compose exec vault vault token lookup
docker compose exec vault vault policy read pisovereign
Secret not found
# Verify the secret exists
docker compose exec vault vault kv get secret/pisovereign/whatsapp
# Check the mount path
docker compose exec vault vault secrets list
Vault sealed after restart
docker compose exec vault vault operator unseal <UNSEAL_KEY>
Next Steps
- Configuration Reference — All PiSovereign options
- Security Hardening — Vault security best practices
- Docker Setup — Full deployment reference
Configuration Reference
⚙️ Complete reference for all PiSovereign configuration options
This document covers every configuration option available in config.toml.
Table of Contents
- Overview
- Environment Settings
- Server Settings
- Inference Engine
- Security Settings
- Memory & Knowledge Storage
- Database & Cache
- Integrations
- Model Selector
- Telemetry
- Resilience
- Health Checks
- Event Bus
- Agentic Mode
- Vault Integration
- Environment Variables
- Example Configurations
Overview
PiSovereign uses a layered configuration system:
- Default values - Built into the application
- Configuration file - config.toml in the working directory
- Environment variables - Override config file values (prefix: PISOVEREIGN_)
Configuration File Location
The application loads config.toml from the current working directory:
# Default location (relative to working directory)
./config.toml
Environment Variable Mapping
Config values can be overridden using environment variables:
[server]
port = 3000
# Becomes:
PISOVEREIGN_SERVER_PORT=3000
Nested values use double underscores:
[speech.local_stt]
threads = 4
# Becomes:
PISOVEREIGN_SPEECH_LOCAL_STT__THREADS=4
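For example, the two mappings above can be exported before starting the service (shown with the documented values):

```shell
# Environment overrides take precedence over config.toml values.
export PISOVEREIGN_SERVER_PORT=3000
export PISOVEREIGN_SPEECH_LOCAL_STT__THREADS=4
```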
Environment Settings
# Application environment: "development" or "production"
# In production:
# - JSON logging is enforced
# - Security warnings block startup (unless PISOVEREIGN_ALLOW_INSECURE_CONFIG=true)
# - TLS verification is enforced
environment = "development"
| Value | Description |
|---|---|
development | Relaxed security, human-readable logs |
production | Strict security, JSON logs, TLS enforced |
Server Settings
[server]
# Network interface to bind to
# "127.0.0.1" = localhost only (recommended for security)
# "0.0.0.0" = all interfaces (use behind reverse proxy)
host = "127.0.0.1"
# HTTP port
port = 3000
# Enable CORS (Cross-Origin Resource Sharing)
cors_enabled = true
# Allowed CORS origins
# Empty array = allow all (WARNING in production)
# Example: ["https://app.example.com", "https://admin.example.com"]
allowed_origins = []
# Graceful shutdown timeout (seconds)
# Time to wait for active requests to complete
shutdown_timeout_secs = 30
# Log format: "json" or "text"
# In production mode, "json" is enforced even if "text" is set
log_format = "text"
# Secure session cookies (requires HTTPS)
# Set to false for local HTTP development
secure_cookies = false
# Maximum request body size for JSON payloads (optional, bytes)
# max_body_size_json_bytes = 1048576 # 1MB
# Maximum request body size for audio uploads (optional, bytes)
# max_body_size_audio_bytes = 10485760 # 10MB
| Option | Type | Default | Description |
|---|---|---|---|
host | String | 127.0.0.1 | Bind address |
port | Integer | 3000 | HTTP port |
cors_enabled | Boolean | true | Enable CORS |
allowed_origins | Array | [] | CORS allowed origins |
shutdown_timeout_secs | Integer | 30 | Shutdown grace period |
log_format | String | text | Log output format |
secure_cookies | Boolean | false | Secure cookie mode (HTTPS) |
max_body_size_json_bytes | Integer | 1048576 | (Optional) Max JSON payload size |
max_body_size_audio_bytes | Integer | 10485760 | (Optional) Max audio upload size |
Inference Engine
[inference]
# Ollama-compatible server URL
# Works with both hailo-ollama (Raspberry Pi) and standard Ollama (macOS)
base_url = "http://localhost:11434"
# Default model for inference
default_model = "qwen2.5:1.5b"
# Request timeout (milliseconds)
timeout_ms = 60000
# Maximum tokens to generate
max_tokens = 2048
# Sampling temperature (0.0 = deterministic, 2.0 = creative)
temperature = 0.7
# Top-p (nucleus) sampling (0.0-1.0)
top_p = 0.9
# System prompt (optional)
# system_prompt = "You are a helpful AI assistant."
| Option | Type | Default | Range | Description |
|---|---|---|---|---|
base_url | String | http://localhost:11434 | - | Inference server URL |
default_model | String | qwen2.5:1.5b | - | Model identifier |
timeout_ms | Integer | 60000 | 1000-300000 | Request timeout |
max_tokens | Integer | 2048 | 1-8192 | Max generation length |
temperature | Float | 0.7 | 0.0-2.0 | Randomness |
top_p | Float | 0.9 | 0.0-1.0 | Nucleus sampling |
system_prompt | String | None | - | (Optional) System prompt |
Security Settings
[security]
# Whitelisted phone numbers for WhatsApp
# Empty = allow all, Example: ["+491234567890", "+491234567891"]
whitelisted_phones = []
# API Keys (hashed with Argon2id)
# Generate hashed keys using: pisovereign-cli hash-api-key <your-key>
# Migrate existing plaintext keys: pisovereign-cli migrate-keys --input config.toml --dry-run
#
# [[security.api_keys]]
# hash = "$argon2id$v=19$m=19456,t=2,p=1$..."
# user_id = "550e8400-e29b-41d4-a716-446655440000"
#
# [[security.api_keys]]
# hash = "$argon2id$v=19$m=19456,t=2,p=1$..."
# user_id = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
# Trusted reverse proxies (IP addresses) - optional
# Add your proxy IPs here if behind a reverse proxy
# trusted_proxies = ["127.0.0.1", "::1"]
# Rate limiting
rate_limit_enabled = true
rate_limit_rpm = 120 # Requests per minute per IP
# TLS settings for outbound connections
tls_verify_certs = true
connection_timeout_secs = 30
min_tls_version = "1.2" # "1.2" or "1.3"
| Option | Type | Default | Description |
|---|---|---|---|
whitelisted_phones | Array | [] | (Optional) Allowed phone numbers |
api_keys | Array | [] | API key definitions with Argon2id hash |
trusted_proxies | Array | - | (Optional) Trusted reverse proxy IPs |
rate_limit_enabled | Boolean | true | Enable rate limiting |
rate_limit_rpm | Integer | 120 | Requests/minute/IP |
tls_verify_certs | Boolean | true | Verify TLS certificates for outbound connections |
connection_timeout_secs | Integer | 30 | Connection timeout for external services |
min_tls_version | String | 1.2 | Minimum TLS version (“1.2” or “1.3”) |
Prompt Security
Protects against prompt injection and other AI security threats.
[prompt_security]
# Enable prompt security analysis
enabled = true
# Sensitivity level: "low", "medium", or "high"
# - low: Only block high-confidence threats
# - medium: Block medium and high confidence threats (recommended)
# - high: Block all detected threats including low confidence
sensitivity = "medium"
# Block requests when security threats are detected
block_on_detection = true
# Maximum violations before auto-blocking an IP
max_violations_before_block = 3
# Time window for counting violations (seconds)
violation_window_secs = 3600 # 1 hour
# How long to block an IP after exceeding max violations (seconds)
block_duration_secs = 86400 # 24 hours
# Immediately block IPs that send critical-level threats
auto_block_on_critical = true
# Custom patterns to detect (in addition to built-in patterns) - optional
# custom_patterns = ["DROP TABLE", "eval("]
| Option | Type | Default | Description |
|---|---|---|---|
enabled | Boolean | true | Enable prompt security analysis |
sensitivity | String | medium | Detection level: “low”, “medium”, or “high” |
block_on_detection | Boolean | true | Block requests when threats detected |
max_violations_before_block | Integer | 3 | Violations before IP auto-block |
violation_window_secs | Integer | 3600 | Time window for counting violations |
block_duration_secs | Integer | 86400 | IP block duration after violations |
auto_block_on_critical | Boolean | true | Auto-block critical threats immediately |
custom_patterns | Array | - | (Optional) Custom threat detection patterns |
API Key Authentication
API keys are hashed with Argon2id rather than stored in plaintext. Use the CLI tools below to generate hashes and migrate existing keys.
Generate a new hashed key:
pisovereign-cli hash-api-key <your-api-key>
Migrate existing plaintext keys:
pisovereign-cli migrate-keys --input config.toml --dry-run
pisovereign-cli migrate-keys --input config.toml --output config-new.toml
Configuration:
[[security.api_keys]]
hash = "$argon2id$v=19$m=19456,t=2,p=1$..."
user_id = "550e8400-e29b-41d4-a716-446655440000"
Usage:
curl -H "Authorization: Bearer <your-api-key>" http://localhost:3000/v1/chat
Memory & Knowledge Storage
Persistent AI memory for RAG-based context retrieval. Stores interactions, facts, preferences, and corrections using embeddings for semantic similarity search.
[memory]
# Enable memory storage (default: true)
# enabled = true
# Enable RAG context retrieval (default: true)
# enable_rag = true
# Enable automatic learning from interactions (default: true)
# enable_learning = true
# Number of memories to retrieve for RAG context (default: 5)
# rag_limit = 5
# Minimum similarity threshold for RAG retrieval (0.0-1.0, default: 0.5)
# rag_threshold = 0.5
# Similarity threshold for memory deduplication (0.0-1.0, default: 0.85)
# merge_threshold = 0.85
# Minimum importance score to keep memories (default: 0.1)
# min_importance = 0.1
# Decay factor for memory importance over time (default: 0.95)
# decay_factor = 0.95
# Enable content encryption (default: true)
# enable_encryption = true
# Path to encryption key file (generated if not exists)
# encryption_key_path = "memory_encryption.key"
[memory.embedding]
# Embedding model name (default: nomic-embed-text)
# model = "nomic-embed-text"
# Embedding dimension (default: 384 for nomic-embed-text)
# dimension = 384
# Request timeout in milliseconds (default: 30000)
# timeout_ms = 30000
| Option | Type | Default | Description |
|---|---|---|---|
enabled | Boolean | true | (Optional) Enable memory storage |
enable_rag | Boolean | true | (Optional) Enable RAG context retrieval |
enable_learning | Boolean | true | (Optional) Auto-learn from interactions |
rag_limit | Integer | 5 | (Optional) Number of memories for RAG |
rag_threshold | Float | 0.5 | (Optional) Min similarity for RAG (0.0-1.0) |
merge_threshold | Float | 0.85 | (Optional) Similarity for deduplication (0.0-1.0) |
min_importance | Float | 0.1 | (Optional) Min importance to keep memories |
decay_factor | Float | 0.95 | (Optional) Importance decay over time |
enable_encryption | Boolean | true | (Optional) Encrypt stored content |
encryption_key_path | String | memory_encryption.key | (Optional) Encryption key file path |
Embedding Settings:
| Option | Type | Default | Description |
|---|---|---|---|
embedding.model | String | nomic-embed-text | (Optional) Embedding model name |
embedding.dimension | Integer | 384 | (Optional) Embedding vector dimension |
embedding.timeout_ms | Integer | 30000 | (Optional) Request timeout |
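To get a feel for the decay defaults: with decay_factor = 0.95 an importance score roughly halves every 14 decay steps, and a score of 1.0 falls below the min_importance = 0.1 cutoff after about 45 steps (how often a decay step is applied is up to the application). A quick check with awk:

```shell
# 0.95^45 is about 0.099, just under the default min_importance of 0.1
awk 'BEGIN { s = 1.0; for (i = 1; i <= 45; i++) s *= 0.95; printf "%.3f\n", s }'
# prints: 0.099
```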
Database & Cache
Database
[database]
# SQLite database file path
path = "pisovereign.db"
# Connection pool size
max_connections = 5
# Auto-run migrations on startup
run_migrations = true
| Option | Type | Default | Description |
|---|---|---|---|
path | String | pisovereign.db | Database file path |
max_connections | Integer | 5 | Pool size |
run_migrations | Boolean | true | Auto-migrate |
Cache
PiSovereign uses a 3-layer caching architecture:
- L1 (Moka) - In-memory cache for fastest access
- L2 (Redb) - Persistent disk cache for exact-match lookups
- L3 (Semantic) - pgvector-based similarity cache for semantically equivalent queries
[cache]
# Enable caching (disable for debugging)
enabled = true
# TTL values (seconds)
ttl_short_secs = 300 # 5 minutes - frequently changing
ttl_medium_secs = 3600 # 1 hour - moderately stable
ttl_long_secs = 86400 # 24 hours - stable data
# LLM response caching
ttl_llm_dynamic_secs = 3600 # Dynamic content (briefings)
ttl_llm_stable_secs = 86400 # Stable content (help text)
# L1 (in-memory) cache size
l1_max_entries = 10000
| Option | Type | Default | Description |
|---|---|---|---|
enabled | Boolean | true | Enable caching |
ttl_short_secs | Integer | 300 | Short TTL |
ttl_medium_secs | Integer | 3600 | Medium TTL |
ttl_long_secs | Integer | 86400 | Long TTL |
ttl_llm_dynamic_secs | Integer | 3600 | Dynamic LLM TTL |
ttl_llm_stable_secs | Integer | 86400 | Stable LLM TTL |
l1_max_entries | Integer | 10000 | Max memory cache entries |
Semantic Cache
The semantic cache provides an additional layer that matches queries based on embedding similarity rather than exact string matching. This enables cache hits for semantically equivalent queries like:
- “What’s the weather?” ≈ “How’s the weather today?”
- “Tell me about the capital of France” ≈ “What is Paris?”
[cache.semantic]
# Enable semantic caching
enabled = true
# Minimum cosine similarity for cache hit (0.0-1.0)
# Higher = stricter matching, lower = more cache hits
similarity_threshold = 0.92
# TTL for cached entries (hours)
ttl_hours = 48
# Maximum cached entries
max_entries = 10000
# Patterns that bypass semantic cache (time-sensitive queries)
bypass_patterns = ["weather", "time", "date", "today", "tomorrow", "now", "latest", "current", "recent"]
# How often to evict expired entries (minutes)
eviction_interval_minutes = 60
| Option | Type | Default | Description |
|---|---|---|---|
enabled | Boolean | true | Enable semantic caching |
similarity_threshold | Float | 0.92 | Minimum cosine similarity (0.0-1.0) |
ttl_hours | Integer | 48 | Time-to-live in hours |
max_entries | Integer | 10000 | Maximum cache entries |
bypass_patterns | Array | See above | Queries containing these words skip cache |
eviction_interval_minutes | Integer | 60 | Expired entry cleanup interval |
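The bypass check behaves like a substring match on the query. A sketch (case-insensitive matching is an assumption here, not documented behaviour):

```shell
# bypasses_cache: succeed if the query contains any time-sensitive
# bypass pattern from the default list above (illustrative sketch).
bypasses_cache() {
  q=$(echo "$1" | tr '[:upper:]' '[:lower:]')
  for p in weather time date today tomorrow now latest current recent; do
    case "$q" in *"$p"*) return 0 ;; esac
  done
  return 1
}

bypasses_cache "What's the weather?" && echo "bypass"
# prints: bypass
```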
Integrations
Messenger Selection
PiSovereign supports one messenger at a time:
# Choose one: "whatsapp", "signal", or "none"
messenger = "whatsapp"
| Value | Description |
|---|---|
whatsapp | Use WhatsApp Business API (webhooks) |
signal | Use Signal via signal-cli (polling) |
none | Disable messenger integration |
WhatsApp Business
[whatsapp]
# Meta Graph API access token (store in Vault)
# access_token = "your-access-token"
# Phone number ID from WhatsApp Business
# phone_number_id = "your-phone-number-id"
# App secret for webhook signature verification
# app_secret = "your-app-secret"
# Verify token for webhook setup
# verify_token = "your-verify-token"
# Require webhook signature verification
signature_required = true
# Meta Graph API version
api_version = "v18.0"
# Phone numbers allowed to send messages (empty = allow all)
# whitelist = ["+1234567890"]
# Conversation Persistence Settings
[whatsapp.persistence]
# Enable conversation persistence (default: true)
# enabled = true
# Enable encryption for stored messages (default: true)
# enable_encryption = true
# Enable RAG context retrieval from memory system (default: true)
# enable_rag = true
# Enable automatic learning from interactions (default: true)
# enable_learning = true
# Maximum days to retain conversations (optional, unlimited if not set)
# retention_days = 90
# Maximum messages per conversation before FIFO truncation (optional)
# max_messages_per_conversation = 1000
# Number of recent messages to use as context (default: 50)
# context_window = 50
| Option | Type | Default | Description |
|---|---|---|---|
access_token | String | - | (Optional) Meta Graph API token (store in Vault) |
phone_number_id | String | - | (Optional) WhatsApp Business phone number ID |
app_secret | String | - | (Optional) Webhook signature secret |
verify_token | String | - | (Optional) Webhook verification token |
signature_required | Boolean | true | Require webhook signature verification |
api_version | String | v18.0 | Meta Graph API version |
whitelist | Array | [] | (Optional) Allowed phone numbers |
Persistence Options:
| Option | Type | Default | Description |
|---|---|---|---|
persistence.enabled | Boolean | true | (Optional) Store conversations in database |
persistence.enable_encryption | Boolean | true | (Optional) Encrypt stored messages |
persistence.enable_rag | Boolean | true | (Optional) Enable RAG context retrieval |
persistence.enable_learning | Boolean | true | (Optional) Auto-learn from interactions |
persistence.retention_days | Integer | - | (Optional) Max retention days (unlimited if not set) |
persistence.max_messages_per_conversation | Integer | - | (Optional) Max messages before truncation |
persistence.context_window | Integer | 50 | (Optional) Recent messages for context |
Signal Messenger
[signal]
# Your phone number registered with Signal (E.164 format)
phone_number = "+1234567890"
# Path to signal-cli JSON-RPC socket
socket_path = "/var/run/signal-cli/socket"
# Path to signal-cli data directory (optional)
# data_path = "/var/lib/signal-cli"
# Connection timeout in milliseconds
timeout_ms = 30000
# Phone numbers allowed to send messages (empty = allow all)
# whitelist = ["+1234567890", "+0987654321"]
# Conversation Persistence Settings
[signal.persistence]
# Enable conversation persistence (default: true)
# enabled = true
# Enable encryption for stored messages (default: true)
# enable_encryption = true
# Enable RAG context retrieval from memory system (default: true)
# enable_rag = true
# Enable automatic learning from interactions (default: true)
# enable_learning = true
# Maximum days to retain conversations (optional, unlimited if not set)
# retention_days = 90
# Maximum messages per conversation before FIFO truncation (optional)
# max_messages_per_conversation = 1000
# Number of recent messages to use as context (default: 50)
# context_window = 50
| Option | Type | Default | Description |
|---|---|---|---|
phone_number | String | - | Your Signal phone number (E.164) |
socket_path | String | /var/run/signal-cli/socket | signal-cli daemon socket |
data_path | String | - | (Optional) signal-cli data directory |
timeout_ms | Integer | 30000 | Connection timeout |
whitelist | Array | [] | (Optional) Allowed phone numbers |
Persistence Options:
| Option | Type | Default | Description |
|---|---|---|---|
persistence.enabled | Boolean | true | (Optional) Store conversations in database |
persistence.enable_encryption | Boolean | true | (Optional) Encrypt stored messages |
persistence.enable_rag | Boolean | true | (Optional) Enable RAG context retrieval |
persistence.enable_learning | Boolean | true | (Optional) Auto-learn from interactions |
persistence.retention_days | Integer | - | (Optional) Max retention days (unlimited if not set) |
persistence.max_messages_per_conversation | Integer | - | (Optional) Max messages before truncation |
persistence.context_window | Integer | 50 | (Optional) Recent messages for context |
📖 See Signal Setup Guide for installation instructions.
Speech Processing
Voice message support for speech-to-text (STT) and text-to-speech (TTS).
Cloud Provider (OpenAI):
- Works on all platforms
- Requires API key
Local Provider (whisper.cpp + Piper):
- Raspberry Pi: Models in /usr/local/share/{whisper,piper}/
- macOS: Models in ~/Library/Application Support/{whisper,piper}/
- Install whisper.cpp: brew install whisper-cpp (Mac) or build from source (Pi)
- Install Piper: Download from https://github.com/rhasspy/piper/releases
[speech]
# Speech provider: "openai" (cloud) or "local" (whisper.cpp + Piper)
# provider = "openai"
# OpenAI API key for Whisper (STT) and TTS
# openai_api_key = "sk-..."
# OpenAI API base URL (for custom endpoints)
# openai_base_url = "https://api.openai.com/v1"
# Speech-to-text model (OpenAI Whisper)
# stt_model = "whisper-1"
# Text-to-speech model
# tts_model = "tts-1"
# Default TTS voice: alloy, echo, fable, onyx, nova, shimmer
# default_voice = "nova"
# Output audio format: opus, ogg, mp3, wav
# output_format = "opus"
# Request timeout in milliseconds
# timeout_ms = 60000
# Maximum audio duration in milliseconds (25 min for Whisper)
# max_audio_duration_ms = 1500000
# Response format preference: mirror, text, voice
# response_format = "mirror"
# TTS speaking speed (0.25 to 4.0)
# speed = 1.0
| Option | Type | Default | Description |
|---|---|---|---|
provider | String | openai | (Optional) Speech provider: “openai” or “local” |
openai_api_key | String | - | (Optional) OpenAI API key (store in Vault) |
openai_base_url | String | https://api.openai.com/v1 | (Optional) OpenAI API base URL |
stt_model | String | whisper-1 | (Optional) Speech-to-text model |
tts_model | String | tts-1 | (Optional) Text-to-speech model |
default_voice | String | nova | (Optional) TTS voice (alloy, echo, fable, onyx, nova, shimmer) |
output_format | String | opus | (Optional) Audio format (opus, ogg, mp3, wav) |
timeout_ms | Integer | 60000 | (Optional) Request timeout |
max_audio_duration_ms | Integer | 1500000 | (Optional) Max audio duration (25 minutes) |
response_format | String | mirror | (Optional) Response format (mirror, text, voice) |
speed | Float | 1.0 | (Optional) TTS speaking speed (0.25 to 4.0) |
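For a fully local setup with no cloud fallback, the section can be as small as the following (assuming the whisper.cpp and Piper models are installed as described above):

```toml
[speech]
provider = "local"
output_format = "opus"
```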
Weather
[weather]
# Open-Meteo API (free, no key required)
# base_url = "https://api.open-meteo.com/v1"
# Connection timeout in seconds
# timeout_secs = 30
# Number of forecast days (1-16)
# forecast_days = 7
# Cache TTL in minutes
# cache_ttl_minutes = 30
# Default location (when user has no profile)
# default_location = { latitude = 52.52, longitude = 13.405 } # Berlin
| Option | Type | Default | Description |
|---|---|---|---|
base_url | String | https://api.open-meteo.com/v1 | (Optional) Open-Meteo API URL |
timeout_secs | Integer | 30 | (Optional) Request timeout |
forecast_days | Integer | 7 | (Optional) Forecast days (1-16) |
cache_ttl_minutes | Integer | 30 | (Optional) Cache TTL |
default_location | Object | - | (Optional) Default location { latitude, longitude } |
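A filled-in example overriding the defaults (coordinates are for Munich; values illustrative):

```toml
[weather]
forecast_days = 5
cache_ttl_minutes = 15
default_location = { latitude = 48.137, longitude = 11.575 } # Munich
```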
CalDAV Calendar
[caldav]
# CalDAV server URL (Baïkal, Radicale, Nextcloud)
# server_url = "https://cal.example.com"
# When using Baïkal via Docker (setup --baikal):
# server_url = "http://baikal:80/dav.php"
# Authentication (store in Vault)
# username = "your-username"
# password = "your-password"
# Default calendar path (optional)
# calendar_path = "/calendars/user/default"
# TLS verification
# verify_certs = true
# Connection timeout in seconds
# timeout_secs = 30
| Option | Type | Default | Description |
|---|---|---|---|
server_url | String | - | (Optional) CalDAV server URL |
username | String | - | (Optional) Username for authentication (store in Vault) |
password | String | - | (Optional) Password for authentication (store in Vault) |
calendar_path | String | /calendars/user/default | (Optional) Default calendar path |
verify_certs | Boolean | true | (Optional) Verify TLS certificates |
timeout_secs | Integer | 30 | (Optional) Connection timeout |
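A minimal working section for the bundled Baïkal container (paths taken from the comments above; keep the credentials themselves in Vault):

```toml
[caldav]
server_url = "http://baikal:80/dav.php"
calendar_path = "/calendars/user/default"
timeout_secs = 30
```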
Email (IMAP/SMTP)
PiSovereign supports any email provider that offers IMAP/SMTP access, including Gmail, Outlook, Proton Mail (via Bridge), and custom servers. Authentication is supported via password or OAuth2 (XOAUTH2).
Migration note: The config section was previously named [proton]. The old name still works (via a serde alias) but [email] is the canonical name going forward.
Quick setup with provider presets:
The easiest way to configure email is to use the provider field, which automatically sets sensible defaults for IMAP/SMTP hosts and ports:
[email]
provider = "gmail" # or "proton" or "custom"
email = "user@gmail.com"
password = "app-password"
Available providers:
| Provider | IMAP Host | IMAP Port | SMTP Host | SMTP Port |
|---|---|---|---|---|
proton | 127.0.0.1 | 1143 | 127.0.0.1 | 1025 |
gmail | imap.gmail.com | 993 | smtp.gmail.com | 465 |
custom | (must specify) | (must specify) | (must specify) | (must specify) |
Explicit imap_host, imap_port, smtp_host, smtp_port values always override provider presets.
Full configuration:
[email]
# Provider preset: "proton" (default), "gmail", or "custom"
# provider = "proton"
# IMAP server host (overrides provider preset)
# imap_host = "imap.gmail.com" # Gmail
# imap_host = "outlook.office365.com" # Outlook
# imap_host = "127.0.0.1" # Proton Bridge
# IMAP server port (993 for TLS, 1143 for Proton Bridge STARTTLS)
# imap_port = 993
# SMTP server host
# smtp_host = "smtp.gmail.com" # Gmail
# smtp_host = "smtp.office365.com" # Outlook
# smtp_host = "127.0.0.1" # Proton Bridge
# SMTP server port (465 for TLS, 587 for STARTTLS, 1025 for Proton Bridge)
# smtp_port = 465
# Email address
# email = "user@gmail.com"
# Authentication: password or OAuth2
# For password-based auth (app passwords, Bridge passwords):
# password = "app-password"
# For OAuth2 (Gmail, Outlook):
# [email.auth]
# type = "oauth2"
# access_token = "ya29.your-token"
# TLS configuration
[email.tls]
# Verify TLS certificates (set false for self-signed certs like Proton Bridge)
# verify_certificates = true
# Minimum TLS version
# min_tls_version = "1.2"
# Custom CA certificate path (optional)
# ca_cert_path = "/path/to/ca.pem"
| Option | Type | Default | Description |
|---|---|---|---|
| provider | String | proton | (Optional) Provider preset: proton, gmail, or custom. Sets default host/port values. |
| imap_host | String | 127.0.0.1 | (Optional) IMAP server host (overrides provider preset) |
| imap_port | Integer | 1143 | (Optional) IMAP server port (overrides provider preset) |
| smtp_host | String | 127.0.0.1 | (Optional) SMTP server host (overrides provider preset) |
| smtp_port | Integer | 1025 | (Optional) SMTP server port (overrides provider preset) |
| email | String | - | (Optional) Email address (store in Vault) |
| password | String | - | (Optional) Password (store in Vault) |
| auth.type | String | password | (Optional) Auth method: password or oauth2 |
| auth.access_token | String | - | (Optional) OAuth2 access token (store in Vault) |
| tls.verify_certificates | Boolean | true | (Optional) Verify TLS certificates |
| tls.min_tls_version | String | 1.2 | (Optional) Minimum TLS version |
| tls.ca_cert_path | String | - | (Optional) Custom CA certificate path |
Provider-specific examples:
Gmail
[email]
provider = "gmail"
email = "user@gmail.com"
# Use an App Password (not your Google account password)
# Generate at: https://myaccount.google.com/apppasswords
password = "xxxx xxxx xxxx xxxx"
Outlook / Microsoft 365
[email]
provider = "custom"
imap_host = "outlook.office365.com"
imap_port = 993
smtp_host = "smtp.office365.com"
smtp_port = 587
email = "user@outlook.com"
password = "your-app-password"
Proton Mail (via Bridge)
[email]
provider = "proton" # default — uses Bridge at 127.0.0.1
email = "user@proton.me"
# Use the Bridge password (from Bridge UI), NOT your Proton account password
password = "bridge-password"
[email.tls]
verify_certificates = false # Bridge uses self-signed certs
Web Search
[websearch]
# Brave Search API key (required for primary provider)
# Get your key at: https://brave.com/search/api/
# api_key = "BSA-your-brave-api-key"
# Maximum results per search query (default: 5)
max_results = 5
# Request timeout in seconds (default: 30)
timeout_secs = 30
# Enable DuckDuckGo fallback if Brave fails (default: true)
fallback_enabled = true
# Safe search: "off", "moderate", "strict" (default: "moderate")
safe_search = "moderate"
# Country code for localized results (e.g., "US", "DE", "GB")
country = "DE"
# Language code for results (e.g., "en", "de", "fr")
language = "de"
# Rate limit: requests per minute (default: 60)
rate_limit_rpm = 60
# Cache TTL in minutes (default: 30)
cache_ttl_minutes = 30
| Option | Type | Default | Description |
|---|---|---|---|
| api_key | String | - | (Optional) Brave Search API key (store in Vault) |
| max_results | Integer | 5 | (Optional) Max search results (1-10) |
| timeout_secs | Integer | 30 | (Optional) Request timeout |
| fallback_enabled | Boolean | true | (Optional) Enable DuckDuckGo fallback |
| safe_search | String | moderate | (Optional) Safe search: "off", "moderate", "strict" |
| country | String | DE | (Optional) Country code for results |
| language | String | de | (Optional) Language code for results |
| rate_limit_rpm | Integer | 60 | (Optional) Rate limit (requests/minute) |
| cache_ttl_minutes | Integer | 30 | (Optional) Cache time-to-live |
Security Note: Store the Brave API key in Vault rather than config.toml:
vault kv put secret/pisovereign/websearch brave_api_key="BSA-..."
Public Transit (ÖPNV)
Provides public transit routing for German transport networks via transport.rest API. Used for “How do I get to X?” queries and location-based reminders.
[transit]
# Base URL for transport.rest API (default: v6.db.transport.rest)
# base_url = "https://v6.db.transport.rest"
# Request timeout in seconds
# timeout_secs = 10
# Maximum number of journey results
# max_results = 3
# Cache TTL in minutes
# cache_ttl_minutes = 5
# Include transit info in location-based reminders
# include_in_reminders = true
# Transport modes to include:
# products_bus = true
# products_suburban = true # S-Bahn
# products_subway = true # U-Bahn
# products_tram = true
# products_regional = true # RB/RE
# products_national = false # ICE/IC
# User's home location for route calculations
# home_location = { latitude = 52.52, longitude = 13.405 } # Berlin
| Option | Type | Default | Description |
|---|---|---|---|
| base_url | String | https://v6.db.transport.rest | (Optional) transport.rest API URL |
| timeout_secs | Integer | 10 | (Optional) Request timeout |
| max_results | Integer | 3 | (Optional) Max journey results |
| cache_ttl_minutes | Integer | 5 | (Optional) Cache TTL |
| include_in_reminders | Boolean | true | (Optional) Include in location reminders |
| products_bus | Boolean | true | (Optional) Include bus routes |
| products_suburban | Boolean | true | (Optional) Include S-Bahn |
| products_subway | Boolean | true | (Optional) Include U-Bahn |
| products_tram | Boolean | true | (Optional) Include tram |
| products_regional | Boolean | true | (Optional) Include regional trains (RB/RE) |
| products_national | Boolean | false | (Optional) Include national trains (ICE/IC) |
| home_location | Object | - | (Optional) Home location { latitude, longitude } |
Reminder System
Configures the proactive reminder system including CalDAV sync, custom reminders, and scheduling settings.
[reminder]
# Maximum number of snoozes per reminder
# max_snooze = 5
# Default snooze duration in minutes
# default_snooze_minutes = 15
# How far in advance to create reminders from CalDAV events (minutes)
# caldav_reminder_lead_time_minutes = 30
# Interval for checking due reminders (seconds)
# check_interval_secs = 60
# CalDAV sync interval (minutes)
# caldav_sync_interval_minutes = 15
# Morning briefing time (HH:MM format)
# morning_briefing_time = "07:00"
# Enable morning briefing
# morning_briefing_enabled = true
| Option | Type | Default | Description |
|---|---|---|---|
| max_snooze | Integer | 5 | (Optional) Max snoozes per reminder |
| default_snooze_minutes | Integer | 15 | (Optional) Default snooze duration |
| caldav_reminder_lead_time_minutes | Integer | 30 | (Optional) CalDAV event advance notice |
| check_interval_secs | Integer | 60 | (Optional) How often to check for due reminders |
| caldav_sync_interval_minutes | Integer | 15 | (Optional) CalDAV sync frequency |
| morning_briefing_time | String | 07:00 | (Optional) Morning briefing time (HH:MM) |
| morning_briefing_enabled | Boolean | true | (Optional) Enable daily morning briefing |
Model Selector (Deprecated)
Deprecated since v0.6.0: Use `[model_routing]` instead. See Adaptive Model Routing.
The old `[model_selector]` section with `small_model` / `large_model` is still accepted but will be removed in a future release.
Adaptive Model Routing
Routes requests to different LLM models based on complexity. See the dedicated Adaptive Model Routing page for full documentation.
[model_routing]
enabled = true
[model_routing.models]
trivial = "template"
simple = "gemma3:1b"
moderate = "gemma3:4b"
complex = "gemma3:12b"
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | false | Enable adaptive routing |
| models.trivial | String | "template" | Model for trivial tier (usually "template") |
| models.simple | String | "gemma3:1b" | Small model for simple queries |
| models.moderate | String | "gemma3:4b" | Medium model for moderate queries |
| models.complex | String | "gemma3:12b" | Large model for complex queries |
| classification.confidence_threshold | Float | 0.6 | Below this, upgrade tier |
Telemetry
[telemetry]
# Enable OpenTelemetry export
enabled = false
# OTLP endpoint (Tempo, Jaeger)
# otlp_endpoint = "http://localhost:4317"
# Sampling ratio (0.0-1.0, 1.0 = all traces)
# sample_ratio = 1.0
# Service name for traces
# service_name = "pisovereign"
# Log level filter (e.g., "info", "debug", "pisovereign=debug,tower_http=info")
# log_filter = "pisovereign=info,tower_http=info"
# Batch export timeout in seconds
# export_timeout_secs = 30
# Maximum batch size for trace export
# max_batch_size = 512
# Graceful fallback to console-only logging if OTLP collector is unavailable.
# When true (default), the application starts with console logging if the collector
# cannot be reached. Set to false to require a working collector in production.
# graceful_fallback = true
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | false | Enable OpenTelemetry export |
| otlp_endpoint | String | http://localhost:4317 | (Optional) OTLP collector endpoint |
| sample_ratio | Float | 1.0 | (Optional) Trace sampling ratio (0.0-1.0) |
| service_name | String | pisovereign | (Optional) Service name for traces |
| log_filter | String | pisovereign=info,tower_http=info | (Optional) Log level filter |
| export_timeout_secs | Integer | 30 | (Optional) Batch export timeout |
| max_batch_size | Integer | 512 | (Optional) Max batch size for export |
| graceful_fallback | Boolean | true | (Optional) Fallback to console logging if collector unavailable |
Resilience
Degraded Mode
[degraded_mode]
# Enable fallback when backend unavailable
enabled = true
# Message returned during degraded mode
unavailable_message = "I'm currently experiencing technical difficulties. Please try again in a moment."
# Cooldown before retrying primary backend (seconds)
retry_cooldown_secs = 30
# Number of failures before entering degraded mode
failure_threshold = 3
# Number of successes required to exit degraded mode
success_threshold = 2
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | true | Enable degraded mode fallback |
| unavailable_message | String | See above | Message returned during degraded mode |
| retry_cooldown_secs | Integer | 30 | Cooldown before retrying primary backend |
| failure_threshold | Integer | 3 | Failures before entering degraded mode |
| success_threshold | Integer | 2 | Successes to exit degraded mode |
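The interplay of the two thresholds can be sketched as a small state machine (illustrative Python only; the actual implementation is part of the Rust codebase and the class name here is hypothetical):

```python
# Minimal sketch of the threshold logic above: consecutive failures trip
# degraded mode; consecutive successes while degraded restore normal mode.
class DegradedMode:
    def __init__(self, failure_threshold=3, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.failures = 0      # consecutive failures
        self.successes = 0     # consecutive successes while degraded
        self.degraded = False

    def record_failure(self):
        self.successes = 0
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.degraded = True

    def record_success(self):
        self.failures = 0
        if self.degraded:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.degraded = False
                self.successes = 0
```

With the defaults, three consecutive backend failures enter degraded mode, and two consecutive successful probes (after the cooldown) exit it.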
Retry Configuration
Exponential backoff for retrying failed requests.
[retry]
# Initial delay before first retry in milliseconds
initial_delay_ms = 100
# Maximum delay between retries in milliseconds
max_delay_ms = 10000
# Multiplier for exponential backoff (delay = initial * multiplier^attempt)
multiplier = 2.0
# Maximum number of retry attempts
max_retries = 3
| Option | Type | Default | Description |
|---|---|---|---|
| initial_delay_ms | Integer | 100 | Initial retry delay (milliseconds) |
| max_delay_ms | Integer | 10000 | Maximum retry delay (milliseconds) |
| multiplier | Float | 2.0 | Exponential backoff multiplier |
| max_retries | Integer | 3 | Maximum retry attempts |
Formula: delay = min(initial_delay * multiplier^attempt, max_delay)
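Written out with the defaults (a one-liner sketch; `retry_delays` is a hypothetical helper, not part of the CLI):

```python
# delay = min(initial_delay * multiplier^attempt, max_delay), per attempt.
def retry_delays(initial_ms=100, multiplier=2.0, max_ms=10_000, max_retries=3):
    return [min(initial_ms * multiplier ** attempt, max_ms)
            for attempt in range(max_retries)]
```

With the defaults this yields delays of 100 ms, 200 ms, and 400 ms; with more retries the delay caps at `max_delay_ms` (10 s).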
Health Checks
[health]
# Global timeout for all health checks in seconds
global_timeout_secs = 5
# Service-specific timeout overrides (uncomment to customize):
# inference_timeout_secs = 10
# email_timeout_secs = 5
# calendar_timeout_secs = 5
# weather_timeout_secs = 5
| Option | Type | Default | Description |
|---|---|---|---|
| global_timeout_secs | Integer | 5 | Global timeout for all health checks |
| inference_timeout_secs | Integer | 5 | (Optional) Inference service timeout override |
| email_timeout_secs | Integer | 5 | (Optional) Email service timeout override |
| calendar_timeout_secs | Integer | 5 | (Optional) Calendar service timeout override |
| weather_timeout_secs | Integer | 5 | (Optional) Weather service timeout override |
Event Bus
The in-process event bus decouples post-processing from the user-facing response path. When enabled, background handlers perform fact extraction, audit logging, conversation-persistence verification, and metrics collection asynchronously — reducing perceived latency by 100–500 ms per request.
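The fan-out pattern can be sketched like this (schematic Python only; the real bus is a broadcast channel in the Rust codebase, and the exact lagging behavior there may differ from this simplification):

```python
# Illustrative sketch: each background handler gets its own bounded buffer,
# and publishing never blocks the user-facing response path.
from collections import deque

class EventBus:
    def __init__(self, channel_capacity=1024):
        self.capacity = channel_capacity
        self.subscribers = []  # one bounded buffer per background handler

    def subscribe(self):
        buf = deque(maxlen=self.capacity)  # oldest events drop when full
        self.subscribers.append(buf)
        return buf

    def publish(self, event):
        # Fan out without waiting on any handler; the response can be sent
        # before handlers drain their buffers.
        for buf in self.subscribers:
            buf.append(event)
```

This is why `channel_capacity` matters: if handlers fall behind under high load, a larger buffer gives them more headroom before events are lost.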
[events]
# Enable or disable the event bus (default: true)
enabled = true
# Broadcast channel buffer capacity (default: 1024)
# Increase if handlers can't keep up under high load.
channel_capacity = 1024
# Error handling policy: "log" or "retry" (default: "log", reserved for future use)
# handler_error_policy = "log"
# Retry settings (reserved for future use)
# max_retry_attempts = 3
# retry_delay_ms = 500
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | true | Enable or disable the event bus |
| channel_capacity | Integer | 1024 | Broadcast channel buffer size. Values 256–4096 suit most workloads |
| handler_error_policy | String | "log" | (Reserved) "log" = log-and-continue, "retry" = retry with backoff |
| max_retry_attempts | Integer | 3 | (Reserved) Max retries when policy is "retry" |
| retry_delay_ms | Integer | 500 | (Reserved) Base delay between retries in milliseconds |
Background handlers spawned automatically:
| Handler | Requires | Purpose |
|---|---|---|
| FactExtractionHandler | Memory context | Extracts structured facts from conversations via LLM |
| AuditLogHandler | Database | Records audit trail entries for chat/command/security events |
| ConversationPersistenceHandler | Conversation store | Verifies conversation integrity after each interaction |
| MetricsHandler | (always) | Feeds event data into the metrics collector |
Tip: Set `enabled = false` to disable all background processing and fall back to synchronous inline behavior.
Agentic Mode
Multi-agent orchestration for complex tasks. When enabled, the system decomposes complex user requests into parallel sub-tasks, each handled by an independent AI agent.
Note: Requires `[agent.tool_calling]` with `enabled = true`.
[agentic]
# Enable agentic mode (default: false)
enabled = false
# Maximum concurrent sub-agents running in parallel
max_concurrent_sub_agents = 4
# Maximum sub-agents spawned per task
max_sub_agents_per_task = 10
# Total timeout for the entire agentic task (minutes)
total_timeout_minutes = 30
# Timeout for each individual sub-agent (minutes)
sub_agent_timeout_minutes = 10
# Operations that require user approval before execution
# Example: ["send_email", "delete_contact", "execute_code"]
require_approval_for = []
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | Boolean | false | Enable agentic multi-agent orchestration |
| max_concurrent_sub_agents | Integer | 4 | Max sub-agents running in parallel |
| max_sub_agents_per_task | Integer | 10 | Max sub-agents per task |
| total_timeout_minutes | Integer | 30 | Total task timeout (minutes) |
| sub_agent_timeout_minutes | Integer | 10 | Per sub-agent timeout (minutes) |
| require_approval_for | Array | [] | Operations requiring user approval |
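How the two concurrency limits could interact can be sketched as follows (hypothetical Python; the real orchestrator is part of the Rust codebase and `run_agentic_task` is an invented name):

```python
# Sketch: cap total sub-agents per task, and use a semaphore so at most
# max_concurrent_sub_agents run at any one time.
import asyncio

async def run_agentic_task(sub_tasks, worker,
                           max_concurrent_sub_agents=4,
                           max_sub_agents_per_task=10):
    if len(sub_tasks) > max_sub_agents_per_task:
        raise ValueError("too many sub-agents for one task")
    sem = asyncio.Semaphore(max_concurrent_sub_agents)

    async def bounded(task):
        async with sem:              # wait for a free slot
            return await worker(task)

    return await asyncio.gather(*(bounded(t) for t in sub_tasks))
```

The per-agent and total timeouts would plausibly wrap `worker(task)` and the `gather` call (e.g. with `asyncio.wait_for`), mirroring `sub_agent_timeout_minutes` and `total_timeout_minutes`.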
Vault Integration
[vault]
# Vault server address
# address = "http://127.0.0.1:8200"
# AppRole authentication (recommended)
# role_id = "your-role-id"
# secret_id = "your-secret-id"
# Or token authentication
# token = "hvs.your-token"
# KV engine mount path
# mount_path = "secret"
# Request timeout in seconds
# timeout_secs = 5
# Vault Enterprise namespace (optional)
# namespace = "admin/pisovereign"
| Option | Type | Default | Description |
|---|---|---|---|
| address | String | http://127.0.0.1:8200 | (Optional) Vault server address |
| role_id | String | - | (Optional) AppRole role ID (recommended) |
| secret_id | String | - | (Optional) AppRole secret ID |
| token | String | - | (Optional) Vault token (alternative to AppRole) |
| mount_path | String | secret | (Optional) KV engine mount path |
| timeout_secs | Integer | 5 | (Optional) Request timeout |
| namespace | String | - | (Optional) Vault Enterprise namespace |
Environment Variables
All configuration options can be set via environment variables.
Use __ (double underscore) as the nesting separator to avoid conflicts
with field names containing underscores (e.g., phone_number):
| Config Path | Environment Variable |
|---|---|
| server.port | PISOVEREIGN_SERVER__PORT |
| inference.base_url | PISOVEREIGN_INFERENCE__BASE_URL |
| signal.phone_number | PISOVEREIGN_SIGNAL__PHONE_NUMBER |
| database.path | PISOVEREIGN_DATABASE__PATH |
| vault.address | PISOVEREIGN_VAULT__ADDRESS |
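The mapping rule in the table above is mechanical and can be expressed in one line (`to_env_var` is an illustrative helper, not part of the CLI):

```python
# Dots become the double-underscore separator, everything is upper-cased,
# and the PISOVEREIGN_ prefix is added. Field-internal underscores
# (e.g. phone_number) survive unchanged.
def to_env_var(config_path: str) -> str:
    return "PISOVEREIGN_" + config_path.replace(".", "__").upper()
```

For example, `to_env_var("signal.phone_number")` yields `PISOVEREIGN_SIGNAL__PHONE_NUMBER`.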
Special variables:
| Variable | Description |
|---|---|
| PISOVEREIGN_ALLOW_INSECURE_CONFIG | Allow insecure settings in production |
| RUST_LOG | Log level override |
Example Configurations
Development
environment = "development"
[server]
host = "127.0.0.1"
port = 3000
log_format = "text"
[inference]
base_url = "http://localhost:11434"
default_model = "qwen2.5:1.5b"
[database]
path = "./dev.db"
[cache]
enabled = false # Disable for debugging
[security]
rate_limit_enabled = false
tls_verify_certs = false
Production
environment = "production"
[server]
host = "127.0.0.1" # Behind reverse proxy
port = 3000
log_format = "json"
cors_enabled = true
allowed_origins = ["https://app.example.com"]
[inference]
base_url = "http://localhost:11434"
default_model = "qwen2.5:1.5b"
timeout_ms = 120000
[database]
path = "/var/lib/pisovereign/pisovereign.db"
max_connections = 10
[security]
rate_limit_enabled = true
rate_limit_rpm = 30
min_tls_version = "1.3"
[prompt_security]
enabled = true
sensitivity = "high"
block_on_detection = true
[vault]
address = "https://vault.internal:8200"
role_id = "..."
mount_path = "secret"
[telemetry]
enabled = true
otlp_endpoint = "http://tempo:4317"
sample_ratio = 0.1
Minimal (Quick Start)
environment = "development"
[server]
port = 3000
[inference]
base_url = "http://localhost:11434"
default_model = "qwen2.5:1.5b"
[database]
path = "pisovereign.db"
Adaptive Model Routing
Complexity-based request routing to reduce latency and resource usage
Overview
Model routing classifies every incoming message into one of four complexity tiers and routes it to an appropriately sized LLM model — or answers trivially without calling any model at all.
| Tier | Default Model | Typical Latency | Use Case |
|---|---|---|---|
| Trivial | template (no LLM) | <10 ms | Greetings, thanks, farewells |
| Simple | gemma3:1b | ~0.5 s | Short factual questions |
| Moderate | gemma3:4b | ~2 s | Multi-turn conversations, explanations |
| Complex | gemma3:12b | ~6 s | Code generation, analysis, creative writing |
Goal: Route 60–70% of queries to the Trivial or Simple tier, reducing average response time from ~8 s to ~3 s.
Configuration
Enable in config.toml:
[model_routing]
enabled = true
[model_routing.models]
trivial = "template" # No LLM call
simple = "gemma3:1b"
moderate = "gemma3:4b"
complex = "gemma3:12b"
[model_routing.classification]
confidence_threshold = 0.6
max_simple_words = 15
max_simple_chars = 100
max_moderate_sentences = 5
complex_min_words = 50
complex_keywords = [
"code", "implement", "explain", "analyze",
"compare", "debug", "refactor", "translate"
]
trivial_patterns = [
"^hi$", "^hello$", "^hey$", "^hallo$",
"^moin$", "^danke$", "^thanks$"
]
[model_routing.templates]
greeting = ["Hello! How can I help?", "Hallo! Wie kann ich helfen?"]
farewell = ["Goodbye!", "Tschüss!"]
thanks = ["You're welcome!", "Gerne!"]
help = ["I can help with questions, tasks, weather, transit, and more."]
system_info = ["PiSovereign — your private AI assistant."]
unknown = ["How can I help you?", "Wie kann ich Ihnen helfen?"]
Docker Compose
When routing is enabled, Ollama needs to keep multiple models loaded. Set in compose.yml:
OLLAMA_MAX_LOADED_MODELS: 2
This allows the small and large models to stay warm in memory simultaneously.
How Classification Works
The rule-based classifier runs synchronously (no LLM call) and takes <1 ms:
- Trivial detection: Regex patterns, emoji-only, empty input → instant template
- Complex detection: Code patterns (backticks, keywords), high word count (≥50), configured keywords → large model
- Simple detection: Short messages (≤15 words, ≤100 chars), single sentence, no conversation history → small model
- Moderate fallback: Everything else, or follow-up messages in an ongoing conversation
Confidence & Tier Upgrades
Each classification includes a confidence score (0.0–1.0). When confidence falls below the confidence_threshold (default: 0.6), the classifier upgrades to the next higher tier:
- Simple → Moderate
- Moderate → Complex
This ensures borderline cases use a more capable model rather than risk a poor response.
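The rules and the upgrade step can be condensed into a small sketch (a hypothetical simplification of the real classifier; the thresholds mirror the config defaults, and the confidence heuristic here is invented for illustration):

```python
# Schematic tier classifier: trivial patterns, then complex signals,
# then the simple heuristic, with moderate as the fallback. Low-confidence
# results are upgraded one tier.
import re

TRIVIAL_PATTERNS = [r"^hi$", r"^hello$", r"^hey$", r"^danke$", r"^thanks$"]
COMPLEX_KEYWORDS = {"code", "implement", "explain", "analyze",
                    "compare", "debug", "refactor", "translate"}
UPGRADE = {"simple": "moderate", "moderate": "complex"}

def classify(message, has_history=False, confidence_threshold=0.6):
    text = message.strip().lower()
    words = text.split()
    if not text or any(re.match(p, text) for p in TRIVIAL_PATTERNS):
        tier, confidence = "trivial", 1.0
    elif "`" in text or len(words) >= 50 or COMPLEX_KEYWORDS & set(words):
        tier, confidence = "complex", 0.9
    elif len(words) <= 15 and len(text) <= 100 and not has_history:
        tier = "simple"
        confidence = 0.9 if len(words) <= 8 else 0.5  # longer = less certain
    else:
        tier, confidence = "moderate", 0.7
    if confidence < confidence_threshold and tier in UPGRADE:
        tier = UPGRADE[tier]  # borderline case: use the next tier up
    return tier, confidence
```

For instance, a greeting resolves to the trivial tier with no LLM call, "please debug this function" trips a complex keyword, and a borderline longer question classified as simple with low confidence is upgraded to moderate.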
Metrics
Model routing exposes Prometheus metrics at /metrics/prometheus:
model_routing_requests_total{tier="trivial"} 142
model_routing_requests_total{tier="simple"} 89
model_routing_requests_total{tier="moderate"} 45
model_routing_requests_total{tier="complex"} 24
model_routing_template_hits_total 142
model_routing_upgrades_total 12
The JSON /metrics endpoint also includes a model_routing object when routing is enabled.
Decorator Chain
When model routing is enabled, the inference decorator chain becomes:
Per tier:
OllamaInferenceAdapter(tier_model)
→ DegradedInferenceAdapter (per-tier circuit breaker)
ModelRoutingAdapter
→ classifies message → selects tier adapter
→ delegates to appropriate tier
CachedInferenceAdapter (shared across all tiers)
→ SanitizedInferencePort (shared output filter)
→ ChatService
When disabled, the chain is the standard single-model path:
OllamaInferenceAdapter → Degraded → Cached → Sanitized → ChatService
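The layering can be shown as plain delegation (schematic Python; the real adapters are Rust types implementing the `InferencePort` trait, and the degraded/sanitizing layers are omitted here for brevity):

```python
# Each layer wraps the next and adds one concern: routing picks a
# per-tier adapter, and the shared cache sits in front of routing.
class Ollama:                      # innermost: the actual model call
    def __init__(self, model):
        self.model = model
    def generate(self, prompt):
        return f"[{self.model}] reply to {prompt!r}"

class ModelRouting:                # classifies, then delegates to a tier
    def __init__(self, tiers, classify):
        self.tiers, self.classify = tiers, classify
    def generate(self, prompt):
        return self.tiers[self.classify(prompt)].generate(prompt)

class Cached:                      # shared across all tiers
    def __init__(self, inner):
        self.inner, self.cache = inner, {}
    def generate(self, prompt):
        if prompt not in self.cache:
            self.cache[prompt] = self.inner.generate(prompt)
        return self.cache[prompt]
```

Wiring it up mirrors the diagram: `Cached(ModelRouting({"simple": Ollama("gemma3:1b"), "complex": Ollama("gemma3:12b")}, classifier))`.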
Backward Compatibility
- The old `[model_selector]` configuration is deprecated since v0.6.0
- Setting `model_routing.enabled = false` (or omitting the section) preserves the original single-model behavior
- No breaking changes to the `InferencePort` trait or HTTP API
External Services Setup
Configure WhatsApp, Signal, Email, CalDAV/CardDAV, OpenAI, and Brave Search integrations
Messenger Selection
PiSovereign supports one messenger at a time:
messenger = "signal" # Signal via signal-cli (default)
messenger = "whatsapp" # WhatsApp Business API
messenger = "none" # Disable messenger integration
| Messenger | Use Case |
|---|---|
| Signal | Privacy-focused, polling-based, no public URL needed |
| WhatsApp | Business integration, webhook-based, requires public URL |
WhatsApp Business
PiSovereign uses the WhatsApp Business API for bidirectional messaging.
Meta Business Account
- Create a Meta Business Account
- Create a Meta Developer Account
WhatsApp App Setup
- Create an app at developers.facebook.com/apps (type: Business)
- Add the WhatsApp product
- In WhatsApp → Getting Started, note the Phone Number ID and generate an Access Token
- For a permanent token: Business Settings → System Users → create Admin → generate token with the `whatsapp_business_messaging` permission
- Note the App Secret from App Settings → Basic
Webhook Configuration
PiSovereign needs a public URL for WhatsApp webhooks. The Docker Compose stack uses Traefik for this automatically.
Configure in Meta Developer Console:
- WhatsApp → Configuration → Edit Webhooks
- Callback URL: `https://your-domain.com/v1/webhooks/whatsapp`
- Verify Token: your chosen `verify_token`
- Subscribe to: `messages`, `message_template_status_update`
PiSovereign Configuration
Store credentials in Vault:
docker compose exec vault vault kv put secret/pisovereign/whatsapp \
access_token="your-access-token" \
app_secret="your-app-secret"
Add to config.toml:
[whatsapp]
phone_number_id = "your-phone-number-id"
verify_token = "your-verify-token"
signature_required = true
api_version = "v18.0"
Signal Messenger
Signal provides privacy-focused messaging with end-to-end encryption, polling-based delivery (no public URL required), and voice message support.
For the full setup guide, see Signal Setup.
Quick config:
messenger = "signal"
[signal]
phone_number = "+1234567890"
socket_path = "/var/run/signal-cli/socket"
Email Integration (IMAP/SMTP)
PiSovereign supports any provider with standard IMAP/SMTP access. Use the provider field for automatic host/port configuration, or specify hosts and ports manually.
Provider Quick Reference
| Provider | provider Value | IMAP Host | IMAP Port | SMTP Host | SMTP Port | Auth |
|---|---|---|---|---|---|---|
| Gmail | gmail | imap.gmail.com | 993 | smtp.gmail.com | 465 | App Password |
| Outlook | custom | outlook.office365.com | 993 | smtp.office365.com | 587 | App Password |
| Proton Mail | proton | 127.0.0.1 | 1143 | 127.0.0.1 | 1025 | Bridge Password |
Gmail: Enable IMAP in Gmail settings, then generate an App Password (requires 2-Step Verification).
Outlook: Enable IMAP in settings, generate an App Password at account.microsoft.com/security if 2FA is enabled.
Proton Mail: Requires Proton Bridge running on the host. Use the Bridge Password shown in Bridge UI — not your Proton account password. Set verify_certificates = false since Bridge uses self-signed certs.
Configuration
Store the password in Vault:
docker compose exec vault vault kv put secret/pisovereign/email \
password="your-email-password"
Example configs — choose one:
# Gmail (using provider preset)
[email]
provider = "gmail"
email = "yourname@gmail.com"
# Proton Mail (default provider — via Bridge)
[email]
provider = "proton"
email = "yourname@proton.me"
[email.tls]
verify_certificates = false
# Outlook (custom provider with explicit hosts)
[email]
provider = "custom"
imap_host = "outlook.office365.com"
imap_port = 993
smtp_host = "smtp.office365.com"
smtp_port = 587
email = "yourname@outlook.com"
Migration note: The config section was previously named `[proton]`. The old name still works, but `[email]` is the canonical name going forward.
CalDAV / CardDAV (Baïkal)
Baïkal is a lightweight, self-hosted CalDAV/CardDAV server included in the Docker Compose stack as an optional profile.
Docker Setup
docker compose --profile caldav up -d
This starts Baïkal at http://localhost/caldav (via Traefik). PiSovereign accesses it internally via the Docker network at http://baikal:80/dav.php.
Security: Baïkal is not directly exposed to the internet. All access is through the Docker network or localhost.
Auto-recreation: PiSovereign automatically re-creates calendars and address books if they return 404 errors (e.g., after a Baïkal database reset or re-initialization). No manual intervention is needed.
Initial Setup
- Open `http://localhost/caldav` in your browser
- Complete the setup wizard, set an admin password, choose SQLite
- Create a user under Users and Resources
- Create a calendar via any CalDAV client or the admin interface
Configuration
Store credentials in Vault (optional):
docker compose exec vault vault kv put secret/pisovereign/caldav \
username="your-username" \
password="your-password"
Add to config.toml:
[caldav]
server_url = "http://baikal:80/dav.php"
username = "your-username"
password = "your-password"
calendar_path = "/calendars/username/default/"
verify_certs = true
timeout_secs = 30
CardDAV for contacts uses the same server and credentials — PiSovereign automatically discovers the address book.
OpenAI API
OpenAI is used as an optional cloud fallback for speech processing (STT/TTS).
Setup
- Create an account at platform.openai.com
- Add a payment method and set usage limits (recommended: $10–20/month)
- Create an API key at platform.openai.com/api-keys
Store in Vault:
docker compose exec vault vault kv put secret/pisovereign/openai \
api_key="sk-your-openai-key"
Configuration
[speech]
provider = "hybrid" # Local first, OpenAI fallback
openai_base_url = "https://api.openai.com/v1"
stt_model = "whisper-1"
tts_model = "tts-1"
default_voice = "nova"
timeout_ms = 60000
[speech.hybrid]
prefer_local = true
allow_cloud_fallback = true
For maximum privacy (no cloud at all):
[speech]
provider = "local"
[speech.hybrid]
prefer_local = true
allow_cloud_fallback = false
Brave Search API
Brave Search enables web search with source citations. DuckDuckGo is used as an automatic fallback.
Setup
- Sign up at brave.com/search/api — the Free tier (2,000 queries/month) is sufficient for personal use
- Create an API key in the dashboard
Store in Vault:
docker compose exec vault vault kv put secret/pisovereign/websearch \
brave_api_key="BSA-your-brave-api-key"
Configuration
[websearch]
api_key = "BSA-your-brave-api-key"
max_results = 5
timeout_secs = 30
fallback_enabled = true
safe_search = "moderate"
country = "DE"
language = "de"
DuckDuckGo’s Instant Answer API is used automatically when Brave is unavailable, rate-limited, or not configured. No API key required. To disable the fallback:
[websearch]
fallback_enabled = false
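The primary/fallback decision can be sketched in a few lines (hypothetical function; the actual adapters live in the Rust codebase):

```python
# Brave first when a key is configured; DuckDuckGo when Brave is
# unconfigured or fails, unless the fallback is disabled.
def search(query, brave_search, ddg_search,
           api_key=None, fallback_enabled=True):
    if api_key:
        try:
            return brave_search(query, api_key)
        except Exception:
            if not fallback_enabled:
                raise      # no fallback allowed: surface the error
    elif not fallback_enabled:
        raise RuntimeError("no Brave API key and fallback disabled")
    return ddg_search(query)
```

Note that with no API key and the fallback enabled, every query goes straight to DuckDuckGo.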
Verify All Integrations
# Check all services
docker compose exec pisovereign pisovereign-cli status
# Or via HTTP
curl https://your-domain.example.com/ready/all | jq
Troubleshooting
WhatsApp webhook not receiving messages
- Verify callback URL is publicly accessible
- Check that `verify_token` matches between config and Meta console
- Ensure the webhook is subscribed to `messages`
Email connection refused
- Verify host and port match your provider
- For Proton: ensure Bridge is running on the host
- Check password type (App Password for Gmail/Outlook, Bridge Password for Proton)
CalDAV authentication failed
- Verify username/password
- Check the `calendar_path` format — it must match the user and calendar name in Baïkal
Next Steps
- Configuration Reference — Fine-tune all options
- Monitoring — Track service health
Signal Messenger Setup
📱 Connect Signal messenger to PiSovereign via Docker
PiSovereign uses signal-cli as a Docker container to send and receive Signal messages. This guide covers the complete setup process.
Prerequisites
- Docker must be running (`docker compose up -d` in the `docker/` directory)
- Signal app installed on your smartphone and registered with a phone number
- qrencode installed on the host (for QR code display)
- Phone number stored in `.env` or Vault
Installing qrencode
macOS:
brew install qrencode
Debian / Raspberry Pi:
sudo apt-get install qrencode
Linking Your Signal Account
signal-cli is connected as a linked device to your existing Signal account (similar to Signal Desktop). No new account is created.
⚠️ Important: The `link` command outputs an `sgnl://` URI that must be converted into a QR code. You cannot pipe the output directly to `qrencode`, because `qrencode` waits for EOF — by that time the link process has already terminated and the URI has expired. Therefore, two separate terminal commands must be used.
Step 1: Start the Link Process and Capture the URI
Open a terminal and run:
docker exec -it pisovereign-signal-cli signal-cli --config /var/lib/signal-cli link -n "PiSovereign" | tee /tmp/signal-uri.txt
This command:
- Starts the link process in the background
- Captures the URI to `/tmp/signal-uri.txt`
- Displays the URI after 8 seconds (for verification)
Step 2: Display the QR Code and Scan
Once the URI is displayed, generate the QR code:
head -1 /tmp/signal-uri.txt | tr -d '\n' | qrencode -t ANSIUTF8
Now quickly scan with your phone:
- Open Signal on your smartphone
- Go to Settings → Linked Devices → Link New Device
- Scan the QR code shown in the terminal
- Confirm the link on your phone
💡 The link process is still running in the background, waiting for the scan. If the QR code has expired, simply repeat both steps.
Step 3: Verify the Link
After a successful scan, restart the container:
cd docker/
docker compose restart signal-cli
The logs should no longer show a NotRegisteredException:
docker compose logs signal-cli
Configuration
Phone Number
The Signal phone number must be known to PiSovereign. Use one of the following methods:
Option A: .env file (in the docker/ directory):
PISOVEREIGN_SIGNAL__PHONE_NUMBER=+491234567890
Option B: Vault:
vault kv put secret/pisovereign signal_phone_number="+491234567890"
config.toml
messenger = "signal"
[signal]
phone_number = "+491234567890" # E.164 format
socket_path = "/var/run/signal-cli/socket"
timeout_ms = 30000
Environment Variables
export PISOVEREIGN_MESSENGER=signal
export PISOVEREIGN_SIGNAL__PHONE_NUMBER=+491234567890
export PISOVEREIGN_SIGNAL__SOCKET_PATH=/var/run/signal-cli/socket
Troubleshooting
Socket Already in Use
Failed to bind socket /var/run/signal-cli/socket: Address already in use
Cause: A stale socket from a previous run persists in the Docker volume.
Solution: The container uses an entrypoint script that automatically cleans up the socket before starting. If the error still occurs:
docker compose restart signal-cli
NotRegisteredException
WARN MultiAccountManager - Ignoring +49...: User is not registered.
Cause: signal-cli has not been linked to a Signal account.
Solution: Complete the account linking procedure.
Expired QR Code
Cause: qrencode waits for EOF. When piping signal-cli link | qrencode, the QR code is only displayed after the link process terminates — at which point the URI is already invalid.
Solution: Redirect the URI to a file (Step 1) and display it as a QR code separately (Step 2). See Linking Your Signal Account.
Daemon Connection Failed
# Check the socket
docker exec pisovereign-signal-cli ls -la /var/run/signal-cli/socket
# Check container logs
docker compose logs signal-cli
Security
- Signal messages are end-to-end encrypted
- signal-cli stores cryptographic keys locally in the `signal-cli-data` volume
- The socket (`signal-cli-socket`) is shared only within the Docker network
Backup
The signal-cli data should be backed up regularly:
docker run --rm -v docker_signal-cli-data:/data -v $(pwd):/backup \
alpine tar czf /backup/signal-cli-backup.tar.gz -C /data .
See Backup & Restore for complete backup procedures.
See Also
- Docker Setup — Set up the Docker environment
- Vault Setup — Manage secrets
- Configuration Reference — All configuration options
- signal-cli Documentation — Upstream documentation
Reminder System
PiSovereign includes a proactive reminder system that helps you stay on top of appointments, tasks, and custom reminders. The system integrates with CalDAV calendars and provides beautiful German-language notifications via WhatsApp or Signal.
Features
- Calendar Integration: Automatically creates reminders from CalDAV events
- Custom Reminders: Create personal reminders with natural language
- Smart Notifications: Beautiful formatted messages with emoji and key information
- Location Support: Google Maps links and ÖPNV transit connections for location-based events
- Snooze Management: Snooze reminders up to 5 times (configurable)
- Morning Briefing: Daily summary of your upcoming appointments
Natural Language Commands
Creating Reminders
"Erinnere mich morgen um 10 Uhr an den Arzttermin"
"Remind me tomorrow at 3pm to call mom"
"Erinnere mich in 2 Stunden an die Wäsche"
Listing Reminders
"Zeige meine Erinnerungen"
"Welche Termine habe ich heute?"
"Liste alle aktiven Erinnerungen"
Snoozing Reminders
"Erinnere mich nochmal in 15 Minuten"
"Snooze für eine Stunde"
Acknowledging Reminders
"Ok, danke!"
"Erledigt"
Deleting Reminders
"Lösche die Erinnerung zum Arzttermin"
Transit Connections
When you have an appointment at a specific location, PiSovereign can automatically include ÖPNV (public transit) connections in your reminder:
📅 **Meeting mit Hans**
📍 Alexanderplatz 1, Berlin
🕒 Morgen um 14:00 Uhr
🚇 **So kommst du hin:**
🚌 Bus 200 → S-Bahn S5 → U-Bahn U2
Abfahrt: 13:22 (38 min)
Ankunft: 14:00
🗺️ [Auf Google Maps öffnen](https://www.google.com/maps/...)
Searching Transit Routes
You can also search for transit connections directly:
"Wie komme ich zum Hauptbahnhof?"
"ÖPNV Verbindung nach Alexanderplatz"
Configuration
Add the following sections to your config.toml:
Transit Configuration
[transit]
# Include transit info in location-based reminders
include_in_reminders = true
# Your home location for route calculations
home_location = { latitude = 52.52, longitude = 13.405 }
# Transport modes to include
products_bus = true
products_suburban = true # S-Bahn
products_subway = true # U-Bahn
products_tram = true
products_regional = true # RB/RE
products_national = false # ICE/IC
Reminder Configuration
[reminder]
# Maximum number of snoozes per reminder (default: 5)
max_snooze = 5
# Default snooze duration in minutes (default: 15)
default_snooze_minutes = 15
# How far in advance to create reminders from CalDAV events
caldav_reminder_lead_time_minutes = 30
# Interval for checking due reminders (seconds)
check_interval_secs = 60
# CalDAV sync interval (minutes)
caldav_sync_interval_minutes = 15
# Morning briefing settings
morning_briefing_time = "07:00"
morning_briefing_enabled = true
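The lead-time and polling settings above combine into a simple rule for when a reminder fires. A minimal sketch, with illustrative names rather than PiSovereign's actual types:

```rust
/// Illustrative sketch (not the actual PiSovereign implementation):
/// derive a reminder's fire time from a CalDAV event start and the
/// configured lead time, both expressed as Unix timestamps.
fn remind_at(event_start_epoch: i64, lead_time_minutes: i64) -> i64 {
    event_start_epoch - lead_time_minutes * 60
}

/// With check_interval_secs = 60, a reminder is "due" once a poll tick
/// at time `now_epoch` has passed its fire time.
fn is_due(now_epoch: i64, remind_at_epoch: i64) -> bool {
    now_epoch >= remind_at_epoch
}
```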
CalDAV Configuration
For calendar integration, you need a CalDAV server (such as Baïkal, Radicale, or Nextcloud):
[caldav]
server_url = "https://cal.example.com/dav.php"
username = "your-username"
password = "your-password"
calendar_path = "/calendars/user/default"
Reminder Sources
Reminders can come from two sources:
- CalDAV Events: Automatically synced from your calendar
- Custom Reminders: Created via natural language commands
CalDAV events include the original event details (title, time, location) while custom reminders are more flexible and can include any text.
Notification Format
Reminders are formatted as beautiful German messages with:
- Bold headers for event titles
- Emoji prefixes for quick scanning (📅 📍 🕒)
- Time formatting relative to now (“in 30 Minuten”)
- Location links to Google Maps
- Transit info for getting there
Example reminder notification:
📅 **Zahnarzt Dr. Müller**
📍 Friedrichstraße 123, Berlin
🕒 Heute um 15:00 (in 2 Stunden)
🗺️ Auf Google Maps öffnen
Morning Briefing
When enabled, you receive a daily summary at the configured time (default 7:00 AM):
☀️ **Guten Morgen!**
📅 **Heute hast du 3 Termine:**
1. 09:00 - Team Meeting (Büro)
2. 12:30 - Mittagessen mit Lisa (Restaurant Mitte)
3. 16:00 - Arzttermin (Praxis Dr. Schmidt)
🌤️ Wetter: 18°C, leicht bewölkt
📋 **Offene Erinnerungen:**
- Geburtstagskarte für Mama kaufen
- Wäsche abholen
Snooze Limits
Each reminder can be snoozed up to max_snooze times (default: 5). After that, the system will indicate that no more snoozes are available:
⏰ Diese Erinnerung wurde bereits 5x verschoben.
Bitte bestätige oder lösche sie.
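The limit itself is a small state rule. A hypothetical sketch of it, where the `Reminder` and `SnoozeOutcome` names are illustrative rather than PiSovereign's API:

```rust
/// Hypothetical sketch of the snooze-limit rule described above.
struct Reminder {
    snooze_count: u32,
    max_snooze: u32, // config: [reminder] max_snooze = 5
}

enum SnoozeOutcome {
    Snoozed { remaining: u32 },
    LimitReached,
}

impl Reminder {
    fn snooze(&mut self) -> SnoozeOutcome {
        if self.snooze_count >= self.max_snooze {
            // Already snoozed max_snooze times: ask the user to
            // acknowledge or delete instead.
            SnoozeOutcome::LimitReached
        } else {
            self.snooze_count += 1;
            SnoozeOutcome::Snoozed {
                remaining: self.max_snooze - self.snooze_count,
            }
        }
    }
}
```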
Status Tracking
Reminders go through these states:
- Pending: Waiting for the remind time
- Sent: Notification was delivered
- Acknowledged: User confirmed receipt
- Snoozed: User requested a later reminder
- Deleted: User removed the reminder
You can list reminders filtered by status using commands like “zeige alle erledigten Erinnerungen”.
Troubleshooting
Solutions for common issues with PiSovereign
Quick Diagnostics
Run these commands first to identify the problem:
# Check all containers are running
docker compose ps
# Health check
curl http://localhost/health | jq
# Detailed readiness
curl http://localhost/ready/all | jq
# Recent logs
docker compose logs --tail=100 pisovereign
# System resources
docker stats --no-stream
Hailo AI HAT+
Device not detected
Symptom: Hailo device not available inside the container
Diagnosis:
# Check device files on the host
ls -la /dev/hailo*
# Check kernel module on the host
lsmod | grep hailo
# Check PCIe
lspci | grep -i hailo
Solutions:
- Check physical connection — ensure the HAT+ is fully seated on the GPIO pins, the PCIe FPC cable is connected, and you are using the 27W USB-C power supply
- Reinstall drivers on the host:
  sudo apt remove --purge hailo-*
  sudo apt autoremove
  sudo reboot
  sudo apt install hailo-h10-all
  sudo reboot
- Check device passthrough — ensure `docker-compose.yml` maps `/dev/hailo0` into the container
Hailo firmware error
# Reset the device (on host)
sudo hailortcli fw-control reset
# Update firmware
sudo apt update && sudo apt upgrade hailo-firmware
Inference Problems
Inference timeout
Diagnosis:
# Test Ollama directly inside Docker
docker compose exec ollama curl -s http://localhost:11434/api/generate \
-d '{"model":"qwen2.5-1.5b-instruct","prompt":"Hi","stream":false}'
Solutions:
- Increase timeout:
  [inference]
  timeout_ms = 120000  # 2 minutes
- Use a smaller model:
  [inference]
  default_model = "llama3.2-1b-instruct"
Model not found
# List models
docker compose exec ollama ollama list
# Pull missing model
docker compose exec ollama ollama pull qwen2.5-1.5b-instruct
Poor response quality
Adjust in config.toml:
[inference]
max_tokens = 4096
temperature = 0.5 # Lower = more focused
If model routing is enabled, ensure complex queries use a capable model:
[model_routing.models]
complex = "gemma3:12b"
Model routing — wrong tier selected
Check Prometheus metrics to see tier distribution:
curl -s http://localhost:3000/metrics/prometheus | grep model_routing
If too many requests go to the Simple tier, lower max_simple_words or add more complex_keywords:
[model_routing.classification]
max_simple_words = 10
complex_keywords = ["code", "implement", "explain", "analyze", "compare"]
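The tuning above can be pictured as a two-signal classifier: a keyword hit pushes a query to the Complex tier, a short keyword-free query stays Simple, and everything else lands in Moderate. A sketch under those assumptions (the real router's logic may differ):

```rust
/// Illustrative tier classifier matching the config above;
/// not the actual PiSovereign routing code.
#[derive(Debug, PartialEq)]
enum Tier { Simple, Moderate, Complex }

fn classify(input: &str, max_simple_words: usize, complex_keywords: &[&str]) -> Tier {
    let lower = input.to_lowercase();
    if complex_keywords.iter().any(|k| lower.contains(*k)) {
        // Any complex keyword forces the capable model.
        Tier::Complex
    } else if input.split_whitespace().count() <= max_simple_words {
        // Short and keyword-free: cheapest tier.
        Tier::Simple
    } else {
        Tier::Moderate
    }
}
```

Lowering `max_simple_words` shrinks the Simple bucket; adding `complex_keywords` grows the Complex one.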
Model routing — Ollama out of memory
When multiple models are loaded, Ollama may run out of RAM. Reduce the number of concurrent models:
# compose.yml
OLLAMA_MAX_LOADED_MODELS: 1
Or use smaller models for the Simple and Moderate tiers.
Network & Connectivity
Connection refused
Diagnosis:
# Check containers
docker compose ps
# Check Traefik is routing
docker compose logs traefik | tail -20
# Test direct container access
docker compose exec pisovereign curl -s http://localhost:3000/health
Solutions:
- Check bind address in `config.toml`:
  [server]
  host = "0.0.0.0"
- Check Traefik configuration — verify domain and routing rules in `docker/traefik/dynamic.yml`
TLS/SSL errors
- Development: Use `http://localhost` (Traefik handles TLS for external access)
- Production: Ensure your domain’s DNS points to the server, and Let’s Encrypt can reach port 80 for validation
- Self-signed certs (e.g., Proton Bridge): set `verify_certificates = false` in the relevant config section
Database Issues
Database locked
Cause: Multiple concurrent writers to SQLite
Solutions:
- Ensure a single PiSovereign instance is running:
  docker compose ps | grep pisovereign   # should show exactly one instance
- Verify WAL mode:
  docker compose exec pisovereign sqlite3 /data/pisovereign.db "PRAGMA journal_mode;"   # should return "wal"
Migration failed
# Backup current database
docker compose exec pisovereign cp /data/pisovereign.db /data/pisovereign-backup.db
# Reset database (LOSES DATA — restore from backup afterward)
docker compose exec pisovereign rm /data/pisovereign.db
docker compose restart pisovereign
Database corruption
# Attempt recovery
docker compose exec pisovereign sh -c \
'sqlite3 /data/pisovereign.db ".recover" | sqlite3 /data/pisovereign-recovered.db'
Integration Problems
WhatsApp
Webhook verification failed:
- URL must be publicly accessible — test with `curl` from an external network
- `verify_token` in `config.toml` must match the Meta developer console
- HTTPS must be configured (Traefik handles this)
Messages not received:
- Check that the webhook is subscribed to the `messages` field in the Meta console
- Verify the phone number is whitelisted (for test numbers)
- Check logs: `docker compose logs pisovereign | grep -i whatsapp`
Email (IMAP/SMTP)
Connection refused:
# Test IMAP from the container
docker compose exec pisovereign openssl s_client -connect imap.gmail.com:993
- Verify host/port match your provider (see External Services)
- For Proton Bridge: ensure Bridge is running on the host
- If using the `provider` field, explicit `imap_host`/`smtp_host` values override presets
Authentication failed:
- Gmail: Use an App Password, not your account password
- Outlook: Use an App Password if 2FA is enabled
- Proton Mail: Use the Bridge Password from the Bridge UI, not your Proton account password
Migrating from `[proton]` config: The old `[proton]` section still works via a serde alias. If you see “duplicate field” errors, ensure you don’t have both `[proton]` and `[email]` sections in your config.
CalDAV
401 Unauthorized:
docker compose exec pisovereign curl -u username:password \
http://baikal:80/dav.php/calendars/username/
Verify user exists in Baïkal admin at http://localhost/caldav.
404 Not Found — PiSovereign automatically re-creates missing calendars and address books. If you still see 404 errors:
- Verify `calendar_path` matches your Baïkal user and calendar name
- Check that the user has permission to create calendars
- List calendars to verify:
docker compose exec pisovereign curl -u username:password -X PROPFIND \
http://baikal:80/dav.php/calendars/username/
Speech Processing
Whisper (STT) fails
# Check Whisper container
docker compose logs whisper
# Test directly
docker compose exec whisper curl -s http://localhost:8081/health
- Verify the container has enough memory (~500 MB for base model)
- Check audio format (mono 16 kHz WAV preferred)
Piper (TTS) fails
# Check Piper container
docker compose logs piper
# Test directly
docker compose exec piper curl -s http://localhost:8082/health
- Verify voice model files are mounted correctly
- Check container logs for ONNX runtime errors
Memory System (RAG)
Memories not being retrieved
- Check that `enable_rag = true` in the `[memory]` config
- Verify `rag_threshold` isn’t too high — try lowering it to 0.3
- Ensure embeddings are generated: `GET /v1/memories/stats` should show entries with embeddings
- Confirm Ollama is running with the `nomic-embed-text` embedding model
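The `rag_threshold` gate operates on cosine similarity between the query embedding and each stored memory embedding; memories scoring below the threshold are dropped. An illustrative sketch:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// A memory is retrieved only if its similarity clears the threshold;
/// lowering rag_threshold (e.g., to 0.3) admits weaker matches.
fn passes_threshold(query: &[f32], memory: &[f32], rag_threshold: f32) -> bool {
    cosine_similarity(query, memory) >= rag_threshold
}
```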
Encryption key errors
- “Read-only file system”: Ensure `encryption_key_path` points to a writable directory (e.g., `/app/data/memory_encryption.key` in Docker)
- Lost encryption key: Encrypted memories cannot be recovered. Delete the key file, clear the `memories` and `memory_embeddings` tables, and let PiSovereign generate a new key on startup
Memory decay not running
The decay task runs automatically every 24 hours. Check logs for `memory decay task` entries. You can also trigger it manually via `POST /v1/memories/decay`.
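The exact decay formula is not documented here; as an illustration only, an exponential half-life model shows the general shape such a task might use (the function names, the formula, and the cutoff are all assumptions):

```rust
/// Hypothetical decay curve: relevance halves every `half_life_days`.
/// Not PiSovereign's actual formula.
fn decayed_score(initial: f32, age_days: f32, half_life_days: f32) -> f32 {
    initial * 0.5f32.powf(age_days / half_life_days)
}

/// Memories whose decayed score falls below `cutoff` become pruning
/// candidates on the next decay run.
fn should_prune(initial: f32, age_days: f32, half_life_days: f32, cutoff: f32) -> bool {
    decayed_score(initial, age_days, half_life_days) < cutoff
}
```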
System Commands
Commands not auto-populating
On first startup, 32 default system commands should be auto-discovered. If empty:
- Check logs for `system_command_discovery` entries
- Verify the database migration `14_system_commands.sql` ran successfully
- Check `GET /v1/commands/catalog/count` — it should return `{"count": 32}`
Startup warnings about Vault credentials
PiSovereign validates Vault credentials at startup and logs diagnostic warnings for missing or empty fields. Check the log line `Some configuration fields are empty after secret resolution` for the affected fields. Store the missing secrets in Vault using `just docker-vault-check`.
Performance Issues
High memory usage
docker stats --no-stream
Reduce resource consumption in config.toml:
[cache]
l1_max_entries = 1000
[database]
max_connections = 3
Slow response times
- Check inference latency — the model may be too large for your hardware
- Enable caching: set `enabled = true` in the `[cache]` section
- Use SSD storage — SD card I/O is a common bottleneck on Raspberry Pi
Getting Help
Collect Diagnostic Information
Before reporting an issue, gather:
# Container status
docker compose ps
# PiSovereign version
docker compose exec pisovereign pisovereign-server --version
# Recent logs
docker compose logs --since "1h" > pisovereign-logs.txt
# System info
uname -a
docker --version
free -h
df -h
Report an Issue
- GitHub Issues: github.com/twohreichel/PiSovereign/issues — include diagnostic info and reproduction steps
- Security Issues: Report privately via GitHub Security Advisories
- Discussions: GitHub Discussions for questions and help
Architecture
🏗️ System design and architectural patterns in PiSovereign
This document explains the architectural decisions, design patterns, and structure of PiSovereign.
Table of Contents
- Overview
- Clean Architecture
- Crate Dependencies
- Workspace Structure
- Port/Adapter Pattern
- Data Flow
- Key Design Decisions
- Further Reading
Overview
PiSovereign follows Clean Architecture (also known as Hexagonal Architecture or Ports & Adapters) to achieve:
- Independence from frameworks - Business logic doesn’t depend on Axum, SQLite, or any external library
- Testability - Core logic can be tested without infrastructure
- Flexibility - Adapters can be swapped without changing business rules
- Maintainability - Clear boundaries between concerns
┌─────────────────────────────────────────────────────────────────┐
│ External World │
│ (HTTP Clients, WhatsApp, Email Servers, AI Hardware) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Presentation Layer │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ presentation_ │ │ presentation_ │ │
│ │ http │ │ cli │ │
│ │ (Axum API) │ │ (Clap CLI) │ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Application Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ application │ │
│ │ (Services, Use Cases, Orchestration, Port Definitions) │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────────────┐ ┌──────────────┐ ┌──────────────────────────┐
│ Domain Layer │ │ AI Layer │ │ Infrastructure Layer │
│ ┌────────────┐ │ │ ┌──────────┐ │ │ ┌──────────────────────┐ │
│ │ domain │ │ │ │ ai_core │ │ │ │ infrastructure │ │
│ │ (Entities, │ │ │ │(Inference│ │ │ │ (Adapters, Repos, │ │
│ │ Values, │ │ │ │ Engine) │ │ │ │ Cache, DB, Vault) │ │
│ │ Commands) │ │ │ └──────────┘ │ │ └──────────────────────┘ │
│ └────────────┘ │ │ ┌──────────┐ │ │ │
│ │ │ │ai_speech │ │ │ ┌──────────────────┐ │
│ │ │ │(STT/TTS) │ │ │ │ integration_* │ │
│ │ │ └──────────┘ │ │ │ (WhatsApp, Mail, │ │
│ │ │ │ │ │ Calendar, etc.) │ │
│ │ └──────────────┘ │ └──────────────────┘ │
└──────────────────┘ └──────────────────────────┘
Clean Architecture
Layer Responsibilities
| Layer | Crates | Responsibility |
|---|---|---|
| Domain | domain | Core business entities, value objects, commands, domain errors |
| Application | application | Use cases, service orchestration, port definitions |
| Infrastructure | infrastructure, integration_* | Adapters for external systems (DB, cache, APIs) |
| AI | ai_core, ai_speech | AI-specific logic (inference, speech processing) |
| Presentation | presentation_http, presentation_cli | User interfaces (REST API, CLI) |
Dependency Rule
Inner layers NEVER depend on outer layers
domain → (no dependencies on other PiSovereign crates)
application → domain
ai_core → domain, application (ports)
ai_speech → domain, application (ports)
infrastructure → domain, application (ports)
integration_* → domain, application (ports)
presentation_* → domain, application, infrastructure, ai_*, integration_*
This means:
- `domain` knows nothing about databases, HTTP, or external services
- `application` defines what it needs via ports (traits), not how it’s done
- Only `presentation` crates wire everything together
Crate Dependencies
Dependency Graph
graph TB
subgraph "Presentation"
HTTP[presentation_http]
CLI[presentation_cli]
end
subgraph "Integration"
WA[integration_whatsapp]
PM[integration_email]
CAL[integration_caldav]
WX[integration_weather]
end
subgraph "Infrastructure"
INFRA[infrastructure]
end
subgraph "AI"
CORE[ai_core]
SPEECH[ai_speech]
end
subgraph "Core"
APP[application]
DOM[domain]
end
HTTP --> APP
HTTP --> INFRA
HTTP --> CORE
HTTP --> SPEECH
HTTP --> WA
HTTP --> PM
HTTP --> CAL
HTTP --> WX
CLI --> APP
CLI --> INFRA
WA --> APP
WA --> DOM
PM --> APP
PM --> DOM
CAL --> APP
CAL --> DOM
WX --> APP
WX --> DOM
INFRA --> APP
INFRA --> DOM
CORE --> APP
CORE --> DOM
SPEECH --> APP
SPEECH --> DOM
APP --> DOM
Workspace Structure
PiSovereign/
├── Cargo.toml # Workspace manifest
├── crates/
│ ├── domain/ # Core business logic (no external deps)
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── entities/ # User, Conversation, Message, etc.
│ │ ├── values/ # UserId, MessageContent, etc.
│ │ ├── commands/ # UserCommand, SystemCommand
│ │ └── errors.rs # Domain errors
│ │
│ ├── application/ # Use cases and ports
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── services/ # ConversationService, CommandService, etc.
│ │ └── ports/ # Trait definitions (InferencePort, etc.)
│ │
│ ├── infrastructure/ # Framework-dependent implementations
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── adapters/ # VaultSecretStore, etc.
│ │ ├── cache/ # MokaCache, RedbCache
│ │ ├── persistence/# SQLite repositories
│ │ └── telemetry/ # OpenTelemetry setup
│ │
│ ├── ai_core/ # Inference engine
│ │ └── src/
│ │ ├── hailo/ # Hailo-Ollama client
│ │ └── selector/ # Model routing
│ │
│ ├── ai_speech/ # Speech processing
│ │ └── src/
│ │ ├── providers/ # Hybrid, Local, OpenAI
│ │ └── converter/ # Audio format conversion
│ │
│ ├── integration_*/ # External service adapters
│ │
│ └── presentation_*/ # User interfaces
Port/Adapter Pattern
Ports (Interfaces)
Ports are traits defined in application/src/ports/ that describe what the application needs:
// application/src/ports/inference.rs
#[async_trait]
pub trait InferencePort: Send + Sync {
async fn generate(
&self,
prompt: &str,
options: InferenceOptions,
) -> Result<InferenceResponse, InferenceError>;
async fn generate_stream(
&self,
prompt: &str,
options: InferenceOptions,
) -> Result<Pin<Box<dyn Stream<Item = Result<String, InferenceError>> + Send>>, InferenceError>;
async fn health_check(&self) -> Result<bool, InferenceError>;
}
// application/src/ports/secret_store.rs
#[async_trait]
pub trait SecretStore: Send + Sync {
async fn get_secret(&self, path: &str) -> Result<Option<String>, SecretError>;
async fn health_check(&self) -> Result<bool, SecretError>;
}
// application/src/ports/memory_context.rs — RAG context injection
#[async_trait]
pub trait MemoryContextPort: Send + Sync {
async fn retrieve_context(
&self,
user_id: &UserId,
query: &str,
limit: usize,
) -> Result<Vec<MemoryContext>, MemoryError>;
}
// application/src/ports/embedding.rs — Vector embeddings
#[async_trait]
pub trait EmbeddingPort: Send + Sync {
async fn embed(&self, text: &str) -> Result<Vec<f32>, EmbeddingError>;
}
// application/src/ports/encryption.rs — Content encryption at rest
pub trait EncryptionPort: Send + Sync {
fn encrypt(&self, plaintext: &[u8]) -> Result<Vec<u8>, EncryptionError>;
fn decrypt(&self, ciphertext: &[u8]) -> Result<Vec<u8>, EncryptionError>;
}
Adapters (Implementations)
Adapters implement ports and live in infrastructure/ or integration_*/:
// infrastructure/src/adapters/vault_secret_store.rs
pub struct VaultSecretStore {
client: VaultClient,
mount_path: String,
}
#[async_trait]
impl SecretStore for VaultSecretStore {
async fn get_secret(&self, path: &str) -> Result<Option<String>, SecretError> {
let full_path = format!("{}/{}", self.mount_path, path);
self.client.read_secret(&full_path).await
}
async fn health_check(&self) -> Result<bool, SecretError> {
self.client.health().await
}
}
// infrastructure/src/adapters/env_secret_store.rs
pub struct EnvironmentSecretStore {
prefix: Option<String>,
}
#[async_trait]
impl SecretStore for EnvironmentSecretStore {
async fn get_secret(&self, path: &str) -> Result<Option<String>, SecretError> {
// Convert "database/password" to "DATABASE_PASSWORD"
let env_key = self.path_to_env_var(path);
Ok(std::env::var(&env_key).ok())
}
async fn health_check(&self) -> Result<bool, SecretError> {
Ok(true) // Environment is always available
}
}
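The `path_to_env_var` helper is referenced but not shown above; a plausible sketch of the mapping its comment describes (`database/password` → `DATABASE_PASSWORD`, with an optional prefix), names and behavior assumed:

```rust
/// Hypothetical sketch of EnvironmentSecretStore's path-to-env-var
/// mapping: path separators and dashes become underscores, the result
/// is uppercased, and an optional prefix is prepended.
fn path_to_env_var(prefix: Option<&str>, path: &str) -> String {
    let key = path.replace('/', "_").replace('-', "_").to_uppercase();
    match prefix {
        Some(p) => format!("{}_{}", p.to_uppercase(), key),
        None => key,
    }
}
```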
Example: Secret Store
The ChainedSecretStore demonstrates the adapter pattern:
// infrastructure/src/adapters/chained_secret_store.rs
pub struct ChainedSecretStore {
stores: Vec<Box<dyn SecretStore>>,
}
impl ChainedSecretStore {
pub fn new() -> Self {
Self { stores: Vec::new() }
}
pub fn add_store(mut self, store: impl SecretStore + 'static) -> Self {
self.stores.push(Box::new(store));
self
}
}
#[async_trait]
impl SecretStore for ChainedSecretStore {
async fn get_secret(&self, path: &str) -> Result<Option<String>, SecretError> {
for store in &self.stores {
if let Ok(Some(secret)) = store.get_secret(path).await {
return Ok(Some(secret));
}
}
Ok(None)
}
}
Usage in application:
// Wiring in presentation layer
let secret_store = ChainedSecretStore::new()
.add_store(VaultSecretStore::new(vault_config)?)
.add_store(EnvironmentSecretStore::new(Some("PISOVEREIGN")));
let command_service = CommandService::new(
Arc::new(secret_store), // Injected as trait object
// ... other dependencies
);
Data Flow
Example: Intent Routing Pipeline
User input is routed through a multi-stage pipeline that minimizes LLM calls. Each stage acts as a progressively more expensive filter:
1. User Input: "Hey, it's Andreas. I'm naming you Macci."
│
▼
2. Conversational Filter (zero LLM cost)
│ Regex-based detection of greetings, introductions, small talk.
│ If matched → skip to chat (no workflow/intent parsing).
│
▼ (not conversational)
3. Quick Pattern Matching
│ Regex patterns for well-known commands (e.g., "remind me",
│ "search for", "send email"). Fast, deterministic.
│
▼ (no quick match)
4. Guarded Workflow Detection
│ Only invoked when input has ≥8 words AND contains ≥2
│ workflow-hint keywords ("create", "plan", "distribute", etc.).
│ Uses LLM to detect multi-step workflows.
│
▼ (not a workflow)
5. LLM Intent Parsing
│ Full LLM-based intent classification with confidence score.
│ Post-validated by keyword presence per intent category.
│ Intents below 0.7 confidence are downgraded to chat.
│
▼
6. Dispatch to appropriate handler or fall through to chat
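The staged filter above can be sketched as a single routing function. The regex stages are reduced to keyword checks here, and every pattern list is illustrative, not the production rule set:

```rust
/// Illustrative sketch of the intent-routing pipeline's cheap stages.
#[derive(Debug, PartialEq)]
enum Route { Chat, QuickCommand, WorkflowDetection, LlmIntentParsing }

fn route(input: &str) -> Route {
    let lower = input.to_lowercase();

    // Stage 2: conversational filter (zero LLM cost).
    if ["hello", "hey", "thanks", "my name is"].iter().any(|p| lower.contains(*p)) {
        return Route::Chat;
    }
    // Stage 3: quick pattern matching for well-known commands.
    if ["remind me", "search for", "send email"].iter().any(|p| lower.starts_with(*p)) {
        return Route::QuickCommand;
    }
    // Stage 4: guarded workflow detection, only for inputs with
    // >= 8 words AND >= 2 workflow-hint keywords.
    let words = lower.split_whitespace().count();
    let hints = ["create", "plan", "distribute", "then", "schedule"]
        .iter().filter(|k| lower.contains(**k)).count();
    if words >= 8 && hints >= 2 {
        return Route::WorkflowDetection;
    }
    // Stage 5: everything else goes to full LLM intent parsing.
    Route::LlmIntentParsing
}
```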
Example: Chat Request
1. HTTP Request arrives at /v1/chat
│
▼
2. presentation_http extracts request, validates auth
│
▼
3. Calls ConversationService.send_message() [application layer]
│
▼
4. ConversationService:
├── Loads conversation from ConversationRepository [port]
├── Calls InferencePort.generate() [port]
└── Saves message via ConversationRepository [port]
│
▼
5. InferencePort implementation (ai_core::HailoClient):
├── Sends request to Hailo-Ollama
└── Returns response
│
▼
6. Response flows back through layers
│
▼
7. HTTP Response returned to client
Example: WhatsApp Voice Message
1. WhatsApp Webhook POST to /v1/webhooks/whatsapp
│
▼
2. integration_whatsapp validates signature, parses message
│
▼
3. VoiceMessageService.process() [application layer]
│
├── Download audio via WhatsAppPort
├── Convert format via AudioConverter [ai_speech]
├── Transcribe via SpeechPort (STT)
├── Process text via CommandService
├── (Optional) Synthesize via SpeechPort (TTS)
└── Send response via WhatsAppPort
│
▼
4. Response sent back to user via WhatsApp
Key Design Decisions
1. Async-First
All I/O operations are async using Tokio:
#[async_trait]
pub trait InferencePort: Send + Sync {
async fn generate(&self, ...) -> Result<..., ...>;
}
Rationale: Maximizes throughput on limited Raspberry Pi resources.
2. Error Handling via thiserror
Each layer defines its own error types:
// domain/src/errors.rs
#[derive(Debug, thiserror::Error)]
pub enum DomainError {
#[error("Invalid message content: {0}")]
InvalidContent(String),
}
// application/src/errors.rs
#[derive(Debug, thiserror::Error)]
pub enum ServiceError {
#[error("Domain error: {0}")]
Domain(#[from] DomainError),
#[error("Inference failed: {0}")]
Inference(String),
}
Rationale: Clear error boundaries, easy conversion between layers.
3. Feature Flags
Optional features reduce binary size:
# Cargo.toml
[features]
default = ["http"]
http = ["axum", "tower", ...]
cli = ["clap", ...]
speech = ["whisper", "piper", ...]
Rationale: Raspberry Pi has limited storage; include only what’s needed.
4. Configuration via config Crate
Layered configuration (defaults → file → env vars):
let config = Config::builder()
.add_source(config::File::with_name("config"))
.add_source(config::Environment::with_prefix("PISOVEREIGN"))
.build()?;
Rationale: Flexibility for different deployment scenarios.
5. Multi-Layer Caching
Request → L1 (Moka, in-memory) → L2 (Redb, persistent) → L3 (Semantic, pgvector) → Backend
| Layer | Type | Storage | Match Method | Use Case |
|---|---|---|---|---|
| L1 | MokaCache | In-memory | Exact string | Hot data, sub-ms access |
| L2 | RedbCache | Disk | Exact string | Warm data, persists across restarts |
| L3 | PgSemanticCache | PostgreSQL/pgvector | Cosine similarity | Semantically equivalent queries |
Decorator Chain Order:
SanitizedInferencePort (outermost)
└─ CachedInferenceAdapter (exact L1+L2)
└─ SemanticCachedInferenceAdapter (similarity L3)
└─ DegradedInferenceAdapter
└─ OllamaInferenceAdapter (innermost)
Rationale: Minimize latency and reduce load on inference engine. The semantic layer catches queries that are phrased differently but mean the same thing, significantly improving cache hit rates.
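The read-through-with-backfill behavior of the exact layers (L1/L2) can be sketched as follows; trait and type names are illustrative, and the semantic L3 layer is omitted:

```rust
use std::collections::HashMap;

/// Minimal stand-in for a cache layer (Moka and Redb in the real stack).
trait CacheLayer {
    fn get(&self, key: &str) -> Option<String>;
    fn put(&mut self, key: &str, value: &str);
}

struct MapLayer(HashMap<String, String>);

impl CacheLayer for MapLayer {
    fn get(&self, key: &str) -> Option<String> { self.0.get(key).cloned() }
    fn put(&mut self, key: &str, value: &str) { self.0.insert(key.into(), value.into()); }
}

/// Walk layers fastest-first; on a hit, backfill every faster layer
/// that missed so the next lookup stops earlier.
fn lookup(layers: &mut [Box<dyn CacheLayer>], key: &str) -> Option<String> {
    for i in 0..layers.len() {
        if let Some(value) = layers[i].get(key) {
            for upper in &mut layers[..i] {
                upper.put(key, &value);
            }
            return Some(value);
        }
    }
    None // all layers missed: caller falls through to the backend
}
```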
6. In-Process Event Bus
Post-processing work (fact extraction, audit logging, metrics) runs asynchronously via an in-process event bus backed by tokio::sync::broadcast:
ChatService / AgentService
│
▼ publish(DomainEvent)
┌──────────────────────┐
│ TokioBroadcastBus │
└──────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
Fact Audit Conv Metrics
Ext. Log Pers. Handler
Key properties:
- Fire-and-forget — handlers never block the response path
- `DomainEvent` enum defined in the domain layer (7 variants)
- `EventBusPort`/`EventSubscriberPort` defined in application ports
- `TokioBroadcastEventBus` adapter in infrastructure
- Handlers spawned conditionally based on available dependencies
- Channel overflow → `Lagged` warning, not data loss for the publisher
Rationale: Moves 100–500 ms of per-request post-processing off the critical path, crucial on resource-constrained Raspberry Pi hardware.
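A standard-library sketch of the fire-and-forget contract (the real adapter uses `tokio::sync::broadcast`; `std::sync::mpsc` stands in here, so the `Lagged` overflow semantics are not modeled):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Illustrative event type; the real DomainEvent enum has 7 variants.
#[derive(Clone, Debug, PartialEq)]
enum DomainEvent {
    MessageProcessed { user_id: String },
}

struct EventBus {
    subscribers: Vec<Sender<DomainEvent>>,
}

impl EventBus {
    fn new() -> Self { Self { subscribers: Vec::new() } }

    fn subscribe(&mut self) -> Receiver<DomainEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Fire-and-forget: a dropped subscriber never fails the publisher,
    /// so the response path is never blocked by post-processing.
    fn publish(&self, event: DomainEvent) {
        for tx in &self.subscribers {
            let _ = tx.send(event.clone());
        }
    }
}
```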
7. Agentic Multi-Agent Orchestration
Complex tasks can be decomposed into parallel sub-tasks, each executed by an independent AI agent:
User Request: "Plan my trip to Berlin next week"
│
▼ POST /v1/agentic/tasks
┌──────────────────────────┐
│ AgenticOrchestrator │
│ (application service) │
└──────────────────────────┘
│ │ │
▼ ▼ ▼
SubAgent SubAgent SubAgent
(weather) (calendar)(transit)
│ │ │
└────────┴────────┘
│
▼
Aggregated Result
Key properties:
- Wave-based parallel execution with configurable concurrency limits
- Dependency tracking between sub-tasks
- Individual sub-agent timeouts and total task timeouts
- Real-time progress via SSE streaming (`/v1/agentic/tasks/{id}/stream`)
- Task cancellation support
- Approval gates for sensitive operations
- Domain entities in `domain`, orchestration in `application`, event bus in `infrastructure`, REST handlers in `presentation_http`, UI in `presentation_web`
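Wave-based execution with dependency tracking can be sketched as a planning step that groups ready sub-tasks into waves; the names and logic here are illustrative, not the actual orchestrator:

```rust
use std::collections::HashSet;

/// Illustrative sub-task with explicit dependencies.
struct SubTask {
    id: &'static str,
    depends_on: Vec<&'static str>,
}

/// Group tasks into execution waves: a task is ready once all of its
/// dependencies have completed. Tasks within a wave could run in
/// parallel (the concurrency limit is omitted here).
fn plan_waves(tasks: &[SubTask]) -> Vec<Vec<&'static str>> {
    let mut done: HashSet<&str> = HashSet::new();
    let mut waves = Vec::new();
    while done.len() < tasks.len() {
        let wave: Vec<&'static str> = tasks.iter()
            .filter(|t| !done.contains(t.id))
            .filter(|t| t.depends_on.iter().all(|d| done.contains(d)))
            .map(|t| t.id)
            .collect();
        if wave.is_empty() {
            break; // dependency cycle: bail out rather than loop forever
        }
        for id in &wave { done.insert(*id); }
        waves.push(wave);
    }
    waves
}
```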
Further Reading
- Crate Reference - Detailed documentation of each crate
- API Reference - REST API documentation
- Contributing - How to contribute
Web Frontend
🌐 SolidJS-based progressive web application for PiSovereign
The web frontend provides a modern chat interface for interacting with PiSovereign through any browser. It is built with SolidJS and embedded directly into the Rust binary at compile time via rust-embed.
Table of Contents
- Overview
- Technology Stack
- Architecture
- Development
- Build & Deployment
- Testing
- Code Quality
- PWA Support
- Styling
- Security
Overview
PiSovereign’s web frontend is a single-page application (SPA) that communicates with the backend via REST and Server-Sent Events (SSE). Key design goals:
- Zero external hosting — Assets are embedded in the Rust binary, no separate web server needed
- Offline-capable — PWA with service worker for offline resilience
- Privacy-first — No external CDNs, fonts, or analytics; everything is self-contained
- Lightweight — ~200 KB production bundle with code splitting
Technology Stack
| Technology | Version | Purpose |
|---|---|---|
| SolidJS | 1.9.x | Reactive UI framework |
| SolidJS Router | 0.15.x | Client-side routing |
| TypeScript | 5.7 | Type safety (strict mode) |
| Vite | 6.x | Build tool & dev server |
| Tailwind CSS | 4.x | Utility-first styling |
| Vitest | 3.x | Unit & component testing |
| vite-plugin-pwa | 1.x | Service worker generation |
Architecture
Directory Structure
crates/presentation_web/
├── Cargo.toml # Rust crate manifest
├── dist/ # Vite build output (gitignored)
├── frontend/ # SolidJS source code
│ ├── index.html # HTML entry point
│ ├── package.json # Node dependencies
│ ├── tsconfig.json # TypeScript configuration
│ ├── vite.config.ts # Vite build configuration
│ ├── vitest.config.ts # Test configuration
│ └── src/
│ ├── index.tsx # Application entry point
│ ├── app.tsx # Root component with router
│ ├── api/ # REST & SSE API clients
│ ├── components/ # Reusable UI components
│ │ └── ui/ # Base UI primitives
│ ├── hooks/ # Reactive hooks
│ ├── lib/ # Utilities (cn, format, sanitize)
│ ├── pages/ # Route page components
│ ├── stores/ # Global state stores
│ └── types/ # TypeScript type definitions
└── src/ # Rust source code
├── lib.rs # Crate root
├── assets.rs # rust-embed asset struct
├── csp.rs # Content Security Policy
├── handler.rs # Static file handler with caching
└── routes.rs # SPA router & axum integration
Component Architecture
The frontend follows a layered component architecture:
Pages (routes)
└── Composed Components (chat, settings panels)
└── UI Primitives (Button, Card, Modal, Badge, Spinner)
└── Utility Functions (cn, format, sanitize)
Pages are lazy-loaded via solid-router for code splitting:
- `/` — Chat interface (main interaction view)
- `/settings` — Application settings
- `/commands` — Available bot commands
- `/memories` — Memory inspection
- `/audit` — Audit log viewer
- `/health` — System health dashboard
UI Primitives (components/ui/) are unstyled, composable building blocks:
- `Button` — With variants (default, outline, ghost, destructive) and sizes
- `Card` — Container with header/content/footer slots
- `Modal` — Dialog with focus trap and backdrop
- `Badge` — Status indicators with color variants
- `Spinner` — Loading indicators
State Management
Global state uses SolidJS signals organized into stores:
| Store | Purpose |
|---|---|
| `auth.store` | Authentication state, token management |
| `chat.store` | Conversations, messages, active chat |
| `theme.store` | Dark/light mode preference |
| `toast.store` | Notification queue |
Stores are accessed via the StoreProvider context, available throughout the component tree.
API Client Layer
The api/ directory contains typed REST clients:
| Client | Endpoint | Purpose |
|---|---|---|
| `client.ts` | — | Base HTTP client with auth headers |
| `chat.api.ts` | `/api/v1/chat` | Send messages, SSE streaming |
| `health.api.ts` | `/api/v1/health` | System health status |
| `memories.api.ts` | `/api/v1/memories` | Memory CRUD |
| `audit.api.ts` | `/api/v1/audit` | Audit log queries |
| `commands.api.ts` | `/api/v1/commands` | Bot command listing |
| `settings.api.ts` | `/api/v1/settings` | User preferences |
All API calls go through client.ts, which handles:
- Bearer token injection from the auth store
- Base URL resolution (same origin in production, proxy in dev)
- JSON serialization/deserialization
- Error response mapping
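As a rough sketch, those responsibilities condense into a few lines — the names here (`setToken`, `buildHeaders`, `request`) are illustrative assumptions, not the actual `client.ts` API surface:

```typescript
// Illustrative sketch of a base client like client.ts (names are assumptions).
type ApiError = { status: number; message: string };

let authToken: string | null = null; // held in memory, fed from the auth store

function setToken(token: string | null): void {
  authToken = token;
}

function buildHeaders(): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (authToken !== null) headers["Authorization"] = `Bearer ${authToken}`;
  return headers;
}

// Same-origin path in production; the Vite dev server proxies /api/* in dev.
async function request<T>(path: string, init: { method?: string; body?: string } = {}): Promise<T> {
  const res = await fetch(path, { ...init, headers: buildHeaders() });
  if (!res.ok) {
    // Map error responses into a typed shape instead of surfacing a raw Response
    const err: ApiError = { status: res.status, message: await res.text() };
    throw err;
  }
  return res.json() as Promise<T>;
}
```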
Development
Prerequisites
- Node.js ≥ 22 (LTS recommended)
- npm ≥ 10
Getting Started
# Install dependencies
just web-install
# Start development server with hot reload
just web-dev
The Vite dev server starts on http://localhost:5173 and proxies API requests to the backend at http://localhost:3000.
Available Commands
All frontend tasks are available via just:
| Command | Description |
|---|---|
| `just web-install` | Install npm dependencies |
| `just web-build` | Production build → `dist/` |
| `just web-dev` | Start Vite dev server with HMR |
| `just web-lint` | Run ESLint checks |
| `just web-lint-fix` | Auto-fix ESLint issues |
| `just web-fmt` | Format with Prettier |
| `just web-test` | Run Vitest test suite |
| `just web-test-coverage` | Run tests with coverage report |
| `just web-typecheck` | TypeScript type checking |
Development Workflow
1. Start the backend: `just run` or `cargo run`
2. Start the frontend dev server: `just web-dev`
3. Open `http://localhost:5173` in your browser
4. Edit SolidJS components — changes are reflected instantly via HMR
The Vite dev server proxies /api/* requests to localhost:3000, so you get the full API available during development.
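That proxy corresponds to a `server.proxy` entry in `vite.config.ts`. The sketch below shows the assumed shape (in the real file this object is passed to `defineConfig()`; the exact options may differ):

```typescript
// Hedged sketch of the dev-server section of vite.config.ts (options assumed).
const devServerConfig = {
  server: {
    port: 5173,
    proxy: {
      // Forward /api/* requests to the backend during development
      "/api": {
        target: "http://localhost:3000",
        changeOrigin: true,
      },
    },
  },
};
```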
Build & Deployment
Production Build
just web-build
This runs vite build which outputs optimized assets to crates/presentation_web/dist/. The build:
- Tree-shakes unused code
- Minifies JS/CSS
- Adds content hashes to filenames for cache busting
- Generates PWA service worker
- Code-splits routes for lazy loading
- Outputs ~200 KB total (gzipped ~60 KB)
Rust Integration
The presentation_web crate embeds the dist/ directory at compile time using rust-embed:
#[derive(Embed)]
#[folder = "dist/"]
pub struct FrontendAssets;
The Rust handler layer provides:
- Content-type detection — MIME types based on file extension
- Cache control — Immutable caching for hashed assets, no-cache for HTML
- ETag support — Conditional requests via `If-None-Match`
- CSP headers — Content Security Policy for XSS protection
- SPA fallback — Unknown routes serve `index.html` for client-side routing
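The cache-control rule can be sketched in a few lines of TypeScript (the real logic lives in the Rust `handler.rs`; the hash regex and max-age values here are illustrative):

```typescript
// Sketch of the caching decision the static-file handler makes (values assumed).
function cacheControlFor(path: string): string {
  // Vite emits content-hashed filenames like app.a1b2c3d4.js — safe to cache forever.
  const hashed = /\.[0-9a-f]{8,}\.(js|css|woff2?|png|svg)$/i.test(path);
  if (hashed) return "public, max-age=31536000, immutable";
  // HTML must always be revalidated so new deployments take effect.
  if (path.endsWith(".html") || path === "/") return "no-cache";
  return "public, max-age=3600";
}
```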
Important: You must run just web-build before cargo build so the dist/ directory is populated. The Docker build handles this automatically.
Docker Build
The Dockerfile uses a multi-stage build:
# Stage 1: Build frontend
FROM node:22-alpine AS frontend-builder
COPY crates/presentation_web/frontend/ .
RUN npm ci && npm run build
# Stage 2: Build Rust binary
FROM rust:1.93-slim-bookworm AS builder
COPY --from=frontend-builder /app/dist/ crates/presentation_web/dist/
RUN cargo build --release
# Stage 3: Runtime
FROM debian:bookworm-slim
COPY --from=builder /app/target/release/pisovereign .
This ensures the frontend is always built fresh and embedded into the binary.
Testing
Unit & Component Tests
Tests use Vitest with @solidjs/testing-library for component tests and MSW (Mock Service Worker) for API mocking.
# Run all tests
just web-test
# Run with coverage
just web-test-coverage
Test structure mirrors the source layout:
frontend/src/
├── lib/__tests__/ # Utility tests (cn, format, sanitize)
├── stores/__tests__/ # Store logic tests
├── api/__tests__/ # API client tests
└── components/ui/__tests__/ # Component rendering tests
Current coverage: 93 tests across utilities, stores, API clients, and UI components.
End-to-End Tests (Playwright)
The project includes Playwright-based E2E journey tests that simulate real user interactions against a live application instance on localhost:3000. These tests cover every page, CRUD operation, and user action in the frontend.
Setup
# Install Playwright browsers (one-time)
just web-e2e-install
# Ensure the Docker stack is running
just docker-up
Running E2E Tests
# Run all journey tests
just web-e2e
# Run with interactive UI (for debugging)
just web-e2e-ui
# View the last HTML report
just web-e2e-report
Skipping LLM-Dependent Tests
Tests tagged @llm (Chat, Agentic) require Ollama to be running. To skip them:
cd crates/presentation_web/frontend && npx playwright test --grep-invert @llm
Architecture
frontend/e2e/
├── global-setup.ts # Authenticates once, saves session cookie
├── fixtures/
│ └── auth.fixture.ts # Pre-authenticated page fixture
├── reporters/
│ └── bugreport.reporter.ts # Auto-generates bug reports on failure
├── helpers/
│ ├── navigation.helper.ts # Sidebar navigation, page-load utilities
│ └── form.helper.ts # Form fills, modal helpers, test IDs
└── journeys/
├── auth.journey.spec.ts # Login, logout, session persistence
├── dashboard.journey.spec.ts # Stat cards, quick actions, sections
├── chat.journey.spec.ts # Send message, SSE streaming (@llm)
├── conversations.journey.spec.ts # List, search, delete conversations
├── commands.journey.spec.ts # Parse, execute, catalog CRUD
├── approvals.journey.spec.ts # List, approve, deny requests
├── contacts.journey.spec.ts # Full CRUD + search
├── calendar.journey.spec.ts # Views, event CRUD, date navigation
├── tasks.journey.spec.ts # Task CRUD, filters, completion
├── kanban.journey.spec.ts # Board columns, filter buttons
├── memory.journey.spec.ts # Memory CRUD, search, decay, stats
├── agentic.journey.spec.ts # Multi-agent task lifecycle (@llm)
├── mailing.journey.spec.ts # Email list, refresh, mark read
└── system.journey.spec.ts # Status, models, health checks
Writing New Journey Tests
Journey tests follow a consistent pattern using test.step() for structured reproduction steps:
import { test, expect } from '../fixtures/auth.fixture';
test.describe('Feature Journey', () => {
test('complete user flow', async ({ page }) => {
await test.step('navigate to the page', async () => {
await page.goto('/feature');
await page.waitForLoadState('networkidle');
});
await test.step('perform user action', async () => {
await page.locator('button:has-text("Action")').click();
await expect(page.locator('text=Result')).toBeVisible();
});
});
});
Key conventions:
- File naming: `{feature}.journey.spec.ts`
- Test steps: Use `test.step()` — these feed the bugreport reporter for clear reproduction steps
- Data cleanup: Create test data with unique IDs (`testId()` helper) and clean up in `afterAll` or at the end of the test
- Timeouts: Use generous timeouts (60s) for LLM/SSE-dependent tests and tag them with `@llm`
- Resilience: Use `.catch(() => false)` for optional UI elements that may or may not exist depending on backend state
Automatic Bug Reports
When a test fails, the custom BugreportReporter writes a detailed markdown file to bugreports/:
- Title and test metadata (file, line, browser, duration)
- Steps to reproduce extracted from `test.step()` annotations
- Error message and stack trace
- Screenshot paths (captured on failure)
- Environment details (OS, Node.js version)
Bug reports are named YYYY-MM-DD-e2e-{test-slug}.md for chronological ordering.
Code Quality
The project enforces strict code quality standards:
- TypeScript strict mode — All strict checks enabled, including `exactOptionalPropertyTypes`
- ESLint — SolidJS-specific rules + TypeScript checks (0 errors required)
- Prettier — Consistent formatting
- Pre-commit checks — `just pre-commit` runs lint, typecheck, and tests
Quality gates are integrated into the just quality and just pre-commit recipes, which run both frontend and backend checks together.
PWA Support
The frontend is a Progressive Web App with:
- Service Worker — Generated by `vite-plugin-pwa` using Workbox
- Offline support — Cached assets served when offline
- Installable — Add to home screen on mobile devices
- Web manifest — App name, icons, theme colors defined in `manifest.webmanifest`
The service worker uses a cache-first strategy for static assets and network-first for API calls.
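Expressed as Workbox runtime-caching rules (an illustrative shape for the `vite-plugin-pwa` `workbox.runtimeCaching` option — the URL patterns and timeout are assumptions, not the project's actual config):

```typescript
// Hedged sketch of Workbox runtime-caching rules (patterns/timeout assumed).
const runtimeCaching = [
  {
    // Cache-first: static assets are served from cache once fetched
    urlPattern: /\.(?:js|css|png|svg|woff2)$/,
    handler: "CacheFirst",
  },
  {
    // Network-first: API responses prefer the network, fall back to cache
    urlPattern: /^\/api\//,
    handler: "NetworkFirst",
    options: { networkTimeoutSeconds: 5 },
  },
];
```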
Styling
Styling uses Tailwind CSS v4 with a custom design system:
- CSS custom properties for theming (dark/light mode)
- Utility classes for layout, spacing, typography
- `cn()` helper — Merges Tailwind classes with conflict resolution via `tailwind-merge` + `clsx`
- No external CSS frameworks — Everything is built from Tailwind utilities
The color palette follows a navy/slate theme matching PiSovereign’s brand identity.
Security
The embedded frontend includes several security measures:
- Content Security Policy (CSP) — Restricts script sources, style sources, and connections
- No inline scripts — All JavaScript is loaded from hashed asset files
- Same-origin API calls — No cross-origin requests by design
- No external dependencies at runtime — Fonts, icons, and all assets are self-hosted
- Auth token handling — Tokens stored in memory (SolidJS signals), not `localStorage`
See the Security Hardening guide for production deployment recommendations.
AI Memory System
PiSovereign includes a persistent AI memory system that enables your assistant to remember facts, preferences, and past interactions. This creates a more personalized and contextually aware experience.
Overview
The memory system provides:
- Persistent Storage: All interactions can be stored in PostgreSQL with encryption at rest
- Semantic Search (RAG): Retrieve relevant memories based on meaning, not just keywords
- Automatic Learning: The AI learns from conversations automatically
- Memory Decay: Less important or rarely accessed memories fade over time
- Deduplication: Similar memories are merged to prevent redundancy
- Content Encryption: Sensitive data is encrypted at rest using XChaCha20-Poly1305
How It Works
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ User Query │────▶│ RAG Retrieval │────▶│ Context + Query│
│ "What's my │ │ (Top 5 similar) │ │ sent to LLM │
│ favorite..." │ └──────────────────┘ └─────────────────┘
└─────────────────┘ │ │
│ ▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Stored Memory │◀────│ Learning Phase │◀────│ AI Response │
│ (Encrypted) │ │ (Q&A + Metadata) │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
1. RAG Context Retrieval
When you ask a question:
1. The query is converted to an embedding vector using `nomic-embed-text`
2. Similar memories are found using cosine similarity search
3. The top N most relevant memories are sorted by type priority (corrections and facts first) and injected into the prompt with an instructive preamble that explicitly tells the LLM to treat them as known facts
4. Full memory content is used (not truncated summaries), with a 2,000-character budget to stay within the token window
5. The AI generates a response with full context
2. Automatic Learning
After each response (including streamed responses):
- The Q&A pair is evaluated for importance using lightweight heuristics (no LLM call):
- AI naming cues (“nenn dich”, “your name is”, “du heißt”) → +0.40
- Identity cues (“my name is”, “I live in”, “ich heiße”) → +0.35
- Correction cues (“that’s wrong”, “please remember”, “eigentlich”) → +0.30
- Preference cues (“I prefer”, “I like”, “ich mag”) → +0.25
- Word count adjustments (longer = more valuable)
- Final score clamped to [0.2, 0.9]
- The memory type is automatically classified (priority order):
  - AI naming signals → `Fact`
  - Correction signals → `Correction`
  - Preference signals → `Preference`
  - Identity/fact signals → `Fact`
  - Default → `Context`
- Embeddings are generated for semantic search
- If a similar memory exists (>85% similarity), they’re merged (on plaintext, before encryption)
- Content is encrypted before storage
Note: Both the HTTP chat endpoint (`ChatService`) and the messenger path (`MemoryEnhancedChat`) use the same shared heuristic module (`importance.rs`) for consistent importance estimation and type classification.
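A minimal sketch of the importance heuristic, assuming the cue bonuses listed above are additive before clamping (the cue lists, word-count bonus, and function name are illustrative simplifications of `importance.rs`, not its actual contents):

```typescript
// Hedged sketch of the importance heuristic (cues and bonus values assumed).
function estimateImportance(question: string): number {
  const q = question.toLowerCase();
  let score = 0.0;
  if (/nenn dich|your name is|du heißt/.test(q)) score += 0.4;          // AI naming cue
  if (/my name is|i live in|ich heiße/.test(q)) score += 0.35;          // identity cue
  if (/that's wrong|please remember|eigentlich/.test(q)) score += 0.3;  // correction cue
  if (/i prefer|ich mag/.test(q)) score += 0.25;                        // preference cue
  // Word-count adjustment: longer exchanges tend to carry more information.
  const words = q.split(/\s+/).filter(Boolean).length;
  if (words > 20) score += 0.05;
  // Final score is clamped to [0.2, 0.9].
  return Math.min(0.9, Math.max(0.2, score));
}
```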
3. Memory Types
| Type | Purpose | Example |
|---|---|---|
| Fact | General knowledge | “Paris is the capital of France” |
| Preference | User preferences | “User prefers dark mode” |
| Correction | Feedback/corrections | “Actually, the meeting is Tuesday not Monday” |
| ToolResult | API/tool outputs | “Weather API returned: 22°C, sunny” |
| Context | Conversation context | “Q: What time is it? A: 3:00 PM” |
4. Relevance Scoring
When memories are retrieved for RAG context, they are ranked using a combined relevance score that balances three factors:
relevance_score = similarity × 0.50 + importance × 0.20 + freshness × 0.30
Where:
- `similarity` (50%): Cosine similarity between query and memory embeddings (0.0–1.0)
- `importance` (20%): Current importance after decay (0.0–1.0), with per-type floors
- `freshness` (30%): Exponential decay based on time since last access: `e^(-0.01 × hours)`. Memories from seconds ago score ~1.0, from one day ago ~0.79, from one week ago ~0.19.
This ensures that memories from the current conversation session (stored moments ago) dominate when relevant, while long-term knowledge still contributes via the similarity and importance terms.
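The scoring formula above can be checked directly — a literal transcription of the weights and the freshness term:

```typescript
// Freshness decays exponentially with hours since last access: e^(-0.01 × hours).
function freshness(hoursSinceAccess: number): number {
  return Math.exp(-0.01 * hoursSinceAccess);
}

// relevance = similarity × 0.50 + importance × 0.20 + freshness × 0.30
function relevanceScore(similarity: number, importance: number, hoursSinceAccess: number): number {
  return similarity * 0.5 + importance * 0.2 + freshness(hoursSinceAccess) * 0.3;
}
```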
After scoring, memories are sorted by type priority before injection:
- Corrections (highest priority — user explicitly corrected the AI)
- Facts (identity, names, important knowledge)
- Preferences
- Context
- Tool Results
Configuration
Add to your config.toml:
[memory]
# Enable memory storage
enabled = true
# Enable RAG context retrieval
enable_rag = true
# Enable automatic learning from interactions
enable_learning = true
# Number of memories to retrieve for RAG context
rag_limit = 5
# Minimum similarity threshold for RAG retrieval (0.0-1.0)
rag_threshold = 0.5
# Similarity threshold for memory deduplication (0.0-1.0)
merge_threshold = 0.85
# Minimum importance score to keep memories
min_importance = 0.1
# Decay factor for memory importance over time
decay_factor = 0.95
# Enable content encryption
enable_encryption = true
# Path to encryption key file (generated if not exists)
encryption_key_path = "memory_encryption.key"
[memory.embedding]
# Embedding model name
model = "nomic-embed-text"
# Embedding dimension
dimension = 384
# Request timeout in milliseconds
timeout_ms = 30000
Memory Decay
Memory importance decays over time using an Ebbinghaus-inspired model with per-type modifiers that ensure critical memories resist forgetting:
stability = 1.0 + ln(1 + access_count)
type_modifier = memory_type.decay_modifier()
effective_rate = (base_decay_rate × type_modifier) / stability
reinforcement = min(access_count × 0.005, 0.08)
new_importance = max(
importance × e^(-effective_rate × days) + reinforcement,
memory_type.importance_floor()
)
Type-specific modifiers
| Memory Type | Decay Modifier | Importance Floor | Effect |
|---|---|---|---|
| Correction | 0.50 | 0.35 | Decays very slowly, never drops below 0.35 |
| Fact | 0.70 | 0.30 | Decays slowly, never drops below 0.30 |
| Preference | 0.80 | 0.25 | Moderate decay |
| Tool Result | 1.00 | 0.10 | Normal decay, ephemeral |
| Context | 1.00 | 0.10 | Normal decay, ephemeral |
This mirrors the human brain: corrections and facts are “episodic memories” that the brain retains much longer than transient working-memory items.
Other factors:
- `base_decay_rate`: Derived from `decay_factor` (default: 0.95)
- `stability`: Grows logarithmically with each access — first access gives stability ≈ 1.0, ~3 accesses double it, with diminishing returns
- `reinforcement`: A small bonus (up to 0.08) that prevents heavily-used memories from vanishing entirely
- `days_since_access`: Time elapsed since the memory was last retrieved
Memories below min_importance are automatically cleaned up.
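Putting the decay formulas and the type-modifier table together, a sketch of one decay step (mirroring the pseudocode above; this TypeScript transcription is illustrative, not the actual Rust implementation):

```typescript
// Hedged sketch of one decay step, combining the formulas and tables above.
type MemoryKind = "correction" | "fact" | "preference" | "toolResult" | "context";

const DECAY_MODIFIER: Record<MemoryKind, number> = {
  correction: 0.5, fact: 0.7, preference: 0.8, toolResult: 1.0, context: 1.0,
};
const IMPORTANCE_FLOOR: Record<MemoryKind, number> = {
  correction: 0.35, fact: 0.3, preference: 0.25, toolResult: 0.1, context: 0.1,
};

function decayedImportance(
  importance: number,
  kind: MemoryKind,
  accessCount: number,
  daysSinceAccess: number,
  baseDecayRate: number,
): number {
  const stability = 1.0 + Math.log(1 + accessCount);        // more accesses → slower decay
  const effectiveRate = (baseDecayRate * DECAY_MODIFIER[kind]) / stability;
  const reinforcement = Math.min(accessCount * 0.005, 0.08); // capped usage bonus
  const decayed = importance * Math.exp(-effectiveRate * daysSinceAccess) + reinforcement;
  return Math.max(decayed, IMPORTANCE_FLOOR[kind]);          // per-type floor
}
```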
Security
Content Encryption
All memory content and summaries are encrypted using:
- Algorithm: XChaCha20-Poly1305 (AEAD)
- Key Size: 256 bits
- Nonce Size: 192 bits (unique per encryption)
The encryption key is stored at encryption_key_path and auto-generated if missing.
⚠️ Important: Backup your encryption key! Without it, encrypted memories cannot be recovered.
Embedding Vectors
Embedding vectors are stored unencrypted to enable similarity search. They reveal:
- Semantic similarity between memories
- General topic clustering
They do NOT reveal:
- Actual content
- Specific details
Embedding Models
The system supports various Ollama embedding models:
| Model | Dimensions | Use Case |
|---|---|---|
| `nomic-embed-text` | 384 | Default, balanced |
| `mxbai-embed-large` | 1024 | Higher accuracy |
| `bge-m3` | 1024 | Multilingual |
To use a different model:
[memory.embedding]
model = "mxbai-embed-large"
dimension = 1024
Database Schema
Memories are stored in PostgreSQL with pgvector for similarity search:
-- Main memories table
CREATE TABLE memories (
id UUID PRIMARY KEY,
user_id UUID NOT NULL,
conversation_id UUID,
content TEXT NOT NULL, -- Encrypted
summary TEXT NOT NULL, -- Encrypted
importance DOUBLE PRECISION NOT NULL,
memory_type TEXT NOT NULL,
tags JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
accessed_at TIMESTAMPTZ NOT NULL,
access_count INTEGER DEFAULT 0,
embedding vector(384) -- pgvector column for similarity search
);
-- IVFFlat index for fast cosine similarity search
CREATE INDEX idx_memories_embedding ON memories
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- Full-text search index
CREATE INDEX idx_memories_fts ON memories
USING gin (to_tsvector('english', content || ' ' || summary));
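For illustration, a similarity search against this schema can use pgvector's `<=>` cosine-distance operator, which the IVFFlat index above accelerates (a hypothetical query, not the adapter's actual SQL):

```typescript
// Hypothetical parameterized query the PostgreSQL adapter might run.
// $1 = query embedding, $2 = user id, $3 = limit.
// `<=>` is pgvector's cosine-distance operator; 1 - distance = similarity.
const SIMILAR_MEMORIES_SQL = `
  SELECT id, 1 - (embedding <=> $1) AS similarity
  FROM memories
  WHERE user_id = $2
    AND embedding IS NOT NULL
  ORDER BY embedding <=> $1
  LIMIT $3
`;
```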
Manual Memory Management
You can manually store specific information:
// Store a fact
memory_service.store_fact(user_id, "User's birthday is March 15", 0.9).await?;
// Store a preference
memory_service.store_preference(user_id, "Prefers metric units", 0.8).await?;
// Store a correction
memory_service.store_correction(user_id, "Actually prefers tea, not coffee", 1.0).await?;
Maintenance
Applying Decay
Memory decay runs as an automatic background task (daily by default). The interval is controlled by the decay_factor configuration. You can also trigger it manually:
let decayed = memory_service.apply_decay().await?;
println!("Decayed {} memories", decayed.len());
Or via the REST API:
curl -X POST -H "Authorization: Bearer $API_KEY" \
http://localhost:3000/v1/memories/decay
Cleaning Up Low-Importance Memories
let deleted = memory_service.cleanup_low_importance().await?;
println!("Deleted {} memories", deleted);
Statistics
let stats = memory_service.stats(&user_id).await?;
println!("Total: {}, With embeddings: {}, Avg importance: {:.2}",
stats.total_count, stats.with_embeddings, stats.avg_importance);
Troubleshooting
Memories Not Being Retrieved
- Check that `enable_rag = true`
- Verify `rag_threshold` isn’t too high (try 0.3)
- Ensure embeddings are generated (check `with_embeddings` in stats)
- Confirm Ollama is running with the embedding model
High Memory Usage
- Lower `rag_limit` to reduce context size
- Run `cleanup_low_importance()` more frequently
- Increase the `min_importance` threshold
- Reduce `decay_factor` for faster decay
Encryption Key Lost
If you lose the encryption key, encrypted memories cannot be recovered.
To start fresh:
1. Delete `memory_encryption.key`
2. Clear the `memories` and `memory_embeddings` tables
3. A new key will be generated on next startup
Architecture
The memory system follows the ports-and-adapters pattern:
- `MemoryContextPort` — the primary port interface used by `ChatService` to inject RAG context into prompts. Implementations receive a query string and return relevant memory snippets.
- `MemoryService` — the core service that orchestrates embedding generation, semantic search, encryption, and storage. Requires three ports:
  - `MemoryPort` — persistence (PostgreSQL adapter)
  - `EmbeddingPort` — vector generation (Ollama adapter using `nomic-embed-text`)
  - `EncryptionPort` — content encryption (`ChaChaEncryptionAdapter` or `NoOpEncryption`)
// The MemoryContextPort trait signature
#[async_trait]
pub trait MemoryContextPort: Send + Sync {
async fn retrieve_context(
&self,
user_id: &UserId,
query: &str,
limit: usize,
) -> Result<Vec<MemoryContext>, MemoryError>;
}
API Endpoints
See the API Reference for full REST API documentation covering:
- `GET /v1/memories` — list memories
- `POST /v1/memories` — create a memory
- `GET /v1/memories/search?q=...` — semantic search
- `GET /v1/memories/stats` — storage statistics
- `POST /v1/memories/decay` — trigger decay
- `GET /v1/memories/{id}` — get specific memory
- `DELETE /v1/memories/{id}` — delete memory
LLM Tool Calling (ReAct Agent)
PiSovereign includes a ReAct (Reason + Act) agent that enables the LLM to autonomously invoke tools — weather lookups, calendar queries, web searches, and more — instead of relying solely on rigid command parsing.
How It Works
When a user sends a general question (`AgentCommand::Ask`), the system follows this flow:
1. Collect tools — The `ToolRegistry` asks each wired port which tool definitions are available (e.g., if no weather port is configured, `get_weather` is omitted).
2. LLM + tools — The conversation history and tool JSON schemas are sent to Ollama’s `/api/chat` endpoint with the `tools` parameter.
3. Parse response — The LLM either returns a final text response or requests one or more tool calls.
4. Execute tools — If tool calls are returned, the `ToolExecutor` dispatches each call to the appropriate port, collects results, and appends them as `MessageRole::Tool` messages to the conversation.
5. Loop — Steps 2–4 repeat until the LLM produces a final response or a configurable iteration limit / timeout is reached.
User → LLM (with tool schemas)
├─ Final text → done
└─ Tool calls → execute → append results → loop back to LLM
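The loop above can be sketched as follows — `llm` and `executeTool` stand in for the Ollama call and the `ToolExecutor` dispatch; this is a conceptual sketch, not the `ReActAgentService` implementation:

```typescript
// Conceptual sketch of the ReAct loop (types and names are illustrative).
type ToolCall = { name: string; args: Record<string, unknown> };
type LlmTurn = { text?: string; toolCalls?: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

async function reactLoop(
  messages: Message[],
  llm: (msgs: Message[]) => Promise<LlmTurn>,
  executeTool: (call: ToolCall) => Promise<string>,
  maxIterations = 5,
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    const turn = await llm(messages);
    // Final text answer → done.
    if (!turn.toolCalls?.length) return turn.text ?? "";
    // Otherwise execute each requested tool and feed results back as Tool messages.
    for (const call of turn.toolCalls) {
      const result = await executeTool(call);
      messages.push({ role: "tool", content: result });
    }
  }
  return "(iteration limit reached)";
}
```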
Architecture
The implementation follows Clean Architecture:
| Layer | Component | Crate |
|---|---|---|
| Domain | ToolDefinition | domain |
| Domain | ToolCall, ToolResult, ToolCallingResult | domain |
| Domain | MessageRole::Tool, ChatMessage::tool() | domain |
| Application | ToolRegistryPort | application |
| Application | ToolExecutorPort | application |
| Application | InferencePort::generate_with_tools() | application |
| Application | ReActAgentService | application |
| Infrastructure | ToolRegistry | infrastructure |
| Infrastructure | ToolExecutor | infrastructure |
| Infrastructure | OllamaInferenceAdapter (extended) | infrastructure |
| Presentation | Wired in main.rs, used in chat handlers | presentation_http |
Available Tools
The following 18 tools are registered when their corresponding ports are wired:
| Tool | Port Required | Description |
|---|---|---|
| `get_weather` | `WeatherPort` | Current weather and forecast |
| `search_web` | `WebSearchPort` | Web search via Brave / DuckDuckGo |
| `list_calendar_events` | `CalendarPort` | List upcoming calendar events |
| `create_calendar_event` | `CalendarPort` | Create a new calendar event |
| `search_contacts` | `ContactPort` | Search contacts by name/email |
| `get_contact` | `ContactPort` | Get full contact details by ID |
| `list_tasks` | `TaskPort` | List tasks/todos with filters |
| `create_task` | `TaskPort` | Create a new task |
| `complete_task` | `TaskPort` | Mark a task as completed |
| `create_reminder` | `ReminderPort` | Schedule a reminder |
| `list_reminders` | `ReminderPort` | List active reminders |
| `search_transit` | `TransitPort` | Search public transit connections |
| `store_memory` | `MemoryStore` | Store a fact in long-term memory |
| `recall_memory` | `MemoryStore` | Recall facts from memory |
| `execute_code` | `CodeExecutionPort` | Run code in a sandboxed container |
| `search_emails` | `EmailPort` | Search emails by query |
| `draft_email` | `EmailPort` + `DraftStorePort` | Draft an email |
| `send_email` | `EmailPort` | Send an email |
Configuration
Add to config.toml:
[agent.tool_calling]
# Enable/disable the ReAct agent (default: true)
enabled = true
# Maximum ReAct loop iterations before forcing a final answer
max_iterations = 5
# Timeout per individual tool execution (seconds)
iteration_timeout_secs = 30
# Total timeout for the entire ReAct loop (seconds)
total_timeout_secs = 120
# Run tool calls in parallel when multiple are requested
parallel_tool_execution = true
# Tools that require user approval before execution (future use)
require_approval_for = []
When `enabled = false`, the system falls back to the standard `ChatService::chat_with_context` flow without any tool calling.
Relationship to AgentService
The ReAct agent runs alongside the existing AgentService:
- `AgentService` handles all structured commands (`AgentCommand` variants like `GetWeather`, `SearchWeb`, `CreateTask`, etc.) via pattern matching and dedicated handler methods.
- `ReActAgentService` handles general questions (`AgentCommand::Ask`) by letting the LLM decide which tools to call.
The command parsing flow remains unchanged — `AgentService::parse_command()` still classifies user input. Only `Ask` commands are routed through the ReAct agent when it’s enabled.
Extending with New Tools
To add a new tool:
1. Define the port in `crates/application/src/ports/` (if not already existing).
2. Add a tool definition in `ToolRegistry` — create a `def_your_tool()` method returning a `ToolDefinition` with parameter schemas.
3. Add execution logic in `ToolExecutor` — create an `exec_your_tool()` method that extracts arguments, calls the port, and formats the result.
4. Wire the port into the `ToolRegistry::collect_tools()` and `ToolExecutor::execute()` dispatch.
5. Connect in `main.rs` — pass the port `Arc` to both `ToolRegistry` and `ToolExecutor` via `with_your_port()` builder methods.
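For reference, a tool definition in the JSON-schema shape that Ollama's `tools` parameter accepts might look like this (a hypothetical `get_weather` schema shown as JSON-like TypeScript; the real structure is built by `ToolRegistry` in Rust):

```typescript
// Hypothetical tool definition in the OpenAI-style format Ollama's /api/chat
// `tools` parameter expects. Parameter names and descriptions are assumptions.
const getWeatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Current weather and forecast for a location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "City name, e.g. Berlin" },
        days: { type: "integer", description: "Number of forecast days" },
      },
      required: ["location"],
    },
  },
};
```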
Decorator Forwarding
All inference port decorators forward `generate_with_tools()` to their inner adapter:
- `SanitizedInferencePort` — forwards directly (no sanitization for tool iterations)
- `CachedInferenceAdapter` — forwards without caching (tool iterations are non-deterministic)
- `SemanticCachedInferenceAdapter` — forwards without semantic caching
- `DegradedInferenceAdapter` — forwards with circuit-breaker tracking
- `ModelRoutingAdapter` — routes to the most capable (fallback) model
Relationship to Agentic Mode
The ReAct agent handles single-turn tool calling — one user query, one LLM loop deciding which tools to invoke. Agentic Mode extends this to multi-agent orchestration:
| Aspect | ReAct Agent | Agentic Mode |
|---|---|---|
| Scope | Single query | Complex multi-step task |
| Agents | 1 LLM loop | Multiple parallel sub-agents |
| Endpoint | POST /v1/chat | POST /v1/agentic/tasks |
| Progress | Synchronous or SSE chat stream | SSE task progress stream |
| Config | [agent.tool_calling] | [agentic] |
Each agentic sub-agent internally uses the same ReAct tool-calling loop. The
orchestrator (AgenticOrchestrator) decomposes the user’s request, spawns
sub-agents, and aggregates their results.
See API Reference — Agentic Tasks for endpoint documentation.
Contributing
🤝 Guidelines for contributing to PiSovereign
Thank you for your interest in contributing to PiSovereign! This guide will help you get started.
Table of Contents
Code of Conduct
This project adheres to a Code of Conduct. By participating, you are expected to:
- Be respectful and inclusive
- Accept constructive criticism gracefully
- Focus on what’s best for the community
- Show empathy towards others
Development Setup
Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| Rust | 1.93.0+ | Edition 2024 |
| Just | Latest | Command runner |
| SQLite | 3.x | Development database |
| FFmpeg | 5.x+ | Audio processing |
Environment Setup
- Clone the repository
git clone https://github.com/twohreichel/PiSovereign.git
cd PiSovereign
- Install Rust toolchain
# Install rustup if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install required components
rustup component add rustfmt clippy
# Install nightly for docs (optional)
rustup toolchain install nightly
- Install Just
# macOS
brew install just
# Linux
cargo install just
- Install development dependencies
# macOS
brew install sqlite ffmpeg
# Ubuntu/Debian
sudo apt install libsqlite3-dev ffmpeg pkg-config libssl-dev
- Verify setup
# Run quality checks
just quality
# Build the project
just build
Running Tests
# Run all tests
just test
# Run tests with output
just test-verbose
# Run specific crate tests
cargo test -p domain
cargo test -p application
# Run integration tests
cargo test --test '*' -- --ignored
# Generate coverage report
just coverage
Code Style
Rust Formatting
We use rustfmt with custom configuration:
# Format all code
just fmt
# Check formatting (CI will fail if not formatted)
just fmt-check
Configuration in rustfmt.toml:
edition = "2024"
max_width = 100
use_small_heuristics = "Default"
imports_granularity = "Crate"
group_imports = "StdExternalCrate"
Clippy Lints
We enforce strict Clippy lints:
# Run clippy
just lint
# Auto-fix issues
just lint-fix
Key lint categories enabled:
- `clippy::pedantic` — Strict lints
- `clippy::nursery` — Experimental but useful lints
- `clippy::cargo` — Cargo.toml best practices
Commit Messages
We follow Conventional Commits:
<type>(<scope>): <description>
[optional body]
[optional footer(s)]
Types:
| Type | Description |
|---|---|
| `feat` | New feature |
| `fix` | Bug fix |
| `docs` | Documentation only |
| `style` | Code style (formatting, no logic change) |
| `refactor` | Code change that neither fixes nor adds |
| `perf` | Performance improvement |
| `test` | Adding or updating tests |
| `chore` | Maintenance tasks |
Examples:
feat(api): add streaming chat endpoint
Implements SSE-based streaming for /v1/chat/stream endpoint.
Supports token-by-token response streaming for better UX.
Closes #123
fix(inference): handle timeout gracefully
Previously, inference timeouts caused a panic. Now returns
a proper error response with retry information.
Documentation
All public APIs must be documented:
/// Processes a user message and returns an AI response.
///
/// This method handles the full conversation flow including:
/// - Loading conversation context
/// - Calling the inference engine
/// - Persisting the response
///
/// # Arguments
///
/// * `conversation_id` - Optional ID to continue existing conversation
/// * `message` - The user's message content
///
/// # Returns
///
/// Returns the AI's response or an error if processing fails.
///
/// # Errors
///
/// - `ServiceError::Inference` - If the inference engine is unavailable
/// - `ServiceError::Database` - If conversation persistence fails
///
/// # Examples
///
/// ```rust,ignore
/// let response = service.send_message(
/// Some(conversation_id),
/// "What's the weather?".to_string(),
/// ).await?;
/// ```
pub async fn send_message(
&self,
conversation_id: Option<ConversationId>,
message: String,
) -> Result<Message, ServiceError> {
// ...
}
Pull Request Process
Before You Start
1. Check existing issues/PRs
   - Look for related issues or PRs
   - Comment on the issue you want to work on
2. Create an issue first (for features)
   - Describe the feature
   - Discuss approach before implementing
3. Fork and branch: `git checkout -b feat/my-feature` (or `git checkout -b fix/issue-123`)
Creating a PR
1. Ensure quality checks pass: `just pre-commit`
2. Write/update tests
   - Add tests for new functionality
   - Ensure existing tests still pass
3. Update documentation
   - Update relevant docs in `docs/`
   - Add doc comments to new public APIs
4. Push and create PR: `git push origin feat/my-feature`
5. Fill out the PR template
   - Description of changes
   - Related issues
   - Testing performed
   - Breaking changes (if any)
PR Template
## Description
Brief description of what this PR does.
## Related Issues
Fixes #123
Related to #456
## Type of Change
- [ ] Bug fix (non-breaking)
- [ ] New feature (non-breaking)
- [ ] Breaking change
- [ ] Documentation update
## Testing
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manually tested on Raspberry Pi
## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No new warnings
Review Process
1. Automated checks must pass:
   - Format check (rustfmt)
   - Lint check (clippy)
   - Tests (all platforms)
   - Coverage (no significant decrease)
   - Security scan (cargo-deny)
2. Human review:
   - At least one maintainer approval required
   - Address all review comments
3. Merge:
   - Squash and merge for clean history
   - Delete branch after merge
Development Workflow
Common Tasks
# Full quality check (run before pushing)
just quality
# Quick pre-commit check
just pre-commit
# Run the server locally
just run
# Run CLI commands
just cli status
just cli chat "Hello"
# Generate and view documentation
just docs
# Clean build artifacts
just clean
Project Structure
PiSovereign/
├── crates/ # Rust crates
│ ├── domain/ # Core business logic
│ ├── application/ # Use cases, services
│ ├── infrastructure/ # External adapters
│ ├── ai_core/ # Inference engine
│ ├── ai_speech/ # Speech processing
│ ├── integration_*/ # Service integrations
│ └── presentation_*/ # HTTP API, CLI
├── docs/ # mdBook documentation
├── grafana/ # Monitoring configuration
├── migrations/ # Database migrations
└── .github/ # CI/CD workflows
Adding a New Feature
1. Domain layer (if new entities/values are needed)
   # Edit crates/domain/src/entities/mod.rs
   # Add new entity module
2. Application layer (service logic)
   # Add port trait in crates/application/src/ports/
   # Add service in crates/application/src/services/
3. Infrastructure layer (adapters)
   # Implement port in crates/infrastructure/src/adapters/
4. Presentation layer (API endpoints)
   # Add handler in crates/presentation_http/src/handlers/
   # Add route in crates/presentation_http/src/router.rs
5. Tests
   # Unit tests alongside code
   # Integration tests in crates/*/tests/
Database Migrations
# Create new migration
cat > migrations/V007__my_migration.sql << 'EOF'
-- Description of migration
CREATE TABLE my_table (
id TEXT PRIMARY KEY,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
EOF
# Migrations run automatically on startup (if enabled)
# Or manually:
pisovereign-cli migrate
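Migration files follow the V&lt;version&gt;__&lt;name&gt;.sql convention and are applied in version order. The selection of pending migrations can be sketched as follows; this is an illustrative stand-in, not the project's actual migration runner:

```python
import re

# Matches versioned migration filenames such as V007__my_migration.sql
MIGRATION_RE = re.compile(r"^V(\d+)__.+\.sql$")

def pending_migrations(available: list[str], applied: set[str]) -> list[str]:
    """Return unapplied migration files sorted by version number."""
    versioned = []
    for name in available:
        m = MIGRATION_RE.match(name)
        if m and name not in applied:
            versioned.append((int(m.group(1)), name))
    return [name for _, name in sorted(versioned)]
```

Sorting on the parsed integer (not the raw filename) keeps V010 after V007 even if zero-padding is inconsistent.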
Getting Help
- Questions: Use GitHub Discussions
- Bugs: Open an Issue
- Security: Report via GitHub Security Advisories
Thank you for contributing! 🎉
Crate Reference
📦 Detailed documentation of all PiSovereign crates
This document provides comprehensive documentation for each crate in the PiSovereign workspace.
Table of Contents
- Overview
- Domain Layer
- Application Layer
- Infrastructure Layer
- AI Crates
- Integration Crates
- Presentation Crates
Overview
PiSovereign consists of 12 crates organized by architectural layer:
| Layer | Crates | Purpose |
|---|---|---|
| Domain | domain | Core business logic, entities, value objects |
| Application | application | Use cases, services, port definitions |
| Infrastructure | infrastructure | Database, cache, secrets, telemetry |
| AI | ai_core, ai_speech | Inference engine, speech processing |
| Integration | integration_* | External service adapters |
| Presentation | presentation_* | HTTP API, CLI |
Domain Layer
domain
Purpose: Contains the core business logic with zero external dependencies (except std). Defines the ubiquitous language of the application.
Dependencies: None (pure Rust)
Entities
| Entity | Description |
|---|---|
User | Represents a system user with profile information |
Conversation | A chat conversation containing messages |
Message | A single message in a conversation |
ApprovalRequest | Pending approval for sensitive operations |
AuditEntry | Audit log entry for compliance |
CalendarEvent | Calendar event representation |
EmailMessage | Email representation |
WeatherData | Weather information |
// Example: Conversation entity
pub struct Conversation {
pub id: ConversationId,
pub title: Option<String>,
pub system_prompt: Option<String>,
pub messages: Vec<Message>,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
}
Value Objects
| Value Object | Description |
|---|---|
UserId | Unique user identifier (UUID) |
ConversationId | Unique conversation identifier |
MessageContent | Validated message content |
TenantId | Multi-tenant identifier |
PhoneNumber | Validated phone number |
// Example: UserId value object
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct UserId(Uuid);
impl UserId {
pub fn new() -> Self {
Self(Uuid::new_v4())
}
pub fn from_uuid(uuid: Uuid) -> Self {
Self(uuid)
}
}
Commands
| Command | Description |
|---|---|
UserCommand | Commands from users (Briefing, Ask, Help, etc.) |
SystemCommand | Internal system commands |
// User command variants
pub enum UserCommand {
MorningBriefing,
CreateCalendarEvent { title: String, start: DateTime<Utc>, end: DateTime<Utc> },
SummarizeInbox { count: usize },
DraftEmail { to: String, subject: String },
SendEmail { draft_id: String },
Ask { query: String },
Echo { message: String },
Help,
}
Domain Errors
#[derive(Debug, thiserror::Error)]
pub enum DomainError {
#[error("Invalid message content: {0}")]
InvalidContent(String),
#[error("Conversation not found: {0}")]
ConversationNotFound(ConversationId),
#[error("User not authorized: {0}")]
Unauthorized(String),
}
Application Layer
application
Purpose: Orchestrates use cases by coordinating domain entities and infrastructure through port interfaces.
Dependencies: domain
Services
| Service | Description |
|---|---|
AgentService | Intent routing pipeline (conversational filter → quick patterns → workflow detection → LLM intent) |
ChatService | LLM chat with RAG context injection and automatic memory storage |
ConversationService | Manages conversations and messages |
VoiceMessageService | STT → LLM → TTS pipeline |
CommandService | Parses and executes user commands |
MemoryService | Memory storage, semantic search, encryption, decay, and deduplication |
ApprovalService | Handles approval workflows |
BriefingService | Generates morning briefings |
CalendarService | Calendar operations |
EmailService | Email operations |
HealthService | System health checks |
Command Parser Modules
| Module | Description |
|---|---|
conversational_filter | Zero-LLM-cost regex filter for greetings, introductions, and small talk |
llm | LLM-based intent parsing with confidence scoring and keyword post-validation |
workflow_parser | Multi-step workflow detection with hardened negative examples |
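The idea behind the zero-LLM-cost conversational filter is a fast pattern check before any model call. A minimal sketch (in Python for brevity; the patterns below are illustrative, not the module's actual rule set):

```python
import re

# Illustrative patterns only -- the real conversational_filter module
# ships its own rule set for greetings, introductions, and small talk.
SMALL_TALK = [
    re.compile(r"^(hi|hello|hey)\b", re.IGNORECASE),
    re.compile(r"^(thanks|thank you)\b", re.IGNORECASE),
    re.compile(r"^my name is\b", re.IGNORECASE),
]

def is_small_talk(message: str) -> bool:
    """Return True if the message can bypass LLM intent parsing."""
    text = message.strip()
    return any(p.match(text) for p in SMALL_TALK)
```

Messages that match are answered directly; everything else falls through to the LLM-based intent parser.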
// Example: ConversationService
pub struct ConversationService<R, I>
where
R: ConversationRepository,
I: InferencePort,
{
repository: Arc<R>,
inference: Arc<I>,
}
impl<R, I> ConversationService<R, I>
where
R: ConversationRepository,
I: InferencePort,
{
pub async fn send_message(
&self,
conversation_id: Option<ConversationId>,
content: String,
) -> Result<Message, ServiceError> {
// 1. Load or create conversation
// 2. Build prompt with context
// 3. Call inference engine
// 4. Save and return response
}
}
Ports (Trait Definitions)
| Port | Description |
|---|---|
InferencePort | LLM inference operations |
ConversationRepository | Conversation persistence |
MemoryPort | Memory persistence (store, search, decay) |
MemoryContextPort | RAG context injection into prompts |
EmbeddingPort | Embedding vector generation |
EncryptionPort | Content encryption/decryption |
SecretStore | Secret management |
CachePort | Caching abstraction |
CalendarPort | Calendar operations |
EmailPort | Email operations |
WeatherPort | Weather data |
SpeechPort | STT/TTS operations |
WhatsAppPort | WhatsApp messaging |
ApprovalRepository | Approval persistence |
AuditRepository | Audit logging |
// Example: InferencePort
#[async_trait]
pub trait InferencePort: Send + Sync {
async fn generate(
&self,
prompt: &str,
options: InferenceOptions,
) -> Result<InferenceResponse, InferenceError>;
async fn generate_stream(
&self,
prompt: &str,
options: InferenceOptions,
) -> Result<Pin<Box<dyn Stream<Item = Result<String, InferenceError>> + Send>>, InferenceError>;
async fn health_check(&self) -> Result<bool, InferenceError>;
fn default_model(&self) -> &str;
}
Infrastructure Layer
infrastructure
Purpose: Provides concrete implementations of application ports for external systems.
Dependencies: domain, application
Adapters
| Adapter | Implements | Description |
|---|---|---|
VaultSecretStore | SecretStore | HashiCorp Vault KV v2 |
EnvironmentSecretStore | SecretStore | Environment variables |
ChainedSecretStore | SecretStore | Multi-backend fallback |
Argon2PasswordHasher | PasswordHasher | Secure password hashing |
// Example: VaultSecretStore usage
let vault = VaultSecretStore::new(VaultConfig {
address: "http://127.0.0.1:8200".to_string(),
role_id: Some("...".to_string()),
secret_id: Some("...".to_string()),
mount_path: "secret".to_string(),
..Default::default()
})?;
let secret = vault.get_secret("pisovereign/whatsapp/access_token").await?;
Cache
| Component | Description |
|---|---|
MokaCache | L1 in-memory cache (fast, volatile) |
RedbCache | L2 persistent cache (survives restarts) |
TieredCache | Combines L1 + L2 with fallback |
// TieredCache usage
let cache = TieredCache::new(
MokaCache::new(10_000), // 10k entries max
RedbCache::new("/var/lib/pisovereign/cache.redb")?,
);
// Write-through to both layers
cache.set("key", "value", Duration::from_secs(3600)).await?;
// Read checks L1 first, then L2
let value = cache.get("key").await?;
Persistence
| Component | Description |
|---|---|
PgConversationRepository | Conversation storage |
PgApprovalRepository | Approval request storage |
PgAuditRepository | Audit log storage |
PgUserRepository | User profile storage |
Other Components
| Component | Description |
|---|---|
TelemetrySetup | OpenTelemetry initialization |
CronScheduler | Cron-based task scheduling |
TeraTemplates | Template rendering |
RetryExecutor | Exponential backoff retry |
SecurityValidator | Config validation |
ModelRoutingAdapter | Adaptive 4-tier model routing (replaces ai_core::ModelSelector) |
RuleBasedClassifier | Rule-based complexity classification |
TemplateResponder | Instant template responses for trivial queries |
ModelRoutingMetrics | Atomic counters for routing observability |
AI Crates
ai_core
Purpose: Inference engine abstraction and Hailo-Ollama client.
Dependencies: domain, application
Components
| Component | Description |
|---|---|
HailoClient | Hailo-Ollama HTTP client |
ModelSelector | Deprecated; superseded by infrastructure::ModelRoutingAdapter |
// HailoClient usage
let client = HailoClient::new(InferenceConfig {
base_url: "http://localhost:11434".to_string(),
default_model: "gemma3:12b".to_string(),
timeout_ms: 60000,
..Default::default()
})?;
let response = client.generate(
"What is the capital of France?",
InferenceOptions::default(),
).await?;
ai_speech
Purpose: Speech-to-Text and Text-to-Speech processing.
Dependencies: domain, application
Providers
| Provider | Description |
|---|---|
HybridSpeechProvider | Local first, cloud fallback |
LocalSttProvider | whisper.cpp integration |
LocalTtsProvider | Piper integration |
OpenAiSpeechProvider | OpenAI Whisper & TTS |
// HybridSpeechProvider usage
let speech = HybridSpeechProvider::new(SpeechConfig {
provider: SpeechProviderType::Hybrid,
prefer_local: true,
allow_cloud_fallback: true,
..Default::default()
})?;
// Transcribe audio
let text = speech.transcribe(&audio_data, "en").await?;
// Synthesize speech
let audio = speech.synthesize("Hello, world!", "en").await?;
Audio Conversion
| Component | Description |
|---|---|
AudioConverter | FFmpeg-based format conversion |
Supported formats: OGG/Opus, MP3, WAV, FLAC, M4A, WebM
Integration Crates
integration_whatsapp
Purpose: WhatsApp Business API integration.
Dependencies: domain, application
Components
| Component | Description |
|---|---|
WhatsAppClient | Meta Graph API client |
WebhookHandler | Incoming message handler |
SignatureValidator | HMAC-SHA256 verification |
// WhatsAppClient usage
let whatsapp = WhatsAppClient::new(WhatsAppConfig {
access_token: "...".to_string(),
phone_number_id: "...".to_string(),
api_version: "v18.0".to_string(),
})?;
// Send text message
whatsapp.send_text("+1234567890", "Hello!").await?;
// Send audio message
whatsapp.send_audio("+1234567890", &audio_data).await?;
integration_email
Purpose: Generic email integration via IMAP/SMTP, supporting any provider (Gmail, Outlook, Proton Mail, and custom servers).
Dependencies: domain, application
Components
| Component | Description |
|---|---|
ImapClient | Email reading via IMAP |
SmtpClient | Email sending via SMTP |
EmailProviderConfig | Provider-agnostic configuration |
AuthMethod | Password or OAuth2 (XOAUTH2) authentication |
ProviderPreset | Pre-configured settings for Proton, Gmail, Outlook |
ReconnectingClient | Connection resilience with auto-reconnect |
use integration_email::{EmailProviderConfig, AuthMethod, ProviderPreset};
// Proton Mail via Bridge
let proton = EmailProviderConfig::with_credentials("user@proton.me", "bridge-password")
.with_imap("127.0.0.1", 1143)
.with_smtp("127.0.0.1", 1025);
// Gmail with OAuth2
let gmail = EmailProviderConfig::with_oauth2("user@gmail.com", "ya29.access-token")
.with_preset(ProviderPreset::Gmail);
// Outlook with app password
let outlook = EmailProviderConfig::with_credentials("user@outlook.com", "app-password")
.with_preset(ProviderPreset::Outlook);
integration_caldav
Purpose: CalDAV calendar integration.
Dependencies: domain, application
Components
| Component | Description |
|---|---|
CalDavClient | CalDAV protocol client |
ICalParser | iCalendar parsing |
// CalDavClient usage
let calendar = CalDavClient::new(CalDavConfig {
server_url: "https://cal.example.com/dav.php".to_string(),
username: "user".to_string(),
password: "pass".to_string(),
calendar_path: "/calendars/user/default/".to_string(),
})?;
// Fetch events
let events = calendar.get_events(start_date, end_date).await?;
// Create event
calendar.create_event(CalendarEvent {
title: "Meeting".to_string(),
start: start_time,
end: end_time,
..Default::default()
}).await?;
integration_weather
Purpose: Open-Meteo weather API integration.
Dependencies: domain, application
Components
| Component | Description |
|---|---|
OpenMeteoClient | Weather API client |
// OpenMeteoClient usage
let weather = OpenMeteoClient::new(WeatherConfig {
base_url: "https://api.open-meteo.com/v1".to_string(),
forecast_days: 7,
cache_ttl_minutes: 30,
})?;
// Get current weather
let current = weather.get_current(52.52, 13.405).await?;
// Get forecast
let forecast = weather.get_forecast(52.52, 13.405).await?;
Presentation Crates
presentation_http
Purpose: HTTP REST API using Axum.
Dependencies: All crates (orchestration layer)
Handlers
| Handler | Endpoint | Description |
|---|---|---|
health | GET /health | Liveness probe |
ready | GET /ready | Readiness with inference status |
chat | POST /v1/chat | Send chat message |
chat_stream | POST /v1/chat/stream | Streaming chat (SSE) |
commands | POST /v1/commands | Execute command |
webhooks | POST /v1/webhooks/whatsapp | WhatsApp webhook |
metrics | GET /metrics/prometheus | Prometheus metrics |
Middleware
| Middleware | Description |
|---|---|
RateLimiter | Request rate limiting |
ApiKeyAuth | API key authentication |
RequestId | Request correlation ID |
Cors | CORS handling |
Binaries
- pisovereign-server - HTTP server binary
presentation_cli
Purpose: Command-line interface using Clap.
Dependencies: Core crates
Commands
| Command | Description |
|---|---|
status | Show system status |
chat | Send chat message |
command | Execute command |
backup | Database backup |
restore | Database restore |
migrate | Run migrations |
openapi | Export OpenAPI spec |
# Examples
pisovereign-cli status
pisovereign-cli chat "Hello"
pisovereign-cli command "briefing"
pisovereign-cli backup --output backup.db
pisovereign-cli openapi --output openapi.json
Binaries
- pisovereign-cli - CLI binary
Cargo Docs
For detailed API documentation, see the auto-generated Cargo docs:
- Latest: /api/latest/
- By Version: /api/vX.Y.Z/
Generate locally:
just docs
# Opens browser at target/doc/presentation_http/index.html
API Reference
📡 REST API documentation for PiSovereign
This document provides complete REST API documentation including authentication, endpoints, and the OpenAPI specification.
Table of Contents
Overview
Base URL
http://localhost:3000 # Development
https://your-domain.com # Production (behind Traefik)
Content Type
All requests and responses use JSON:
Content-Type: application/json
Accept: application/json
Request ID
Every response includes a correlation ID for debugging:
X-Request-Id: 550e8400-e29b-41d4-a716-446655440000
Include this when reporting issues.
Authentication
API Key Authentication
Protected endpoints require an API key in the Authorization header:
Authorization: Bearer sk-your-api-key
Configuration
API keys are mapped to user IDs in config.toml:
[security.api_key_users]
"sk-abc123def456" = "550e8400-e29b-41d4-a716-446655440000"
"sk-xyz789ghi012" = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
Example Request
curl -X POST http://localhost:3000/v1/chat \
-H "Authorization: Bearer sk-abc123def456" \
-H "Content-Type: application/json" \
-d '{"message": "Hello"}'
Authentication Errors
| Status | Code | Description |
|---|---|---|
| 401 | UNAUTHORIZED | Missing or invalid API key |
| 403 | FORBIDDEN | Valid key, but action not allowed |
{
"error": {
"code": "UNAUTHORIZED",
"message": "Invalid or missing API key",
"request_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
Rate Limiting
Rate limiting is applied per IP address.
| Configuration | Default |
|---|---|
rate_limit_rpm | 120 requests/minute |
Headers
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1707321600
Rate Limited Response
HTTP/1.1 429 Too Many Requests
Retry-After: 30
{
"error": {
"code": "RATE_LIMITED",
"message": "Too many requests. Please retry after 30 seconds.",
"retry_after": 30
}
}
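A well-behaved client should honor the Retry-After header before retrying. A stdlib-only sketch (the helper names are ours, not part of any PiSovereign SDK):

```python
import time
import urllib.request
import urllib.error

def retry_delay(headers, default: int = 1) -> int:
    """Seconds to wait before retrying, taken from Retry-After if present."""
    try:
        return int(headers.get("Retry-After", default))
    except (TypeError, ValueError):
        return default

def post_with_retry(url: str, body: bytes, headers: dict, max_attempts: int = 3):
    """POST, sleeping on HTTP 429 according to the Retry-After header."""
    for attempt in range(max_attempts):
        req = urllib.request.Request(url, data=body, headers=headers, method="POST")
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(err.headers))
```

After the final attempt the 429 is re-raised so the caller can surface the error.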
Endpoints
Health & Status
GET /health
Liveness probe. Returns 200 if the server is running.
Authentication: None required
Response: 200 OK
{
"status": "ok"
}
GET /ready
Readiness probe with inference engine status.
Authentication: None required
Response: 200 OK (healthy) or 503 Service Unavailable
{
"status": "ready",
"inference": {
"healthy": true,
"model": "qwen2.5-1.5b-instruct",
"latency_ms": 45
}
}
GET /ready/all
Extended health check with all service statuses.
Authentication: None required
Response: 200 OK
{
"status": "ready",
"services": {
"inference": { "healthy": true, "latency_ms": 45 },
"database": { "healthy": true, "latency_ms": 2 },
"cache": { "healthy": true },
"whatsapp": { "healthy": true, "latency_ms": 120 },
"email": { "healthy": true, "latency_ms": 89 },
"calendar": { "healthy": true, "latency_ms": 35 },
"weather": { "healthy": true, "latency_ms": 180 }
},
"latency_percentiles": {
"p50_ms": 45,
"p90_ms": 120,
"p99_ms": 250
}
}
Chat
POST /v1/chat
Send a message and receive a response.
Authentication: Required
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
message | string | Yes | User message |
conversation_id | string | No | Continue existing conversation |
system_prompt | string | No | Override system prompt |
model | string | No | Override default model |
temperature | float | No | Sampling temperature (0.0-2.0) |
max_tokens | integer | No | Maximum response tokens |
{
"message": "What's the weather in Berlin?",
"conversation_id": "conv-123",
"temperature": 0.7
}
Response: 200 OK
{
"id": "msg-456",
"conversation_id": "conv-123",
"role": "assistant",
"content": "Currently in Berlin, it's 15°C with partly cloudy skies...",
"model": "qwen2.5-1.5b-instruct",
"tokens": {
"prompt": 45,
"completion": 128,
"total": 173
},
"created_at": "2026-02-07T10:30:00Z"
}
POST /v1/chat/stream
Streaming chat using Server-Sent Events (SSE).
Authentication: Required
Request Body: Same as /v1/chat
Response: 200 OK (text/event-stream)
event: message
data: {"delta": "Currently"}
event: message
data: {"delta": " in Berlin"}
event: message
data: {"delta": ", it's 15°C"}
event: done
data: {"tokens": {"prompt": 45, "completion": 128, "total": 173}}
Example (JavaScript):
// EventSource only supports GET requests without custom headers,
// so POST streams are read with fetch and a streaming body reader:
const response = await fetch('/v1/chat/stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-...',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ message: 'Hello' })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  // Chunks carry raw SSE framing (event:/data: lines) to be parsed.
  process.stdout.write(decoder.decode(value));
}
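The same stream can be consumed from Python. This sketch parses the text/event-stream framing shown above using only the stdlib (a real client would read the HTTP response incrementally instead of from a string):

```python
import json

def parse_sse(stream_text: str):
    """Yield (event, data) pairs from a raw text/event-stream body."""
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates one event.
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []

raw = (
    "event: message\n"
    'data: {"delta": "Currently"}\n\n'
    "event: done\n"
    'data: {"tokens": {"total": 173}}\n\n'
)
chunks = [d["delta"] for e, d in parse_sse(raw) if e == "message"]
```

Concatenating the message deltas reconstructs the full response; the done event carries the token totals.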
Commands
POST /v1/commands
Execute a command and get the result.
Authentication: Required
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
command | string | Yes | Command to execute |
args | object | No | Command arguments |
{
"command": "briefing"
}
Response: 200 OK
{
"command": "MorningBriefing",
"status": "completed",
"result": {
"weather": "15°C, partly cloudy",
"calendar": [
{"time": "09:00", "title": "Team standup"},
{"time": "14:00", "title": "Client meeting"}
],
"emails": {
"unread": 5,
"important": 2
}
},
"executed_at": "2026-02-07T07:00:00Z"
}
Available Commands:
| Command | Description | Arguments |
|---|---|---|
briefing | Morning briefing | None |
weather | Current weather | location (optional) |
calendar | Today’s events | days (default: 1) |
emails | Email summary | count (default: 10) |
help | List commands | None |
POST /v1/commands/parse
Parse a command without executing it.
Authentication: Required
Request Body:
{
"input": "create meeting tomorrow at 3pm"
}
Response: 200 OK
{
"parsed": true,
"command": {
"type": "CreateCalendarEvent",
"title": "meeting",
"start": "2026-02-08T15:00:00Z",
"end": "2026-02-08T16:00:00Z"
},
"confidence": 0.92,
"requires_approval": true
}
System Command Catalog
The system command catalog provides a discoverable set of shell commands that can be executed on the host system. On first startup, PiSovereign automatically populates 32 default commands (disk usage, system info, network tools, etc.) stored in PostgreSQL.
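The approval gate described above (execute immediately, or divert to the approval workflow) can be sketched like this; all names here are hypothetical, not the actual service API:

```python
from dataclasses import dataclass

@dataclass
class CatalogCommand:
    id: str
    command: str
    requires_approval: bool

def dispatch(cmd: CatalogCommand, run, request_approval):
    """Execute a catalog command, or create an approval request instead."""
    if cmd.requires_approval:
        # Sensitive commands are queued for human approval, not run.
        return {"status": "pending_approval", "approval_id": request_approval(cmd)}
    return {"status": "executed", "output": run(cmd.command)}
```

Safe commands run directly; commands flagged requires_approval return a pending approval instead of output.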
GET /v1/commands/catalog
List all commands in the catalog.
Authentication: Required
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
limit | integer | No | Maximum results (default: 100) |
offset | integer | No | Pagination offset (default: 0) |
Response: 200 OK
[
{
"id": "default-disk-free",
"name": "Disk Free Space",
"description": "Show available disk space on all mounts",
"command": "df -h",
"category": "filesystem",
"risk_level": "safe",
"os": "linux",
"requires_approval": false,
"created_at": "2026-02-24T08:50:08Z",
"updated_at": "2026-02-24T08:50:08Z"
}
]
GET /v1/commands/catalog/search
Search the catalog by keyword.
Authentication: Required
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
q | string | Yes | Search query (matches name and description) |
Response: 200 OK — returns matching commands (same format as listing).
GET /v1/commands/catalog/count
Get the total number of catalog entries.
Authentication: Required
Response: 200 OK
{
"count": 32
}
GET /v1/commands/catalog/{id}
Get a specific catalog command by ID.
Authentication: Required
Response: 200 OK — returns a single command object.
POST /v1/commands/catalog
Create a custom catalog command.
Authentication: Required
Request Body:
{
"name": "Check Logs",
"description": "Tail the last 100 lines of syslog",
"command": "tail -n 100 /var/log/syslog",
"category": "system",
"risk_level": "safe",
"os": "linux",
"requires_approval": false
}
Response: 201 Created
POST /v1/commands/catalog/{id}/execute
Execute a command from the catalog. Commands with requires_approval: true will create an approval request instead of executing immediately.
Authentication: Required
Response: 200 OK
DELETE /v1/commands/catalog/{id}
Delete a catalog command.
Authentication: Required
Response: 204 No Content
Memory
The memory API manages the RAG (Retrieval-Augmented Generation) knowledge store. Memories are automatically used to enrich chat context.
GET /v1/memories
List all stored memories.
Authentication: Required
Response: 200 OK
[
{
"id": "uuid",
"content": "The user prefers dark mode",
"summary": "UI preference: dark mode",
"memory_type": "Preference",
"importance": 0.8,
"access_count": 5,
"tags": ["ui", "preference"],
"created_at": "2026-02-24T08:50:00Z",
"updated_at": "2026-02-24T09:00:00Z"
}
]
POST /v1/memories
Create a new memory entry.
Authentication: Required
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
content | string | Yes | Memory content text |
summary | string | Yes | Short summary |
memory_type | string | No | Type: fact, preference, tool_result, correction, context (default: context) |
importance | float | No | Importance score 0.0–1.0 (default: 0.5) |
tags | string[] | No | Optional tags |
Response: 201 Created
GET /v1/memories/search
Search memories by semantic similarity.
Authentication: Required
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
q | string | Yes | Search query |
Response: 200 OK — returns matching memories ranked by relevance.
GET /v1/memories/stats
Get memory storage statistics.
Authentication: Required
Response: 200 OK
{
"total": 42,
"by_type": [
{"memory_type": "Fact", "count": 15},
{"memory_type": "Preference", "count": 8},
{"memory_type": "Tool Result", "count": 10},
{"memory_type": "Correction", "count": 2},
{"memory_type": "Context", "count": 7}
]
}
POST /v1/memories/decay
Trigger a manual memory importance decay cycle. Reduces the importance of older, less-accessed memories.
Authentication: Required
Response: 200 OK
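The exact decay policy is internal to MemoryService; the sketch below shows one plausible shape (exponential decay by age, slowed by access frequency) with entirely assumed parameters:

```python
def decayed_importance(importance: float, age_days: float,
                       access_count: int, half_life_days: float = 30.0) -> float:
    """Decay importance exponentially with age; frequent access slows decay.

    Formula and constants are illustrative only -- PiSovereign's actual
    decay cycle may weight age and access differently.
    """
    # Each recorded access stretches the effective half-life slightly.
    effective_half_life = half_life_days * (1.0 + 0.1 * access_count)
    factor = 0.5 ** (age_days / effective_half_life)
    return importance * factor
```

Under these assumptions an untouched memory halves in importance every 30 days, while frequently accessed memories fade more slowly.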
GET /v1/memories/{id}
Get a specific memory by ID.
Authentication: Required
Response: 200 OK
DELETE /v1/memories/{id}
Delete a specific memory.
Authentication: Required
Response: 204 No Content
Agentic Tasks
Multi-agent task orchestration. Decompose complex requests into parallel sub-tasks executed by independent AI agents.
Note: Requires enabled = true in the [agentic] section of config.toml.
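Fanning sub-tasks out to parallel sub-agents and gathering their results can be illustrated with asyncio; the functions here are stand-ins, not the orchestrator's real API:

```python
import asyncio

async def run_sub_agent(description: str) -> str:
    """Stand-in for one sub-agent; the real system calls the LLM here."""
    await asyncio.sleep(0)  # simulate I/O-bound agent work
    return f"done: {description}"

async def run_task(sub_tasks: list[str]) -> list[str]:
    """Run all sub-agents concurrently and collect results in order."""
    return await asyncio.gather(*(run_sub_agent(t) for t in sub_tasks))

results = asyncio.run(run_task(["weather", "transit", "calendar"]))
```

asyncio.gather preserves input order, so each result lines up with its sub-task even though execution is concurrent.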
POST /v1/agentic/tasks
Create a new agentic task for multi-agent processing.
Authentication: Required
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
description | string | Yes | Task description in natural language |
require_approval | boolean | No | Require approval before sub-agent execution (default: false) |
{
"description": "Plan my trip to Berlin next week — check weather, find transit options, and create calendar events",
"require_approval": false
}
Response: 201 Created
{
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "planning",
"created_at": "2026-03-03T10:00:00Z"
}
GET /v1/agentic/tasks/{task_id}
Get the current status and results of an agentic task.
Authentication: Required
Response: 200 OK
{
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"description": "Plan my trip to Berlin",
"plan_summary": "3 sub-tasks: weather, transit, calendar",
"sub_agents": [
{ "id": "sa-1", "description": "Check Berlin weather", "status": "completed" },
{ "id": "sa-2", "description": "Search transit", "status": "completed" },
{ "id": "sa-3", "description": "Create events", "status": "completed" }
],
"result": "Your Berlin trip is planned: ...",
"created_at": "2026-03-03T10:00:00Z"
}
GET /v1/agentic/tasks/{task_id}/stream
Stream real-time progress updates via Server-Sent Events (SSE).
Authentication: Required
Response: 200 OK (text/event-stream)
event: task_started
data: {"task_id": "550e8400-...", "description": "Plan my trip to Berlin"}
event: plan_created
data: {"task_id": "550e8400-...", "sub_tasks": [...]}
event: sub_agent_started
data: {"sub_agent_id": "sa-1", "description": "Check Berlin weather"}
event: sub_agent_completed
data: {"sub_agent_id": "sa-1", "result": "15°C, partly cloudy"}
event: task_completed
data: {"task_id": "550e8400-...", "result": "Your Berlin trip is planned: ..."}
POST /v1/agentic/tasks/{task_id}/cancel
Cancel a running agentic task and all its sub-agents.
Authentication: Required
Response: 200 OK
{
"task_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "cancelled"
}
System
GET /v1/system/status
Get system status and resource usage.
Authentication: Required
Response: 200 OK
{
"version": "0.1.0",
"uptime_seconds": 86400,
"environment": "production",
"resources": {
"memory_used_mb": 256,
"cpu_percent": 15.5,
"database_size_mb": 42
},
"statistics": {
"requests_total": 15420,
"inference_requests": 8930,
"cache_hit_rate": 0.73
}
}
GET /v1/system/models
List available inference models.
Authentication: Required
Response: 200 OK
{
"models": [
{
"id": "qwen2.5-1.5b-instruct",
"name": "Qwen 2.5 1.5B Instruct",
"parameters": "1.5B",
"context_length": 4096,
"default": true
},
{
"id": "llama3.2-1b-instruct",
"name": "Llama 3.2 1B Instruct",
"parameters": "1B",
"context_length": 4096,
"default": false
}
]
}
Webhooks
POST /v1/webhooks/whatsapp
WhatsApp webhook endpoint for incoming messages.
Authentication: Signature verification via X-Hub-Signature-256 header
Verification Request (GET):
GET /v1/webhooks/whatsapp?hub.mode=subscribe&hub.verify_token=your-token&hub.challenge=challenge123
Response: The hub.challenge value
Message Webhook (POST):
{
"object": "whatsapp_business_account",
"entry": [{
"changes": [{
"value": {
"messages": [{
"from": "+1234567890",
"type": "text",
"text": { "body": "Hello" }
}]
}
}]
}]
}
Response: 200 OK
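Meta signs each webhook delivery with an HMAC-SHA256 of the raw request body, sent as X-Hub-Signature-256: sha256=&lt;hex&gt;. Verification (the function name is ours; PiSovereign's SignatureValidator performs the equivalent check):

```python
import hashlib
import hmac

def verify_signature(app_secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Check X-Hub-Signature-256 ('sha256=<hex>') against the raw body."""
    expected = "sha256=" + hmac.new(
        app_secret.encode(), raw_body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(expected, signature_header)
```

Verify against the raw bytes before any JSON parsing; re-serialized JSON will not reproduce the signed payload.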
Metrics
GET /metrics
JSON metrics for monitoring.
Authentication: None required
Response: 200 OK
{
"uptime_seconds": 86400,
"http": {
"requests_total": 15420,
"requests_success": 15100,
"requests_client_error": 280,
"requests_server_error": 40,
"active_requests": 3,
"response_time_avg_ms": 125
},
"inference": {
"requests_total": 8930,
"requests_success": 8850,
"requests_failed": 80,
"time_avg_ms": 450,
"tokens_total": 1250000,
"healthy": true
}
}
GET /metrics/prometheus
Prometheus-compatible metrics.
Authentication: None required
Response: 200 OK (text/plain)
# HELP app_uptime_seconds Application uptime in seconds
# TYPE app_uptime_seconds counter
app_uptime_seconds 86400
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{status="success"} 15100
http_requests_total{status="client_error"} 280
http_requests_total{status="server_error"} 40
# HELP inference_time_ms_bucket Inference time histogram
# TYPE inference_time_ms_bucket histogram
inference_time_ms_bucket{le="100"} 1200
inference_time_ms_bucket{le="250"} 4500
inference_time_ms_bucket{le="500"} 7200
inference_time_ms_bucket{le="1000"} 8500
inference_time_ms_bucket{le="+Inf"} 8930
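Latency quantiles can be estimated from these cumulative buckets by linear interpolation within the bucket that contains the target rank, the same idea as Prometheus's histogram_quantile(). A sketch using the bucket values above:

```python
def histogram_quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets,
    interpolating linearly inside the containing bucket."""
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # no upper edge to interpolate toward
            return lower_bound + (upper_bound - lower_bound) * (
                (rank - lower_count) / (count - lower_count)
            )
        lower_bound, lower_count = upper_bound, count
    return lower_bound

# Buckets from the sample output above
buckets = [(100, 1200), (250, 4500), (500, 7200), (1000, 8500), (float("inf"), 8930)]
p50 = histogram_quantile(0.5, buckets)
```

With these counts the median inference time falls near the top of the 100-250 ms bucket. The estimate's accuracy depends on bucket granularity, which is why the bucket boundaries should bracket your latency targets.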
Error Handling
Error Response Format
All errors follow this format:
{
"error": {
"code": "ERROR_CODE",
"message": "Human-readable error message",
"details": {},
"request_id": "550e8400-e29b-41d4-a716-446655440000"
}
}
Error Codes
| HTTP Status | Code | Description |
|---|---|---|
| 400 | BAD_REQUEST | Invalid request body or parameters |
| 401 | UNAUTHORIZED | Missing or invalid authentication |
| 403 | FORBIDDEN | Authenticated but not authorized |
| 404 | NOT_FOUND | Resource not found |
| 422 | VALIDATION_ERROR | Request validation failed |
| 429 | RATE_LIMITED | Too many requests |
| 500 | INTERNAL_ERROR | Server error |
| 502 | UPSTREAM_ERROR | External service error |
| 503 | SERVICE_UNAVAILABLE | Service temporarily unavailable |
Validation Errors
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Request validation failed",
"details": {
"fields": [
{"field": "message", "error": "cannot be empty"},
{"field": "temperature", "error": "must be between 0.0 and 2.0"}
]
}
}
}
OpenAPI Specification
Interactive Documentation
When the server is running, access interactive API documentation:
- Swagger UI: http://localhost:3000/swagger-ui/
- ReDoc: http://localhost:3000/redoc/
Export OpenAPI Spec
# Via CLI
pisovereign-cli openapi --output openapi.json
# Via API (if enabled)
curl http://localhost:3000/api-docs/openapi.json
OpenAPI 3.1 Specification
The full specification is available at:
- Development: /api-docs/openapi.json
- GitHub Pages: /api/openapi.json
Example OpenAPI Excerpt
openapi: 3.1.0
info:
title: PiSovereign API
description: Local AI Assistant REST API
version: 0.1.0
license:
name: MIT
url: https://opensource.org/licenses/MIT
servers:
- url: http://localhost:3000
description: Development server
security:
- bearerAuth: []
paths:
/v1/chat:
post:
summary: Send chat message
operationId: chat
tags:
- Chat
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/ChatRequest'
responses:
'200':
description: Successful response
content:
application/json:
schema:
$ref: '#/components/schemas/ChatResponse'
'401':
$ref: '#/components/responses/Unauthorized'
components:
securitySchemes:
bearerAuth:
type: http
scheme: bearer
description: API key authentication
schemas:
ChatRequest:
type: object
required:
- message
properties:
message:
type: string
description: User message
example: "What's the weather?"
conversation_id:
type: string
format: uuid
description: Continue existing conversation
SDK Examples
cURL
# Chat
curl -X POST http://localhost:3000/v1/chat \
-H "Authorization: Bearer sk-abc123" \
-H "Content-Type: application/json" \
-d '{"message": "Hello"}'
# Command
curl -X POST http://localhost:3000/v1/commands \
-H "Authorization: Bearer sk-abc123" \
-H "Content-Type: application/json" \
-d '{"command": "briefing"}'
Python
import requests
API_URL = "http://localhost:3000"
API_KEY = "sk-abc123"
headers = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json"
}
# Chat
response = requests.post(
f"{API_URL}/v1/chat",
headers=headers,
json={"message": "What's the weather?"}
)
print(response.json()["content"])
JavaScript/TypeScript
const API_URL = "http://localhost:3000";
const API_KEY = "sk-abc123";
async function chat(message: string): Promise<string> {
const response = await fetch(`${API_URL}/v1/chat`, {
method: "POST",
headers: {
"Authorization": `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ message }),
});
const data = await response.json();
return data.content;
}
Production Deployment
Deploy PiSovereign for production use with TLS, monitoring, and hardened configuration
Overview
PiSovereign is deployed via Docker Compose. The stack includes Traefik for automatic TLS via Let’s Encrypt, Vault for secrets, Ollama for inference, and all supporting services.
Internet
│
▼
┌─────────────┐
│ Traefik │ ← TLS termination, Let's Encrypt
│ (Reverse │
│ Proxy) │
└─────────────┘
│ HTTP (internal)
▼
┌─────────────┐ ┌─────────────┐
│ PiSovereign │ ──▶ │ Ollama │
│ Server │ │ (isolated) │
└─────────────┘ └─────────────┘
│
▼
┌─────────────┐ ┌─────────────┐
│ Prometheus │ ──▶ │ Grafana │
│ Metrics │ │ Dashboard │
└─────────────┘ └─────────────┘
Pre-Deployment Checklist
- Docker Engine 24+ with Compose v2 installed
- Vault initialized and secrets stored (Vault Setup)
- Domain name with DNS A record pointing to your server
- Firewall allows ports 80 and 443 (inbound)
- Backup strategy defined (Backup & Restore)
Deployment
Refer to the Docker Setup guide for the step-by-step deployment process. The key commands are:
cd PiSovereign/docker
cp .env.example .env
nano .env # Set PISOVEREIGN_DOMAIN and TRAEFIK_ACME_EMAIL
docker compose up -d
docker compose exec vault /vault/init.sh
Enable All Profiles
docker compose --profile monitoring --profile caldav up -d
Multi-Architecture Builds
PiSovereign images support both ARM64 (Raspberry Pi) and AMD64 (x86 servers):
docker pull --platform linux/arm64 ghcr.io/twohreichel/pisovereign:latest
docker pull --platform linux/amd64 ghcr.io/twohreichel/pisovereign:latest
TLS Configuration
Traefik with Let’s Encrypt
TLS is handled automatically by Traefik. The Docker Compose stack includes Traefik with HTTP challenge for Let’s Encrypt certificates. Key requirements:
- DNS A record pointing to your server’s public IP
- Ports 80 and 443 open in your firewall
- Valid email for Let’s Encrypt notifications (set in .env as TRAEFIK_ACME_EMAIL)
Certificate auto-renewal is handled by Traefik — no manual intervention required.
TLS Hardening
For stricter TLS settings, edit docker/traefik/dynamic.yml:
tls:
options:
default:
minVersion: VersionTLS13
cipherSuites:
- TLS_AES_256_GCM_SHA384
- TLS_CHACHA20_POLY1305_SHA256
curvePreferences:
- X25519
- CurveP384
sniStrict: true
Production Configuration
Key settings for production in docker/config/config.toml:
environment = "production"
[server]
host = "0.0.0.0"
port = 3000
log_format = "json"
cors_enabled = true
allowed_origins = ["https://your-domain.example.com"]
shutdown_timeout_secs = 30
[inference]
base_url = "http://ollama:11434"
default_model = "gemma3:12b"
timeout_ms = 120000
[security]
rate_limit_enabled = true
rate_limit_rpm = 120
min_tls_version = "1.3"
tls_verify_certs = true
[database]
url = "postgres://pisovereign:pisovereign@postgres:5432/pisovereign"
max_connections = 10
run_migrations = true
[cache]
enabled = true
ttl_short_secs = 300
ttl_medium_secs = 3600
ttl_long_secs = 86400
l1_max_entries = 10000
[vault]
address = "http://vault:8200"
mount_path = "secret"
timeout_secs = 5
[degraded_mode]
enabled = true
unavailable_message = "Service temporarily unavailable. Please try again."
failure_threshold = 3
success_threshold = 2
[health]
global_timeout_secs = 5
See the Configuration Reference for all available options.
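The [degraded_mode] thresholds describe circuit-breaker behavior: trip after failure_threshold consecutive failures, recover after success_threshold consecutive successes. A simplified illustration of that state machine (not the actual PiSovereign implementation, which may differ in details such as half-open probing):

```python
class CircuitBreaker:
    """Open after N consecutive failures; close after M consecutive successes.

    Mirrors failure_threshold = 3 and success_threshold = 2 from the config
    above. While open, the server returns unavailable_message instead of
    calling the failing upstream.
    """
    def __init__(self, failure_threshold=3, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.failures = 0
        self.successes = 0
        self.open = False

    def record_failure(self):
        self.successes = 0  # recovery streak is broken
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.open = True

    def record_success(self):
        self.failures = 0
        if self.open:
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.open = False
                self.successes = 0
```

Requiring two consecutive successes before closing prevents a single lucky response from flapping the breaker back open.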
Deployment Verification
After deployment, verify everything is working:
# 1. Check all containers are running
docker compose ps
# 2. Check health endpoint
curl https://your-domain.example.com/health
# 3. Check all services are ready
curl https://your-domain.example.com/ready/all | jq
# 4. Test chat endpoint
curl -X POST https://your-domain.example.com/v1/chat \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"message": "Hello"}' | jq
# 5. Check TLS certificate
openssl s_client -connect your-domain.example.com:443 -brief
# 6. Check metrics
curl http://localhost:3000/metrics/prometheus | head -20
Expected results:
- Health returns {"status": "ok"}
- Ready shows all services healthy
- Chat returns an AI response
- TLS shows a valid certificate
Advanced: Non-Docker Deployment
For advanced users who prefer running PiSovereign without Docker, you can build the binary directly:
cargo build --release
# Binaries: target/release/pisovereign-server, target/release/pisovereign-cli
You are then responsible for managing Ollama, Vault, Signal-CLI, Whisper, Piper, and the reverse proxy. The Docker Compose stack in docker/compose.yml serves as the reference architecture.
Next Steps
- Monitoring — Grafana dashboards and alerting
- Backup & Restore — Automated backups
- Security Hardening — Application and network security
Monitoring
Prometheus metrics, Grafana dashboards, Loki log aggregation, and alerting
Overview
The monitoring stack is included in Docker Compose and activated with a single profile flag:
docker compose --profile monitoring up -d
This starts Prometheus, Grafana, Loki, Promtail, Node Exporter, and the OpenTelemetry Collector — all pre-configured to scrape PiSovereign metrics and collect logs.
┌─────────────────┐
│ PiSovereign │
│ /metrics/ │
│ prometheus │
└────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐
│ Prometheus │────▶│ Grafana │
│ (Metrics) │ │ (Dashboards) │
└─────────────────┘ └─────────────────┘
┌─────────────────┐ ┌─────────────────┐
│ Promtail │────▶│ Loki │
│ (Log Shipper) │ │ (Log Storage) │
└─────────────────┘ └─────────────────┘
Resource Usage (Raspberry Pi 5)
| Component | Memory | Storage/Day |
|---|---|---|
| Prometheus | ~100 MB | ~50 MB |
| Grafana | ~150 MB | Minimal |
| Loki | ~200 MB | ~100 MB |
| Promtail | ~30 MB | — |
| Total | ~480 MB | ~150 MB |
Accessing Dashboards
After enabling the monitoring profile:
| Service | URL |
|---|---|
| Grafana | http://localhost/grafana (via Traefik) |
| Prometheus | http://localhost:9090 |
Default Grafana credentials are admin / admin (change on first login).
Dashboards and data sources are auto-provisioned — no manual setup required.
Prometheus Metrics
PiSovereign exposes metrics at /metrics/prometheus:
Application Metrics
| Metric | Type | Description |
|---|---|---|
| app_uptime_seconds | Counter | Application uptime |
| app_version_info | Gauge | Version information |
HTTP Metrics
| Metric | Type | Description |
|---|---|---|
| http_requests_total | Counter | Total HTTP requests |
| http_requests_success_total | Counter | 2xx responses |
| http_requests_client_error_total | Counter | 4xx responses |
| http_requests_server_error_total | Counter | 5xx responses |
| http_requests_active | Gauge | Active requests |
| http_response_time_avg_ms | Gauge | Average response time |
| http_response_time_ms_bucket | Histogram | Response time distribution |
Inference Metrics
| Metric | Type | Description |
|---|---|---|
| inference_requests_total | Counter | Total inference requests |
| inference_requests_success_total | Counter | Successful inferences |
| inference_requests_failed_total | Counter | Failed inferences |
| inference_time_avg_ms | Gauge | Average inference time |
| inference_time_ms_bucket | Histogram | Inference time distribution |
| inference_tokens_total | Counter | Total tokens generated |
| inference_healthy | Gauge | Health status (0/1) |
Cache Metrics
| Metric | Type | Description |
|---|---|---|
| cache_hits_total | Counter | Cache hits |
| cache_misses_total | Counter | Cache misses |
| cache_size | Gauge | Current cache size |
Model Routing Metrics
These metrics are only present when [model_routing] is enabled.
| Metric | Type | Description |
|---|---|---|
| model_routing_requests_total{tier="..."} | Counter | Requests per tier (trivial/simple/moderate/complex) |
| model_routing_template_hits_total | Counter | Trivial queries answered by template |
| model_routing_upgrades_total | Counter | Tier upgrades due to low confidence |
Grafana Dashboard Panels
The pre-built PiSovereign dashboard includes:
Overview Row
| Panel | Description |
|---|---|
| Uptime | Application uptime counter |
| Inference Status | Health indicator |
| Total Requests | Cumulative request count |
| Active Requests | Current in-flight requests |
| Avg Response Time | Mean latency |
| Total Tokens | LLM tokens generated |
HTTP Requests Row
| Panel | Visualization | Description |
|---|---|---|
| Request Rate | Time series | Requests/second over time |
| Status Distribution | Pie chart | Success/error breakdown |
| Response Time P50/P90/P99 | Stat | Latency percentiles |
Inference Row
| Panel | Visualization | Description |
|---|---|---|
| Inference Rate | Time series | Inferences/second |
| Inference Latency | Gauge | Current avg latency |
| Token Rate | Time series | Tokens/second |
| Model Usage | Table | Per-model statistics |
System Row
| Panel | Description |
|---|---|
| CPU Usage | System CPU utilization |
| Memory Usage | RAM usage |
| Disk I/O | Storage throughput |
| Network I/O | Network traffic |
Alerting
Alert rules are pre-configured in docker/prometheus/rules/ (if present) or can be added:
# prometheus/rules/pisovereign.yml
groups:
- name: pisovereign
rules:
- alert: PiSovereignDown
expr: up{job="pisovereign"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "PiSovereign is down"
- alert: InferenceEngineUnhealthy
expr: inference_healthy == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Inference engine is unhealthy"
- alert: HighResponseTime
expr: http_response_time_avg_ms > 5000
for: 5m
labels:
severity: warning
annotations:
summary: "Average response time is {{ $value }}ms"
- alert: HighErrorRate
expr: rate(http_requests_server_error_total[5m]) / rate(http_requests_total[5m]) > 0.05
for: 5m
labels:
severity: warning
annotations:
summary: "Server error rate is {{ $value | humanizePercentage }}"
- alert: InferenceFailures
expr: rate(inference_requests_failed_total[5m]) / rate(inference_requests_total[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "Inference failure rate is {{ $value | humanizePercentage }}"
Log Aggregation
Loki and Promtail are included in the monitoring profile. Logs from all Docker containers are automatically collected and available in Grafana under the Loki data source.
To query logs in Grafana:
- Go to Explore → select Loki data source
- Use LogQL queries:
{container="pisovereign"} |= "error"
{container="ollama"} | json | level="error"
Resource Optimization
If running on constrained hardware, tune these settings:
# In docker/prometheus/prometheus.yml
global:
scrape_interval: 30s # Increase from 15s to reduce load
# Prometheus storage flags (in compose.yml command)
--storage.tsdb.retention.time=3d # Reduce from 7d
--storage.tsdb.retention.size=500MB # Cap storage
# In docker/loki/loki.yml
limits_config:
retention_period: 72h # 3 days instead of 7
Troubleshooting
Metrics not appearing
# Check PiSovereign exposes metrics
curl http://localhost:3000/metrics/prometheus
# Check Prometheus scrape targets
curl http://localhost:9090/api/v1/targets
Grafana dashboard empty
- Verify time range includes recent data
- Check Prometheus data source is connected (Settings → Data Sources)
- Query Prometheus directly at
http://localhost:9090/graph
Next Steps
- Backup & Restore — Protect your data
- Security Hardening — Secure monitoring endpoints
Backup & Restore
💾 Protect your PiSovereign data with comprehensive backup strategies
This guide covers backup procedures, automated backups, and disaster recovery.
Table of Contents
- Overview
- What to Back Up
- Database Backup
- S3-Compatible Storage
- Full System Backup
- Restore Procedures
- Backup Verification
- Retention Policy
Overview
Backup strategy overview:
| Component | Method | Frequency | Retention |
|---|---|---|---|
| Database | pg_dump | Daily | 7 daily, 4 weekly, 12 monthly |
| Configuration | File copy | On change | 5 versions |
| Vault Secrets | Vault backup | Weekly | 4 weekly |
| Full System | SD/NVMe image | Monthly | 3 monthly |
What to Back Up
Critical Data
| Path | Contents | Priority |
|---|---|---|
| PostgreSQL database (via pg_dump) | Conversations, approvals, audit logs | High |
| /etc/pisovereign/config.toml | Application configuration | High |
| /opt/vault/data | Vault storage (if local) | High |
Important Data
| Path | Contents | Priority |
|---|---|---|
| /var/lib/pisovereign/cache.redb | Persistent cache | Medium |
| /opt/hailo/models | Downloaded models | Medium |
| /etc/pisovereign/env | Environment overrides | Medium |
Can Be Recreated
| Path | Contents | Priority |
|---|---|---|
| Prometheus data | Metrics | Low |
| Grafana dashboards | Can reimport | Low |
| Log files | Historical only | Low |
Database Backup
Manual Backup
Using the PiSovereign CLI:
# Simple local backup
pisovereign-cli backup \
--database-url postgres://pisovereign:pisovereign@postgres:5432/pisovereign \
--output /backup/pisovereign-$(date +%Y%m%d).sql
# With timestamp
pisovereign-cli backup \
--database-url postgres://pisovereign:pisovereign@postgres:5432/pisovereign \
--output /backup/pisovereign-$(date +%Y%m%d_%H%M%S).sql
# Compressed backup
pisovereign-cli backup \
--database-url postgres://pisovereign:pisovereign@postgres:5432/pisovereign \
--output - | gzip > /backup/pisovereign-$(date +%Y%m%d).sql.gz
Using pg_dump directly:
# Custom format backup (most flexible, supports parallel restore)
pg_dump -Fc -h postgres -U pisovereign -d pisovereign \
-f /backup/pisovereign-$(date +%Y%m%d).dump
# Plain SQL backup
pg_dump -h postgres -U pisovereign -d pisovereign \
-f /backup/pisovereign-$(date +%Y%m%d).sql
Automated Backups
Create backup script:
sudo nano /usr/local/bin/pisovereign-backup.sh
#!/bin/bash
set -euo pipefail
# Configuration
BACKUP_DIR="/backup/pisovereign"
DB_URL="postgres://pisovereign:pisovereign@postgres:5432/pisovereign"
RETENTION_DAILY=7
RETENTION_WEEKLY=4
RETENTION_MONTHLY=12
# Create directories
mkdir -p "$BACKUP_DIR"/{daily,weekly,monthly}
# Timestamp
DATE=$(date +%Y%m%d)
DAY_OF_WEEK=$(date +%u)
DAY_OF_MONTH=$(date +%d)
# Daily backup (custom format for flexible restore)
DAILY_FILE="$BACKUP_DIR/daily/pisovereign-$DATE.dump.gz"
echo "Creating daily backup: $DAILY_FILE"
pg_dump -Fc -d "$DB_URL" | gzip > "$DAILY_FILE"
# Weekly backup (Sunday)
if [ "$DAY_OF_WEEK" -eq 7 ]; then
WEEKLY_FILE="$BACKUP_DIR/weekly/pisovereign-week$(date +%V)-$DATE.dump.gz"
echo "Creating weekly backup: $WEEKLY_FILE"
cp "$DAILY_FILE" "$WEEKLY_FILE"
fi
# Monthly backup (1st of month)
if [ "$DAY_OF_MONTH" -eq "01" ]; then
MONTHLY_FILE="$BACKUP_DIR/monthly/pisovereign-$(date +%Y%m).dump.gz"
echo "Creating monthly backup: $MONTHLY_FILE"
cp "$DAILY_FILE" "$MONTHLY_FILE"
fi
# Cleanup old backups
echo "Cleaning up old backups..."
find "$BACKUP_DIR/daily" -name "*.dump.gz" -mtime +$RETENTION_DAILY -delete
find "$BACKUP_DIR/weekly" -name "*.dump.gz" -mtime +$((RETENTION_WEEKLY * 7)) -delete
find "$BACKUP_DIR/monthly" -name "*.dump.gz" -mtime +$((RETENTION_MONTHLY * 30)) -delete
# Backup config
CONFIG_BACKUP="$BACKUP_DIR/config/config-$DATE.toml"
mkdir -p "$BACKUP_DIR/config"
cp /etc/pisovereign/config.toml "$CONFIG_BACKUP"
find "$BACKUP_DIR/config" -name "*.toml" -mtime +30 -delete
echo "Backup completed successfully"
sudo chmod +x /usr/local/bin/pisovereign-backup.sh
Schedule with cron:
sudo crontab -e
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/pisovereign-backup.sh >> /var/log/pisovereign-backup.log 2>&1
S3-Compatible Storage
S3 Configuration
PiSovereign CLI supports S3-compatible storage (AWS S3, MinIO, Backblaze B2):
# Environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
Or in configuration file:
# /etc/pisovereign/backup.toml
[s3]
bucket = "pisovereign-backups"
region = "eu-central-1"
endpoint = "https://s3.eu-central-1.amazonaws.com"
# For MinIO or Backblaze B2:
# endpoint = "https://s3.example.com"
S3 Backup Commands
# Backup to S3
pisovereign-cli backup \
--s3-bucket pisovereign-backups \
--s3-region eu-central-1 \
--s3-prefix daily/ \
--s3-access-key "$AWS_ACCESS_KEY_ID" \
--s3-secret-key "$AWS_SECRET_ACCESS_KEY"
# With custom endpoint (MinIO)
pisovereign-cli backup \
--s3-bucket pisovereign-backups \
--s3-endpoint https://minio.local:9000 \
--s3-access-key "$MINIO_ACCESS_KEY" \
--s3-secret-key "$MINIO_SECRET_KEY"
# List backups in S3
aws s3 ls s3://pisovereign-backups/daily/
Automated S3 backup script:
#!/bin/bash
set -euo pipefail
DATE=$(date +%Y%m%d)
# Upload to S3
pisovereign-cli backup \
--s3-bucket pisovereign-backups \
--s3-region eu-central-1 \
--s3-prefix "daily/pisovereign-$DATE.dump.gz" \
--s3-access-key "$AWS_ACCESS_KEY_ID" \
--s3-secret-key "$AWS_SECRET_ACCESS_KEY"
# Configure S3 lifecycle for automatic cleanup (one-time setup)
# aws s3api put-bucket-lifecycle-configuration \
# --bucket pisovereign-backups \
# --lifecycle-configuration file://lifecycle.json
S3 lifecycle policy (lifecycle.json):
{
"Rules": [
{
"ID": "DeleteOldDailyBackups",
"Status": "Enabled",
"Filter": { "Prefix": "daily/" },
"Expiration": { "Days": 7 }
},
{
"ID": "DeleteOldWeeklyBackups",
"Status": "Enabled",
"Filter": { "Prefix": "weekly/" },
"Expiration": { "Days": 30 }
},
{
"ID": "DeleteOldMonthlyBackups",
"Status": "Enabled",
"Filter": { "Prefix": "monthly/" },
"Expiration": { "Days": 365 }
}
]
}
Full System Backup
SD Card / NVMe Image
Create full system image for disaster recovery:
# Identify storage device
lsblk
# Create image (run from another system or boot USB)
sudo dd if=/dev/mmcblk0 of=/backup/pisovereign-full-$(date +%Y%m%d).img bs=4M status=progress
# Compress (takes a while)
gzip /backup/pisovereign-full-$(date +%Y%m%d).img
Incremental System Backup
Using rsync for incremental backups:
#!/bin/bash
# /usr/local/bin/pisovereign-system-backup.sh
BACKUP_DIR="/backup/system"
DATE=$(date +%Y%m%d)
LATEST="$BACKUP_DIR/latest"
mkdir -p "$BACKUP_DIR/$DATE"
rsync -aHAX --delete \
--exclude='/proc/*' \
--exclude='/sys/*' \
--exclude='/dev/*' \
--exclude='/tmp/*' \
--exclude='/run/*' \
--exclude='/mnt/*' \
--exclude='/media/*' \
--exclude='/backup/*' \
--link-dest="$LATEST" \
/ "$BACKUP_DIR/$DATE/"
rm -f "$LATEST"
ln -s "$BACKUP_DIR/$DATE" "$LATEST"
Restore Procedures
Database Restore
# Stop the service
sudo systemctl stop pisovereign
# Create a backup of the current database (just in case)
pg_dump -Fc -h postgres -U pisovereign -d pisovereign \
-f /tmp/pisovereign-pre-restore.dump
# Restore from backup (custom format)
gunzip -c /backup/pisovereign/daily/pisovereign-20260207.dump.gz > /tmp/restore.dump
pg_restore -h postgres -U pisovereign -d pisovereign --clean --if-exists /tmp/restore.dump
rm /tmp/restore.dump
# Or using CLI
pisovereign-cli restore \
--database-url postgres://pisovereign:pisovereign@postgres:5432/pisovereign \
--input /backup/pisovereign-20260207.dump
# Verify database connectivity and integrity
pg_isready -h postgres -U pisovereign -d pisovereign
psql -h postgres -U pisovereign -d pisovereign -c "SELECT 1;"
# Start service
sudo systemctl start pisovereign
# Verify
pisovereign-cli status
Restore from S3
# Download from S3
aws s3 cp s3://pisovereign-backups/daily/pisovereign-20260207.dump.gz /tmp/
# Or using CLI
pisovereign-cli restore \
--s3-bucket pisovereign-backups \
--s3-key daily/pisovereign-20260207.dump.gz \
--s3-region eu-central-1
Configuration Restore
# Restore config
sudo cp /backup/pisovereign/config/config-20260207.toml /etc/pisovereign/config.toml
# Verify syntax
pisovereign-cli config validate
# Restart service
sudo systemctl restart pisovereign
Disaster Recovery
Complete system recovery procedure:
- Flash fresh Raspberry Pi OS
# On another computer, flash SD card
# Use Raspberry Pi Imager
- Basic system setup
# SSH in, update system
sudo apt update && sudo apt upgrade -y
- Restore from full image (if available)
# On another system
gunzip -c pisovereign-full-20260207.img.gz | sudo dd of=/dev/mmcblk0 bs=4M status=progress
- Or restore components
# Install PiSovereign
# (Follow installation guide)
# Restore configuration
sudo mkdir -p /etc/pisovereign
sudo cp config.toml.backup /etc/pisovereign/config.toml
# Restore database
pisovereign-cli restore \
--database-url postgres://pisovereign:pisovereign@postgres:5432/pisovereign \
--input pisovereign-backup.dump
# Restore Vault (if using local Vault)
sudo tar -xzf vault-backup.tar.gz -C /opt/vault/
# Start services
sudo systemctl start pisovereign
Backup Verification
Verify Database Backup
# Check file integrity
gzip -t /backup/pisovereign/daily/pisovereign-20260207.dump.gz && echo "OK"
# Test restore to a temporary database
createdb -h postgres -U pisovereign pisovereign_verify
gunzip -c /backup/pisovereign/daily/pisovereign-20260207.dump.gz | \
pg_restore -h postgres -U pisovereign -d pisovereign_verify
psql -h postgres -U pisovereign -d pisovereign_verify \
-c "SELECT COUNT(*) FROM conversations;"
dropdb -h postgres -U pisovereign pisovereign_verify
Automated Verification
#!/bin/bash
# /usr/local/bin/verify-backup.sh
BACKUP_FILE="/backup/pisovereign/daily/pisovereign-$(date +%Y%m%d).dump.gz"
if [ ! -f "$BACKUP_FILE" ]; then
echo "ERROR: Today's backup not found!"
exit 1
fi
# Verify gzip integrity
if ! gzip -t "$BACKUP_FILE" 2>/dev/null; then
echo "ERROR: Backup file is corrupted!"
exit 1
fi
# Verify database integrity by test-restoring to a temporary database
createdb -h postgres -U pisovereign pisovereign_verify
gunzip -c "$BACKUP_FILE" | pg_restore -h postgres -U pisovereign -d pisovereign_verify 2>&1
INTEGRITY=$(psql -h postgres -U pisovereign -d pisovereign_verify -tAc "SELECT 1;" 2>&1)
dropdb -h postgres -U pisovereign pisovereign_verify
if [ "$INTEGRITY" != "1" ]; then
echo "ERROR: Database integrity check failed: $INTEGRITY"
exit 1
fi
echo "Backup verification passed"
Add to cron:
# Verify backup at 3 AM (after 2 AM backup)
0 3 * * * /usr/local/bin/verify-backup.sh || echo "Backup verification failed!" | mail -s "PiSovereign Backup Alert" admin@example.com
Retention Policy
Recommended Policy
| Type | Retention | Storage Estimate |
|---|---|---|
| Daily | 7 days | ~70 MB |
| Weekly | 4 weeks | ~40 MB |
| Monthly | 12 months | ~120 MB |
| Total | - | ~230 MB |
Cleanup Script
#!/bin/bash
# /usr/local/bin/cleanup-backups.sh
BACKUP_DIR="/backup/pisovereign"
# Remove old daily backups (older than 7 days)
find "$BACKUP_DIR/daily" -name "*.dump.gz" -mtime +7 -delete
# Remove old weekly backups (older than 28 days)
find "$BACKUP_DIR/weekly" -name "*.dump.gz" -mtime +28 -delete
# Remove old monthly backups (older than 365 days)
find "$BACKUP_DIR/monthly" -name "*.dump.gz" -mtime +365 -delete
# Remove old config backups (older than 30 days)
find "$BACKUP_DIR/config" -name "*.toml" -mtime +30 -delete
# Report disk usage
echo "Backup disk usage:"
du -sh "$BACKUP_DIR"/*
Quick Reference
Backup Commands
# Local backup
pisovereign-cli backup \
--database-url postgres://pisovereign:pisovereign@postgres:5432/pisovereign \
--output /backup/pisovereign.dump
# S3 backup
pisovereign-cli backup --s3-bucket mybucket --s3-prefix daily/
# Verify backup
pg_restore --list /backup/pisovereign.dump
Restore Commands
# Local restore
pisovereign-cli restore \
--database-url postgres://pisovereign:pisovereign@postgres:5432/pisovereign \
--input /backup/pisovereign.dump
# S3 restore
pisovereign-cli restore --s3-bucket mybucket --s3-key daily/pisovereign.dump
Monitoring Backup Health
Add to Prometheus:
# prometheus/rules/backups.yml
groups:
- name: backups
rules:
- alert: BackupMissing
expr: time() - file_mtime{path="/backup/pisovereign/daily/latest.dump.gz"} > 86400
for: 1h
labels:
severity: warning
annotations:
summary: "Daily backup is missing"
description: "No backup created in the last 24 hours"
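Prometheus does not export file mtimes on its own, so the file_mtime series in the rule above has to be produced by something, commonly a node_exporter textfile-collector script. A sketch of such a script (the paths, metric name, and collector directory here are assumptions for illustration; point the alert's expr at whatever metric name you actually emit):

```python
#!/usr/bin/env python3
"""Write the newest daily backup's mtime as a Prometheus metric for the
node_exporter textfile collector. Paths and metric name are illustrative."""
import glob
import os

BACKUP_GLOB = "/backup/pisovereign/daily/*.dump.gz"
OUT_FILE = "/var/lib/node_exporter/textfile/pisovereign_backup.prom"

def newest_mtime(pattern):
    """Return the most recent mtime among matching files, or 0.0 if none."""
    files = glob.glob(pattern)
    return max((os.path.getmtime(f) for f in files), default=0.0)

def render(mtime):
    """Render Prometheus text exposition format for the gauge."""
    return (
        "# HELP pisovereign_backup_mtime_seconds Unix mtime of newest daily backup\n"
        "# TYPE pisovereign_backup_mtime_seconds gauge\n"
        f"pisovereign_backup_mtime_seconds {mtime}\n"
    )

if __name__ == "__main__":
    tmp = OUT_FILE + ".tmp"
    with open(tmp, "w") as f:
        f.write(render(newest_mtime(BACKUP_GLOB)))
    os.replace(tmp, OUT_FILE)  # atomic rename so a scrape never sees a partial file
```

Run it from cron alongside the backup job; an alert equivalent to BackupMissing would then be time() - pisovereign_backup_mtime_seconds > 86400.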
Next Steps
- Security Hardening - Encrypt backups
- Monitoring - Monitor backup health
Security Hardening
Production security guide for PiSovereign deployments
Security Architecture
┌─────────────────────────────────────────────────┐
│ Network: Traefik TLS 1.3 + Docker isolation │
├─────────────────────────────────────────────────┤
│ Application: Rate limiting, auth, validation │
├─────────────────────────────────────────────────┤
│ Secrets: HashiCorp Vault, encrypted storage │
├─────────────────────────────────────────────────┤
│ Host: SSH hardened, firewall, auto-updates │
└─────────────────────────────────────────────────┘
Principles: Defense in depth — least privilege — fail secure — audit everything.
Host Security Basics
Docker provides process isolation, but the host still needs hardening. Apply these essentials on any machine running PiSovereign:
| Area | Action |
|---|---|
| SSH | Disable password auth, use Ed25519 keys, set PermitRootLogin no, consider a non-default port |
| Firewall | Allow only SSH + 443 (HTTPS). On Linux: ufw default deny incoming && ufw allow 22/tcp && ufw allow 443/tcp && ufw enable |
| Fail2ban | apt install fail2ban — protects SSH and can monitor Docker logs for repeated 401/429 responses |
| Updates | Enable automatic security updates (unattended-upgrades on Debian/Ubuntu) |
| Users | Lock root (passwd -l root), use a personal account with sudo |
For comprehensive OS hardening, refer to the CIS Benchmark for your distribution.
Application Security
Rate Limiting
[security]
rate_limit_enabled = true
rate_limit_rpm = 120 # Per IP per minute
[api]
max_request_size_bytes = 1048576 # 1 MB
request_timeout_secs = 30
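On the client side it is better to pace requests below rate_limit_rpm than to rely on 429 retries. A minimal token-bucket sketch (the class and its defaults are this example's, not part of PiSovereign):

```python
import time

class TokenBucket:
    """Client-side pacing: rpm=120 allows bursts up to 120, refilling at 2/s."""
    def __init__(self, rpm=120):
        self.capacity = rpm
        self.tokens = float(rpm)
        self.rate = rpm / 60.0   # tokens per second
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should wait before sending
```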
API Authentication
Generate and store API keys in Vault:
docker compose exec vault vault kv put secret/pisovereign/api-keys \
admin="$(openssl rand -base64 32)"
All requests require Authorization: Bearer <api-key>. Invalid keys return a generic 401 — no information leakage. Rate limiting is applied per key.
Input Validation
PiSovereign validates all inputs automatically:
- Maximum lengths enforced on all string fields
- Content-type verification
- JSON schema validation
- Path traversal protection
- SQL injection prevention via parameterized queries
Container Isolation
Docker Compose provides process-level isolation. The default stack additionally:
- Runs Ollama on an internal: true network (ollama-internal) — no direct external access
- Binds services to 127.0.0.1 where possible (Baïkal, Vault UI)
- Uses read-only filesystem mounts for config files
- Limits container capabilities via Docker defaults
Vault Security
PiSovereign uses a ChainedSecretStore — Vault is the primary store with config.toml as fallback. See Vault Setup for initial configuration.
Seal/Unseal
The Docker stack auto-initializes and auto-unseals Vault for convenience. In production, consider:
- Manual unseal: Remove the vault-init container, unseal interactively after each restart
- Key splitting (Shamir’s Secret Sharing): vault operator init -key-shares=5 -key-threshold=3 — distribute shares to different people/locations
- Cloud KMS auto-unseal: Use AWS KMS, GCP KMS, or Azure Key Vault for unattended unseal without storing keys locally
Token Management
PiSovereign uses AppRole authentication with short-lived tokens:
# Tokens expire after 1 hour, max 4 hours
docker compose exec vault vault write auth/approle/role/pisovereign \
token_policies="pisovereign" \
token_ttl=1h \
token_max_ttl=4h \
secret_id_ttl=24h
Best practices:
- Use short TTLs (1 hour default is good)
- Rotate secret IDs regularly
- Never log tokens
- Revoke tokens on application shutdown
Audit Logging
docker compose exec vault vault audit enable file \
file_path=/vault/logs/audit.log
Network Security
TLS Configuration
Traefik handles TLS termination. Harden the defaults:
# docker/traefik/dynamic.yml
tls:
options:
default:
minVersion: VersionTLS13
cipherSuites:
- TLS_AES_256_GCM_SHA384
- TLS_CHACHA20_POLY1305_SHA256
curvePreferences:
- X25519
- CurveP384
sniStrict: true
In config.toml:
[security]
min_tls_version = "1.3"
tls_verify_certs = true
Network Isolation
The Docker Compose stack defines two networks:
| Network | Type | Purpose |
|---|---|---|
| pisovereign-network | bridge | Main service communication |
| ollama-internal | internal bridge | Isolates Ollama — no external access |
Traefik is the only service exposed to the host network. All other services communicate internally.
Security Monitoring
Configure structured JSON logging:
[logging]
level = "info"
format = "json"
include_request_id = true
include_user_id = true
Key events to monitor:
- Failed authentication attempts (401s)
- Rate limit triggers (429s)
- Vault access failures
- Unusual request patterns
See Monitoring for Prometheus alert rules covering HighFailedAuthRate and RateLimitTriggered.
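With format = "json", these events can also be tallied directly from log lines. A sketch that counts 401s (the status field name is an assumption about the log schema; check your actual output first):

```python
import json

def count_failed_auth(lines):
    """Count 401 responses in structured JSON log lines; skip non-JSON lines."""
    hits = 0
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate interleaved plain-text lines
        if rec.get("status") == 401:
            hits += 1
    return hits

logs = [
    '{"level":"warn","status":401,"request_id":"abc"}',
    '{"level":"info","status":200,"request_id":"def"}',
    'not json',
]
print(count_failed_auth(logs))  # → 1
```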
Incident Response
- Isolate — stop external access: docker compose down or firewall deny-all
- Preserve evidence — copy container logs: docker compose logs > incident-$(date +%Y%m%d).log
- Rotate credentials: docker compose exec vault vault kv put secret/pisovereign/api-keys admin="$(openssl rand -base64 32)"
- Review access — check Docker logs, Vault audit log, SSH lastlog
- Restore from known-good backup if needed
Security Checklist
Initial Setup
- Host SSH uses key-only authentication
- Firewall allows only required ports
- Automatic security updates enabled
- Default passwords changed
Application
- Rate limiting enabled
- API keys stored in Vault
- TLS 1.3 minimum enforced
- Logs do not contain secrets
Vault
- Unseal keys secured (not on same host in production)
- AppRole configured with short TTLs
- Audit logging enabled
Ongoing
- Monthly credential rotation
- Review Vault audit logs
- Keep Docker images updated
- Review container security scans
References
- CIS Benchmarks — OS hardening baselines
- OWASP API Security Top 10
- HashiCorp Vault Security Model
- Mozilla SSL Configuration Generator
- Docker Security Best Practices
References
📚 External resources and documentation references
This page collects official documentation, tutorials, and resources referenced throughout the PiSovereign documentation.
Hardware
Raspberry Pi 5
| Resource | Description |
|---|---|
| Raspberry Pi 5 Product Page | Official product information |
| Raspberry Pi 5 Documentation | Hardware specifications and setup |
| Raspberry Pi OS | Operating system downloads |
| Raspberry Pi Imager | SD card flashing tool |
| GPIO Pinout | Interactive pinout reference |
Hailo AI Accelerator
| Resource | Description |
|---|---|
| Hailo-10H AI HAT+ Product Page | Official product information |
| Hailo Developer Zone | SDKs, tools, and documentation |
| HailoRT SDK 4.20 Documentation | Runtime SDK reference |
| Hailo Model Zoo | Pre-compiled models |
| Hailo-Ollama GitHub | Ollama-compatible inference server |
Storage
| Resource | Description |
|---|---|
| NVMe SSD Compatibility | NVMe boot support |
| PCIe HAT+ Documentation | PCIe expansion |
Rust Ecosystem
Language & Tools
| Resource | Description |
|---|---|
| The Rust Programming Language | Official Rust book |
| Rust by Example | Learn Rust through examples |
| Rust API Guidelines | Best practices for API design |
| Rust Edition Guide | Edition migration guide |
| rustup Documentation | Toolchain manager |
| Cargo Book | Package manager documentation |
Frameworks Used
| Resource | Description |
|---|---|
| Axum Documentation | Web framework |
| Tokio Documentation | Async runtime |
| SQLx Documentation | Async SQL toolkit |
| Serde Documentation | Serialization framework |
| Tower Documentation | Middleware framework |
| Tracing Documentation | Application instrumentation |
| Clap Documentation | Command-line parser |
| Reqwest Documentation | HTTP client |
| Utoipa Documentation | OpenAPI generation |
Testing & Quality
| Resource | Description |
|---|---|
| Rust Testing | Testing in Rust |
| cargo-tarpaulin | Code coverage tool |
| cargo-deny | Dependency linting |
| Clippy Lints | Lint reference |
| Rustfmt Configuration | Formatter options |
Security
HashiCorp Vault
| Resource | Description |
|---|---|
| Vault Documentation | Official documentation |
| Vault Getting Started | Beginner tutorials |
| KV Secrets Engine v2 | Key-value secrets |
| AppRole Auth Method | Application authentication |
| Vault Security Model | Security architecture |
| Vault Production Hardening | Production best practices |
System Security
| Resource | Description |
|---|---|
| CIS Benchmarks | Security configuration guides |
| OWASP API Security Top 10 | API security risks |
| Mozilla SSL Configuration | TLS configuration generator |
| SSH Hardening Guide | SSH security |
| Fail2ban Documentation | Intrusion prevention |
Cryptography
| Resource | Description |
|---|---|
| RustCrypto | Pure Rust crypto implementations |
| ring Documentation | Crypto library |
| Argon2 Specification | Password hashing |
APIs & Integrations
AI & Language Models
| Resource | Description |
|---|---|
| OpenAI API Reference | OpenAI API docs |
| Ollama API | Ollama REST API |
| LLM Tokenization | Understanding tokenizers |
Communication
| Resource | Description |
|---|---|
| WhatsApp Business API | WhatsApp Cloud API |
| WhatsApp Webhooks | Webhook setup |
Email
| Resource | Description |
|---|---|
| Proton Bridge | Proton Mail IMAP/SMTP bridge |
| Gmail IMAP | Gmail IMAP/SMTP settings |
| Outlook IMAP | Outlook IMAP/SMTP settings |
| IMAP RFC 3501 | IMAP protocol |
| SMTP RFC 5321 | SMTP protocol |
| XOAUTH2 SASL | OAuth2 for IMAP/SMTP |
Calendar
| Resource | Description |
|---|---|
| CalDAV RFC 4791 | CalDAV protocol |
| iCalendar RFC 5545 | iCalendar format |
| Baïkal Server | CalDAV/CardDAV server |
Weather
| Resource | Description |
|---|---|
| Open-Meteo API | Free weather API |
Infrastructure
Docker
| Resource | Description |
|---|---|
| Docker Documentation | Official docs |
| Docker Compose | Multi-container apps |
| Docker on Raspberry Pi | ARM installation |
Reverse Proxy
| Resource | Description |
|---|---|
| Traefik Documentation | Cloud-native proxy |
| Let’s Encrypt | Free TLS certificates |
| Nginx Documentation | Web server/proxy |
Monitoring
| Resource | Description |
|---|---|
| Prometheus Documentation | Metrics collection |
| Grafana Documentation | Visualization |
| Loki Documentation | Log aggregation |
| OpenTelemetry | Observability framework |
Databases
| Resource | Description |
|---|---|
| PostgreSQL 17 Documentation | Relational database |
| pgvector | Vector similarity search for PostgreSQL |
Development Tools
VS Code
| Resource | Description |
|---|---|
| rust-analyzer | Rust language server |
| CodeLLDB | Debugger |
| Even Better TOML | TOML support |
GitHub
| Resource | Description |
|---|---|
| GitHub Actions | CI/CD platform |
| Release Please | Release automation |
| GitHub Pages | Static site hosting |
Documentation
| Resource | Description |
|---|---|
| mdBook Documentation | Documentation tool |
| rustdoc Book | Rust documentation |
Standards & Specifications
| Resource | Description |
|---|---|
| OpenAPI Specification | API description format |
| JSON Schema | JSON validation |
| Semantic Versioning | Version numbering |
| Keep a Changelog | Changelog format |
| Conventional Commits | Commit message format |
Community
| Resource | Description |
|---|---|
| Rust Users Forum | Community forum |
| Rust Discord | Chat community |
| This Week in Rust | Weekly newsletter |
| Raspberry Pi Forums | Hardware community |
💡 Tip: Many of these resources are updated regularly. Always check for the latest version of documentation when implementing features.