AI Memory System
PiSovereign includes a persistent AI memory system that enables your assistant to remember facts, preferences, and past interactions. This creates a more personalized and contextually aware experience.
Overview
The memory system provides:
- Persistent Storage: All interactions can be stored in PostgreSQL with encryption at rest
- Semantic Search (RAG): Retrieve relevant memories based on meaning, not just keywords
- Automatic Learning: The AI learns from conversations automatically
- Memory Decay: Less important or rarely accessed memories fade over time
- Deduplication: Similar memories are merged to prevent redundancy
- Content Encryption: Sensitive data is encrypted at rest using XChaCha20-Poly1305
How It Works
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│  RAG Retrieval   │────▶│ Context + Query │
│  "What's my     │     │ (Top 5 similar)  │     │   sent to LLM   │
│  favorite..."   │     └──────────────────┘     └─────────────────┘
└─────────────────┘                                       │
                                                          ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Stored Memory  │◀────│  Learning Phase  │◀────│   AI Response   │
│   (Encrypted)   │     │ (Q&A + Metadata) │     └─────────────────┘
└─────────────────┘     └──────────────────┘
```
1. RAG Context Retrieval
When you ask a question:
- The query is converted to an embedding vector using `nomic-embed-text`
- Similar memories are found using cosine similarity search
- The top N most relevant memories are sorted by type priority (corrections and facts first) and injected into the prompt with an instructive preamble that explicitly tells the LLM to treat them as known facts
- Full memory content is used (not truncated summaries), with a 2,000-character budget to stay within the token window
- The AI generates a response with full context
2. Automatic Learning
After each response (including streamed responses):
- The Q&A pair is evaluated for importance using lightweight heuristics (no LLM call):
  - AI naming cues (“nenn dich”, “your name is”, “du heißt”) → +0.40
  - Identity cues (“my name is”, “I live in”, “ich heiße”) → +0.35
  - Correction cues (“that’s wrong”, “please remember”, “eigentlich”) → +0.30
  - Preference cues (“I prefer”, “I like”, “ich mag”) → +0.25
  - Word-count adjustments (longer exchanges are treated as more valuable)
  - Final score clamped to [0.2, 0.9]
- The memory type is automatically classified (priority order):
  - AI naming signals → Fact
  - Correction signals → Correction
  - Preference signals → Preference
  - Identity/fact signals → Fact
  - Default → Context
- Embeddings are generated for semantic search
- If a similar memory exists (>85% similarity), they’re merged (on plaintext, before encryption)
- Content is encrypted before storage
Note: Both the HTTP chat endpoint (`ChatService`) and the messenger path (`MemoryEnhancedChat`) use the same shared heuristic module (`importance.rs`) for consistent importance estimation and type classification.
3. Memory Types
| Type | Purpose | Example |
|---|---|---|
| Fact | General knowledge | “Paris is the capital of France” |
| Preference | User preferences | “User prefers dark mode” |
| Correction | Feedback/corrections | “Actually, the meeting is Tuesday not Monday” |
| ToolResult | API/tool outputs | “Weather API returned: 22°C, sunny” |
| Context | Conversation context | “Q: What time is it? A: 3:00 PM” |
4. Relevance Scoring
When memories are retrieved for RAG context, they are ranked using a combined relevance score that balances three factors:
```
relevance_score = similarity × 0.50 + importance × 0.20 + freshness × 0.30
```
Where:
- `similarity` (50%): cosine similarity between the query and memory embeddings (0.0–1.0)
- `importance` (20%): current importance after decay (0.0–1.0), with per-type floors
- `freshness` (30%): exponential decay based on time since last access: `e^(-0.01 × hours)`. Memories from seconds ago score ~1.0, from one day ago ~0.79, from one week ago ~0.19.
This ensures that memories from the current conversation session (stored moments ago) dominate when relevant, while long-term knowledge still contributes via the similarity and importance terms.
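As a worked sketch with the weights hard-coded (the function name is illustrative, not the service's API):

```rust
// Combined relevance score: similarity, decayed importance, and a
// freshness term that decays exponentially with hours since last access.
fn relevance(similarity: f64, importance: f64, hours_since_access: f64) -> f64 {
    let freshness = (-0.01_f64 * hours_since_access).exp();
    similarity * 0.50 + importance * 0.20 + freshness * 0.30
}
```

For example, a memory stored seconds ago with similarity 0.8 and importance 0.5 scores about 0.4 + 0.1 + 0.3 = 0.8; the same memory one week later drops to roughly 0.56, since only the freshness term has changed.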
After scoring, memories are sorted by type priority before injection:
- Corrections (highest priority — user explicitly corrected the AI)
- Facts (identity, names, important knowledge)
- Preferences
- Context
- Tool Results
Configuration
Add to your `config.toml`:
```toml
[memory]
# Enable memory storage
enabled = true

# Enable RAG context retrieval
enable_rag = true

# Enable automatic learning from interactions
enable_learning = true

# Number of memories to retrieve for RAG context
rag_limit = 5

# Minimum similarity threshold for RAG retrieval (0.0-1.0)
rag_threshold = 0.5

# Similarity threshold for memory deduplication (0.0-1.0)
merge_threshold = 0.85

# Minimum importance score to keep memories
min_importance = 0.1

# Decay factor for memory importance over time
decay_factor = 0.95

# Enable content encryption
enable_encryption = true

# Path to encryption key file (generated if not exists)
encryption_key_path = "memory_encryption.key"

[memory.embedding]
# Embedding model name
model = "nomic-embed-text"

# Embedding dimension
dimension = 384

# Request timeout in milliseconds
timeout_ms = 30000
```
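Both `rag_threshold` and `merge_threshold` are compared against the cosine similarity of embedding vectors. For intuition, a minimal reference implementation (not the production code path, which runs inside pgvector):

```rust
// Cosine similarity between two embedding vectors: dot product
// divided by the product of the vector norms, in [-1.0, 1.0].
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0 // guard against zero vectors
    } else {
        dot / (norm_a * norm_b)
    }
}
```

A pair of memories scoring above 0.85 here would be merged; a query/memory pair scoring below 0.5 would be excluded from RAG context under the defaults above.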
Memory Decay
Memory importance decays over time using an Ebbinghaus-inspired model with per-type modifiers that ensure critical memories resist forgetting:
```
stability       = 1.0 + ln(1 + access_count)
type_modifier   = memory_type.decay_modifier()
effective_rate  = (base_decay_rate × type_modifier) / stability
reinforcement   = min(access_count × 0.005, 0.08)
new_importance  = max(
    importance × e^(-effective_rate × days) + reinforcement,
    memory_type.importance_floor()
)
```
Type-specific modifiers
| Memory Type | Decay Modifier | Importance Floor | Effect |
|---|---|---|---|
| Correction | 0.50 | 0.35 | Decays very slowly, never drops below 0.35 |
| Fact | 0.70 | 0.30 | Decays slowly, never drops below 0.30 |
| Preference | 0.80 | 0.25 | Moderate decay |
| Tool Result | 1.00 | 0.10 | Normal decay, ephemeral |
| Context | 1.00 | 0.10 | Normal decay, ephemeral |
This mirrors the human brain: corrections and facts are “episodic memories” that the brain retains much longer than transient working-memory items.
Other factors:
- `base_decay_rate`: derived from `decay_factor` (default: 0.95)
- `stability`: grows logarithmically with each access; the first access gives stability ≈ 1.0, ~3 accesses double it, with diminishing returns
- `reinforcement`: a small bonus (up to 0.08) that prevents heavily used memories from vanishing entirely
- `days_since_access`: time elapsed since the memory was last retrieved
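Putting the formula and the per-type table together, the decay update can be sketched as follows (parameter names are illustrative; the real service derives `base_decay_rate` from `decay_factor` internally):

```rust
// Sketch of the Ebbinghaus-style decay update described above.
// `type_modifier` and `floor` come from the memory-type table
// (e.g. Correction: 0.50 / 0.35, Context: 1.00 / 0.10).
fn apply_decay(
    importance: f64,
    access_count: u32,
    days_since_access: f64,
    base_decay_rate: f64,
    type_modifier: f64,
    floor: f64,
) -> f64 {
    // Frequently accessed memories are more stable, so decay slower
    let stability = 1.0 + (1.0 + access_count as f64).ln();
    let effective_rate = (base_decay_rate * type_modifier) / stability;
    // Small capped bonus so heavily used memories never vanish
    let reinforcement = (access_count as f64 * 0.005).min(0.08);
    (importance * (-effective_rate * days_since_access).exp() + reinforcement).max(floor)
}
```

Note how the floor dominates in the long run: given enough elapsed time, a Correction settles at 0.35 while a Context memory falls to 0.10 and becomes eligible for cleanup.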
Memories below `min_importance` are automatically cleaned up.
Security
Content Encryption
All memory content and summaries are encrypted using:
- Algorithm: XChaCha20-Poly1305 (AEAD)
- Key Size: 256 bits
- Nonce Size: 192 bits (unique per encryption)
The encryption key is stored at `encryption_key_path` and auto-generated if missing.
⚠️ Important: Back up your encryption key! Without it, encrypted memories cannot be recovered.
Embedding Vectors
Embedding vectors are stored unencrypted to enable similarity search. They reveal:
- Semantic similarity between memories
- General topic clustering
They do NOT reveal:
- Actual content
- Specific details
Embedding Models
The system supports various Ollama embedding models:
| Model | Dimensions | Use Case |
|---|---|---|
| `nomic-embed-text` | 384 | Default, balanced |
| `mxbai-embed-large` | 1024 | Higher accuracy |
| `bge-m3` | 1024 | Multilingual |
To use a different model:
```toml
[memory.embedding]
model = "mxbai-embed-large"
dimension = 1024
```

Note that changing the dimension means existing embeddings must be regenerated, and the pgvector column (e.g. `vector(384)`) must be updated to match.
Database Schema
Memories are stored in PostgreSQL with pgvector for similarity search:
```sql
-- Main memories table
CREATE TABLE memories (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    conversation_id UUID,
    content TEXT NOT NULL,         -- Encrypted
    summary TEXT NOT NULL,         -- Encrypted
    importance DOUBLE PRECISION NOT NULL,
    memory_type TEXT NOT NULL,
    tags JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    accessed_at TIMESTAMPTZ NOT NULL,
    access_count INTEGER DEFAULT 0,
    embedding vector(384)          -- pgvector column for similarity search
);

-- IVFFlat index for fast cosine similarity search
CREATE INDEX idx_memories_embedding ON memories
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Full-text search index
CREATE INDEX idx_memories_fts ON memories
    USING gin (to_tsvector('english', content || ' ' || summary));
```
Manual Memory Management
You can manually store specific information:
```rust
// Store a fact
memory_service.store_fact(user_id, "User's birthday is March 15", 0.9).await?;

// Store a preference
memory_service.store_preference(user_id, "Prefers metric units", 0.8).await?;

// Store a correction
memory_service.store_correction(user_id, "Actually prefers tea, not coffee", 1.0).await?;
```
Maintenance
Applying Decay
Memory decay runs as an automatic background task (daily by default), with the decay rate controlled by the `decay_factor` setting. You can also trigger it manually:
```rust
let decayed = memory_service.apply_decay().await?;
println!("Decayed {} memories", decayed.len());
```
Or via the REST API:
```bash
curl -X POST -H "Authorization: Bearer $API_KEY" \
    http://localhost:3000/v1/memories/decay
```
Cleaning Up Low-Importance Memories
```rust
let deleted = memory_service.cleanup_low_importance().await?;
println!("Deleted {} memories", deleted);
```
Statistics
```rust
let stats = memory_service.stats(&user_id).await?;
println!("Total: {}, With embeddings: {}, Avg importance: {:.2}",
    stats.total_count, stats.with_embeddings, stats.avg_importance);
```
Troubleshooting
Memories Not Being Retrieved
- Check that `enable_rag = true`
- Verify `rag_threshold` isn’t too high (try 0.3)
- Ensure embeddings are generated (check `with_embeddings` in stats)
- Confirm Ollama is running with the embedding model
High Memory Usage
- Lower `rag_limit` to reduce context size
- Run `cleanup_low_importance()` more frequently
- Increase the `min_importance` threshold
- Reduce `decay_factor` for faster decay
Encryption Key Lost
If you lose the encryption key, encrypted memories cannot be recovered.
To start fresh:
- Delete `memory_encryption.key`
- Clear the `memories` and `memory_embeddings` tables
- A new key will be generated on next startup
Architecture
The memory system follows the ports-and-adapters pattern:
- `MemoryContextPort` — the primary port interface used by `ChatService` to inject RAG context into prompts. Implementations receive a query string and return relevant memory snippets.
- `MemoryService` — the core service that orchestrates embedding generation, semantic search, encryption, and storage. It requires three ports:
  - `MemoryPort` — persistence (PostgreSQL adapter)
  - `EmbeddingPort` — vector generation (Ollama adapter using `nomic-embed-text`)
  - `EncryptionPort` — content encryption (`ChaChaEncryptionAdapter` or `NoOpEncryption`)
```rust
// The MemoryContextPort trait signature
#[async_trait]
pub trait MemoryContextPort: Send + Sync {
    async fn retrieve_context(
        &self,
        user_id: &UserId,
        query: &str,
        limit: usize,
    ) -> Result<Vec<MemoryContext>, MemoryError>;
}
```
API Endpoints
See the API Reference for full REST API documentation covering:
- `GET /v1/memories` — list memories
- `POST /v1/memories` — create a memory
- `GET /v1/memories/search?q=...` — semantic search
- `GET /v1/memories/stats` — storage statistics
- `POST /v1/memories/decay` — trigger decay
- `GET /v1/memories/{id}` — get a specific memory
- `DELETE /v1/memories/{id}` — delete a memory