AI Memory System

PiSovereign includes a persistent AI memory system that enables your assistant to remember facts, preferences, and past interactions. This creates a more personalized and contextually aware experience.

Overview

The memory system provides:

  • Persistent Storage: All interactions can be stored in PostgreSQL with encryption at rest
  • Semantic Search (RAG): Retrieve relevant memories based on meaning, not just keywords
  • Automatic Learning: The AI learns from conversations automatically
  • Memory Decay: Less important or rarely accessed memories fade over time
  • Deduplication: Similar memories are merged to prevent redundancy
  • Content Encryption: Sensitive data is encrypted at rest using XChaCha20-Poly1305

How It Works

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   RAG Retrieval  │────▶│  Context + Query│
│  "What's my     │     │  (Top 5 similar) │     │  sent to LLM    │
│   favorite..."  │     └──────────────────┘     └─────────────────┘
└─────────────────┘              │                        │
                                 │                        ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ Stored Memory   │◀────│  Learning Phase  │◀────│   AI Response   │
│ (Encrypted)     │     │ (Q&A + Metadata) │     │                 │
└─────────────────┘     └──────────────────┘     └─────────────────┘

1. RAG Context Retrieval

When you ask a question:

  1. The query is converted to an embedding vector using nomic-embed-text
  2. Similar memories are found using cosine similarity search
  3. The top N most relevant memories are sorted by type priority (corrections and facts first), then injected into the prompt with a preamble instructing the LLM to treat them as known facts
  4. Full memory content is used (not truncated summaries), within a 2,000-character budget to stay inside the token window
  5. The AI generates a response with full context
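
Step 2 above relies on cosine similarity between embedding vectors. The actual search runs inside pgvector, not in application code, but the computation itself can be sketched in a few lines:

```rust
/// Cosine similarity between two embedding vectors: dot(a, b) / (|a| · |b|).
/// Returns a value in [-1.0, 1.0]; 1.0 means the vectors point the same way.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0 // degenerate zero vector: treat as "no similarity"
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    println!("{}", cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]));
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```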

2. Automatic Learning

After each response (including streamed responses):

  1. The Q&A pair is evaluated for importance using lightweight heuristics (no LLM call):
    • AI naming cues (“nenn dich”, “your name is”, “du heißt”) → +0.40
    • Identity cues (“my name is”, “I live in”, “ich heiße”) → +0.35
    • Correction cues (“that’s wrong”, “please remember”, “eigentlich”) → +0.30
    • Preference cues (“I prefer”, “I like”, “ich mag”) → +0.25
    • Word count adjustments (longer = more valuable)
    • Final score clamped to [0.2, 0.9]
  2. The memory type is automatically classified (priority order):
    • AI naming signals → Fact
    • Correction signals → Correction
    • Preference signals → Preference
    • Identity/fact signals → Fact
    • Default → Context
  3. Embeddings are generated for semantic search
  4. If a similar memory exists (>85% similarity), they’re merged (on plaintext, before encryption)
  5. Content is encrypted before storage

Note: Both the HTTP chat endpoint (ChatService) and the messenger path (MemoryEnhancedChat) use the same shared heuristic module (importance.rs) for consistent importance estimation and type classification.

3. Memory Types

Type         Purpose                 Example
Fact         General knowledge       “Paris is the capital of France”
Preference   User preferences        “User prefers dark mode”
Correction   Feedback/corrections    “Actually, the meeting is Tuesday not Monday”
ToolResult   API/tool outputs        “Weather API returned: 22°C, sunny”
Context      Conversation context    “Q: What time is it? A: 3:00 PM”

4. Relevance Scoring

When memories are retrieved for RAG context, they are ranked using a combined relevance score that balances three factors:

relevance_score = similarity × 0.50  +  importance × 0.20  +  freshness × 0.30

Where:

  • similarity (50%): Cosine similarity between query and memory embeddings (0.0–1.0)
  • importance (20%): Current importance after decay (0.0–1.0), with per-type floors
  • freshness (30%): Exponential decay based on time since last access: e^(-0.01 × hours). Memories from seconds ago score ~1.0, from one day ago ~0.79, from one week ago ~0.19.

This ensures that memories from the current conversation session (stored moments ago) dominate when relevant, while long-term knowledge still contributes via the similarity and importance terms.
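
The weighted combination above is straightforward to express directly:

```rust
/// Combined relevance score used to rank memories for RAG context:
/// 50% similarity, 20% importance, 30% freshness, where
/// freshness = e^(-0.01 × hours_since_access).
fn relevance_score(similarity: f64, importance: f64, hours_since_access: f64) -> f64 {
    let freshness = (-0.01 * hours_since_access).exp();
    similarity * 0.50 + importance * 0.20 + freshness * 0.30
}

fn main() {
    // A moderately similar memory stored seconds ago...
    let fresh = relevance_score(0.6, 0.5, 0.0);
    // ...outranks a slightly more similar one from a week ago.
    let old = relevance_score(0.7, 0.5, 168.0);
    println!("fresh = {:.3}, old = {:.3}", fresh, old);
}
```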

After scoring, memories are sorted by type priority before injection:

  1. Corrections (highest priority — user explicitly corrected the AI)
  2. Facts (identity, names, important knowledge)
  3. Preferences
  4. Context
  5. Tool Results
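
One way to sketch this priority ordering is an enum whose declaration order matches the list above; the real code may well use an explicit priority value instead, so treat the derived ordering as illustrative.

```rust
/// Memory types in injection priority order. Deriving Ord means variants
/// declared earlier compare as smaller, so a plain sort puts corrections
/// first and tool results last.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum MemoryType {
    Correction,
    Fact,
    Preference,
    Context,
    ToolResult,
}

fn main() {
    let mut retrieved = vec![
        MemoryType::Context,
        MemoryType::Fact,
        MemoryType::ToolResult,
        MemoryType::Correction,
    ];
    retrieved.sort();
    println!("{:?}", retrieved); // corrections first, tool results last
}
```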

Configuration

Add to your config.toml:

[memory]
# Enable memory storage
enabled = true

# Enable RAG context retrieval
enable_rag = true

# Enable automatic learning from interactions
enable_learning = true

# Number of memories to retrieve for RAG context
rag_limit = 5

# Minimum similarity threshold for RAG retrieval (0.0-1.0)
rag_threshold = 0.5

# Similarity threshold for memory deduplication (0.0-1.0)
merge_threshold = 0.85

# Minimum importance score to keep memories
min_importance = 0.1

# Decay factor for memory importance over time
decay_factor = 0.95

# Enable content encryption
enable_encryption = true

# Path to encryption key file (generated if not exists)
encryption_key_path = "memory_encryption.key"

[memory.embedding]
# Embedding model name
model = "nomic-embed-text"

# Embedding dimension
dimension = 384

# Request timeout in milliseconds
timeout_ms = 30000
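
For orientation, the [memory] table above might map onto a config struct like the following. Field names mirror the TOML keys; the struct name is illustrative, and actual deserialization would presumably go through serde, which is omitted here.

```rust
/// Illustrative mirror of the [memory] TOML table; defaults match the
/// sample config above. Struct name and Default impl are assumptions.
#[derive(Debug, Clone)]
struct MemoryConfig {
    enabled: bool,
    enable_rag: bool,
    enable_learning: bool,
    rag_limit: usize,
    rag_threshold: f64,
    merge_threshold: f64,
    min_importance: f64,
    decay_factor: f64,
    enable_encryption: bool,
    encryption_key_path: String,
}

impl Default for MemoryConfig {
    fn default() -> Self {
        Self {
            enabled: true,
            enable_rag: true,
            enable_learning: true,
            rag_limit: 5,
            rag_threshold: 0.5,
            merge_threshold: 0.85,
            min_importance: 0.1,
            decay_factor: 0.95,
            enable_encryption: true,
            encryption_key_path: "memory_encryption.key".to_string(),
        }
    }
}
```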

Memory Decay

Memory importance decays over time using an Ebbinghaus-inspired model with per-type modifiers that ensure critical memories resist forgetting:

stability      = 1.0 + ln(1 + access_count)
type_modifier  = memory_type.decay_modifier()
effective_rate = (base_decay_rate × type_modifier) / stability
reinforcement  = min(access_count × 0.005, 0.08)

new_importance = max(
    importance × e^(-effective_rate × days) + reinforcement,
    memory_type.importance_floor()
)

Type-specific modifiers

Memory Type   Decay Modifier   Importance Floor   Effect
Correction    0.50             0.35               Decays very slowly, never drops below 0.35
Fact          0.70             0.30               Decays slowly, never drops below 0.30
Preference    0.80             0.25               Moderate decay
Tool Result   1.00             0.10               Normal decay, ephemeral
Context       1.00             0.10               Normal decay, ephemeral

This mirrors the human brain: corrections and facts are “episodic memories” that the brain retains much longer than transient working-memory items.

Other factors:

  • base_decay_rate: Derived from decay_factor (default: 0.95)
  • stability: Grows logarithmically with access count — a never-accessed memory has stability 1.0, one access raises it to ≈ 1.69, about two accesses double it, and further accesses have diminishing returns
  • reinforcement: A small bonus (up to 0.08) that prevents heavily-used memories from vanishing entirely
  • days_since_access: Time elapsed since the memory was last retrieved

Memories below min_importance are automatically cleaned up.
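
Putting the formula and the per-type table together, a single decay step can be sketched as below. The base rate of 0.05 in the example is an assumed value standing in for whatever is derived from decay_factor.

```rust
/// Sketch of the Ebbinghaus-style decay step for one memory, using the
/// per-type modifiers and floors from the table above.
fn apply_decay(
    importance: f64,
    access_count: u32,
    days_since_access: f64,
    base_decay_rate: f64,  // assumed example value; derived from decay_factor
    type_modifier: f64,    // e.g. 0.5 for Correction, 1.0 for Context
    importance_floor: f64, // e.g. 0.35 for Correction, 0.10 for Context
) -> f64 {
    let stability = 1.0 + (1.0 + access_count as f64).ln();
    let effective_rate = (base_decay_rate * type_modifier) / stability;
    let reinforcement = (access_count as f64 * 0.005).min(0.08);
    let decayed = importance * (-effective_rate * days_since_access).exp() + reinforcement;
    decayed.max(importance_floor)
}

fn main() {
    // A correction (modifier 0.5, floor 0.35) vs. plain context (1.0, 0.10),
    // both starting at importance 0.8, never re-accessed, after 30 days.
    let correction = apply_decay(0.8, 0, 30.0, 0.05, 0.5, 0.35);
    let context = apply_decay(0.8, 0, 30.0, 0.05, 1.0, 0.10);
    println!("correction = {:.3}, context = {:.3}", correction, context);
}
```

The correction retains noticeably more importance than the context memory over the same interval, which is exactly the effect the modifiers and floors are designed to produce.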

Security

Content Encryption

All memory content and summaries are encrypted using:

  • Algorithm: XChaCha20-Poly1305 (AEAD)
  • Key Size: 256 bits
  • Nonce Size: 192 bits (unique per encryption)

The encryption key is stored at encryption_key_path and auto-generated if missing.

⚠️ Important: Backup your encryption key! Without it, encrypted memories cannot be recovered.

Embedding Vectors

Embedding vectors are stored unencrypted to enable similarity search. They reveal:

  • Semantic similarity between memories
  • General topic clustering

They do NOT reveal:

  • Actual content
  • Specific details

Embedding Models

The system supports various Ollama embedding models:

Model               Dimensions   Use Case
nomic-embed-text    384          Default, balanced
mxbai-embed-large   1024         Higher accuracy
bge-m3              1024         Multilingual

To use a different model:

[memory.embedding]
model = "mxbai-embed-large"
dimension = 1024

Database Schema

Memories are stored in PostgreSQL with pgvector for similarity search:

-- Main memories table
CREATE TABLE memories (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    conversation_id UUID,
    content TEXT NOT NULL,      -- Encrypted
    summary TEXT NOT NULL,       -- Encrypted
    importance DOUBLE PRECISION NOT NULL,
    memory_type TEXT NOT NULL,
    tags JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    accessed_at TIMESTAMPTZ NOT NULL,
    access_count INTEGER DEFAULT 0,
    embedding vector(384)        -- pgvector column for similarity search
);

-- IVFFlat index for fast cosine similarity search
CREATE INDEX idx_memories_embedding ON memories
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Full-text search index
CREATE INDEX idx_memories_fts ON memories
    USING gin (to_tsvector('english', content || ' ' || summary));
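
Against this schema, a similarity search can be issued through pgvector's cosine-distance operator. The query builder below is illustrative (the real adapter presumably binds parameters through its database layer rather than building strings):

```rust
/// Build an illustrative pgvector similarity query. `<=>` is pgvector's
/// cosine-distance operator, so similarity = 1 - distance; ordering by
/// distance ascending returns the most similar memories first.
fn similarity_query(limit: usize) -> String {
    format!(
        "SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity \
         FROM memories WHERE user_id = $2 \
         ORDER BY embedding <=> $1::vector LIMIT {}",
        limit
    )
}

fn main() {
    println!("{}", similarity_query(5));
}
```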

Manual Memory Management

You can manually store specific information:

// Store a fact
memory_service.store_fact(user_id, "User's birthday is March 15", 0.9).await?;

// Store a preference
memory_service.store_preference(user_id, "Prefers metric units", 0.8).await?;

// Store a correction
memory_service.store_correction(user_id, "Actually prefers tea, not coffee", 1.0).await?;

Maintenance

Applying Decay

Memory decay runs as an automatic background task (daily by default). You can also trigger it manually:

let decayed = memory_service.apply_decay().await?;
println!("Decayed {} memories", decayed.len());

Or via the REST API:

curl -X POST -H "Authorization: Bearer $API_KEY" \
  http://localhost:3000/v1/memories/decay

Cleaning Up Low-Importance Memories

let deleted = memory_service.cleanup_low_importance().await?;
println!("Deleted {} memories", deleted);

Statistics

let stats = memory_service.stats(&user_id).await?;
println!("Total: {}, With embeddings: {}, Avg importance: {:.2}",
    stats.total_count, stats.with_embeddings, stats.avg_importance);

Troubleshooting

Memories Not Being Retrieved

  1. Check that enable_rag = true
  2. Verify rag_threshold isn’t too high (try 0.3)
  3. Ensure embeddings are generated (check with_embeddings in stats)
  4. Confirm Ollama is running with the embedding model

High Memory Usage

  1. Lower rag_limit to reduce context size
  2. Run cleanup_low_importance() more frequently
  3. Increase min_importance threshold
  4. Reduce decay_factor for faster decay

Encryption Key Lost

If you lose the encryption key, encrypted memories cannot be recovered.

To start fresh:

  1. Delete memory_encryption.key
  2. Clear the memories and memory_embeddings tables
  3. A new key will be generated on next startup

Architecture

The memory system follows the ports-and-adapters pattern:

  • MemoryContextPort — the primary port interface used by ChatService to inject RAG context into prompts. Implementations receive a query string and return relevant memory snippets.
  • MemoryService — the core service that orchestrates embedding generation, semantic search, encryption, and storage. Requires three ports:
    • MemoryPort — persistence (PostgreSQL adapter)
    • EmbeddingPort — vector generation (Ollama adapter using nomic-embed-text)
    • EncryptionPort — content encryption (ChaChaEncryptionAdapter or NoOpEncryption)

// The MemoryContextPort trait signature
#[async_trait]
pub trait MemoryContextPort: Send + Sync {
    async fn retrieve_context(
        &self,
        user_id: &UserId,
        query: &str,
        limit: usize,
    ) -> Result<Vec<MemoryContext>, MemoryError>;
}
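
To illustrate the adapter swap, here is a deliberately simplified, synchronous sketch of an encryption port with a no-op implementation. The real EncryptionPort is async and returns Results; only the ports-and-adapters shape is shown here.

```rust
/// Simplified, synchronous stand-in for the encryption port; the real
/// trait is async and fallible. Shown only to illustrate adapter swapping.
trait EncryptionPortSketch {
    fn encrypt(&self, plaintext: &str) -> String;
    fn decrypt(&self, ciphertext: &str) -> String;
}

/// Pass-through adapter, analogous to NoOpEncryption when
/// enable_encryption = false in the config.
struct NoOpEncryption;

impl EncryptionPortSketch for NoOpEncryption {
    fn encrypt(&self, plaintext: &str) -> String {
        plaintext.to_string()
    }
    fn decrypt(&self, ciphertext: &str) -> String {
        ciphertext.to_string()
    }
}

fn main() {
    // Services depend on the trait object, so adapters are interchangeable:
    // swapping in a ChaCha-based adapter requires no service changes.
    let enc: Box<dyn EncryptionPortSketch> = Box::new(NoOpEncryption);
    let stored = enc.encrypt("User prefers dark mode");
    assert_eq!(enc.decrypt(&stored), "User prefers dark mode");
}
```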

API Endpoints

See the API Reference for full REST API documentation covering:

  • GET /v1/memories — list memories
  • POST /v1/memories — create a memory
  • GET /v1/memories/search?q=... — semantic search
  • GET /v1/memories/stats — storage statistics
  • POST /v1/memories/decay — trigger decay
  • GET /v1/memories/{id} — get specific memory
  • DELETE /v1/memories/{id} — delete memory