AI Memory System

PiSovereign includes a persistent AI memory system that enables your assistant to remember facts, preferences, and past interactions. This creates a more personalized and contextually aware experience.

Overview

The memory system provides:

  • Persistent Storage: All interactions can be stored in PostgreSQL with encryption at rest
  • Semantic Search (RAG): Retrieve relevant memories based on meaning, not just keywords
  • Automatic Learning: The AI learns from conversations automatically
  • Memory Decay: Less important or rarely accessed memories fade over time
  • Deduplication: Similar memories are merged to prevent redundancy
  • Content Encryption: Sensitive data is encrypted at rest using XChaCha20-Poly1305

How It Works

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   RAG Retrieval  │────▶│  Context + Query│
│  "What's my     │     │  (Top 5 similar) │     │  sent to LLM    │
│   favorite..."  │     └──────────────────┘     └─────────────────┘
└─────────────────┘              │                        │
                                 │                        ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ Stored Memory   │◀────│  Learning Phase  │◀────│   AI Response   │
│ (Encrypted)     │     │ (Q&A + Metadata) │     │                 │
└─────────────────┘     └──────────────────┘     └─────────────────┘

1. RAG Context Retrieval

When you ask a question:

  1. The query is converted to an embedding vector using nomic-embed-text
  2. Similar memories are found using cosine similarity search
  3. The top N most relevant memories are sorted by type priority (corrections and facts first), then injected into the prompt with a preamble instructing the LLM to treat them as known facts
  4. Full memory content is used (not truncated summaries), within a 2,000-character budget to stay inside the token window
  5. The AI generates a response with full context
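
Step 2 above relies on cosine similarity between embedding vectors. The actual search runs inside pgvector, not in application code, but the computation itself can be sketched in a few lines:

```rust
/// Cosine similarity between two embedding vectors: dot(a, b) / (|a| · |b|).
/// Returns a value in [-1.0, 1.0]; 1.0 means the vectors point the same way.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0 // degenerate zero vector: treat as "no similarity"
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    // Identical vectors score 1.0; orthogonal vectors score 0.0.
    println!("{}", cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]));
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0]));
}
```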

2. Automatic Learning

After each response (including streamed responses):

  1. The Q&A pair is evaluated for importance using lightweight heuristics (no LLM call):
    • AI naming cues (“nenn dich”, “your name is”, “du heißt”) → +0.40
    • Identity cues (“my name is”, “I live in”, “ich heiße”) → +0.35
    • Correction cues (“that’s wrong”, “please remember”, “eigentlich”) → +0.30
    • Preference cues (“I prefer”, “I like”, “ich mag”) → +0.25
    • Word count adjustments (longer = more valuable)
    • Final score clamped to [0.2, 0.9]
  2. The memory type is automatically classified (priority order):
    • AI naming signals → Fact
    • Correction signals → Correction
    • Preference signals → Preference
    • Identity/fact signals → Fact
    • Default → Context
  3. Embeddings are generated for semantic search
  4. If a similar memory exists (>85% similarity), they’re merged (on plaintext, before encryption)
  5. Content is encrypted before storage

Note: Both the HTTP chat endpoint (ChatService) and the messenger path (MemoryEnhancedChat) use the same shared heuristic module (importance.rs) for consistent importance estimation and type classification.

3. Memory Types

Type         Purpose                 Example
Fact         General knowledge       “Paris is the capital of France”
Preference   User preferences        “User prefers dark mode”
Correction   Feedback/corrections    “Actually, the meeting is Tuesday not Monday”
ToolResult   API/tool outputs        “Weather API returned: 22°C, sunny”
Context      Conversation context    “Q: What time is it? A: 3:00 PM”

4. Relevance Scoring

When memories are retrieved for RAG context, they are ranked using a combined relevance score that balances three factors:

relevance_score = similarity × 0.50  +  importance × 0.20  +  freshness × 0.30

Where:

  • similarity (50%): Cosine similarity between query and memory embeddings (0.0–1.0)
  • importance (20%): Current importance after decay (0.0–1.0), with per-type floors
  • freshness (30%): Exponential decay based on time since last access: e^(-0.01 × hours). Memories from seconds ago score ~1.0, from one day ago ~0.79, from one week ago ~0.19.

This ensures that memories from the current conversation session (stored moments ago) dominate when relevant, while long-term knowledge still contributes via the similarity and importance terms.
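
The weighted combination above is straightforward to express directly:

```rust
/// Combined relevance score used to rank memories for RAG context:
/// 50% similarity, 20% importance, 30% freshness, where
/// freshness = e^(-0.01 × hours_since_access).
fn relevance_score(similarity: f64, importance: f64, hours_since_access: f64) -> f64 {
    let freshness = (-0.01 * hours_since_access).exp();
    similarity * 0.50 + importance * 0.20 + freshness * 0.30
}

fn main() {
    // A moderately similar memory stored seconds ago...
    let fresh = relevance_score(0.6, 0.5, 0.0);
    // ...outranks a slightly more similar one from a week ago.
    let old = relevance_score(0.7, 0.5, 168.0);
    println!("fresh = {:.3}, old = {:.3}", fresh, old);
}
```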

After scoring, memories are sorted by type priority before injection:

  1. Corrections (highest priority — user explicitly corrected the AI)
  2. Facts (identity, names, important knowledge)
  3. Preferences
  4. Context
  5. Tool Results
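
One way to sketch this priority ordering is an enum whose declaration order matches the list above; the real code may well use an explicit priority value instead, so treat the derived ordering as illustrative.

```rust
/// Memory types in injection priority order. Deriving Ord means variants
/// declared earlier compare as smaller, so a plain sort puts corrections
/// first and tool results last.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum MemoryType {
    Correction,
    Fact,
    Preference,
    Context,
    ToolResult,
}

fn main() {
    let mut retrieved = vec![
        MemoryType::Context,
        MemoryType::Fact,
        MemoryType::ToolResult,
        MemoryType::Correction,
    ];
    retrieved.sort();
    println!("{:?}", retrieved); // corrections first, tool results last
}
```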

Configuration

Add to your config.toml:

[memory]
# Enable memory storage
enabled = true

# Enable RAG context retrieval
enable_rag = true

# Enable automatic learning from interactions
enable_learning = true

# Number of memories to retrieve for RAG context
rag_limit = 5

# Minimum similarity threshold for RAG retrieval (0.0-1.0)
rag_threshold = 0.5

# Similarity threshold for memory deduplication (0.0-1.0)
merge_threshold = 0.85

# Minimum importance score to keep memories
min_importance = 0.1

# Decay factor for memory importance over time
decay_factor = 0.95

# Enable content encryption
enable_encryption = true

# Path to encryption key file (generated if not exists)
encryption_key_path = "memory_encryption.key"

[memory.embedding]
# Embedding model name
model = "nomic-embed-text"

# Embedding dimension
dimension = 384

# Request timeout in milliseconds
timeout_ms = 30000
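
For orientation, the [memory] table above might map onto a config struct like the following. Field names mirror the TOML keys; the struct name is illustrative, and actual deserialization would presumably go through serde, which is omitted here.

```rust
/// Illustrative mirror of the [memory] TOML table; defaults match the
/// sample config above. Struct name and Default impl are assumptions.
#[derive(Debug, Clone)]
struct MemoryConfig {
    enabled: bool,
    enable_rag: bool,
    enable_learning: bool,
    rag_limit: usize,
    rag_threshold: f64,
    merge_threshold: f64,
    min_importance: f64,
    decay_factor: f64,
    enable_encryption: bool,
    encryption_key_path: String,
}

impl Default for MemoryConfig {
    fn default() -> Self {
        Self {
            enabled: true,
            enable_rag: true,
            enable_learning: true,
            rag_limit: 5,
            rag_threshold: 0.5,
            merge_threshold: 0.85,
            min_importance: 0.1,
            decay_factor: 0.95,
            enable_encryption: true,
            encryption_key_path: "memory_encryption.key".to_string(),
        }
    }
}
```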

Memory Decay

Memory importance decays over time using an Ebbinghaus-inspired model with per-type modifiers that ensure critical memories resist forgetting:

stability      = 1.0 + ln(1 + access_count)
type_modifier  = memory_type.decay_modifier()
effective_rate = (base_decay_rate × type_modifier) / stability
reinforcement  = min(access_count × 0.005, 0.08)

new_importance = max(
    importance × e^(-effective_rate × days) + reinforcement,
    memory_type.importance_floor()
)

Type-specific modifiers

Memory Type   Decay Modifier   Importance Floor   Effect
Correction    0.50             0.35               Decays very slowly, never drops below 0.35
Fact          0.70             0.30               Decays slowly, never drops below 0.30
Preference    0.80             0.25               Moderate decay
Tool Result   1.00             0.10               Normal decay, ephemeral
Context       1.00             0.10               Normal decay, ephemeral

This mirrors the human brain: corrections and facts are “episodic memories” that the brain retains much longer than transient working-memory items.

Other factors:

  • base_decay_rate: Derived from decay_factor (default: 0.95)
  • stability: Grows logarithmically with access count — a never-accessed memory has stability 1.0, one access raises it to ≈ 1.69, about two accesses double it, and further accesses have diminishing returns
  • reinforcement: A small bonus (up to 0.08) that prevents heavily-used memories from vanishing entirely
  • days_since_access: Time elapsed since the memory was last retrieved

Memories below min_importance are automatically cleaned up.
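
Putting the formula and the per-type table together, a single decay step can be sketched as below. The base rate of 0.05 in the example is an assumed value standing in for whatever is derived from decay_factor.

```rust
/// Sketch of the Ebbinghaus-style decay step for one memory, using the
/// per-type modifiers and floors from the table above.
fn apply_decay(
    importance: f64,
    access_count: u32,
    days_since_access: f64,
    base_decay_rate: f64,  // assumed example value; derived from decay_factor
    type_modifier: f64,    // e.g. 0.5 for Correction, 1.0 for Context
    importance_floor: f64, // e.g. 0.35 for Correction, 0.10 for Context
) -> f64 {
    let stability = 1.0 + (1.0 + access_count as f64).ln();
    let effective_rate = (base_decay_rate * type_modifier) / stability;
    let reinforcement = (access_count as f64 * 0.005).min(0.08);
    let decayed = importance * (-effective_rate * days_since_access).exp() + reinforcement;
    decayed.max(importance_floor)
}

fn main() {
    // A correction (modifier 0.5, floor 0.35) vs. plain context (1.0, 0.10),
    // both starting at importance 0.8, never re-accessed, after 30 days.
    let correction = apply_decay(0.8, 0, 30.0, 0.05, 0.5, 0.35);
    let context = apply_decay(0.8, 0, 30.0, 0.05, 1.0, 0.10);
    println!("correction = {:.3}, context = {:.3}", correction, context);
}
```

The correction retains noticeably more importance than the context memory over the same interval, which is exactly the effect the modifiers and floors are designed to produce.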

Security

Content Encryption

All memory content and summaries are encrypted using:

  • Algorithm: XChaCha20-Poly1305 (AEAD)
  • Key Size: 256 bits
  • Nonce Size: 192 bits (unique per encryption)

The encryption key is stored at encryption_key_path and auto-generated if missing.

⚠️ Important: Backup your encryption key! Without it, encrypted memories cannot be recovered.

Embedding Vectors

Embedding vectors are stored unencrypted to enable similarity search. They reveal:

  • Semantic similarity between memories
  • General topic clustering

They do NOT reveal:

  • Actual content
  • Specific details

Embedding Models

The system supports various Ollama embedding models:

Model               Dimensions   Use Case
nomic-embed-text    384          Default, balanced
mxbai-embed-large   1024         Higher accuracy
bge-m3              1024         Multilingual

To use a different model:

[memory.embedding]
model = "mxbai-embed-large"
dimension = 1024

Database Schema

Memories are stored in PostgreSQL with pgvector for similarity search:

-- Main memories table
CREATE TABLE memories (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    conversation_id UUID,
    content TEXT NOT NULL,      -- Encrypted
    summary TEXT NOT NULL,       -- Encrypted
    importance DOUBLE PRECISION NOT NULL,
    memory_type TEXT NOT NULL,
    tags JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    accessed_at TIMESTAMPTZ NOT NULL,
    access_count INTEGER DEFAULT 0,
    embedding vector(384)        -- pgvector column for similarity search
);

-- IVFFlat index for fast cosine similarity search
CREATE INDEX idx_memories_embedding ON memories
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- Full-text search index
CREATE INDEX idx_memories_fts ON memories
    USING gin (to_tsvector('english', content || ' ' || summary));
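
Against this schema, a similarity search can be issued through pgvector's cosine-distance operator. The query builder below is illustrative (the real adapter presumably binds parameters through its database layer rather than building strings):

```rust
/// Build an illustrative pgvector similarity query. `<=>` is pgvector's
/// cosine-distance operator, so similarity = 1 - distance; ordering by
/// distance ascending returns the most similar memories first.
fn similarity_query(limit: usize) -> String {
    format!(
        "SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity \
         FROM memories WHERE user_id = $2 \
         ORDER BY embedding <=> $1::vector LIMIT {}",
        limit
    )
}

fn main() {
    println!("{}", similarity_query(5));
}
```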

Manual Memory Management

You can manually store specific information:

// Store a fact
memory_service.store_fact(user_id, "User's birthday is March 15", 0.9).await?;

// Store a preference
memory_service.store_preference(user_id, "Prefers metric units", 0.8).await?;

// Store a correction
memory_service.store_correction(user_id, "Actually prefers tea, not coffee", 1.0).await?;

Maintenance

Applying Decay

Memory decay runs as an automatic background task (daily by default). You can also trigger it manually:

let decayed = memory_service.apply_decay().await?;
println!("Decayed {} memories", decayed.len());

Or via the REST API:

curl -X POST -H "Authorization: Bearer $API_KEY" \
  http://localhost:3000/v1/memories/decay

Cleaning Up Low-Importance Memories

let deleted = memory_service.cleanup_low_importance().await?;
println!("Deleted {} memories", deleted);

Statistics

let stats = memory_service.stats(&user_id).await?;
println!("Total: {}, With embeddings: {}, Avg importance: {:.2}",
    stats.total_count, stats.with_embeddings, stats.avg_importance);

Troubleshooting

Memories Not Being Retrieved

  1. Check that enable_rag = true
  2. Verify rag_threshold isn’t too high (try 0.3)
  3. Ensure embeddings are generated (check with_embeddings in stats)
  4. Confirm Ollama is running with the embedding model

High Memory Usage

  1. Lower rag_limit to reduce context size
  2. Run cleanup_low_importance() more frequently
  3. Increase min_importance threshold
  4. Reduce decay_factor for faster decay

Encryption Key Lost

If you lose the encryption key, encrypted memories cannot be recovered.

To start fresh:

  1. Delete memory_encryption.key
  2. Clear the memories and memory_embeddings tables
  3. A new key will be generated on next startup

Architecture

The memory system follows the ports-and-adapters pattern:

  • MemoryContextPort — the primary port interface used by ChatService to inject RAG context into prompts. Implementations receive a query string and return relevant memory snippets.
  • MemoryService — the core service that orchestrates embedding generation, semantic search, encryption, and storage. Requires three ports:
    • MemoryPort — persistence (PostgreSQL adapter)
    • EmbeddingPort — vector generation (Ollama adapter using nomic-embed-text)
    • EncryptionPort — content encryption (ChaChaEncryptionAdapter or NoOpEncryption)

// The MemoryContextPort trait signature
#[async_trait]
pub trait MemoryContextPort: Send + Sync {
    async fn retrieve_context(
        &self,
        user_id: &UserId,
        query: &str,
        limit: usize,
    ) -> Result<Vec<MemoryContext>, MemoryError>;
}
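
To illustrate the adapter swap, here is a deliberately simplified, synchronous sketch of an encryption port with a no-op implementation. The real EncryptionPort is async and returns Results; only the ports-and-adapters shape is shown here.

```rust
/// Simplified, synchronous stand-in for the encryption port; the real
/// trait is async and fallible. Shown only to illustrate adapter swapping.
trait EncryptionPortSketch {
    fn encrypt(&self, plaintext: &str) -> String;
    fn decrypt(&self, ciphertext: &str) -> String;
}

/// Pass-through adapter, analogous to NoOpEncryption when
/// enable_encryption = false in the config.
struct NoOpEncryption;

impl EncryptionPortSketch for NoOpEncryption {
    fn encrypt(&self, plaintext: &str) -> String {
        plaintext.to_string()
    }
    fn decrypt(&self, ciphertext: &str) -> String {
        ciphertext.to_string()
    }
}

fn main() {
    // Services depend on the trait object, so adapters are interchangeable:
    // swapping in a ChaCha-based adapter requires no service changes.
    let enc: Box<dyn EncryptionPortSketch> = Box::new(NoOpEncryption);
    let stored = enc.encrypt("User prefers dark mode");
    assert_eq!(enc.decrypt(&stored), "User prefers dark mode");
}
```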

API Endpoints

See the API Reference for full REST API documentation covering:

  • GET /v1/memories — list memories
  • POST /v1/memories — create a memory
  • GET /v1/memories/search?q=... — semantic search
  • GET /v1/memories/stats — storage statistics
  • POST /v1/memories/decay — trigger decay
  • GET /v1/memories/{id} — get specific memory
  • DELETE /v1/memories/{id} — delete memory