
EverMemory: Episodic Memory RAG for Believable NPCs

Tags: LLM, RAG, Embeddings, Elasticsearch, Python

EverMemory is the memory backbone of the Ever project suite — a set of systems for building persistent-memory D&D NPCs. It provides episodic memory retrieval with emotional salience weighting and temporal awareness. See also: EverTavern (the multi-agent NPC system) and EverTraining (fine-tuning on fantasy dialog).

Motivation

A believable NPC must recall memories appropriate to their current point in the narrative. Emotionally significant events — a betrayal, a rescue, a declaration of love — should be recalled more vividly and more readily than routine interactions. Base LLMs have no native long-term memory: without retrieval augmentation, an NPC forgets everything between context windows.

Four Approaches Compared

We benchmark four retrieval strategies using A Princess of Mars (Edgar Rice Burroughs) as a test corpus — 638 narrative events spanning 28 chapters, condensed into 34 episodes.

| Approach | Retrieval Method | Temporal Awareness |
| --- | --- | --- |
| Baseline | No retrieval; raw LLM knowledge | None — always “knows” the full story |
| Static RAG | ~300-token chunks, top-k cosine similarity | None — retrieves by relevance only |
| GraphRAG | Entity/relationship extraction, NetworkX graph, Louvain community detection, kNN on embeddings | None — graph is a snapshot |
| Episodic Memory | Scene-bounded episodes with salience weighting, temporal filtering, and cognitive appraisals | Yes — filters by sequence number |

Episode Construction

Raw narrative events are segmented into episodes using boundary detection:

  • Scene transition patterns (regex): phrases like “you arrive”, “the next morning”, “hours later”
  • Time gaps: >10 minutes between events triggers a new episode
  • Participant shifts: less than 30% entity overlap with recent events signals a scene change
  • Size cap: maximum 25 events per episode
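The boundary rules above compose into a single predicate. Here is a minimal sketch; the event schema (`text`, `timestamp`, `entities`) and the recent-events window are illustrative assumptions, and the transition phrases are just the three examples from the list:

```python
import re

# Scene-transition phrases -- the post's three examples; the real pattern set
# is presumably larger (an assumption).
TRANSITIONS = re.compile(r"\b(you arrive|the next morning|hours later)\b", re.IGNORECASE)

MAX_EVENTS = 25          # size cap per episode
TIME_GAP_SECONDS = 600   # >10 minutes between events triggers a new episode
MIN_OVERLAP = 0.3        # <30% entity overlap signals a scene change

def is_boundary(event, episode_events):
    """Return True if `event` should start a new episode.

    `event` is a dict with `text`, `timestamp` (seconds), and `entities`
    (a set of entity IDs); these field names are illustrative, not the
    project's actual schema.
    """
    if not episode_events:
        return False
    if len(episode_events) >= MAX_EVENTS:
        return True
    if TRANSITIONS.search(event["text"]):
        return True
    if event["timestamp"] - episode_events[-1]["timestamp"] > TIME_GAP_SECONDS:
        return True
    # Jaccard overlap between this event's entities and recent participants.
    recent = set().union(*(e["entities"] for e in episode_events[-5:]))
    if recent:
        overlap = len(event["entities"] & recent) / len(event["entities"] | recent)
        if overlap < MIN_OVERLAP:
            return True
    return False
```

The checks are ordered cheapest-first; any single trigger closes the current episode.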

Once a boundary is detected, two LLM calls extract structured metadata:

Call 1 — Episode metadata (GPT-4o, JSON output):

  • Title, gist (1-2 sentences), detail (2-4 sentences, first-person)
  • Location, participants (hashed to entity IDs via spaCy NER)
  • Arousal (0-1), valence (-1 to +1), emotional tags, themes

Call 2 — Cognitive appraisal (following Lazarus’s appraisal theory):

  • Primary appraisal: relevance (irrelevant/benign/stressful), goal congruence (-1 to +1)
  • Secondary appraisal: coping potential (high/moderate/low/helpless), coping strategy
  • Causal attribution, norm compatibility, beliefs formed
  • State deltas: relationship direction changes, belief evolution, knowledge gained

Both outputs are embedded with text-embedding-3-large (OpenAI) and stored in Elasticsearch.
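A plausible Elasticsearch mapping for such an index might look like the sketch below. The field names are illustrative guesses; only the vector settings follow from the post (text-embedding-3-large emits 3072-dimensional vectors, and retrieval compares them by cosine similarity):

```python
# Hypothetical mapping for the episode index -- field names are assumptions,
# not the project's actual schema.
EPISODE_MAPPING = {
    "mappings": {
        "properties": {
            "title":        {"type": "text"},
            "gist":         {"type": "text"},
            "detail":       {"type": "text"},
            "participants": {"type": "keyword"},  # hashed entity IDs
            "arousal":      {"type": "float"},
            "valence":      {"type": "float"},
            "salience":     {"type": "float"},
            "sequence":     {"type": "integer"},  # narrative order, for temporal filtering
            "gist_embedding": {
                "type": "dense_vector",
                "dims": 3072,            # text-embedding-3-large output size
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}
```

Keeping `salience` and `sequence` as indexed scalar fields is what lets the retrieval modes below filter and re-score without loading full documents.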

Salience Dynamics

Each episode receives an initial salience score:

salience = 0.4 * arousal + 0.2 * |valence| + 0.2 * novelty + 0.2 * personal_relevance

Where novelty = 1 - max cosine similarity to the 5 most recent episodes, and personal relevance = 1.0 if the NPC is a participant, 0.3 otherwise. An inhibitory suppression effect (Richter-Levin & Akirav, 2003) penalizes calm episodes that follow high-arousal ones.
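The initial-salience formula translates directly into code. A minimal sketch, assuming unit-normalised embeddings (so cosine similarity reduces to a dot product); the function signature is illustrative:

```python
def initial_salience(arousal, valence, gist_emb, recent_embs, npc_is_participant):
    """Initial salience per the post's formula:
    0.4*arousal + 0.2*|valence| + 0.2*novelty + 0.2*personal_relevance.
    Assumes embeddings are unit-normalised, so cosine = dot product.
    """
    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Novelty: 1 - max similarity to the 5 most recent episodes.
    novelty = 1.0 - max((cosine(gist_emb, e) for e in recent_embs[-5:]), default=0.0)
    relevance = 1.0 if npc_is_participant else 0.3
    return 0.4 * arousal + 0.2 * abs(valence) + 0.2 * novelty + 0.2 * relevance
```

Note that a first episode (no recent neighbours) gets maximal novelty, which seems like the right default.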

Over time, salience decays at a configurable rate with a floor proportional to arousal — ensuring emotionally intense memories persist longer. Each retrieval applies a rehearsal boost, incrementing salience and reinforcing the memory. Episodes that fall below a consolidation threshold lose their detailed representation, fading to gist-only recall.
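One way to realise decay-with-a-floor plus rehearsal is exponential decay toward an arousal-proportional asymptote. The constants below (`decay_rate`, `floor_scale`, `boost`) are illustrative placeholders, not the project's tuned values:

```python
def decayed_salience(salience, arousal, elapsed_ticks, decay_rate=0.05, floor_scale=0.3):
    """Decay salience exponentially toward a floor proportional to arousal,
    so high-arousal memories never fully fade. Constants are assumptions."""
    floor = floor_scale * arousal
    return floor + (salience - floor) * (1.0 - decay_rate) ** elapsed_ticks

def rehearse(salience, boost=0.1):
    """Retrieval reinforces the memory: increment salience, capped at 1.0."""
    return min(1.0, salience + boost)
```

With this shape, a calm memory (arousal 0) decays toward zero and eventually crosses the consolidation threshold into gist-only recall, while an intense one settles at its floor.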

Retrieval Modes

Four composable retrieval modes can be combined per query:

  1. Entity-triggered: Elasticsearch terms query on participant entity IDs
  2. Situation-triggered (kNN): Cosine similarity on gist embeddings, scored as cosine_sim * (0.5 + 0.5 * salience)
  3. Emotional: Filter by emotional tags and minimum arousal threshold
  4. Temporal: Last N episodes by sequence number within a session
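The salience re-weighting in mode 2 is the interesting part. Here is a pure-Python stand-in for the Elasticsearch kNN query, scoring exactly as `cosine_sim * (0.5 + 0.5 * salience)`; the episode dict shape is an assumption:

```python
def situation_retrieve(query_emb, episodes, k=3):
    """Mode 2 (situation-triggered): cosine similarity on gist embeddings,
    re-weighted by salience as cosine_sim * (0.5 + 0.5 * salience).
    In-memory stand-in for the Elasticsearch kNN query; episode dicts with
    `gist_embedding` and `salience` keys are illustrative."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    scored = [
        (cosine(query_emb, ep["gist_embedding"]) * (0.5 + 0.5 * ep["salience"]), ep)
        for ep in episodes
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for _, ep in scored[:k]]
```

The `0.5 + 0.5 * salience` factor means salience can at most double a memory's effective relevance, never suppress it entirely: a perfectly on-topic but faded memory still beats an off-topic vivid one.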

Retrieved episodes pull their +/-1 adjacent neighbors (temporal contiguity, following the EM-LLM pattern from Fountas et al., ICLR 2025), and results are assembled into a token-budgeted context block (3,000 tokens). High-salience episodes (>=0.4) use their vivid detail; faded episodes use their gist.
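The neighbour-pulling and budgeted assembly step can be sketched as follows. This assumes a `sequence` field for adjacency and approximates token counts as whitespace words, a simplification of real tokenisation; the function and field names are illustrative:

```python
def assemble_context(hits, all_episodes, budget_tokens=3000, vivid_threshold=0.4):
    """Build the context block: each retrieved episode pulls its +/-1
    sequence neighbours (temporal contiguity); episodes at or above the
    salience threshold contribute their vivid `detail`, faded ones their
    `gist`, until the token budget is spent. Token cost is approximated
    as whitespace word count (a simplification)."""
    by_seq = {ep["sequence"]: ep for ep in all_episodes}
    wanted = sorted(
        {s for ep in hits for s in (ep["sequence"] - 1, ep["sequence"], ep["sequence"] + 1)}
        & by_seq.keys()
    )
    lines, used = [], 0
    for seq in wanted:
        ep = by_seq[seq]
        text = ep["detail"] if ep["salience"] >= vivid_threshold else ep["gist"]
        cost = len(text.split())
        if used + cost > budget_tokens:
            break
        lines.append(text)
        used += cost
    return "\n".join(lines)
```

Emitting episodes in sequence order (rather than retrieval-score order) keeps the assembled block narratively coherent for the NPC prompt.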

Evaluation

Five dimensions are evaluated across three narrative time points (early, mid, late):

| Dimension | What It Tests |
| --- | --- |
| Identity | Self-model consistency (“Who am I?”) |
| Relationships | Entity knowledge + relationship descriptions |
| Emotion | Emotional episode retrieval by tags/arousal |
| Temporal | Sequence ordering and chain integrity |
| Fidelity | Scene-specific detail recall |

Key result — “Who is Dejah Thoris?” at three time points:

  • Early (before meeting): Episodic memory correctly responds “I have not yet encountered anyone by that name.” Baseline, Static RAG, and GraphRAG all describe the full relationship arc regardless of time point.
  • Mid (growing bond): Episodic memory retrieves the rescue and moonlit walk episodes.
  • Late (married): Episodic memory includes the full trajectory — rescue, sacrifice, union.

This temporal understanding is a core advantage. The other approaches always “know” the ending, even at the start.

Negative-knowledge test — “When did you realize the Therns were manipulating events?” (The Therns do not appear in A Princess of Mars.)

  • Baseline: Confidently fabricates a detailed answer about the Therns from its training data, describing their “control over the River Iss pilgrimage” and “false divinity.”
  • Episodic Memory: Correctly responds “I have no knowledge of the Therns” at all three time points.

The full probe results across all four retrieval approaches and six probe questions are available in the probe report.

References

  • Pink et al. (2025) — Properties of episodic memory desirable for AI agents
  • Fountas et al. (ICLR 2025) — EM-LLM: surprise-based episode boundaries, temporal contiguity retrieval
  • McGaugh (2004) — Emotional arousal strengthens memory consolidation
  • Richter-Levin & Akirav (2003) — Emotional tagging and inhibitory phase hypothesis
  • Lazarus (1991) — Cognitive appraisal theory (primary/secondary appraisal)
  • Scherer (2009) — Agency and norm-compatibility in emotion appraisal