Memory as State Between Runs
The problem with stateless agents isn’t just lost context—it’s systematic waste. Every interaction starts from zero knowledge, burning tokens on repeated explanations, redundant clarifications, and recomputing successful patterns. The agent forgets your preferences between sessions, can’t reuse expensive search results, and loses hard-won procedural knowledge the moment the conversation ends.
Agentic memory inverts this pattern. Instead of treating each run as isolated, agents persist three categories of knowledge: user facts and preferences, procedural patterns for tool selection, and successful artifacts from previous computations. This knowledge survives context compaction, enables personalization across sessions, and transforms evaluation into memory writing.
The architecture maps cleanly to the plan-execute-evaluate pattern. During planning, the agent retrieves examples of successful query plans and user facts to guide task decomposition. During execution, it fetches procedural instructions for optimal tool selection. During evaluation, it extracts new memories from what just worked or failed. Memory transforms agents from reactive systems into accumulating intelligence.
Context overflow becomes manageable through selective retrieval rather than lossy compaction. When conversation history hits token limits, the agent queries its memory collection for specific facts and preferences that would otherwise disappear. Critical user context persists beyond what fits in the immediate window.

Memory Categories and Extraction
The foundation requires systematic categorization of what gets preserved. User facts capture preferences, roles, and constraints: “works in marketing, needs quarterly reports every March.” Procedural patterns encode tool-calling logic: “if user mentions boss Jeff, apply flag tool to emails from jeff@company.com.” Successful artifacts preserve expensive computations: query plans that worked, search results that satisfied requirements, API responses that solved specific problems.
Extraction happens during the evaluation phase, when the agent has complete context about what succeeded or failed. This transforms post-task analysis from optional reflection into structured knowledge capture.
Add to CLAUDE.md:
After completing any task, extract and save 3 types of memories: 1) User facts/preferences (“user works in marketing, needs quarterly reports”), 2) Procedural patterns (“if user says X, do Y”), 3) Successful artifacts (query plans, search results, computations). Format as: MEMORY_TYPE: description. This reduces future clarification steps and improves personalization.
The extraction must be deterministic to enable consistent retrieval. Unstructured memories create search problems downstream. The format constraint ensures memories are queryable through semantic search while maintaining sufficient detail for reconstruction.
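One way to enforce that determinism is a fixed record shape. The following is a minimal sketch; the class name, field names, and type labels are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    memory_type: str      # e.g. "user_fact", "procedural_pattern", "artifact"
    description: str      # the queryable payload
    metadata: dict = field(default_factory=dict)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def as_line(self) -> str:
        """Render in the deterministic 'MEMORY_TYPE: description' format."""
        return f"{self.memory_type.upper()}: {self.description}"

record = MemoryRecord("user_fact", "works in marketing, needs quarterly reports")
print(record.as_line())  # USER_FACT: works in marketing, needs quarterly reports
```

A fixed render method means every memory is written the same way regardless of which extraction path produced it, which is what keeps downstream semantic search reliable.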
But manual extraction creates coverage gaps. Human reviewers miss subtle patterns, and inconsistent formatting degrades retrieval quality. Systematic automation becomes essential.
Create memory-writer skill:
The skill analyzes completed interactions and extracts structured knowledge across all three categories. First, it identifies user preferences and constraints from the conversation flow. Second, it discovers successful tool-calling sequences and generalizes them into conditional patterns. Third, it catalogs artifacts that required significant computation or produced high-quality results. Fourth, it formats each memory type with appropriate metadata for vector storage.
This skill operates during the evaluation phase, when the agent has complete information about task success or failure. It prevents knowledge loss through systematic capture rather than ad-hoc manual recording.
Procedural Pattern Storage
The most immediately valuable memory category captures tool-calling preferences as conditional logic. Users frequently express preferences as instructions: “Flag all emails from my boss” or “Archive marketing newsletters automatically.” These translate to searchable procedural patterns.
Create /remember-pattern command:
Stage 1 takes user instructions and converts them to a structured conditional format. Stage 2 extracts the trigger condition and target action. Stage 3 stores the pattern in the vector collection with procedural metadata. Stage 4 confirms the pattern was saved and shows how it will activate in future runs.
The command transforms natural language preferences into executable logic. “Flag emails from Jeff” becomes “IF sender_email contains ‘jeff@company.com’ THEN use flag_tool(email_id).” This enables consistent behavior across sessions without re-explaining preferences.
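The stored pattern and its activation check might look like the following sketch. The dict shape and helper names are assumptions chosen for illustration:

```python
def parse_preference(trigger_field: str, trigger_value: str,
                     tool: str, tool_arg: str) -> dict:
    """Structure a preference like 'Flag emails from Jeff' as conditional logic."""
    return {
        "trigger": {"field": trigger_field, "op": "contains", "value": trigger_value},
        "action": {"tool": tool, "arg": tool_arg},
    }

def pattern_applies(pattern: dict, record: dict) -> bool:
    """Return True when the trigger condition holds for an incoming record."""
    trig = pattern["trigger"]
    return trig["value"] in record.get(trig["field"], "")

flag_jeff = parse_preference("sender_email", "jeff@company.com",
                             "flag_tool", "email_id")
email = {"sender_email": "jeff@company.com", "email_id": "msg-123"}
if pattern_applies(flag_jeff, email):
    print(f"would call {flag_jeff['action']['tool']}({email['email_id']})")
```

Keeping trigger and action as separate structured fields is what makes the pattern both executable at run time and searchable in the vector collection.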
The pattern storage must handle conflicts between old and new instructions. When users update preferences, the system needs consolidation logic to prevent contradictory patterns from accumulating.

Memory Integration Across Agent Phases
Memory becomes actionable through strategic injection at each phase of agent execution. Random memory retrieval creates noise. Targeted retrieval provides context exactly when needed.
Implement 3-phase memory integration:
For each agent run, integrate memory at three distinct phases. Planning retrieves examples of successful query plans and user facts to guide task decomposition. Execution fetches procedural instructions for tool selection based on current context. Evaluation extracts new memories from completed tasks and their outcomes.
The phased approach prevents memory overload while ensuring relevant knowledge influences decisions. During planning, the agent considers past successful approaches to similar problems. During execution, it applies learned tool-calling patterns. During evaluation, it commits new knowledge for future runs.
This creates a feedback loop where each successful interaction improves future performance. Failed approaches also generate valuable negative memories: patterns to avoid or conditions that require different tool selection.
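The three injection points can be sketched as a single run loop. `MemoryStore`, its method names, and the stubbed execution step are hypothetical, shown only to make the phase boundaries concrete:

```python
class MemoryStore:
    def __init__(self):
        self.records = []

    def get_planning_memories(self, task_context: str) -> list:
        # In practice: semantic search over past plans and user facts.
        return [r for r in self.records if r["type"] in ("user_fact", "plan")]

    def get_execution_patterns(self, tool_context: str) -> list:
        return [r for r in self.records if r["type"] == "procedural_pattern"]

    def write_evaluation_memories(self, new_records: list) -> None:
        self.records.extend(new_records)

def run_agent(task: str, memory: MemoryStore) -> str:
    plan_hints = memory.get_planning_memories(task)   # planning: retrieve examples
    patterns = memory.get_execution_patterns(task)    # execution: fetch tool patterns
    result = f"completed: {task}"                     # stubbed execution step
    memory.write_evaluation_memories(                 # evaluation: commit new knowledge
        [{"type": "plan", "text": f"plan that worked for {task}"}]
    )
    return result
```

Because the write happens inside the same loop that reads, each run's evaluation output becomes the next run's planning input, which is the feedback loop described above.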
Memory Consolidation and Conflict Resolution
Unconstrained memory accumulation creates duplicate entries, conflicting patterns, and degraded retrieval quality. New memories must integrate with existing knowledge rather than simply appending to the collection.
Create memory-consolidator subagent:
Input: new memory records from a completed agent run. Process: semantic search against existing memories to find related entries; LLM-based conflict detection between new and existing patterns; consolidation of similar memories into updated records. Output: a clean memory collection with conflicts resolved and duplicates merged.
The subagent prevents memory bloat while maintaining knowledge quality. When a user updates preferences, it identifies conflicting procedural patterns and resolves them in favor of more recent instructions. When multiple successful artifacts solve similar problems, it consolidates them into generalized patterns.
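A toy version of the consolidation pass is sketched below. A real system would use embedding similarity and LLM-based conflict detection; the string-ratio check, the 0.8 threshold, and the function names are assumptions:

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Crude stand-in for semantic similarity between two memory texts."""
    return SequenceMatcher(None, a, b).ratio() >= threshold

def consolidate(existing: list[dict], new: list[dict]) -> list[dict]:
    """Merge near-duplicate memories, keeping the more recent record."""
    merged = list(existing)
    for record in new:
        conflict = next(
            (m for m in merged if similar(m["text"], record["text"])), None
        )
        if conflict:
            merged.remove(conflict)   # newer instruction wins
        merged.append(record)
    return merged
```

The key design choice is recency-wins resolution: when a user updates a preference, the stale pattern is dropped rather than left to contradict the new one.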
Consolidation runs periodically rather than after every interaction, balancing memory quality against computational overhead.
Context Overflow Recovery
Context compaction traditionally discards early conversation details when approaching token limits. With persistent memory, the agent can recover specific information that didn’t survive compaction.
Add to CLAUDE.md:
When context approaches token limit, before compacting conversation history, first check memory collection for key facts/preferences that might be lost in compaction. Explicitly state what memories are available if needed: “I remember you prefer X format and work in Y department.” Use memory lookup to recover compacted details.
This transforms context limits from hard boundaries into soft constraints. Critical user context persists in searchable form even when conversation history gets truncated. The agent maintains continuity across extended interactions without requiring massive context windows.
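The recovery step can be sketched as a pre-compaction hook. The token heuristic, limit value, and `memory_lookup` callable are all stubbed assumptions:

```python
CONTEXT_LIMIT = 8000  # arbitrary illustrative budget, in tokens

def approx_tokens(messages: list[str]) -> int:
    """Rough 4-characters-per-token heuristic; real systems use a tokenizer."""
    return sum(len(m) // 4 for m in messages)

def compact_with_recovery(messages: list[str], memory_lookup) -> list[str]:
    """Before truncating history, pin facts retrieved from the memory store."""
    if approx_tokens(messages) < CONTEXT_LIMIT:
        return messages
    pinned = memory_lookup("key user facts and preferences")
    kept = messages[-10:]  # naive compaction: keep only recent turns
    return [f"[recovered memory] {fact}" for fact in pinned] + kept
```

Pinning the retrieved facts ahead of the truncated history is what turns the hard token boundary into a soft one: the details survive as compact memory lines instead of full transcript.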
The recovery mechanism activates automatically when context approaches limits, ensuring smooth conversation flow without obvious truncation boundaries.
Implementation Architecture
Memory implementation requires a vector database backend with metadata filtering capabilities. Each memory record contains the extracted knowledge, classification metadata, creation timestamp, and usage frequency for retrieval prioritization.
The memory interface exposes retrieval methods for each agent phase: get_planning_memories(task_context), get_execution_patterns(tool_context), and write_evaluation_memories(task_results). This API design ensures memory integration follows the architectural pattern rather than ad-hoc queries.
Search strategies vary by memory type. User facts use semantic similarity on task context. Procedural patterns match on trigger conditions. Successful artifacts match on problem similarity and computational cost thresholds.
Memory classification enables targeted retrieval without cross-contamination. Planning queries don’t surface procedural patterns, and execution queries don’t return user preference facts unless specifically relevant to tool selection.
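The phase scoping might be implemented as a simple allow-list over memory types. The phase names, type labels, and substring match are illustrative assumptions standing in for metadata-filtered vector search:

```python
def search_memories(records: list[dict], phase: str, query: str) -> list[dict]:
    """Scope retrieval by agent phase so memory types don't cross-contaminate."""
    allowed = {
        "planning": {"user_fact", "plan"},
        "execution": {"procedural_pattern"},
        "evaluation": set(),  # evaluation writes memories rather than reading them
    }[phase]
    return [
        r for r in records
        if r["type"] in allowed and query.lower() in r["text"].lower()
    ]
```

Filtering on type before matching on content is what keeps a planning query from surfacing tool-calling patterns, and vice versa.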
Synthesis
The memory artifacts create a progression from reactive to accumulative intelligence. The CLAUDE.md configurations establish extraction contracts, ensuring knowledge capture happens systematically rather than accidentally. The skills and subagents automate extraction and consolidation, closing the coverage gaps left by manual review. The workflow integration and slash commands make memory actionable during agent execution.
The deeper pattern reveals how memory transforms the economics of agent interactions. Without persistence, every session pays the full cost of context building, preference discovery, and pattern learning. With memory, these costs amortize across multiple runs while quality improves through accumulated knowledge.
This creates compound returns on agent investment. Early interactions feel expensive as memory collections build. Later interactions become increasingly efficient as established patterns handle routine decisions and personalization reduces clarification overhead. The agent evolves from tool into personalized assistant through systematic knowledge accumulation.