Multi-Tier Action Architecture
Context windows became the bottleneck the moment agents gained autonomy. The first generation of research agents accumulated hundreds of thousands of tokens (web searches, tool results, intermediate planning) crammed into working memory until performance degraded and costs spiraled. The naive approach treated context as unlimited storage rather than as an expensive, degrading workspace.
The solution emerged from production systems that actually shipped: multi-tier action architectures that separate immediate capability from total capability. Instead of loading every possible tool into context, successful agents use progressive disclosure patterns—thin tool layers that dynamically surface relevant capabilities while pushing complex operations to file systems and shell environments. This architectural shift enables both long-running autonomy and bounded context consumption.
The pattern crystallizes around two core insights: tool definitions consume context before any work begins (MCP servers average 35 tools and roughly 35k tokens of definitions), and computers provide an effectively infinite action space through a finite interface. Rather than binding N actions to N tools, production agents use minimal atomic tools (read, write, bash, glob) that can orchestrate arbitrary complexity through the file system. This inverts the traditional agent architecture from capability-first to context-first design.

Progressive Disclosure Architecture
The MCP explosion created a new problem: tool obesity. The GitHub MCP server alone ships 35 tools; Linear's adds dozens more. Loading comprehensive toolsets consumes context before the agent performs its first action. The solution requires treating tool access as a search problem rather than a loading problem.
Add to CLAUDE.md:
Use progressive disclosure for tools and context. Instead of loading all tools at once, use tool search to find relevant tools dynamically. Keep the initial tool set minimal (under 20 tools). Store additional tools, skills, and SOPs in the file system and retrieve them as needed.
The constraint forces architectural discipline. When you can only load twenty tools, you select for atomic, composable operations rather than specialized, single-purpose functions. This naturally leads to the computer-as-primitive pattern that characterizes production agent systems.
Tool search operates on indexed descriptions rather than loaded definitions. Instead of dumping tool manifests into context, agents query tool capabilities semantically and load only matching functions. This scales tool libraries without scaling context consumption.
Create /tool-search command:
Stage 1 indexes tool descriptions from MCP servers and the file system. Stage 2 performs semantic search over those descriptions against the current task. Stage 3 loads only the matching tools into working context, maintaining pointers to unloaded capabilities.
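A minimal sketch of the three stages, using keyword-overlap scoring as a stand-in for real semantic search; the ToolSpec structure and the example entries are illustrative, not an existing API:

```python
# Sketch of a three-stage tool search layer. Keyword overlap stands in for
# embedding similarity; swap score() for a vector search call in production.
from dataclasses import dataclass

@dataclass
class ToolSpec:
    name: str
    description: str   # indexed text only, not the full JSON schema
    source: str        # e.g. an MCP server name or a file path

# Stage 1: index descriptions -- full tool definitions stay out of context.
INDEX: list[ToolSpec] = [
    ToolSpec("create_issue", "open a new issue in a repository", "github-mcp"),
    ToolSpec("list_pull_requests", "list open pull requests", "github-mcp"),
    ToolSpec("run_query", "execute a SQL query against the warehouse", "db-mcp"),
]

def score(task: str, tool: ToolSpec) -> int:
    """Stage 2: crude relevance score -- shared words between the task
    description and the tool description."""
    return len(set(task.lower().split()) & set(tool.description.lower().split()))

def search_tools(task: str, k: int = 3) -> list[ToolSpec]:
    """Stage 3: return only the top-k matches for loading; everything else
    stays behind a pointer (name and source) until explicitly requested."""
    ranked = sorted(INDEX, key=lambda t: score(task, t), reverse=True)
    return [t for t in ranked[:k] if score(task, t) > 0]

if __name__ == "__main__":
    for tool in search_tools("open an issue about the flaky pull request check"):
        print(f"load {tool.name} from {tool.source}")
```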
File System Offloading Patterns
Context accumulation follows a predictable pattern: initial prompt, tool invocations, results, reflection, more tools, more results. By turn five, the working context can be 80% historical information and only 20% active reasoning. File system offloading breaks this cycle by treating storage and memory as separate layers.
Create context-offloading skill:
The skill automatically saves tool results and intermediate outputs to files after N turns or when they exceed a token threshold. First, it detects result accumulation patterns. Second, it generates structured summaries and file pointers. Third, it maintains retrieval commands for accessing the full data when needed. Fourth, it cleans the working context while preserving access paths.
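A minimal sketch of the offloading step, assuming the agent framework lets you intercept tool results before they re-enter context; the threshold and the .agent/results layout are illustrative choices:

```python
# Sketch of result offloading: small results pass through, large results are
# written to disk and replaced in context by a summary plus a file pointer.
import json
from pathlib import Path

RESULTS_DIR = Path(".agent/results")
TOKEN_THRESHOLD = 2_000  # rough per-result budget; tune for your model

def approx_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: ~4 characters per token

def offload(tool_name: str, turn: int, result: str) -> str:
    """Return what goes back into working context: the raw result if small,
    otherwise a preview line and a retrieval path."""
    if approx_tokens(result) <= TOKEN_THRESHOLD:
        return result
    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    path = RESULTS_DIR / f"turn{turn:03d}_{tool_name}.json"
    path.write_text(json.dumps({"tool": tool_name, "turn": turn, "result": result}))
    preview = result[:200].replace("\n", " ")
    return (f"[offloaded] {tool_name} returned ~{approx_tokens(result)} tokens. "
            f"Preview: {preview}... Full result: {path} (read only if needed).")
```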
This creates a two-tier memory system: hot context for active reasoning and cold storage for accumulated state. Agents maintain awareness of stored information through file manifests while working in clean contexts. Retrieval happens on-demand rather than preemptively.
The pattern extends beyond tool results to plans, progress tracking, and skill libraries. Long-running agents maintain scratchpad files that persist across context boundaries, enabling session continuity without session accumulation.
Add to CLAUDE.md:
For long sessions: 1) Save tool results to files after 3+ turns, provide summaries in context. 2) Maintain scratchpad files for plans and progress. 3) Use file system for skills, not just temporary storage. 4) Provide file paths for retrieval when full data needed.

Multi-Tier Action Space Implementation
The insight from successful agents (Manus, Claude Code, Amp) is counterintuitive: they use fewer tools to do more things. The paradox resolves through multi-tier action architectures that separate tool calling from action execution. The tool layer provides atomic primitives; the computer layer provides unlimited composition.
Create /multi-tier-action command:
Stage 1 establishes thin tool calling layer with read, write, bash, and glob primitives. Stage 2 configures file system access for complex operations. Stage 3 sets up MCP server syncing to local files rather than loading as tools. Stage 4 provides templates for pushing composite actions to bash scripts rather than specialized tools.
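A sketch of what Stage 1's thin layer might look like: four primitives behind one dispatch function. The signatures are illustrative, and a production harness would add sandboxing, timeouts per policy, and path allow-lists:

```python
# Sketch of the thin tool calling layer: read, write, bash, glob as the
# entire loaded tool set, dispatched by name.
import glob as globlib
import subprocess
from pathlib import Path

def tool_read(path: str) -> str:
    return Path(path).read_text()

def tool_write(path: str, content: str) -> str:
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(content)
    return f"wrote {len(content)} bytes to {path}"

def tool_bash(command: str) -> str:
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=120)
    return proc.stdout + proc.stderr

def tool_glob(pattern: str) -> str:
    return "\n".join(globlib.glob(pattern, recursive=True))

PRIMITIVES = {"read": tool_read, "write": tool_write,
              "bash": tool_bash, "glob": tool_glob}

def dispatch(name: str, **kwargs) -> str:
    """Single entry point the agent calls; complexity lives in what the
    primitives touch, not in the tool definitions loaded into context."""
    return PRIMITIVES[name](**kwargs)
```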
This architecture enables the “dozen tools, infinite actions” pattern. Rather than creating specialized tools for every operation, agents compose atomic tools through shell scripting and file manipulation. The computer becomes the action space; tools become the interface.
The pattern scales because computational complexity lives in the execution layer, not the context layer. A bash script can orchestrate hundreds of operations using two tools: write (to create the script) and bash (to execute it). Context consumption remains constant while capability grows unbounded.
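The composition pattern, reusing the dispatch function from the sketch above; the pipeline script is a toy example, but the shape is the point: many shell operations for the context cost of exactly two tool calls:

```python
# Two tool calls, arbitrary work: write a script, then execute it.
PIPELINE = """#!/usr/bin/env bash
set -euo pipefail
mkdir -p report
grep -rn "TODO" src/ > report/todos.txt || true
wc -l report/todos.txt >> report/summary.txt
sort report/todos.txt | uniq -c | sort -rn >> report/summary.txt
"""

dispatch("write", path="pipeline.sh", content=PIPELINE)                        # call 1
print(dispatch("bash", command="bash pipeline.sh && cat report/summary.txt"))  # call 2
```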
Context caching becomes crucial for multi-tier architectures. Repetitive tool loading destroys the efficiency gains from progressive disclosure. Cache hit rates become the primary performance metric, determining whether the architecture succeeds in production.
Implement context caching for agent sessions:
Configure context caching to reuse chat history across turns. Only add incremental tool results at each turn instead of resending full context. Monitor cache hit rate as key performance metric. Set up caching for invariant portions of context like tool definitions and system prompts.
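A sketch against the Anthropic Messages API's prompt caching as documented at the time of writing; the model name is illustrative, and the hit-rate arithmetic is one reasonable definition rather than a standard one:

```python
# Sketch of cache-aware session turns: cache_control marks the invariant
# prefix (system prompt and tool definitions) so only new turns are reprocessed.
import anthropic

client = anthropic.Anthropic()

SYSTEM = [{"type": "text",
           "text": "You are a research agent. Use the minimal tool set.",
           "cache_control": {"type": "ephemeral"}}]

TOOLS = [
    {"name": "bash", "description": "Run a shell command",
     "input_schema": {"type": "object",
                      "properties": {"command": {"type": "string"}},
                      "required": ["command"]}},
]
TOOLS[-1]["cache_control"] = {"type": "ephemeral"}  # marking the last tool caches all

def turn(messages: list):
    resp = client.messages.create(model="claude-sonnet-4-5",  # illustrative name
                                  max_tokens=1024, system=SYSTEM,
                                  tools=TOOLS, messages=messages)
    u = resp.usage
    read = getattr(u, "cache_read_input_tokens", 0) or 0
    wrote = getattr(u, "cache_creation_input_tokens", 0) or 0
    hit_rate = read / max(read + wrote + u.input_tokens, 1)
    return resp, hit_rate  # track hit_rate as the primary performance metric
```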
Long-Running Agent Patterns
Context isolation enables autonomous task execution that exceeds single session boundaries. Rather than maintaining continuity through accumulated context, agents maintain continuity through persistent state files. This creates the “Ralph Wiggum loop” pattern: pick up from file state, execute in clean context, update file state, terminate.
Create ralph-wiggum-loop subagent:
Input: task specification and file system state. Process: reads plan.txt and progress.txt, selects next atomic task, executes in isolated context with minimal tools, updates progress files, commits results to file system. Output: updated state files and task completion status.
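One iteration of the loop, sketched under the assumption that plan.txt holds one task per line and progress.txt records completed lines; run_in_fresh_context is a hypothetical stand-in for launching the agent in a clean session:

```python
# Sketch of a single Ralph Wiggum loop iteration: state lives in files,
# execution happens in a fresh context, progress is committed durably.
from pathlib import Path

PLAN, PROGRESS = Path("plan.txt"), Path("progress.txt")

def run_in_fresh_context(task: str) -> str:
    raise NotImplementedError("launch the agent here, e.g. a new CLI invocation")

def loop_once() -> bool:
    """Execute the next undone task; return False when the plan is exhausted."""
    done = set(PROGRESS.read_text().splitlines()) if PROGRESS.exists() else set()
    pending = [t for t in PLAN.read_text().splitlines() if t and t not in done]
    if not pending:
        return False
    task = pending[0]
    result = run_in_fresh_context(task)   # clean context, minimal tools
    out = Path("results") / f"{len(done):03d}.txt"
    out.parent.mkdir(exist_ok=True)
    out.write_text(result)
    with PROGRESS.open("a") as f:         # durable state, not chat history
        f.write(task + "\n")
    return True
```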
Each loop iteration operates in fresh context with access to accumulated work through file pointers. This prevents context rot while enabling indefinite task duration. Progress tracking becomes explicit rather than implicit in conversation history.
The pattern enables true agent autonomy without human context management. Agents can pause, resume, and continue across hours or days by reading state from files rather than conversation history. Task complexity scales independently of context complexity.
Memory evolution emerges from successful agent sessions. Rather than losing learned preferences when contexts expire, agents extract patterns and preferences for permanent storage in configuration files.
Create memory-evolver skill:
The skill reflects on completed agent sessions to extract persistent learnings. First, it analyzes session patterns for user preferences and successful strategies. Second, it identifies knowledge worth retaining versus session-specific information. Third, it distinguishes between personal preferences (coding style) and universal learnings (debugging patterns). Fourth, it updates CLAUDE.md with the extracted preferences and patterns.
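A sketch of the reflection step; extract_learnings stands in for a hypothetical LLM call, and the CLAUDE.md section marker is an illustrative convention:

```python
# Sketch of memory evolution: distill a session transcript into durable
# learnings and append anything new to CLAUDE.md.
from pathlib import Path

MEMORY_FILE = Path("CLAUDE.md")
MARKER = "## Learned preferences"

def extract_learnings(transcript: str) -> list[str]:
    """Hypothetical LLM call: return only durable learnings (e.g. 'prefers
    pytest over unittest') and drop session-specific facts."""
    raise NotImplementedError

def evolve_memory(transcript: str) -> None:
    learnings = extract_learnings(transcript)
    text = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    if MARKER not in text:
        text += f"\n\n{MARKER}\n"
    # Naive dedup: skip anything already recorded verbatim.
    new = [f"- {item}" for item in learnings if item not in text]
    if new:
        MEMORY_FILE.write_text(text + "\n".join(new) + "\n")
```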
This implements continual learning in token space rather than model space. Agents improve over time by accumulating preferences and patterns in their instruction set rather than through retraining.

The synthesis reveals a fundamental tension in agent architecture: capability versus context. Traditional approaches optimize for capability—load every tool, maintain every piece of context, accumulate every result. Multi-tier architectures optimize for context—minimize loaded tools, offload results to storage, isolate execution contexts. The architectural choice determines whether agents scale to production complexity or collapse under their own computational weight. The pattern that emerged from shipping systems suggests context engineering isn’t optimization—it’s the fundamental constraint that shapes successful agent design.