Context Density Architectures
The context engineering problem evolved from maximizing tokens to maximizing information density. Early approaches treated context windows as storage constraints—pack more in, get better results. This misunderstands the fundamental architecture. Context windows aren’t storage; they’re attention substrates. Dense information per token matters more than token count optimization.
Current multi-agent architectures embed a structural tension. Smart models make better decisions but suffer from tool-calling latency. Fast models handle orchestration effectively but lack reasoning depth for complex analysis. The default pattern—use the smartest model as orchestrator—creates bottlenecks when reasoning models spend cycles on discovery rather than analysis.
This architectural mismatch becomes more pronounced as model capabilities diverge. Reasoning models like o1 excel at deep thinking but struggle with rapid tool execution. Fast models like Sonnet handle API calls efficiently but miss nuanced decision points. The solution requires separating discovery from analysis rather than forcing one model to handle both responsibilities.
The emerging pattern: fast orchestrators with smart oracles. This inverts the traditional hierarchy. Fast models discover and filter. Slow models analyze pre-filtered context in batch operations. The architecture preserves reasoning quality while eliminating tool-calling latency from expensive models.
Context Density Protocol
Information density optimization requires explicit instruction about relevance filtering. Models default to comprehensive coverage rather than targeted precision. This wastes attention on peripheral details.
Add to CLAUDE.md:
Optimize for information density per token, not just token count. Before responding, identify the most relevant information and present it concisely. Remove redundant explanations unless explicitly requested for clarity.
The instruction shifts model behavior from exhaustive to targeted. Instead of explaining every concept, models focus on the specific information needed for the current task. This preserves attention capacity for complex reasoning rather than expending it on obvious explanations.
Fast Orchestrator Architecture
The core pattern separates discovery from analysis through model specialization. Fast models excel at tool calling and file system navigation. Slow models excel at understanding complex relationships within pre-filtered context.
Implement fast orchestrator with smart oracle pattern:
Use Sonnet as primary orchestrator for tool calls and file discovery. When complex reasoning is needed, delegate to Opus with pre-filtered context from Sonnet. Fast model determines which files are relevant, slow model analyzes them in batch without tool loops.

This architecture prevents the latency problem where reasoning models call tools sequentially. Instead of Opus reading 50 files one by one, Sonnet identifies the relevant 3 files and Opus analyzes them in a single context window. The deterministic layer between discovery and analysis eliminates agentic loops for pure reasoning tasks.
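A minimal sketch of that split, assuming the Anthropic Python SDK; the model ids, helper names, and prompt wording are illustrative placeholders rather than part of any particular harness:

```python
# Fast orchestrator / smart oracle sketch: fast model filters, slow model analyzes in one batch.
from pathlib import Path
import anthropic

client = anthropic.Anthropic()

FAST_MODEL = "claude-sonnet-4-5"   # placeholder: any fast, cheap model id
SMART_MODEL = "claude-opus-4-1"    # placeholder: any slower reasoning model id

def discover_relevant_files(question: str, root: str) -> list[str]:
    """Fast model: scan candidate paths and return only the relevant ones."""
    paths = [str(p) for p in Path(root).rglob("*.py")]
    resp = client.messages.create(
        model=FAST_MODEL,
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Question: {question}\nFiles:\n" + "\n".join(paths)
                       + "\nReturn only the paths needed to answer, one per line.",
        }],
    )
    return [line.strip() for line in resp.content[0].text.splitlines() if line.strip()]

def analyze(question: str, files: list[str]) -> str:
    """Smart model: analyze pre-filtered context in a single call, no tool loop."""
    context = "\n\n".join(f"# {f}\n{Path(f).read_text()}" for f in files)
    resp = client.messages.create(
        model=SMART_MODEL,
        max_tokens=2048,
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    )
    return resp.content[0].text

# Deterministic hand-off: discovery output feeds analysis input exactly once.
question = "Why does the retry logic double-fire?"
answer = analyze(question, discover_relevant_files(question, "src"))
```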
The key insight: most “agent” tasks are actually structured as discovery followed by analysis. File system exploration, API investigation, and research gathering benefit from fast iteration. Deep understanding of relationships, architectural decisions, and complex debugging require sustained attention. Separating these phases optimizes both speed and quality.
Workflow Stage Management
Models tend to rush toward implementation without proper planning phases. This creates incomplete solutions that require extensive revision. Explicit stage management prevents this acceleration.
Add to CLAUDE.md:
For multi-step workflows, explicitly state which stage we're in (research/planning/implementation) and confirm completion before proceeding to the next stage. Ask for explicit approval to advance workflow stages rather than assuming continuation.
Stage boundaries force deliberate progression. Research phases gather requirements without jumping to solutions. Planning phases explore alternatives before committing to approaches. Implementation phases focus on execution without revisiting architectural decisions. Each stage has distinct success criteria and requires explicit approval to advance.
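One way to make the gate concrete, sketched with assumed stage names and a manual approval prompt:

```python
# Explicit stage gating: advance only on explicit approval, never by default.
from enum import Enum

class Stage(Enum):
    RESEARCH = 1
    PLANNING = 2
    IMPLEMENTATION = 3

def advance(current: Stage) -> Stage:
    """Ask for approval before moving forward; otherwise stay in the current stage."""
    if current == Stage.IMPLEMENTATION:
        return current
    nxt = Stage(current.value + 1)
    reply = input(f"{current.name} complete. Advance to {nxt.name}? [y/N] ")
    return nxt if reply.strip().lower() == "y" else current
```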
AI-Native Data Management
Traditional database architectures create friction for AI workflows. Schema migrations, query optimization, and relational constraints add complexity without corresponding benefits for AI-generated content. Markdown with frontmatter provides structured data that remains human-readable and version-controlled.
Create markdown-frontmatter CRM skill:
The skill manages contacts using markdown files with YAML frontmatter for structured data (company, role, tags) and markdown body for notes. Includes git sync, automatic data enrichment via web search, and deterministic schema validation without database migrations.
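A minimal sketch of the record format and read-time validation, assuming PyYAML and illustrative field names and paths:

```python
# Markdown record with YAML frontmatter: structured metadata up top, free-form notes below.
import yaml  # PyYAML
from pathlib import Path

EXAMPLE = """---
name: Ada Example
company: Acme Corp
role: CTO
tags: [prospect, conference-2024]
---
Met at the platform engineering meetup. Interested in agent tooling.
"""

def parse_record(text: str) -> tuple[dict, str]:
    """Split a record into structured frontmatter and free-form notes."""
    _, frontmatter, body = text.split("---", 2)
    return yaml.safe_load(frontmatter), body.strip()

meta, notes = parse_record(EXAMPLE)

# Schema validation without migrations: check required keys at read time.
REQUIRED = {"name", "company", "role"}
missing = REQUIRED - meta.keys()
if missing:
    raise ValueError(f"record missing fields: {missing}")
```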
This approach optimizes for the AI development cycle. Changes to data structure require updating frontmatter templates rather than running database migrations. Git provides full change history and branching for experimental data structures. AI models can read and write markdown natively without serialization overhead.
The CRM skill demonstrates the pattern: structured metadata in YAML frontmatter, unstructured content in markdown body, version control for change tracking, and web APIs for enrichment. This scales from contact management to project tracking, knowledge bases, and configuration management.
Regression Testing for AI Workflows
Traditional unit testing assumes deterministic behavior. AI workflows produce different outputs on identical inputs. This breaks standard testing approaches and creates regression detection problems.
Create /snapshot-eval command:
Stage 1: Run AI workflow and capture complete output as versioned snapshot. Stage 2: On subsequent runs, generate diff between stored snapshot and new output. Stage 3: Present changes for human review and approval. Stage 4: Accept changes to update baseline or revert to previous snapshot.
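A rough sketch of the four stages, assuming a snapshots/ directory and a caller-supplied run_workflow callable:

```python
# Snapshot evaluation: capture a baseline, diff new runs against it, gate updates on human review.
import difflib
from pathlib import Path

SNAPSHOT_DIR = Path("snapshots")

def snapshot_eval(name: str, run_workflow) -> None:
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    baseline_path = SNAPSHOT_DIR / f"{name}.txt"
    new_output = run_workflow()

    if not baseline_path.exists():            # Stage 1: first run becomes the baseline
        baseline_path.write_text(new_output)
        return

    baseline = baseline_path.read_text()      # Stage 2: diff against the stored snapshot
    diff = "".join(difflib.unified_diff(
        baseline.splitlines(keepends=True),
        new_output.splitlines(keepends=True),
        fromfile="baseline", tofile="new",
    ))
    if not diff:
        print("no changes")
        return

    print(diff)                                # Stage 3: human review of concrete differences
    if input("Accept new output as baseline? [y/N] ").strip().lower() == "y":
        baseline_path.write_text(new_output)   # Stage 4: accept, or keep the previous baseline
```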
Snapshot evaluation sidesteps the LLM-as-judge problem. Instead of asking models to evaluate their own outputs, human reviewers examine concrete differences between runs. This works for both unit-level operations (single prompt responses) and end-to-end workflows (multi-step agent processes).
The command enables safe prompt iteration. Changes to instructions, context, or model parameters produce visible diffs rather than silent degradation. Teams can track prompt evolution and revert problematic changes without losing development velocity.

Instruction Count Management
Research indicates models follow 100-200 distinct instructions effectively. Beyond this threshold, instruction interference and attention dilution degrade performance. Many prompts accumulate instructions without pruning obsolete ones.
Add to CLAUDE.md:
This prompt contains [X] total instructions. Research shows models follow 100-200 instructions effectively. If you notice conflicting or redundant instructions, flag them for removal rather than applying instruction severity inflation (all caps, emphasis).
Instruction counting prevents context rot through explicit tracking. Models can identify when prompts exceed effective instruction limits and suggest consolidation. This maintains prompt quality over time rather than requiring periodic manual audits.
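A rough complement outside the prompt itself: a heuristic counter can track instruction volume over time. The regex and the 150 threshold below are assumptions for illustration, not a measured limit:

```python
# Heuristic instruction count for a prompt file: bullets, numbered items, and imperative keywords.
import re
from pathlib import Path

def count_instructions(path: str) -> int:
    pattern = re.compile(r"^\s*[-*\d]|\b(must|should|always|never|do not)\b", re.I)
    return sum(1 for line in Path(path).read_text().splitlines() if pattern.search(line))

n = count_instructions("CLAUDE.md")
if n > 150:  # assumed midpoint of the 100-200 range cited above
    print(f"{n} instruction-like lines: consider consolidating before adding more")
```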
The meta-instruction creates self-monitoring behavior. Models recognize when they’re receiving conflicting directions and flag the problem rather than attempting to satisfy contradictory requirements. This prevents the common pattern of emphasizing instructions through formatting (capitalization, bold text) when the real problem is instruction quantity.
Memory Compaction Patterns
Continual learning requires factual memory without dangerous instruction modification. Most approaches either attempt to modify base prompts (risky) or store raw conversation history (inefficient). Memory compaction extracts learnings without touching core instructions.
Create memory compactor subagent:
Input: Daily AI interaction logs. Process: Extract factual learnings (user preferences, project constraints) and behavioral patterns into structured summaries. Create decaying resolution: today’s details, weekly summaries, monthly themes. Output: Context injection for subsequent sessions focusing on factual recall, not instruction modification.
The subagent implements basic episodic memory through structured extraction. Rather than storing complete conversation transcripts, it identifies recurring patterns, user preferences, and project-specific constraints. These facts can be injected into future sessions without modifying core prompts.
Decaying resolution prevents information overload. Recent interactions maintain full detail while older patterns compress into high-level themes. This balances context relevance with memory capacity.
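A sketch of the decaying-resolution step, with an assumed log format and a stubbed summarize() call standing in for the LLM:

```python
# Decaying-resolution compaction: full detail today, summaries for the week, themes for the month.
from datetime import date, timedelta

def summarize(entries: list[str], level: str) -> str:
    """Placeholder for an LLM call that compresses entries into a summary."""
    return f"[{level} summary of {len(entries)} entries]"

def compact(logs: dict[date, list[str]], today: date) -> dict[str, list[str]]:
    memory = {"today": [], "weekly": [], "monthly": []}
    week_ago = today - timedelta(days=7)
    month_ago = today - timedelta(days=30)
    for day, entries in sorted(logs.items()):
        if day == today:
            memory["today"].extend(entries)                    # full resolution
        elif day >= week_ago:
            memory["weekly"].append(summarize(entries, "day"))  # per-day summaries
        elif day >= month_ago:
            memory["monthly"].append(summarize(entries, "theme"))  # high-level themes
        # older than a month: drop or roll into long-term themes elsewhere
    return memory
```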
Single Model Mastery
Model proliferation creates shallow competence across multiple systems rather than deep intuition with one system. Switching between models for marginal improvements prevents developing the pattern recognition that enables advanced prompting techniques.
Implement single model mastery approach:
Pick one model family (Claude/GPT/Gemini) and one agent harness for 30-day focused usage. Develop deep intuition for capabilities, failure modes, and prompting patterns. Track performance improvements over time rather than model-hopping for theoretical advantages.
Single model focus builds compound expertise. Understanding a model’s specific reasoning patterns, output formatting tendencies, and failure modes enables prompt optimization that outweighs theoretical model advantages. This approach treats model selection as a specialization decision rather than a per-task optimization.

The 30-day constraint forces depth over breadth. Instead of switching models when encountering difficulties, practitioners develop techniques to work within limitations. This builds problem-solving skills that transfer to new models rather than creating dependency on model-specific advantages.
Synthesis
The artifacts cluster around two architectural principles. The first group establishes fast orchestration with smart delegation: context density optimization, orchestrator patterns, stage management, and snapshot evaluation all separate rapid discovery from deep analysis. The second builds memory and mastery systems: instruction management, memory compaction, AI-native data architecture, and single-model focus all compound capabilities over time.
The tension emerges between immediate optimization and long-term capability development. Fast orchestrators maximize current performance while memory systems and single-model mastery build expertise that compounds across projects. Neither approach alone suffices—you need rapid execution for daily productivity and sustained focus for developing advanced techniques.
The deeper pattern: context engineering evolved from token optimization to attention architecture. Early approaches treated models as storage systems. Current understanding recognizes them as attention substrates where information density and workflow design matter more than raw context capacity.