Architecture for Dense Context
Context engineering emerged as the discipline of extracting maximum value from today’s models. The challenge isn’t just getting information into the model—it’s maintaining information density while minimizing context bloat. Every token competes for attention, every instruction dilutes focus, and every architectural decision compounds across the workflow.
The core tension: models improve weekly, but the fundamental constraints remain. Transformer attention remains quadratic, context windows have limits, and deterministic engineering provides leverage that scales independently of model capabilities. The building blocks that agent harnesses need to handle—file systems, tool use, code execution, reasoning—won’t change dramatically in the next decade. This creates an unusual opportunity: building workflows that improve with new models while remaining architecturally sound.
Recent model releases reinforce a crucial distinction between fast orchestrators and slow thinkers. The common pattern uses the smartest available model as the top-level orchestrator, delegating to faster sub-models. But this inverts the efficiency curve. Slow models excel at complex reasoning, not at tool calling and file reading. Fast models handle routing and context gathering efficiently, then hand consolidated context to slow models for the heavy cognitive work.
The implementation patterns that follow address three core problems: preventing context dilution through instruction overflow, architecting appropriate model selection for different workload types, and building evaluation systems that prevent workflow regression as models and prompts evolve.
Context Density Optimization
Context window size creates a false sense of abundance. The constraint isn’t token limits—it’s attention dilution. Each additional token competes for the model’s focus, reducing the effective impact of essential information. The fix requires explicit density optimization at the instruction level.
Add to CLAUDE.md:
Maximize information density per token, not just raw token count. Focus on keeping context as small as possible while including all relevant information. Before each response, explicitly identify what can be compressed or removed without losing essential context.
This constraint changes how you construct prompts. Instead of adding more context when results seem incomplete, you compress existing context to make room for essential information. The pattern applies recursively: each section of your CLAUDE.md should maximize information density, each example should demonstrate multiple concepts, each instruction should serve multiple purposes.
Add to CLAUDE.md:
Keep total instructions under 150 items to prevent attention dilution. When adding a new instruction, review existing ones and remove or consolidate redundant items. Use structured sections instead of all-caps for emphasis.
The 150-item limit isn’t arbitrary—research shows models can follow 100-200 instructions effectively, but performance degrades beyond that threshold. This forces prioritization. Each new instruction must justify displacing existing ones, creating natural pressure toward consolidation and clarity.
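A small sketch of how this budget could be enforced mechanically, assuming instructions are written as markdown bullets or numbered items in CLAUDE.md; the file path and pattern are assumptions you would adapt to your own layout:

```python
# Minimal sketch: warn when CLAUDE.md grows past the instruction budget.
# Assumes instructions are written as markdown bullets ("-", "*") or numbered items.
import re
from pathlib import Path

INSTRUCTION_LIMIT = 150  # budget from the guidance above

def count_instructions(path: str = "CLAUDE.md") -> int:
    text = Path(path).read_text(encoding="utf-8")
    # Count lines that look like list items; adjust the pattern to your layout.
    items = [l for l in text.splitlines() if re.match(r"\s*([-*]|\d+\.)\s+", l)]
    return len(items)

if __name__ == "__main__":
    n = count_instructions()
    status = "over budget" if n > INSTRUCTION_LIMIT else "ok"
    print(f"{n} instructions ({status}, limit {INSTRUCTION_LIMIT})")
```

Running this as a pre-commit check turns the limit into a forcing function: new instructions cannot land without confronting the total.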
Model Architecture Patterns
The orchestrator-thinker pattern solves the speed-intelligence tradeoff by matching model capabilities to task requirements. Fast models handle routing, context gathering, and deterministic operations. Slow models focus on complex reasoning with pre-gathered context.
Implement fast orchestrator + slow thinker pattern:
Use fast model (Sonnet) as orchestrator to identify relevant files/context, then pass consolidated context to slow model (Opus/o1) for complex reasoning. Avoid having slow models do tool calling and file reading directly.
The key insight: slow models will methodically read every file if given tool access, burning time and tokens on mechanical operations. Fast models identify which files matter, consolidate them into dense context, then hand the reasoning task to the slow model. This preserves the slow model’s cognitive capacity for actual thinking rather than mechanical file operations.
This pattern requires a deterministic layer between fast and slow models. The orchestrator identifies relevant context, a script or tool consolidates it into a single prompt, and the slow model receives pre-processed context for analysis. Deemphasizing agentic loops in favor of dense, pre-assembled context often produces better results than having the slow model manage its own tool calls.
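A minimal sketch of the split, written against the Anthropic Python SDK. The model identifiers, prompt wording, and the `answer` helper are illustrative assumptions, not a prescribed implementation:

```python
# Fast orchestrator picks files; a deterministic step consolidates them;
# the slow model reasons over the pre-gathered context in a single call.
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()
FAST_MODEL = "claude-sonnet-placeholder"  # assumption: any fast, cheap model
SLOW_MODEL = "claude-opus-placeholder"    # assumption: the strongest reasoning model

def complete(model: str, prompt: str) -> str:
    response = client.messages.create(
        model=model,
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def answer(question: str, repo_files: list[str]) -> str:
    # Step 1: the fast model routes; it sees only file names, not contents.
    listing = "\n".join(repo_files)
    picks = complete(
        FAST_MODEL,
        f"Question: {question}\n\nFiles:\n{listing}\n\n"
        "List the files (one per line) needed to answer the question.",
    )
    # Step 2: a deterministic layer consolidates the selected files into one dense prompt.
    selected = [f for f in picks.splitlines() if f.strip() in repo_files]
    context = "\n\n".join(f"### {f}\n{Path(f).read_text()}" for f in selected)
    # Step 3: the slow model reasons over pre-gathered context; no tool calls needed.
    return complete(SLOW_MODEL, f"{context}\n\nQuestion: {question}")
```

The design choice to keep file reading in plain Python rather than in a tool-calling loop is the point: the slow model spends its tokens on reasoning, not on mechanical retrieval.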
Evaluation and Workflow Validation
Model improvements create a perpetual evaluation problem. Each new model or prompt modification might improve or degrade output quality, but without systematic comparison, changes become cargo cult rituals. Snapshot-based evaluation treats AI outputs like code changes, enabling regression testing for prompt modifications.
Create snapshot-based evaluation skill:
The skill stores AI outputs as snapshots, runs diffs when prompts/workflows change, allows accepting/rejecting changes like code reviews. Implements regression testing for AI workflows by comparing outputs before/after modifications.
This skill changes how you iterate on prompts. Instead of making changes and hoping for improvement, you capture current output as a baseline, modify the prompt, generate new output, and compare the diff. Good changes get accepted, regressions get rejected. The pattern works for any deterministic workflow component—prompt modifications, model switches, instruction changes.
The implementation requires output normalization to handle formatting differences while preserving semantic changes. Store outputs in structured format, diff the semantic content, and provide tooling for quick accept/reject decisions. This enables confident prompt iteration without losing known-good patterns.
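A sketch of the core of such a skill, assuming one text snapshot per test case stored on disk; the file layout, normalization rules, and `accept` flag are assumptions for illustration:

```python
# Snapshot-based regression check: store a baseline output per case, diff new
# output against it, and require an explicit accept to update the baseline.
import difflib
from pathlib import Path

SNAPSHOT_DIR = Path("snapshots")

def normalize(text: str) -> str:
    # Strip trailing whitespace and blank lines so formatting noise
    # does not mask semantic changes.
    lines = [line.rstrip() for line in text.strip().splitlines()]
    return "\n".join(line for line in lines if line)

def check_snapshot(case_id: str, new_output: str, accept: bool = False) -> bool:
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    path = SNAPSHOT_DIR / f"{case_id}.txt"
    new_norm = normalize(new_output)
    if not path.exists() or accept:
        path.write_text(new_norm, encoding="utf-8")
        return True
    old_norm = path.read_text(encoding="utf-8")
    if old_norm == new_norm:
        return True
    diff = difflib.unified_diff(
        old_norm.splitlines(), new_norm.splitlines(),
        fromfile="baseline", tofile="candidate", lineterm="",
    )
    print("\n".join(diff))
    return False  # regression, or an intentional change awaiting accept=True
```

The workflow mirrors code review: modify the prompt, rerun the cases, inspect the diffs, and accept only the changes you actually wanted.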
Production Workflow Integration
Context engineering patterns must integrate with real development workflows, not exist as academic exercises. The proxy debugging pattern provides visibility into model interactions, enabling reverse-engineering of successful workflows and systematic debugging of failures.
Create /proxy-debug command:
Sets up logging proxy to capture all model requests/responses, stores them for analysis. Helps reverse-engineer successful workflows and debug model behavior by examining exact prompts and responses.
This command creates a debugging layer between your application and the model API. Every request and response gets logged with timestamp, token usage, and latency metrics. When a workflow produces exceptional results, you can examine the exact prompt sequence that created them. When workflows fail, you can trace the failure through the complete interaction history.
The stored logs enable pattern recognition across successful workflows. Common prompt structures, effective context organization, and optimal model selection decisions emerge from the data rather than intuition. This creates a feedback loop where successful patterns get identified and systematized.
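A minimal logging layer in the spirit of /proxy-debug, written against the Anthropic Python SDK and appending one JSON record per call; the log path and field names are my own choices, not a fixed format:

```python
# Log every model call with timestamp, latency, token usage, and full
# prompt/response text to a JSONL file for later analysis.
import json
import time
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()
LOG_PATH = Path("model_calls.jsonl")

def logged_call(model: str, prompt: str, max_tokens: int = 1024) -> str:
    start = time.time()
    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.content[0].text
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "model": model,
        "latency_s": round(time.time() - start, 3),
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
        "prompt": prompt,
        "response": text,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return text
```

Because the log is line-delimited JSON, later analysis (grouping by model, sorting by latency, grepping for prompt patterns) stays a one-liner with standard tools.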
Create CRM writer subagent:
Input: meeting notes/emails. Process: searches web for contact enrichment data, creates/updates markdown-based CRM entries with structured frontmatter. Output: structured contact records with automatic company/person linking and data validation.
This subagent demonstrates practical AI integration for routine data management. Takes unstructured meeting notes or emails, enriches contact information through web search, and maintains a markdown-based CRM system. The markdown format provides flexibility for schema evolution while remaining AI-readable and human-editable.
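A sketch of the record format such a subagent might maintain: a markdown file per contact with YAML frontmatter. The schema fields (name, company, email, tags, last_contact) and directory layout are illustrative assumptions, not a fixed standard:

```python
# Write one markdown CRM record per contact: YAML frontmatter for structured
# fields, a prose body for notes. Human-editable and AI-readable.
from pathlib import Path
import yaml  # pip install pyyaml

CRM_DIR = Path("crm/contacts")

def write_contact(record: dict, notes: str) -> Path:
    CRM_DIR.mkdir(parents=True, exist_ok=True)
    slug = record["name"].lower().replace(" ", "-")
    frontmatter = yaml.safe_dump(record, sort_keys=False).strip()
    body = f"---\n{frontmatter}\n---\n\n## Notes\n\n{notes}\n"
    path = CRM_DIR / f"{slug}.md"
    path.write_text(body, encoding="utf-8")
    return path

write_contact(
    {"name": "Jane Doe", "company": "Acme", "email": "jane@acme.example",
     "tags": ["lead"], "last_contact": "2025-01-15"},
    "Met at conference; interested in the analytics integration.",
)
```

Keeping the structured fields in frontmatter and the narrative in the body lets the schema evolve without migrations: new fields are just new keys.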

Model Consistency and Mastery
The proliferation of new models creates optimization anxiety—constantly switching to the latest release in search of marginal improvements. But model mastery requires sustained engagement with specific capabilities and limitations. Deep expertise with one model often produces better results than surface familiarity with many models.
Implement single model mastery approach:
Pick one model family and use it consistently for 1-2 months instead of switching between models frequently. Develop deep intuition for that model’s behavior, prompt preferences, and capabilities before evaluating alternatives.
This pattern contradicts the tendency to chase every new model release. Instead of constant switching, sustained focus builds intuitive understanding of how specific models respond to different prompt structures, context organization, and instruction formats. This expertise often provides more value than incremental capability improvements from newer models.
The mastery period reveals model-specific optimization opportunities that aren’t apparent from casual use. Preferred instruction formats, optimal context organization, effective few-shot examples, and failure modes become predictable through sustained interaction. This knowledge enables more sophisticated prompt engineering than generic best practices applied to any model.
Synthesis
The artifacts above separate into two complementary functions: context management and architectural optimization. The CLAUDE.md constraints and snapshot evaluation both focus on preventing degradation—instruction overflow, context bloat, and workflow regression. The orchestrator pattern and debugging tools enable architectural optimization by matching model capabilities to appropriate tasks and providing visibility into successful patterns.
This creates productive tension between constraint and capability. Density optimization forces ruthless prioritization of context elements, while architectural patterns enable sophisticated multi-model workflows. The evaluation system prevents regression while enabling confident iteration. The debugging proxy reveals why certain patterns succeed, enabling systematic improvement rather than random experimentation.
The deeper pattern: context engineering has evolved from prompt optimization to workflow architecture. The discipline now encompasses instruction management, model selection, evaluation systems, and production integration. As model capabilities plateau within current architectures, the engineering discipline around extracting maximum value from existing capabilities becomes the primary lever for improvement.