Your Agent Is Drowning in Its Own Context
Your AI agent isn't stupid.
It's drowning in its own context.
You're 50 messages into debugging. It was brilliant at message 10. At message 50 it asks a question you answered 30 minutes ago. That's not "reasoning failure." It's context rot.
What Is Context Rot?
Context rot is the progressive loss of task signal as the context window fills with low-value residue:
- Error logs from failed attempts
- Retry loops that went nowhere
- Stale plans from abandoned approaches
- Half-finished branches of thought
Your fix is still "in there." It's just buried under 40,000 tokens of sludge.
The agent can technically see the information. But it can't prioritize it. The signal-to-noise ratio has collapsed.
The Paradox: Bigger Context Made It Worse
Here's the kicker: bigger context windows made the problem worse.
At 8K tokens you hit limits constantly. You naturally summarize. You compress. You prune. At 200K tokens you rarely hit limits. So you never summarize. You never compress. You accumulate sludge.
We optimized for capacity. We needed relevance.
The research community celebrated "infinite context" like it was the promised land. But information retrieval has known for decades: more isn't better. Relevant is better.
The Right Mental Model
The right mental model isn't "infinite memory."
It's CPU cache:
- Small
- Curated
- Refreshed constantly
- Ruthless about eviction
Agents don't fail because they lack information. They fail because they can't prioritize information under load.
Your 200K context window is like having 64GB of L1 cache. It defeats the entire purpose of the hierarchy. The whole point of cache is that it's small and fast, not big and slow.
Information Bottleneck Theory
Naftali Tishby's Information Bottleneck principle (1999) offers a useful frame:
Keep the minimal representation that preserves what matters.
Translation for agents:
- Compress state — Distill progress to decisions and constraints
- Preserve constraints — Keep the invariants (rules, specs, goals)
- Discard residue — Delete failed branches, stale plans, error logs
Context engineering is compression engineering.
The goal isn't to keep everything. The goal is to keep just enough to make the next decision well.
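The three moves above can be sketched as one compression pass. This is a minimal illustration, not a real agent-framework API: the `(tag, text)` message format and the tag names are hypothetical.

```python
def compress_context(messages):
    """Keep the minimal representation that preserves what matters.

    `messages` is a list of (tag, text) pairs. Goals, constraints, and
    decisions survive; residue (error logs, failed branches, stale
    plans) is discarded, leaving only a one-line tally behind.
    """
    KEEP = {"goal", "constraint", "decision"}  # the invariants and settled state
    kept = [(tag, text) for tag, text in messages if tag in KEEP]
    dropped = len(messages) - len(kept)
    # Replace the discarded residue with a single cheap note
    return kept + [("note", f"{dropped} residue messages discarded")]
```

The point of the sketch: compression here is selection by kind, not summarization by length. A 5,000-token error log and a one-line decision cost the same to classify, but only one earns its place in the next window.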
What We Saw at NLLabs
We ran an experiment internally:
- Setup A: Model could retrieve from a large pool of skills documentation and past conversations (RAG-style)
- Setup B: "Right state" passively injected into working context via a tight AGENTS.md-style brief
Setup B won decisively.
Why? Because "can access X" is not "will access X." Retrieval adds another pile of potentially irrelevant text. If your agent is already drowning, more text makes it worse.
The missing primitive isn't a vector database. It's an eviction policy.
RAG vs. Compression
RAG is about retrieval. Fixing context rot is about selection + compression.
RAG says: "Let me go find more stuff." Compression says: "Let me throw away what doesn't matter."
When you're drowning, you don't need more water. You need to get rid of the water.
This doesn't mean RAG is useless — it means RAG solves a different problem. RAG helps when you lack information. Context rot happens when you have too much information.
What Actually Worked
Our solution is simple and unsexy:
1. State Checkpoints Every N Turns
Every 10-15 messages, checkpoint the current state:
- What's the goal?
- What constraints exist?
- What decisions have been made?
- What open questions remain?
This becomes the new "source of truth" for the next segment.
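A checkpoint can be as plain as a formatted brief regenerated every N turns. A minimal sketch, assuming a caller tracks turn count and state; the field names and the 12-turn interval (inside the article's 10-15 range) are illustrative.

```python
CHECKPOINT_EVERY = 12  # somewhere in the 10-15 message range

def make_checkpoint(goal, constraints, decisions, open_questions):
    """Distill a conversation segment into a new 'source of truth' brief."""
    lines = [f"GOAL: {goal}"]
    lines += ["CONSTRAINTS:"] + [f"- {c}" for c in constraints]
    lines += ["DECISIONS:"] + [f"- {d}" for d in decisions]
    lines += ["OPEN QUESTIONS:"] + [f"- {q}" for q in open_questions]
    return "\n".join(lines)

def maybe_checkpoint(turn, state):
    """Every N turns, emit a fresh brief; otherwise leave context alone."""
    if turn > 0 and turn % CHECKPOINT_EVERY == 0:
        return make_checkpoint(**state)
    return None
```

Once emitted, the brief replaces the raw transcript of that segment as the anchor for the next one.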
2. Tiered Memory
Not all context is equal:
- Hot (L1): Last 10 messages, raw and uncompressed
- Warm (L2): Messages 11-100, compressed to key decisions
- Cold (L3): Messages 101+, archived unless explicitly retrieved
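The tiering above is just slicing plus a summarizer. A sketch under the article's own boundaries (10 hot, 100 warm); `summarize` is a caller-supplied function, not a specific library call.

```python
def tier_messages(messages, summarize):
    """Split a transcript into hot/warm/cold tiers.

    hot  (L1): last 10 messages, raw
    warm (L2): messages 11-100 back, compressed to key decisions
    cold (L3): everything older, archived until explicitly retrieved
    """
    hot = messages[-10:]
    warm = [summarize(m) for m in messages[-100:-10]]
    cold = messages[:-100]
    return hot, warm, cold
```

Short transcripts degrade gracefully: with fewer than 100 messages the cold tier is simply empty, so nothing special-cases the early conversation.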
3. Explicit Forgetting
Delete dead branches. If you tried something and it didn't work, remove it from context. Keep a one-line note ("tried X, didn't work") but delete the 5,000-token error log.
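Explicit forgetting is a delete plus a tombstone. A minimal sketch: the dict shape and the `branch` key are hypothetical, not any framework's schema.

```python
def forget_branch(context, branch_id, attempted):
    """Remove a dead branch's messages, leaving a one-line tombstone.

    The 5,000-token error log goes; the lesson stays.
    """
    pruned = [m for m in context if m.get("branch") != branch_id]
    pruned.append({"branch": None, "text": f"tried {attempted}, didn't work"})
    return pruned
```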
4. Passive Injection of Invariants
Rules, specs, and key definitions go into a static context file (like AGENTS.md). This ensures they're always present, not buried under conversation history.
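Passive injection is an ordering decision at prompt-assembly time: the static brief goes first, every turn, unconditionally. A sketch assuming one possible layout; nothing here is a prescribed format.

```python
def build_prompt(invariants_path, checkpoint, recent_messages):
    """Assemble working context with invariants always present, up front.

    invariants_path points at a static AGENTS.md-style brief; it is
    re-read and prepended on every turn, so rules and specs can never
    get buried under conversation history.
    """
    with open(invariants_path) as f:
        invariants = f.read()
    return "\n\n".join([invariants, checkpoint, *recent_messages])
```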
Reliability Audit
If you're fighting "agent reliability," start with an audit:
- What in your context is invariant (rules, specs) vs incidental (logs, chatter)?
- Are you checkpointing state, or just appending?
- Do you have an eviction policy, or are you hoarding everything?
Then:
- Move invariants into passive context (always-present files)
- Checkpoint state every N turns
- Measure task success before and after
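The first audit question, invariant vs incidental, can be answered with a crude tally. A sketch only: the kind labels mirror the article's distinction, and whitespace-splitting stands in for a real tokenizer.

```python
def audit_context(items):
    """Rough audit: how many tokens are invariant vs incidental?

    `items` are (kind, text) pairs. Rules and specs count as invariant;
    logs and chatter count as incidental. Whitespace tokens only.
    """
    tokens = lambda text: len(text.split())
    invariant = sum(tokens(t) for k, t in items if k in {"rule", "spec"})
    incidental = sum(tokens(t) for k, t in items if k in {"log", "chatter"})
    return invariant, incidental
```

If the incidental count dwarfs the invariant one, that ratio is the "before" number for the success measurement in the last step.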
The Core Insight
Context windows got bigger. Your compression needs to get better.
The bottleneck isn't memory capacity. It's attention allocation under cognitive load.
Humans have known this forever — that's why we take notes, make outlines, and summarize meetings. We externalize and compress because our working memory is tiny.
AI agents have bigger working memory. But the fundamental problem is the same: relevance degrades as context grows.
Solve for compression, not capacity.
Related: Executive Function, The Daemon Model