Your Agent Is Drowning in Its Own Context
Your AI agent isn't stupid.
It's drowning in its own context.
You're 50 messages into debugging. It was brilliant at message 10. At message 50 it asks a question you answered 30 minutes ago. That's not "reasoning failure." It's context rot.
What Is Context Rot?
Context rot is the progressive loss of task signal as the context window fills with low-value residue:
- Error logs from failed attempts
- Retry loops that went nowhere
- Stale plans from abandoned approaches
- Half-finished branches of thought
Your fix is still "in there." It's just buried under 40,000 tokens of sludge.
The agent can technically see the information. But it can't prioritize it. The signal-to-noise ratio has collapsed.
The Paradox: Bigger Context Made It Worse
Here's the kicker: bigger context windows made the problem worse.
At 8K tokens you hit limits constantly. You naturally summarize. You compress. You prune. At 200K tokens you rarely hit limits. So you never summarize. You never compress. You accumulate sludge.
We optimized for capacity. We needed relevance.
The research community celebrated "infinite context" like it was the promised land. But information retrieval has known for decades: more isn't better. Relevant is better.
The Right Mental Model
The right mental model isn't "infinite memory."
It's CPU cache:
- Small
- Curated
- Refreshed constantly
- Ruthless about eviction
Agents don't fail because they lack information. They fail because they can't prioritize information under load.
Your 200K context window is like having 64GB of L1 cache. It defeats the entire purpose of the hierarchy. The whole point of cache is that it's small and fast, not big and slow.
Information Bottleneck Theory
Naftali Tishby's Information Bottleneck principle (1999) offers a useful frame:
Keep the minimal representation that preserves what matters.
Translation for agents:
- Compress state — Distill progress to decisions and constraints
- Preserve constraints — Keep the invariants (rules, specs, goals)
- Discard residue — Delete failed branches, stale plans, error logs
Context engineering is compression engineering.
The goal isn't to keep everything. The goal is to keep just enough to make the next decision well.
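The three moves above can be sketched as one compression pass. This is a minimal illustration, not a real agent-framework API: the `(tag, text)` message format and the tag names are hypothetical.

```python
def compress_context(messages):
    """Keep the minimal representation that preserves what matters.

    `messages` is a list of (tag, text) pairs. Goals, constraints, and
    decisions survive; residue (error logs, failed branches, stale
    plans) is discarded, leaving only a one-line tally behind.
    """
    KEEP = {"goal", "constraint", "decision"}  # the invariants and settled state
    kept = [(tag, text) for tag, text in messages if tag in KEEP]
    dropped = len(messages) - len(kept)
    # Replace the discarded residue with a single cheap note
    return kept + [("note", f"{dropped} residue messages discarded")]
```

The point of the sketch: compression here is selection by kind, not summarization by length. A 5,000-token error log and a one-line decision cost the same to classify, but only one earns its place in the next window.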
What We Saw at NLLabs
We ran an experiment internally:
- Setup A: Model could retrieve from a large pool of skills documentation and past conversations (RAG-style)
- Setup B: "Right state" passively injected into working context via a tight AGENTS.md-style brief
Setup B won decisively.
Why? Because "can access X" is not "will access X." Retrieval adds another pile of potentially irrelevant text. If your agent is already drowning, more text makes it worse.
The missing primitive isn't a vector database. It's an eviction policy.
RAG vs. Compression
RAG is about retrieval. Fixing context rot is about selection + compression.
RAG says: "Let me go find more stuff." Compression says: "Let me throw away what doesn't matter."
When you're drowning, you don't need more water. You need to get rid of the water.
This doesn't mean RAG is useless — it means RAG solves a different problem. RAG helps when you lack information. Context rot happens when you have too much information.
What Actually Worked
Our solution is simple and unsexy:
1. State Checkpoints Every N Turns
Every 10-15 messages, checkpoint the current state:
- What's the goal?
- What constraints exist?
- What decisions have been made?
- What open questions remain?
This becomes the new "source of truth" for the next segment.
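A checkpoint can be as plain as a formatted brief regenerated every N turns. A minimal sketch, assuming a caller tracks turn count and state; the field names and the 12-turn interval (inside the article's 10-15 range) are illustrative.

```python
CHECKPOINT_EVERY = 12  # somewhere in the 10-15 message range

def make_checkpoint(goal, constraints, decisions, open_questions):
    """Distill a conversation segment into a new 'source of truth' brief."""
    lines = [f"GOAL: {goal}"]
    lines += ["CONSTRAINTS:"] + [f"- {c}" for c in constraints]
    lines += ["DECISIONS:"] + [f"- {d}" for d in decisions]
    lines += ["OPEN QUESTIONS:"] + [f"- {q}" for q in open_questions]
    return "\n".join(lines)

def maybe_checkpoint(turn, state):
    """Every N turns, emit a fresh brief; otherwise leave context alone."""
    if turn > 0 and turn % CHECKPOINT_EVERY == 0:
        return make_checkpoint(**state)
    return None
```

Once emitted, the brief replaces the raw transcript of that segment as the anchor for the next one.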
2. Tiered Memory
Not all context is equal:
- Hot (L1): Last 10 messages, raw and uncompressed
- Warm (L2): Messages 11-100, compressed to key decisions
- Cold (L3): Messages 101+, archived unless explicitly retrieved
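The tiering above is just slicing plus a summarizer. A sketch under the article's own boundaries (10 hot, 100 warm); `summarize` is a caller-supplied function, not a specific library call.

```python
def tier_messages(messages, summarize):
    """Split a transcript into hot/warm/cold tiers.

    hot  (L1): last 10 messages, raw
    warm (L2): messages 11-100 back, compressed to key decisions
    cold (L3): everything older, archived until explicitly retrieved
    """
    hot = messages[-10:]
    warm = [summarize(m) for m in messages[-100:-10]]
    cold = messages[:-100]
    return hot, warm, cold
```

Short transcripts degrade gracefully: with fewer than 100 messages the cold tier is simply empty, so nothing special-cases the early conversation.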
3. Explicit Forgetting
Delete dead branches. If you tried something and it didn't work, remove it from context. Keep a one-line note ("tried X, didn't work") but delete the 5,000-token error log.
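Explicit forgetting is a delete plus a tombstone. A minimal sketch: the dict shape and the `branch` key are hypothetical, not any framework's schema.

```python
def forget_branch(context, branch_id, attempted):
    """Remove a dead branch's messages, leaving a one-line tombstone.

    The 5,000-token error log goes; the lesson stays.
    """
    pruned = [m for m in context if m.get("branch") != branch_id]
    pruned.append({"branch": None, "text": f"tried {attempted}, didn't work"})
    return pruned
```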
4. Passive Injection of Invariants
Rules, specs, and key definitions go into a static context file (like AGENTS.md). This ensures they're always present, not buried under conversation history.
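Passive injection is an ordering decision at prompt-assembly time: the static brief goes first, every turn, unconditionally. A sketch assuming one possible layout; nothing here is a prescribed format.

```python
def build_prompt(invariants_path, checkpoint, recent_messages):
    """Assemble working context with invariants always present, up front.

    invariants_path points at a static AGENTS.md-style brief; it is
    re-read and prepended on every turn, so rules and specs can never
    get buried under conversation history.
    """
    with open(invariants_path) as f:
        invariants = f.read()
    return "\n\n".join([invariants, checkpoint, *recent_messages])
```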
Reliability Audit
If you're fighting "agent reliability," start with an audit:
- What in your context is invariant (rules, specs) vs incidental (logs, chatter)?
- Are you checkpointing state, or just appending?
- Do you have an eviction policy, or are you hoarding everything?
Then:
- Move invariants into passive context (always-present files)
- Checkpoint state every N turns
- Measure task success before and after
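The first audit question, invariant vs incidental, can be answered with a crude tally. A sketch only: the kind labels mirror the article's distinction, and whitespace-splitting stands in for a real tokenizer.

```python
def audit_context(items):
    """Rough audit: how many tokens are invariant vs incidental?

    `items` are (kind, text) pairs. Rules and specs count as invariant;
    logs and chatter count as incidental. Whitespace tokens only.
    """
    tokens = lambda text: len(text.split())
    invariant = sum(tokens(t) for k, t in items if k in {"rule", "spec"})
    incidental = sum(tokens(t) for k, t in items if k in {"log", "chatter"})
    return invariant, incidental
```

If the incidental count dwarfs the invariant one, that ratio is the "before" number for the success measurement in the last step.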
The Core Insight
Context windows got bigger. Your compression needs to get better.
The bottleneck isn't memory capacity. It's attention allocation under cognitive load.
Humans have known this forever — that's why we take notes, make outlines, and summarize meetings. We externalize and compress because our working memory is tiny.
AI agents have bigger working memory. But the fundamental problem is the same: relevance degrades as context grows.
Solve for compression, not capacity.
Related: Executive Function, The Daemon Model