The Reasoning Sink Hypothesis

The Guessing Game

There is one task most models cannot do without some scaffolding or tools: play a guessing game.

Figure: Example guessing-game prompt where the model needs to secretly commit to an answer before the user starts guessing.

To play a guessing game, the model needs to actually lock in an answer first without revealing it. In a normal chat transcript, the model can claim to have locked in an answer, but it is free to change its mind later.

Most chat templates make this game impossible to play:

If the model does not lock in an answer, the model cannot ground its later responses.
If the model locks in the answer in its visible output, it breaks the game.
If the model locks in the answer in its reasoning block, the answer stays hidden from the user, but the block can be removed after the next user turn.
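A minimal sketch of that last failure mode, assuming a simplified template. Message, render_context, and the tag names are hypothetical, and real chat templates differ in detail. The only rule modeled here is that reasoning written before the latest user message is dropped when the next prompt is rendered.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str            # "user" or "assistant"
    content: str         # user-visible text
    reasoning: str = ""  # hidden reasoning block (assistant only)


def render_context(messages: list[Message]) -> str:
    """Render the prompt for the next assistant turn.

    Reasoning is kept only for assistant messages that come after the
    latest user message; older reasoning is dropped (assumed rule).
    """
    last_user = max(
        (i for i, m in enumerate(messages) if m.role == "user"), default=-1
    )
    parts = []
    for i, m in enumerate(messages):
        if m.role == "assistant" and m.reasoning and i > last_user:
            parts.append(f"<think>{m.reasoning}</think>")
        parts.append(f"<{m.role}>{m.content}</{m.role}>")
    return "\n".join(parts)


history = [
    Message("user", "Think of a number from 1 to 10. I will guess."),
    Message("assistant", "Okay, I have one in mind.", reasoning="I pick 7."),
    Message("user", "Is it greater than 5?"),
]
context = render_context(history)
print("I pick 7." in context)  # False: the committed answer is gone
```

By the time the model has to answer the first guess, the only place the secret was written no longer exists in its context.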

There are a few ways to make this possible:

Keep all reasoning blocks, even after new user turns. For example, every Claude model released since Opus 4.5 does this, and models from many other labs are moving to this approach too: Gemini 3.5 Flash, DeepSeek V4, Kimi K2.6+, MiniMax M3, and GLM 4.7+.
Make “responding to the user” a tool. Since reasoning content stays intact between tool calls, the model can commit the answer in its reasoning block and continue the conversation.¹ This is analogous to yield vs. return in Python. yield suspends the function without ending it, preserving the generator’s local state, similar to preserved reasoning blocks. A normal return, by contrast, finalizes the function and discards that execution frame.
Give the model a tool to commit the answer without revealing it to the user. Similar to how some smart models repurpose the Python tool for thinking, a model can use no-op tool calls as a reasoning block that does not get cleared.
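The yield vs. return analogy can be made concrete with a small generator. This is only an illustration of the Python semantics, not of how any model or harness is implemented: the generator stands in for the model, the local variable secret for the committed answer, and guessing_game is a name made up for this sketch.

```python
def guessing_game(secret: int):
    """Generator whose secret lives in the suspended frame between guesses."""
    reply = "I have picked a number. Start guessing."
    while True:
        guess = yield reply      # suspend; local state (secret) survives
        if guess == secret:
            return "Correct!"    # finalize; the frame is discarded
        reply = "Higher" if guess < secret else "Lower"


game = guessing_game(secret=7)
print(next(game))      # runs to the first yield
print(game.send(3))    # Higher
print(game.send(9))    # Lower
try:
    game.send(7)
except StopIteration as done:
    print(done.value)  # Correct!
```

Each send resumes the same frame, so the secret never has to appear in anything the caller sees. Once the function returns, the frame and the secret are gone, which is the situation a model is in when its reasoning block is cleared.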

Preserved Thinking

Preserving all thinking blocks allows the model to play this game, and it solves other issues too:

Clearing reasoning blocks changes the prompt prefix, which invalidates the KV cache for everything after the removed block. With preserved thinking, the prefix stays the same and the KV cache is preserved too.
The model can see its past reasoning and avoid redoing previous reasoning work.
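To make the cache point concrete, here is a toy sketch. It assumes a prefix cache that can reuse only the longest shared prefix between the previous context and the new one. The strings stand in for tokens, and shared_prefix_len is a hypothetical helper, not part of any serving stack.

```python
def shared_prefix_len(prev: list[str], new: list[str]) -> int:
    """Number of leading tokens that are identical in both contexts."""
    n = 0
    for a, b in zip(prev, new):
        if a != b:
            break
        n += 1
    return n


# Context that produced the first assistant reply (strings stand in for tokens).
turn1 = ["<user>", "u1", "<think>", "r1", "</think>", "<assistant>", "a1"]

# Next turn with preserved thinking: the old context is a strict prefix.
preserved = turn1 + ["<user>", "u2"]

# Next turn with cleared thinking: the reasoning tokens are removed.
cleared = ["<user>", "u1", "<assistant>", "a1", "<user>", "u2"]

print(shared_prefix_len(turn1, preserved))  # 7
print(shared_prefix_len(turn1, cleared))    # 2
```

With cleared thinking, everything after the first divergence has to be recomputed, including the assistant reply that was already generated once.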

But how do models work without preserved thinking?

The Reasoning Sink Hypothesis: When models are optimized in multi-turn environments, they can learn to encode and preserve their reasoning content in other channels.

This is related to encoded reasoning or Chain-of-Thought steganography, where the model encodes information in its chain-of-thought. The reasoning sink hypothesis refers to the behavior where a model learns to compress information from its chain-of-thought and encode it in other outputs.

Consider this:

A reasoning LLM is trained in a simulated, multi-turn environment, e.g. tau-bench, and reasoning blocks are cleared after each user message. At each step, the model needs to optimize the reward of the current action, such as tool call blocks and text blocks, plus future rewards.

Each action can have its own reward: Is the tool call correct? Is the response to the user in the right language? Is it communicated well? There is also a final reward: Is the session successful?
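One way to write this objective down, as a sketch. The symbols are notation introduced here, not taken from any specific training setup: r_k is the reward of the action at step k, R_final is the session-level reward, T is the last step, and γ is an assumed discount factor.

```latex
G_t = r_t + \sum_{k=t+1}^{T} \gamma^{\,k-t}\, r_k + \gamma^{\,T-t}\, R_{\text{final}}
```

With γ = 1 this is a plain sum. Anything the model emits at step t that raises a later r_k or R_final is rewarded through the second and third terms, even if it does nothing for r_t itself.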

If the reasoning blocks are cleared after each new user turn, this creates an optimization pressure to preserve the model’s thinking so future turns can reuse it.

The optimization pressure can come from many sources.

Length Penalty

If the future rewards include a length penalty, the model is incentivized to summarize and preserve its reasoning content so it can reduce the amount of work needed in future turns. But the only way to preserve the reasoning content is to put it in channels that do not get cleared.

Consistency

Sampling, and therefore reasoning, is not deterministic. Having a sink to store reasoning allows the model to maintain consistency across turns. Any task or reward that values consistency can therefore incentivize a reasoning sink.

Carrying Context Forward

Models can suffer from context rot and struggle to follow complex instructions or efficiently utilize all the information in the context. It can therefore be advantageous for a model to reason and compress the important context, then carry it forward, increasing the success rate of future turns.

What Reasoning Sink Looks Like

Like Chain-of-Thought steganography, a reasoning sink may show up in forms that humans cannot perceive.

We can test this by following Anthropic’s paraphrase approach: paraphrase each model output before it is fed into the next turn, and check whether the score drops. If paraphrasing the model’s output removes some hidden content, the model’s performance may decrease, and the thinking duration on later turns might increase compared to the baseline.
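A rough sketch of what such a comparison could look like. Everything here is an assumption for illustration: run_episode is a hypothetical episode runner that takes a task and an optional rewrite function applied to earlier assistant outputs, and returns the final score and the thinking tokens spent; paraphrase is whatever rewriting model is used.

```python
from statistics import mean
from typing import Callable, Iterable, Optional

# Hypothetical runner signature: (task, rewrite or None) -> (score, thinking tokens).
Runner = Callable[[str, Optional[Callable[[str], str]]], tuple[float, int]]


def paraphrase_ablation(
    run_episode: Runner,
    paraphrase: Callable[[str], str],
    tasks: Iterable[str],
) -> dict[str, float]:
    """Compare baseline episodes against episodes with paraphrased history."""
    tasks = list(tasks)
    base = [run_episode(t, None) for t in tasks]
    para = [run_episode(t, paraphrase) for t in tasks]
    return {
        # Negative means paraphrasing hurt the score.
        "score_delta": mean(s for s, _ in para) - mean(s for s, _ in base),
        # Positive means the model thought longer after paraphrasing.
        "thinking_delta": mean(n for _, n in para) - mean(n for _, n in base),
    }
```

A negative score_delta together with a positive thinking_delta would be the pattern described above.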

However, there are other confounding factors that might contribute to the same outcome: paraphrasing the context moves the model slightly out of distribution, which can lead to worse results too. So this test wouldn’t be very useful.

But I think reasoning sink is likely to be more visible. The optimization pressure does not require the model to always hide information that might be useful for future turns. As long as it does not negatively affect the reward of individual actions, the model is free to utilize output channels as reasoning sinks.

Code and Tools

Models can use tools as a reasoning sink.

Figure: Chart comparing code line counts across models, with GPT-5.5 producing more code than nearby Claude models. Source: https://x.com/htihle/status/2066454739752047051.

GPT-5.5 does not have preserved thinking. Coincidentally, it also appears to produce more code per task than Claude models, and the line count trends upward with newer models, which have likely gone through more multi-turn training and learned to use tools as a reasoning sink.
On the same chart, we can see Claude models produce cleaner outputs, and the line count peaked at Opus 4.1, which is the last Opus model that did not use preserved thinking.

Writing Style

Models can also use the final output as a reasoning sink.

Like Goblin in GPT models, writing quirks can be very visible without hurting the reward. One thing I noticed in newer GPT models is that they like to use the phrases “I would” and “I’d”. Here is what Codex summarized by looking into my local logs:

exact I would is very often negative/cautionary: I would not use, I would not start, I would not make, I would avoid…. It reads like a senior-engineer guardrail: “don’t take this path,” “avoid this framing,” “don’t overbuild.”

It also appears in explicit recommendation sections: What I would build next, What I would recommend, I would start with….

I’d is much more action/design oriented: I’d make…, I’d do…, I’d model it as…, I’d keep…, I’d use…, I’d prefer….

It often appears in final-answer advice like:

“Here’s how I’d structure it…”

“I’d start with…”

“I’d frame it as…”

“The pattern I’d recommend is…”

Figure: Timeline chart from local Codex logs showing the share of final responses using I would or I'd rising in newer GPT models. Source: my local Codex logs.
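A share like the one in the chart could be computed along these lines. This is a sketch, not the script behind the chart: the pattern, the share_with_phrase helper, and the assumption that final responses are available as plain strings are all mine.

```python
import re

# Match "I would" and "I'd", with either a straight or a curly apostrophe.
PATTERN = re.compile(r"\bI would\b|\bI['’]d\b")


def share_with_phrase(responses: list[str]) -> float:
    """Fraction of responses that contain "I would" or "I'd" at least once."""
    if not responses:
        return 0.0
    hits = sum(1 for r in responses if PATTERN.search(r))
    return hits / len(responses)


sample = [
    "I would not use a global lock here.",
    "Here’s how I’d structure it.",
    "Use a queue.",
    "I'd keep it simple.",
]
print(share_with_phrase(sample))  # 0.75
```

Grouping responses by model and date before calling the helper would give one point per model on a timeline.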

This can be explained by the reasoning sink hypothesis. Saying “I would …” in a first-person tone helps the model condense and preserve its reasoning in the final response.

Solution

Preserved thinking solves the reasoning sink naturally. If nothing is cleared, there is nothing to sink.

But clearing old reasoning is still useful. It frees up context, and it reduces latency. Anthropic has a Context Editing feature to remove old thinking blocks, and Claude Code clears old thinking to reduce latency when resuming a cold session. I think clearing reasoning blocks is still a good design, in the same way subagents or context-forking can help with context utilization.

We can have both.

The answer is to give the model an explicit sink. This mirrors the attention sink: an attention sink gives the model a dedicated token to absorb excess attention, and a reasoning sink gives the model a dedicated channel to keep reasoning that should survive clearing.

It should be part of the chat template: a separate response channel that holds a reasoning block that does not get cleared.

<think>
...cleared after new user turn...
</think>
<think_sink>
...preserved across turns...
</think_sink>
<output>
...user-visible response...
</output>
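A minimal sketch of how a renderer for such a template might behave, assuming the think block is dropped for turns before the latest user message while think_sink is always kept. Turn, render, and the user tag are hypothetical names for this sketch; the three assistant tags are the ones from the template above.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str             # "user" or "assistant"
    output: str           # user-visible text
    think: str = ""       # cleared after a new user turn
    think_sink: str = ""  # preserved across turns, hidden from the user


def render(turns: list[Turn]) -> str:
    """Render the model's context for its next turn."""
    last_user = max(
        (i for i, t in enumerate(turns) if t.role == "user"), default=-1
    )
    parts = []
    for i, t in enumerate(turns):
        if t.role == "user":
            parts.append(f"<user>\n{t.output}\n</user>")
            continue
        if t.think and i > last_user:
            parts.append(f"<think>\n{t.think}\n</think>")
        if t.think_sink:
            parts.append(f"<think_sink>\n{t.think_sink}\n</think_sink>")
        parts.append(f"<output>\n{t.output}\n</output>")
    return "\n".join(parts)


turns = [
    Turn("user", "Think of a number from 1 to 10."),
    Turn(
        "assistant",
        "Okay, I have one in mind.",
        think="Should I pick 3 or 7? 7 is less obvious.",
        think_sink="Secret number: 7.",
    ),
    Turn("user", "Is it greater than 5?"),
]
print(render(turns))
```

In the rendered context the deliberation is gone but the committed answer is still there, which is enough to play the guessing game from the start of this post.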

Preamble

OpenAI uses the harmony format for its reasoning models. It includes the following channels:

analysis: private reasoning and tool planning (interleaved reasoning).
commentary: user-visible progress updates (preambles) and tool calls.
final: the completed user-facing answer.

Preamble, which models use to let the user know what they are going to do before tool calls, can be thought of as one form of reasoning sink. It is not cleared, and it encapsulates the hidden reasoning behind the tool calls that follow it.

Summary Channel

OpenAI Codex adds a summary channel in the harmony format, where the compacted summary of a previous thread lives.

A reasoning sink channel can extend this naturally. The summary channel compacts a previous, cleared thread; a reasoning sink can be a new channel that compacts the reasoning block that is about to be cleared.

Micro-compaction

A dedicated reasoning sink is a form of context compaction that the model can learn end-to-end. It is also similar to work like Context-1 that lets the model compact tool results.

Why Adopt Reasoning Sink

For models that clear past reasoning blocks, a reasoning sink gives the model a way to maintain its context separately from the user-visible channels, instead of leaking it into code or writing style.

For models that keep all reasoning traces, enabling a reasoning sink essentially gives the model a way to clear its own context, reducing noise and supporting longer-horizon tasks.

Footnotes

¹ AskUserQuestion tool in Claude Code: https://x.com/ChangJonathanC/status/2030521846106583291