Vitoria Lima

notes 03/30: RLMs, recursive language models

Somewhere in PR, March 30 2026. I wrote this during my coding vacation. I really like to surf first thing in the morning and then go all in on my many rabbit holes. If you like longboard surfing, pls reach out with any reccs!

I read Alex Zhang's piece on Recursive Language Models (alexzhang13.github.io/blog/2025/rlm) this week and I have not stopped thinking about it since. (Thread here: x.com/a1zhang/....) The intuition behind it is so clean it almost feels obvious in hindsight.

Before we dive into it, let's take a step back and introduce a few concepts plainly.

What is recursion?

Recursion is one of those ideas in computer science that sounds intimidating until it clicks, and then it feels like the most natural thing in the world.

A recursive function is a function that calls itself. That is the entire definition. But the magic is in the structure: you have nested in nested in nested functions, and you can go down the rabbit hole and get back out, as long as the innermost call (the base case: the simplest instance of the problem, one that can be solved directly without further recursion) is solved. Then the results unwind back up, layer by layer, until you have your answer.

(image: Russian matryoshka nesting dolls arranged from largest to smallest)
Matryoshka dolls: the original recursive data structure.

Why Russian dolls? Because recursion is nothing more or less than Russian dolls. Open the largest doll and you find a smaller version of itself inside. Open that one and there is another. Keep going until you reach the smallest doll, the base case, which does not open. Then you close them back up, smallest first, each one nesting inside the next. That is exactly how a recursive function works: each call opens a new layer, the innermost call resolves, and the return values nest back together on the way out.

(interactive demo: factorial(4), with the call stack animated as you click to recurse)

The recursive formula:

factorial(n) = n × factorial(n − 1)
factorial(0) = 1   ← base case
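Written out as code, the demo above is the textbook recursive function (nothing RLM-specific here, just plain Python):

```python
def factorial(n: int) -> int:
    """Compute n! by opening one doll per call."""
    if n == 0:
        return 1                     # base case: the smallest doll, which does not open
    return n * factorial(n - 1)      # open one more layer, then unwind

print(factorial(4))  # → 24, assembled as 4 × 3 × 2 × 1 × 1 on the way back up
```

Each call waits on the one inside it; the multiplications only happen as the stack unwinds, smallest doll first.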

Russian dolls, but make it AI


What recursive language models (RLMs) do is exactly this, but with an LLM as the function. (An RLM is not a new architecture; it is an inference strategy: a language model that can spawn recursive sub-calls of itself, each operating on a subset of the context, with the root model coordinating while the child calls do the heavy lifting on smaller chunks.) The root model receives your query, but instead of trying to swallow the entire context at once, it writes a little program to partition the work and spawns smaller copies of itself to handle subsets. Each copy can spawn further copies. The innermost call does the actual work on a manageable chunk, returns its result, and the whole thing unwinds back to the surface. Russian dolls.
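Here is a minimal sketch of that pattern, under my own assumptions rather than the paper's actual implementation: `call_llm` is a hypothetical stand-in for a real model API (here it just filters a chunk for lines relevant to the query), and the chunk budget is invented:

```python
# Sketch of the RLM inference pattern. `call_llm` is a hypothetical
# stand-in for a real model call; everything here is illustrative.

CHUNK_SIZE = 1_000  # assumed per-call context budget, in characters

def call_llm(query: str, context: str) -> str:
    # Stand-in for a model call: "extract" the lines that mention
    # the query, falling back to a truncated summary of the chunk.
    relevant = [ln for ln in context.splitlines() if query.lower() in ln.lower()]
    return "\n".join(relevant) or context[:100]

def rlm(query: str, context: str) -> str:
    lines = context.splitlines()
    # Base case: the chunk fits in one model call, so answer directly.
    if len(context) <= CHUNK_SIZE or len(lines) < 2:
        return call_llm(query, context)
    # Recursive case: partition the context, spawn a "copy" on each
    # half, then combine the children's short answers in one more call.
    mid = len(lines) // 2
    left = rlm(query, "\n".join(lines[:mid]))
    right = rlm(query, "\n".join(lines[mid:]))
    return call_llm(query, left + "\n" + right)
```

A needle-in-a-haystack toy run: bury one relevant line in a few thousand characters of filler, and the root call only ever sees two short child summaries per level, never the whole haystack.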


The illusion of infinite context

Here is the part that I find genuinely fascinating. The root model never sees the full context. It only ever touches its own small window: the query, some code it wrote to decompose the data, and the summaries returned by its children. The heavy lifting happens deeper in the stack, out of sight, out of the context window (the maximum number of tokens a model can process in a single call; current limits range from 8K to 2M tokens depending on the model, and RLMs effectively bypass that limit through recursive decomposition).

This gives the illusion of an infinite context window.

Do we really need longer context windows, or is the illusion enough?

Maybe for enterprise use cases you really do need longer context windows: processing massive document corpora, legal discovery, medical record review across years of patient history. But for most consumers, I suspect "self-healing" context (meaning self-summarizing memory) might be enough: a model that summarizes and compresses its own memory recursively. The illusion of having read everything, distilled into what matters, might be more useful than actually having read everything.

(interactive demo: RLM recursive decomposition of a 128K-token corpus, with sliders for root context, max depth, and corpus size)

Why is the root context so low? The root model never directly ingests the raw document. It writes code to partition and delegate, then receives only summaries or extracted results from its children. Each depth level adds ~1-2K tokens to the root window: the query, the partition logic, and the returned answers. The actual document lives in the recursive calls, not in the root.
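The back-of-the-envelope arithmetic is worth seeing. Using the ~1-2K-per-depth figure above (the individual token counts below are my own illustrative assumptions, not measurements from the post):

```python
# Rough accounting for why the root window stays small. All token
# counts are illustrative assumptions, consistent with the ~1-2K
# tokens added per depth level.
QUERY_TOKENS = 300      # the user's question
PARTITION_CODE = 500    # code the root writes to split and delegate
CHILD_SUMMARIES = 1_000 # summaries returned at each depth level

def root_window_tokens(max_depth: int) -> int:
    # The root pays once for the query, then a roughly fixed cost
    # per depth level for partition logic plus returned answers.
    return QUERY_TOKENS + max_depth * (PARTITION_CODE + CHILD_SUMMARIES)

# A depth-4 recursion over a 128K-token corpus costs the root only:
print(root_window_tokens(4))  # 6300 tokens, vs. 128_000 for direct ingestion
```

The root's cost grows with depth, not with corpus size, which is the whole trick.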

I like that the authors asked themselves the same questions: do we really need to solve context rot? Do we really need longer context windows? Maybe for enterprise use cases, but for consumers, isn't self-healing memory (which is self-summarization) enough?

What can this unleash product-wise?

See more about this in my other notes, from paper to products.

If you are curious about this kind of conversation, I would love to buy you a coffee and a croissant.

vitoria@vitorialima.com