When you ask a chatbot about a 200-page document, it does much more work than you'd expect — and the work grows brutally with length. We're building an attention mechanism that reads smartly instead of reading everything, and we've made three breakthroughs along the way.
Every time an AI reads, it compares each word to every other word it's seen. With 100 words that's 10,000 comparisons. Easy. But double the length, and the work quadruples. Triple the length, nine times the work.
For a short message, this is invisible. For a long document, it becomes the bottleneck — slower responses, higher costs, and a hard ceiling on how much text the AI can actually pay attention to at once.
A 100-word essay? Trivial. A 10,000-word short story? One hundred million word-pairs to compare. A whole novel? Out of reach for current models without major workarounds.
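To see the scaling in one glance, here's the arithmetic as a few lines of Python (a toy count, not a model):

```python
# Toy arithmetic: all-pairs attention does n x n comparisons.
for n_words in (100, 200, 300, 10_000):
    print(f"{n_words:>6} words -> {n_words * n_words:>13,} comparisons")
```

Double the words (100 to 200), and 10,000 comparisons become 40,000. Triple them, 90,000. That's the quadratic wall.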
You don't re-read every page every time you finish a sentence. You glance back at the last paragraph for context, and if you forget who someone is, you flip back to where they were introduced. That's the entire idea.
Our solution copies that pattern. The AI reads nearby all the time (cheap), and only flips back to distant pages when something triggers a memory (expensive, but rare). Done well, this turns a quadratic cost into something close to linear.
Glance at the last few pages, every step. Cheap. Fast. Catches recent context.
When something nearby triggers a distant memory, jump back to the right page. Costs more, so only do it when needed.
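Here's a minimal sketch of that reading loop in NumPy. The page summaries, the threshold, and the names (`read_step`, `jump_threshold`) are our illustrative assumptions, not the real system's internals:

```python
import numpy as np

def read_step(query, pages, recent, jump_threshold=0.8):
    """One reading step.

    query:  (d,) vector for the current word.
    pages:  list of (summary_vec, word_vecs) for distant pages.
    recent: (w, d) word vectors for the local window; always read.
    Returns the word vectors this step attends to.
    """
    attended = [recent]                    # cheap: always glance nearby
    q = query / np.linalg.norm(query)
    for summary, words in pages:           # one score per page, not per word
        if float(summary @ q) > jump_threshold:
            attended.append(words)         # rare, expensive: flip back
    return np.concatenate(attended, axis=0)
```

Because the distant scan touches one summary per page instead of every word, the per-step cost stays small, and the full-page reads stay rare.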
It sounds simple. But making it actually work means answering three hard questions: how does the AI decide which distant page is relevant? Once it flips there, how should it read the page? And how do you train any of this without hand-labeling everything?
The rest of this article is how we answered each one.
When the AI is deciding which distant page to flip to, "relevant" can mean very different things. Sometimes the relevant info is the overall vibe of a page. Sometimes it's one weird word that stands out. Sometimes it's a specific phrase that lines up with what's being asked.
No single way of measuring "relevance" works in all three cases. Our first attempt used just the average — like skimming a one-line summary of every page. It missed pages where the answer was a single distinctive word buried in mostly-irrelevant text.
Instead of one scoring method, we use three — each good at spotting a different kind of relevance. Then we combine their picks.
The first librarian skims page averages. Good when the relevant info is spread across many sentences.
The second hunts for standout words. Good when the answer is one rare or distinctive token.
The third reads for meaning. Good when the answer is on-topic but otherwise unremarkable.
Each librarian picks their own pages independently. The AI flips to all of them, removes duplicates, and reads. The third librarian is the most expensive, so we only call them in when the first two disagree — saving work on most queries.
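A sketch of the three scorers and the lazy third call, in NumPy. Each page is an array of word vectors; the third scorer here is a stand-in (a soft maximum over word matches), since we're not describing our actual expensive scorer, and "disagree" is simplified to "no overlap in picks":

```python
import numpy as np

def top_k(scores, k):
    return set(np.argsort(scores)[-k:].tolist())

def mean_librarian(query, pages, k):
    # Relevance spread across many sentences: score the page average.
    return top_k([page.mean(axis=0) @ query for page in pages], k)

def max_librarian(query, pages, k):
    # One rare, distinctive token: score the best single word.
    return top_k([float((page @ query).max()) for page in pages], k)

def semantic_librarian(query, pages, k):
    # Stand-in for the expensive third scorer: a soft max over all
    # word matches (an assumption, not the real method).
    return top_k([float(np.logaddexp.reduce(page @ query)) for page in pages], k)

def pick_pages(query, pages, k=3):
    a = mean_librarian(query, pages, k)
    b = max_librarian(query, pages, k)
    picks = a | b                  # combine and deduplicate
    if not (a & b):                # the cheap two fully disagree:
        picks |= semantic_librarian(query, pages, k)  # call in the third
    return picks
```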
Once the AI picks the right page, it has to actually read it. Our first instinct was to keep using the page summary — it's already there, why waste time reading the whole thing?
Big mistake. A summary tells you "Chapter 3 is about a dog." It doesn't help if you need the dog's actual name. The summary is fine for finding the right page. It's terrible for remembering what was on it.
Local-only reading can't reach distant pages at all. The detail is just gone.
Summary-only reading knows it's "the chapter about the dog," but the actual details are blurred.
Our approach: once we know the right page, look at the actual words.
So our rule is: summaries help us pick. Actual pages give us the answer. Two different jobs, two different tools.
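The two-jobs rule as a sketch: cheap summary vectors do the picking, real word vectors do the answering. Shapes and names here are illustrative:

```python
import numpy as np

def answer_from(query, summaries, pages, k=2):
    """Summaries help us pick; actual pages give us the answer.

    summaries: (n_pages, d) one vector per page (the one-line summary).
    pages:     list of (words_per_page, d) arrays of real word vectors.
    """
    # Job 1: find the right pages using cheap summaries.
    picked = np.argsort(summaries @ query)[-k:]
    # Job 2: read the actual words on those pages, not the summaries.
    words = np.concatenate([pages[i] for i in picked], axis=0)
    scores = words @ query
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    return weights @ words                    # readout over real tokens
```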
Once the basic system worked on small puzzles, we tested it inside a real AI model. The result was a shock — even when our system flipped to the right page, the model only got the answer right about 30% of the time. Barely better than guessing.
We ran a test: what if we cheated, and force-fed the right page to the model directly? Accuracy jumped to 88%. So the page-finding wasn't the problem. Something between "having the right page" and "writing the right answer" was broken.
Imagine the question is "What's Alice's favorite color?" and the AI flips to a page that says "... Alice → blue ...". The AI's attention naturally lights up on the word "Alice" — that's what we asked about. So the AI reads the word "Alice" and writes back "Alice" as its answer.
It found the right page. It even matched the right word. It just read that word back instead of the one beside it. The clue was the next word over: "blue."
The fix: when the AI finds a matching word on a flipped-to page, it reads the next word over instead of the match itself. Match "Alice", pick up "blue."
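In code, the fix is one line: shift the values the attention reads by one position relative to the words it matches. A minimal NumPy version (the wrap-around at the last position is a toy simplification):

```python
import numpy as np

def shifted_read(query, words, shift=1):
    """Match on a word, but read the word `shift` positions later.

    words: (n, d) token vectors in document order.
    Matching 'Alice' with shift=1 picks up the neighboring 'blue'.
    """
    scores = words @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Roll the *values* so position i hands back the vector at i + shift.
    shifted_values = np.roll(words, -shift, axis=0)
    return weights @ shifted_values
```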
From 32% to 82%. That's not a tweak — that's a different model. And the breakthrough wasn't a fancier algorithm or more compute. It was realizing the AI was reading the wrong word.
Real text doesn't always put the answer right next to the key. Sometimes there's punctuation. Sometimes the answer is two words over. So we taught the AI to learn the spacing by example — +1, +2, or wherever — instead of hard-coding "next word."
"Read what you matched" — the original failure mode.
"Alice → blue" — adjacent.
"Alice, blue" — comma in between.
When we trained on text where the spacing varied, the AI learned which to use, per situation. Accuracy held up around 82% even with shifting layouts.
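One way to implement "learn the spacing by example" is a small learnable weight per candidate offset, mixing the shifted readouts; training pushes the weights onto whichever spacings the data actually uses. The parameterization below is our assumption:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def learned_offset_read(query, words, offset_logits):
    """Mix readouts over candidate offsets (0, +1, +2, ...).

    offset_logits: learnable scores, one per candidate offset.
    """
    match = softmax(words @ query)               # where the key matched
    readout = np.zeros(words.shape[1])
    for offset, w in enumerate(softmax(offset_logits)):
        values = np.roll(words, -offset, axis=0)  # read `offset` words over
        readout += w * (match @ values)
    return readout
```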
The first breakthrough required a lot of hand-holding: telling the AI which page was right, where the answer sat, exactly which word to read. That works in a research lab. It doesn't scale to real text from the internet, where there are no labels.
So we tried something interesting. We trained one model with all the labels — call it the tutor. Then we used the tutor to teach a second model — the relay — without showing the relay the labels directly. The relay just watched the tutor's choices. Then the relay teaches a student. Labels stop at the tutor.
Pure chaining had a problem: each generation got slightly worse — a kind of telephone game effect. Lessons drifted.
The fix: when training the relay, mostly imitate the tutor — but every so often, sneak in a real human-labeled example to keep the relay grounded. We call it an anchor. Like checking your map every now and then to make sure you haven't drifted off course.
The anchored version actually beat the original tutor. Sometimes a student watching a careful teacher learns better than the teacher learned from the textbook directly.
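The anchor trick in miniature: most steps the relay imitates the tutor's output distribution, but at a small rate it trains on a real human-labeled example instead. `anchor_rate` and the exact losses are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(p_target, p_model, eps=1e-9):
    return -float(np.sum(p_target * np.log(p_model + eps)))

def relay_loss(relay_probs, tutor_probs, true_label, anchor_rate=0.1):
    """One training step for the relay.

    Mostly imitate the tutor; occasionally ground on a human label
    so errors don't compound down the chain (the telephone-game fix).
    """
    if rng.random() < anchor_rate and true_label is not None:
        # Anchor step: a real labeled example keeps the relay on course.
        onehot = np.zeros_like(relay_probs)
        onehot[true_label] = 1.0
        return cross_entropy(onehot, relay_probs)
    # Imitation step: match the tutor's choice distribution.
    return cross_entropy(tutor_probs, relay_probs)
```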
Once the model worked, the next question was: can it do less work without getting much worse? Some questions are easy and don't need much flipping back. Others are hard. We wanted to give the AI the option.
So we trained it to handle different effort levels. Full power: ask all the librarians. Half power: ask half of them. Low power: ask just one. The surprise: training across all three levels made every level a little better.
"Half power" gets the biggest jump from this kind of training: 87% accuracy with half the work. That's the new sweet spot.
Why does training for harder conditions improve performance under easy conditions too? Probably the same reason a student who studied for a hard test does fine on the easy version. Practicing the harder case forces you to actually learn the underlying skill, not memorize a shortcut.
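A sketch of what the multi-level setup can look like: sample an effort level per training step and run with only that many librarians active. Mapping "half power" to two of three scorers is our simplification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Effort level -> how many librarians get asked.
EFFORT_LEVELS = {"full": 3, "half": 2, "low": 1}

def pick_pages_at_effort(query, pages, scorers, level):
    """Run only as many scorers as the effort level allows, then merge picks."""
    picks = set()
    for scorer in scorers[: EFFORT_LEVELS[level]]:
        picks |= scorer(query, pages)
    return picks

def sample_level():
    """One random level per training step, so a single model
    practices answering under every budget."""
    return rng.choice(list(EFFORT_LEVELS))
```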
Here's what our system does, every time it has to read a long document. The foundation: glance locally at every step, let the librarians nominate distant pages, and read the actual words on any page it flips to. On top of that sit the three breakthroughs: reading the word next to the match, the anchored teaching chain, and training across effort levels.
A lot has moved over the last few months. Some questions are partially answered. Others are still open. Here's where we are.
So far our test cases are name-and-color-style pairs; the structure is too clean. Real documents ramble, carry ambiguity, and scatter multiple pieces of relevant information. The next milestone is showing the same approach works on actual long documents and chat transcripts.
Our chain of teachers reduced how many human labels we need, but didn't eliminate them. Can we get the chain started with even fewer? Or train the first tutor from a tiny example set and let the rest follow?
The tricks we use in the lab are hard to translate into the kind of code that runs on the chips inside ChatGPT-style models. We have a sketch of what such code should look like, but we haven't built it yet. This is the biggest open engineering question.
The effort dial works when we set it by hand. Ideally the AI itself would know when a question needs more effort and when it doesn't, turning the dial up or down on its own. We've tried a few versions of this. None work well yet.