A founder's planning guide to continual learning — and the no-regret bets to make before it arrives.
Dwarkesh Patel published an essay on continual learning that's worth your time if you set technical strategy. Stripped of the research vocabulary, it asks one question every founder should be asking too: when does using an AI start making your copy of it permanently better? Right now the answer is never. The model you deploy on Monday is exactly as ignorant of your business on Friday. That single fact — and the date it stops being true — should shape your roadmap more than any benchmark score.
Here's how to plan around it.
The roadmap question is "when does deployment become training"
Most AI roadmaps are built on the wrong axis. They plot capability — this model scores 12 points higher on some eval, so we'll wait for it. The axis that actually changes your strategy is whether the model improves from being used.
Today it doesn't. A frozen-weights model amortises one enormous training run across billions of sessions, and your sessions contribute nothing back. Dwarkesh's framing is sharp: we have a genius grad student who has aced every classroom case study but has never been allowed to take an internship — and we keep handing it more case studies. Around 30–50% of a lab's compute goes to serving the model in deployment, and none of that compute currently makes the model smarter. The most valuable signal in the entire system — what real organisations actually use the model for, and where it actually fails — is generated in deployment and thrown away.
The labs are betting they can close that loop. When they do, the company that owns the richest deployment loop in its niche wins compounding that a competitor cannot buy off the shelf. So the roadmap question isn't "how smart will it get" — it's "what should I be doing in each phase between now and the day deployment becomes training."
Why today's AI forgets everything it learns for you
Two limits explain the amnesia, and both matter for planning.
The first is sample inefficiency. These models are, by Dwarkesh's estimate, roughly one-millionth as sample-efficient as humans during training. That's tolerable for the labs because training is a one-time cost spread across everyone. It's not tolerable for learning the specifics of your business, where the relevant data is scarce — a few hundred examples of how your team handles an edge case, not a few hundred million.
The second is no continual learning: there's no mechanism today to take what a model figured out during a session and write it back into its weights. Humans do this constantly — most employees aren't net-productive until six months in, precisely because competence is built on the job, then consolidated. A model can look brilliant inside a long session and lose all of it when the context clears. People often ask why you can't just fit those six months into the context window. You can fake a lot that way — and for the next two years you should — but it's rented memory, not owned competence.
For your roadmap, the practical translation is blunt: assume any AI you deploy in 2026 is a brilliant contractor with no long-term memory of your company. Design the work, the guardrails, and the data capture around that assumption.
Verifiable isn't the bar — grindable is
Before the three horizons, one idea decides which parts of your roadmap AI eats first — and it's the most underrated point in the essay.
It is not enough for a task to be verifiable (you can check whether it was done right). To improve fast, a task also has to be grindable: you can spin up a deterministic, replayable simulator and run thousands of parallel attempts from the same starting point. Coding is grindable — drop a thousand agents into identical copies of a container and let them attack the same failing test. That's why coding and math have raced ahead.
Computer use is just as verifiable — did the invoice get paid, did the booking go through — yet it lags, because it isn't grindable. You can't run a thousand bots through the real Amazon checkout to practice; you'd get blocked. The real world doesn't offer a reset button. Building a business, winning a deal, running a campaign — none of it replays in a datacenter.
The roadmap consequence:
- Grindable, verifiable workflows — code generation, test writing, data extraction, document parsing, log triage — bet AI-heavy and bet now. Progress here is fast and will keep being fast.
- Non-grindable workflows — closing enterprise deals, navigating your messy internal tools, judgment calls with real-world consequences — keep a human firmly in the loop, and don't roadmap as if the whole org automates on the same date. It won't.
The three horizons to plan against
Here's the planning model I use with teams. Three horizons, each with a different job for you.
| Horizon | Timeframe | What the model can do | What you should be doing |
|---|---|---|---|
| 1 — Frozen agents | Now → 2026 | Strong on grindable, verifiable tasks. No memory of your org beyond the live context window. | Deploy agents on coding, extraction, support triage. Stand up a real eval set. Start logging every interaction and outcome. |
| 2 — The engineered bridge | 2026 → 2027 | Very long, cheaper effective context plus memory scaffolding (retrieval, summaries, fine-tunes). You simulate learning with engineering. | Build the context/memory layer you control. Turn captured interactions into retrieval and evals. Lock in data portability with vendors. |
| 3 — Continual learning | ~2027 → 2028 | Deployed model distills session learnings back into weights and compounds on your usage — plausibly, not certainly. | Feed the loop. Whoever owns the richest interaction-and-feedback data in the niche owns the compounding. |
What this means for build-vs-buy
If models will soon improve from deployment, the durable asset stops being the model and becomes the loop around it. That reshapes the classic build-vs-buy decision:
- Buy / rent the model. Don't train foundation models, and don't over-invest in bespoke fine-tunes you'll throw away each time the base model leaps. The frontier moves under you twice a year.
- Build and own the loop. Your proprietary interaction data, your domain-specific evaluation harness, and your workflow integration depth are the things a learning model needs and can't get elsewhere. That's the moat — not the prompt sitting on top.
Need help thinking this through? Book a 30-min call — no pitch.
The asset to start hoarding now: your feedback loop
Concretely, the highest-leverage thing on your 2026 roadmap is unglamorous: instrument the loop that a future learning model will train on — and that today's retrieval and evals already need.
- Capture per-task feedback. A thumbs up / thumbs down at the level of a completed task, not a chat message. This is the exact supervision signal the next generation of techniques is designed to learn from — and it's a clean retrieval and eval signal today.
- Log real edge cases as eval sets. Every failure your team catches in production becomes a test that stops the regression next time. Today's most successful online-learning systems work because they squeeze signal from real usage — one well-known coding-autocomplete model learns from which of its suggestions get accepted across hundreds of millions of requests a day. You can't match that volume, but you can match the shape: capture the accept/reject signal on your own workflows.
- Write down the tacit knowledge. How your systems fit together, your common failure modes, how work actually flows. This is precisely the organisation-specific knowledge models can't absorb yet — and capturing it is your defensibility in the meantime, the same argument I make in how AI automation is reshaping product teams.
What to put on the roadmap this year — and what to wait on
A simple split for the next four quarters:
- Do now: Deploy agents on grindable, verifiable workflows. Stand up a model-agnostic eval harness. Instrument per-task feedback and interaction logging. Make sure you can export your own data.
- Design for: A context and memory layer you control, not one trapped inside a single vendor. Contracts that keep your data and feedback portable across providers. An architecture that treats agents as a first-class consumer, not an afterthought.
- Don't overbuild: Bespoke fine-tunes you'll discard at the next base-model jump. "Moats" that are one release away from irrelevance. Fully autonomous AI on a non-grindable core workflow before Horizon 3 actually arrives — keep the human in the loop until the capability is real, not promised.
How we help
This is the seam I work on with founders at Kompella Technologies — turning "AI is moving fast" into a roadmap with specific bets, owners, and dates. Two shapes:
- As a fractional CTO or fractional Chief AI Officer embedded 1–3 days a week, owning the AI roadmap, the eval and feedback architecture, and the build-vs-buy calls — so you deploy now without painting yourself into a corner before continual learning lands.
- As a build-and-ship engagement where our team stands up the agent deployment, eval harness, feedback capture, and data-portability layer end-to-end alongside your engineers, with the playbook handed over so your team owns the loop after we leave.
Book a Free 30-Min Strategy Call →
Related reading from Kompella Technologies:
- Agents Are the New Mobile: AI-Ready Architecture 2026 — why AI agents are becoming a first-class customer of your product
- AI Strategy for Startups — the decision framework for shipping AI to production, not AI that demos
- How AI Automation Is Reshaping Product Teams — the three layers of AI adoption inside product orgs
- Build vs Buy Software — how to decide what to own when the frontier keeps moving


