
AI That Learns on the Job: A 2027 Roadmap

Founder, Kompella Technologies — Fractional CTO & CPO

Published June 27, 2026

[Cover image: 'AI That Learns on the Job', a diagram of the continual-learning loop showing a model deployed across three horizons (frozen agents, engineered bridge, continual learning), with learnings distilled back into the weights]

The AI roadmap question that matters for 2027–28 isn't how smart models get — it's when deployment becomes training: the point where a model learns from your work and keeps it. Plan in three horizons — today's amnesiac agents, a bridge you engineer with context and memory, and continual learning that compounds on your data. The durable moat shifts from the model to the feedback loop you own. Build for the gap, not the demo.

A founder's planning guide to continual learning — and the no-regret bets to make before it arrives.

Dwarkesh Patel published an essay on continual learning that's worth your time if you set technical strategy. Stripped of the research vocabulary, it asks one question every founder should be asking too: when does using an AI start making your copy of it permanently better? Right now the answer is never. The model you deploy on Monday knows no more about your business on Friday than it did on Monday. That single fact, and the date it stops being true, should shape your roadmap more than any benchmark score.

Here's how to plan around it.

The roadmap question is "when does deployment become training"

Most AI roadmaps are built on the wrong axis. They plot capability — this model scores 12 points higher on some eval, so we'll wait for it. The axis that actually changes your strategy is whether the model improves from being used.

Today it doesn't. A frozen-weights model amortises one enormous training run across billions of sessions, and your sessions contribute nothing back. Dwarkesh's framing is sharp: we have a genius grad student who has aced every classroom case study but has never been allowed to take an internship — and we keep handing it more case studies. Around 30–50% of a lab's compute goes to serving the model in deployment, and none of that compute currently makes the model smarter. The most valuable signal in the entire system — what real organisations actually use the model for, and where it actually fails — is generated in deployment and thrown away.

The labs are betting they can close that loop. When they do, the company that owns the richest deployment loop in its niche gets compounding gains that a competitor cannot buy off the shelf. So the roadmap question isn't "how smart will it get"; it's "what should I be doing in each phase between now and the day deployment becomes training?"

Why today's AI forgets everything it learns for you

Two limits explain the amnesia, and both matter for planning.

The first is sample inefficiency. These models are, by Dwarkesh's estimate, roughly one-millionth as sample-efficient as humans during training. That's tolerable for the labs because training is a one-time cost spread across everyone. It's not tolerable for learning the specifics of your business, where the relevant data is scarce — a few hundred examples of how your team handles an edge case, not a few hundred million.

The second is no continual learning: there's no mechanism today to take what a model figured out during a session and write it back into its weights. Humans do this constantly — most employees aren't net-productive until six months in, precisely because competence is built on the job, then consolidated. A model can look brilliant inside a long session and lose all of it when the context clears. People often ask why you can't just fit those six months into the context window. You can fake a lot that way — and for the next two years you should — but it's rented memory, not owned competence.

For your roadmap, the practical translation is blunt: assume any AI you deploy in 2026 is a brilliant contractor with no long-term memory of your company. Design the work, the guardrails, and the data capture around that assumption.

Verifiable isn't the bar — grindable is

Before the three horizons, one idea decides which parts of your roadmap AI eats first — and it's the most underrated point in the essay.

It is not enough for a task to be verifiable (you can check whether it was done right). To improve fast, a task also has to be grindable: you can spin up a deterministic, replayable simulator and run thousands of parallel attempts from the same starting point. Coding is grindable — drop a thousand agents into identical copies of a container and let them attack the same failing test. That's why coding and math have raced ahead.

Computer use is just as verifiable — did the invoice get paid, did the booking go through — yet it lags, because it isn't grindable. You can't run a thousand bots through the real Amazon checkout to practice; you'd get blocked. The real world doesn't offer a reset button. Building a business, winning a deal, running a campaign — none of it replays in a datacenter.
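To make "grindable" concrete, here is a minimal Python sketch under toy assumptions: `Snapshot`, `attempt`, and `grind` are hypothetical names, the random number stands in for an agent's patch, and the equality check stands in for a test suite. What makes it grindable is that every attempt starts from the same frozen state and is exactly replayable, so you can fan out many attempts and keep the ones the verifier accepts. The real Amazon checkout offers no equivalent of `Snapshot`.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical frozen starting state, e.g. a container plus one failing test."""
    seed: int
    target: int  # the fix the verifier accepts (toy stand-in for a passing test)


def attempt(snapshot: Snapshot, attempt_id: int) -> tuple[int, bool]:
    """One agent attempt, always starting from the identical snapshot.

    The RNG is derived only from the snapshot and the attempt id, so rerunning
    attempt 17 reproduces exactly the same outcome (the 'reset button').
    """
    rng = random.Random(snapshot.seed * 1_000_003 + attempt_id)
    patch = rng.randint(0, 99)                   # stand-in for the agent's patch
    return attempt_id, patch == snapshot.target  # verifier: did the test pass?


def grind(snapshot: Snapshot, n_attempts: int, workers: int = 8) -> list[int]:
    """Fan out many parallel attempts from one snapshot; return the ids that passed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda i: attempt(snapshot, i), range(n_attempts))
        return [i for i, passed in results if passed]
```

Running `grind(Snapshot(seed=7, target=42), 1000)` twice returns the same list of passing attempt ids. That reset-and-replay property is exactly what a real-world workflow like a live checkout or an enterprise deal lacks.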

The roadmap consequence:

Grindable, verifiable workflows — code generation, test writing, data extraction, document parsing, log triage — bet AI-heavy and bet now. Progress here is fast and will keep being fast.
Non-grindable workflows — closing enterprise deals, navigating your messy internal tools, judgment calls with real-world consequences — keep a human firmly in the loop, and don't roadmap as if the whole org automates on the same date. It won't.

Plan per-workflow, not per-company. The curve is jagged, not uniform.
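A per-workflow triage can be as simple as two yes/no questions per workflow. The sketch below is illustrative rather than a formal framework: `Workflow`, `Plan`, `plan_for`, and `roadmap` are hypothetical names, and the flags in the example portfolio are judgment calls you would make for your own workflows.

```python
from dataclasses import dataclass
from enum import Enum


class Plan(Enum):
    AI_HEAVY_NOW = "bet AI-heavy now"
    HUMAN_IN_LOOP = "keep a human firmly in the loop"


@dataclass(frozen=True)
class Workflow:
    name: str
    verifiable: bool  # can you check automatically that it was done right?
    grindable: bool   # can you replay it thousands of times from the same start?


def plan_for(w: Workflow) -> Plan:
    # Verifiable alone is not enough: fast progress needs both properties.
    if w.verifiable and w.grindable:
        return Plan.AI_HEAVY_NOW
    return Plan.HUMAN_IN_LOOP


def roadmap(workflows: list[Workflow]) -> dict[Plan, list[str]]:
    """Group workflows by plan: one decision per workflow, not per company."""
    grouped: dict[Plan, list[str]] = {plan: [] for plan in Plan}
    for w in workflows:
        grouped[plan_for(w)].append(w.name)
    return grouped


if __name__ == "__main__":
    portfolio = [
        Workflow("test writing", verifiable=True, grindable=True),
        Workflow("paying invoices in internal tools", verifiable=True, grindable=False),
        Workflow("closing enterprise deals", verifiable=True, grindable=False),
    ]
    for plan, names in roadmap(portfolio).items():
        print(f"{plan.value}: {', '.join(names)}")
```

The output is a jagged roadmap: the same company lands in both buckets at once, which is the point of planning per workflow.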

The three horizons to plan against

Here's the planning model I use with teams. Three horizons, each with a different job for you.

Horizon	Timeframe	What the model can do	What you should be doing
1 — Frozen agents	Now → 2026	Strong on grindable, verifiable tasks. No memory of your org beyond the live context window.	Deploy agents on coding, extraction, support triage. Stand up a real eval set. Start logging every interaction and outcome.
2 — The engineered bridge	2026 → 2027	Very long, cheaper effective context plus memory scaffolding (retrieval, summaries, fine-tunes). You simulate learning with engineering.	Build the context/memory layer you control. Turn captured interactions into retrieval and evals. Lock in data portability with vendors.
3 — Continual learning	~2027 → 2028	Deployed model distills session learnings back into weights and compounds on your usage — plausibly, not certainly.	Feed the loop. Whoever owns the richest interaction-and-feedback data in the niche owns the compounding.

The jump from Horizon 2 to Horizon 3 is the one nobody can date with confidence — the labs themselves call it an open empirical question. Notice, though, that the work is the same in all three. Capturing your interactions and feedback pays off immediately as retrieval and evals in Horizons 1–2, and becomes the fuel for weight-level learning in Horizon 3. That's what makes it a no-regret bet: useful whether or not the optimistic timeline lands.

What this means for build-vs-buy

If models will soon improve from deployment, the durable asset stops being the model and becomes the loop around it. That reshapes the classic build-vs-buy decision:

Buy / rent the model. Don't train foundation models, and don't over-invest in bespoke fine-tunes you'll throw away each time the base model leaps. The frontier moves under you twice a year.
Build and own the loop. Your proprietary interaction data, your domain-specific evaluation harness, and your workflow integration depth are the things a learning model needs and can't get elsewhere. That's the moat — not the prompt sitting on top.

This is also the honest answer to "Will continual learning kill AI wrappers?" It kills the ones that are only a prompt: every base-model release quietly absorbs them. It strengthens the ones wrapped around real proprietary data and real workflow depth. If a base model that is six months better would erase your product, you're building on the wrong layer. I made that point in AI Strategy for Startups, and it gets sharper, not softer, as continual learning approaches.


The asset to start hoarding now: your feedback loop

Concretely, the highest-leverage thing on your 2026 roadmap is unglamorous: instrument the loop that a future learning model will train on — and that today's retrieval and evals already need.

Capture per-task feedback. A thumbs up / thumbs down at the level of a completed task, not a chat message. This is the exact supervision signal the next generation of techniques is designed to learn from — and it's a clean retrieval and eval signal today.
Log real edge cases as eval sets. Every failure your team catches in production becomes a test that stops the regression next time. The online-learning systems that work today do so by squeezing signal from real usage; for example, one well-known coding-autocomplete model learns from which of its suggestions get accepted across hundreds of millions of requests a day. You can't match that volume, but you can match the shape: capture the accept/reject signal on your own workflows.
Write down the tacit knowledge. Document how your systems fit together, your common failure modes, and how work actually flows. This is precisely the organisation-specific knowledge models can't absorb yet, and capturing it is your defensibility in the meantime. I make the same argument in How AI Automation Is Reshaping Product Teams.
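Here is a minimal sketch of the first two habits, assuming you log one record per completed task. The schema (`TaskFeedback`, `EvalCase`, and every field name) is hypothetical; what matters is the shape: a task-level accept/reject bit, plus the human's correction, which becomes a regression test.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TaskFeedback:
    """One record per completed task (not per chat message). Fields are illustrative."""
    task_id: str
    workflow: str
    model: str                        # which provider/model produced the output
    task_input: str
    output: str
    accepted: bool                    # task-level thumbs up / thumbs down
    correction: Optional[str] = None  # what a human changed it to, if anything
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    workflow: str
    task_input: str
    expected: str


def to_eval_cases(log: list[TaskFeedback]) -> list[EvalCase]:
    """Every rejected-and-corrected task becomes a regression test."""
    return [
        EvalCase(f"regress-{fb.task_id}", fb.workflow, fb.task_input, fb.correction)
        for fb in log
        if not fb.accepted and fb.correction is not None
    ]


def acceptance_rate(log: list[TaskFeedback], workflow: str) -> float:
    """Share of tasks in one workflow that a human accepted (0.0 if none logged)."""
    rows = [fb for fb in log if fb.workflow == workflow]
    return sum(fb.accepted for fb in rows) / len(rows) if rows else 0.0
```

A rejected task with no correction still counts against the acceptance rate but is skipped as an eval case, because there is no known-good answer to test against.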

Do this and you're not waiting on the labs. You're building the one input that makes their breakthrough valuable to you the day it ships.

What to put on the roadmap this year — and what to wait on

A simple split for the next four quarters:

Do now: Deploy agents on grindable, verifiable workflows. Stand up a model-agnostic eval harness. Instrument per-task feedback and interaction logging. Make sure you can export your own data.
Design for: A context and memory layer you control, not one trapped inside a single vendor. Contracts that keep your data and feedback portable across providers. An architecture that treats agents as a first-class consumer, not an afterthought.
Don't overbuild: Bespoke fine-tunes you'll discard at the next base-model jump. "Moats" that are one release away from irrelevance. Fully autonomous AI on a non-grindable core workflow before Horizon 3 actually arrives — keep the human in the loop until the capability is real, not promised.
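A model-agnostic eval harness mostly means one thin interface between your evals and any vendor. In this sketch, `Model`, `run_evals`, `compare`, and `EchoModel` are hypothetical names; in practice each provider's SDK would sit behind its own `complete` method, and exact-match scoring would usually give way to a task-specific check.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Model(Protocol):
    """Any provider, wrapped behind one method. Swap vendors without touching evals."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    return output.strip() == expected.strip()


def run_evals(model: Model, cases: list[EvalCase],
              check: Callable[[str, str], bool] = exact_match) -> float:
    """Fraction of cases the model passes under the given check."""
    if not cases:
        return 0.0
    passed = sum(check(model.complete(c.prompt), c.expected) for c in cases)
    return passed / len(cases)


def compare(models: list[Model], cases: list[EvalCase]) -> dict[str, float]:
    """Score every candidate model on the same eval set you own."""
    return {m.name: run_evals(m, cases) for m in models}


@dataclass
class EchoModel:
    """Toy stand-in for a vendor client: answers from a fixed lookup table."""
    name: str
    answers: dict[str, str] = field(default_factory=dict)

    def complete(self, prompt: str) -> str:
        return self.answers.get(prompt, "")
```

With two cases ("2+2" expecting "4", "capital of France" expecting "Paris"), an `EchoModel` that knows both scores 1.0 and one that knows only the first scores 0.5. Because the eval set and the scoring live on your side of the interface, switching providers is a one-line change.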

Sequence it like any other technology roadmap for a startup: deploy where the payoff is immediate, instrument relentlessly, and keep optionality on the bets you can't yet date.

How we help

This is the seam I work on with founders at Kompella Technologies — turning "AI is moving fast" into a roadmap with specific bets, owners, and dates. Two shapes:

As a fractional CTO or fractional Chief AI Officer embedded 1–3 days a week, owning the AI roadmap, the eval and feedback architecture, and the build-vs-buy calls — so you deploy now without painting yourself into a corner before continual learning lands.
As a build-and-ship engagement where our team stands up the agent deployment, eval harness, feedback capture, and data-portability layer end-to-end alongside your engineers, with the playbook handed over so your team owns the loop after we leave.

If you're setting a roadmap that has to survive the next two years of AI progress, that's the conversation to have.


Related reading from Kompella Technologies:

Agents Are the New Mobile: AI-Ready Architecture 2026 — why AI agents are becoming a first-class customer of your product
AI Strategy for Startups — the decision framework for shipping AI to production, not AI that demos
How AI Automation Is Reshaping Product Teams — the three layers of AI adoption inside product orgs
Build vs Buy Software — how to decide what to own when the frontier keeps moving

— Ganesh Kompella



Frequently asked questions

When will AI models actually start learning on the job?

Nobody knows for certain — it's an empirical bet the labs are making, not a shipped feature. The plausible window is 2027–2028, when continual-learning techniques that distill a session's learnings back into the model's weights mature alongside much longer effective context. Plan with optionality, not a date: deploy AI on tasks that pay off today, and build the data and feedback assets that compound the moment on-the-job learning arrives.

Should I wait to build AI features until continual learning arrives?

No. Waiting forfeits two years of compounding. Today's frozen-weights models are already excellent at grindable work — code, data extraction, support triage, document processing — and deploying them now is how you accumulate the proprietary feedback data that makes a future learning model valuable to you specifically. The mistake isn't building early; it's building a thin prompt wrapper instead of owning the workflow, the eval set, and the interaction logs.

What's the difference between RAG / memory and real continual learning?

RAG and agent 'memory' inject context into the prompt at runtime. The weights never change, so the model forgets the moment that context is gone. Continual learning updates the weights themselves, so the model gets permanently better at your domain. Today you simulate learning with retrieval and context engineering; by 2027–28 the model may do it natively. Both need the same input, your captured interactions, which is why building that pipeline now is the no-regret move.
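The difference is easy to see in code. This minimal sketch of a memory layer you own uses naive word-overlap retrieval as a stand-in for embeddings or a vector store; `MemoryStore` and `build_prompt` are hypothetical names. Everything the model "knows" about your org travels in the prompt string, and nothing is written back into weights, so with an empty store the model is back to knowing nothing about you.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MemoryStore:
    """A context/memory layer you own, outside any single vendor."""
    notes: list[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Top-k notes by word overlap with the query; notes with no overlap are dropped."""
        q = _tokens(query)
        ranked = sorted(self.notes, key=lambda n: len(q & _tokens(n)), reverse=True)
        return [n for n in ranked[:k] if q & _tokens(n)]


def build_prompt(store: MemoryStore, question: str) -> str:
    """RAG-style: knowledge enters through the prompt at runtime.

    The model's weights are never touched; clear the store (or the context)
    and the model knows nothing about your org again.
    """
    notes = store.recall(question)
    if not notes:
        return f"Question: {question}"
    context = "\n".join(f"- {n}" for n in notes)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Continual learning would make `build_prompt` unnecessary for facts the model had already absorbed; until then, the store is the only place that knowledge persists.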

Will continual learning make my AI product's moat obsolete?

It will erase moats that are just a clever prompt on top of someone else's model — those get absorbed every time the base model improves. It strengthens moats built on proprietary interaction data, a domain-specific evaluation harness, and deep workflow integration, because those are exactly the signals a learning model needs and cannot get anywhere else. Own the loop, not the prompt.

How do I avoid vendor lock-in as models start learning from my data?

Treat your interaction logs, feedback signals, and eval sets as your assets, not the vendor's. Insist on data export and portability in contracts, keep your evaluation harness model-agnostic so you can swap providers, and store raw feedback yourself rather than only inside a vendor's fine-tuning console. The model is rentable; the data and the feedback loop should be yours to carry to whichever model wins.
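One low-effort way to keep feedback portable is to store it in a plain, provider-neutral format such as JSON Lines. The sketch below assumes records are plain dicts; `SCHEMA_VERSION`, `to_jsonl`, `from_jsonl`, and `export_jsonl` are hypothetical names, not any vendor's API.

```python
import json
from pathlib import Path
from typing import Iterable

SCHEMA_VERSION = 1  # illustrative; bump it whenever the record fields change


def to_jsonl(records: Iterable[dict]) -> str:
    """One JSON object per line, each stamped with the schema version."""
    return "".join(
        json.dumps({"schema_version": SCHEMA_VERSION, **r}, ensure_ascii=False) + "\n"
        for r in records
    )


def from_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines back into dicts, skipping blank lines.

    Splits on '\\n' only: str.splitlines() would also split on characters such as
    U+2028, which ensure_ascii=False can leave unescaped inside a record.
    """
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def export_jsonl(records: Iterable[dict], path: Path) -> None:
    """Write records to a file you control, outside any vendor console."""
    path.write_text(to_jsonl(records), encoding="utf-8")
```

The schema version on every line is what lets a future pipeline, for whichever provider wins, read logs written years earlier.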

Is 'AGI by 2027' something I should build my roadmap around?

Plan against capability horizons, not a headline date. Assume today's grindable-task strength continues, assume context and memory keep getting longer and cheaper, and assume on-the-job learning is plausible but unconfirmed for 2027–28. That lets you make no-regret bets — deploy now, own your data loop — without betting the company on a date the labs themselves call an open empirical question.

About the Author

Ganesh Kompella


Ganesh is the founder of Kompella Technologies, a fractional CTO and CPO firm working with healthcare, fintech, and SaaS startups from pre-seed through Series B. He brings 15+ years of experience, 75+ products shipped, $140M+ in ARR built, and one IPO guided, and operates across India, Singapore, and the United States.
