Why most AI pilots fail at month three — and the four habits that save them.

We've helped fifteen small and mid-sized businesses ship AI into production over the last two years. Three of them are spectacular. The rest are useful. None of them looked like the demo we ran in week three.

Here's what nobody tells you about AI pilots: they don't fail the way you expect. They don't fall over because the model is wrong, or the prompt is bad, or the vector database is slow. They fail because, at month three, the champion gets pulled onto a different fire, the data the agent was built and tested against drifts, and the person who would have caught the regression isn't watching anymore.

This is a piece about the four habits we've started insisting on — sometimes politely, sometimes not — before we'll take the engagement.

01. Own the eval before you own the model

If you cannot describe, in one paragraph, what "good" looks like for your agent, you do not have a project. You have a vibe. The first deliverable in any engagement we run is an evaluation harness — a dataset of real questions, real answers, and an automated way to check the agent against them.

Building this is unsexy. It is also the thing that lets you sleep at night. When the model upgrades next month — and it will — your eval is the only signal that tells you whether you got better, worse, or the same. Without it you are operating on faith.
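A minimal sketch of what such a harness could look like, assuming a simple "expected answer appears in the agent's reply" check. The names `EvalCase`, `run_eval`, and `compare`, and the 2-point tolerance, are illustrative assumptions rather than the harness we actually ship; real engagements usually need richer scoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected: str  # a real answer that someone who knows the workflow signed off on


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count as failures."""
    return " ".join(text.lower().split())


def run_eval(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the agent's reply contains the expected answer."""
    if not cases:
        raise ValueError("an empty eval set cannot tell you anything")
    passed = sum(
        1
        for case in cases
        if normalize(case.expected) in normalize(agent(case.question))
    )
    return passed / len(cases)


def compare(baseline: float, candidate: float, tolerance: float = 0.02) -> str:
    """Classify a model upgrade as better, worse, or same relative to the baseline score."""
    if candidate > baseline + tolerance:
        return "better"
    if candidate < baseline - tolerance:
        return "worse"
    return "same"
```

Run `run_eval` once against the current model and once against the upgraded one, then pass both scores to `compare`. The point of the sketch is the shape, not the scoring rule: the cases live in version control, and the answer to "did we get worse?" comes from a command, not a meeting.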

"The eval is not a checkbox. It's the only feature that matters in the first thirty days." (From a postmortem we wish we hadn't written.)

The companies that fail at month three almost always built the eval at month two-and-a-half. By then they were patching, not measuring.

02. Pick a wedge, not a transformation

The biggest favor you can do your future self is to pick a workflow you can describe on a Post-it. Not "we want to use AI in operations." Not "intelligent automation across the back office." A wedge looks like:

Quote turnaround for incoming RFQs.
First-draft replies to support tickets in a single category.
Reconciling line items between two systems that should agree but don't.

Wedges are sized to fit one team's annoyance. They cost less than a senior engineer's quarterly comp. They ship in weeks, not quarters. And — critically — when they work, they finance the next wedge without anyone needing to write a memo about transformation.

Rule of thumb: if the workflow can't be described in a single sentence ending with the word "today" — as in "how we do this today" — it's not a wedge yet. It's a strategy. Put it back in the drawer.

03. Wire the lake before you build the agent

This is the habit that separates the engagements that compound from the ones that stay one-off projects. Most teams reach for the agent first because the agent is the visible part. The lake (the boring, deduplicated, schema-enforced pile of your company's actual data) is the part that determines whether the second pilot is cheap or expensive.

Wire it once. Mirror every system your business actually depends on into a place you control. Postgres, S3, Iceberg, whatever your team can operate. Don't boil the ocean: start with the systems your wedge needs, but write the connectors as if you'll need ten more.
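One way to write connectors "as if you'll need ten more" is to give them all the same small interface and a single idempotent sync path. The sketch below is an assumption-heavy illustration: SQLite stands in for whatever store your team operates, and `Connector`, `StaticConnector`, `init_lake`, and `sync` are hypothetical names, not a library API.

```python
import json
import sqlite3
from typing import Iterable, Protocol


class Connector(Protocol):
    """Anything that names its source system and yields records carrying a stable "id"."""

    source: str

    def fetch(self) -> Iterable[dict]: ...


class StaticConnector:
    """Hypothetical stand-in for a real connector (CRM, ERP, ticketing); serves fixed rows."""

    def __init__(self, source: str, rows: list[dict]):
        self.source = source
        self._rows = rows

    def fetch(self) -> Iterable[dict]:
        return iter(self._rows)


def init_lake(conn: sqlite3.Connection) -> None:
    """Create one table keyed by (source, record_id) so duplicates cannot accumulate."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS records (
               source TEXT NOT NULL,
               record_id TEXT NOT NULL,
               payload TEXT NOT NULL,
               PRIMARY KEY (source, record_id)
           )"""
    )


def sync(conn: sqlite3.Connection, connector: Connector) -> int:
    """Upsert every record from one connector and return how many rows were processed."""
    count = 0
    for row in connector.fetch():
        conn.execute(
            "INSERT OR REPLACE INTO records (source, record_id, payload) VALUES (?, ?, ?)",
            (connector.source, str(row["id"]), json.dumps(row, sort_keys=True)),
        )
        count += 1
    conn.commit()
    return count
```

Adding the eleventh system then means writing one more class with a `fetch` method; the lake, the deduplication rule, and the sync job stay untouched.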

The third pilot will pay for the first lake. The fifth pilot will be free.

04. Stay boring on purpose

The best AI rollouts we've run look almost identical to good software rollouts from a decade ago. Code review. CI. A staging environment. Feature flags. A real on-call rotation when the agent is doing real work. The model is exotic. The way you ship the model should not be.
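As one illustration of boring delivery, here is a sketch of a percentage-based feature flag in front of an agent, with a fallback to the existing workflow. The flag name `agent_draft_reply` and the functions `in_rollout` and `draft_reply` are hypothetical; most teams would use the flag system they already run rather than write their own.

```python
import hashlib
from typing import Callable


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Hash the user into one of 100 buckets so the same user always gets the same path."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def draft_reply(
    ticket: str,
    user_id: str,
    percent: int,
    agent: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Route to the agent only for users inside the rollout; fall back if the agent raises."""
    if not in_rollout("agent_draft_reply", user_id, percent):
        return fallback(ticket)
    try:
        return agent(ticket)
    except Exception:
        # A failing agent should degrade to the existing workflow, not take it down.
        return fallback(ticket)
```

Starting at a small percentage and raising it only while the eval and the on-call pager stay quiet is the same playbook a decade-old web team would recognize.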

This is harder than it sounds, because the field rewards novelty in conversation and punishes it in production. Every quarter there is a new framework, a new pattern, a new way of orchestrating tool calls. Most of them are fine. Almost none of them are worth re-platforming for.

The four-habit checklist, in plain English

Eval first. Write down what good looks like, in tests, before you write prompts.
Wedge, not vision. One workflow, one team, one quarter.
Lake before agent. The boring infrastructure determines the cost of pilot #2.
Boring delivery. Treat AI rollouts like the production software they are.

A working theory

If you take one thing from this essay, take this: the failure mode of AI pilots is not technical. It is organizational. The team that ships pilot two has built habits the team that doesn't ship pilot two has not. They are unfashionable habits. They are also the only ones that survive contact with month three.

If you'd like help building those habits, we do this for a living. If you'd rather build them yourself — and many of our best clients did, before they hired us — these four are the place to start.



Related essays.

What Happens When You Debate an AI

RAG isn't search: a primer for operators.

The data lake question every CFO should ask first.
