Schemas are opinions — write them like it.

Engineers think of schemas as a way to organize columns. Operators should think of them as something stranger: a theory of how their business operates, written down in tables and types. When the schema is wrong, the agent on top of it cannot help being wrong. When the schema is right, the agent cannot help being useful.

This is the part of data work that nobody talks about, because the people who do it well find it boring and the people who do it badly do not realize they are doing it.

01. A schema is a claim, not a container

When you decide that customers has a column called tier, you are not choosing a database structure. You are claiming, in writing, that every customer can be placed into one of a finite set of tiers. The agent reading that schema will believe you. The next person reading the data will believe you. Nine months from now, the founder will write strategy on top of this claim.

If that claim is false — if some customers do not fit any tier, if a tier was renamed and the old name is still in the data, if the tiers were never agreed across teams — the schema is lying. The agent's downstream answers, which the founder will trust, will be confidently wrong.
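One way to test the claim before anyone builds on it is to count the rows that contradict it. The sketch below is illustrative only: the tier names in `AGREED_TIERS`, the `audit_tier_claim` helper, and the sample rows are all hypothetical, and the real tier list has to come from the teams who are supposed to have agreed on it.

```python
from collections import Counter

# Hypothetical agreed tier set; not taken from any real system.
AGREED_TIERS = {"free", "pro", "enterprise"}

def audit_tier_claim(rows, agreed=AGREED_TIERS):
    """Count how many customer rows support or contradict the tier claim."""
    counts = Counter()
    for row in rows:
        tier = row.get("tier")
        if tier is None or str(tier).strip() == "":
            counts["missing"] += 1                 # customer fits no tier
        elif tier in agreed:
            counts["fits"] += 1                    # the claim holds for this row
        else:
            counts["unknown:" + str(tier)] += 1    # renamed or never-agreed value
    return dict(counts)

# Made-up sample rows for illustration.
customers = [
    {"id": 1, "tier": "pro"},
    {"id": 2, "tier": "Professional"},  # old name still in the data
    {"id": 3, "tier": None},            # fits no tier
    {"id": 4, "tier": "free"},
]

print(audit_tier_claim(customers))
# {'fits': 2, 'unknown:Professional': 1, 'missing': 1}
```

If anything other than "fits" shows up, the schema is claiming more than the data supports.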

A schema is the smallest possible essay on what your business is. Almost nobody writes it that way. — working note

02. Three places schemas are usually wrong

In every diagnostic we run, the same three patterns show up in the data layer:

The "status" column with twenty-seven values. It started with three and grew by accretion. Half are duplicates of one another that differ only in letter case. The agent will treat them as distinct categories. They are not.
The "type" column nobody owns. The CRM team uses it for one thing, the finance team uses it for another, and the support team has stopped using it. The schema does not record this. The agent will blend all three meanings into one.
The foreign key that is mostly null. A relationship that exists in theory and rarely in practice. When the agent sees the key populated, it will assume the relationship is meaningful. It usually is not.

None of these are technical bugs. The data is "fine." The schema is what is wrong, because the schema is making a claim that the data has stopped supporting.
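Two of these patterns can be surfaced with very little code. The following is a rough sketch, not a diagnostic tool: the helper names `case_variant_groups` and `null_rate`, the `orders` rows, and the `account_id` column are assumptions made up for illustration. The unowned "type" column is left out on purpose, because it is a question of ownership that no query will answer.

```python
from collections import defaultdict

def case_variant_groups(values):
    """Group distinct values that differ only by letter case or surrounding whitespace."""
    groups = defaultdict(set)
    for v in values:
        if v is not None:
            groups[v.strip().lower()].add(v)
    # Keep only the normalized keys that hide more than one spelling.
    return {k: sorted(vs) for k, vs in groups.items() if len(vs) > 1}

def null_rate(rows, column):
    """Fraction of rows where the column is None; 0.0 for an empty table."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

# Made-up sample rows for illustration.
orders = [
    {"status": "Open", "account_id": None},
    {"status": "open", "account_id": None},
    {"status": "OPEN ", "account_id": 7},
    {"status": "closed", "account_id": None},
]

print(case_variant_groups(r["status"] for r in orders))
# {'open': ['OPEN ', 'Open', 'open']}
print(null_rate(orders, "account_id"))
# 0.75
```

The numbers do not decide anything on their own. They tell you which claims in the schema need a human to confirm or retract them.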

03. Write the opinion down

The exercise we ask operators to do, before any agent is built on a data lake, is unfashionably old: write a short essay, half a page per important table, that says in plain language:

What this table is a record of.
What it is not a record of.
Which fields are reliably populated, and which are aspirational.
Which fields disagree across systems, and which we have made authoritative.

This document is not optional documentation. It is the schema, written in the language the agent will be given. When the agent answers a question wrong, this is where you go to figure out whether the answer was wrong because the model misread the schema, or because the schema was lying.
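If it helps to keep these notes next to the data, the four questions can be held in a small structure and rendered as plain text for the agent. This is a minimal sketch under stated assumptions: the `TableNote` class, its field names, and the example content are hypothetical, and nothing here prescribes how the text should actually be passed to a particular agent.

```python
from dataclasses import dataclass, field

@dataclass
class TableNote:
    """Hypothetical container for the four questions asked of each table."""
    table: str
    records: str                  # what this table is a record of
    does_not_record: str          # what it is not a record of
    reliable_fields: list = field(default_factory=list)
    aspirational_fields: list = field(default_factory=list)
    authoritative: dict = field(default_factory=dict)  # field -> chosen source of truth

    def render(self) -> str:
        """Return the note as plain text suitable for an agent's context."""
        lines = [
            f"Table: {self.table}",
            f"Records: {self.records}",
            f"Does not record: {self.does_not_record}",
            "Reliably populated: " + (", ".join(self.reliable_fields) or "none listed"),
            "Aspirational: " + (", ".join(self.aspirational_fields) or "none listed"),
        ]
        for name, source in self.authoritative.items():
            lines.append(f"Authoritative for {name}: {source}")
        return "\n".join(lines)

# Made-up example content for illustration.
note = TableNote(
    table="customers",
    records="Every organization that has signed a contract with us.",
    does_not_record="Trial users or prospects.",
    reliable_fields=["id", "name"],
    aspirational_fields=["tier"],
    authoritative={"tier": "the signed contract"},
)

print(note.render())
```

The structure matters less than the discipline: every field in it is an answer someone had to commit to in writing.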

The operators who do this exercise stop arguing with their agents. The operators who skip it never quite trust them.

Rule of thumb: if you cannot describe a table's purpose in two sentences without naming a system, the schema is incomplete. The system is implementation; the table is supposed to outlive it.

A short closing

The agents you build will be exactly as honest as the schema underneath them. If the schema is a confident essay about how your business works, the agent will sound confident for the right reasons. If the schema is a pile of column names that nobody has audited in two years, the agent will hallucinate — and it will sound the same.

Treat your schema like writing. It is.


