Adapted from StartupAI source material dated February 9, 2026. This note explains the product judgment, not internal implementation details.

Source material: ADR-010

Opening thesis

Early in building StartupAI, our system once handed back a polished, confident, completely made-up customer profile — one that had nothing to do with what the founder actually entered. That moment set a rule we now build around: a recommendation is only as trustworthy as the line between what was observed and what was invented. Here is how we keep that line visible.

Fluency is not evidence

That fabricated profile was not a bug we could prompt our way out of. It was a warning about what AI does best and worst at the same time: it can make weak evidence sound decisive. It will turn a one-line idea into a confident segment, a plausible market story, and a clean next step — and none of that is validation.

The real risk is not that AI is imperfect. It is that it blurs belief, synthesis, and observation into one persuasive answer. Act on that and you can spend weeks building for a customer no one has ever actually talked to. The more polished the output, the more dangerous it gets — rough notes invite skepticism, while a finished-looking report feels earned even when it is mostly inference.

A conductor, not a performer

We therefore rebuilt the system so that passing off a fabrication as evidence is not just discouraged but structurally hard to do. The system treats two things as fundamentally different: a hypothesis (AI thinking) and evidence (something actually observed in the real world). Hypotheses are welcome and always available; they fill the canvas and point at what to test. But they are labeled as needing verification, never dressed up as proof.

That is why we keep two numbers apart that founders usually see mashed together: how complete your picture is, and how much of it is backed by real evidence. A fully filled-in canvas built entirely of hypotheses honestly reads as “complete, but unconfirmed.” And when evidence does exist, it carries its provenance — where it came from, how recent, how strong — because what people do should outweigh what they say.
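To make the two-number idea concrete, here is a minimal sketch in Python. Every name in it (`CanvasItem`, `Evidence`, `completeness`, `evidence_coverage`) and the source weights are hypothetical illustrations, not StartupAI's actual data model or scoring; it only shows one way a filled-in canvas and an evidence-backed canvas could be measured separately.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Assumed weights for illustration: what people do counts for more than what they say.
SOURCE_WEIGHT = {"behavior": 1.0, "interview": 0.6, "research": 0.4}


@dataclass
class Evidence:
    source: str        # where it came from, e.g. "behavior" or "interview"
    observed_on: date  # how recent it is
    strength: float    # how strong it is, from 0.0 to 1.0

    def weight(self) -> float:
        # Unknown source kinds get no weight at all.
        return SOURCE_WEIGHT.get(self.source, 0.0) * self.strength


@dataclass
class CanvasItem:
    name: str
    hypothesis: Optional[str] = None  # AI thinking; always needs verification
    evidence: List[Evidence] = field(default_factory=list)


def completeness(items: List[CanvasItem]) -> float:
    """Share of canvas items that are filled in at all."""
    filled = sum(1 for item in items if item.hypothesis or item.evidence)
    return filled / len(items) if items else 0.0


def evidence_coverage(items: List[CanvasItem]) -> float:
    """Share of canvas items backed by at least one weighted observation."""
    backed = sum(1 for item in items if any(e.weight() > 0 for e in item.evidence))
    return backed / len(items) if items else 0.0


canvas = [
    CanvasItem("customer segment", hypothesis="Solo founders before first revenue"),
    CanvasItem("problem", hypothesis="Validation takes too long"),
]
print(completeness(canvas))       # 1.0
print(evidence_coverage(canvas))  # 0.0
```

In this sketch, a canvas made entirely of hypotheses scores 1.0 on completeness and 0.0 on evidence coverage, which is the "complete, but unconfirmed" reading described above.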

We would rather tell you the truth: this sounds slower than a tool that just makes things up, and it is slower. Real evidence takes real time, and some of it costs money. We pass that through honestly, because honesty about confidence means honesty about effort too. And critically, thin evidence never blocks you. The system still helps; it just will not call a guess a fact.

Read confidence as a claim

Read confidence as a claim that still needs support. If a tool says a segment is promising, ask what is behind it. If it says a problem is urgent, ask whether that came from behavior, interviews, research, or inference. A good workflow helps even when evidence is thin — here is the best hypothesis, here is why it might hold, here is what would verify it — which is more honest than refusing to help and more useful than pretending the guess is proven.
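One way to picture that habit is a small report that never drops the basis of a claim. The sketch below is an assumption-laden illustration: the `Claim` type, the `describe` function, and the choice to treat behavior, interviews, and research as observed while treating inference as a hypothesis are all hypothetical, not a description of how any particular tool works.

```python
from dataclasses import dataclass

# Assumed split for illustration: these bases count as observed, anything else is inference.
OBSERVED_BASES = {"behavior", "interviews", "research"}


@dataclass
class Claim:
    statement: str     # the claim itself, e.g. "This segment is promising"
    basis: str         # "behavior", "interviews", "research", or "inference"
    rationale: str     # why it might hold
    verification: str  # what would verify it


def describe(claim: Claim) -> str:
    """Render a claim with its evidence boundary always visible."""
    if claim.basis in OBSERVED_BASES:
        label = f"OBSERVED ({claim.basis})"
    else:
        label = "HYPOTHESIS (needs verification)"
    return "\n".join([
        f"[{label}] {claim.statement}",
        f"Why it might hold: {claim.rationale}",
        f"What would verify it: {claim.verification}",
    ])


guess = Claim(
    statement="The problem is urgent for this segment",
    basis="inference",
    rationale="Founders describe validation as slow",
    verification="Five interviews where the problem comes up unprompted",
)
print(describe(guess))
```

The point of the design is that the output is still useful when the basis is inference, but the label makes the thinness of the evidence impossible to miss.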

Keep usefulness and proof in separate boxes. AI can be genuinely useful before any proof exists. Just do not let a polished hypothesis graduate into a build decision until the market has actually said something.

Key takeaways

A polished recommendation is not validation unless its evidence boundary is visible.
Hypotheses are valuable — but they should not be weighted or labeled like observed evidence.
Ask where every important claim came from before you act on it.
The right tool helps with thin evidence while making that thinness obvious.

Put the judgment into a real validation flow.

StartupAI turns founder ideas into reviewed evidence plans and founder-controlled decisions.

The day our AI invented a customer