
The 500-Question Customer
Synthetic audiences are everywhere now, but the question that actually decides whether one is worth trusting is rarely asked: how much of a real person does the model need before it behaves like the original?
In a Columbia study, more than two thousand people each answered five hundred questions about themselves…, and the twins reproduced people’s held-out answers at about eighty-eight percent of the rate at which the humans matched their own answers two weeks later. The striking part: more than a dozen ways of building the twin (different models, formats, prompting tricks, even fine-tuning) barely changed the result. What carried it was the depth of the data, not the cleverness of the model.
The same lesson showed up when Bain backtested synthetic audiences against a company’s real prior study and concluded that the data grounding a model matters more than the model itself. It showed up again when BCG reported a synthetic panel predicting real shoppers’ choices with ninety-two percent accuracy, after fine-tuning on real data. A working paper from University College Dublin adds the other half: demographics alone can’t capture how a real person decides, and the data has to be organized for the job it’s doing.
The takeaway for brand leaders: when someone hands you a customer model, stop asking which AI is under the hood and start asking how much real, first-party data it’s built on — and whether it was tuned for the decision you’re making.
Know Your Audience is a weekly podcast for the leaders making consequential brand decisions while the ground shifts beneath them. CMOs, CPOs, and CEOs face a fundamental change in how their organizations can understand customers, and the decisions that depend on it. Produced by Soulmates.ai.