Digital Audience Models Don’t All Look-Alike


The brand with the best customer data often wins the marketing game. But as Apple’s new iOS privacy features limit user tracking, some sources of consumer data are drying up. It may feel like marketers have to trade off behavioral targeting for privacy, but this trade-off can be broken. Marketers can have their cake and eat it too.

First, let’s look at the look-alike audience, the most common digital targeting approach among marketers whose first-party data set doesn’t provide the scale they need to meet growth objectives.


Unfortunately, anyone can create a look-alike audience. A company with data on one shopper could take a few demographics from that person, apply them to the U.S. population and call it a look-alike model – not that you’d want to use it. But the point is that there is no standard or hurdle for using the term. In this extreme example, it’s easy to spot the flaw. With a sample of one, there is no way to know if the audience is any better than random selection.

Now consider a look-alike audience built from a survey of 2,000 people. Is that really any better than one person? Audiences modeled from smaller samples are less likely to be accurate.

Moving up the sample size ladder are location-based audiences, a favorite of retail and restaurant brands for years. Yet there’s been a nearly 70% decline in always-on location data since Apple’s iOS 13 rollout. Apple is on iOS 15 now, and only 16% of consumers are opting in to app tracking.

Few marketers measure whether their digital buy actually reached the people they intended. In part, that may be because they would struggle to determine the root cause of any issues they uncovered. Was it the audience, the cookies or the publisher? It’s a bit like asking which subcontractor is to blame for a building defect. They all point to someone else.

Marketers could use the equivalent of the DOCG in Italian wines. The Denominazione di Origine Controllata e Garantita (DOCG) is independent proof that the wine is what it claims to be and that it’s high quality. Ah, if only… Until then, buyers of digital audiences must develop their own nose for quality.

An Alternative

So, how can retailers make informed, data-based targeting and business decisions? More marketers are turning to cohorted or modeled consumer data for privacy-safe, data-based decision making. One example of modeled data is transaction-based insights drawn from anonymized credit card and debit card transactions. The data isn’t forecast or speculated. Instead, it’s factual data from purchases made by millions of people and collected on a daily basis.

So, how is this different from a traditional look-alike? A cohorted model is one that is statistically tested against other data and proven to be accurate. It is built using a sample of the available, observed data, commonly called the seed file. For the sake of example, let’s say that the seed file is 5% of the total behaviors. The model is grown from the seed. By definition, you know the model will be good at predicting the seed behaviors, but the real statistical test is how well it predicts behaviors in the rest of the population. So you run the model to predict the other 95% and compare the predictions to the actuals for that group. The math may be complex, but the premise is not. Test how well the model works. Don’t assume it works, as many look-alikes do.
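
For readers who do want to see the premise in code, the seed-and-holdout test can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not any audience company’s actual method: the shoppers are synthetic, the purchase rates per age band are invented, and the “model” is nothing more than a majority rule learned from the seed. The names make_population, fit and accuracy are hypothetical.

```python
import random


def make_population(n, rng):
    """Synthetic shoppers as (age_band, bought) pairs. Illustrative data only."""
    buy_rate = {"18-34": 0.8, "35-54": 0.3, "55+": 0.1}  # assumed, not real rates
    bands = list(buy_rate)
    people = []
    for _ in range(n):
        band = rng.choice(bands)
        people.append((band, rng.random() < buy_rate[band]))
    return people


def fit(seed):
    """Toy model: predict a purchase if at least half the seed shoppers in a band bought."""
    counts = {}
    for band, bought in seed:
        total, buyers = counts.get(band, (0, 0))
        counts[band] = (total + 1, buyers + bought)
    return {band: buyers / total >= 0.5 for band, (total, buyers) in counts.items()}


def accuracy(model, people):
    """Share of shoppers whose actual behavior matches the model's prediction."""
    hits = sum(model.get(band, False) == bought for band, bought in people)
    return hits / len(people)


rng = random.Random(42)
population = make_population(20_000, rng)
rng.shuffle(population)

cut = len(population) // 20                # the 5% seed file from the example
seed, holdout = population[:cut], population[cut:]

model = fit(seed)                          # grown from the seed only
seed_accuracy = accuracy(model, seed)      # flattering by construction
holdout_accuracy = accuracy(model, holdout)  # the number that matters

print(f"seed accuracy:    {seed_accuracy:.3f}")
print(f"holdout accuracy: {holdout_accuracy:.3f}")
```

The holdout figure is the one to ask a vendor about, because it is measured on the 95% of shoppers the model never saw.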

This may seem obvious. Wouldn’t every audience company do this? The answer is no, and there is one main reason. Creating good models requires very large data sets. Surveys won’t do. The smaller the available data, the harder this becomes.

Where the cohorting comes in

If you are thinking that demographics and attitudinal data would be poor predictors of purchases, you are partially correct. One model or formula for an entire population won’t cut it. Enter cohorting. By grouping like consumers into thousands of small cohorts, data scientists can build a different model for every cohort using machine learning and artificial intelligence. Now, this is not one of those “it’s AI, so trust us” references. It’s just acknowledging that creating different models for thousands of cohorts for hundreds of audiences is too much work for us mortals.
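
To see why one formula for everyone falls short, here is a small Python sketch comparing a single population-wide rule with a separate rule per cohort, both scored on shoppers outside the seed. Everything in it is an assumption made for illustration: the four cohort names, their purchase rates, and the majority-vote “models”, which stand in for the far richer machine learning a real provider would use.

```python
import random
from collections import defaultdict

rng = random.Random(7)

# Hypothetical cohorts with assumed purchase rates (illustrative only).
cohort_rates = {
    "urban-young": 0.85,
    "urban-older": 0.20,
    "rural-young": 0.65,
    "rural-older": 0.10,
}
cohort_names = list(cohort_rates)

# Synthetic shoppers as (cohort, bought) pairs.
shoppers = []
for _ in range(40_000):
    cohort = rng.choice(cohort_names)
    shoppers.append((cohort, rng.random() < cohort_rates[cohort]))

cut = len(shoppers) // 20                  # 5% seed file
seed, holdout = shoppers[:cut], shoppers[cut:]


def majority(rows):
    """True if at least half of the rows are purchases."""
    return 2 * sum(bought for _, bought in rows) >= len(rows)


# One rule for the entire population.
global_prediction = majority(seed)

# A separate rule for each cohort, each learned only from that cohort's seed rows.
seed_by_cohort = defaultdict(list)
for row in seed:
    seed_by_cohort[row[0]].append(row)
cohort_prediction = {c: majority(rows) for c, rows in seed_by_cohort.items()}


def score(predict):
    """Accuracy of a prediction function on shoppers outside the seed."""
    hits = sum(predict(cohort) == bought for cohort, bought in holdout)
    return hits / len(holdout)


global_accuracy = score(lambda cohort: global_prediction)
cohort_accuracy = score(lambda cohort: cohort_prediction.get(cohort, global_prediction))

print(f"one model for everyone: {global_accuracy:.3f}")
print(f"one model per cohort:   {cohort_accuracy:.3f}")
```

With four cohorts this is trivial to do by hand. With thousands of cohorts across hundreds of audiences, the same loop is what gets handed to the machines.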

As a marketer, thankfully, you don’t need to understand the code, the methodology or the math. Just ask “How big is the data set driving the model?” and “How accurate is the model when tested against the population (excluding the seed file)?” Trust, but verify.

Trade-off broken

Models built on anonymous, cohorted data give retailers the ability to target based on category or brand purchases while respecting individuals’ privacy, assuming the seed file is excluded from the audience. High-quality models can create addressable segments that predict purchases with better than 90% accuracy.