On this episode of AI: Machine-Made Marketing, we explore how synthetic populations—statistical reconstructions of entire societies—are reshaping the marketing landscape. Our guest, Ray Kong, Chief Marketing Officer at Arima, breaks down how Arima’s all-in-one analytics platform leverages AI to create privacy-safe, hyper-granular insights that rival traditional data collection.
Ray explains why synthetic populations are more powerful than point-in-time synthetic data, how they balance consumer privacy with advertiser performance, and why small agencies that fail to adopt AI tools risk obsolescence. From micro-targeted local marketing to predictive fraud modeling, this conversation reveals why Gartner predicts that by 2028, 80% of the Global 5000 will use synthetic data at scale.
Ray, welcome to the show. I’m so pleased to have you on today.
Thanks for having me. I’m really excited to be here, Richard.
So to kick things off, and for the uninitiated, please tell us about Arima Data.
Arima is a company which has a platform in the marketing analytics and marketing technology space. It is what we call an all-in-one analytics platform, powered by something we call the Synthetic Society. And the Synthetic Society is basically a statistical reconstruction of the population for marketing purposes. Synthetic populations have been used in other places, in health tech, in fraud risk modeling, those kinds of things. But marketers are a little bit later to the game, and partially that’s because there hasn’t been a really good synthetic population empowered by marketing analytics tools.
This sounds so futuristic, the synthetic population. I love that term.
It does sound really futuristic, and it is scary to some people, but there really is no magic to it. There’s sophisticated mathematics to it, that’s for sure, but there’s no magic to it.
Well, to some of us, that is magic. When you talk about the AI revolution in marketing, which pretty much everybody is talking about, what’s really at stake in your mind for advertisers, agencies, publishers, tech platforms, and the like?
You know, that’s a really interesting question, and there are multiple sides to the answer, I would say. What’s at stake in not using it is poor decision-making in a world which demands higher levels of accountability, higher ROI, and really more granularity in marketing programs as we move towards a more one-to-one marketing world.
What’s at stake for small agencies, small advertisers, and small publishers in not using it is that you become obsolete. Without these types of tools and without AI, the big, rich, multinational agencies, marketers, and advertisers have the money to invest in tech and to have capabilities or armies of people that can do things that you can’t. But with AI tools, the playing field is now a lot more level, in that a small advertising agency can take AI tools and do things, whether on the media buying side, the creative side, the modeling side, the evaluation side, or the analytics side, that were once only the realm of the big advertising agencies. And so the stake for the small advertiser, for the small agency, in not adopting AI tools is being out of business and being obsolete.
Well, we’ll get back to the ROI part here in a moment. I’m anxious to talk about that. But first, with so much excitement and concern around AI, depending upon who you talk to, how can marketers embrace AI responsibly in a way that drives this innovation without losing consumer trust?
You know, as in everything, it is about buyer beware. Really understand what you’re using. Understand the tools that you’re using, the upsides and the downsides. A way that you could describe it is: know how your sausage is made. A really popular AI tool now is in creative testing. You just upload a creative concept or an ad or some copy or some images and you say, okay, does my target market like this or not? And tools will return an answer. But we talked about magic versus mathematics. It’s not magic. There is no real answer. And you need to understand how whatever tool it is that you’re using is coming back with the answer, because the cost of being wrong is high.
And blind faith is not so good when you have a lot at stake, or when you have a reputation at stake, or when you’re trying to differentiate from whoever else you’re trying to differentiate from.
It’s been said that AI can spark a new era of accountability in marketing, and I think you probably agree with that. What does that look like in practice for CMOs trying to prove their ROI today?
For CMOs, it is the quantification of what has up until now been very difficult to quantify. And so what does ROI mean? ROI means that when I spend a dollar, how much am I getting back? And yet there is a lot of opaqueness in the relationship between dollars spent and what you get back. Did somebody buy because they saw a billboard and a radio ad? Or did somebody buy because they searched on the web? Or did somebody go to a store because they happened to be walking by? Or were they driven there by another piece of media? It has been opaque. And the whole issue of attribution has been a holy grail, so to speak, as you probably well know in the industry. The models that we need to do this are too complex for us as mere humans to be able to build. And so AI, stronger data, and better machine learning models are what can actually create better accountability, and create better granularity and line of sight to efficiency in spend, efficiency in placement, and effectiveness of marketing.
We’re moving away from an era of personally identifiable information, which is losing credibility within the industry. AI seems to fill that gap pretty well in some ways. How does AI provide a smarter balance between consumer privacy and advertiser performance?
Well, you’ve described the problem exactly correctly. Marketers are caught in the middle of this sandwich.
You know, on one side, there are the demands for more granularity, for more accountability, for more direct attribution. On the other side are privacy concerns and just plain old data degradation from survey bots and from people just getting tired of being asked their preferences in surveys. And so AI in the form of synthetic data actually fills this void. If AI creates a replication of the population’s behaviors, attitudes, and demographics, all constructed from different data sources that are without PII, then you actually get a statistical reconstruction of the population with no PII attached, but which actually behaves like the population. And so you are able to granularly understand, granularly attribute, and granularly model without risking PII, without risking people’s privacy.
And are we at a real paradigm shift moment, do you think, where privacy and efficiency, which were once seen as complete opposites, can finally coexist?
The short answer to that is yes. I think so. And yet the problem today, as we said, is that there are a lot of variations on synthetic data. One variation on synthetic data is you type a question into ChatGPT: how many dog-owning, glasses-wearing female blondes are there in the United States? And you’ll get an answer. You don’t know where that answer comes from. And that answer is very useful in some ways and very unusable in others. And then there are things like what Arima has created, a synthetic reconstruction of the population using marketing variables. And so imagine a database of 250 million individuals, each populated by over 40,000 marketing attributes. People are scared of that because they’ll say, wait, is there a synthetic me there? And how is this constructed? And how do I know that the answer is correct? 15, 20 years ago, when EVs first started to appear on the market, there was a lot of, oh, this is really awesome. This is the future. It solves so many problems.
But I’m not going to buy one because I’m worried about charging infrastructure. And I’m not going to buy one because I don’t trust the manufacturer. And I’m not going to buy one because I don’t trust the reliability. It’s yet to be proven out. I think we’re at that place now with synthetic data and synthetic populations. They’re just emerging. Very few people will disagree that they’re a good idea. But many people are still wary and, as they should be, cautious about adoption, because there are still unknown unknowns, so to speak, as we move forward. But I absolutely do believe that this is the way that marketing will be done in the future. And actually, I’m not the only one. Gartner came out with a report earlier this year in which they said that by 2028, 80% of the Global 5000 will be using synthetic data at scale to fuel their decision-making, up from 5% today in 2025. So it’s 2025 today, and 2028 is three years from now. That’s not a long time from now. That’s a short three years. And if that is going to scale the way Gartner predicts, I would suggest that marketers need to start testing this and start using this and start understanding the pros and cons today, so that by the time we get to three years from now, it will be more commonplace than it is today.
Did I hear you correctly that you said there are 40,000 attributes per user?
You did hear that correctly. The way that synthetic populations are built is they’re essentially stitched together from multiple data sets. And so you’ll have the census, you’ll have MRI Simmons data, you might have media behavior data from a different data set, you might have some JD Power data sets, you may have other private data sets. And maybe each of them will have asked about a hundred different attributes, and you stitch all of these together and they add up pretty quickly.
Well, there’s almost too much data, because there’s always analysis paralysis, which is also a problem in the business.
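As an illustration of the "stitching" idea described above, here is a minimal sketch in Python. It is not Arima’s method, which the conversation does not detail. It assumes three tiny, made-up, PII-free tables that share two demographic keys (age band and region), and it fills in each synthetic record’s extra attributes by sampling from survey "donors" in the same demographic cell. All table names, values, and function names are hypothetical, and drawing each attribute independently is a simplifying assumption.

```python
import random
from collections import defaultdict

# Hypothetical, PII-free source tables that share two demographic keys.
CENSUS_CELLS = [  # (age_band, region, head-count)
    ("35-44", "DC", 3),
    ("45-54", "DC", 2),
]
VEHICLE_SURVEY = [  # (age_band, region, vehicle)
    ("35-44", "DC", "sports car"),
    ("35-44", "DC", "pickup truck"),
    ("45-54", "DC", "pickup truck"),
]
MEDIA_SURVEY = [  # (age_band, region, favorite_sport)
    ("35-44", "DC", "F1"),
    ("45-54", "DC", "NFL"),
    ("45-54", "DC", "F1"),
]


def index_by_cell(rows):
    """Group a survey's attribute values by their (age_band, region) cell."""
    donors = defaultdict(list)
    for age_band, region, value in rows:
        donors[(age_band, region)].append(value)
    return donors


def build_synthetic_population(cells, surveys, seed=0):
    """Create one synthetic record per census head-count, drawing each extra
    attribute independently from donors in the same demographic cell."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    indexed = {name: index_by_cell(rows) for name, rows in surveys.items()}
    population = []
    for age_band, region, count in cells:
        for _ in range(count):
            person = {"age_band": age_band, "region": region}
            for name, donors in indexed.items():
                person[name] = rng.choice(donors[(age_band, region)])
            population.append(person)
    return population


population = build_synthetic_population(
    CENSUS_CELLS,
    {"vehicle": VEHICLE_SURVEY, "favorite_sport": MEDIA_SURVEY},
)
for person in population:
    print(person)
```

Each printed record is a synthetic individual whose attributes come from different sources, so no record corresponds to a real survey respondent. A production system would have to model the correlations between attributes rather than sample them independently as this sketch does.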
Well, for listeners who may not be familiar, can you break down the difference between synthetic data and synthetic populations and why that distinction matters?
Synthetic data is any data where missing values are filled in or the data is recreated from a different data set. And so one example is ChatGPT. So you’ll go to ChatGPT or OpenAI or Gemini or whatever, and you’ll say, how many F1 fans are there in the US? And it’ll come back with an answer. You might ask ChatGPT, and that’ll come back with one answer. You ask Gemini that, and it might come back with a different answer, because they’re pulling from different data sources. That’s synthetic data, and that has a lot of value, because it’s super fast, it’s super easy, and it’s very accessible. But the problem with that type of data is that if you then want to create a heat map of where the F1 fans are in the major cities in the US, you cannot do that, because it is just a single piece of disconnected data. If you wanted to understand, okay, so F1 fans, are they also NFL fans? You don’t have that information, because it’s a single piece of disconnected data. A synthetic population, because it has, I’ll say it again, 40,000 variables, will actually have that data. It will have mobility patterns. It will be connected to either zip codes or census blocks or locations. So you can heat map proximity to a sports store. You can triangulate between an NFL fan and an F1 fan by city, and you can map them, and you can understand demographics, and you can understand buyer behavior. The difference, to us, of a synthetic population is that you have all the variables on each synthetic individual, as opposed to what I will call almost point-in-time or singular pieces of information.
And so to us, the synthetic population and the tools that use synthetic populations to return analytic data are way more powerful than the single point-in-time, ask-a-question-and-get-an-answer type of synthetic data resources that are available today.
When it comes to synthetic populations, how do they solve for things like privacy compliance and scalable unified profiles, which have always been somewhat in conflict?
The source data is always free from PII, and therefore the synthetic population data has no PII. And moreover, because it’s multiple data sources, take the 45-year-old male that lives in Washington, D.C.: there are actually, in a particular neighborhood, let’s call it ten 45-year-old males in Washington, D.C., who match on those three attributes. But the other ones, whether you drive a pickup truck or a sports car, how many kids you have, all of those are stitched together from different places. And so people that we talk to are tempted to look themselves up in our databases and say, is there a me there? And if they do that, they might find five facsimiles of themselves, each slightly different, all representative of the population, but no one necessarily exactly like you. And so from that perspective, it is privacy safe, in that you can never really, even if you wanted to, drill down and identify a specific individual, because of the way the data is constructed.
So I can never, for example, just go out there and find my synthetic digital twin.
No, not possible. You could probably find your doppelgangers. So maybe the 10 people like you that live in your immediate neighborhood, or the four people like you that live in your immediate neighborhood, but you would all be sort of photocopies of each other: alike, but not exactly alike.
Yeah. Interesting. It would be fascinating to find out what other interests they have that I don’t know about, just so I could find out if I like them too.
That’s why I’m kind of curious as to how one could go about diving into that. But let’s dig into use cases. How are synthetic populations being applied today, from hyperlocal site selection to new product development, polling, or marketing mix modeling?
Yeah, you know, one of the really exciting things about where I am today, in terms of talking to potential users and users, is that use cases are limited only by imagination. I regularly hear of somebody saying, oh, can I use the population for X? And we’ll go, yeah, you can. I didn’t think of that, but yes, you can. And so there are drier and less dry uses for the population. In summary, you could use the population anywhere you would use primary data. So if you went to a big survey firm, or even did a survey yourself on usage and behavior, and you used that data, you can use the synthetic population for the same thing, faster, less expensively, and more broadly. The popular one is evaluating new concepts and messaging. Will my target market respond to this message more than that message? We’ve had clients use the data to optimize their brick-and-mortar locations and configurations, because they understand on a market-by-market basis the online versus offline shopping behavior in local neighborhoods. I was talking with a potential client this morning who has a chain of stores in Alaska, of all places. Well, in Alaska and in Texas. And, you know, Alaska is not Texas, and you’ve got to design your store differently in Alaska than in Texas because the market is different. There are advanced use cases as well. Propensity modeling, whether it’s retention modeling or purchase modeling. One insurance client has started to use the data for fraud risk modeling.
And for direct sales progress modeling, so when you are selling online and somebody gives you a credit card, do you know that they are actually going to be good for this? And in the insurance world, that’s a big deal because the long-term risk is high. And so with synthetic populations, you can actually infill other data to understand whether this person is a higher risk or a lower risk. That was a case that we never really thought about until this insurance client said, oh, I think I’m going to start to test this usage. You can also plan and optimize channel campaigns, marketing mix campaigns, and marketing mix modeling, and understand, online and offline, how do I best spend my marketing dollar based on the population that I have? And is that different in Atlanta than it is in New York? Because the populations of Atlanta and New York are different, and they respond to media differently, and the media available is different. And so now I can actually create a channel mix in New York that is different than the one I use in Atlanta, as opposed to having to settle for one size fits all across the country. As I said, it’s limited only by imagination. And this is why it’s so exciting. And the next few years for us are going to be so exciting, because the users are going to invent what the data is going to be used for, as opposed to us pretending that we’re so smart that we know what all the use cases are.
It’s fascinating that people are bringing that stuff to you, though. I mean, that’s not traditionally how a lot of this has worked. And so I could see where, if I’m in your shoes, that would be nothing short of thrilling, really.
It makes getting up every week and moving along every week pretty exciting.
And, you know, it really creates a situation where we can’t tell our story to enough people, because of just that: people will come and tell us, oh, I would like to try this and I would like to try that. And it’s like, oh, let’s go, man. Let’s do it.
If you were to look two to three years down the road, what role do you see synthetic populations playing in the broader AI marketing ecosystem?
I think they will be way more common than they are today. They will be way more used than they are today. It’s the same way with EVs, which I talked about before. First it was just Tesla, and then there were a few other brands, and now virtually every major manufacturer of cars has an EV coming out. There are subtle differences and important differences between the different brands and the cars and the way they’re manufactured. And that will be the same case for synthetic populations three years from now. But, and this is a marketing nightmare, they are faster, better, and cheaper. Period. And I know that’s the triple threat of death, or the triple whatever it is, the three things that you should never talk about in terms of what your product does, but that’s in fact what synthetic populations will do. And so if we were three years from now and talking to marketers, and you were to say, well, there are so many different versions available, I would say, again, buyer beware, and be aware of what it is that you are buying. Know what you’re trying to do, and know that the tool and the source fit what you’re trying to do, because there are important differences, in the same way that the Ford Mustang EV is different than the Tesla model, whatever, which is different than the BMW EV model. There are differences there that you need to understand.
Last question. What marketing opportunities do you foresee arriving in 2026 that have you most excited?
The ability to better target and market at local levels and small levels. And so if I’m a microbrewery, to really understand how my market is different than the market of one of the national or international brewers. If I’m a small retailer, to really understand how to fulfill my market’s needs and to target my market in a way that I could never do before. And so I think we as consumers are actually better served that way, in that we’ll get better marketing, more relevant marketing, and things that are just more valuable to us.
Excellent. Well, Ray, this has been fascinating. If somebody wanted to find out more about you or, better yet, come to you with a whole new way of looking at this stuff, where would you send them?
arimadata.com, A-R-I-M-A-D-A-T-A dot com, or reach out to me on LinkedIn, Ray Kong on LinkedIn.
Awesome. Well, thanks so much for taking part today. This has been fascinating.
Thanks for the time. And thanks for your questions, Richard.