News
Foundational Models for MMM: Interesting, but Built on a Stable Base?
Amazon Ads recently published an intriguing whitepaper proposing a “foundational model” approach to Marketing Mix Modeling (MMM), analogous to how large language models like GPT learn generalizable patterns across many data sources.
It’s an appealing idea: a shared, privacy-safe model that learns from many brands, reducing noise and cost while improving consistency. But I can’t help wondering whether this foundation may rest on shifting ground.
MMMs are, by nature, models. They depend heavily on the assumptions, priors, and data chosen by their modelers. A “foundational” MMM trained on hundreds of brand-level models risks compounding those assumptions rather than correcting them. And when synthetic data enters the mix, the system depends further on abstractions. (See our recent blog post on the risk of AI-driven ad optimization built on non-experimental assumptions of causality.)
It brings to mind Nassim Taleb's caution that systems built on interconnected dependencies create fragility, e.g., the financial models leading up to the 2008 mortgage crisis. When many players adopt the same flawed premise, the system becomes brittle.
Aside from one passing reference, what's missing from the whitepaper is a discussion of validation through high-quality experiments. Randomized controlled trials (RCTs) remain the best evidence of causal advertising impact. In the hierarchy of evidence, only meta-analyses of many RCTs rank higher. The industry should focus on building benchmarks of incrementality RCTs to calibrate and test any MMM foundation, preventing models from recursively learning from other models.
And it’s worth remembering who is proposing this. Amazon is now in the MMM business, joining Google’s Meridian and Meta’s Robyn. As I argued in The First Principle of Honest Advertising Measurement Is Independence from the Media, credible measurement depends on independence. It’s hard not to be cautious when the companies selling most of the media are also building the tools to “prove” its ROI.
Foundational MMMs could be a leap forward, but only if their footing is grounded in experiments, not self-reference. Don't build sand castles on the beach at low tide.
(Join the discussion of this essay on LinkedIn.)
The First Principle of Honest Advertising Measurement Is Independence from the Media
When you buy a house, you rely on the surveyor and the structural engineer—people who don’t work for the seller. When you pick a restaurant, you trust the health inspector’s “A” in the window, not the chef’s Yelp review. When you invest, you care that the books have been vetted by an independent auditor. Yet in advertising, marketers routinely let the seller grade its own reporting.
That contradiction should make any serious business leader pause. The single most important principle of credible measurement—independence—is still the rare exception in marketing.
The missing audit culture
Across mature professions, independence is non-negotiable. ISO/IEC 17025 requires that testing laboratories “be impartial and be structured and managed so as to safeguard impartiality,” ensuring protection from undue influence or conflicts of interest. The U.S. GAO’s Yellow Book demands that auditors be organizationally independent from those they audit. Clinical trials rely on independent data-monitoring committees to stop studies if conflicts distort findings (ISO; GAO; ICH Guidelines).
The logic is the same everywhere: no matter how advanced the math, if the measurer benefits from a positive result, the credibility is shot. Advertising somehow missed that memo.
A cultural blind spot
Many advertisers genuinely want to measure impact better, but legacy incentives often get in the way. Marketing departments are rewarded for deploying budget efficiently, not for proving whether that budget truly drove incremental sales. It’s understandable, but it creates a blind spot.
Meanwhile, there’s no shortage of metrics: reach, impressions, viewability, clicks, attention, engagement. Each enjoyed its moment in the sun, but none answers the CFO’s central question: How many sales happened that wouldn’t have happened without the marketing spend?
Too often, advertisers accept ROI reports produced by the same entities selling them media or managing their buys. Agencies and platforms rarely hide their preference for internally run “studies” that tend to deliver friendly numbers. Vendors openly admit (over drinks with other vendors) that clients rarely prize accuracy in reporting results. One senior product head at a major agency told me, “Why would we want third-party randomized controlled trials? That only introduces risk for us.” A top consulting firm, hired by a major digital platform, confided that they use that platform’s internal synthetic-control mechanism “because the results are regularly favorable for them.”
Like drunks using lamp-posts: for support rather than illumination.
What independence protects
Independence is not a nicety; it’s the foundation of credible causal inference. It protects against three forms of bias:
In design. When the party that profits from the outcome designs the test, parameters conveniently align with its interests.
In analysis. Data processing, covariate selection, and modeling choices can subtly tilt results. Independence enforces transparency.
In interpretation. Even an honest analyst may face pressure to present “directionally positive” findings. Structural separation shields objectivity.
In statistical terms, independence reduces systematic error, the enemy of validity. In business terms, it’s an insurance policy against self-deception.
Other fields learned this long ago
Financial markets insist on auditor independence because self-auditing destroyed companies from Enron to Wirecard. Laboratories maintain impartiality accreditation (ISO 17025) because test results affect safety and trade. Clinical researchers separate trial oversight from sponsors to protect patients and truth. Forensic labs maintain chain-of-custody independence to keep prosecutors from tainting evidence (ISO; GAO; ICH; National Academies Press).
Advertising commands billions in corporate capital every year. Why should its measurement standards be lower than food safety or bridge engineering? Granted, no one dies when campaigns underperform, but companies live and die by market share, a zero-sum game where those who measure best have a valuable edge.
A few bright spots
There are encouraging examples, such as Netflix, eBay, and Indeed, which have built experimentation cultures rooted in transparency and rigorous testing. They prove that independence and scientific discipline aren’t academic ideals, they’re competitive advantages.
More advertisers are beginning to follow suit. For those genuinely striving to optimize media spend, demanding verifiable incremental impact isn’t risky, it’s liberating. It clarifies what truly works and earns credibility across the business.
What independence doesn’t mean
It doesn’t mean outsourcing everything to a third-party vendor. Advertisers themselves are independent from the sellers and can build capability in-house. The key is structural separation: the people whose performance depends on campaign success shouldn’t be the same people validating its impact. The CFO’s team doesn’t audit its own books; marketing shouldn’t either.
Yes, I run a company that performs independent experiments for advertisers. I have skin in the game. But the principle stands regardless of who executes the measurement: someone must own the truth, and it cannot be the party selling, or even the one buying, the ads.
A better path forward
This industry doesn’t lack intelligence or ambition. It lacks a measurement culture grounded in independence and evidence. Marketers deserve clarity about what’s truly driving growth. Vendors that can prove genuine incrementality should welcome that scrutiny; everyone else will raise their game or fade away.
If marketers spent half as much energy insisting on credible causal measurement as they do chasing vanity metrics, the entire ecosystem would benefit. Media sellers would compete on actual performance, not on weaponized opacity. Agencies would be valued for insight, not self-preservation. CFOs would trust marketing again.
The principle that scales
Independence isn’t optional. It’s the first condition of truth. Many disciplines that measure cause and effect learned this long ago. Advertising is simply overdue to catch up.
If you’re still taking the media company’s ROI slide deck at face value, well, I do live in Brooklyn. Maybe you'd be interested in a nice shiny bridge?
The Compounding Power of Long-Term Advertising: A Review of the Evidence
A question came up on the Research Wonks list about the short-term vs. long-term impact of advertising. (If you don't know the Wonks, sign up at ResearchWonks.com, a forum of ~1,500 media researchers.)
I googled then AI-ed. Below are some influential works on the subject. ChatGPT summarizes the consensus thusly:
Long-term impact is typically 2–5× the short-term effect. This ratio appears consistently across academic (Hanssens, Mela), industry (Sequent Partners, Gain Theory), and practitioner (Binet & Field) research. (See the carryover sketch after this list for one way such multipliers arise.)
Short-term metrics underestimate ROI. Campaigns optimized for near-term sales often misallocate budgets away from high-ROI brand building.
Different mechanisms drive each timescale: short-term activation (rational persuasion, promotions) vs. long-term brand effects (emotional memory, reduced price sensitivity, loyalty).
Sustained investment compounds results. Continuing advertising reinforces brand memory and repeat purchasing; stopping spend causes rapid decay.
Media and creativity matter. Broad-reach, emotional campaigns (especially on TV and high-quality digital) generate stronger long-term multipliers.
Best practice: balance roughly 60% brand / 40% activation spending and evaluate outcomes over multi-year horizons to capture the full economic impact.
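One simple way to see where a 2–5× multiplier can come from is a geometric adstock (carryover) assumption: if a fraction of each period's effect persists into the next period, the cumulative long-run effect is the short-term effect times 1/(1 − carryover). A minimal sketch in Python, with illustrative carryover rates rather than figures from the works listed below:

    # Illustrative only: geometric adstock (carryover) view of cumulative ad effect.
    # If a fraction `carryover` of each period's effect persists into the next period,
    # the long-run multiplier on the immediate (short-term) effect is 1 / (1 - carryover).

    def long_term_multiplier(carryover: float) -> float:
        """Cumulative effect of one unit of short-term lift under geometric decay."""
        assert 0 <= carryover < 1
        return 1.0 / (1.0 - carryover)

    for carryover in (0.5, 0.6, 0.7, 0.8):
        print(f"carryover={carryover:.1f} -> long-term effect is "
              f"{long_term_multiplier(carryover):.1f}x the short-term effect")
    # carryover rates of 0.5-0.8 reproduce the 2-5x range summarized above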
Evaluating Long-Term Effects of Advertising, Sequent Partners, 2014
Short- and Long-term Effects of Online Advertising: Differences between New and Existing Customers, Breuer et al., 2012
What Is Known About the Long-Term Impact of Advertising, Hanssens, 2011
The Long-Term Impact of Promotion and Advertising on Consumer Brand Choice, Mela et al., 1997
The Long and the Short of It, Binet & Field, 2013
Focusing on the Long-term: It’s Good for Users and Business, Hohnhold, 2015
Five key results from our long-term impact of media investment study, Chappell, Gain Theory
Giving Marketing the Credit it Deserves, Rubinson, TransUnion, MMA
The Hierarchy of Advertising Evidence
In medicine and other sciences, the “Hierarchy of Evidence” ranks research methods by the strength of their causal claims, with meta-analyses of randomized controlled trials at the top and anecdotal opinion at the bottom. Advertising faces the same challenge: separating true incrementality from correlation. This hierarchy adapts that framework to advertising, showing how experimental methods — ideally randomized controlled trials (RCTs) — provide the most credible evidence of causal impact. Quasi-experimental and observational approaches, such as marketing mix models, synthetic controls, or attribution, remain widely practiced and valuable, especially when experiments are impractical (e.g., in small national markets or for channels like in-store promotions). Still, because these approaches lack randomization, they depend on modeling choices and statistical assumptions that are harder to verify; they are thus subject to bias and overfitting, and they provide less certain evidence of causal impact than experiments.
Experiments
Models Plus Experiments (MPE) – Most rigorous framework: routinely validating MMM (or other models) with RCTs; combines scale of models with causal credibility of experiments.
Meta-Analysis of Many Large-Scale Experiments – Aggregating results across multiple RCTs; strongest external validity.
Large-Scale Geo Experiments (Cluster RCTs) – Random assignment of all geo regions within a country (e.g., DMAs); a consistent, robust, and scalable framework applicable across virtually all media channels (see the randomization sketch after this list).
Large-Scale User-Level Experiments – Randomizing millions of IDs; high internal validity but limited by cross-device fragmentation and low match rates, noisy for measuring small ad lifts.
Small-Scale Geo Experiments – Still randomized but less power, higher risk of idiosyncratic bias from a few markets.
Small-Scale User-Level Experiments – Feasible but often underpowered; generalizability limited.
Interrupted Time Series / Switchback Tests – Turning campaigns on/off repeatedly; can work if treatment effect is immediate and reversible, but vulnerable to temporal confounds.
Observational / Quasi-Experimental Analytics
Synthetic Controls – Weighted combination of control units to construct a “synthetic twin”; can provide credible counterfactuals but sensitive to donor pool and overfitting. (E.g., Meta's GeoLift R package.)
Bayesian Structural Time Series (BSTS) – Flexible probabilistic framework for counterfactual forecasting with covariates, trend, and seasonality; powerful but dependent on priors and specification choices. (E.g., Google's CausalImpact R package.)
Difference-in-Differences (DiD) – Compares changes in treated vs. untreated groups over time; intuitive and widely used but hinges on the parallel trends assumption.
Marketing Mix Models (MMM) – Longstanding regression framework using aggregate longitudinal data to estimate channel contributions; offers a holistic view of the mix but causal claims depend on assumptions unless validated by experiments.
Matched Market Tests – Selects untreated markets that resemble treated ones; easy to explain but prone to hidden bias.
Lookalike Controls (Propensity Score Matching or Machine Learning–adjusted cohorts) – Construct control groups matched on demographics, purchase history, or other observables; widely used in research panel-based lift tests but subject to hidden bias from unmeasured factors.
Exposed vs. Unexposed Comparisons – Naïve lift tests comparing those who saw ads to those who didn’t; easy to run but confounded by targeting bias.
Attribution Models (MTA, heuristic last-click, etc.) – Useful for exploring customer journeys, but correlation-based and not causal evidence.
Expert Judgment – Can inform hypotheses or priors, but not empirical evidence.
Black Box AI Optimization – Proprietary “incrementality” claims without transparency; credibility requires experimental validation.
Self-Reporting by Media Vendors – Metrics reported by the seller; rife with bias and not reliable for causal claims.
Popular Opinion / Conventional Wisdom – Lowest rung; anecdotal, not evidence.
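To make the geo-experiment rungs above concrete, here is a minimal Python sketch of randomly assigning DMAs to test and control arms for a cluster RCT. The handful of DMA names is illustrative only; a production design would use all 210, stratify by market size, and run power calculations.

    import random

    # Minimal sketch: randomize geographic clusters (e.g., DMAs) into test and control
    # arms for a cluster RCT. Only a handful of DMAs are listed here for illustration;
    # a real design would stratify by market size and validate covariate balance.

    dmas = ["New York", "Los Angeles", "Chicago", "Philadelphia", "Dallas-Ft. Worth",
            "San Francisco", "Boston", "Atlanta", "Houston", "Seattle"]

    rng = random.Random(42)       # fixed seed so the assignment is replicable
    shuffled = dmas[:]
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    assignment = {dma: ("test" if i < half else "control") for i, dma in enumerate(shuffled)}

    for dma, arm in sorted(assignment.items()):
        print(f"{dma:>20s} -> {arm}")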
How to interpret results from a randomized controlled experiment correctly
Screenshot of reporting from Central Control’s platform Experiment Designer
In the Research Wonks forum (recommended for all marketing analysts: visit researchwonks.com for details), I was asked my opinion on a thread titled "Does stat testing encourage the wrong decisions?" The heart of the question was how to interpret tests that aren't deemed quite "statistically significant," yet the profit value of the product is very high and purchase rates are very low (e.g., boats, solar panels, homeowner's insurance).
Here is a modified version of my response, which is really a bit of an explainer on experimentation statistics themselves. As I noted in an earlier post in the forum, all of this pertains to well-constructed randomized controlled trials (RCT). If your definition of a test is relying on synthetic controls, matched markets or other quasi experimental methods, I'm much more skeptical of the results overall. More on that below.
Marketers often fixate on the 95% confidence level as the determinant of whether a test was “valuable” or not, when that's only one component of what makes an experiment result meaningful. It’s also often misunderstood.
Confidence level (CL) caps the risk that a result is a false positive — a Type I error. It’s not really saying “there was almost certainly a positive lift.” It’s saying that if you ran the test under the same conditions an infinite number of times, about 5% of the time you’d see a lift when in fact there wasn’t one. In other words, a 95% CL limits the false-positive (Type I error) rate to 5%.
The p-value of the test has an inverse relationship to the CL: a p-value under 0.05 indicates significance at the 95% CL.
Of course, this foundation of statistics describes an impossibility: infinite repetition aside, advertising test conditions are always unique; each is a one-time snapshot of a combination of the creative, offer, audience, media, point in time, etc. But statistics is theoretical stuff. In a class on regression I once took, the professor described the concept of a “parent population,” the theoretical infinite number of samples you could draw under the model, and the “daughter population,” the single sample you actually observed. The p-value, like most other statistical metrics, refers back to the likelihood that your daughter population correctly represents the “true” (theoretical) parent population.
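To see the "infinite repetition" idea in action, here is a small simulation sketch (arbitrary sample sizes and outcome distribution, not real campaign data) of running the same A/A test many times when the true lift is zero; roughly 5% of runs come back "significant" at the 95% CL purely by chance:

    import random
    from statistics import NormalDist, mean, stdev

    # Sketch: simulate many A/A tests (true lift = 0) and count how often a two-sided
    # z-test comes back "significant" at the 95% confidence level. Expect roughly 5%.

    rng = random.Random(7)
    N_TESTS, N_PER_GROUP = 2000, 500

    false_positives = 0
    for _ in range(N_TESTS):
        control = [rng.gauss(100, 15) for _ in range(N_PER_GROUP)]
        test = [rng.gauss(100, 15) for _ in range(N_PER_GROUP)]  # same distribution: no true lift
        se = (stdev(control) ** 2 / N_PER_GROUP + stdev(test) ** 2 / N_PER_GROUP) ** 0.5
        z = (mean(test) - mean(control)) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))                   # two-sided p-value
        if p < 0.05:
            false_positives += 1

    print(f"False positive rate: {false_positives / N_TESTS:.1%}")  # ~5%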
Power describes the likelihood that your test will detect a positive result if one really exists. The common convention in experiment design is 80% power, which can feel low given that it answers the critical question: “Will I actually detect a lift if it’s there?” For experiments supporting high-stakes decisions, you might power for 90%. Unfortunately, power isn’t something you get as a marker in the results (like the p-value); it’s a design property, determined up front by inputs including sample size and the effect size you care about.
The maddening part is: with 80% power, if you fail to reject the null (i.e., no significant lift detected), there’s still a 1 in 5 chance that a true lift of the size you powered for was actually present, but your test missed it due to random chance, such as an unlucky draw of randomly assigned experiment units, ill-winds blowing from the south, etc. This is a Type II error, a false negative.
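For intuition on that design-time arithmetic, here is a rough sketch using the standard normal-approximation formula for comparing two conversion rates. The base rate and target lift are illustrative (picked to echo the low-incidence, high-value products in the original question), not anyone's actual campaign figures:

    from statistics import NormalDist

    # Sketch: sample size per arm needed to detect a relative lift in a conversion rate,
    # using the standard two-proportion normal-approximation formula. Illustrative numbers.

    def n_per_arm(base_rate: float, relative_lift: float,
                  alpha: float = 0.05, power: float = 0.80) -> int:
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for a two-sided 95% CL
        z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
        p1, p2 = base_rate, base_rate * (1 + relative_lift)
        pooled_var = p1 * (1 - p1) + p2 * (1 - p2)
        return int(((z_alpha + z_beta) ** 2 * pooled_var) / (p2 - p1) ** 2) + 1

    # A low-incidence, high-value product: 0.5% purchase rate, hoping to detect a 10% lift.
    print(n_per_arm(base_rate=0.005, relative_lift=0.10))   # roughly 330,000 units per arm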
Which leads to three other important topics: confidence intervals, Bayesian thinking, and the value of an accumulated body of evidence.
Confidence intervals (CIs) are often neglected in superficial reporting of test results, but they’re crucial for interpretation. A CI shows the plausible range of effect sizes given your data, which bears on the original question in the Wonks forum: how to interpret significance in the case of high-value, low incidence transactions.
Ideally, results should look like: “Estimated lift = 5%, p = 0.04, 95% CI = +0.2% to +9.8%.” The p-value tells us the result just clears the 95% confidence level (so the false-positive risk is controlled at 5%). The CI tells us the range is fairly wide: the true lift could be as small as +0.2% or as large as +9.8%. So, even with a test that is "significant" at a 95% CL, the actual sales impact could vary widely, which is key for ROI. The measured effect size is really just a point estimate.
The formula is: CI = Estimate ± (Zα/2 × SE)
If you're using a 90% (or 80%) confidence level, and the lift is deemed positive but the p-value is > 0.05, then the 95% CI will straddle zero — meaning the data are consistent with no effect, a small positive effect, or even a negative effect. That’s why the 95% CL is a useful rule of thumb: a p-value below 0.05 ensures the entire 95% CI sits above zero when the estimated lift is positive.
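Here is the arithmetic behind that formula as a minimal sketch, using the illustrative numbers above; the standard error is backed out from the reported p-value, so nothing here comes from a real campaign:

    from statistics import NormalDist

    # Sketch: reconstruct a 95% confidence interval around a lift estimate.
    # CI = estimate +/- z(alpha/2) * SE. Numbers match the illustrative example in the text.

    estimate = 0.05                                       # 5% measured lift (point estimate)
    p_value = 0.04
    z_observed = NormalDist().inv_cdf(1 - p_value / 2)    # z implied by a two-sided p = 0.04
    se = estimate / z_observed                            # back out the standard error

    z_crit = NormalDist().inv_cdf(0.975)                  # ~1.96 for a 95% CI
    low, high = estimate - z_crit * se, estimate + z_crit * se
    print(f"95% CI: {low:+.1%} to {high:+.1%}")           # roughly +0.2% to +9.8%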
Beware the temptation to run one-tailed tests in ad experiments, where the presumption is the effect could only be positive. Ads can backfire. Think of the X10 camera from years ago — pioneer of the notorious pop-under ad format — that so annoyed people it arguably created negative lift. That logic also shows up in uplift modeling, where one audience segment is “sleeping dogs,” whose purchase likelihood can be harmed by advertising.
Bayesian analysis offers a more flexible interpretation than the rigid frequentist thresholds. Most MMM models today are Bayesian. In that mindset, even a p-value of 0.15 can be treated as evidence pointing toward a positive lift — not definitive, but suggestive, and potentially worth acting on depending on priors and expected ROI.
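As a rough sketch of that mindset, here is a conjugate normal-normal update with made-up effect sizes (a mildly optimistic prior and an observed lift whose p-value is about 0.15); the point is only that the posterior can still lean clearly toward a positive lift:

    from statistics import NormalDist

    # Sketch: conjugate normal-normal Bayesian update. A "not significant" result
    # (p ~ 0.15) combined with a mildly optimistic prior can still imply a high
    # probability of positive lift. All numbers are illustrative.

    prior_mean, prior_sd = 0.02, 0.03      # prior belief: ~2% lift, fairly uncertain
    obs_lift, obs_se = 0.03, 0.021         # observed lift with p ~ 0.15 (z ~ 1.43)

    # Posterior of a normal mean with known variances (precision-weighted average)
    prior_prec, obs_prec = 1 / prior_sd ** 2, 1 / obs_se ** 2
    post_var = 1 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * prior_mean + obs_prec * obs_lift)

    p_positive = 1 - NormalDist(post_mean, post_var ** 0.5).cdf(0.0)
    print(f"Posterior probability of a positive lift: {p_positive:.0%}")  # ~94% here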
Finally, I always stress that the greatest confidence comes not from any one test but from a body of evidence built through repeated RCTs. In the hierarchy of evidence, the only thing stronger than a single well-run RCT is a meta-analysis of many of them. The closer you get to that “infinity of tests,” the more solid your knowledge becomes. Big advertisers, agencies, MMM shops, and publishers who accumulate 10s, then 100s, then 1000s of RCTs build a serious moat in their understanding of what really works across ad formats, CTAs, publishers, channels, product categories, and so on.
Of course, as I said at the top of this essay, all of these statistics are on shaky ground if your “testing” relies on synthetic controls, matched markets, propensity scores, machine learning, or other forms of quasi-experiments. Quasi-experiments can be useful if true RCTs are infeasible (rare in advertising) and/or if the expected effect size is very large. But if you’re chasing a 1-10% effect with a quasi-experiment, I wouldn’t bet the farm.
I wrote recently on LinkedIn that with enterprise AI systems poised to take over media planning, there’s a real danger those systems will train on quasi-experimental results — which pervade ROI measurement in advertising — and institutionalize bad learnings, creating a huge “knowledge debt” that will take years to unwind. Garbage in, garbage out.
Recommended further reading on the subject:
Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement (Gordon et al., 2022)
Predictive Incrementality by Experimentation (PIE) for Ad Measurement (Gordon et al., 2023)
Enterprise AI is coming, and it's about to learn all the wrong lessons about marketing effectiveness (my essay)
How to Design a Geographic Randomized Controlled Trial (a detailed, 50+ page whitepaper by Central Control, a big hit with experimentation experts)
NEW WHITEPAPER: How to Design a Geographic Randomized Controlled Trial
Download the whitepaper here (no registration required).
After years of advocating for better advertising measurement, I'm excited to share a comprehensive guide that doesn't just explain WHY geographic RCTs are superior for measuring true incremental ROI, it shows exactly HOW to implement them.
This 50+ page whitepaper includes:
Step-by-step design frameworks for "Rolling Thunder" and other experiment designs
Detailed Python code examples for randomization, analysis, and power calculations
Statistical methodologies that deliver unbiased, transparent, replicable results
Implementation checklists to ensure experimental integrity
Real-world case studies demonstrating the approach in action
For too long, marketers have relied on observational and quasi-experimental methods like matched market tests, synthetic controls, and attribution models, which tend to systematically overstate performance. As I've written previously, these approaches might be "customer pleasing," but they're leading to billions in misallocated ad spend.
The truth is that geographic RCTs aren't comparatively difficult or expensive to implement, they're just unfamiliar to many practitioners. This guide demystifies the process and provides everything you need to start measuring true incremental impact.
Download the whitepaper here (no registration required).
If you're responsible for major advertising investments and want to know what's really working, this is your roadmap to better measurement and better results. For questions, training or support implementing these techniques, please reach out. I'm always happy to discuss how this methodology can transform your measurement approach and marketing effectiveness.
#MarketingMeasurement #Incrementality #ExperimentalDesign #AdvertisingEffectiveness #DataScience #iROAS
Enterprise AI is coming, and it's about to learn all the wrong lessons about marketing effectiveness
🚨 TL;DR: Most advertisers will train AI systems on flawed campaign measurement data, risking a generation of misinformed media planning.
By Rick Bruner, CEO, Central Control
Last week's I-COM Global summit was, once again, a highlight of the year for me. While its setting is always spectacular (Menorca was no exception) and the fine food and wine certainly helped, what truly distinguishes I-COM is the quality of the discussions, thanks to the seniority and expertise of attendees from across the ad ecosystem.
The dominant theme this year: large organizations preparing for enterprise-scale AI. We heard from brands including ASR Nederland, AXA, Bolt, BRF, Coca-Cola, Diageo, Dun & Bradstreet, Haleon, IKEA, Jaguar, Mars, Matalan, Nestlé, Reckitt, Red Bull Racing, Sonae, Unilever, and Volkswagen about using AI to optimize virtually every facet of marketing: user attention, content creation, creative assets, CRM, customer engagement, customer insights, customer journey, data governance, personalization, product catalogs, sales leads, social media optimization, and more.
Two standout keynotes, by Nestlé's Head of Data and Marketing Analytics, Isabelle Lacarce-Paumier, and Mastercard's Fellow of Data & AI, JoAnn Stonier, focused on a critical point: AI’s success hinges on the quality of its training data. Every analyst knows that 90% of insights work is cleaning and preparing the data.
The situation couldn’t be more urgent. My longtime friend Andy Fisher, one of the few industry experts with more experience running randomized controlled experiments than I, pointed out that most companies still don't use high-quality tests to measure advertising ROI. As a result, they’re about to embed flawed campaign conclusions into AI-driven planning tools, creating a knowledge debt that could take decades to rectify.
As I’ve written here recently, most advertisers still rely on quasi-experiments at best, or decade-old attribution models at worst. Even today’s favored quasi-methods — synthetic controls, debiased ML, stratified propensity scores — offer only the illusion of experimental rigor, often delivering systematically biased results.
By contrast, randomized controlled trials, especially large-scale geo tests, remain the most reliable evidence for determining media effectiveness. Yet they’re underused and underappreciated.
Why? Because quasi-experiments are “customer pleasing,” as an MMM expert recently put it. They skew positive, so much so that Meta now mandates synthetic control methods (via its GeoLift R package) for official experimentation partners, one such partner told me the other day, because the results are reliably favorable to Meta's media.
That might be fine if those tests stayed in the drawer as one-off vanity metrics. But with enterprise AI, they won’t. They’ll be unearthed and fed, by the dozens or hundreds, into new automated planning systems, training the next generation of tools on false signals and leading to years of misallocated media spend.
The principle is simple: garbage in, garbage out.
Saying goodbye to the Mediterranean for another year, all the Manchego and saffron bulging in my suitcase couldn't soothe my unease about the future of marketing performance. It’s time to standardize on real evidence. Randomized experiments should be the norm, not the exception, for ROAS testing. The future of AI-assisted media planning depends on it.
Gain Market Share by Disrupting Bad Ad Measurement
"It is difficult to get a man to understand something when his salary depends upon his not understanding it."
- Upton Sinclair
Do you ever get the feeling that your advertising performance metrics are mostly baloney? If so, this article is for you.
The advertising industry desperately needs brave, skeptical individuals willing to demand better evidence of what actually works—from media partners, agencies, measurement firms, and industry bodies.
But does anyone really care? Currently, over 90% of media practitioners either don't grasp the shortcomings of their current practices, have conflicting incentives, or are content with CYA reporting theater, leading to poor investment decisions.
Can you trust big tech companies like Google and Meta to tell you what's effective? After all, they've got those smarty-pants kids from MIT and Stanford, right? Surely you jest. Do you really expect them — who already receive most of your budget — to suggest you spend more on Radio, Outdoor, or Snap? Read the receipts.
What about your ad agency — isn't that their job? Actually, no. Agencies get paid based on a percentage of ad spend, so why would they ever suggest spending less? And if they're compensated on a variable rate, e.g., 15% of programmatic spend vs. 5% of linear, it's no surprise they're steering budgets toward easier and higher-margin channels.
How about your in-house analytics experts? Well, first they'd need to admit they've been measuring iROAS poorly for years. Awkward. The head of search marketing, whose budget might shrink if real measurement proved poor performance? Unlikely. The CMO? Only if she's new and ready to shake things up—your best hope for disruption.
As Upton Sinclair aptly wrote, "It is difficult to get a man to understand something when his salary depends upon his not understanding it."
We're calling on:
Private equity firms seeking hockey-stick growth
CFOs skeptical of ad metrics, viewing marketing as a cost center
Media sellers, large and small, being strangled by Google and Meta’s duopoly
Mid-tier DTC brands not yet hypnotized by synthetic control methods or captive to big tech
Today's "performance" measurement is largely flawed, built on bad signals, lazy modeling, and vanity metrics. The consequences are poor decisions, wasted budgets, and lost market share.
Avoid these unreliable measurement approaches:
Matched market testing
Synthetic control methods
Attribution models
Quasi-experiments
Black-box "optimization" solutions like PMax and Advantage+
None of these offer solid evidence. Some might be "better than nothing," but that's hardly the standard you want for multimillion-dollar decisions. At worst, they're self-serving illusions from platforms guarding their own margins, not advertiser interests.
Our profession embraces these weaker standards even when better measurement is easily attainable. Marketing scientists have never met a quasi-experimental method they don't like — dense statistics are so much fun! But randomized controlled trials (RCTs) — deemed the best evidence of causality by science, with straightforward math — are falsely labeled as "too hard."
RCTs remain the only reliable method to isolate causal impact and identify true sales lift. Claims about their complexity or expense are myths perpetuated by those benefiting from the status quo.
Today's trendiest method, Synthetic Control Method (SCM), is essentially matched market testing (DMA tests familiar since the 1950s) boosted by statistical steroids. You pick DMAs to represent a test group (the first mistake) and construct a Frankenstein control from a weighted mashup of donor DMAs.
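For readers unfamiliar with the mechanics, here is a toy sketch of that weighted mashup on random data, using plain non-negative least squares in place of the fully constrained SCM optimizer; it is meant only to show how many modeling choices (donor pool, pre-period, weighting scheme) the method involves:

    import numpy as np
    from scipy.optimize import nnls

    # Toy sketch of the synthetic-control idea: fit non-negative weights on "donor" DMAs
    # so their weighted blend tracks the test DMA's pre-period sales. (Full SCM also
    # constrains weights to sum to 1 and typically matches on covariates; this only
    # illustrates the mechanics, on random data.)

    rng = np.random.default_rng(0)
    n_weeks_pre, n_donors = 52, 15

    donor_sales = rng.normal(100, 10, size=(n_weeks_pre, n_donors))  # donors' weekly sales
    test_dma = donor_sales[:, :3].mean(axis=1) + rng.normal(0, 2, n_weeks_pre)

    weights, _ = nnls(donor_sales, test_dma)     # donor weights for the "synthetic twin"
    synthetic = donor_sales @ weights            # pre-period counterfactual

    rmse = float(np.sqrt(((synthetic - test_dma) ** 2).mean()))
    print("donor weights:", np.round(weights, 2))
    print("pre-period fit RMSE:", round(rmse, 2))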
SCM excels when an RCT truly isn't possible — like assessing minimum wage impacts, gun laws, or historical political policies such as German reunification. These scenarios can't ethically or practically be randomized. But advertising? Advertising is perhaps the easiest, most benign environment for RCTs. Countless campaigns run daily, media is easily manipulated, outcomes (sales) are straightforward, quick to measure, and economically significant. There are few valid reasons not to conduct RCTs in advertising.
The first rule of holes is "Stop digging." The first rule of quasi-experiments is "Use them when RCTs are unethical or infeasible." That's almost never true for ad campaigns.
Synthetic Control Method is problematic because it:
Offers weaker causal evidence than RCTs
Is underpowered for ad measurement: the academic literature on SCM cites social policies with effect sizes >10%, far larger than the lift large marketers can expect from advertising
Lacks transparency (requires advanced statistical knowledge)
Is not easily explainable to non-statisticians
Lacks replicability (each instance is a unique snowflake dependent on many choices)
Lacks generalizability (blending 15 DMAs to mirror Pittsburgh still doesn't reflect national performance)
I've yet to hear a compelling reason for choosing SCM over RCT for ad measurement that wasn't self-interested rationalization. For more on the how, why and when to use geo RCTs, see my previous essay in this series.
As a global economic downturn looms, many advertisers will unwisely slash budgets without knowing what's genuinely effective. Don't make mistakes that could threaten your company's future. Measure properly — cluster randomized trials are your best path to true advertising ROI.
Advertisers seeking accurate ROAS should use large-scale, randomized geo tests
Multi-armed Cluster Randomized Trial design using DMAs for national ad campaign
The smallest Fortune 500 companies have revenue on the order of $10 billion, meaning they are likely spending at least $1 billion on paid advertising. If your company is spending hundreds of millions on advertising, there is no excuse for optimizing ROI with half measures. Yet that is exactly what most companies do, relying on subpar techniques such as attribution modeling, matched market tests, synthetic control methods and other quasi-experimental approaches.
Quasi-experiments, by definition, lack random assignment to treatment or control conditions. The first rule of quasi-experiments, as the methodological literature consistently makes clear, is that they should be used when randomized controlled trials (RCTs) are "infeasible or unethical."
As Athey and Imbens (2017) state: "The gold standard for drawing inferences about the effect of a policy is a randomized controlled experiment. However, in many cases, experiments remain difficult or impossible to implement, for financial, political, or ethical reasons, or because the population of interest is too small."
For advertising measurement, it's rarely unethical to run an RCT, and it's almost always feasible. Decisions to opt for lesser standards, say the bronze or tin of quasi-experiments or attribution, usually stem from a lack of understanding of how significantly inferior those methods are for causal inference compared to RCTs, or misplaced priorities about the perceived cost of high-quality experiments.
As for the general accuracy of quasi-experimental methods, many practitioners assume they provide at least "directionally" correct results, but in reality such results can often be misleading or inaccurate to a degree that's difficult to quantify without benchmark randomized trials.
The seminal paper "Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement" (Gordon et al., 2022) demonstrated this. It took some 600 Facebook RCT studies and reanalyzed their results using double/debiased machine learning (DML) and stratified propensity score matching (SPSM), among the most popular forms of quasi-experiments, the types used by many panel-based ad measurement companies. In keeping with the journalistic adage that the answer to any headline posed as a question is “no,” the researchers found the quasi-experimental approaches were "unable to reliably estimate an ad campaign's causal effect."
Regarding the perceived cost of large-scale experiments: The primary cost to advertisers isn't in conducting rigorous RCTs, but rather in the ongoing inefficiency of misallocated media spend. For companies investing hundreds of millions annually in advertising, the opportunity cost of suboptimal allocation—both in wasted dollars and unrealized sales potential—can substantially outweigh the investment required for proper experimental design.
Witness Netflix, which, almost 10 years ago, assembled a Manhattan Project of foremost experts in incrementality experimentation. Their cumulative RCT findings led them to eliminate all paid search advertising. This counterintuitive but data-driven decision would likely never emerge from attribution modeling or quasi-experimental methods alone, highlighting the unique value proposition of rigorous experimental practice.
In terms of the best type of experiment for measuring advertising effects, there is a growing consensus among practitioners that geographic units are more reliable today than user, device or household units. For years, the myth of the internet's one-to-one targeting abilities gave rise to the belief that granular precision was synonymous with accuracy. While this was never really true, the rise of privacy concerns and the deterioration of granular units of addressability now mean that user-level experiments are less accurate than tests based on geographic units such as designated market areas (DMAs). The irony is that identity is not necessary for deterministic measurement.
Beyond privacy problems and the need for costly technology and intermediaries in the form of device graph vendors and clean rooms, the match rates between user-level units in the media where ads run and the outcome data of the dependent variable (e.g., sales) are usually too weak for experimentation, often below 80%. Even at higher match rates, accurately measuring typical ad sales effects of around 5% or less is unachievable due to spillover, contamination and statistical power considerations.
Postal codes would be an ideal experiment unit, given that they number in the thousands in most countries, a level of granularity that balances unobserved confounding variables across groups and provides a large enough base of units to measure small effect sizes. But, as I argued in an AdExchanger article a few months ago, publishers seem unwilling to make the targeting reforms necessary to make them a viable experimental unit and fix the morass of digital advertising measurement.
That leaves DMAs, which thankfully work well in the large US market as an experiment unit for national advertisers willing to run large-scale tests with them. Geographic experiments broadly fall into a class of RCT known as cluster randomized trials (CRT), where the human subjects of the ad campaigns are clustered by regions. A key benefit of geo experiments is that they correspond to ZIP codes already in the transactional data in many advertisers' first-party CRM databases and third-party panels, enabling researchers to read the lift effect without any data transfer, device graphs, clean rooms or privacy implications. And, importantly, no modeling required.
Although there are only 210 DMAs in the US, and they vary widely by population size and other factors, collectively they represent a population of roughly 350 million people. Randomization is the most effective way to control for biases and unknown confounders among those varying factors and deliver internal validity to test and control group assignments.
A parallel CRT, with one test and one equal-sized control group based on all 210 DMAs, is the most statistically powerful kind of design, especially appropriate for testing a medium not currently or regularly in an advertiser's portfolio.
For a suppression test, where the marketer seeks to validate or calibrate the effect size of a medium whose default status is always-on advertising in a given channel, such as television or search, a stepped CRT design, where ads are turned off sequentially across a multi-armed experiment, is a good option. The stepped approach allows the advertiser to ease into a turn-off test and monitor sales, such that the test can be halted if the impact is severe. Stepped designs also require treating less than 50% of the media weight, albeit at the sacrifice of statistical power: the illustration here shows a design using multiple test and control arms that puts only 18% of the total media weight into cessation treatment.
Multi-armed Rolling Stepped Cluster Randomized Trial design, withholding only 18% of media weight
Stratified randomization, statistical normalization of DMA sales rates, and validation of the randomization on covariates such as market share and key demographic variables can be incorporated into the design to strengthen confidence in the results. Ultimately, the best evidence of true sales impact is not a single test but a regular practice of RCTs, in complement with other inference techniques, such as market mix modeling and attribution modeling, so-called unified media measurement, or what we call MPE: models plus experiments.
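As a hedged sketch of what that can look like in code (toy data, a simple size-paired shuffle standing in for full stratified randomization, and a covariate balance check; not Central Control's production methodology):

    import random
    from statistics import mean

    # Sketch: stratified randomization of DMAs into test/control arms, stratifying by
    # market size, then checking balance on covariates. All DMA data here is simulated.

    random.seed(11)
    dmas = [{"dma": f"DMA-{i:03d}",
             "population": random.randint(100_000, 8_000_000),
             "market_share": random.uniform(0.05, 0.30)} for i in range(210)]

    # Stratify by size: sort by population, then randomize within consecutive pairs
    dmas.sort(key=lambda d: d["population"], reverse=True)
    for i in range(0, len(dmas) - 1, 2):
        pair = [dmas[i], dmas[i + 1]]
        random.shuffle(pair)
        pair[0]["arm"], pair[1]["arm"] = "test", "control"
    if len(dmas) % 2:                              # odd DMA left over (not the case for 210)
        dmas[-1]["arm"] = random.choice(["test", "control"])

    # Validate the randomization: covariate means should be similar across arms
    for arm in ("test", "control"):
        share = mean(d["market_share"] for d in dmas if d["arm"] == arm)
        pop = mean(d["population"] for d in dmas if d["arm"] == arm)
        print(f"{arm:>7s}: mean market share {share:.3f}, mean population {pop:,.0f}")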
This kind of geographic experimentation should not be confused with matched market tests (MMT) or its modern counterpart synthetic control methods (SCM). Matched market testing has been used for decades by marketers, and its shortcomings are well understood in the inability for one or a few DMAs in control to replicate the myriad exogenous conditions in the test that could otherwise explain sales differences between markets, such as differing weather conditions, supply chains and competitive mixes, to name a few.
SCM has gained a loyal following in recent years as a newer approach to the same end, using sophisticated statistical methods to improve the comparison of a few test and control geographies. The approach, however, is still fundamentally a quasi-experiment, subject to the inherent limitations of that class of causal inference, namely its reduced ability to control for unobserved confounds, limited generalizability to a national audience and comparatively weaker statistical power, based on a few DMAs compared to all DMAs in the case of large-scale randomized CRTs. While there's ongoing work to improve SCM, I haven't seen any methodological papers claiming SCM to be more reliable than RCT, despite claims to that effect from some advertising measurement practitioners.
As Bouttell et al. (2017) state, "SCM is a valuable addition to the range of approaches for improving causal inference in the evaluation of population level health interventions when a randomized trial is impractical."
Running tests with small, quasi-experimental controls is a choice, not a necessity for most advertising use cases. True, no one dies if an advertiser misallocates budget, as can be the case with health outcomes, where clinical trials (aka RCT) are often mandated for claims of causal effect. But companies lose market share, and marketers lose their jobs, particularly CMOs. When millions or billions in sales are at stake, cutting corners in the pursuit of knowing what works can be ruinous to a brand's performance in a market where outcomes are measurable and competition is zero sum.
Recession-Proofing Your Ad Mix: Measure Twice, Cut Once
The legendary investor Warren Buffett, speaking of naïve investment strategies, famously said, “Only when the tide goes out do you discover who's been swimming naked.”
The same could be said for marketers whose paid media investments rely on simplistic ROAS measurement.
Is a recession coming? No one can say for sure. A couple of years ago, most economic pundits thought so. Historically, recessions occur every 6–10 years, and it’s been 16 years since the last real one. The new administration’s shock-and-awe economic policies have people wondering again.
Would your organization be ready? When corporations shift into austerity mode, research departments are the first to go, followed by deep marketing budget cuts. But what would you cut?
Many marketers worry about "waste" in ad spending. Our work with clients suggests a far greater risk: cutting the wrong things—the investments actually driving the highest sales ROI. Faulty measurement often blinds leadership to what truly works.
Consider one case study: We worked with a Fortune 100 brand to measure the effectiveness of one of their biggest digital media channels. The long-time CMO had lost faith in their attribution analytics—and rightly so. He believed they were overpaying and wanted to slash the channel’s $100 million annual budget in half. But, wisely questioning his assumption, he brought us in.
Long story short, we found that the channel drove 3% of all new client acquisitions—but not where they expected. The impact was entirely through their offline sales channels, which accounted for the majority of their business. Yet all their attribution tracking was focused on online conversions.
Had they cut that $50 million in ad spend, they would have jeopardized $1.5 billion in sales. Only our rigorous, randomized experiment-based testing—leveraging anonymized first-party CRM sales data—revealed the truth. The digital media giants they were paying simply had no visibility into this.
Attribution models, clicks, propensity scores, quasi-experiments, synthetic controls, lab tests—I wouldn’t bet the farm on any of them.
Sometimes, the best wisdom comes from time-tested axioms, like the one passed down by carpenters and tailors: “Measure twice. Cut once.”
Don’t short-change your measurement. Jobs—yours included—and even your company’s survival are the stakes.
Media Companies’ Survival Depends on More Accurate ROAS Measurement
Guideline Spend is a census of all media dollars transacted through the biggest 12 agencies in the US market.
Do you hear that giant sucking sound? That’s the advertising economy draining billions into the bank accounts of Google, Meta, and Amazon.
Why do advertisers keep shifting more and more of their budgets to these digital giants, year after year, at the expense of other media companies? Sure, they have eyeballs and engagement, but that’s only part of the story.
If this were a (very lousy) game show—Advertising Family Feud—the top answer would be obvious: “Measurability!” Ding, ding, ding!
For most media companies outside the Big Three, proving that their ROI has been historically undervalued due to flawed ad performance measurement is now an existential priority. Engagement metrics, attention scores, brand impact studies, attribution models, and quasi-experimental lift reports won’t cut it. Neither will claims about “quality audiences” or other vague promises.
Even mediocre marketers are skeptical of these metrics. The smartest ones outright dismiss them as irrelevant to what matters most: incremental sales impact.
So, how can media companies bridge the gap and prove their value?
The Case for Geographic Experimentation
The easiest and most effective way to deliver what advertisers need—transparent and credible proof of ROI—is by implementing sound geographic experimentation. Better yet, develop an in-house center of excellence around this approach.
Big Digital has spent years perfecting its tools for “proving” (and often feigning) ad performance. These include:
Marketing Mix Modeling (MMM) tools like Google’s Meridian and Meta’s Robyn.
Auction-based randomized experiments, also known as “ghost ads,” which Apple’s privacy measures have impaired in recent years.
Black-box AI optimization engines like Google’s Performance Max and Meta’s Advantage+, which are difficult to validate independently.
These tools make it easy for advertisers to spend more, while other media companies struggle to offer competitive proof of performance. Resistance to Big Digital often feels futile—but it doesn’t have to be.
TV, Radio, and Outdoor: Still Undervalued
It’s likely that TV ROI is undervalued. The same goes for traditional channels like radio and outdoor, which still capture attention just as effectively as ever. But without credible evidence of their true impact, they’ve largely fallen off the marketing mix radar.
And what about other digital properties? Are they outperforming Meta and Google? With today’s ROAS measurement practices, it’s anyone’s guess.
DMA Testing: A Path Forward
DMA-based experiments could be the best hope for media companies looking to prove their value. As discussed in my recent AdExchanger article, ZIP-code-based targeting could take things a step further, but it would require heavier lifting by media companies, and I’m not holding my breath.
Large-scale, randomized DMA experiments are emerging as a powerful, scalable method for ROAS measurement. By randomizing all 210 DMAs into test and control groups, advertisers can conduct transparent, replicable tests that provide irrefutable ROI evidence. Compared with user-level experiments, large-scale geographic experiments are far easier to implement, much less costly, and more accurate, given the noise and matching biases in user-level frameworks.
This was a key topic in our recent webinar with MASS Analytics, an MMM specialist, describing how MPE (Models Plus Experiments) is a growing practice to calibrate the accuracy of mix models with regular ROAS experiments.
The Bottom Line
Media companies that adopt these rigorous testing methodologies, or even make them simple for advertisers to implement independently, have their best shot at countering the claims of superior performance by Google, Meta, and Amazon. If your channel truly adds value, transparent experiments will prove it—turning your “measurability gap” into a competitive advantage.
In a world dominated by Big Digital, investing in accurate ROAS measurement isn’t just a strategy—it’s a survival necessity.
Post Script
Thanks to Guideline for permission to share this data.
As Josh Chasin and Susan Hogan pointed out in the comments of this post on LinkedIn, the 29% share of spending shown in the chart understates the share the Big Three command of the total ad market, as Guideline's data includes only the spending of the largest ad agencies. Even more of the total goes to the biggest players thanks to the "long tail" of the market that does not flow through the biggest agencies (i.e., smaller agencies, large advertisers in-housing, and smaller advertisers buying direct).
Thankfully, large agencies still diversify their budget across more publishers compared to smaller advertisers, but even there the trend is not promising for the majority of media companies.
Calibrating Mix Models with Ad Experiments (Webinar Video)
Click here to view this webinar video
The next big wave in advertising ROI analytics is here! 🚀
After MMM and MTA comes MPE: Models Plus Experiments, a powerful approach that combines the strengths of modeling with the precision of experiments to supercharge your ROI insights.
View the video of our LinkedIn Live webinar, attended by 160 professionals as three renowned experts in the field delve into the details and best practices for leveraging MPE to drive smarter, more actionable marketing decisions:
🎓 Dr. Ramla Jarrar, President of MMM specialist MASS Analytics
🧮 Talgat Mussin, geographic experiment guru from Incrementality.net
📈 Rick Bruner, CEO of Central Control, Inc., expert on experiments for advertising impact
Five Forces That Could Further Roil the Ad Industry in 2025
I’ve always been a big booster of the ad industry—its role in funding free speech (i.e., media outlets) and its multiplier effect on GDP by driving sales. For decades, I’ve encouraged young people to pursue great careers in this field.
Lately, I’m not so sure. It would be nice to start the new year with an upbeat post, but this is what’s on my mind instead.
Last year saw Google and Meta, the two biggest media companies, shrink their ad-related workforces, while layoffs have become numbingly common among traditional media, agencies, and many digital platforms.
Some of this reflects over-hiring during the ad bubble of 2021, but there are macro trends—AI, programmatic buying, and higher interest rates pushing profitability—that aren’t going away anytime soon.
As we enter 2025, here are five events and trends that could spell continued challenges for professionals in the field.
1. Omnicom / IPG Merger
And then there were five. Of the biggest agency holding companies—Publicis, Dentsu, WPP, Havas, Omnicom, and IPG—only Publicis is thriving. When adjusted for inflation, the collective spending of these media-buying groups on advertising has shrunk since 2017, even as Fortune 1000 revenues have grown significantly relative to inflation.
Why the disconnect? Advertising is supposedly more efficient now, driving more sales for less money, but that seems an overly optimistic explanation. More likely, it’s due to brands in-housing budgets, shifting dollars to boutique agencies for niches like retail media and podcasting, and the downward pricing pressure of programmatic exchanges.
One thing you can count on from this kind of consolidation is job cuts.
2. TikTok’s Peril
President Trump has asked the Supreme Court to delay the January 19 deadline for banning TikTok, but his whimsies don’t inspire job security.
TikTok’s U.S. staff is surprisingly small—under 10,000—but highly efficient. If those employees hit the job market suddenly, it could flood an already competitive field with more A-listers on the heels of all those laid-off Google and Meta experts.
Beyond that, TikTok plays a unique role in the ad ecosystem, supporting hundreds of thousands of creators and millions of niche advertisers. Its closure would ripple across the economy.
3. TV Networks on the Chopping Block
Comcast and Warner Bros. Discovery are moving several networks into new business units to make their core companies more attractive to investors. Paramount is reportedly considering similar moves.
With Netflix, Amazon, and Apple spending heavily on hit shows, movies, and live sports, the pressure on traditional networks to maintain ad share is intense.
4. Pharmaceutical Advertising Under Threat
President Trump’s pick for health secretary, Robert F. Kennedy Jr., wants to ban prescription drug ads on TV. Few other countries allow such ads. Legal challenges may prevent the ban. But anything seems possible these days. And, they did it to Big Tobacco, so there is a kind of precedent.
Pharma is one of the biggest ad categories after CPG, particularly for linear TV. Losing it would be a near-extinction event for some major media companies.
5. The Rise of AI-Driven ROI
To me, the most worrying trend is advertisers trusting platforms like Google’s Performance Max and Meta’s Advantage+ to spend ad dollars “efficiently” through black-box algorithms.
Say it out loud: “The biggest media companies will use secret formulas to spend my ad money most wisely.” That doesn’t sound fishy to you?
It’s like the sheep appointing the wolves their legal guardians.
The giveaway that this is a scam is that they don’t even allow a backdoor for fair testing, such as an option to run geo experiments on top of the algos to independently confirm the so-called performance or advantage. No “trust, but verify”—only “trust us.”
Of course, it's no coincidence that the rise of this kind of auto-magic, self-reported ROI comes after years of agencies and smaller media companies cutting their own research departments to near zero.
But, aside from all that, Happy New Year! I’d love to hear comments about what I got wrong here and what silver linings I should be focusing on instead.
Meanwhile, if you want to figure out what’s really working in the mix, give us a call.
The only real ROAS is experiment-based, incremental ROAS
The only real return on ad spend (ROAS) is experiment-based, incremental ROAS.
Accept no quasi-experimental imitations.
ZIP Codes: The Simple Fix For Advertising ROI Measurement
(As published in AdExchanger.)
By Rick Bruner, CEO, Central Control
One of the hottest trends in advertising effectiveness measurement, especially with privacy concerns killing user-level online tracking, is geographic incrementality experiments. These experiments are cost-effective, straightforward and reliable, if done right.
Geo media experiments typically use large marketing areas, such as Nielsen’s Designated Market Areas (DMAs). Unlike traditional matched market testing, this modern approach involves randomizing DMAs, ideally all 210, into test and control groups. This way, advertisers with first-party data can measure true sales lift in house without external services. For those lacking in-house sales data, third-party panels, such as those from Circana and NielsenIQ, offer alternatives compatible with this kind of test design.
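A minimal sketch of what that in-house readout can look like once DMAs are randomized: toy sales figures keyed by DMA and a single illustrative spend number; a real analysis would cover all 210 DMAs, normalize for population and baseline sales, and report a confidence interval.

    from statistics import mean

    # Sketch: read incremental lift and iROAS from first-party sales aggregated by DMA.
    # All figures are made up; a real readout would normalize DMAs and report a CI.

    sales_by_dma = {              # DMA -> (arm, sales during the test window)
        "DMA-001": ("test", 1_240_000), "DMA-002": ("control", 1_150_000),
        "DMA-003": ("test", 980_000),   "DMA-004": ("control", 1_010_000),
    }
    media_spend = 50_000          # spend across the test DMAs during the window

    test = [s for arm, s in sales_by_dma.values() if arm == "test"]
    control = [s for arm, s in sales_by_dma.values() if arm == "control"]

    lift_per_dma = mean(test) - mean(control)
    incremental_sales = lift_per_dma * len(test)
    print(f"Lift per test DMA:  {lift_per_dma:,.0f}")
    print(f"Incremental sales:  {incremental_sales:,.0f}")
    print(f"iROAS:              {incremental_sales / media_spend:.2f}")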
High-quality, randomized controlled trials (RCTs) – akin to clinical trials in medicine – are the best source of evidence of cause-and-effect relationships, including advertising’s impact on sales.
Statistical models, including synthetic users, artificial intelligence, machine learning, attribution, all manner of quasi-experiments and other observational methods, are faster, more expensive and less transparent forms of correlation – not measurement of causation. They may be effective for audience targeting, but not for quantifying ROI.
Imagine, however, the potential for conducting geo experiments using ZIP codes instead of DMAs.
Targeting with ZIP codes
An advantage of DMAs is that they are universally compatible with all media types. ZIP codes, on the other hand, pose challenges for experiments in digital media. Targeting with ZIP codes online often relies on inference from IP addresses, which is unreliable and increasingly privacy-challenged. Geo-location signals from mobile devices also contribute ZIP codes to user profiles, which is bad for experiments, as a single device/account can be tagged with multiple ZIP codes based on where the user has recently visited.
A key to the reliability of this kind of geo experiment is ensuring that the ZIP codes used for randomized media exposures match the ZIP codes where audience members receive their bills, as recorded in company CRM databases. Each device and user should be targeted by only one ZIP code: their residential one.
To adopt this technique, media companies can take two transformative steps:
1. Use primary ZIP code targeting: Major players like Google and Meta already collect extensive user data, often appending multiple ZIP codes to a single device. For experiments, these companies should offer a “primary” ZIP code targeting option, based on the user’s profile or the most frequently observed ZIP code for their devices (a rough sketch of this logic follows the list).
2. Implement anonymous registration with ZIP codes: Publishers should require registration to access most free content, offering an “anonymous” account type that doesn’t require an email address. Users would provide a username, password and home ZIP code, enabling publishers to enhance audience profiles while maintaining user anonymity.
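As a rough illustration of the first step, a “primary” ZIP code could be derived as the most frequently observed ZIP for a device and then checked against the billing ZIP in the advertiser’s CRM before the device is included in an experiment. The function names and data shapes below are hypothetical; this is a sketch of the idea, not any platform’s actual targeting option.

```python
from collections import Counter

def primary_zip(observed_zips):
    """Return the most frequently observed ZIP code for a device, or None if unknown."""
    if not observed_zips:
        return None
    return Counter(observed_zips).most_common(1)[0][0]

def eligible_for_geo_experiment(observed_zips, crm_billing_zip):
    """Include a device only when its primary ZIP matches the residential ZIP on file."""
    return primary_zip(observed_zips) == crm_billing_zip

# Hypothetical device seen mostly at home (10001) and occasionally while commuting.
print(primary_zip(["10001", "10001", "07030", "10001"]))                   # -> "10001"
print(eligible_for_geo_experiment(["10001", "10001", "07030"], "10001"))   # -> True
```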
These strategies would significantly improve ROI measurement, offering a more powerful and simpler mechanism than cookies or other current alternatives. Unlike cookies, which were always unreliable for measuring ROI, these methods provide a privacy-centric, fraud-resistant solution that doesn’t require complex data exchanges, clean rooms, tracking pixels or user IDs.
Industry bodies like the IAB, IAB Tech Lab, MMA, ANA, MSI and CIMM should advocate for this approach, which would revolutionize advertising incrementality measurement.
With over 30,000 addressable ZIP codes compared to 210 DMAs, the potential for greater statistical power and more reliable ROI measurement is immense. As Randall Lewis, Senior Principal Economist at Amazon, told me, “the statistical power difference between user IDs and ZIP codes in intent-to-treat experiments can be small, with the right analysis methods.”
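To see why more geo units help, consider a back-of-the-envelope comparison, assuming a 50/50 split and (unrealistically) equal variance across units: the standard error of a test-minus-control difference in means shrinks roughly with the square root of the number of randomized units. The numbers are illustrative only; ZIP-level data are individually noisier than DMA-level data, so the real-world gain depends on the analysis methods, as the quote above suggests.

```python
import math

def se_of_lift(unit_sd, n_units):
    """Standard error of a difference in means with a 50/50 split and equal variance per unit."""
    n_per_arm = n_units / 2
    return math.sqrt(unit_sd**2 / n_per_arm + unit_sd**2 / n_per_arm)

# Same per-unit standard deviation, different counts of randomized geo units.
print(se_of_lift(1.0, 210))     # ~0.138 with 210 DMAs
print(se_of_lift(1.0, 30000))   # ~0.012 with ~30,000 ZIP codes
```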
Adopting this approach would mark a significant leap forward, making high-quality experiments more accessible and reliable than ever before, ensuring a privacy-pure and fraud-proof approach to measuring advertising effectiveness.
The Future of Measuring Advertising ROI is "MPE": Models Plus Experiments
Marketing ROI measurement is going through a generational transformation right now. Following MMM (marketing/media mix modeling, in the ’90s) and MTA (multi-touch attribution, in the 2000s) now comes an emerging best practice I call “MPE”: Models Plus Experiments.
MPE refers to a process of constant improvement to existing marketing ROI models, particularly MMM, fine-tuning model assumptions and coefficients through a practice of regular experiments that measure incremental ROI.
The Gold Standard Is Not a Silver Bullet
Scientists consider the type of experiment known as a “randomized controlled trial” (RCT) to be the “gold standard” for measuring cause and effect, but the approach is not a silver bullet for advertisers.
Equivalent to “clinical trials” for proving efficacy in medicine – in which the outcomes of a test group are compared against those of a control group and, critically, the two groups are assigned by a random process before the experiment’s intervention – the RCT has a reputation among advertisers as being difficult to implement.
That concern is overstated, however. Running good media experiments is a lot easier than most advertising practitioners think. Industry measurement experts have made a lot of progress in the past decade, including the revolutionary ad-experimentation technique known as "ghost ads," which is widely deployed by most of the biggest digital media companies, generally for free. Central Control has also recently introduced a new RCT media experiment design, Rolling Thunder, which is simple to implement and can be used for many types of media by many types of advertisers.
Running a good experiment is certainly far easier than building a complex statistical model to try to explain what is driving the best ROI in the mix – an approach many advertisers spend much time and money on.
Another legitimate criticism of experiments is that it is hard to generalize what works in the mix from a single test. True enough. Experiments offer a snapshot of the effect of one campaign, at a point in time, in select media, with a given creative, a particular product offer, specific campaign targeting, and so forth. But, which of those factors mattered most to driving that lift?
In general, to assess that, you need to run more experiments.
According to the “hierarchy of evidence,” the ranking of different methods for measuring causal effect (from which comes the idea that RCT is the “gold standard”), the only practice that regularly outranks an RCT is a meta-analysis of the results from lots of RCT studies.
Think of a large set of benchmarks of advertising experiments, scored by the various factors within the control of advertisers, such as ad format type, publisher partners, media channels, and so on. Such a system of analysis is easily within reach of any large advertiser (or publisher or agency) that routinely executes lots of experiments. For marketers working in a Bayesian framework, the lifts measured by lots of similar experiments become ideal “priors” for estimating the effect of future campaigns.
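As a sketch of how such benchmarks could feed a Bayesian workflow (with illustrative numbers, and a deliberately simple fixed-effect pooling rather than the hierarchical model a practitioner would likely prefer): precision-weighting past lift estimates yields a prior mean and standard deviation for the next, similar campaign.

```python
def pooled_prior(lift_estimates):
    """Fixed-effect pooling of (lift, standard_error) pairs from past RCTs.
    Returns (prior_mean, prior_sd) usable as a prior on the next campaign's lift."""
    weights = [1.0 / se**2 for _, se in lift_estimates]
    total_weight = sum(weights)
    prior_mean = sum(w * lift for w, (lift, _) in zip(weights, lift_estimates)) / total_weight
    prior_sd = (1.0 / total_weight) ** 0.5
    return prior_mean, prior_sd

# Hypothetical results from three similar experiments: (measured lift, standard error).
past_tests = [(0.04, 0.02), (0.06, 0.03), (0.05, 0.025)]
print(pooled_prior(past_tests))  # roughly (0.047, 0.014)
```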
The shrewdest advertisers are increasingly adopting this practice, dubbed “always-on experiments.”
The Best Models Are Wrong, But Useful: Experiments Make Them Better
But even such an RCT benchmark doesn’t take the place of a good model. As they say, all models are wrong, but some are useful. Models are good for the big picture, zooming in and out to different degrees of granularity about how the mix is understood to work. They provide forecasting and scenario-planning capabilities, cost/benefit trade-off planning, simple summaries for strategic planning and other merits that won’t be supplanted by practicing regular ROI experiments.
Regular experiments are, however, a critical missing factor in the ROI analysis of too many advertisers. Experiments enable analysts to make models better by recalibrating their assumptions with better evidence.
That is what I mean by “Models Plus Experiments”: honing coefficients in MMM and MTA models through the practice of always-on experimentation. (And, when I say “experiments,” I specifically mean RCT.)
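One simple way to picture that honing, under strong simplifying assumptions (a linear response, a single calibration factor per channel, and names invented for illustration): scale a channel’s model coefficient so that its implied incremental ROAS moves partway toward the value an experiment measured.

```python
def recalibrate(coefficient, model_iroas, experiment_iroas, trust=0.5):
    """Move a channel coefficient toward the experiment-implied value.
    trust=1.0 fully adopts the experiment's incremental ROAS; 0.0 keeps the model as-is."""
    calibration_factor = experiment_iroas / model_iroas
    return coefficient * (1.0 + trust * (calibration_factor - 1.0))

# Hypothetical channel: the MMM implies an iROAS of 3.0, but a geo RCT measures 1.5,
# so the coefficient is scaled halfway toward the experimental estimate.
print(recalibrate(coefficient=0.8, model_iroas=3.0, experiment_iroas=1.5))  # -> 0.6
```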
MediaVillage: Coca-Cola Leads Brave New World for Marketers
Nothing like this year has ever happened before. So many companies are down double digits in 2020 YOY revenues that CEOs must rapidly do something different or risk termination.
By Bill Harvey and Rick Bruner
Doing something different has not been Standard Operating Procedure (SOP) for a long time.
Marketing Mix Modeling (MMM) turned out to be the essential tool for C-suites over the past three decades in rationalizing their marketing decisions. MMM is not conducive to doing anything rapidly, nor differently. "Bayesian Priors" is a fancy name many marketing modelers use to mean "last year's MMM results," meaning that new MMM analyses are anchored to year-ago MMMs. MMM is a methodology that encourages tweaking around the edges rather than doing anything differently.
Which is not to say discard MMM. However, it is not going to be very useful right now in the crisis we are in. All of the slowly accumulated benchmarks and baselines are out the window. The 104-week gestation period is inimical to companies' survival.
Are we being dramatic? Obviously not, based on a quick study of what Coca-Cola is doing: a complete global reorganization, 200 brands on the chopping block, and the 200 brands being kept reorganized into a new reporting structure designed to "bring marketing closer to the consumer," with a new emphasis on innovation.
The NY Times quotes Coke's CEO, James Quincey, saying the company is "reassessing our overall marketing return on investment on everything from ad viewership across traditional media to improving effectiveness in digital."
Forbes: Randomized Control Testing: A Science Being Tested To Measure An Advertiser’s ROI
The ability to measure the impact advertising has on sales has been the holy grail for marketers, agencies and data scientists. It dates back to John Wanamaker (1838-1922), who has been credited with the adage, “Half the money I spend on advertising is wasted; the trouble is I don't know which half”. Since then, with more ad supported media choices and more brands, consumers have been bombarded by ad messages daily. Meanwhile, advertisers continue to seek better ways to measure the sales impact from their media campaigns.
Attempting to measure the return-on-investment (ROI) of an ad campaign is not new. In the 1990s, some prominent advertisers, including Procter & Gamble, began using media mix modeling (MMM) to measure the ROI of their ad campaigns. Although the media landscape was far less complex than today, MMM had several inadequacies, including too few data points, and it did not factor in either the consumer journey or brand messaging. Nonetheless, MMM remained an industry fixture for many advertisers and agencies. Other marketers used recall and awareness studies as a substitute for sales to measure advertising impact.
In July, the Advertising Research Foundation (ARF) began a new initiative called RCT21 (RCT stands for Randomized Control Testing), the latest iteration of measuring an ad campaign’s impact on sales. In the press release, the ARF stated it “will apply experimentation methods to measure incremental ROI of large ad campaigns run across multiple media channels at once, including addressable, linear TV and multiple major digital media platforms.”
ARF Insights Studio - RCT21: Advancing Cross-Platform ROI Analysis
The ARF, in collaboration with 605, Central Control and Bill Harvey Consulting, is championing a research initiative designed to establish methods for applying randomized control testing (RCT) to cross-platform advertising impact analysis.
ARF Announces Initiative To Advance Cross-Platform ROI Analysis Through Application Of Randomized Control Trial Measurement
The ARF, in conjunction with 605, Central Control and Bill Harvey Consulting, will identify best practices to help advertisers better pinpoint the contributing factors to digital and TV campaign effectiveness
ARF, New York - Businesswire, July 14
The Advertising Research Foundation (ARF), the industry leader in advertising research among brand advertisers, agencies, research firms, and media, today announced a collaborative research initiative designed to establish methods for applying randomized control testing (RCT) to cross-platform advertising impact analysis.
The proof-of-concept study, named RCT21, will apply experimentation methods to measure incremental ROI of large ad campaigns run across multiple media channels at once, including addressable linear TV and multiple major digital media platforms.
The ARF is conducting the research in collaboration with 605, Central Control and Bill Harvey Consulting. A select number of national advertisers are also being recruited to join in the project.