Perfecting the art of data storytelling

Data is a powerful source for storytelling. When we use it to tell a story, we often uncover new truths, undiscovered trends, even potential warnings. But as powerful as data can be, it can also be misleading and filled with bias.

Here’s what experience teaches about the pitfalls of biased data analysis, and how we can protect ourselves as researchers and analysts: 

Focus on the context around the data set 

Data sets can tell a wide variety of stories, but accurate stories are achieved only when context is considered. Analyzed against a particular backdrop, data can tell a specific tale. But if that backdrop inaccurately reflects the captured data, the resulting story will be misleading, if not false.  

Take, for example, our recent data analysis associated with the Super Bowl. The objective was to uncover how Super Bowl commercial ad spend impacted advertisers' overall ad spend behavior. We approached this by analyzing our proprietary data on advertising transactions in the weeks leading up to the most-watched American sporting event.

Since Super Bowl advertisers are a small sample of our total advertiser pool, it was imperative to avoid extrapolating or generalizing these behaviors en masse to other one-off events or other advertisers. We couldn't speculate, for example, that the results of this analysis would also hold true for the World Cup or a boxing match with very different levels of online viewership.

Within that context, the results indicated that the use of private marketplaces (PMPs) rose notably, week over week, in the three weeks prior to the Super Bowl, and that 38% of the PMP spending in that period was concentrated in Super Bowl week itself. This makes sense, as PMPs give publishers more control over which buyers have access to their audiences and inventory.
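
To make the arithmetic behind that observation concrete, here is a minimal sketch of the kind of week-over-week calculation involved. It is not our actual pipeline; the table, numbers and column names are hypothetical, purely for illustration.

    import pandas as pd

    # Hypothetical weekly spend summary by deal type (illustrative units, not real figures).
    spend = pd.DataFrame({
        "week": ["W-3", "W-2", "W-1", "SB week"] * 2,
        "deal_type": ["PMP"] * 4 + ["open_auction"] * 4,
        "ad_spend": [1.0, 1.3, 1.7, 2.4, 6.0, 6.1, 5.9, 6.2],
    })

    pmp = spend[spend["deal_type"] == "PMP"].set_index("week")["ad_spend"]

    # Week-over-week growth in PMP spend over the run-up to the game.
    print(pmp.pct_change())

    # Share of the four-week PMP total that falls in Super Bowl week itself.
    print(f"Super Bowl week share of PMP spend: {pmp['SB week'] / pmp.sum():.0%}")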

These behaviors centered on this particular event and a specific advertiser segment: brands running digital campaigns around the Super Bowl. If we ignored that context, we would be dismissing the nuances uniquely associated with the Super Bowl. In this case, Super Bowl viewing still happens mostly on linear TV, with digital consumption remaining nascent – and the results clearly reflected this.

Be aware of your sample bias 

While nipping sample bias in the bud is generally a good rule of thumb, it's not always possible. Being mindful that sample bias exists is critical to understanding the results and, ultimately, the story the data tells. If the sample skews toward one group, the data story will almost certainly reflect that skew. Without acknowledging it, or taking it into consideration, the story will ultimately be flawed.

In our Super Bowl analysis, there was a strong temptation to analyze advertisers' spending behavior by boxing them into vertical categories to understand why, for example, retail advertisers behaved differently from automotive advertisers. However, due to this sample's unique nature, that was not achievable: there were far more automotive, food & drink and personal finance advertisers during the Super Bowl than advertisers in any other category, making a fair comparison impossible. Instead, we analyzed the data across platforms and channels, reaching several key conclusions.

Mobile web's share of ad spend fell prior to the Super Bowl in favor of increased connected TV (CTV) spending, then sharply returned to average levels during Super Bowl week. Lastly, while overall digital ad spend remained fairly consistent, the Super Bowl clearly shaped how that spend was executed, as noted above with the rise in PMP usage. By analyzing the data across platforms and channels instead of industries, we were able to avoid a framing bias arising from the narrow sample of categories. And while the data was rich enough to be dissected across platforms and channels, we kept in mind that the set of Super Bowl advertisers was still skewed toward a concentrated set of industry categories.
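
One simple habit that helps here is to count the sample before comparing groups. The sketch below is a hypothetical illustration of that check (the advertiser list, category labels and threshold are made up, not drawn from our data): if one category dwarfs the others, vertical-by-vertical comparisons are unlikely to be fair.

    import pandas as pd

    # Hypothetical advertiser list; in practice this would come from the transaction data.
    advertisers = pd.DataFrame({
        "advertiser": [f"brand_{i}" for i in range(60)],
        "category": (["automotive"] * 18 + ["food & drink"] * 16 +
                     ["personal finance"] * 14 + ["retail"] * 7 + ["travel"] * 5),
    })

    counts = advertisers["category"].value_counts()
    print(counts)

    # Crude imbalance flag: a 3x gap between the largest and smallest category
    # suggests per-vertical comparisons will reflect sample skew, not behavior.
    if counts.max() / counts.min() > 3:
        print("Sample is heavily skewed across categories; avoid vertical comparisons.")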

In short, analyzing data at too narrow a granularity can produce faulty interpretations and misleading conclusions that carry into future decision-making.

Explore the data with an open mind  

Many researchers carry out analyses with a predetermined idea or hypothesis. But if you strictly seek out data that supports that hypothesis, confirmation bias will inevitably lead you astray. The goal isn't to prove a hypothesis correct; the goal is to uncover the truth. The hypothesis simply serves as the testing ground.

When we started gathering the Super Bowl data, it made intuitive sense to believe that digital advertisers would reduce online spending in favor of linear TV, given the latter's historical importance during the Super Bowl. However, this proved surprisingly untrue. The analysis actually uncovered that, while advertisers spent healthy budgets on linear TV, they did not dial back their online presence. Online ad spend in fact remained consistent leading up to the game, and advertisers allocated most of their digital spend during Super Bowl week, due partly to video ad spend, which increased 26% week over week. Had we only reviewed the data with the mindset of proving our assumption, we wouldn't have uncovered these important results.
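
In practical terms, testing a hypothesis rather than confirming it simply means computing the quantity that could prove it wrong. Below is a minimal sketch of that framing, with purely illustrative numbers and hypothetical channel columns, not our actual dataset.

    import pandas as pd

    # Hypothetical weekly digital spend by channel in the run-up to the game.
    weekly = pd.DataFrame({
        "week": ["W-3", "W-2", "W-1", "SB week"],
        "video": [1.00, 1.02, 1.05, 1.32],
        "display": [2.00, 1.98, 2.01, 1.95],
    }).set_index("week")

    weekly["total_digital"] = weekly.sum(axis=1)

    # Hypothesis: digital spend falls as the game approaches.
    # The falsifying quantity is the week-over-week change in total digital spend.
    print(weekly["total_digital"].pct_change())

    # Channel-level changes show where any movement actually comes from.
    print(weekly[["video", "display"]].pct_change().loc["SB week"])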

How a data analysis is communicated matters. A single data set can tell a variety of stories, especially when preconceived notions come into play. Like science, data analysis must always contain a hint of skepticism. Otherwise, perfectly sound data becomes vulnerable to bias, ultimately defeating the purpose of accurate data collection and of the critical decision-making based on that data. Remaining vigilantly mindful of data bias may therefore be the most valuable tool to possess throughout the entire data gathering and analysis process.

Susan Wu is senior director of marketing research at PubMatic.  
