Programmatic Audio Roundtable

Remember Serial? It would be presumptuous to say the investigative spinoff of This American Life put podcasting on the proverbial map, but it’s not much of a stretch to say it helped bring the medium into the public zeitgeist when it captured the nation’s attention in 2014. Separately, the rise of streaming music (and the tiered free/paid models that most providers adopted) almost single-handedly brought about the rise of programmatic audio advertising.

Mikhail Venkov, whom you met this past May, is HUMAN’s subject matter expert on audio (and video) on our detection team. Louisa Abel leads audio research on the data & operations research team, and Francis Kitrick works closely with audio partners such as SXM/Pandora as a principal solutions engineer. We brought them together for a brief roundtable on programmatic audio and its challenges and promise. 

What makes verification in programmatic audio a challenge?

Francis: Programmatic audio is following a very similar trajectory to what we’ve seen before in CTV. Early entrants set the table and experiment with the best ways to incorporate advertising, and the lessons they learn (often publicly) inform the decisions later entrants make.

While the music industry has accepted streaming as the path forward, streaming audio is still nascent in terms of ad measurement and verification. For example, an impression for a podcast ad placement is counted as “a download of a podcast from a web-server.” Among the questions we can’t answer yet: Was the ad even played? Where was it played? How many people were listening? Was the ad listened to all the way through, or was it skipped? These are hard questions to leave without easy answers.

We’ve begun seeing quite a bit of audio ad spoofing, because audio is such a high-value, low-signal environment. We’re also seeing some of the same issues as we do on social. For example, bad actors are spoofing artist likes and follows. Unique to audio, though, is the concern of spin fraud. This is a scheme in which streamed listens are spoofed or incentivized, either to collect royalty revenue or to “seed” the popularity of music and push a song or album up the charts. Spin fraud goes hand in hand with audio ad fraud, as it leads to audio ad impressions that are invalid.

Louisa: Audio is everywhere, on any device. People are listening at home, in their cars, on an airplane, or in a grocery store. Unlike environments like Mobile App and CTV, “audio” is not specific to a type of device or application. And some of those environments, like podcasts, are the ultimate low-signal environment, even more so than CTV or Digital Out of Home. Audio isn’t just podcasts and connected audio (e.g., Alexa and Sonos), though. The majority of audio traffic we see in our products is still served on Mobile or Desktop, such as music and radio streaming. This diversity makes the term “audio” unique.

What is not unique to audio, however, are the actors. Audio is following the same path we’ve seen play out in other formats, and that includes the same bad actors moving into this space with new variations on their old methods. This helps us to find patterns in the actors’ technology & techniques in the low-signal audio environment. 

Mikhail: Audio and CTV have a lot in common; both rely on similar technology to serve content, and both rely on similar technology to verify it. Neither CTV nor audio platforms support JavaScript tags, limiting the amount of information available to verification partners. Audio, like CTV, is also often served via Server Side Ad Insertion (SSAI), a low-signal environment. Audio additionally has podcasts, which are even lower in signal.

While a video is streamed and an ad can be verified at the time of viewing, a podcast is often downloaded, and programmatic ads are inserted at the time of the download. Ads are listened to offline, if at all, meaning the metrics gathered show only whether the ad was available, not whether it was actually listened to.

Additionally, audio can be delivered through Connected Audio devices which run on their own firmware, presenting a challenge for identification and setting expectations of normal behavior. However, as noted above, the vast majority of audio traffic that we see is served through Mobile App and Desktop Web environments.

How are verification partners solving for the unique challenges of audio? 

Louisa: Since the same actors are participating in both the audio and video advertising ecosystems, we can more easily identify the tactics, techniques, and procedures (TTPs) of fraud across formats. Our research team is also analyzing the data for emerging threat patterns to develop new sophisticated invalid traffic (SIVT) detection techniques. In the past year, we have seen a significant increase in our threat detection in audio, which now has an even higher invalid traffic (IVT) rate (as a percentage of total bid requests) than CTV.
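
To make the metric concrete, here is a minimal sketch of an IVT rate computed as a percentage of total bid requests, as described above. The function name and the counts in the usage note are hypothetical and purely illustrative; they are not HUMAN data or HUMAN's implementation.

```python
def ivt_rate(invalid_bid_requests: int, total_bid_requests: int) -> float:
    """Return invalid traffic as a percentage of total bid requests.

    Both arguments are hypothetical counts supplied by the caller.
    """
    if total_bid_requests <= 0:
        raise ValueError("total_bid_requests must be positive")
    if not 0 <= invalid_bid_requests <= total_bid_requests:
        raise ValueError("invalid_bid_requests must be between 0 and the total")
    return 100.0 * invalid_bid_requests / total_bid_requests
```

For example, with made-up numbers, `ivt_rate(50, 1000)` means 50 invalid requests out of 1,000, which is 5 percent. Comparing this figure across formats only makes sense if the denominator (total bid requests) is measured the same way for each.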

Mikhail: Our expertise in CTV ad fraud detection helps us stay ahead in audio as well, because the two share similar threat vectors. SSAI spoofing is a key threat model that we see on both audio and video streaming impressions, so we can apply our knowledge and experience from CTV to audio.

Francis: Spin fraud is a big threat in the audio space. If a song is played over and over, either by robots or incentivized listeners, ads are also played, which makes spin fraud a brand and budget concern for advertisers (not to mention its impact on the charts). The fact that we are detecting real humans on the ad side of things helps position us well to detect accompanying spin fraud. We have partners like SXM who use HUMAN to keep logged-in users from committing spin fraud on their site, and who are then protected again from selling fraudulent listeners to advertisers. This multilayered protection is key and should be expected as brands vet programmatic partners.
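
As a rough illustration of the "played over and over" pattern, the sketch below counts how many times each account played each track and returns the pairs above a threshold. It is a toy heuristic under stated assumptions, not how HUMAN or SXM detect spin fraud: the `(account_id, track_id)` play log, the function name, and the `max_plays` threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def repeated_play_pairs(
    plays: Iterable[Tuple[str, str]], max_plays: int = 50
) -> List[Tuple[str, str]]:
    """Return (account_id, track_id) pairs played more than max_plays times.

    `plays` is a hypothetical log with one (account_id, track_id) tuple per stream.
    """
    counts = Counter(plays)
    # Sort so the result is deterministic regardless of log order.
    return sorted(pair for pair, n in counts.items() if n > max_plays)
```

A count like this cannot tell a robot from an incentivized listener or a genuine fan, which is why it would only ever be one weak signal among many.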

What can the industry do to support fraud detection in audio advertising? 

Mikhail: Same as with CTV: the use of SSAI servers reduces the amount of signal that verification providers like HUMAN can get. If the verification beacon fires on the server, all of the signals that partners get can be spoofed.

Additionally, SSAI servers often send bad data in the form of hard-coded UAs and app IDs, and often omit endpoint IP information, making verification very challenging. So the most important thing the industry can do is to ensure that all measurement is fired client-side. This makes it a lot harder to spoof device and app information, and makes it more likely that partners will get clean data from good actors. Both of these would make the ecosystem more secure and would be huge in terms of fraud detection.
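
The data-quality problems described above can be sketched as two simple checks over a batch of measurement beacons: what share of records lack an endpoint IP, and whether a single user agent and app ID pair dominates the batch (one possible symptom of hard-coded values). This is an illustrative sketch only. The `Beacon` record, its field names, and the 0.5 threshold are assumptions, not a real beacon schema or HUMAN's detection logic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Beacon:
    """Hypothetical measurement beacon as received by a verification partner."""
    user_agent: str
    app_id: str
    device_ip: Optional[str]  # None when the endpoint IP was not forwarded


def missing_ip_share(beacons: List[Beacon]) -> float:
    """Fraction of beacons that carry no endpoint IP."""
    if not beacons:
        return 0.0
    missing = sum(1 for b in beacons if not b.device_ip)
    return missing / len(beacons)


def dominant_identities(
    beacons: List[Beacon], threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """(user_agent, app_id) pairs that exceed `threshold` share of the batch."""
    if not beacons:
        return []
    counts = Counter((b.user_agent, b.app_id) for b in beacons)
    total = len(beacons)
    return [pair for pair, n in counts.items() if n / total > threshold]
```

A dominant pair is not proof of a hard-coded value, since a popular app on a popular device can legitimately dominate a batch. The point of the sketch is only that server-fired beacons with uniform identifiers and no endpoint IP leave very little to verify.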

Francis: Direct measurement is key. We want measurement to come directly from the device. We work with Roku, which has been an industry leader in pushing toward firing client-side. Each publisher has client-side telemetry, in addition to their watermark attestation. We have also been partnering with audio experts like SXM to utilize their client-side audio signals to verify and enhance our fraud signals, as well as to identify emerging audio threats.

Louisa: Mikhail and Francis are spot on. The asks for client-side measurement could take up a whole page, and that’s the most important piece. While more of a concern for future growth, it’s also important to continue to talk and think about podcasts and connected audio, for example by improving standards for user agents. Those are the smallest pieces of audio traffic right now, but they’re growing spaces, and now is our chance to standardize them, just like the industry has worked to do with CTV.