May 9, 2025
A client asked us to validate what their marketing agency was reporting in terms of attributable visitors and purchases.
This is a basic, easily executed how-to for estimating TV-attributable traffic numbers immediately after a commercial airing. The main prerequisite is an events clickstream or some other timestamped log of visitors as they first enter and then navigate the site. It is not necessary to have separate landing pages or source referral tracking.
TV commercials tend to drive a large amount of traffic within a relatively concentrated period of time. They also drive a long tail of site traffic over the following days and weeks.
One of the first steps for TV commercial site traffic attribution is to identify spikes in traffic. This article outlines the first iteration of an algorithm we’ve previously used to identify said spikes. It’s usually sufficient to provide early order-of-magnitude estimates.
The most advanced mathematical function used here is convolution, used for signal smoothing. You can definitely employ this algorithm without understanding the mechanics of how convolution works.
Begin with the “smoothed z score algorithm” (see also: python implementation). With a name far more complex than its implementation, this algorithm boils down to 25 lines of code and is based only on the concepts of average and standard deviation. The trickiest aspect is tuning the 3 parameters for optimal performance.
At any position in the signal, the smoothed z score algorithm essentially computes a trailing average and a trailing standard deviation over the preceding window of data points, the size of which is specified by the lag parameter. If a point within the lag window has been marked as part of an identified spike/anomaly, it is not given full value, but is instead weighted only as much as the influence parameter. If the point at the current position has a value outside of (trailing average) +/- (threshold * trailing standard deviation), it is considered to be part of an anomaly and marked as such.
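Here is a minimal sketch of the algorithm (adapted from the widely shared python implementation referenced above); lag, threshold, and influence are the three parameters mentioned earlier:

```python
import numpy as np

def smoothed_z_score(y, lag, threshold, influence):
    """Flag spike points in y using a trailing mean/std over the last `lag` points."""
    y = np.asarray(y, dtype=float)
    signals = np.zeros(len(y))       # 1 where a point is part of an upward spike
    filtered = y.copy()              # copy of y with spike points dampened
    avg_filter = np.zeros(len(y))    # trailing average
    std_filter = np.zeros(len(y))    # trailing standard deviation
    avg_filter[lag - 1] = np.mean(y[:lag])
    std_filter[lag - 1] = np.std(y[:lag])

    for i in range(lag, len(y)):
        if abs(y[i] - avg_filter[i - 1]) > threshold * std_filter[i - 1]:
            signals[i] = 1 if y[i] > avg_filter[i - 1] else -1
            # spike points only contribute `influence` of their value
            filtered[i] = influence * y[i] + (1 - influence) * filtered[i - 1]
        else:
            filtered[i] = y[i]
        avg_filter[i] = np.mean(filtered[i - lag + 1:i + 1])
        std_filter[i] = np.std(filtered[i - lag + 1:i + 1])

    return signals, avg_filter, std_filter
```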
The z score algorithm is fairly robust for flat signals that experience intermittent spikes. However, traffic signals usually have a non-constant base (i.e. a daily cycle). Attempting to use the basic z score algorithm on such a signal results in either oversensitivity or undersensitivity (depending on how the parameters are tuned), with little middle ground.
The smoothed z score algorithm moves from left to right. If we apply the algorithm from right to left, we will get a different result which we can then compare against the original version.
Here is what this looks like in python:
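In the sketch below, y holds the raw visit counts per time bucket, and the lag/threshold/influence values are placeholders to be tuned:

```python
# forward (left-to-right) pass
fwd_signals, fwd_avg, fwd_std = smoothed_z_score(
    y, lag=60, threshold=4.0, influence=0.1)

# backward (right-to-left) pass: reverse the signal, run the same algorithm,
# then reverse the outputs so they line up with the original time axis
bwd_signals, bwd_avg, bwd_std = smoothed_z_score(
    y[::-1], lag=60, threshold=4.0, influence=0.1)
bwd_signals = bwd_signals[::-1]
bwd_avg = bwd_avg[::-1]
bwd_std = bwd_std[::-1]
```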
The inputs here, carried over from the previous steps, are the original traffic signal and the outputs of the left-to-right and right-to-left passes.
New parameters introduced are:
multiple_cutoff
yes_weight_within_cutoff
yes_weight_outside_cutoff
For every point in the original signal, the two passes are combined to produce a new baseline value and a new standard deviation value. Here's what this looks like as python code:
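The exact combination rules can vary; in the sketch below, points flagged by either pass are downweighted, with the weight chosen by comparing the point against multiple_cutoff times the local baseline. The specific weighting scheme here is an illustrative assumption:

```python
import numpy as np

def combine_passes(y, fwd_signals, bwd_signals, fwd_avg, bwd_avg,
                   fwd_std, bwd_std, multiple_cutoff,
                   yes_weight_within_cutoff, yes_weight_outside_cutoff):
    """Blend the forward and backward passes into new baseline / std-dev signals."""
    y = np.asarray(y, dtype=float)
    local_avg = (fwd_avg + bwd_avg) / 2.0   # average of the two trailing baselines
    local_std = (fwd_std + bwd_std) / 2.0   # average of the two trailing std-devs

    # assumed weighting: points flagged by neither pass keep full weight;
    # flagged points that stay within multiple_cutoff * baseline keep
    # yes_weight_within_cutoff of their value; flagged points far outside
    # the cutoff keep only yes_weight_outside_cutoff
    weights = np.ones(len(y))
    flagged = (fwd_signals != 0) | (bwd_signals != 0)
    within = y <= multiple_cutoff * local_avg
    weights[flagged & within] = yes_weight_within_cutoff
    weights[flagged & ~within] = yes_weight_outside_cutoff

    # new baseline: each point pulled toward the local average in proportion
    # to how much weight it retains; std-dev signal is passed through as-is
    new_baseline = weights * y + (1.0 - weights) * local_avg
    new_std = local_std
    return new_baseline, new_std
```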
Our use case for convolution is essentially to smooth our new signal in a way that is better than a simple trailing average (which only incorporates data to the left of the point being smoothed). Here we've applied a short horizontal line as the convolution signal, i.e. a kernel whose values sum to 1. For our use case a convolution length of 40 (which can be thought of as another tunable parameter) worked well, meaning that we're smoothing every point with 20 points from the left and 20 points from the right.
In python, convolving can be done with the numpy library:
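For example, with the length-40 kernel described above (the signal names carry over from the combination sketch in the previous step):

```python
import numpy as np

conv_len = 40                          # tunable: how many points to smooth over
kernel = np.ones(conv_len) / conv_len  # short horizontal line, values sum to 1

# mode="same" keeps the output aligned with (and the same length as) the input
smoothed_baseline = np.convolve(new_baseline, kernel, mode="same")
smoothed_std = np.convolve(new_std, kernel, mode="same")
```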
The convolution is applied to both the new baseline and the new standard deviation signals.
This is quite basic. Given our new baseline signal and standard deviation signal, any point in the actual signal that is more than threshold standard deviations above the baseline is considered to be part of an anomaly spike.
The logic to do so is here:
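A sketch of that comparison, reusing the smoothed signals from the previous step (the threshold value below is a placeholder for the new input described next):

```python
import numpy as np

threshold = 5.0  # placeholder value for the final-pass threshold

# a point is part of an anomaly spike if it sits more than `threshold`
# standard deviations above the smoothed baseline
anomaly_mask = np.asarray(y) > smoothed_baseline + threshold * smoothed_std
anomaly_indices = np.where(anomaly_mask)[0]
```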
The new input here is the threshold for this final comparison, i.e. the number of standard deviations above the smoothed baseline at which a point is flagged.
How you proceed at this point depends on the desired granularity of attribution. At the simplest level, we just want to make sure that identified anomalies fall within a reasonable range of an aired commercial; an anomaly that is an hour or more before or after the nearest commercial airing is in all likelihood not commercial-related.
In our use case we were most interested in this analysis to get a top-level figure of traffic/conversions that came in from television each week. This top-level figure was compared to what our TV buying agency was reporting in order to give us confidence in their numbers. Once we had established confidence in this top-level number, we were happy to let them break down which channels/spots were the most efficient.
Depending on how your anomalies look, how close together your TV commercials air, and what level of granularity you are aiming for, you may also need to split certain anomalies into multiple, overlapping spikes.
The simplest way to split up an anomaly group is to look for local minima (i.e. where the first derivative is 0); these mark points in time where descending traffic from an earlier spike is overtaken by increasing traffic from a subsequent spike.
Here is a python implementation:
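In the sketch below, anomaly_group_values is a placeholder for the raw traffic counts within a single contiguous anomaly group:

```python
import numpy as np

def find_local_minima(values):
    """Indices where the signal stops falling and starts rising, i.e. where the
    discrete first derivative crosses zero from negative to positive."""
    values = np.asarray(values, dtype=float)
    diffs = np.diff(values)
    return [i for i in range(1, len(values) - 1)
            if diffs[i - 1] < 0 and diffs[i] > 0]

# split one contiguous anomaly group at its local minima
split_points = find_local_minima(anomaly_group_values)
sub_spikes = np.split(np.asarray(anomaly_group_values, dtype=float), split_points)
```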
It is important to note that these local minima don’t identify the divisions between spikes, which are in fact overlapping; rather, they identify the point in time where the later spike “overpowers” the earlier one.
If your commercials air close together, or you are aiming for spot-level attribution, you will need a more sophisticated solution for matching commercials to spikes. One contender here is constrained optimization (see also: constrained optimization in python).
The objective would be to maximize the number of spikes attributed to commercial airings plus the number of commercial airings attributed to a spike.
Constraints would include:
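Two natural candidates are that each airing is attributed to at most one spike, and that an airing can only be attributed to a spike beginning shortly after it. Here is a sketch of the setup using the pulp library; the constraints, timestamps, and library choice are all illustrative assumptions:

```python
import pulp

# hypothetical spike and airing timestamps (minutes since midnight)
spike_times = [610, 745, 1290]
airing_times = [600, 742, 750, 1400]
max_gap = 20  # assumed: a spike must begin within 20 minutes of the airing

prob = pulp.LpProblem("spike_airing_matching", pulp.LpMaximize)

# attribute[(a, s)] = 1 if airing a is attributed to spike s; variables are
# only created for pairs that satisfy the timing constraint
attribute = {(a, s): pulp.LpVariable(f"attr_{a}_{s}", cat="Binary")
             for a, airing in enumerate(airing_times)
             for s, spike in enumerate(spike_times)
             if 0 <= spike - airing <= max_gap}

# spike_used[s] = 1 if spike s has at least one airing attributed to it
spike_used = {s: pulp.LpVariable(f"spike_used_{s}", cat="Binary")
              for s in range(len(spike_times))}

# objective: airings attributed to a spike + spikes attributed to airings
prob += pulp.lpSum(attribute.values()) + pulp.lpSum(spike_used.values())

# assumed constraint: each airing is attributed to at most one spike
for a in range(len(airing_times)):
    prob += pulp.lpSum(v for (ai, s), v in attribute.items() if ai == a) <= 1

# a spike only counts as attributed if at least one airing maps to it
for s in range(len(spike_times)):
    prob += spike_used[s] <= pulp.lpSum(v for (a, si), v in attribute.items() if si == s)

prob.solve()
matches = [(a, s) for (a, s), v in attribute.items() if v.value() == 1]
```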