
Unified Attribution Model for impressions and clicks

A unified approach in marketing attribution for impressions and clicks

Marketing attribution is a crucial topic for companies that need to allocate budget among different marketing channels. Every day, billions of dollars are moved between the marketing channels that companies use to advertise their products or services, with the aim of increasing sales and/or marketing efficiency.

When customer journeys are available only for some marketing channels, or are not available at all, the Media-mix model (MMM) seems to be the only methodology available to perform marketing attribution. But if the aggregated number of touchpoints at fixed time intervals is available for all the channels involved, the Unified Attribution Model (UAM) offers a new alternative.

1. Attribution methodologies

The first statistical methodology introduced for marketing attribution is the Marketing-mix or Media-mix model, which was introduced at the end of the 1940s. It is a regression model that takes as input the marketing spending over time of each channel involved in the marketing activity of a company and a target variable represented by the number of sales or by the revenue. Other variables, not related to media or representing external factors, can be introduced to improve the model’s accuracy. For an introduction to the Media-mix model, you can refer to this article.

With the advent of browser cookies, companies started to track users, and a new class of attribution methodologies was introduced at the beginning of the 2010s. The attention moved towards methodologies that could take advantage of customer journeys. The era of Multi-touch attribution began, and logistic regression, the Shapley value, and Markov models started to be applied to perform attribution of digital channels. For an introduction to Multi-touch attribution, you can refer to this article.

In recent years, due to privacy restrictions, some browsers started to block third-party cookies. By the end of 2023, Google Chrome, which controls 60% of the global web browser market, will block third-party cookies too. The end of third-party cookies could represent the end of the availability of the rich information given by customer journeys. Fortunately, companies have started to move to first-party cookie systems to continue to collect information on users’ journeys. This means that it will not be the death of multi-touch attribution methodologies, as some have prophesied, but cookie restrictions will inevitably increase the need for methodologies that can deal with aggregated time series data. Until now, the Media-mix model (MMM) has been the predominant methodology used to model this type of data.

2. Introduction to UAM

Unlike MMM, which fits a regression model between the target variable (amount sold or number of conversions) and the independent variables (marketing spending on each channel, seasonal effects, etc.), UAM uses impressions and clicks to evaluate the contribution of each marketing channel to the observed incremental number of conversions through a reward model inspired by the Shapley value.

It's important to note that UAM requires touchpoints (impressions and/or clicks) for each channel involved, in either aggregated or customer journey form. Thus, UAM can be applied to digital or traditional channels that can be expressed as an aggregated number of clicks or impressions at fixed time intervals or as customer journeys.

UAM can be used to perform attribution when:

  • no customer journeys are available but only the aggregated number of touchpoints for each channel at fixed time intervals

  • customer journeys are available for some channels while an aggregated number of touchpoints is available for the other channels

3. UAM when no customer journeys are available

If customer journeys are not available, then the two inputs we need to run a UAM are:

  • the aggregated number of conversions and the observed traffic on each marketing channel for each time interval:

| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | 190 | 110 | 1210 | 840 | 255 |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 120 | 160 | 1100 | 820 | 224 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 150 | 150 | 1345 | 660 | 220 |
| ... | ... | ... | ... | ... | ... | ... |

In the previous table we have four channels (A, B, C and D), and for each of them the number of touchpoints is stored at fixed time intervals.

  • the time series of the observed click-through rates for each channel whose traffic is expressed as a number of clicks:

| timestamp_from | timestamp_to | A | B | C | D |
|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | 0.15 | NA | NA | 0.12 |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 0.17 | NA | NA | 0.11 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 0.21 | NA | NA | 0.10 |

From the table above we infer that channels A and D are expressed as a number of clicks, since their click-through rates are not missing, while B and C are expressed as a number of impressions, since all their click-through rates are missing (NA). UAM is therefore able to mix clicks and impressions.

Using the previous two inputs, the first operation that UAM performs is converting the number of clicks into the number of impressions (clicks divided by the click-through rate), so that every channel is expressed in the same unit of measure (which we will call the number of touchpoints).

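| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | 19 | 110 / 0.15 | 1210 | 840 | 255 / 0.12 |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 12 | 160 / 0.17 | 1100 | 820 | 224 / 0.11 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 15 | 150 / 0.21 | 1345 | 660 | 220 / 0.10 |
| ... | ... | ... | ... | ... | ... | ... |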

and we get for each channel and each timestamp the number of touchpoints:

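| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | 19 | 733 | 1210 | 840 | 2125 |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 12 | 941 | 1100 | 820 | 2036 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 15 | 714 | 1345 | 660 | 2200 |
| ... | ... | ... | ... | ... | ... | ... |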
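The conversion step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `traffic`, `ctr` and `to_touchpoints` are hypothetical, `None` stands in for NA, and rounding to the nearest integer is assumed because it matches the values in the table above.

```python
# Observed traffic for the first three days of the example.
# A and D are reported as clicks; B and C are reported as impressions.
traffic = [
    {"A": 110, "B": 1210, "C": 840, "D": 255},
    {"A": 160, "B": 1100, "C": 820, "D": 224},
    {"A": 150, "B": 1345, "C": 660, "D": 220},
]

# Click-through rates; None plays the role of NA (channel already in impressions).
ctr = [
    {"A": 0.15, "B": None, "C": None, "D": 0.12},
    {"A": 0.17, "B": None, "C": None, "D": 0.11},
    {"A": 0.21, "B": None, "C": None, "D": 0.10},
]


def to_touchpoints(traffic_row, ctr_row):
    """Turn clicks into impressions (clicks / CTR); keep impressions unchanged."""
    converted = {}
    for channel, value in traffic_row.items():
        rate = ctr_row[channel]
        # Assumption: round to the nearest integer, as the table values suggest.
        converted[channel] = value if rate is None else round(value / rate)
    return converted


touchpoints = [to_touchpoints(t, c) for t, c in zip(traffic, ctr)]
for row in touchpoints:
    print(row)
```

Channels with a missing click-through rate pass through untouched, which is what allows clicks and impressions to be mixed in the same table.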

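Then a table of first differences (the value at each time interval minus the value at the previous one) is built:

| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | - | - | - | - | - |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | -7 | +208 | -110 | -20 | -89 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | +3 | -227 | +245 | -160 | +164 |
| ... | ... | ... | ... | ... | ... | ... |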
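As a minimal sketch, the first differences can be computed as follows. The variable and function names are illustrative assumptions; the numbers are the conversions and touchpoints from the touchpoint table above.

```python
# Conversions and touchpoints per day, taken from the touchpoint table.
conversions = [19, 12, 15]
touchpoints = {
    "A": [733, 941, 714],
    "B": [1210, 1100, 1345],
    "C": [840, 820, 660],
    "D": [2125, 2036, 2200],
}


def first_difference(series):
    """Return series[t] - series[t-1] for every t after the first."""
    return [current - previous for previous, current in zip(series, series[1:])]


delta_conversions = first_difference(conversions)
delta_touchpoints = {k: first_difference(v) for k, v in touchpoints.items()}

print(delta_conversions)
print(delta_touchpoints)
```

The first time interval has no predecessor, so the differenced series is one element shorter than the original, which corresponds to the row of dashes in the table.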

The table above is the input for the reward model. The idea behind the model is simple: at each time instant we look at how much the conversions increased (decreased) and give a positive reward only to the channels whose number of touchpoints also increased (decreased). We give a null reward when the number of touchpoints increases (decreases) while the number of conversions decreases (increases).

First of all, for each row, we give null rewards by comparing the sign of the value in the column 'conversions' with the sign of each value in the other columns. We set to zero all the values for which the signs do not match:

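| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | - | - | - | - | - |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | -7 | 0 | -110 | -20 | -89 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | +3 | 0 | +245 | 0 | +164 |
| ... | ... | ... | ... | ... | ... | ... |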

Now we take the absolute value for all the values:

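| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | - | - | - | - | - |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 7 | 0 | 110 | 20 | 89 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 3 | 0 | 245 | 0 | 164 |
| ... | ... | ... | ... | ... | ... | ... |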
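The sign check and the absolute value can be combined in one small helper. The sketch below is an assumption about how this step might be coded, not the article's implementation; in particular, treating a zero change as "no match" is a choice made here for illustration.

```python
# First differences from the example (rows for 2019-01-02 and 2019-01-03).
delta_conversions = [-7, 3]
delta_touchpoints = {
    "A": [208, -227],
    "B": [-110, 245],
    "C": [-20, -160],
    "D": [-89, 164],
}


def same_sign(x, y):
    """True when both values are strictly positive or both strictly negative."""
    return (x > 0 and y > 0) or (x < 0 and y < 0)


def mask_and_abs(conv_deltas, touch_deltas):
    """Zero out changes that disagree in sign with conversions, then take abs."""
    return [abs(d) if same_sign(c, d) else 0 for c, d in zip(conv_deltas, touch_deltas)]


abs_delta_conversions = [abs(c) for c in delta_conversions]
abs_delta_touchpoints = {
    k: mask_and_abs(delta_conversions, v) for k, v in delta_touchpoints.items()
}

print(abs_delta_conversions)
print(abs_delta_touchpoints)
```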

Now we calculate the rewards for each time instant and for each channel. Let $t$ be a generic time instant and $k$ a generic channel. Let $\text{number\_touchpoints}_{t,k}$ be the observed number of touchpoints for channel $k$ at time $t$, and let $\text{number\_conversions}_{t}$ be the observed number of conversions at time $t$. The reward function is the following:

$$\text{reward}_{t,k} = \min \left( \text{abs\_delta\_number\_conversions}_{t}, \frac{\text{abs\_delta\_number\_touchpoints}_{t,k}}{\text{avg\_number\_touchpoints\_for\_one\_conversion}} \right) \quad t=1,\dots,T \quad k=1,\dots,K$$

where

$$\text{abs\_delta\_number\_conversions}_{t}=\text{abs} (\text{number\_conversions}_{t}-\text{number\_conversions}_{t-1}), \quad \forall t$$

$$\text{abs\_delta\_number\_touchpoints}_{t,k}=\text{abs} (\text{number\_touchpoints}_{t,k}-\text{number\_touchpoints}_{t-1,k}), \quad \forall t, \quad \forall k$$

$$\text{avg\_number\_touchpoints\_for\_one\_conversion} = \frac{\sum_{t,k} \text{number\_touchpoints}_{t,k}}{\sum_{t} \text{number\_conversions}_{t}}\times \frac{1}{ \text{number\_channels}}$$
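The last definition is a plain ratio and can be sketched as below. The function name is a hypothetical choice, and the input uses only the three rows shown in the touchpoint table, so the resulting value (about 80) is purely illustrative; the worked example continues with an assumed value for the full series.

```python
# Touchpoints and conversions for the three days shown in the example.
conversions = [19, 12, 15]
touchpoints = {
    "A": [733, 941, 714],
    "B": [1210, 1100, 1345],
    "C": [840, 820, 660],
    "D": [2125, 2036, 2200],
}


def avg_touchpoints_for_one_conversion(touchpoints, conversions):
    """Total touchpoints over total conversions, divided by the number of channels."""
    total_touchpoints = sum(sum(series) for series in touchpoints.values())
    total_conversions = sum(conversions)
    number_channels = len(touchpoints)
    return total_touchpoints / total_conversions / number_channels


avg = avg_touchpoints_for_one_conversion(touchpoints, conversions)
print(round(avg, 2))
```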

Suppose that we found:

$$\text{avg\_number\_touchpoints\_for\_one\_conversion}=10$$

then:

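| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | - | - | - | - | - |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 7 | min(7,0/10) | min(7,110/10) | min(7,20/10) | min(7,89/10) |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 3 | min(3,0/10) | min(3,245/10) | min(3,0/10) | min(3,164/10) |
| ... | ... | ... | ... | ... | ... | ... |

which gives:

| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | - | - | - | - | - |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 7 | 0 | 7 | 2 | 7 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 3 | 0 | 3 | 0 | 3 |
| ... | ... | ... | ... | ... | ... | ... |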

In the previous table, columns A, B, C, and D are the rewards assigned to each channel in the conversion process.
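A minimal sketch of the reward function applied to the example values follows. The names are illustrative, and the average of 10 touchpoints per conversion is the value assumed in the worked example.

```python
# Assumed in the worked example.
AVG_TOUCHPOINTS_FOR_ONE_CONVERSION = 10

# Absolute changes after the sign check (rows for 2019-01-02 and 2019-01-03).
abs_delta_conversions = [7, 3]
abs_delta_touchpoints = {
    "A": [0, 0],
    "B": [110, 245],
    "C": [20, 0],
    "D": [89, 164],
}


def reward(abs_delta_conv, abs_delta_touch, avg):
    """Cap the touchpoint-implied conversions at the observed conversion change."""
    return min(abs_delta_conv, abs_delta_touch / avg)


rewards = {
    channel: [
        reward(c, d, AVG_TOUCHPOINTS_FOR_ONE_CONVERSION)
        for c, d in zip(abs_delta_conversions, deltas)
    ]
    for channel, deltas in abs_delta_touchpoints.items()
}

for channel, values in rewards.items():
    print(channel, values)
```

The `min` caps each channel's reward at the observed change in conversions, so no single channel can be credited with more conversions than were gained or lost in that interval.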

Now we sum the rewards for each channel to obtain normalized weights. Suppose we found:

| channel | reward |
|---|---|
| A | 1,200 |
| B | 5,400 |
| C | 3,500 |
| D | 1,800 |
| TOTAL | 11,900 |

We calculate normalized weights:

| channel | normalized_weight |
|---|---|
| A | (1,200 / 11,900) = 0.10 |
| B | (5,400 / 11,900) = 0.45 |
| C | (3,500 / 11,900) = 0.30 |
| D | (1,800 / 11,900) = 0.15 |
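The normalization is a division of each channel total by the grand total, sketched below with hypothetical variable names. Note that the unrounded weight for C is about 0.294, which the table reports as 0.30.

```python
# Total reward per channel, summed over all time instants (from the example).
total_rewards = {"A": 1200, "B": 5400, "C": 3500, "D": 1800}

grand_total = sum(total_rewards.values())

# Each weight is the channel's share of the total reward.
normalized_weight = {
    channel: value / grand_total for channel, value in total_rewards.items()
}

for channel, weight in normalized_weight.items():
    print(channel, round(weight, 3))
```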

Then attribution is performed with the following formula:

$$\text{final\_attribution}_{t,k} = \frac{\text{attribution\_weight}_{t,k}}{\sum_{k}\text{attribution\_weight}_{t,k}}\times \text{number\_conversions}_{t} \quad \forall t \quad \forall k$$

where

$$\text{attribution\_weight}_{t,k} = \min ( \text{number\_conversions}_{t},\text{number\_touchpoints}_{t,k})\times \frac{\text{normalized\_weight}_{k}}{\sum_{t} \text{number\_touchpoints}_{t,k}} \quad \forall t \quad \forall k$$

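Now suppose that the total number of touchpoints per channel over the whole period, $\sum_{t} \text{number\_touchpoints}_{t,k}$, is:

| channel | number_touchpoints |
|---|---|
| A | 1,200 |
| B | 14,350 |
| C | 11,256 |
| D | 3,450 |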

then attribution weights are:

| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | 190 | min(190,110) x (0.10/1200) | min(190,1210) x (0.45/14350) | min(190,840) x (0.30/11256) | min(190,255) x (0.15/3450) |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 120 | min(120,160) x (0.10/1200) | min(120,1100) x (0.45/14350) | min(120,820) x (0.30/11256) | min(120,224) x (0.15/3450) |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 150 | min(150,150) x (0.10/1200) | min(150,1345) x (0.45/14350) | min(150,660) x (0.30/11256) | min(150,220) x (0.15/3450) |
| ... | ... | ... | ... | ... | ... | ... |

and final attribution is:

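| timestamp_from | timestamp_to | conversions | A | B | C | D |
|---|---|---|---|---|---|---|
| 2019-01-01 00:00:00 | 2019-01-01 23:59:59 | 190 | 61.22 | 39.79 | 33.81 | 55.17 |
| 2019-01-02 00:00:00 | 2019-01-02 23:59:59 | 120 | 54.11 | 20.36 | 17.30 | 28.23 |
| 2019-01-03 00:00:00 | 2019-01-03 23:59:59 | 150 | 67.63 | 25.45 | 21.63 | 35.28 |
| ... | ... | ... | ... | ... | ... | ... |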
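The two attribution formulas can be sketched as follows. The function and variable names are illustrative assumptions; the inputs are the values used in the attribution-weight table, and the output agrees with the final attribution table up to rounding in the second decimal.

```python
# Inputs from the worked example.
conversions = [190, 120, 150]
traffic = [
    {"A": 110, "B": 1210, "C": 840, "D": 255},
    {"A": 160, "B": 1100, "C": 820, "D": 224},
    {"A": 150, "B": 1345, "C": 660, "D": 220},
]
normalized_weight = {"A": 0.10, "B": 0.45, "C": 0.30, "D": 0.15}
total_touchpoints = {"A": 1200, "B": 14350, "C": 11256, "D": 3450}


def attribute(number_conversions, traffic_row):
    """Split the conversions of one time interval across channels."""
    attribution_weight = {
        k: min(number_conversions, v) * normalized_weight[k] / total_touchpoints[k]
        for k, v in traffic_row.items()
    }
    weight_sum = sum(attribution_weight.values())
    # Rescale so that the attributed conversions add up to the observed ones.
    return {
        k: w / weight_sum * number_conversions
        for k, w in attribution_weight.items()
    }


final_attribution = [attribute(c, row) for c, row in zip(conversions, traffic)]
for row in final_attribution:
    print({k: round(v, 2) for k, v in row.items()})
```

Because each row is rescaled by the sum of its attribution weights, the attributed conversions in a row always add up to the observed conversions for that interval.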

4. UAM when customer journeys are available only for some channels

When customer journeys are available for some channels while an aggregated number of clicks or impressions are available for the other channels, UAM performs attribution by mixing the results from the reward model executed on the aggregated data with the results of a Markov model executed on the customer journey data.

Remember that in the previous section we obtained the following weights from the reward model:

| channel | weight |
|---|---|
| A | 0.10 |
| B | 0.45 |
| C | 0.30 |
| D | 0.15 |

Now suppose that for channels A and B we have customer journeys:

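| id_path | timestamp | channel |
|---|---|---|
| 0 | 2019-01-01 00:19:05 | A |
| 0 | 2019-01-01 00:29:18 | B |
| 1 | 2019-01-01 00:39:20 | A |
| 1 | 2019-01-01 00:44:37 | A |
| 1 | 2019-01-01 00:49:34 | ((CONV)) |
| 2 | 2019-01-01 00:19:31 | B |
| 2 | 2019-01-01 00:24:38 | B |
| 2 | 2019-01-01 00:29:44 | A |
| 2 | 2019-01-01 00:31:08 | B |
| ... | ... | ... |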

Using the previous customer journeys we can run a Markov model and return for each channel a conversion rate:

| channel | conversion_rate |
|---|---|
| A | 0.10 |
| B | 0.05 |

Using the conversion rates in the previous table we can adjust weights for channels A and B in the reward model:

| channel | weight |
|---|---|
| A | (0.10 + 0.45) x ((0.10 x 0.10) / ((0.10 x 0.10) + (0.45 x 0.05))) |
| B | (0.10 + 0.45) x ((0.45 x 0.05) / ((0.10 x 0.10) + (0.45 x 0.05))) |
| C | 0.30 |
| D | 0.15 |

which gives:

| channel | normalized_weight |
|---|---|
| A | 0.17 |
| B | 0.38 |
| C | 0.30 |
| D | 0.15 |
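The adjustment redistributes the combined weight of the channels with customer journeys in proportion to weight times conversion rate, and leaves the other channels unchanged. A minimal sketch, assuming the hypothetical helper name `adjust_weights` and taking the Markov-model conversion rates as given:

```python
# Weights from the reward model and conversion rates from the Markov model.
reward_weights = {"A": 0.10, "B": 0.45, "C": 0.30, "D": 0.15}
conversion_rates = {"A": 0.10, "B": 0.05}  # only channels with customer journeys


def adjust_weights(weights, rates):
    """Re-split the pooled weight of journey channels by weight x conversion rate."""
    pooled_weight = sum(weights[k] for k in rates)
    scores = {k: weights[k] * rates[k] for k in rates}
    total_score = sum(scores.values())
    adjusted = dict(weights)  # channels without journeys keep their weight
    for k in rates:
        adjusted[k] = pooled_weight * scores[k] / total_score
    return adjusted


adjusted_weights = adjust_weights(reward_weights, conversion_rates)
for channel, weight in adjusted_weights.items():
    print(channel, round(weight, 2))
```

Since only the pooled weight of A and B is redistributed, the adjusted weights still sum to 1 without further normalization.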

Finally, we can use the normalized weights to perform attribution as in the previous section.

5. Differences between MMM and UAM

| Frequentist MMM | Bayesian MMM | UAM |
|---|---|---|
| Parametric approach based on a linear model. | Parametric approach based on a Bayesian linear model. | Non-parametric approach inspired by the Shapley value. |
| Requires long time series. | Requires long time series. | Works well with short time series. |
| Small channels are penalized. The estimated effects of small channels are 0. | The estimation of small channels' coefficients benefits from prior distributions. | The estimated effects of small channels benefit from the implicit assumption that, before observing data, every channel has the same effect on conversions. It is as if UAM had a prior distribution assumption. |
| Feasible only for a few channels. If the number of channels is high, a very long time series is required. | Feasible only for a few channels. If the number of channels is high, the Bayesian approach is slow and requires a very long time series. | Feasible for a high number of channels. |
| It needs a lot of time to be implemented and fine-tuned. | It needs a lot of time to be implemented and fine-tuned. It also requires subjective assumptions on the choice of the prior distributions. | Automatic approach. |