Unified Attribution Model for impressions and clicks

A unified approach in marketing attribution for impressions and clicks

Marketing attribution is a crucial topic for companies that need to allocate budget among different marketing channels. Every day, billions of dollars are moved between the marketing channels that companies use to advertise their products or services, with the aim of increasing sales and/or marketing efficiency.

When customer journeys are available only for some marketing channels, or are not available at all, the Media-mix model (MMM) seems to be the only methodology available to perform marketing attribution. But if the aggregated number of touchpoints at fixed time intervals is available for all the channels involved, the Unified Attribution Model (UAM) offers a new alternative.

1. Attribution methodologies

The first statistical methodology introduced for marketing attribution is the Marketing-mix or Media-mix model, which dates back to the end of the 1940s. It is a regression model that takes as input the marketing spending over time of each channel involved in a company's marketing activity, and a target variable represented by the number of sales or by the revenue. Other variables, not related to media or capturing external factors, can be added to improve the model’s accuracy. For an introduction to the Media-mix model, you can refer to this article.

With the advent of browser cookies, companies started to track users, and a new class of attribution methodologies was introduced at the beginning of the 2010s. The attention moved towards methodologies that could take advantage of customer journeys. The era of Multi-touch attribution began, and logistic regression, Shapley value, and Markov models started to be applied to the attribution of digital channels. For an introduction to Multi-touch attribution, you can refer to this article.

In recent years, due to privacy restrictions, some browsers started to block third-party cookies. By the end of 2023, Google Chrome, which controls 60% of the global web browser market, will block third-party cookies too. The end of third-party cookies could represent the end of the availability of the rich information given by customer journeys. Fortunately, companies have started to move to first-party cookie systems to continue collecting information on users’ journeys. This means it will not be the death of multi-touch attribution methodologies, as some have prophesied, but cookie restrictions will inevitably increase the need for methodologies that can deal with aggregated time series data. Until now, the Media-mix model (MMM) has been the predominant methodology used to model this type of data.

2. Introduction to UAM

Unlike MMM, which fits a regression model between the target variable (amount sold or number of conversions) and the independent variables (marketing spending on each channel, seasonal effects, etc.), UAM uses impressions and clicks to evaluate the contribution of each marketing channel to the observed incremental number of conversions, through a reward model inspired by Shapley value.

It's important to note that UAM requires touchpoints (impressions and/or clicks) for each channel involved, in aggregated or customer journey form. Thus, UAM can be applied to digital or traditional channels that can be expressed as an aggregated number of clicks or impressions at fixed time intervals, or as customer journeys.

UAM can be used to perform attribution when:

no customer journeys are available, but only the aggregated number of touchpoints for each channel at fixed time intervals;
customer journeys are available for some channels, while an aggregated number of touchpoints is available for the other channels.

3. UAM when no customer journeys are available

If customer journeys are not available, then the two inputs we need to run a UAM are:

the aggregated number of conversions and the observed traffic on each marketing channel at each time interval:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	190	110	1210	840	255
2019-01-02 00:00:00	2019-01-02 23:59:59	120	160	1100	820	224
2019-01-03 00:00:00	2019-01-03 23:59:59	150	150	1345	660	220
...	...	...	...	...	...	...

The previous table contains four channels (A, B, C, D), and for each of them the number of touchpoints is stored at fixed time intervals.

the time series of the observed click-through rates for each channel that is expressed as a number of clicks:

timestamp_from	timestamp_to	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	0.15	NA	NA	0.12
2019-01-02 00:00:00	2019-01-02 23:59:59	0.17	NA	NA	0.11
2019-01-03 00:00:00	2019-01-03 23:59:59	0.21	NA	NA	0.10

From the table above we infer that channels A and D are expressed as a number of clicks, since their click-through rates are not missing, while B and C are expressed as a number of impressions, since all their click-through rates are missing (NA). UAM is thus able to mix clicks and impressions.

Using the previous two inputs, the first operation that UAM performs is converting the number of clicks into the number of impressions (clicks divided by click-through rate), so that every channel is expressed in the same unit of measure (which we will call the number of touchpoints).

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	19	110 / 0.15	1210	840	255 / 0.12
2019-01-02 00:00:00	2019-01-02 23:59:59	12	160 / 0.17	1100	820	224 / 0.11
2019-01-03 00:00:00	2019-01-03 23:59:59	15	150 / 0.21	1345	660	220 / 0.10
...	...	...	...	...	...	...

and we get for each channel and each timestamp the number of touchpoints:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	19	733	1210	840	2125
2019-01-02 00:00:00	2019-01-02 23:59:59	12	941	1100	820	2036
2019-01-03 00:00:00	2019-01-03 23:59:59	15	714	1345	660	2200
...	...	...	...	...	...	...
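The conversion step can be sketched in a few lines of Python. This is only an illustration on the three sample rows shown above; the names `traffic`, `ctr`, and `to_touchpoints` are hypothetical, and `None` stands in for the NA entries of the click-through table.

```python
# Daily counts per channel, copied from the first input table.
# A and D are recorded as clicks; B and C are already impressions.
traffic = {
    "A": [110, 160, 150],
    "B": [1210, 1100, 1345],
    "C": [840, 820, 660],
    "D": [255, 224, 220],
}

# Click-through rates from the second input table.
# None plays the role of NA: the channel is already in impressions.
ctr = {
    "A": [0.15, 0.17, 0.21],
    "B": [None, None, None],
    "C": [None, None, None],
    "D": [0.12, 0.11, 0.10],
}


def to_touchpoints(traffic, ctr):
    """Divide clicks by the click-through rate; leave impressions untouched."""
    result = {}
    for channel, counts in traffic.items():
        result[channel] = [
            count if rate is None else round(count / rate)
            for count, rate in zip(counts, ctr[channel])
        ]
    return result


touchpoints = to_touchpoints(traffic, ctr)
for channel, values in touchpoints.items():
    print(channel, values)
```

Rounding to the nearest integer is a choice made here to match the whole numbers in the table; the text does not say how fractional impressions should be handled.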

Then a table of first differences (each value minus the value at the previous time interval) is built:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	-	-	-	-	-
2019-01-02 00:00:00	2019-01-02 23:59:59	-7	+208	-110	-20	-89
2019-01-03 00:00:00	2019-01-03 23:59:59	+3	-227	+245	-160	+164
...	...	...	...	...	...	...

The table above is the input for the reward model. The idea behind the model is simple: at each time instant we look at how much the conversions have increased (decreased) and give a positive reward only to the channels for which the number of touchpoints has also increased (decreased). We give a null reward when the number of touchpoints increases (decreases) while the number of conversions decreases (increases).

First of all, for each row, we give null rewards by comparing the sign of the value in the column 'conversions' with the sign of each value in the other columns. We set to zero all the values for which the signs do not match:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	-	-	-	-	-
2019-01-02 00:00:00	2019-01-02 23:59:59	-7	0	-110	-20	-89
2019-01-03 00:00:00	2019-01-03 23:59:59	+3	0	+245	0	+164
...	...	...	...	...	...	...

Now we take the absolute value for all the values:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	-	-	-	-	-
2019-01-02 00:00:00	2019-01-02 23:59:59	7	0	110	20	89
2019-01-03 00:00:00	2019-01-03 23:59:59	3	0	245	0	164
...	...	...	...	...	...	...
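The three preparation steps (first differences, sign matching, absolute values) can be sketched as follows. The sketch uses the sample rows from the tables above; the helper names are hypothetical, and treating a zero difference as a sign mismatch is an assumption, since the text does not cover that case.

```python
# Conversions and touchpoints as shown in the touchpoint table above.
conversions = [19, 12, 15]
touchpoints = {
    "A": [733, 941, 714],
    "B": [1210, 1100, 1345],
    "C": [840, 820, 660],
    "D": [2125, 2036, 2200],
}


def first_difference(series):
    """Return x[t] - x[t-1] for every t after the first."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def same_sign(a, b):
    """True when both values are strictly positive or strictly negative.

    Assumption: a zero difference never matches, so it earns no reward.
    """
    return (a > 0 and b > 0) or (a < 0 and b < 0)


delta_conversions = first_difference(conversions)
delta_touchpoints = {k: first_difference(v) for k, v in touchpoints.items()}

# Zero out touchpoint changes whose sign disagrees with the change in
# conversions, then keep absolute values only.
abs_delta_conversions = [abs(c) for c in delta_conversions]
abs_delta_touchpoints = {
    k: [abs(d) if same_sign(d, c) else 0 for d, c in zip(deltas, delta_conversions)]
    for k, deltas in delta_touchpoints.items()
}

print(abs_delta_conversions)
print(abs_delta_touchpoints)
```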

Now we calculate the rewards for each time instant and for each channel. Let $t$ be a generic time instant and $k$ a generic channel. Let $\text{number\_touchpoints}_{t,k}$ be the observed number of touchpoints for channel $k$ at time $t$, and let $\text{number\_conversions}_{t}$ be the observed number of conversions at time $t$. The reward function is the following:

$\text{reward}_{t,k} = \min ( \text{abs\_delta\_number\_conversions}_{t}, \frac{\text{abs\_delta\_number\_touchpoints}_{t,k}}{\text{avg\_number\_touchpoints\_for\_one\_conversion}}) \quad t=1,\dots,T \quad k=1,\dots,K$

where

$\text{abs\_delta\_number\_conversions}_{t}=\text{abs} (\text{number\_conversions}_{t}-\text{number\_conversions}_{t-1}), \quad \forall t$

$\text{abs\_delta\_number\_touchpoints}_{t,k}=\text{abs} (\text{number\_touchpoints}_{t,k}-\text{number\_touchpoints}_{t-1,k}), \quad \forall t, \quad \forall k$

$\text{avg\_number\_touchpoints\_for\_one\_conversion} = \frac{\sum_{t,k} \text{number\_touchpoints}_{t,k}}{\sum_{t} \text{number\_conversions}_{t}}\times \frac{1}{ \text{number\_channels}}$

Suppose that we found:

$\text{avg\_number\_touchpoints\_for\_one\_conversion}=10$

then:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	-	-	-	-	-
2019-01-02 00:00:00	2019-01-02 23:59:59	7	min(7,0/10)	min(7,110/10)	min(7,20/10)	min(7,89/10)
2019-01-03 00:00:00	2019-01-03 23:59:59	3	min(3,0/10)	min(3,245/10)	min(3,0/10)	min(3,164/10)
...	...	...	...	...	...	...

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	-	-	-	-	-
2019-01-02 00:00:00	2019-01-02 23:59:59	7	0	7	2	7
2019-01-03 00:00:00	2019-01-03 23:59:59	3	0	3	0	3
...	...	...	...	...	...	...

In the previous table, columns A, B, C, and D are the rewards assigned to each channel in the conversion process.
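As a rough sketch, the reward function can be written in Python as below. The function names are hypothetical; the inputs are the absolute differences from the tables above, and the average number of touchpoints per conversion is set to the value 10 that the worked example supposes.

```python
def avg_touchpoints_for_one_conversion(touchpoints, conversions):
    """Total touchpoints over total conversions, divided by the number of channels."""
    total_touchpoints = sum(sum(values) for values in touchpoints.values())
    return total_touchpoints / sum(conversions) / len(touchpoints)


def rewards(abs_delta_conversions, abs_delta_touchpoints, avg_per_conversion):
    """reward[t][k] = min(|delta conversions|, |delta touchpoints| / average)."""
    return {
        channel: [
            min(conv, delta / avg_per_conversion)
            for conv, delta in zip(abs_delta_conversions, deltas)
        ]
        for channel, deltas in abs_delta_touchpoints.items()
    }


# Absolute differences after sign matching, taken from the tables above.
abs_delta_conversions = [7, 3]
abs_delta_touchpoints = {
    "A": [0, 0],
    "B": [110, 245],
    "C": [20, 0],
    "D": [89, 164],
}

# The worked example supposes this value is 10, so it is used directly
# instead of being recomputed from the three sample rows.
reward_table = rewards(abs_delta_conversions, abs_delta_touchpoints, 10)
for channel, values in reward_table.items():
    print(channel, values)
```

The `min` caps each channel's reward at the observed change in conversions, so a channel cannot be credited with more conversions than actually moved at that time instant.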

Now we sum the rewards over all time instants for each channel. Suppose we found:

channel	reward
A	1,200
B	5,400
C	3,500
D	1,800

TOTAL	11,900

We calculate normalized weights:

channel	normalized_weight
A	(1,200 / 11,900) = 0.10
B	(5,400 / 11,900) = 0.45
C	(3,500 / 11,900) = 0.30
D	(1,800 / 11,900) = 0.15
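The normalization is a plain division by the total. A minimal sketch, with hypothetical names and the reward totals supposed in the table above:

```python
# Total reward per channel, as supposed in the table above.
total_rewards = {"A": 1200, "B": 5400, "C": 3500, "D": 1800}


def normalize(rewards_by_channel):
    """Scale the rewards so that they sum to 1 across channels."""
    total = sum(rewards_by_channel.values())
    return {channel: value / total for channel, value in rewards_by_channel.items()}


normalized_weight = normalize(total_rewards)
for channel, weight in normalized_weight.items():
    print(channel, round(weight, 3))
```

The table rounds to two decimals; unrounded, C comes out at about 0.294.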

Then attribution is performed with the following formula:

$\text{final\_attribution}_{t,k} = \frac{\text{attribution\_weight}_{t,k}}{\sum_{k}\text{attribution\_weight}_{t,k}}\times \text{number\_conversions}_{t} \quad \forall t \quad \forall k$

where

$\text{attribution\_weight}_{t,k} = \min ( \text{number\_conversions}_{t},\text{number\_touchpoints}_{t,k})\times \frac{\text{normalized\_weight}_{k}}{\sum_{t} \text{number\_touchpoints}_{t,k}} \quad \forall t \quad \forall k$

Now suppose that the total number of touchpoints of each channel over the whole period is:

channel	number_touchpoints
A	1,200
B	14,350
C	11,256
D	3,450

then attribution weights are:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	190	min(190,110) x (0.10/1200)	min(190,1210) x (0.45/14350)	min(190,840) x (0.30/11256)	min(190,255) x (0.15/3450)
2019-01-02 00:00:00	2019-01-02 23:59:59	120	min(120,160) x (0.10/1200)	min(120,1100) x (0.45/14350)	min(120,820) x (0.30/11256)	min(120,224) x (0.15/3450)
2019-01-03 00:00:00	2019-01-03 23:59:59	150	min(150,150) x (0.10/1200)	min(150,1345) x (0.45/14350)	min(150,660) x (0.30/11256)	min(150,220) x (0.15/3450)
...	...	...	...	...	...	...

and final attribution is:

timestamp_from	timestamp_to	conversions	A	B	C	D
2019-01-01 00:00:00	2019-01-01 23:59:59	190	61.22	39.79	33.81	55.17
2019-01-02 00:00:00	2019-01-02 23:59:59	120	54.11	20.36	17.30	28.23
2019-01-03 00:00:00	2019-01-03 23:59:59	150	67.63	25.45	21.63	35.28
...	...	...	...	...	...	...
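The attribution step can be reproduced with a short Python sketch. All names are hypothetical. The inputs are the ones used in the attribution-weight table above: the conversions and channel values of the first input table, the rounded normalized weights, and the supposed touchpoint totals.

```python
conversions = [190, 120, 150]
channel_values = {
    "A": [110, 160, 150],
    "B": [1210, 1100, 1345],
    "C": [840, 820, 660],
    "D": [255, 224, 220],
}
normalized_weight = {"A": 0.10, "B": 0.45, "C": 0.30, "D": 0.15}
total_touchpoints = {"A": 1200, "B": 14350, "C": 11256, "D": 3450}


def final_attribution(conversions, channel_values, normalized_weight, total_touchpoints):
    """Split the conversions of each time instant across channels."""
    result = {channel: [] for channel in channel_values}
    for t, conv in enumerate(conversions):
        # attribution_weight[t][k] = min(conversions, touchpoints) * weight / total
        weights = {
            channel: min(conv, values[t])
            * normalized_weight[channel]
            / total_touchpoints[channel]
            for channel, values in channel_values.items()
        }
        weight_sum = sum(weights.values())
        for channel, weight in weights.items():
            result[channel].append(weight / weight_sum * conv)
    return result


attribution = final_attribution(
    conversions, channel_values, normalized_weight, total_touchpoints
)
for channel, values in attribution.items():
    print(channel, [round(v, 2) for v in values])
```

Because the weights are rescaled within each row, the attributed conversions of every time instant add up to the observed number of conversions.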

4. UAM when customer journeys are available only for some channels

When customer journeys are available for some channels while an aggregated number of clicks or impressions are available for the other channels, UAM performs attribution by mixing the results from the reward model executed on the aggregated data with the results of a Markov model executed on the customer journey data.

Remember that in the previous section we obtained the following weights from the reward model:

channel	weight
A	0.10
B	0.45
C	0.30
D	0.15

Now suppose that for channels A and B we have customer journeys:

id_path	timestamp	channel
0	2019-01-01 00:19:05	A
0	2019-01-01 00:29:18	B
1	2019-01-01 00:39:20	A
1	2019-01-01 00:44:37	A
1	2019-01-01 00:49:34	((CONV))
2	2019-01-01 00:19:31	B
2	2019-01-01 00:24:38	B
2	2019-01-01 00:29:44	A
2	2019-01-01 00:31:08	B
...	...	...

Using the previous customer journeys we can run a Markov model and return for each channel a conversion rate:

channel	conversion_rate
A	0.10
B	0.05

Using the conversion rates in the previous table we can adjust weights for channels A and B in the reward model:

channel	weight
A	(0.10 + 0.45) x ((0.10 x 0.10) / ((0.10 x 0.10) + (0.45 x 0.05)))
B	(0.10 + 0.45) x ((0.45 x 0.05) / ((0.10 x 0.10) + (0.45 x 0.05)))
C	0.30
D	0.15

channel	normalized_weight
A	0.17
B	0.38
C	0.30
D	0.15
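The adjustment shown in the two tables above can be sketched as a small function. The name `adjust_weights` is hypothetical. The idea, as read from the table, is that the channels with customer journeys keep their combined reward weight (0.10 + 0.45) and share it in proportion to reward weight times Markov conversion rate, while the other channels are left unchanged.

```python
reward_weight = {"A": 0.10, "B": 0.45, "C": 0.30, "D": 0.15}
# Conversion rates returned by the Markov model for the journey channels.
conversion_rate = {"A": 0.10, "B": 0.05}


def adjust_weights(reward_weight, conversion_rate):
    """Redistribute the weight of the journey channels among themselves."""
    journey_channels = list(conversion_rate)
    # Combined weight to redistribute, e.g. 0.10 + 0.45.
    mass = sum(reward_weight[k] for k in journey_channels)
    # Each journey channel is scored by weight times conversion rate.
    scores = {k: reward_weight[k] * conversion_rate[k] for k in journey_channels}
    score_sum = sum(scores.values())
    adjusted = dict(reward_weight)
    for k in journey_channels:
        adjusted[k] = mass * scores[k] / score_sum
    return adjusted


adjusted_weight = adjust_weights(reward_weight, conversion_rate)
for channel, weight in adjusted_weight.items():
    print(channel, round(weight, 2))
```

Since only the split between A and B changes, the adjusted weights still sum to 1 and need no further normalization.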

Finally, we can use the normalized weights to perform attribution as in the previous section.

5. Differences between MMM and UAM

Frequentist MMM	Bayesian MMM	UAM
Parametric approach based on Linear model.	Parametric approach based on Bayesian Linear model.	Non-parametric approach inspired by Shapley value.
Requires long time series.	Requires long time series.	Works well with short time series.
Small channels are penalized. The estimated effects of small channels are 0.	The coefficient estimates of small channels benefit from prior distributions.	The estimated effects of small channels benefit from the implicit assumption that, before observing data, every channel has the same effect on conversions. It is as if UAM had a prior distribution assumption.
Feasible only for a few channels. If the number of channels is high a very long time series is required.	Feasible only for a few channels. If the number of channels is high the Bayesian approach is slow and it requires a very long time series.	Feasible for a high number of channels.
It needs a lot of time to be implemented and fine-tuned.	It needs a lot of time to be implemented and fine-tuned. It also requires subjective assumptions on the choice of the prior distributions.	Automatic approach.