# Perform attribution with Unified Attribution Model (UAM)
## You will learn how to perform attribution with the UAM model in four use cases:
## Use case 1: perform attribution when no customer journeys are available
## Use case 2: perform attribution mixing customer journeys and aggregated touchpoints
## Use case 3: perform attribution of the revenue
## Use case 4: perform attribution when click-through rates are not available for one or more channels expressed as number of clicks
### Load ChannelAttribution Pro
import numpy as np
import pandas as pd
from ChannelAttributionPro import *
### Set your token
token="yourtoken"
## USE CASE 1 - Attribution when no customer journeys are available
### In the following use case, you will learn how to perform attribution on channels for which only aggregated touchpoints (clicks and/or impressions) are available.
#### We have daily data on 6 channels (A, B, C, D, E, F) collected from 2019-01-01 to 2019-12-31.
#### A, D, E, F are measured in number of clicks while B and C are measured in number of impressions.
#### We load the data frame containing the time series of observed conversions and of observed touchpoints for each channel.
df_aggr = pd.read_csv("https://app.channelattribution.io/data/data_aggregated.csv",sep=";")
print(df_aggr)
#### We load the data frame including the time series of observed click-through rates for each channel expressed as number of clicks.
df_ctr = pd.read_csv("https://app.channelattribution.io/data/data_ctr.csv",sep=";")
print(df_ctr)
#### Columns B and C are NaN because channels B and C are measured in impressions, for which no click-through rate is needed. A channel measured in number of impressions must have its click-through rate column set to missing so that the algorithm can recognize it as an impression channel.
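#### The missing-value convention can be checked before running the model. The following is a minimal sketch on made-up numbers, not the tutorial data; `split_channels_by_ctr` is a hypothetical helper and is not part of ChannelAttributionPro.

```python
import numpy as np
import pandas as pd

# Toy click-through-rate table (made-up numbers, not the tutorial data).
# B is an impression channel, so its whole column is left missing.
toy_ctr = pd.DataFrame({
    "timestamp_from": ["2019-01-01", "2019-01-02"],
    "timestamp_to": ["2019-01-01", "2019-01-02"],
    "A": [0.02, 0.03],
    "B": [np.nan, np.nan],
})

def split_channels_by_ctr(df_ctr, id_cols=("timestamp_from", "timestamp_to")):
    """Hypothetical helper: a channel whose CTR column is entirely missing
    is read as an impression channel, every other channel as a click channel."""
    channels = [c for c in df_ctr.columns if c not in id_cols]
    impressions = [c for c in channels if df_ctr[c].isna().all()]
    clicks = [c for c in channels if c not in impressions]
    return clicks, impressions

clicks, impressions = split_channels_by_ctr(toy_ctr)
print("click channels:", clicks)
print("impression channels:", impressions)
```

#### Applied to `df_ctr`, the same helper should list B and C as impression channels if the file follows the convention described above.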
#### Now we are ready to run UAM on our data
res=uam(df_aggr=df_aggr,df_ctr=df_ctr,df_paths=None,verbose=1)
print(res['attribution'])
#### The output of the model is the attribution of the observed conversions across the available channels. So the sum A+B+C+D+E+F for each row is equal to the value in column "conversions" in the corresponding row.
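#### The row-sum property can be verified numerically. The sketch below uses a toy table with made-up numbers and assumes the attribution output has the columns "timestamp_from", "timestamp_to", "conversions" plus one column per channel; `rows_sum_to_conversions` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Toy attribution table (made-up numbers) shaped like the assumed output layout.
toy_attr = pd.DataFrame({
    "timestamp_from": ["2019-01-01", "2019-01-02"],
    "timestamp_to": ["2019-01-01", "2019-01-02"],
    "conversions": [10.0, 4.0],
    "A": [6.0, 1.5],
    "B": [4.0, 2.5],
})

def rows_sum_to_conversions(attr, id_cols=("timestamp_from", "timestamp_to", "conversions")):
    """Hypothetical helper: True when, in every row, the channel columns
    add up to the value in 'conversions' (up to floating-point tolerance)."""
    channels = [c for c in attr.columns if c not in id_cols]
    return bool(np.allclose(attr[channels].sum(axis=1), attr["conversions"]))

print(rows_sum_to_conversions(toy_attr))
```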
## USE CASE 2 - Attribution mixing customer journeys and aggregated touchpoints
### In the following use case, you will learn how to perform attribution when customer journeys are available for some channels while only aggregated touchpoints are available for the others.
#### We have aggregate data on 6 channels (A, B, C, D, E, F) collected from 2019-01-01 to 2019-12-31.
#### For channels C, D, E, F we know the customer journeys.
#### For channels A and B we only know the aggregated touchpoints for each day.
#### A, D, E, F are measured in number of clicks while B and C are measured in number of impressions.
#### We load the data frame including the time series of observed conversions and observed touchpoints for each channel.
df_aggr = pd.read_csv("https://app.channelattribution.io/data/data_aggregated.csv",sep=";")
print(df_aggr)
#### We load the data frame including the time series of observed click-through rates for each channel expressed as number of clicks.
df_ctr = pd.read_csv("https://app.channelattribution.io/data/data_ctr.csv",sep=";")
print(df_ctr)
#### Columns B and C are NaN because channels B and C are measured in impressions, for which no click-through rate is needed. A channel measured in number of impressions must have its click-through rate column set to missing so that the algorithm can recognize it as an impression channel.
#### Finally, we load the data frame including customer journeys. It includes 3 columns: "id_path" is an integer that uniquely identifies the customer journey, "timestamp" is the timestamp of the visited channel and "channel" is the visited channel. A channel equal to "((CONV))" indicates a conversion.
df_paths = pd.read_csv("https://app.channelattribution.io/data/data_paths.csv",sep=";")
print(df_paths)
#### Now we are ready to run UAM on our data.
res=uam(df_aggr=df_aggr,df_ctr=df_ctr,df_paths=df_paths,channel_conv_name="((CONV))",order=1,verbose=1)
print(res['attribution'])
#### The output of the model is the attribution of the observed conversions across the available channels. So the sum A+B+C+D+E+F for each row is equal to the value in column "conversions" in the corresponding row.
#### Now we will show how to return path-level attribution for the MTA part
df_paths_t=df_paths.copy()
df_paths_t['total_conversions']=0
df_paths_t.loc[df_paths_t['channel']=='((CONV))','total_conversions']=1
df_paths_t['total_conversions'] = df_paths_t.groupby('id_path')['total_conversions'].transform('sum')
df_paths_t=df_paths_t.loc[df_paths_t['channel']!='((CONV))']
df_paths_t = df_paths_t.groupby('id_path').agg({
    'channel': lambda x: ' > '.join(x),
    'total_conversions': 'first'
}).reset_index()
df_paths_t.rename(columns={'channel': 'path'},inplace=True)
res_path_attr=new_paths_attribution(Data=df_paths_t, var_path="path", var_conv="total_conversions", Dparams=res['parameters_mta'], var_value=None, row_sep=";", cha_sep=">", flg_write_nulls=1, flg_write_paths=0)
print(res_path_attr["attribution"])
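#### The reshaping from one row per touchpoint to one row per journey can be illustrated on a toy example. This is a sketch on made-up journeys, written with plain pandas only; the explicit sort by timestamp is an assumption added here, in case the rows of a journey are not already in visiting order.

```python
import pandas as pd

# Toy journeys (made-up): path 1 converts, path 2 does not.
toy_paths = pd.DataFrame({
    "id_path": [1, 1, 1, 2, 2],
    "timestamp": ["2019-01-01 10:00", "2019-01-01 11:00", "2019-01-01 12:00",
                  "2019-01-02 09:00", "2019-01-02 09:30"],
    "channel": ["C", "D", "((CONV))", "E", "C"],
})

conv_name = "((CONV))"

# Assumption: sort so that channels are joined in visiting order.
toy_paths = toy_paths.sort_values(["id_path", "timestamp"])

# Count the conversion rows of each journey.
is_conv = toy_paths["channel"] == conv_name
n_conv = is_conv.groupby(toy_paths["id_path"]).sum().rename("total_conversions")

# Join the remaining channels of each journey into a single path string.
paths = (
    toy_paths[~is_conv]
    .groupby("id_path")["channel"]
    .agg(lambda s: " > ".join(s))
    .rename("path")
)

toy_paths_t = pd.concat([paths, n_conv], axis=1).reset_index()
print(toy_paths_t)
```

#### Each row of `toy_paths_t` now holds one journey as a path string together with its number of conversions, which is the layout passed to `new_paths_attribution` above through `var_path` and `var_conv`.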
## USE CASE 3 - Attribution of the revenue
### In the following use case, you will learn how to perform attribution of the revenue associated with the observed conversions.
#### We load the data frame containing the time series of observed conversions, of revenue (value) and of observed touchpoints for each channel.
df_aggr = pd.read_csv("https://app.channelattribution.io/data/data_aggregated_w_value.csv",sep=";")
print(df_aggr)
#### We load the data frame including the time series of observed click-through rates for each channel expressed as number of clicks.
df_ctr = pd.read_csv("https://app.channelattribution.io/data/data_ctr.csv",sep=";")
print(df_ctr)
#### (Optional) Then we load the data frame including customer journeys. It is not used in the call below, where df_paths is set to None.
df_paths = pd.read_csv("https://app.channelattribution.io/data/data_paths.csv",sep=";")
print(df_paths)
#### Now we are ready to run UAM on our data.
channels=[x for x in df_aggr.columns if x not in ['timestamp_from','timestamp_to','conversions','value']]
res=uam(df_aggr=df_aggr[['timestamp_from','timestamp_to','conversions']+channels],df_ctr=df_ctr,df_paths=None,channel_conv_name="((CONV))",order=1,nsim_start=1e5,max_step=None,ncore=1,nfold=10,seed=1234567,conv_par=0.05,rate_step_sim=1.5,verbose=1)
res=res["attribution"]
#### Finally, we perform attribution on revenue
res=pd.melt(res, id_vars=['timestamp_from','timestamp_to','conversions'], var_name='channel', value_name='attribution')
res=pd.merge(res,df_aggr[['timestamp_from','timestamp_to','value']],how='inner',on=['timestamp_from','timestamp_to'])
res['attribution_value']=res['value']*res['attribution']/res['conversions']
res=res.rename(columns={'attribution':'attribution_conversions'})
res=res[['timestamp_from','timestamp_to','conversions','value','channel','attribution_conversions','attribution_value']]
print(res)
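#### The revenue split above assigns to each channel the share of the period's value equal to its share of the period's conversions. The sketch below reproduces the formula on made-up numbers. Mapping periods with zero conversions to a zero share is an assumption added here to avoid a division by zero; it is not part of the original code.

```python
import numpy as np
import pandas as pd

# Toy long-format table (made-up numbers): two periods, two channels.
toy = pd.DataFrame({
    "timestamp_from": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
    "timestamp_to": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
    "channel": ["A", "B", "A", "B"],
    "conversions": [10.0, 10.0, 0.0, 0.0],
    "value": [500.0, 500.0, 0.0, 0.0],
    "attribution_conversions": [6.0, 4.0, 0.0, 0.0],
})

# Share of the period's conversions attributed to each channel.
# Assumption: a period with zero conversions gets share 0 instead of NaN.
share = (toy["attribution_conversions"] / toy["conversions"].replace(0, np.nan)).fillna(0.0)

# Revenue attributed to each channel = period revenue * conversion share.
toy["attribution_value"] = toy["value"] * share
print(toy[["timestamp_from", "channel", "attribution_conversions", "attribution_value"]])

# Within each period the attributed values add up to the period's revenue.
check = toy.groupby("timestamp_from")["attribution_value"].sum()
print(check)
```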
## USE CASE 4 - Attribution when click-through rates are not available for one or more channels expressed as number of clicks
#### We have aggregate data on 6 channels (A, B, C, D, E, F) collected from 2019-01-01 to 2019-12-31.
#### For channels C, D, E, F we know the customer journeys.
#### For channels A and B we only know the aggregated touchpoints for each day.
#### A, D, E, F are measured in number of clicks while B and C are measured in number of impressions.
#### For channel A, click-through rates are not available.
#### We load the data frame including the time series of the observed conversions and of observed touchpoints for each channel.
df_aggr = pd.read_csv("https://app.channelattribution.io/data/data_aggregated.csv",sep=";")
print(df_aggr)
#### First of all we perform attribution on channel A. We create two artificial channels: the total number of clicks and the total number of impressions.
df_aggr_1=df_aggr.copy()
df_aggr_1["total_impressions"]=df_aggr_1["B"]+df_aggr_1["C"]
df_aggr_1["total_clicks"]=df_aggr_1["A"]+df_aggr_1["D"]+df_aggr_1["E"]+df_aggr_1["F"]
df_aggr_1=df_aggr_1[['timestamp_from','timestamp_to','conversions','total_impressions','total_clicks']]
print(df_aggr_1)
#### We load the data frame with the click-through rates and delete the click-through rates for channel A, because we assume they are unknown.
df_ctr = pd.read_csv("https://app.channelattribution.io/data/data_ctr.csv",sep=";")
del df_ctr['A']
#### We estimate the click-through rate for "total_clicks" as the average of the click-through rates of channels D, E, F, weighted by their number of clicks. Then we set the click-through rates for "total_impressions" to NaN.
df_ctr_1=pd.merge(df_aggr[['timestamp_from','timestamp_to','D','E','F']],df_ctr[['timestamp_from','timestamp_to','D','E','F']],on=['timestamp_from','timestamp_to'],how='outer')
df_ctr_1['sum_x']=df_ctr_1['D_x']+df_ctr_1['E_x']+df_ctr_1['F_x']
df_ctr_1['total_clicks']=df_ctr_1['D_y']*df_ctr_1['D_x']/df_ctr_1['sum_x']+df_ctr_1['E_y']*df_ctr_1['E_x']/df_ctr_1['sum_x']+df_ctr_1['F_y']*df_ctr_1['F_x']/df_ctr_1['sum_x']
df_ctr_1=df_ctr_1[['timestamp_from','timestamp_to','total_clicks']]
df_ctr_1['total_impressions']=np.nan
df_ctr_1=df_ctr_1[['timestamp_from','timestamp_to','total_impressions','total_clicks']]
print(df_ctr_1)
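#### The weighted average used above can be worked through by hand on a single day. The numbers below are made up for illustration: each channel's click-through rate is weighted by that channel's share of the day's clicks.

```python
import pandas as pd

# One toy day (made-up numbers): clicks and click-through rates for D, E, F.
clicks = pd.DataFrame({"D": [100.0], "E": [300.0], "F": [600.0]})
ctr = pd.DataFrame({"D": [0.10], "E": [0.20], "F": [0.05]})

# Weight of each channel = its share of the day's total clicks (0.1, 0.3, 0.6).
weights = clicks.div(clicks.sum(axis=1), axis=0)

# Click-weighted average CTR: 0.10*0.1 + 0.20*0.3 + 0.05*0.6 = 0.10
ctr_total_clicks = (ctr * weights).sum(axis=1)
print(ctr_total_clicks)
```

#### Channel A does not enter the average because its click-through rates are assumed unknown; the estimate therefore implicitly treats A as behaving like the click-weighted mix of D, E and F.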
#### We perform attribution for "total_clicks" and "total_impressions".
res_attr_1=uam(df_aggr=df_aggr_1,df_ctr=df_ctr_1,df_paths=None,channel_conv_name="((CONV))",order=1,nsim_start=1e5,max_step=None,ncore=1,nfold=10,seed=1234567,conv_par=0.05,rate_step_sim=1.5,verbose=1)
res_attr_1=res_attr_1['attribution']
print(res_attr_1)
#### We load the data frame including customer journeys. It includes 3 columns: "id_path" is an integer that uniquely identifies the customer journey, "timestamp" is the timestamp of the visited channel and "channel" is the visited channel. A channel equal to "((CONV))" indicates a conversion.
df_paths = pd.read_csv("https://app.channelattribution.io/data/data_paths.csv",sep=";")
print(df_paths)
#### Now we perform attribution on channels expressed as number of clicks to find the attribution for channel A.
df_aggr_clicks=pd.merge(res_attr_1[['timestamp_from','timestamp_to','total_clicks']],df_aggr[["timestamp_from","timestamp_to","A","D","E","F"]],on=['timestamp_from','timestamp_to'],how='outer')
df_aggr_clicks=df_aggr_clicks.rename(columns={'total_clicks':'conversions'})
#### We set all the click-through rates to 1, since all the channels are expressed as number of clicks.
df_ctr_clicks=df_aggr_clicks[['timestamp_from','timestamp_to',"A","D","E","F"]].copy()
df_ctr_clicks[["A","D","E","F"]]=1
res_attr_clicks=uam(df_aggr=df_aggr_clicks,df_ctr=df_ctr_clicks,df_paths=df_paths,channel_conv_name="((CONV))",order=1,nsim_start=1e5,max_step=None,ncore=1,nfold=10,seed=1234567,conv_par=0.05,rate_step_sim=1.5,verbose=1)
res_attr_clicks=res_attr_clicks['attribution']
del res_attr_clicks['conversions']
res_attr_clicks=res_attr_clicks.rename(columns={'A':'A_conversions'})
print(res_attr_clicks)
#### "A_conversions" is the final attribution for channel A.
#### Now we perform attribution for all the other channels. We subtract the conversions attributed to channel A from the total conversions.
df_aggr_2=pd.merge(df_aggr,res_attr_clicks[["timestamp_from","timestamp_to","A_conversions"]],on=['timestamp_from','timestamp_to'],how='outer')
df_aggr_2['conversions']=df_aggr_2['conversions']-df_aggr_2['A_conversions']
df_aggr_2=df_aggr_2[['timestamp_from','timestamp_to','conversions','B','C','D','E','F']]
print(df_aggr_2)
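#### Because the merge above is an outer join, a period present in only one of the two tables would produce missing residual conversions. The sketch below shows one way to detect such periods on made-up data, using the `indicator` and `validate` options of `pd.merge`; this check is an optional addition and not part of the original workflow.

```python
import pandas as pd

# Toy tables (made-up): the attribution for A is missing for the second day.
toy_aggr = pd.DataFrame({
    "timestamp_from": ["2019-01-01", "2019-01-02"],
    "timestamp_to": ["2019-01-01", "2019-01-02"],
    "conversions": [10.0, 4.0],
})
toy_attr_a = pd.DataFrame({
    "timestamp_from": ["2019-01-01"],
    "timestamp_to": ["2019-01-01"],
    "A_conversions": [3.0],
})

# indicator=True adds a "_merge" column telling where each row came from;
# validate="one_to_one" raises if a period appears more than once on either side.
merged = pd.merge(
    toy_aggr, toy_attr_a,
    on=["timestamp_from", "timestamp_to"],
    how="outer", validate="one_to_one", indicator=True,
)

# Periods that did not match on both sides.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)

# Residual conversions are missing (NaN) wherever the period did not match.
merged["residual_conversions"] = merged["conversions"] - merged["A_conversions"]
print(merged[["timestamp_from", "residual_conversions"]])
```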
#### We perform attribution for the other channels.
res_attr=uam(df_aggr=df_aggr_2,df_ctr=df_ctr,df_paths=df_paths,channel_conv_name="((CONV))",order=1,nsim_start=1e5,max_step=None,ncore=1,nfold=10,seed=1234567,conv_par=0.05,rate_step_sim=1.5,verbose=1)
res_attr=res_attr['attribution']
del res_attr['conversions']
print(res_attr)
#### Finally we merge the final data frame with the attribution we got for A.
res_attr=pd.merge(res_attr_clicks[['timestamp_from','timestamp_to','A_conversions']],res_attr,on=['timestamp_from','timestamp_to'],how='outer')
res_attr=res_attr.rename(columns={'A_conversions':'A'})
res_attr=pd.merge(df_aggr[['timestamp_from','timestamp_to','conversions']],res_attr,on=['timestamp_from','timestamp_to'],how='outer')
print(res_attr)