Feature-Rich Markov Attribution: A Context-Aware Approach

July 11, 2025 · 4 min read

Traditional Markov attribution models analyze how people move between marketing channels and estimate the contribution of each channel to conversion. While helpful, these models usually treat every customer the same—ignoring important context like user region, segment, or behavior over time.

In this article, we describe an approach that builds one Markov model for each feature (e.g. region, segment, time to conversion), and combines them into a final, feature-aware attribution score.

🧭 Why Context Matters in Attribution

A typical attribution model might give you this insight:

“Email drives 18% of conversions.”

But is that true across all users? Maybe it’s 30% for returning customers, and just 5% for new users.
If we don’t consider context, we miss important patterns.

🧍 Comparison of Two User Journeys

User A (New, from UK):
→ Paid Search → Social → Conversion

User B (Returning, from Italy):
→ Paid Search → Social → Conversion

🟡 Both users took the same path—but their behavior and background are different. Treating them the same reduces model accuracy.

🧮 Step-by-Step: How Feature-Rich Attribution Works

1. Start with Enriched Path Data

Your journey data includes more than just channels:

user_id	step	channel	region	segment	seconds_to_last_touch	position
1001	1	Email	UK	New	1420	first
1001	2	Social	UK	New	800	middle
1002	1	Paid Search	Italy	Returning	215	first
1002	2	Direct	Italy	Returning	0	last

You might have:

Categorical features: region, segment, position
Numerical features: seconds to last touch

2. Discretize Numerical Features

Before modeling, numerical variables like seconds_to_last_touch are binned to turn them into categorical values:

seconds_to_last_touch	→	time_bin
1420	→	bin_15
800	→	bin_10
215	→	bin_5
0	→	bin_0
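
The article does not state how the bin edges are chosen. As a minimal sketch, assume fixed-width bins of 500 seconds, each labeled by its upper edge in hundreds of seconds. That convention is a guess, but it reproduces the four rows above. The function name `time_bin` and the constant `BIN_WIDTH_SECONDS` are hypothetical.

```python
import math

# Assumption: the bin width is not given in the article; 500 seconds
# happens to reproduce the example table.
BIN_WIDTH_SECONDS = 500


def time_bin(seconds: float, width: int = BIN_WIDTH_SECONDS) -> str:
    """Map a non-negative duration to a label such as 'bin_15'.

    The label is the upper edge of the bin, expressed in hundreds of
    seconds (an assumed naming convention).
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    upper_edge = math.ceil(seconds / width) * width
    return f"bin_{upper_edge // 100}"


for value in (1420, 800, 215, 0):
    print(value, "->", time_bin(value))
```

In practice, quantile-based edges may suit skewed timing data better than fixed widths. Either way, the edges should be fixed before the models are fit so that every journey is binned the same way.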

3. Create Artificial Channels from Each Feature

We now create “channel + feature” combinations to isolate feature-specific behavior.
Example using region:

user_id	step	channel_region
1001	1	Email_UK
1001	2	Social_UK
1002	1	PaidSearch_Italy
1002	2	Direct_Italy

Original Path:
Email → Social → Conversion

Transformed Path (region view):
Email_UK → Social_UK → Conversion

🔁 Repeating this for every feature gives multiple transformed views of the same path, one per feature.
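
The transformation can be sketched in a few lines. The rows below mirror the sample table from step 1. The helper `artificial_paths` is a hypothetical name, and stripping spaces from channel names (so that "Paid Search" becomes "PaidSearch") is inferred from the example table rather than stated by the article.

```python
from collections import defaultdict

# Sample touchpoints copied from the table in step 1.
touchpoints = [
    {"user_id": 1001, "step": 1, "channel": "Email", "region": "UK", "segment": "New"},
    {"user_id": 1001, "step": 2, "channel": "Social", "region": "UK", "segment": "New"},
    {"user_id": 1002, "step": 1, "channel": "Paid Search", "region": "Italy", "segment": "Returning"},
    {"user_id": 1002, "step": 2, "channel": "Direct", "region": "Italy", "segment": "Returning"},
]


def artificial_paths(rows, feature):
    """Build one ordered path of 'channel_feature' states per user."""
    paths = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["user_id"], r["step"])):
        channel = row["channel"].replace(" ", "")  # "Paid Search" -> "PaidSearch"
        paths[row["user_id"]].append(f"{channel}_{row[feature]}")
    return dict(paths)


region_paths = artificial_paths(touchpoints, "region")
segment_paths = artificial_paths(touchpoints, "segment")
print(region_paths)
print(segment_paths)
```

The journey outcome (conversion or not) is not part of the sample table, so it would need to be appended to each path from a separate source before modeling.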

4. Fit One Markov Model per Feature

Each feature's set of transformed paths is modeled independently, giving one Markov model per feature.
For example, the model for the feature region is fit on paths such as:

Email_UK → Social_UK → Conversion
PaidSearch_Italy → Direct_Italy → Conversion
Email_FR → Social_FR → Conversion
...and so on.

These models estimate the probability of conversion given a channel in a specific context.
From each model, we compute the odds for each channel:

\text{Odds}(\text{Channel}) = \frac{\Pr(\text{Conversion via Channel})}{1 - \Pr(\text{Conversion via Channel})}

These odds quantify how strongly a channel contributes to conversion, and are used to assign attribution credit along the path.
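
The article does not spell out how Pr(Conversion via Channel) is obtained. One possible reading, sketched here, is a first-order Markov chain: estimate transition probabilities from the transformed paths, then take the probability of eventually reaching conversion from each channel state. The `Start` and `Null` states, the function names, and the toy paths (including the non-converting journeys) are all assumptions made for illustration.

```python
from collections import defaultdict

CONVERSION = "Conversion"

# Toy paths for the region feature; "Null" marks a journey without conversion.
paths = [
    ["Start", "Email_UK", "Social_UK", CONVERSION],
    ["Start", "Email_UK", "Null"],
    ["Start", "Social_UK", "Null"],
    ["Start", "Social_UK", CONVERSION],
]


def transition_probabilities(paths):
    """Estimate first-order transition probabilities from observed paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_probability(probs, n_iter=100):
    """Probability of eventually reaching Conversion from each state.

    Uses simple fixed-point iteration; terminal states other than
    Conversion (here "Null") count as probability 0.
    """
    reach = {state: 0.0 for state in probs}
    reach[CONVERSION] = 1.0
    for _ in range(n_iter):
        for state, targets in probs.items():
            reach[state] = sum(p * reach.get(t, 0.0) for t, p in targets.items())
    return reach


def odds(p):
    """Odds as defined in the formula above: p / (1 - p)."""
    return float("inf") if p >= 1 else p / (1 - p)


reach = conversion_probability(transition_probabilities(paths))
channel_odds = {s: odds(p) for s, p in reach.items() if s not in ("Start", CONVERSION)}
print(channel_odds)
```

With this toy data, Social_UK converts in two of its three observed transitions and therefore gets higher odds than Email_UK, which reaches conversion only through Social_UK.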

5. Evaluate Model Performance

Each feature-specific model is evaluated using AUC-PR (Area Under the Precision-Recall Curve)—a better choice than ROC-AUC for imbalanced datasets (few conversions vs. many non-conversions).

We assign a weight to each model based on how much it improves prediction compared to a baseline. The baseline is the overall conversion rate, which is the AUC-PR expected from a model that ranks journeys at random:

\text{Weight}(\text{Model}) = \begin{cases} \text{AUC-PR}, & \text{if AUC-PR} > \text{CR} \\ 0, & \text{otherwise} \end{cases}

Where:

AUC-PR is the precision-recall performance of the model
CR is the global conversion rate in the dataset

Only models that outperform the baseline are retained.
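
The weighting rule is short enough to sketch directly. Average precision is used below as one common estimator of AUC-PR; whether the original implementation uses it is an assumption. The function names and the labels and scores are illustrative only, and the estimator as written assumes there are no tied scores.

```python
def average_precision(y_true, y_score):
    """Average precision: mean of precision at the rank of each conversion.

    y_true holds 1 for converting journeys and 0 otherwise;
    y_score holds the model's predicted scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one conversion is required")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def model_weight(auc_pr, conversion_rate):
    """Weight from the formula above: AUC-PR if it beats CR, else 0."""
    return auc_pr if auc_pr > conversion_rate else 0.0


# Illustrative data: 2 conversions among 5 journeys.
y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.1]

auc_pr = average_precision(y_true, y_score)
cr = sum(y_true) / len(y_true)
print(auc_pr, cr, model_weight(auc_pr, cr))
```

In this example the conversions sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2, which is well above the conversion rate of 0.4, and the model keeps its weight.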

6. Final Attribution = Weighted Mean

The final attribution score is computed as a weighted average of the feature-specific models.

Each model contributes in proportion to its performance:

Example:

weight(Region) = 0.25
weight(Segment) = 0.38
weight(Seconds to Last Touch) = 0.12
weight(Position) = 0.22

The final attribution score for a channel aggregates contributions across all relevant features.
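
As a sketch of the aggregation, the snippet below combines per-model attribution scores for a single channel using the example weights above. The per-model scores for Email are hypothetical, and dividing by the sum of the weights (0.97 here) is an assumption about how the weighted average is normalized.

```python
# Weights taken from the example above.
weights = {
    "region": 0.25,
    "segment": 0.38,
    "seconds_to_last_touch": 0.12,
    "position": 0.22,
}

# Hypothetical attribution scores for the Email channel, one per model.
email_scores = {
    "region": 0.20,
    "segment": 0.30,
    "seconds_to_last_touch": 0.10,
    "position": 0.25,
}


def weighted_attribution(per_model, weights):
    """Weighted mean of per-model scores; zero-weight models are dropped."""
    kept = {f: w for f, w in weights.items() if w > 0 and f in per_model}
    total_weight = sum(kept.values())
    if total_weight == 0:
        raise ValueError("no model with positive weight")
    return sum(w * per_model[f] for f, w in kept.items()) / total_weight


final_email = weighted_attribution(email_scores, weights)
print(round(final_email, 4))
```

A model that failed the baseline check in step 5 has weight 0 and therefore drops out of both the numerator and the denominator.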

✅ Why This Works

👁️ You gain multiple perspectives on customer behavior
🎯 You only keep models that improve predictive accuracy
⚖️ You combine models in a data-driven way—not based on guesswork or static rules

💡 When to Use This Approach

This method is especially useful when:

You have rich journey data with structured features
You need transaction-level attribution
You want interpretable results, not just black-box predictions
You want to understand how attribution shifts across audience segments

📚 Learn More

If you'd like to see how this approach works in practice, visit our documentation or contact us to discuss your use case.
