
Feature-Rich Markov Attribution: A Context-Aware Approach


Traditional Markov attribution models analyze how people move between marketing channels and estimate the contribution of each channel to conversion. While helpful, these models usually treat every customer the same—ignoring important context like user region, segment, or behavior over time.

In this article, we describe an approach that builds one Markov model per feature (e.g., region, segment, time to conversion) and combines them into a final, feature-aware attribution score.


🧭 Why Context Matters in Attribution

A typical attribution model might give you this insight:

“Email drives 18% of conversions.”

But is that true across all users? Maybe it’s 30% for returning customers, and just 5% for new users.
If we don’t consider context, we miss important patterns.


🧍 Comparison of Two User Journeys

User A (New, from UK):
→ Paid Search → Social → Conversion

User B (Returning, from Italy):
→ Paid Search → Social → Conversion

🟡 Both users took the same path, but they differ in segment and region. A channel-only model treats them identically, so any difference in how these audiences convert gets averaged away.


🧮 Step-by-Step: How Feature-Rich Attribution Works

1. Start with Enriched Path Data

Your journey data includes more than just channels:

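| user_id | step | channel | region | segment | seconds_to_last_touch | position |
|---|---|---|---|---|---|---|
| 1001 | 1 | Email | UK | New | 1420 | first |
| 1001 | 2 | Social | UK | New | 800 | middle |
| 1002 | 1 | Paid Search | Italy | Returning | 215 | first |
| 1002 | 2 | Direct | Italy | Returning | 0 | last |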

You might have:

  • Categorical features: region, segment, position
  • Numerical features: seconds to last touch
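Here is a minimal Python sketch that makes the data shape concrete. It holds the example rows above and groups them into one ordered journey per user. The field names come from the table. The tuple layout and the `build_journeys` helper are illustrative assumptions, not part of any specific tool.

```python
from collections import defaultdict

# Example step-level rows (one touchpoint per row), copied from the table.
FIELDS = ("user_id", "step", "channel", "region", "segment",
          "seconds_to_last_touch", "position")
ROWS = [
    (1001, 1, "Email", "UK", "New", 1420, "first"),
    (1001, 2, "Social", "UK", "New", 800, "middle"),
    (1002, 1, "Paid Search", "Italy", "Returning", 215, "first"),
    (1002, 2, "Direct", "Italy", "Returning", 0, "last"),
]


def build_journeys(rows):
    """Group step-level rows into one ordered list of touchpoints per user."""
    journeys = defaultdict(list)
    for row in rows:
        record = dict(zip(FIELDS, row))
        journeys[record["user_id"]].append(record)
    # Sort each user's touchpoints by step so the path order is explicit.
    for touches in journeys.values():
        touches.sort(key=lambda r: r["step"])
    return dict(journeys)


journeys = build_journeys(ROWS)
print([t["channel"] for t in journeys[1002]])  # ['Paid Search', 'Direct']
```

Each touchpoint keeps all of its features. Later steps can then derive a different view of the same journey for each feature.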

2. Discretize Numerical Features

Before modeling, numerical variables like seconds_to_last_touch are binned to turn them into categorical values:

| seconds_to_last_touch | time_bin |
|---|---|
| 1420 | bin_15 |
| 800 | bin_10 |
| 215 | bin_5 |
| 0 | bin_0 |
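The article doesn't say how the bin boundaries are chosen. The sketch below uses hypothetical edges, picked only so that the four example values land in the bins shown. The labels are copied from the table. `bisect` is Python's standard-library binary-search module.

```python
import bisect

# Hypothetical bin edges in seconds. They are NOT from the article; they
# were chosen only so the example values reproduce the table.
EDGES = [1, 300, 1000]
LABELS = ["bin_0", "bin_5", "bin_10", "bin_15"]  # one more label than edges


def time_bin(seconds):
    """Map seconds_to_last_touch to a categorical bin label."""
    # bisect_right counts the edges <= seconds, which is the bin index.
    return LABELS[bisect.bisect_right(EDGES, seconds)]


for s in (1420, 800, 215, 0):
    print(s, time_bin(s))
```

In practice the edges might come from quantiles of the training data, so that every bin holds enough touchpoints for the Markov model to estimate reliable transitions. That choice is also an assumption here.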

3. Create Artificial Channels from Each Feature

We now create “channel + feature” combinations to isolate feature-specific behavior.
Example using region:

| user_id | step | channel_region |
|---|---|---|
| 1001 | 1 | Email_UK |
| 1001 | 2 | Social_UK |
| 1002 | 1 | PaidSearch_Italy |
| 1002 | 2 | Direct_Italy |

Original Path:
Email → Social → Conversion

Transformed Paths (one per feature):
Email_UK → Social_UK → Conversion (region)
Email_New → Social_New → Conversion (segment)

🔁 We now have multiple transformed views of the same path, customized per feature.
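The transformation can be sketched in a few lines of Python. This is a minimal version that assumes each touchpoint is a `(channel, {feature: value})` pair whose numerical features are already binned. It also assumes a non-converting journey ends in a state called `Null`; that name is hypothetical, and the article only shows converting paths.

```python
# One touchpoint = (channel, {feature: categorical value}); numerical
# features are assumed to be binned already (see step 2).
journey = [
    ("Email", {"region": "UK", "segment": "New", "time_bin": "bin_15"}),
    ("Social", {"region": "UK", "segment": "New", "time_bin": "bin_10"}),
]


def artificial_path(touchpoints, feature, converted):
    """Rewrite a journey as 'Channel_value' states for a single feature."""
    # Spaces are dropped so 'Paid Search' becomes e.g. 'PaidSearch_Italy'.
    states = [f"{channel.replace(' ', '')}_{values[feature]}"
              for channel, values in touchpoints]
    # 'Null' is an assumed name for the non-conversion end state.
    states.append("Conversion" if converted else "Null")
    return states


def feature_views(touchpoints, features, converted):
    """One transformed view of the same journey per feature."""
    return {f: artificial_path(touchpoints, f, converted) for f in features}


for feature, path in feature_views(journey, ["region", "segment"], True).items():
    print(feature, " → ".join(path))
```

For the converting example journey, this prints `region Email_UK → Social_UK → Conversion` and `segment Email_New → Social_New → Conversion`.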


4. Fit One Markov Model per Feature

Each feature gets its own Markov model, whose states are that feature's artificial channels.
For example, the region model is fit on transformed paths such as:

  • Email_UK → Social_UK → Conversion
  • PaidSearch_Italy → Direct_Italy → Conversion
  • Email_FR → Social_FR → Conversion
  • ...and so on.

These models estimate the probability of conversion given a channel in a specific context.
From each model, we compute the **odds** for each channel:

$$\text{Odds}(\text{Channel}) = \frac{\Pr(\text{Conversion via Channel})}{1 - \Pr(\text{Conversion via Channel})}$$

These odds quantify how strongly a channel contributes to conversion, and are used to assign attribution credit along the path.
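The article doesn't spell out how Pr(Conversion via Channel) is estimated. A common reading for a Markov chain is the probability of eventually reaching the Conversion state when starting from that channel's state, and the sketch below assumes that reading. It also assumes a `Start` state and a `Null` (non-conversion) absorbing state. It fits a first-order chain for one feature on toy, hypothetical region paths, then applies the odds formula above.

```python
from collections import Counter, defaultdict

ABSORBING = {"Conversion", "Null"}


def fit_transitions(paths):
    """Estimate first-order transition probabilities from observed paths.

    Each path is a list of states ending in 'Conversion' or 'Null';
    a 'Start' state is prepended to every path.
    """
    counts = defaultdict(Counter)
    for path in paths:
        states = ["Start"] + list(path)
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def conversion_probability(trans, tol=1e-12, max_iter=10_000):
    """Pr(eventually reaching 'Conversion') from every non-absorbing state.

    Solves p(s) = sum_t P(s -> t) * p(t), with p(Conversion) = 1 and
    p(Null) = 0, by repeated substitution until the values stop changing.
    """
    p = {state: 0.0 for state in trans}
    p["Conversion"], p["Null"] = 1.0, 0.0
    for _ in range(max_iter):
        delta = 0.0
        for state, row in trans.items():
            new = sum(prob * p.get(nxt, 0.0) for nxt, prob in row.items())
            delta = max(delta, abs(new - p[state]))
            p[state] = new
        if delta < tol:
            break
    return {s: v for s, v in p.items() if s not in ABSORBING}


def odds(prob):
    """Odds = p / (1 - p); infinite if the state always converts."""
    return float("inf") if prob >= 1.0 else prob / (1.0 - prob)


def channel_odds(paths):
    """Odds of conversion for each artificial channel in one feature model."""
    probs = conversion_probability(fit_transitions(paths))
    return {s: odds(v) for s, v in probs.items() if s != "Start"}


# Toy region-feature paths (hypothetical data, for illustration only).
region_paths = [
    ["Email_UK", "Social_UK", "Conversion"],
    ["Email_UK", "Null"],
    ["Social_UK", "Null"],
    ["PaidSearch_Italy", "Direct_Italy", "Conversion"],
    ["PaidSearch_Italy", "Direct_Italy", "Null"],
    ["PaidSearch_Italy", "Null"],
]
print(channel_odds(region_paths))
```

On these toy paths, Email_UK converts with probability 0.25 (odds 1/3), because half of its visits move on to Social_UK, which converts half the time. Your setup may define Pr(Conversion via Channel) differently, for example as the share of paths through the channel that convert. In that case only `conversion_probability` needs to change.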


5. Evaluate Model Performance

Each feature-specific model is evaluated using AUC-PR (Area Under the Precision-Recall Curve)—a better choice than ROC-AUC for imbalanced datasets (few conversions vs. many non-conversions).

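Each model is weighted by its AUC-PR, but only if that AUC-PR beats a baseline: the overall conversion rate. This baseline is the AUC-PR a random, uninformative model would be expected to achieve, so a model that cannot beat it adds no predictive information: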

$$\text{Weight}(\text{Model}) = \begin{cases} \text{AUC-PR}, & \text{if } \text{AUC-PR} > \text{CR} \\ 0, & \text{otherwise} \end{cases}$$

Where:

  • AUC-PR is the precision-recall performance of the model
  • CR is the global conversion rate in the dataset

Only models that outperform the baseline are retained.
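Here is a minimal Python sketch of the weighting rule. AUC-PR is estimated as step-wise average precision: at each score threshold, the gain in recall is multiplied by the precision at that threshold. This is the same definition scikit-learn's `average_precision_score` uses. The article doesn't say which per-journey scores each feature model produces, so the labels and scores below are hypothetical. CR is computed from the same evaluation labels.

```python
def average_precision(y_true, scores):
    """Step-wise AUC-PR estimate: sum of (recall gain x precision)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one conversion to compute AUC-PR")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Treat all journeys sharing the same score as one threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def model_weight(y_true, scores):
    """AUC-PR if it beats the conversion-rate baseline, otherwise 0."""
    auc_pr = average_precision(y_true, scores)
    conversion_rate = sum(y_true) / len(y_true)
    return auc_pr if auc_pr > conversion_rate else 0.0


# Toy labels (1 = converted) and hypothetical model scores per journey.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
informative = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1]
constant = [0.5] * 8
print(model_weight(labels, informative))  # about 0.833, above CR = 0.25
print(model_weight(labels, constant))     # 0.0: no better than the baseline
```

The constant model shows why the conversion rate is the natural cutoff. When every journey gets the same score, precision equals the conversion rate, so AUC-PR equals CR and the model is dropped.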


6. Final Attribution = Weighted Mean

The final attribution score for each channel is a weighted average of its scores from the feature-specific models. Each model contributes in proportion to its weight, that is, its AUC-PR from the previous step.

Example:

  • weight(Region) = 0.25
  • weight(Segment) = 0.38
  • weight(Seconds to Last Touch) = 0.12
  • weight(Position) = 0.22

The final attribution score for a channel aggregates contributions across all relevant features.
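For one channel c, the weighted mean is A(c) = Σ_f w_f · A_f(c) / Σ_f w_f, where w_f is the weight of feature model f and A_f(c) is that model's attribution score for c. The sketch below uses the example weights above. The per-feature scores for Email are hypothetical placeholders, and dividing by the sum of the weights is an assumption that follows from "weighted average".

```python
# Example weights from the article.
WEIGHTS = {"region": 0.25, "segment": 0.38,
           "seconds_to_last_touch": 0.12, "position": 0.22}


def combine(per_feature_scores, weights):
    """Weighted mean of a channel's attribution across feature models.

    Models with weight 0 (those that failed the baseline check) drop out;
    the remaining weights are renormalized to sum to 1.
    """
    used = {f: w for f, w in weights.items()
            if w > 0 and f in per_feature_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no feature model passed the baseline check")
    return sum(w * per_feature_scores[f] for f, w in used.items()) / total


# Hypothetical per-feature attribution scores for one channel (Email).
email_scores = {"region": 0.20, "segment": 0.30,
                "seconds_to_last_touch": 0.10, "position": 0.25}
print(round(combine(email_scores, WEIGHTS), 4))  # 0.2381
```

Renormalizing means the example weights (which sum to 0.97) act as relative importances. A model that was zeroed out leaves the remaining models' proportions unchanged.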


✅ Why This Works

  • 👁️ You gain multiple perspectives on customer behavior
  • 🎯 You only keep models that improve predictive accuracy
  • ⚖️ You combine models in a data-driven way—not based on guesswork or static rules

💡 When to Use This Approach

This method is especially useful when:

  • You have rich journey data with structured features
  • You need transaction-level attribution
  • You want interpretable results, not just black-box predictions
  • You want to understand how attribution shifts across audience segments

📚 Learn More

If you'd like to see how this approach works in practice, visit our documentation or contact us to discuss your use case.