Building a custom marketing attribution model requires more than theoretical knowledge. You need sufficient data volume, clear validation criteria, and realistic resource expectations. Most teams attempting custom attribution fail for a specific reason, and it is not a lack of analytical skills: they start building before confirming that they meet the minimum requirements for statistically valid results.
Key Takeaways
• Evaluate your organization's data volume, technical capacity, and resources before investing in custom attribution model development to ensure statistical validity.
• Structure your attribution data schema by defining customer touchpoints, channels, and conversion events with clear naming conventions and consistent tracking across systems.
• Centralize all marketing and customer data into a unified repository, then validate data completeness by checking for missing values and tracking inconsistencies.
• Configure your attribution window by determining how many days before conversion to analyze and which touchpoints to include in your model calculations.
• Choose between Shapley Value and Markov Chain models based on your business needs, data availability, and whether you prioritize fairness or computational efficiency.
• Most custom attribution projects fail due to insufficient data volume and unclear validation criteria rather than lack of analytical or technical skills.
This guide walks through the complete build process: data structure requirements, model selection criteria, implementation code, validation metrics, and decision gates that tell you whether to build or stick with rule-based alternatives. By the end, you'll know what it takes to move from platform-reported conversions to a unified attribution system that accurately credits each touchpoint.
Can You Build a Custom Attribution Model? (Decision Framework)
Before investing months in custom attribution development, evaluate whether you meet the minimum thresholds for statistical validity. Custom data-driven models require specific data volumes, technical capacity, and time horizons that many teams don't have when they start.
| Criterion | Minimum Threshold | If You Don't Meet It |
|---|---|---|
| Monthly conversions | 300+ for Markov, 500+ for Shapley | Use rule-based models (last-click, linear, time-decay) until you accumulate sufficient volume |
| Unique conversion paths | 30+ for Markov, 50+ for Shapley | Your model will overfit to a handful of dominant paths, producing unstable results |
| Marketing channels tracked | 5+ paid channels with meaningful spend | If you only run 1-2 channels, platform analytics already tell you what works—no model needed |
| Data engineering capacity | 1 full-time engineer for 6-8 weeks | Consider attribution platforms (Improvado, SegmentStream, HockeyStack) that handle infrastructure |
| Time horizon tolerance | 3-4 months to production-ready model | If leadership needs attribution answers in 2-4 weeks, start with W-shaped or linear models immediately |
| Complete path visibility | >80% of conversions have trackable touchpoint sequence | Fix tracking gaps first—models trained on incomplete data produce systematically biased credit assignments |
If you fail two or more criteria, building a custom model will waste engineering time and produce unreliable results. The better path is to implement rule-based attribution immediately (W-shaped for B2B, linear for e-commerce), fix data collection gaps, and accumulate 3-6 months of clean conversion path data before attempting data-driven models.
For teams with insufficient data volume, the W-shaped attribution model offers a practical middle ground. It assigns 30% credit to first touch. It assigns 30% credit to lead creation. It assigns 30% credit to conversion touchpoints. The remaining 10% is distributed across other interactions. This captures the full funnel. It doesn't require the statistical complexity of Markov or Shapley methods.
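The 30/30/30/10 split can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `w_shaped_credit` and the `lead_index` argument are hypothetical, and the handling of short paths (where there are no middle touches to receive the remaining 10%) is an assumption, since the rule above does not cover that case.

```python
from collections import defaultdict

def w_shaped_credit(path, lead_index):
    """Split one conversion's credit across a path using 30/30/30/10 weights.

    path: ordered channel names, first touch first, conversion touch last.
    lead_index: position of the lead-creation touch (hypothetical input).
    """
    if not path:
        raise ValueError("path must contain at least one touchpoint")
    if not 0 <= lead_index < len(path):
        raise ValueError("lead_index must point inside the path")

    last = len(path) - 1
    anchors = {0, lead_index, last}  # key positions can coincide on short paths
    others = [i for i in range(len(path)) if i not in anchors]

    weights = [0.0] * len(path)
    for i in (0, lead_index, last):
        weights[i] += 0.30           # first touch, lead creation, conversion
    if others:
        for i in others:
            weights[i] += 0.10 / len(others)
    else:
        # Assumption: with no middle touches, share the spare 10% among anchors
        for i in anchors:
            weights[i] += 0.10 / len(anchors)

    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight    # a channel may appear more than once
    return dict(credit)

example = w_shaped_credit(
    ["paid_social", "email", "organic_search", "display", "paid_search"],
    lead_index=2,
)
print(example)
```

In this example the first, lead-creation, and final touches each receive 0.30, and the two remaining touches split the last 0.10 equally.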
Three Steps to Build a Custom Marketing Attribution Model
Custom attribution model development follows a sequential three-stage process: data aggregation, schema validation, and model configuration. Each stage has specific completion criteria that gate progression to the next phase.
Step #1. Structure Your Attribution Data Schema
Attribution models require conversion path data in a specific tabular format. Your data warehouse must contain a table where each row represents one touchpoint in a customer's journey, structured with the following required fields:
| Field Name | Data Type | Purpose | Example Value |
|---|---|---|---|
| user_id | String | Links all touchpoints for one customer | usr_8k2m9n |
| timestamp | Datetime | Orders touchpoints chronologically | 2026-03-15 14:32:18 |
| touchpoint_id | String | Unique identifier for this interaction | tp_fb_ad_12345 |
| channel | String | Marketing channel attribution target | paid_social, organic_search, email |
| conversion_flag | Boolean | 1 if this touchpoint resulted in conversion | 0 or 1 |
| revenue | Decimal | Transaction value (NULL for non-conversions) | 1299.00 |
Here's what three complete conversion paths look like in this schema:
| user_id | timestamp | channel | conversion_flag | revenue |
|---|---|---|---|---|
| usr_001 | 2026-03-01 09:15 | paid_search | 0 | NULL |
| usr_001 | 2026-03-03 14:22 | organic_search | 0 | NULL |
| usr_001 | 2026-03-05 16:40 | email | 0 | NULL |
| usr_001 | 2026-03-06 11:08 | paid_social | 1 | 2499.00 |
| usr_002 | 2026-03-02 10:30 | display | 0 | NULL |
| usr_002 | 2026-03-04 15:45 | organic_search | 0 | NULL |
| usr_002 | 2026-03-07 09:12 | paid_search | 1 | 899.00 |
| usr_003 | 2026-03-01 13:20 | paid_social | 0 | NULL |
| usr_003 | 2026-03-02 11:05 | email | 0 | NULL |
| usr_003 | 2026-03-03 16:30 | organic_search | 0 | NULL |
| usr_003 | 2026-03-04 10:15 | display | 0 | NULL |
| usr_003 | 2026-03-08 14:50 | paid_search | 1 | 3200.00 |
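Most attribution libraries expect one ordered path per user rather than one row per touchpoint. The sketch below shows one way to collapse rows in this schema into path strings using only the standard library. The function name `build_paths` is hypothetical, the `usr_900` rows are made-up sample data, and the code assumes timestamps are ISO-style strings that sort chronologically as text.

```python
from itertools import groupby
from operator import itemgetter

# (user_id, timestamp, channel, conversion_flag)
rows = [
    ("usr_002", "2026-03-07 09:12", "paid_search", 1),
    ("usr_002", "2026-03-02 10:30", "display", 0),
    ("usr_002", "2026-03-04 15:45", "organic_search", 0),
    ("usr_900", "2026-03-03 08:00", "paid_social", 0),  # hypothetical non-converter
    ("usr_900", "2026-03-05 12:10", "display", 0),
]

def build_paths(rows, sep=" > "):
    """Group touchpoint rows by user and join channels in timestamp order."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # by user_id, then timestamp
    paths = {}
    for user_id, touches in groupby(ordered, key=itemgetter(0)):
        touches = list(touches)
        paths[user_id] = {
            "path": sep.join(t[2] for t in touches),
            "touchpoints": len(touches),
            "converted": any(t[3] == 1 for t in touches),
        }
    return paths

paths = build_paths(rows)
for user_id, info in paths.items():
    print(user_id, info)
```

Sorting before grouping matters here because `groupby` only merges adjacent rows, and exported touchpoint data is rarely in order.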
Before you can train an attribution model, validate that your dataset meets these data quality thresholds:
| Data Quality Metric | Minimum Threshold | If Below Threshold |
|---|---|---|
| Complete conversion paths | >80% of conversions have all touchpoints captured | Audit tracking implementation—missing touchpoints bias credit toward visible channels |
| UTM parameter completeness | <10% of paid channel clicks missing UTM tags | Paid traffic gets misattributed to organic—fix campaign tagging before proceeding |
| Cross-device match rate | >60% of users identified across devices | Implement deterministic identity resolution (login tracking) or probabilistic matching |
| Attribution window coverage | >90% of first touchpoints fall within window | Extend your attribution window or accept that model ignores early-funnel influence |
Cross-Device and Multi-Stakeholder Attribution for B2B
B2B attribution faces a unique challenge. Conversions aren't driven by individual users. Instead, buying committees of 11-20 stakeholders across different roles and devices drive them. A CFO researches ROI on their work laptop. A CMO reviews case studies on mobile during commute. A procurement officer finalizes purchase on desktop. All appear as separate users without proper identity resolution.
Two approaches solve cross-device tracking:
• Deterministic matching uses login events to definitively link devices. When users authenticate, you can tag all activity from that session with their account ID. This produces 70-80% match rates for products with login requirements, but misses anonymous browsing sessions.
• Probabilistic matching uses behavioral fingerprinting—IP address, user agent, browsing patterns, timestamp sequences—to infer when different sessions belong to the same person. Machine learning models score the likelihood of a match. This captures anonymous sessions but achieves only 40-60% accuracy, creating false positives.
For B2B scenarios with multiple stakeholders per account, you need account-level aggregation rather than user-level tracking. Here's a concrete example showing three personas influencing one $50,000 software purchase:
| account_id | persona | timestamp | channel | conversion |
|---|---|---|---|---|
| acct_789 | CMO | 2026-01-10 | LinkedIn ad | 0 |
| acct_789 | CMO | 2026-01-15 | Webinar | 0 |
| acct_789 | CFO | 2026-01-18 | Google search (pricing) | 0 |
| acct_789 | CFO | 2026-01-20 | ROI calculator landing page | 0 |
| acct_789 | CMO | 2026-02-01 | Case study email | 0 |
| acct_789 | Procurement | 2026-02-10 | Sales demo | 0 |
| acct_789 | Procurement | 2026-02-15 | Contract review | 0 |
| acct_789 | CFO | 2026-03-01 | Direct (contract signed) | 1 |
To aggregate these interactions into one account-level conversion path for attribution modeling, use SQL logic like this (PostgreSQL/Snowflake syntax):
WITH account_paths AS (
    SELECT
        account_id,
        -- Join every persona's touchpoints into one chronologically ordered path.
        -- On Snowflake, LISTAGG(channel, ' > ') WITHIN GROUP (ORDER BY timestamp) is the equivalent.
        STRING_AGG(channel, ' > ' ORDER BY timestamp) AS conversion_path,
        -- Assumes b2b_touchpoints also stores revenue on the converting row
        MAX(CASE WHEN conversion = 1 THEN revenue ELSE 0 END) AS deal_value
    FROM b2b_touchpoints
    WHERE timestamp >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY account_id
)
SELECT * FROM account_paths WHERE deal_value > 0;

This query collapses all touchpoints from all personas within an account into a single path string (e.g., LinkedIn ad > Webinar > Google search > ROI calculator > Case study email > Sales demo > Contract review > Direct). This string becomes one row in your attribution model training dataset.
Improvado's attribution solution uses an identity graph data structure. It automatically maps interactions from different stakeholders within target organizations. This eliminates the need for custom SQL aggregation scripts. It ensures that B2B marketing teams can track complex buying journeys across multiple decision-makers. Teams no longer need manual data engineering work.
Capturing Conversion and Revenue Data
To complete your marketing attribution model, you will also need to pull data on the conversions and the respective revenue that is likely found in your CRM. These data points will help you calculate each marketing channel's conversions, revenue, and ROI.
A great help in pulling large volumes of data from a CRM is an extract, transform, load (ETL) solution. ETL automatically pulls all your data, applies transformations (for example, unifying disparate naming conventions), and then loads the data into a data warehouse, BI tool, visualization tool, or another destination.
Step #2. Centralize Your Data and Validate Completeness
To create a custom attribution model, you need to import all the data described above into one place, ideally a database or marketing data warehouse.
An easy and efficient way to achieve this is by using an ETL solution. It enables you to connect all your marketing data in minutes. This saves massive amounts of time and developer resources. This is particularly beneficial for enterprise companies. They have complex marketing ecosystems and numerous data sources to manage.
Once you have all the metrics and dimensions required for the attribution model imported into your database, validate that you have sufficient data volume before proceeding to model training. Use these thresholds as go/no-go gates:
| Attribution Model Type | Minimum Conversions/Month | Minimum Unique Paths | If You Don't Meet Threshold |
|---|---|---|---|
| Rule-based (Linear, Time-Decay, W-Shaped) | 100+ | No requirement | Start here—works with any volume |
| Markov Chain | 300+ | 30+ | Use linear or time-decay until you accumulate 3 months of data |
| Shapley Value | 500+ | 50+ | Markov is more data-efficient—start there instead |
Run this diagnostic SQL query to check if your dataset meets minimum requirements:
WITH user_paths AS (
    -- One row per user: ordered channel path, path length, conversion status
    SELECT
        user_id,
        STRING_AGG(channel, '>' ORDER BY timestamp) AS path,
        COUNT(*) AS touchpoints_per_user,
        MAX(conversion_flag) AS converted  -- assumes the flag is stored as 0/1
    FROM attribution_data
    WHERE timestamp >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT
    COUNT(*) AS total_users,
    COUNT(CASE WHEN converted = 1 THEN 1 END) AS converters_last_30d,
    COUNT(DISTINCT CASE WHEN converted = 1 THEN path END) AS unique_paths,
    AVG(CASE WHEN converted = 1 THEN touchpoints_per_user END) AS avg_path_length
FROM user_paths;

If converters_last_30d is below 300, you don't have enough data for Markov models yet. If unique_paths is below 30, your attribution results will be dominated by 2-3 path patterns and won't generalize. If avg_path_length is below 2.5, most customers are converting on first or second touch, so attribution modeling provides minimal value over last-click.
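The go/no-go gates can also be encoded as a small check that runs after the diagnostic query. The following is a sketch only: the function name `eligible_models` is hypothetical, the thresholds are copied from the table above, and rule-based models are always returned because the table notes they work with any volume.

```python
def eligible_models(converters_last_30d, unique_paths, avg_path_length):
    """Return (models the data volume supports, warnings) per the gates above."""
    models = ["rule-based"]  # linear, time-decay, W-shaped: no volume gate
    if converters_last_30d >= 300 and unique_paths >= 30:
        models.append("markov")
    if converters_last_30d >= 500 and unique_paths >= 50:
        models.append("shapley")

    warnings = []
    if avg_path_length < 2.5:
        # Short paths leave little credit to redistribute
        warnings.append("short paths: minimal gain over last-click")
    return models, warnings

print(eligible_models(350, 40, 3.1))   # enough volume for Markov only
print(eligible_models(120, 12, 1.8))   # rule-based, with a short-path warning
```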
Step #3. Configure Your Attribution Window
An attribution window, also called a conversion window or lookback window, is the time frame before a conversion during which a touchpoint can be credited for it. Touchpoints that happened outside that period receive no credit.
Let's say you launch a social ad campaign for your new product. A user who sees this ad may not convert immediately because of timing, the need for a partner's opinion, or general concerns. Two weeks later, the user comes across your video ad and watches it, but still doesn't purchase. Seven days after that, the same user goes directly to the website and completes a purchase. Without a conversion window, a company may attribute this conversion to direct or organic traffic alone. Then, when the marketing team cuts touchpoints that appear not to convert, the video ads will be dropped.
When deciding on an attribution window, take into account historical data and business considerations. Consider the purchase cycle for your products and industry norms. For example, a customer takes much longer to decide on a vacation package worth thousands of dollars than on an inexpensive t-shirt. Meanwhile, decision-making in the B2B sector can take months and may involve 11 to 20 stakeholders.
Attribution Window Sensitivity Analysis
Before committing to a specific attribution window length, test how much credit allocation changes as you vary the window. If a 30-day window credits Facebook with 25% of conversions but a 60-day window credits it with 45%, your attribution model is highly sensitive to this parameter, and you need a data-driven way to pick the value rather than guessing.
Run this analysis: Calculate channel credit under three different windows (30, 60, 90 days). Use the same attribution model type. Then measure percentage point swings for each channel.
| Channel | 30-Day Window Credit | 60-Day Window Credit | 90-Day Window Credit | Max Swing |
|---|---|---|---|---|
| Paid Search | 35% | 32% | 30% | 5 pp |
| Paid Social | 18% | 26% | 28% | 10 pp |
| Email | 22% | 20% | 19% | 3 pp |
| Organic Search | 15% | 14% | 15% | 1 pp |
| Display | 10% | 8% | 8% | 2 pp |
Decision rule: if any channel shows >20 percentage point swing across windows, your window is arbitrary and undermines model reliability. The fix is to run cohort analysis: group converters by first-touch date, then plot conversion rate by days elapsed. The point where the curve flattens (e.g., 95% of eventual conversions happen by day 67) is your statistically justified attribution window.
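Both checks are simple to script. The sketch below computes the maximum swing per channel from the window comparison and derives a window length from first-touch-to-conversion lags. The function names and the `lags` sample are hypothetical, and reading "the point where the curve flattens" as a fixed coverage percentile (95% here) is an assumption rather than the only reasonable definition.

```python
def max_swing(credit_by_window):
    """Percentage-point spread of each channel's credit across windows."""
    return {ch: max(vals) - min(vals) for ch, vals in credit_by_window.items()}

def window_from_lags(lags_days, coverage=0.95):
    """Smallest window (days) that contains `coverage` of conversion lags."""
    ordered = sorted(lags_days)
    n = len(ordered)
    for i, lag in enumerate(ordered, start=1):
        if i / n >= coverage:
            return lag
    raise ValueError("lags_days must not be empty")

# Credit (%) under 30, 60, and 90 day windows, taken from the table above
credit = {
    "Paid Search": [35, 32, 30],
    "Paid Social": [18, 26, 28],
    "Organic Search": [15, 14, 15],
    "Display": [10, 8, 8],
}
swings = max_swing(credit)
unstable = [ch for ch, pp in swings.items() if pp > 20]
print(swings, unstable)

# Hypothetical lags: days from first touch to conversion for 100 converters
lags = list(range(1, 101))
print(window_from_lags(lags))
```

With the table's figures, no channel exceeds the 20 percentage point rule, so the window choice would not be flagged as arbitrary.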
For B2B marketing, industry benchmarks suggest 60-90 day attribution windows due to multi-stakeholder decision cycles. Research shows B2B buying committees involve 11-20 people, and enterprise deals often require 3-6 months of nurturing. A 30-day window will systematically under-credit top-of-funnel awareness channels like LinkedIn ads, content syndication, and webinars that influence early-stage research but don't directly precede conversions.
Note: Advanced attribution modeling solutions can acknowledge multiple users within one business account. They follow them as a single unit. This is crucial for tracking B2B buying cycles. The customer is a company, not a single role within it. Improvado marketing attribution solution uses an identity graph data structure. It maps interactions from different stakeholders within a target organization. This allows marketers to better understand the collective decision-making process. It also more accurately attributes credit to marketing efforts. These efforts influence the buying journey of the entire organization. Tailor your marketing strategy to the unique dynamics of B2B sales cycles with Improvado.
Analytics tools like Google Analytics help you decide on attribution windows. Google Analytics has a default 30-day conversion window, or as the company calls it a "lookback window", for acquisition conversion events and a default 90-day window for other events. A user is free to set custom windows for as little as seven days.
Select a Data-Driven Attribution Model as a Foundation
There are two widely accepted data-driven models for attribution: Shapley value model and Markov chain model. The inputs needed for both models are the touchpoints and conversions, which, as stated above, are part of the data that you will import into your database.
Using the Shapley Value Attribution Model
The Shapley value model is named after Nobel Prize-winning economist Lloyd Shapley. It is a game theory model for cooperative problems that assigns credit to the contributing parties according to the value each one contributed. Marketing attribution asks a similar question: how much credit does each marketing channel deserve for user conversions along the conversion path?
The Shapley value model is used by Google for their own data-driven attribution model in Google Analytics 360. However, creating a custom attribution model offers greater control and lets you avoid biases that Google Analytics might have, such as giving more credit to Google Search.
To calculate the contribution of a channel under the Shapley value model, we compare all the different permutations of paths and touchpoints that occurred. For example, take two paths that differ by a single touchpoint: the difference in total value is assigned to that extra touchpoint, since it is the only difference between the two. We then repeat this across all permutations and assign conversion credit to each channel accordingly. Thus, the model calculates the probability of conversion when a specific channel is present in the conversion path.
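A small worked example makes the marginal-contribution idea concrete. The sketch below computes Shapley values for a set of channels from conversion counts. It rests on a modeling assumption that is not specified above: a coalition of channels is worth the conversions from every observed path whose channels all belong to that coalition. The function name and the sample counts are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(conversions_by_set):
    """conversions_by_set: {frozenset of channels: conversions observed}."""
    channels = sorted(set().union(*conversions_by_set))
    n = len(channels)

    def value(coalition):
        # Assumed characteristic function: conversions reachable using
        # only the channels in this coalition
        return sum(c for s, c in conversions_by_set.items() if s <= coalition)

    shapley = {}
    for ch in channels:
        rest = [c for c in channels if c != ch]
        total = 0.0
        for k in range(n):
            # Probability that exactly these k channels precede `ch`
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(rest, k):
                s = frozenset(subset)
                total += weight * (value(s | {ch}) - value(s))
        shapley[ch] = total
    return shapley

# Hypothetical counts: 10 search-only, 4 social-only, 6 using both channels
data = {
    frozenset({"search"}): 10,
    frozenset({"social"}): 4,
    frozenset({"search", "social"}): 6,
}
result = shapley_values(data)
print(result)
```

Under this value function the six joint conversions are split evenly, so search is credited with 13 and social with 7, and the credits add up to the 20 observed conversions.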
- Computational Complexity and Data Requirements: Shapley requires calculating all possible path permutations, which creates O(n!) computational complexity—factorial growth that limits practical use to fewer than 15 channels. With 10 channels, the model must evaluate 3.6 million permutations; with 15 channels, that number explodes to 1.3 trillion calculations.
- When to Choose Markov vs. Shapley: Use Markov over Shapley when channel order matters—for example, when you have distinct awareness → consideration → conversion stage sequences and need to understand how early-funnel touchpoints enable later conversions. Markov's first-order chain structure explicitly models transition probabilities between channels, making it better suited for sequential influence.
- Minimum Data Requirements: Markov models need 300+ conversions per month and 30+ unique path variations for statistical reliability. Below these thresholds, rare path patterns get disproportionate weight and credit assignments become volatile.
Implementing Markov Attribution in Python
The implementation code for Markov is nearly identical to Shapley. The markov_model() function calculates removal effects, but it uses transition matrices instead of permutation comparisons.
import pandas as pd
from ChannelAttribution import markov_model

df = pd.DataFrame({
    'path': [
        'paid_search > email > paid_social',
        'organic_search > paid_search',
        'paid_social > email > organic_search > paid_search'
    ],
    'conversions': [15, 22, 8],
    'revenue': [18750, 27500, 10000]
})

# order=1 means first-order Markov (each transition depends only on previous state)
results = markov_model(df, 'path', 'conversions', var_value='revenue',
                       sep='>', order=1)
print(results)  # Channel-level attributed conversions and revenue

The key parameter is order: order=1 (first-order Markov) assumes each touchpoint's influence depends only on the immediately preceding touchpoint. order=2 (second-order) considers the two preceding touchpoints but requires significantly more data, generally 1,000+ conversions per month for stability.
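To see what a removal effect is, it can help to compute one by hand on a tiny dataset. The sketch below is a from-scratch, first-order illustration using only the standard library. It is not how the ChannelAttribution package works internally, the function names are hypothetical, and the non-converting path counts are invented for the example, since removal effects are only informative when the data includes journeys that did not convert.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

def transition_probs(paths):
    """paths: list of (channels, conversions, non_conversions) tuples."""
    counts = defaultdict(lambda: defaultdict(float))
    for channels, conv, null in paths:
        states = [START] + list(channels)
        for a, b in zip(states, states[1:]):
            counts[a][b] += conv + null
        counts[states[-1]][CONV] += conv
        counts[states[-1]][NULL] += null
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def conversion_prob(probs, removed=None, iterations=500):
    """P(reach CONV from START); traffic entering `removed` counts as lost."""
    p = {state: 0.0 for state in probs}
    for _ in range(iterations):  # fixed-point iteration on absorption probabilities
        updated = {}
        for state, nxt in probs.items():
            if state == removed:
                updated[state] = 0.0
                continue
            total = 0.0
            for target, pr in nxt.items():
                if target == CONV:
                    total += pr
                elif target not in (NULL, removed):
                    total += pr * p[target]
            updated[state] = total
        p = updated
    return p[START]

def markov_attribution(paths):
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    removal = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total_conv = sum(conv for _, conv, _ in paths)
    scale = total_conv / sum(removal.values())  # normalize effects to conversions
    return base, removal, {c: r * scale for c, r in removal.items()}

# Conversion counts mirror the example above; non-conversion counts are hypothetical
paths = [
    (["paid_search", "email", "paid_social"], 15, 35),
    (["organic_search", "paid_search"], 22, 28),
    (["paid_social", "email", "organic_search", "paid_search"], 8, 42),
]
base, removal, attributed = markov_attribution(paths)
print(round(base, 3), removal, attributed)
```

Each channel's attributed conversions are its removal effect divided by the sum of all removal effects, multiplied by total conversions, so the per-channel figures always add back up to the observed total.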
Shapley Value and Markov Chain vs. Rule-Based Attribution Models
In both the Shapley and the Markov models, the output is a table of all marketing channels showing the credit (or probability) assigned to each channel for the conversions it helped drive.
| Channel | Last-click conversions | Data-driven model conversions |
|---|---|---|
| Paid search | 5 | 3.7 |
| Social media | 3 | 4.4 |
| Newsletter | 2 | 1.9 |
| Total | 10 | 10 |
The above table is an example of the output of a custom attribution model compared to a last-click model. Note that the total number of conversions is the same for both models; what changes is the allocation between the channels. Custom attribution models can have fractional conversions, since credit for a conversion is shared among multiple channels.
You can also calculate the revenue and ROI for each of the channels since you have conversions, revenue and marketing cost in your database. This will help you allocate your marketing budget across channels.
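As a rough illustration of that calculation, the sketch below turns attributed conversions into revenue and ROI per channel. The attributed conversion figures come from the table above. The average order value and the channel costs are hypothetical placeholders, and ROI is computed as (revenue - cost) / cost, which is one common definition.

```python
attributed = {"Paid search": 3.7, "Social media": 4.4, "Newsletter": 1.9}

avg_order_value = 100.0  # hypothetical revenue per conversion
cost = {"Paid search": 200.0, "Social media": 220.0, "Newsletter": 50.0}  # hypothetical

def channel_roi(attributed, avg_order_value, cost):
    """Attributed revenue and ROI per channel."""
    report = {}
    for channel, conversions in attributed.items():
        revenue = conversions * avg_order_value
        report[channel] = {
            "revenue": revenue,
            "roi": (revenue - cost[channel]) / cost[channel],
        }
    return report

report = channel_roi(attributed, avg_order_value, cost)
for channel, row in report.items():
    print(f"{channel}: revenue={row['revenue']:.0f}, ROI={row['roi']:.2f}")
```

If your warehouse stores revenue per conversion, attributing the actual revenue of each conversion is preferable to multiplying by an average.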
Choosing Between Shapley and Markov: Decision Matrix
Most practitioners struggle with model selection because both Shapley and Markov are described in purely theoretical terms without practical decision criteria. Use this matrix to determine which model fits your data and business context:
| Scenario | Recommended Model | Reasoning |
|---|---|---|
| 300-500 conversions/month, 5-10 channels | Markov | More data-efficient, handles moderate volume better than Shapley |
| 500+ conversions/month, 5-15 channels | Either (test both) | Sufficient data for both—compare results and pick more stable one |
| Clear funnel stages (awareness→consideration→purchase) | Markov | Transition probabilities capture sequential influence better |
| Touchpoint order is arbitrary (e.g., multi-channel retargeting) | Shapley | Permutation-invariant credit assignment doesn't overweight order |
| 15+ channels tracked | Markov | Shapley's factorial complexity becomes computationally prohibitive |
| Subscription renewal attribution | Shapley or segmented model | Markov overweights retargeting in renewal scenarios; segment new vs. renewal |
| Many rare/unique paths (high diversity) | Markov | Handles rare path variations better than Shapley's permutation approach |
| <100 conversions/month | Rule-based (W-shaped, linear) | Insufficient data for any data-driven model—use predetermined weights |
When Attribution Models Fail: Common Failure Modes and Fixes
Attribution models break in predictable ways when underlying assumptions are violated or data quality degrades. Recognizing failure symptoms early prevents incorrect budget decisions based on flawed credit assignments.
Failure Mode #1: Insufficient Conversion Volume
• Symptom: Model assigns 60-80% of credit to a single channel despite multi-channel campaigns running. Credit allocation shifts by >20 percentage points when retraining on the next month's data.
• Root cause: Fewer than 300 monthly conversions creates statistical noise that overwhelms true channel effects. The model overfits to dominant path patterns rather than learning generalizable influence.
• How to detect: Calculate coefficient of variation (standard deviation ÷ mean) for each channel's monthly attributed conversions over 3 months. If CV > 0.5 for multiple channels, your model is unstable.
• Fix: Aggregate data across 2-3 months to increase sample size. Alternatively, switch to rule-based attribution (linear, time-decay, or W-shaped) until conversion volume grows above the threshold.
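The coefficient of variation check can be run with the standard library alone. In this sketch the monthly figures are hypothetical, and the sample standard deviation (`statistics.stdev`) is used as an assumption, since the detection rule does not say whether sample or population deviation is intended.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation divided by mean for one channel's monthly credit."""
    return stdev(values) / mean(values)

# Hypothetical attributed conversions per channel over three months
monthly_attributed = {
    "paid_search": [120, 118, 125],
    "paid_social": [50, 100, 150],
    "display": [10, 45, 5],
}

cv = {ch: coefficient_of_variation(v) for ch, v in monthly_attributed.items()}
unstable_channels = [ch for ch, value in cv.items() if value > 0.5]
print(cv, unstable_channels)
```

With only three monthly observations the estimate itself is noisy, so treat the 0.5 cutoff as a screening rule rather than a statistical test.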
Failure Mode #2: New Channel Launch Bias
• Symptom: A newly launched channel (e.g., TikTok ads starting January 1st) receives near-zero attribution credit in January, despite driving traffic and appearing in conversion paths.
• Root cause: Data-driven models learn channel weights from historical conversion patterns. A channel with no prior history has no statistical basis for credit assignment—the model defaults to minimal weight.
• How to detect: Compare new channel's appearance frequency in paths (e.g., present in 25% of conversions) vs. attributed credit (e.g., 2% of conversions). If credit is <50% of appearance frequency, the model is underweighting it.
• Fix: For the first 30-60 days after launch, assign new channels credit using a rule-based proxy model (e.g., linear or position-based). After 60+ days of data accumulation, retrain your data-driven model to incorporate the new channel's actual influence patterns.
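The detection rule for this failure mode reduces to a single ratio. The sketch below applies it to the figures from the bullet above (present in 25% of converting paths, credited with 2% of conversions). The function name is hypothetical.

```python
def is_underweighted(path_share, credit_share, min_ratio=0.5):
    """Flag a channel whose credit is below `min_ratio` of its path presence."""
    if path_share <= 0:
        raise ValueError("channel never appears in converting paths")
    ratio = credit_share / path_share
    return ratio, ratio < min_ratio

ratio, flagged = is_underweighted(path_share=0.25, credit_share=0.02)
print(f"credit is {ratio:.0%} of appearance frequency, flagged={flagged}")
```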
Failure Mode #3: Seasonality Distortion
• Symptom: Black Friday, Cyber Monday, or holiday shopping periods show 10x normal conversion volume. Channels that drove this spike (e.g., promotional email) get permanently over-credited in annual models.
• Root cause: Data-driven models weight all conversions equally. If 40% of annual conversions happen in November-December, those months' channel mix dominates credit assignment for the entire year.
• How to detect: Plot monthly conversion volume and check for >3x spikes. If spike months represent >25% of annual volume, seasonality is distorting your model.
• Fix: Train separate attribution models for high-season (e.g., Q4) and normal-season (Q1-Q3) periods. Apply the appropriate model when forecasting or optimizing budgets for each period.
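The seasonality check can be automated along these lines. This is a sketch under one stated assumption: "normal" volume is taken to be the median month, which the detection rule above does not specify. The monthly figures are hypothetical.

```python
from statistics import median

def seasonality_check(monthly_conversions, spike_multiple=3.0, share_limit=0.25):
    """Return (spike month indexes, their share of volume, distortion flag)."""
    baseline = median(monthly_conversions)  # assumed definition of normal volume
    spikes = [i for i, v in enumerate(monthly_conversions)
              if v > spike_multiple * baseline]
    share = sum(monthly_conversions[i] for i in spikes) / sum(monthly_conversions)
    return spikes, share, share > share_limit

# Hypothetical year: ten ordinary months, then November and December peaks
volumes = [100] * 10 + [400, 450]
spikes, share, distorted = seasonality_check(volumes)
print(spikes, round(share, 3), distorted)
```

When the flag is raised, the spike months are candidates for the separate high-season model described in the fix.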
Failure Mode #4: Attribution Window Too Short for Purchase Cycle
• Symptom: A B2B software company uses a 30-day attribution window. Paid search dominates credit at 70%+, while LinkedIn ads and webinars receive minimal credit. However, the sales team reports that most deals start with LinkedIn or webinar engagement.
• Root cause: B2B purchase cycles often span 60-180 days. A 30-day window only captures bottom-funnel conversions and systematically excludes top-of-funnel touchpoints that happened 45-60 days before deal close.
• How to detect: Query your CRM for average time from first touchpoint to closed-won deal. If this exceeds your attribution window by >30%, you're systematically under-crediting early-stage channels.
• Fix: Extend attribution window to 60-90 days for B2B. Run sensitivity analysis (as described in Step #3) to find the window that captures 90%+ of pre-conversion touchpoints.
Failure Mode #5: Missing Impression Data (Social/Display Blindness)
• Symptom: Facebook and Instagram receive 2-5% attribution credit despite brand lift studies showing they drive 30-40% awareness. Display retargeting gets near-zero credit.
• Root cause: Click-based attribution models only capture touchpoints that generate URL visits. Users who see Facebook ads 5-10 times but don't click until a later Google search appear as single-touchpoint conversions, with Facebook receiving zero credit.
• How to detect: Compare attributed conversions vs. platform-reported view-through conversions. If Facebook Ads Manager reports 400 view-through conversions but your model credits Facebook with 50 conversions, impression influence is missing.
• Fix: Run quarterly lift tests (described below) for impression-heavy channels. Use lift test results to apply a calibration factor: if lift tests show Facebook drives 35% incrementality but your model credits 15%, multiply Facebook's attributed credit by 2.33× to correct for missing impression influence.
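The calibration arithmetic from the fix can be written out as follows. The 35% and 15% figures come from the example above, while the function name and the 50 attributed conversions are hypothetical.

```python
def calibration_factor(lift_share, model_share):
    """Ratio used to scale a channel's modeled credit toward its measured lift."""
    if model_share <= 0:
        raise ValueError("model_share must be positive")
    return lift_share / model_share

factor = calibration_factor(lift_share=0.35, model_share=0.15)
model_conversions = 50  # hypothetical conversions the model credits to Facebook
calibrated = model_conversions * factor
print(round(factor, 2), round(calibrated, 1))
```

Note that scaling one channel up means channel totals no longer add up to observed conversions unless the other channels are scaled down in proportion. How to rebalance is a judgment call that the rule above leaves open.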
How to Run a "Lift Test"
The models and data described above capture touchpoints via UTM tags. UTM tags are only recorded on clicks, which means that impression-driven channels (mainly social media) will be underrepresented because impressions are not captured.
This also has a similar impact on display advertising. Visitors mostly convert after viewing your display ads multiple times. These views occur across different content networks.
In order to incorporate impressions into your model, you should consider running lift tests. Channels like Facebook and Instagram rely on impressions more than other channels.
A lift test is a randomized control test. You randomize an audience into a test group and a control group. Ads are shown only to the test group. The difference in conversions between the two groups is known as lift or incrementality. This represents the real impact of a channel's ads on the audience. This approach is based on randomized control trials. It incorporates the concept of causality. This means we know the ads caused the extra conversions.
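The lift calculation itself is short. The sketch below compares conversion rates between the two groups and derives incremental conversions. The group sizes and conversion counts are hypothetical, and the code deliberately leaves out significance testing, which a real lift study would need before acting on the result.

```python
def lift_test(test_users, test_conversions, control_users, control_conversions):
    """Absolute lift, relative lift, and incremental conversions."""
    rate_test = test_conversions / test_users
    rate_control = control_conversions / control_users
    absolute_lift = rate_test - rate_control
    return {
        "rate_test": rate_test,
        "rate_control": rate_control,
        "absolute_lift": absolute_lift,
        "relative_lift": (absolute_lift / rate_control
                          if rate_control else float("inf")),
        # Conversions in the test group that would not have happened without ads
        "incremental_conversions": absolute_lift * test_users,
    }

# Hypothetical experiment: 10,000 users per group
outcome = lift_test(test_users=10_000, test_conversions=300,
                    control_users=10_000, control_conversions=200)
print(outcome)
```

In this example the ads add roughly 100 conversions on top of a 2% baseline, which is a 50% relative lift.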
A good practice is to regularly run lift tests. For example, run them once a quarter. This helps you see the effect of Facebook and Instagram on the conversion journey. Other impression-heavy channels should also be evaluated. Calibrate your attribution model accordingly based on results.
Note: In cases where you run ads and purchases happen on your website, Google allows you to track view-through conversions. You can push them into your data warehouse in aggregated format through Ads Data Hub. On the other hand, if you run an ad campaign but purchases occur on a marketplace (like Amazon), then marketing mix modeling is your go-to option for assessing your campaign's efficacy.
With and Without Lift Tests
Both attribution models and lift tests are useful and should work in conjunction to give the best possible results. Here are some of the advantages and limitations of both tools.
| Running Lift Test | Attribution Model without Lift Tests |
|---|---|
| Accurate results | Approximation based on model |
| One data point in time | Can be used on a daily basis |
| Holdout | No holdout |
| Based on results, not on arbitrary rules | Rule-based unless you build a data-driven model |
| Causality | Correlation |
| Baseline (organic, brand effect) is taken into account | Gives little to no credit to organic |
| Impressions are taken into account (but not segregated) | Impressions hard to track (depends on data quality) |
Build vs. Buy: True Cost of Custom Attribution Implementation
Building a custom attribution model in-house requires substantial investment beyond the initial development sprint. Most teams underestimate ongoing maintenance costs, data infrastructure expenses, and opportunity cost of delayed insights.
| Cost Category | DIY Build | Attribution Platform |
|---|---|---|
| Initial build time | 12-18 weeks (1 FTE data engineer) | Implementation in days with platforms like Improvado |
| Upfront development cost | $80,000-$120,000 (engineer salary + opportunity cost) | Included in annual subscription |
| Data warehouse storage | $200-$800/month (Snowflake/BigQuery for path-level data) | Included or minimal incremental cost |
| ETL/data pipeline maintenance | 8-12 hours/month (connector breaks, schema changes) | Zero—vendor maintains 1,000+ connectors |
| Model retraining & validation | 4-6 hours/month (retrain on new data, validate outputs) | Automated—runs nightly or weekly |
| Analyst training | 8-16 hours (SQL queries, model interpretation) | 2-4 hours (platform UI training) |
| Opportunity cost of delayed insights | 3-4 months without attribution data during build | Immediate—attribution reports within first week |
| Total first-year cost | $95,000-$140,000 | Contact sales for custom pricing |
Hidden costs that teams frequently overlook:
• API rate limits: Facebook, Google, LinkedIn APIs throttle requests. Building retry logic and handling rate limit errors adds 20-30% to development time.
• Schema changes: Ad platforms change their data schemas 2-4 times per year. Each breaking change requires emergency fixes to prevent data gaps.
• Cross-device identity resolution: Building deterministic ID matching requires login tracking infrastructure. Probabilistic matching requires machine learning models and continuous calibration.
• Model debugging: When attribution results look wrong (e.g., retargeting credited with 80% of conversions), tracing the error takes significant time. Data pipelines, model logic, and calculation steps must all be examined. This debugging process typically requires 8-16 hours per incident.
• Stakeholder trust: Leadership questions custom models ("Why should we trust your black box?"). Building audit trails, documentation, and validation reports to earn buy-in takes significant time.
For teams with >$2M annual marketing spend, attribution platforms deliver ROI within 60-90 days. They surface optimization opportunities that DIY builds wouldn't discover for 6+ months. The 3-4 month delay of a custom build often costs more in suboptimal budget allocation than the platform subscription fee.
Model Conflict Resolution: When Attribution Models Disagree
One of the most frustrating practitioner problems is when different attribution approaches produce conflicting channel credit. Shapley assigns Facebook 40% of conversions, Markov assigns 25%, and a recent lift test measured 30% incrementality—which number should guide budget decisions?
This happens because each method measures a different aspect of channel influence:
• Shapley value: Measures average marginal contribution across all possible channel combinations
• Markov chain: Measures removal effect—how many conversions disappear if you eliminate this channel from paths
• Lift tests: Measure true causal incrementality via randomized experiment
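To make the difference between the first two methods concrete, here is a minimal sketch that computes both on the same toy journeys. The paths are invented for illustration, and two modeling choices are assumptions rather than the only valid options: the Shapley coalition value counts conversions from paths that use only the coalition's channels, and the Markov chain is first-order with the removed channel treated as a dead end.

```python
from collections import defaultdict
from itertools import combinations
from math import factorial

# Toy journeys: (ordered channel touches, converted?). Illustrative data only.
PATHS = [
    (("search", "facebook", "email"), True),
    (("facebook", "email"), True),
    (("search",), True),
    (("facebook",), False),
    (("search", "email"), False),
    (("email",), False),
]
CHANNELS = sorted({ch for path, _ in PATHS for ch in path})


def shapley_shares(paths, channels):
    """Average marginal contribution of each channel across all coalitions."""
    def value(coalition):
        # Assumed coalition value: conversions from paths using only these channels.
        return sum(1 for path, conv in paths if conv and set(path) <= coalition)

    n = len(channels)
    credit = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {ch}) - value(coalition))
        credit[ch] = total
    grand_total = sum(credit.values())
    return {ch: c / grand_total for ch, c in credit.items()}


def conversion_probability(paths, removed=None):
    """P(reaching 'conv' from 'start') in a first-order chain built from the paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for path, conv in paths:
        states = ["start", *path, "conv" if conv else "null"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    prob = defaultdict(float)  # unknown states, 'null', and the removed channel stay at 0
    prob["conv"] = 1.0
    for _ in range(200):  # fixed-point iteration; plenty for small chains
        for state, nxt in counts.items():
            if state == removed:
                continue  # a removed channel never leads to conversion
            total = sum(nxt.values())
            prob[state] = sum(k / total * prob[b] for b, k in nxt.items())
    return prob["start"]


def markov_shares(paths, channels):
    """Normalized removal effect: share of conversions lost when a channel is removed."""
    base = conversion_probability(paths)
    effects = {ch: 1 - conversion_probability(paths, removed=ch) / base for ch in channels}
    total = sum(effects.values())
    return {ch: e / total for ch, e in effects.items()}


shapley = shapley_shares(PATHS, CHANNELS)
markov = markov_shares(PATHS, CHANNELS)
for ch in CHANNELS:
    print(f"{ch:10s} shapley={shapley[ch]:.3f} markov={markov[ch]:.3f}")
```

On this toy data the two methods rank the channels differently: Shapley puts search first, while the removal effect puts email first because email sits directly before most conversions in the chain. The inputs are identical, so the gap comes entirely from what each method measures.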
When models conflict by <20 percentage points, they're all approximating the same underlying truth and you can use any of them confidently. When models conflict by >20 percentage points, work through the scenarios in this table:
| Scenario | Which Model to Trust | Reasoning |
|---|---|---|
| You have a recent lift test (<90 days old) | Lift test result | Randomized experiments measure causality; attribution models measure correlation. Causality wins. |
| No recent lift test; Shapley and Markov agree within 10pp | Average of Shapley & Markov | Convergence between methods suggests reliable estimate; averaging reduces model-specific bias. |
| Markov credits retargeting 60%+, Shapley credits 30% | Trust Shapley; discount Markov | Markov overweights last-touch channels like retargeting; Shapley is position-invariant. |
| Shapley credits new channel 5%, but it appears in 25% of paths | Ignore Shapley for new channel; use linear proxy | Shapley underweights channels without conversion history; wait 60 days then retrain. |
| Impression-heavy channel (Facebook/display) gets <5% credit | Run lift test to calibrate | Click-based models miss impression influence; only lift tests capture view-through impact. |
| Models trained on <300 conversions/month | Don't trust any data-driven model | Insufficient data for statistical validity; use rule-based (W-shaped, linear) until volume grows. |
When in doubt, use attribution models as directional signals. Don't treat them as absolute truth. If Markov says Facebook drives 30% of conversions and Shapley says 40%, the takeaway is simple. Facebook is a major contributor (30-40%). You don't need to determine if it's 32.7% or 38.4%. Optimize for the right order of magnitude. Avoid false precision.
Setting Up a Custom Attribution Model with Improvado
For enterprise marketing teams, Improvado provides a managed attribution solution. It combines data integration, model training, and visualization in one platform. This eliminates the need for 12-18 weeks of custom development. Teams get production-ready attribution immediately.
How Improvado's Attribution Solution Works:
Improvado connects to 1,000+ marketing and sales data sources. These include Google Ads, Meta, LinkedIn, Salesforce, HubSpot, and analytics platforms. Pre-built connectors automatically extract campaign performance data. They also extract conversion events and revenue data. The platform aggregates this data into a unified schema. It includes proper user-level tracking and touchpoint sequencing. This eliminates the manual ETL work described in Steps 1-2 above.
The attribution engine supports multiple model types: rule-based models (first-touch, last-touch, linear, time-decay, U-shaped, W-shaped) and algorithmic data-driven models (Markov chain, Shapley value). You can switch between models in the UI to compare results without rewriting code or retraining manually.
Key differentiators for enterprise use cases:
• Identity graph for B2B multi-stakeholder attribution: Improvado maps interactions from different personas within a target account, such as the CMO, CFO, and procurement. It attributes conversions at the account level rather than the individual user level. This solves the B2B buying committee problem described earlier.
• Fully auditable attribution: Unlike black-box platforms, Improvado provides SQL database access. Analysts can verify exactly how credit is calculated, audit model logic, and build custom attribution rules on top of the base model.
• No-code for marketers, full-code for analysts: Non-technical users access pre-built attribution dashboards. Data teams can write custom SQL queries, export raw path-level data, and integrate attribution results into existing BI tools (Looker, Tableau, Power BI).
• Custom connector builds: If you need attribution data from proprietary internal tools or niche platforms, Improvado builds custom connectors in days. This eliminates the need for your engineering team to maintain bespoke integrations.
Implementation timeline: Improvado customers are typically operational within a week—connecting data sources, validating data quality, and generating initial attribution reports. This compares to 12-18 weeks for DIY builds or 4-8 weeks for partial vendor solutions that still require significant custom development.
Limitations: Like all managed platforms, Improvado's attribution models have fixed algorithmic implementations. If your business requires highly specialized attribution logic, such as custom weighting rules based on customer segment, product category, or rep seniority, you may need to export raw data and build those customizations in SQL on top of Improvado's base models. However, for 90%+ of enterprise use cases, the pre-built model library covers the necessary requirements without custom code.
For teams evaluating build-vs-buy decisions, Improvado offers rapid deployment, multi-model flexibility, and enterprise data governance. Improvado holds SOC 2 Type II, HIPAA, GDPR, and CCPA certifications. This makes it a strong alternative to multi-month custom development projects. Get a demo to see how attribution models perform on your actual marketing data.
Conclusion
Building a custom marketing attribution model is not a weekend project. It's a multi-month engineering initiative. It requires clean data infrastructure, sufficient conversion volume, and ongoing maintenance. Most teams should start by validating minimum thresholds. You need 300+ conversions/month, 5+ channels, and 80%+ complete path tracking. Meet these before committing resources to custom model development.
For teams that pass the feasibility gate, the process follows three stages. First, structure attribution data with proper schema (user_id, timestamp, channel, conversion_flag, revenue). Second, centralize that data in a warehouse with volume validation. Third, configure attribution windows based on actual purchase cycle data rather than guesses. Data-driven models like Markov chains and Shapley values provide more accurate credit assignment than rule-based alternatives. However, they only work when trained on sufficient data. They must also be validated against lift tests.
The most common failure mode is starting too early. Attempting to build data-driven attribution with 100 conversions per month, or with only 3 marketing channels, produces unreliable results, and unreliable results damage stakeholder trust. If you don't meet minimum thresholds, implement rule-based attribution immediately. Use W-shaped models for B2B. Use linear models for e-commerce. Fix data collection gaps now. Accumulate 3-6 months of clean conversion paths. Then revisit data-driven models.
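The feasibility gate and its fallback can be written down as a short check. The sketch below uses the thresholds stated in this conclusion; the function name, the `business_type` labels, and the choice to default anything other than "b2b" to a linear model are assumptions for illustration.

```python
def recommend_approach(monthly_conversions, channel_count, path_completeness, business_type):
    """Return (approach, failed_checks) using the thresholds stated in the text.

    path_completeness is the fraction (0-1) of conversions with a fully tracked path.
    business_type is a hypothetical label: "b2b" or "ecommerce".
    """
    checks = {
        "300+ conversions/month": monthly_conversions >= 300,
        "5+ channels": channel_count >= 5,
        "80%+ complete paths": path_completeness >= 0.80,
    }
    failed = [name for name, ok in checks.items() if not ok]
    if not failed:
        return "data-driven (Markov or Shapley)", failed
    # Assumed default: anything that is not B2B falls back to a linear model.
    fallback = "W-shaped" if business_type == "b2b" else "linear"
    return f"rule-based ({fallback})", failed


# The failure case described above: 100 conversions/month and 3 channels.
print(recommend_approach(100, 3, 0.90, "b2b"))
# A team that clears all three thresholds.
print(recommend_approach(450, 6, 0.85, "ecommerce"))
```

Returning the list of failed checks alongside the recommendation makes it clear which data collection gap to fix before revisiting data-driven models.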
For enterprise teams with complex multi-channel ecosystems, attribution platforms like Improvado eliminate the 12-18 week build timeline. They also reduce ongoing maintenance burden. These platforms provide immediate access to multiple attribution models. They offer audit trails and B2B account-level tracking. The opportunity cost of delayed insights often exceeds the cost of a managed solution. Four months without attribution data means four months of suboptimal budget allocation.
Whether you build or buy, the goal is the same. Move from "which platform reported this conversion?" to "which touchpoints statistically increased conversion probability?" That shift requires more than software. It requires clean data, statistical rigor, and honest acknowledgment of what your models can and cannot tell you about causality.