Attribution is the marketing analyst’s most frequently-asked and most-poorly-defined task. "Which channel drove the purchase?" — five stakeholders, five different mental models, one dashboard that has to pick one. This post is the SQL side of that problem: how each of the five common attribution models actually gets computed, where each one is honest, where each one lies, and the three biases that warp every model regardless of which one you pick.
Shared data model for every example: fact_sessions(session_id, customer_id, channel, session_at) tracks every touchpoint; fact_purchases(purchase_id, customer_id, amount, purchased_at) tracks conversions. A "customer journey" is the ordered set of sessions for a given customer up to a given purchase.
Attribution is a definitional question, not a SQL question
Before you write any SQL, write down the definition you’re being asked to operationalize. "Which channel drove the purchase?" is not a definition — it’s a debate. The five models below are each a concrete answer to that debate, and the model you pick will change your dashboard numbers by 30–60% on the same underlying data.
Last-touch and first-touch
The two simplest models, and by far the most common in ad-hoc reporting. Last-touch assigns the full conversion credit to the most recent channel before the purchase; first-touch assigns it to the channel that opened the journey.
-- Last-touch attribution by channel
WITH journey AS (
SELECT
p.purchase_id,
p.amount,
s.channel,
ROW_NUMBER() OVER (
PARTITION BY p.purchase_id
ORDER BY s.session_at DESC
) AS rn
FROM fact_purchases p
JOIN fact_sessions s
ON s.customer_id = p.customer_id
AND s.session_at <= p.purchased_at
)
SELECT channel, SUM(amount) AS attributed_revenue
FROM journey
WHERE rn = 1 -- most recent session only
GROUP BY channel
ORDER BY attributed_revenue DESC;First-touch is structurally identical — swap ORDER BY s.session_at DESC to ASC. Both are trivially fast, deeply biased, and the right answer for specific questions. Last-touch over-credits bottom-of-funnel channels (direct, brand search). First-touch over-credits awareness channels (display, content). Neither tells you the truth about a multi-touch journey.
Linear attribution
Linear gives every touchpoint in the journey equal credit. If a purchase had 4 sessions before it — Google, email, direct, paid social — each channel gets 25% of the revenue. Harder to manipulate than first/last-touch, much harder to defend ("why is email getting credit for a direct-type-in purchase?") but structurally more honest about multi-touch reality.
-- Linear attribution: divide the purchase amount across every touch
WITH journey AS (
SELECT
p.purchase_id,
p.amount,
s.channel,
COUNT(*) OVER (PARTITION BY p.purchase_id) AS touches
FROM fact_purchases p
JOIN fact_sessions s
ON s.customer_id = p.customer_id
AND s.session_at <= p.purchased_at
)
SELECT
channel,
SUM(amount * 1.0 / touches) AS attributed_revenue
FROM journey
GROUP BY channel
ORDER BY attributed_revenue DESC;Position-based and time-decay
Position-based (aka "U-shaped") gives 40% to first-touch, 40% to last-touch, and splits the remaining 20% across middle touches. Time-decay weights recent touches more heavily using a decay function (typically half-life of 7 days). These are the "compromise" models; they don’t make anyone fully happy but they don’t enrage anyone either.
-- Time-decay attribution: weight by exp(-ln(2) * days_before_purchase / 7)
-- Half-life of 7 days means a touch 7 days before the purchase gets 50%
-- of the credit of a touch on the purchase day.
WITH journey AS (
SELECT
p.purchase_id,
p.amount,
s.channel,
EXTRACT(EPOCH FROM (p.purchased_at - s.session_at)) / 86400.0 AS days_before
FROM fact_purchases p
JOIN fact_sessions s
ON s.customer_id = p.customer_id
AND s.session_at <= p.purchased_at
),
weighted AS (
SELECT
purchase_id,
channel,
amount,
EXP(-0.693 * days_before / 7.0) AS raw_weight
FROM journey
),
normalized AS (
SELECT
purchase_id,
channel,
amount,
raw_weight / SUM(raw_weight) OVER (PARTITION BY purchase_id) AS weight
FROM weighted
)
SELECT channel, SUM(amount * weight) AS attributed_revenue
FROM normalized
GROUP BY channel
ORDER BY attributed_revenue DESC;The normalization step is the one that’s easy to forget. If you don’t divide by the sum of weights per purchase, your total attributed revenue won’t equal total revenue — and a CFO who spot-checks that exact sum will catch you immediately.
The three biases every model inherits
Regardless of which model you pick, the underlying data has three biases that your SQL can’t fix. Knowing about them is the difference between a trusted attribution number and a questioned one.
- The direct-type-in problem. Sessions with
channel = 'direct'are a mix of genuine brand loyalty, expired UTM parameters, iOS privacy stripping, and apps that don’t pass referrer info. Direct is systematically over-counted in every model except first-touch. - The cross-device problem. A customer researches on phone, buys on desktop. Your sessions table has two customer_ids that look like two separate journeys. De-duplication is imperfect — and every attribution model on an un-dedupled sessions table is structurally biased away from awareness channels that run on one device and toward conversion channels that run on the other.
- The view-through problem. Display impressions that didn’t get clicked but influenced the decision don’t show up in
fact_sessionsat all. If your company spends heavily on display, first-touch and linear are both under-counting it, sometimes dramatically.
For every purchase, compute both last-touch AND first-touch attributed channel in one query. Then compute the percentage of purchases where the two models disagree. A high disagreement rate is the signal that your choice of attribution model matters a lot for this business.
fact_purchases(purchase_id, customer_id, amount, purchased_at), fact_sessions(session_id, customer_id, channel, session_at).
A single scalar: the share of purchases where first-touch channel ≠ last-touch channel, across all purchases that had at least one prior session.
Show solution
WITH journey AS (
SELECT
p.purchase_id,
FIRST_VALUE(s.channel) OVER (
PARTITION BY p.purchase_id ORDER BY s.session_at
) AS first_channel,
FIRST_VALUE(s.channel) OVER (
PARTITION BY p.purchase_id ORDER BY s.session_at DESC
) AS last_channel
FROM fact_purchases p
JOIN fact_sessions s
ON s.customer_id = p.customer_id
AND s.session_at <= p.purchased_at
),
deduped AS (SELECT DISTINCT purchase_id, first_channel, last_channel FROM journey)
SELECT
AVG(CASE WHEN first_channel <> last_channel THEN 1.0 ELSE 0 END) AS disagreement_rate
FROM deduped;A disagreement rate above 40% means the model choice is reshaping a material fraction of your reported numbers. At that point the right analyst move is not to pick a model — it’s to report two numbers side-by-side and force the conversation about which question you’re actually answering.