Marketing Analytics SQL Interview Questions
Marketing analyst interviews are not generic SQL screens. They test whether you can stitch a session to a purchase, calculate ROI against an attribution window you have to defend, build a cohort retention matrix without inflating the rate, and reason about the gap between what an ad platform reports and what your site actually measured. This guide walks the SQL patterns that show up in marketing analyst take-homes and whiteboards, with worked examples on a realistic five-table marketing schema and a 30-mission path that drills each pattern end to end.
If you are interviewing for a growth analyst seat, a CRM or lifecycle analytics role, a performance-marketing analyst at a DTC brand, or a demand-gen analytics seat on a B2B SaaS RevOps team, the patterns below are the ones that show up in the screening round and on the job.
What marketing analytics data actually looks like
A marketing data warehouse is a small star schema with three fact streams hanging off two dimensions. The dimensions are who (the customer) and what (the campaign). The facts are what we sent them (email or ad sends), what they did on the site (sessions), and what they bought (purchases). Almost every marketing SQL question is a join across two of those three facts, grouped by one of the dimensions, with an aggregation on top.
The caseSQL Marketing schema models that shape in five tables: dim_customers (500 rows), dim_campaigns (9 rows with channel and budget), fact_sends (1,012 email sends with opened and clicked flags), fact_sessions (5,000 web sessions with source and landing_page), and fact_purchases (914 orders, some of which are refunds — negative amount rows).
That fifth table is the trap. Refunds are not in a separate returns table the way a textbook would draw it — they are negative rows mixed into the same fact. A junior candidate sums amount and reports gross revenue; a senior candidate notices the negative values and reports net. Most of the missions in this path turn on a similar gap between what the data looks like at first glance and what it actually means.
Question patterns you’ll be asked
Across DTC, B2B, and agency marketing analyst interviews, eight patterns dominate. Each has a distinct SQL shape, and once you can recognize the shape you can almost always write the query. The worked examples below are deliberately adjacent to the exercises in the mission path — same skill, different angle — so you can read the example, then go practice the variant.
Campaign ROI and ROAS
“Which campaign has the best return on spend?” is the most common opener. ROAS is revenue divided by spend; ROI is profit divided by spend. The skill being tested is a LEFT JOIN from campaigns to purchases (so zero-revenue campaigns still appear), a SUM that respects refunds, and a divide-by-zero guard.
SELECT c.campaign_name,
c.channel,
c.budget,
ROUND(SUM(p.amount), 2) AS net_revenue,
ROUND(SUM(p.amount) / NULLIF(c.budget, 0), 2) AS roas
FROM dim_campaigns c
LEFT JOIN fact_purchases p ON c.campaign_id = p.campaign_id
GROUP BY c.campaign_id, c.campaign_name, c.channel, c.budget
ORDER BY roas DESC NULLS LAST;Practice this in Mission #12 — same divide-and-rank skill with an explicit CASE guard for campaigns with zero spend.
Email attribution within a window
“How many purchases did our last email actually drive?” is a date-arithmetic question with a hidden judgment call. The window — seven days, fourteen, thirty — is not a default. It is a definition you have to defend. The technical skill is a join with a temporal predicate; the senior skill is naming the choice out loud.
SELECT COUNT(DISTINCT p.purchase_id) AS attributed_purchases
FROM fact_sends s
JOIN fact_purchases p
ON p.customer_id = s.customer_id
AND p.purchase_date::date BETWEEN
s.sent_at::date AND s.sent_at::date + INTERVAL '7 days'
WHERE s.campaign_id = 1;Drill the seven-day version on Mission #16 — same temporal-join skill with an open question in the debrief about why a fourteen-day window would change the answer.
Multi-touch attribution: last-click vs first-click vs linear
Three attribution models on the same set of conversions, side by side, so marketing and finance can argue on the same data. The SQL pattern is parallel CTEs, each emitting one row per conversion with a different credit-assignment rule, then a final pivot. ROW_NUMBER() partitioned by customer gives you first and last touch; an even split across all touches gives you linear.
WITH touches AS (
SELECT p.purchase_id, p.amount, s.source,
ROW_NUMBER() OVER (PARTITION BY p.purchase_id ORDER BY s.session_start) AS rn_first,
ROW_NUMBER() OVER (PARTITION BY p.purchase_id ORDER BY s.session_start DESC) AS rn_last,
COUNT(*) OVER (PARTITION BY p.purchase_id) AS touch_count
FROM fact_purchases p
JOIN fact_sessions s
ON s.customer_id = p.customer_id
AND s.session_start <= p.purchase_date
)
SELECT source AS channel,
ROUND(SUM(CASE WHEN rn_last = 1 THEN amount END), 2) AS last_click_revenue,
ROUND(SUM(CASE WHEN rn_first = 1 THEN amount END), 2) AS first_click_revenue,
ROUND(SUM(amount / touch_count), 2) AS linear_revenue
FROM touches
GROUP BY source
ORDER BY last_click_revenue DESC;None of these three models is causal. They are credit-allocation rules. A senior candidate volunteers that distinction without being asked — last-click overweights bottom-funnel channels (brand search, retargeting), first-click overweights top-funnel (paid social, content), and linear smooths the difference without telling you which is right. The only causal measurement is incrementality (geo lift or holdout).
Build the full three-way comparison on Mission #27 — Master tier, with the channel-mix shift broken out so you can see which model flatters which channel.
Cohort retention matrix
The canonical growth-analyst question. Bucket users by the month of their first purchase, then count how many came back in each subsequent month. The shape is a triangle: each cohort gets a row, each months-since-first gets a column. The senior-vs-junior tell is the join type — a LEFT JOIN preserves the denominator; an INNER JOIN silently drops non-retained users and inflates the rate.
WITH first_purchase AS (
SELECT customer_id,
DATE_TRUNC('month', MIN(purchase_date::date)) AS cohort_month
FROM fact_purchases
WHERE amount > 0
GROUP BY customer_id
),
activity AS (
SELECT DISTINCT customer_id, DATE_TRUNC('month', purchase_date::date) AS active_month
FROM fact_purchases
WHERE amount > 0
)
SELECT fp.cohort_month,
(EXTRACT(YEAR FROM a.active_month) - EXTRACT(YEAR FROM fp.cohort_month)) * 12
+ (EXTRACT(MONTH FROM a.active_month) - EXTRACT(MONTH FROM fp.cohort_month))
AS months_since_first,
COUNT(DISTINCT a.customer_id) AS active_customers
FROM first_purchase fp
LEFT JOIN activity a ON a.customer_id = fp.customer_id
WHERE a.active_month >= fp.cohort_month
GROUP BY fp.cohort_month, months_since_first
ORDER BY fp.cohort_month, months_since_first;Build the full matrix on Mission #26 — Master tier, with a debrief on why the bottom-right triangle is empty (right-censoring: the latest cohorts have not had time to come back yet).
LTV decomposition by acquisition channel
“Are the customers we acquired through paid social worth what we paid for them?” The skill is a first-touch attribution snapshot (the source of the session that preceded a customer’s first purchase) joined to lifetime revenue, grouped by channel. The judgment call is whether to use revenue or margin — at 50% gross margin, a 3:1 revenue LTV-to-CAC ratio is a 1.5:1 gross-profit ratio. Most candidates miss this.
WITH first_session AS (
SELECT customer_id, source,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY session_start) AS rn
FROM fact_sessions
)
SELECT fs.source AS channel,
COUNT(DISTINCT p.customer_id) AS customers,
ROUND(SUM(p.amount), 2) AS total_revenue,
ROUND(SUM(p.amount) * 1.0
/ NULLIF(COUNT(DISTINCT p.customer_id), 0), 2) AS ltv_per_customer,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN p.purchase_id IS NOT NULL
AND p.amount > 0
AND p.customer_id IN (
SELECT customer_id FROM fact_purchases
GROUP BY customer_id HAVING COUNT(*) > 1)
THEN p.customer_id END)
/ NULLIF(COUNT(DISTINCT p.customer_id), 0), 1) AS repeat_rate_pct
FROM first_session fs
LEFT JOIN fact_purchases p ON p.customer_id = fs.customer_id
WHERE fs.rn = 1
GROUP BY fs.source
ORDER BY ltv_per_customer DESC;Mission #28 walks the full decomposition with a cleaner repeat-rate computation and a debrief on margin vs revenue LTV.
Send fatigue: engagement decay by send rank
Do customers open less the more we email them? Bucket each customer’s sends by their send sequence number, then compute open rate per bucket. The SQL skill is a window function (ROW_NUMBER() partitioned by customer, ordered by sent_at) plus a CASE bucket. The currency caveat is bigger than the SQL: since Apple Mail Privacy Protection launched in September 2021, open rates have been inflated for Apple Mail users — the mail app pre-fetches the tracking pixel whether or not the user looked at the message. Most senior teams now report opens with an asterisk and use clicks as the primary engagement signal.
WITH ranked AS (
SELECT customer_id, opened, clicked,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY sent_at) AS send_rank
FROM fact_sends
)
SELECT CASE
WHEN send_rank = 1 THEN '1'
WHEN send_rank BETWEEN 2 AND 3 THEN '2-3'
WHEN send_rank BETWEEN 4 AND 6 THEN '4-6'
ELSE '7+'
END AS send_bucket,
COUNT(*) AS sends,
ROUND(100.0 * SUM(opened) / COUNT(*), 1) AS open_rate_pct,
ROUND(100.0 * SUM(clicked) / NULLIF(SUM(opened), 0), 1) AS click_through_open_pct
FROM ranked
GROUP BY send_bucket
ORDER BY MIN(send_rank);Mission #29 is the Master-tier version with the four buckets and a click-through-open metric that is honest about the MPP problem.
UTM source normalization
Real session data has “facebook,” “Facebook,” “fb,” and “FB_Ads” all pointing at the same channel. Reporting them separately is wrong; collapsing them is a CASE + LOWER + TRIM normalization step. Skipping it is a recurring cause of wrong board reports.
SELECT CASE
WHEN LOWER(TRIM(source)) IN ('facebook', 'fb', 'fb_ads') THEN 'facebook'
WHEN LOWER(TRIM(source)) IN ('google', 'google_ads') THEN 'google'
WHEN source IS NULL OR TRIM(source) = '' THEN 'direct'
ELSE LOWER(TRIM(source))
END AS clean_source,
COUNT(*) AS session_count
FROM fact_sessions
GROUP BY clean_source
ORDER BY session_count DESC;Practice on Mission #17 — same normalization skill with the full set of dirty values your seed data actually contains.
Duplicate-customer detection
When the CRM ingests from two systems without dedupe, the same human ends up with two customer_ids and one email. Every per-customer metric ends up double-counting them. The SQL pattern is GROUP BY email HAVING COUNT(*) > 1, but the interview pivot is what to recommend — collapse, retain the older record, or flag for ops review. A senior candidate gives an answer; a junior candidate gives a row count.
SELECT email, COUNT(*) AS duplicate_count
FROM dim_customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC, email;Mission #21 walks the full detection plus a debrief on why this matters for cohort retention and LTV.
Common gotchas
Five traps catch candidates who are otherwise solid SQL writers. Each comes from assuming marketing data behaves like a generic transactional system; it does not. Most interviewers do not state these as rules — they watch which assumption you make and then ask follow-up questions to see whether you noticed. Knowing the traps in advance is most of the battle.
Refunds are negative rows, not a separate table
SUM(amount) on fact_purchases gives you net revenue because refunds carry negative values. If the question is gross revenue, you need SUM(CASE WHEN amount > 0 THEN amount END). For most marketing reporting net is the right answer, but interviewers ask both and watch which one you produce by default. A high-return SKU (apparel, supplements) can quietly destroy LTV-to-CAC if you compute LTV on gross.
Open rates have been broken since 2021
Apple Mail Privacy Protection pre-loads tracking pixels on the server, so every Apple Mail user counts as an “open” whether or not they actually looked at the email. The industry-wide effect is that open rates are inflated and Apple Mail cohorts look identical regardless of subject line. Senior practice is to report opens with that caveat and use clicks (or click-to-open rate) as the engagement signal that actually moves with quality. If a mission brief asks “is the subject line working,” clicks is the answer.
“Direct” traffic is not a marketing channel
A session with a NULL or empty source is labeled “direct,” but most direct traffic is not someone typing the URL from memory. It is a Slack or iMessage link that strips the referrer, an email click from a client that drops UTMs, or a brand-search organic click your tracking lost. Treating the direct bucket as a marketing channel inflates its apparent value and erodes the budget of the channel that actually drove the visit. The interview pivot is recommending UTM hygiene and brand-search separation, not just reporting the number.
Attribution windows are a definition, not a default
Seven days for email attribution. Ninety days for ad-platform click attribution. One hundred and twenty days for B2B pipeline. None of these are right; all of them are choices the analyst makes and then has to defend. A query that hard-codes seven days without surfacing the choice produces the wrong revenue number for half the stakeholders in the room. When an interviewer asks “why seven?” the right answer is the business rationale, not “that’s what was in the spec.”
Platform-reported and site-measured conversions do not match
Meta’s conversion count, Google Ads’ conversion count, and your own fact_purchases count will not agree. Post-iOS-14.5 and post-Consent Mode v2 (enforced July 2025), the gap is routinely 20 to 60 percent. The platforms model the missing conversions; the site measures only what fired. Senior practice is to report site-measured numbers in board reporting and use platform-reported numbers only for in-platform bid optimization. A candidate who claims one number is “right” without naming this gap is signaling they have not seen real data.
Schema diagram
The five-table star at a glance. Two dimensions ring three fact streams; every mission in this path is a join across one or two of the facts grouped by a dimension.
Dashed lines are dimension joins. dim_customers joins to all three facts via customer_id; dim_campaigns joins to sends and purchases via campaign_id. Sessions intentionally carry only a customer link, not a campaign link — sessions come in through source, which is the dirty UTM column.
What’s changed since 2024
The measurement landscape has shifted more between 2023 and 2026 than in the prior decade. Interview questions are catching up. Candidates who can speak to the following shifts signal that they have been reading the field, not just grinding puzzle SQL.
Google Consent Mode v2 enforcement (July 2025)
Google began actively disabling remarketing, conversion tracking, and demographic reporting for accounts not transmitting ad_user_data and ad_personalization signals on EEA and UK traffic. Conversions are now modeledfor non-consenting users — typical recovery is 30 to 70 percent of “lost” conversions, but only with Advanced Consent Mode and enough daily volume. For any analyst working with European traffic, this is now table-stakes knowledge.
MMM renaissance via open source
Google’s Meridian (released January 2025) and Meta’s Robyn have brought marketing mix modeling out of six-figure consulting engagements and into the warehouse. Both are Bayesian regression frameworks with adstock and saturation curves. The interview implication is not that you need to build an MMM — it is that you should understand what MMM produces (channel contribution percentages, saturation curves) and how that output differs from multi-touch attribution (MMM measures incremental lift; MTA assigns rule-based credit). When MMM says paid social drove 22 percent and MTA says 38 percent, both can be right under their own definitions.
Cookie deprecation was reversed, but it does not matter
Google formally abandoned Chrome third-party cookie deprecation in April 2025 and deprecated the Privacy Sandbox API stack in October 2025. But Safari and Firefox already block third-party cookies; roughly 30 percent of browser traffic was already cookieless before counting ad blockers. The practical effect is that cross-site tracking is functionally broken even though Chrome did not ship the change. The industry has moved on to first-party data, server-side tracking, and incrementality. If a candidate brings up “the cookie change” as a hypothetical, they have been off the field for a year.
Retail media at scale
US retail media ad spend hit roughly $60 billion in 2025 and is projected to reach $100 billion by 2028. Amazon, Walmart Connect, Roundel, Kroger Precision Marketing and 200-plus smaller networks now feed closed-loop attribution from retailer first-party data back to advertisers. For CPG and DTC analyst roles (Sephora, P&G, agency seats), fluency with retail-media measurement and iROAS is increasingly the seat-defining skill. The schemas are proprietary and the methodologies vary by retailer, so most of the work is reconciliation.
LLM-assisted SQL is the workflow, not the threat
Per dbt Labs’ 2026 State of Analytics Engineering, 72 percent of analytics teams now prioritize AI-assisted coding and 77 percent of leaders do. The interview implication is that writing the SQL is no longer the differentiator — explaining why this SQL and not that SQL is. Picking the right attribution window, naming the grain, defending the definition, recommending an action — those are the senior skills, and they are exactly what generic puzzle platforms cannot teach.
Full mission list
All 30 Marketing missions are listed above ↑ on this page, ordered from starter SQL through the Master-tier capstone. Each mission opens an interactive editor wired to a real SQLite copy of the schema described in this guide — same fact tables, same row counts, same dimension columns. Pick the pattern you want to drill, click in, and the briefing tells you exactly which KPI you are computing and which stakeholder is asking.
FAQ
What SQL skills do marketing analyst interviews actually test?
The recurring set is conditional aggregation (ROI, open rate, conversion rate), date-difference math (attribution windows, cohort retention), window functions (multi-touch attribution, send fatigue, LTV by first touch), and data-cleaning judgment (UTM normalization, duplicate emails, refunds as negative rows). Senior screens add definitional defense — naming the attribution window you chose and explaining why none of the multi-touch models are causal.
How do you calculate campaign ROI in SQL?
SUM(amount) from purchases grouped by campaign_id, divided by the campaign budget, with a NULLIF guard for zero spend. Use a LEFT JOIN so campaigns with no purchases still appear in the report — they are usually the interesting rows. Mission #12 walks the full version.
How do you build a multi-touch attribution comparison in SQL?
Number each customer’s sessions with ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY session_start) ascending and descending to identify first and last touch. Count touches per conversion with COUNT(*) OVER for the linear-credit divisor. Then pivot revenue across the three models in a single GROUP BY. Mission #27 has the canonical setup.
What is a cohort retention matrix in SQL?
Group customers by the month of their first purchase, then count how many came back in each subsequent month. The output is a triangular matrix — rows are cohorts, columns are months-since-first, and the bottom-right triangle is empty because the latest cohorts have not had time to come back yet. Use a LEFT JOIN to preserve the denominator; an INNER JOIN drops non-retained users and inflates the rate. Mission #26 builds the full matrix.
Why are open rates considered unreliable now?
Since Apple Mail Privacy Protection launched in September 2021, Apple Mail pre-fetches the tracking pixel on every email regardless of whether the user opened it. That inflates the open count for any cohort with significant Apple Mail share — which is most consumer audiences. Senior practice is to report opens with a caveat and use clicks (or click-to-open rate) as the primary engagement signal.
How is marketing analyst SQL different from generic SQL interviews?
Generic SQL screens test mechanics on a synthetic schema. Marketing SQL questions test whether you understand a specific data flow — send to session to purchase — and the operational meaning of the numbers you produce. The same SUM(CASE WHEN...) that gets full marks on LeetCode will lose points here if you compute net revenue without realizing refunds are negative rows, or claim a multi-touch model is causal when it is not.
What does a marketing analyst SQL take-home look like?
Most are 60 to 90 minutes against a marketing star-schema mock. Expect three or four tasks: a campaign performance roll-up with ROI or ROAS, a funnel or attribution question, a cohort or LTV decomposition, and one stretch question that involves a window function or a definitional defense (“our analyst says last-click; finance says first-touch; reconcile”).
Where do B2C and B2B marketing analyst interviews diverge?
B2C interviews center on session-to-purchase attribution, cohort retention, LTV by acquisition channel, and email or paid-social efficiency. B2B interviews center on the MQL to SQL to opportunity to closed-won funnel, marketing-sourced versus marketing-influenced pipeline, and Salesforce-shaped data (accounts, leads, opportunities, campaign members). The SQL skills overlap heavily; the framing and the grain change.
Ready to drill the patterns?
Thirty missions, real SQLite schema, instant feedback. Walk from the customer-table preview through multi-touch attribution, the cohort retention matrix, and the year-end board KPI capstone.