
SQL for Marketing Analysts: The Complete Guide

The queries, schemas, and mental models marketing analysts use every week — with exercises against a realistic campaign database.


Marketing analytics lives in a narrow band: big enough that spreadsheets buckle, small enough that the data team doesn’t prioritize it. Which means the person closest to the numbers — you — needs to pull them yourself. SQL is how. This guide is the shape of the role distilled to its query-writing parts: the five reports you’ll run every week, the way attribution actually gets computed, and the data-quality gotchas that make senior analysts cautious and junior analysts confident.

Everything below assumes a star schema that looks like most modern marketing warehouses: dim_customers, dim_campaigns, fact_sends, fact_sessions, fact_purchases. If you’ve worked with a dbt project, a Snowflake data model, or the caseSQL mission database, the shape will be familiar.

Why Excel runs out before SQL does

Excel is an excellent tool up to about 100,000 rows and one dimension of analysis. Past that, three things break at once: pivot performance, lookup reliability (VLOOKUPs silently return wrong or stale values as rows shift and ranges drift), and shareability — the file becomes the single source of truth for something it was never designed to own.

SQL solves each of those, but the real reason to learn it is different: SQL lets you ask the next question. In Excel, the fastest path is to pick a pivot, format it, and move on. In SQL, you can always go one layer deeper — group by a new dimension, join a new table, filter by a cohort — in the same window. That’s what senior analysts do all day, and it’s the reason recruiters screen for SQL before Tableau: the query language is the skill; the dashboard tool is the output.
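Concretely, "going one layer deeper" is editing one clause at a time against the schema above — each query a small mutation of the last:

```sql
-- Layer 1: the topline number
SELECT SUM(purchase_amount) AS revenue
FROM fact_purchases
WHERE purchased_at >= DATE('now', '-30 days');

-- Layer 2: same window, one more dimension — just a join and a GROUP BY
SELECT c.name AS campaign_name, SUM(p.purchase_amount) AS revenue
FROM fact_purchases p
JOIN dim_campaigns c ON c.id = p.campaign_id
WHERE p.purchased_at >= DATE('now', '-30 days')
GROUP BY c.name;
```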

The five queries every marketing analyst writes weekly

Every marketing analytics job has five query templates that come up over and over. Mastering the templates (not memorizing exact SQL) is 80% of what makes the role feel mechanical after the first three months.

  1. Campaign performance snapshot — sends, opens, clicks, conversions, revenue, CPA, ROAS per campaign, last 30 days.
  2. Channel mix over time — revenue by channel, week-over-week, so leadership can see if email is drifting while paid picks up.
  3. Cohort retention — customers acquired in a month, tracked across their first N weeks of activity.
  4. Segment deep-dive — a specific audience (new vs returning, region, plan tier) compared against the overall baseline.
  5. Funnel drop-off — session to add-to-cart to checkout to purchase, ideally split by traffic source.

The first one is the foundation. If you can write it cleanly, you can adapt it to the other four by swapping the GROUP BY column, adding a time dimension, or filtering on a segment.
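For instance, template 2 falls out of template 1 by swapping the campaign dimension for a channel and adding a week column. A sketch — this assumes dim_campaigns carries a channel column, which isn't in the schema hint below; adjust to wherever your warehouse actually stores channel:

```sql
SELECT
  c.channel,                                -- assumed column, not in the schema hint
  STRFTIME('%Y-%W', p.purchased_at) AS week,
  SUM(p.purchase_amount)            AS revenue
FROM fact_purchases p
JOIN dim_campaigns c ON c.id = p.campaign_id
GROUP BY c.channel, week
ORDER BY week, revenue DESC;
```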

Exercise

Write a query that shows the top 5 campaigns by total revenue in the last 30 days. Include campaign name, total sends, total purchases, total revenue, and revenue per send (a rough efficiency metric).

Schema hint

fact_sends(campaign_id, customer_id, sent_at), fact_purchases(customer_id, campaign_id, purchase_amount, purchased_at), dim_campaigns(id, name)

Expected

Five rows, ordered by revenue descending. Columns: campaign_name, sends, purchases, revenue, revenue_per_send.

Solution
WITH sends AS (
  SELECT
    campaign_id,
    COUNT(DISTINCT customer_id) AS sends
  FROM fact_sends
  WHERE sent_at >= DATE('now', '-30 days')
  GROUP BY campaign_id
),
purchases AS (
  SELECT
    campaign_id,
    COUNT(DISTINCT customer_id) AS purchases,
    SUM(purchase_amount)        AS revenue
  FROM fact_purchases
  WHERE purchased_at >= DATE('now', '-30 days')
  GROUP BY campaign_id
)
SELECT
  c.name AS campaign_name,
  COALESCE(s.sends, 0)     AS sends,
  COALESCE(p.purchases, 0) AS purchases,
  COALESCE(p.revenue, 0)   AS revenue,
  ROUND(COALESCE(p.revenue, 0) * 1.0 / NULLIF(s.sends, 0), 2) AS revenue_per_send
FROM dim_campaigns c
LEFT JOIN sends s     ON s.campaign_id = c.id
LEFT JOIN purchases p ON p.campaign_id = c.id
ORDER BY revenue DESC
LIMIT 5;

Three subtle details to notice. First, sends and purchases are each aggregated in their own CTE before touching dim_campaigns — joining two fact tables straight onto the same dimension cross-multiplies their rows, and SUM(purchase_amount) would be inflated by the number of matching sends. Second, COUNT(DISTINCT customer_id), not COUNT(*) — both fact tables carry duplicate-row risk if a customer was hit twice. Third, NULLIF(..., 0) on the denominator so a zero-send campaign doesn’t crash the query; it just returns null. Those habits alone mark the gap between a mid-level and a senior analyst.

Attribution in SQL: last-touch, first-touch, and reality

Attribution is where marketing analytics stops being arithmetic and starts being policy. The question "which campaign got credit for this purchase" has no mathematically correct answer — only a convention your org agrees on. The three conventions you’ll implement: last-touch (credit the last campaign before purchase), first-touch (credit the first campaign the customer ever saw), and linear (split credit evenly across all touches).

In SQL, all three are window-function problems. For last-touch, you want the most recent fact_sends row before each fact_purchases row: ROW_NUMBER() partitioned by purchase, ordered by send date descending, restricted to sends at or before the purchase, take row one. Once you’ve done that pattern three times, it becomes muscle memory.

Exercise

Compute last-touch attribution: for each purchase, identify the campaign of the most recent send to that customer before the purchase. Return purchase_id, customer_id, campaign_name, and send_to_purchase_days.

Schema hint

fact_sends(campaign_id, customer_id, sent_at), fact_purchases(id, customer_id, purchased_at), dim_campaigns(id, name). Assume each purchase has at least one preceding send.

Expected

One row per purchase, joined to its last-touch campaign. send_to_purchase_days is the integer gap between send and purchase.

Solution
WITH ranked_sends AS (
  SELECT
    s.customer_id,
    s.campaign_id,
    s.sent_at,
    p.id   AS purchase_id,
    p.purchased_at,
    ROW_NUMBER() OVER (
      PARTITION BY p.id
      ORDER BY s.sent_at DESC
    ) AS rn
  FROM fact_purchases p
  JOIN fact_sends s
    ON s.customer_id = p.customer_id
   AND s.sent_at    <= p.purchased_at
)
SELECT
  r.purchase_id,
  r.customer_id,
  c.name AS campaign_name,
  CAST(JULIANDAY(r.purchased_at) - JULIANDAY(r.sent_at) AS INTEGER) AS send_to_purchase_days
FROM ranked_sends r
JOIN dim_campaigns c ON c.id = r.campaign_id
WHERE r.rn = 1
ORDER BY r.purchased_at DESC;

First-touch is the same query with ORDER BY sent_at ASC. Linear attribution is trickier: you count how many touches a purchase had, then divide the purchase amount by that count. If your org ever asks for time-decay or U-shaped attribution, you’re implementing a weighted version of linear — same SQL skeleton, different weights.
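A minimal sketch of the linear split, using the same schema and the same qualifying-send join as the last-touch solution — COUNT(*) OVER gives each purchase its touch count, and each touch earns an equal share:

```sql
WITH touches AS (
  SELECT
    p.id              AS purchase_id,
    p.purchase_amount,
    s.campaign_id,
    COUNT(*) OVER (PARTITION BY p.id) AS touch_count  -- touches per purchase
  FROM fact_purchases p
  JOIN fact_sends s
    ON s.customer_id = p.customer_id
   AND s.sent_at    <= p.purchased_at
)
SELECT
  c.name AS campaign_name,
  ROUND(SUM(t.purchase_amount * 1.0 / t.touch_count), 2) AS attributed_revenue
FROM touches t
JOIN dim_campaigns c ON c.id = t.campaign_id
GROUP BY c.name
ORDER BY attributed_revenue DESC;
```

Swap the equal share for a weight column and you have the skeleton for time-decay or U-shaped attribution.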

Data quality is half the job

Here’s what tutorials don’t tell you: most of the time a marketing query looks wrong, the query is right and the data is wrong. Refunded purchases left in the fact table. Campaign IDs that don’t match dim_campaigns because the campaign was renamed. Customers who show up twice because the CRM merged an old record. Timezone drift so a "last 30 days" window is actually 29.5.

Senior analysts assume the data lies until proven honest. Before they report a number upward, they run three checks:

  • Scale check — does the order of magnitude match what I expect? If last week was $500K and this query says $5K, something is filtering too aggressively.
  • Delta check — is the week-over-week change believable? A 300% spike without a launch is almost always a join duplicating rows.
  • Edge check — what does this look like for the top customer / top campaign / most recent day? Outliers expose bugs faster than averages.
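The delta check, for example, is one query away: a weekly revenue series makes an implausible spike obvious before it reaches a slide. A sketch against the same schema (STRFTIME week bucketing is SQLite syntax, matching the solutions above):

```sql
-- Weekly revenue: eyeball the series before trusting any single week
SELECT
  STRFTIME('%Y-%W', purchased_at) AS week,
  SUM(purchase_amount)            AS revenue,
  COUNT(*)                        AS purchases
FROM fact_purchases
GROUP BY week
ORDER BY week DESC
LIMIT 8;
```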

From queries to insights: the analyst-sense layer

The skill that separates a junior marketing analyst from a senior one isn’t writing more complex SQL. It’s noticing which number in a report matters and which is noise. That judgment comes from pattern recognition, which comes from seeing the same query under lots of different conditions — normal weeks, holiday spikes, bug weeks, refund waves, data-outage Wednesdays.

The fastest way to build that pattern-recognition is to write one query, then run 10 variations of it. What if I segment by new vs returning? What if I split paid vs organic? What if I look at the last 7 days instead of 30? Each variation teaches you what the numbers normally look like, so when they don’t, you notice.

Exercise

Segment the top campaigns from the first exercise by customer tenure: split each campaign’s revenue between new customers (first purchase ever) and returning customers. Which campaigns over-index on new-customer revenue vs the baseline?

Schema hint

Use a subquery or CTE to find each customer’s first purchase date from fact_purchases. Compare it to the purchase date in the main query.

Expected

One row per campaign, columns: campaign_name, new_customer_revenue, returning_customer_revenue, new_pct (share of revenue from new customers).

Solution
WITH first_purchase AS (
  SELECT
    customer_id,
    MIN(purchased_at) AS first_purchased_at
  FROM fact_purchases
  GROUP BY customer_id
),
tagged AS (
  SELECT
    p.campaign_id,
    p.purchase_amount,
    CASE
      WHEN p.purchased_at = fp.first_purchased_at THEN 'new'
      ELSE 'returning'
    END AS customer_segment
  FROM fact_purchases p
  JOIN first_purchase fp ON fp.customer_id = p.customer_id
  WHERE p.purchased_at >= DATE('now', '-30 days')
)
SELECT
  c.name AS campaign_name,
  SUM(CASE WHEN t.customer_segment = 'new'       THEN t.purchase_amount ELSE 0 END) AS new_customer_revenue,
  SUM(CASE WHEN t.customer_segment = 'returning' THEN t.purchase_amount ELSE 0 END) AS returning_customer_revenue,
  ROUND(
    SUM(CASE WHEN t.customer_segment = 'new' THEN t.purchase_amount ELSE 0 END) * 100.0
    / NULLIF(SUM(t.purchase_amount), 0),
    1
  ) AS new_pct
FROM tagged t
JOIN dim_campaigns c ON c.id = t.campaign_id
GROUP BY c.name
ORDER BY new_customer_revenue DESC;

When you run this against a real dataset, two things happen. First, you start to develop a feel for what a "normal" new-customer share looks like for your org (usually 10–30%). Second, when a campaign comes in at 70%, you know immediately whether it’s a genuinely great acquisition campaign or whether the campaign ID got re-used and the "customers" are really old ones mislabeled. That reflex — does this number behave? — is what your manager is paying for.

If you want to build that reflex against a realistic marketing star schema with planted data-quality issues, the Marketing Analytics path on caseSQL is the fastest way. 25 missions, each one a real analyst ask, validated against multiple accepted approaches so you see the trade-offs rather than a single "right answer."

A checklist you can steal

Before any marketing query leaves your laptop, run this three-line mental pass. It takes 20 seconds and catches 80% of the bugs that make junior analysts look unreliable:

  1. Row fanning — do any of your joins multiply rows? (Run the query, then re-run it with COUNT(*) instead of your aggregate. If the count is higher than the row count of your primary table, a join is fanning out.)
  2. Window semantics — does "last 30 days" mean 30 calendar days in UTC, 30 days in the stakeholder’s local time, or "since the 1st of last month"? Match the number your dashboard already reports.
  3. Null arithmetic — does your ratio have NULLIF(denominator, 0)? If not, one zero-row partition crashes the query or returns the wrong row.

None of these three are advanced SQL. All three are the difference between a query that looks right and a query that is right.
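Check 1 can even be made mechanical. One way — a sketch, with a deliberately suspect join — is to compare the joined row count against the base fact table:

```sql
WITH report AS (
  SELECT p.id
  FROM fact_purchases p
  JOIN fact_sends s ON s.customer_id = p.customer_id  -- the join under suspicion
)
SELECT
  (SELECT COUNT(*) FROM report)         AS joined_rows,
  (SELECT COUNT(*) FROM fact_purchases) AS base_rows;
-- joined_rows > base_rows means the join is fanning out
```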

Build these reflexes against real data.

caseSQL runs 100+ missions against a realistic star schema with planted data-quality issues. Free, in-browser, no account needed.
