If you’re measuring your SQL progress, there’s a single gate that separates analysts who can handle a real job from analysts who stall: window functions. Before window functions, you’re writing aggregates and self-joins; after, you’re writing queries that would have taken a junior two stabs and a coffee break in 15 lines of readable SQL. This post covers the three window-function patterns that do 90% of real analyst work, by use case — running totals, top-N-per-group, and LAG/LEAD for growth math.
Why window functions are the career milestone
The question every window function answers is: for each row, what’s the value of some expression over a related group of rows? "Related" is flexible — it might be "all rows for the same customer," or "all rows before this one chronologically," or "all rows in the same month." Once you have that frame, you can attack every real-world analytics problem that SQL can solve.
Before window functions, analysts solved these problems with self-joins, correlated subqueries, or by pulling data into Python for post-processing. All three work; all three are either slow, ugly, or both. Window functions replace all of them with a syntax that runs on the database engine, scales to any row count the engine can handle, and reads in order from top to bottom the way your brain wants to read SQL.
PARTITION BY: the one mental model
Everybody learning window functions eventually has the same "oh" moment. It goes like this: PARTITION BY is just GROUP BY that doesn’t collapse the rows. In a GROUP BY query, you get one row per group. In a PARTITION BY query, you get all the original rows back, each annotated with its group’s aggregate.
That’s the whole mental model. Once it clicks, every window function makes sense. Want a running total of revenue by customer? Partition by customer, order by date, sum the amount. Want each customer’s total revenue duplicated onto every one of their purchase rows? Partition by customer, no order by, sum the amount. Want the largest single purchase for each customer stamped onto every row? Same pattern with MAX.
For every purchase, return the purchase amount alongside a running total of that customer’s spend, ordered by purchase date. Also return the customer’s lifetime total (same on every row) and the purchase’s percent of lifetime.
fact_purchases(customer_id, purchase_amount, purchased_at). You need two window specs on the same SELECT.
One row per purchase. Columns: customer_id, purchased_at, amount, running_total, lifetime_total, pct_of_lifetime.
Show solution
SELECT
customer_id,
purchased_at,
purchase_amount AS amount,
SUM(purchase_amount) OVER (
PARTITION BY customer_id
ORDER BY purchased_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
SUM(purchase_amount) OVER (PARTITION BY customer_id) AS lifetime_total,
ROUND(
purchase_amount * 100.0
/ NULLIF(SUM(purchase_amount) OVER (PARTITION BY customer_id), 0),
1
) AS pct_of_lifetime
FROM fact_purchases
ORDER BY customer_id, purchased_at;The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the default when you include ORDER BY in a window with a cumulative aggregate, but making it explicit is worth it — partly for readability, partly because the default is different across dialects and you’ll forget which is which.
ROW_NUMBER vs RANK vs DENSE_RANK
The second canonical window-function pattern is "top N per group." Top 3 campaigns by revenue in each quarter. Top 5 customers by spend in each region. Most recent purchase per customer. All of them are some flavor of: partition, order, pick the top rows.
The three ranking functions differ only in how they handle ties:
- ROW_NUMBER — always gives distinct integers even when values tie. Row with the later timestamp wins (or whatever tiebreaker the ORDER BY has).
- RANK — ties get the same number, and the next non-tied row skips ahead. Two people tied at rank 1? Next person is rank 3.
- DENSE_RANK — ties get the same number, but the next rank is sequential. Two people tied at rank 1? Next person is rank 2.
99% of the time, you want ROW_NUMBER. The only common case for RANK is when you’re asked for "the top 3" and want to include all ties at rank 3. DENSE_RANK mostly shows up in sports-ranking problems and very rarely in real analyst work.
For each campaign, return its top 3 purchases by amount. Ties broken by purchased_at (earlier wins). Use ROW_NUMBER so ties don’t give you more than 3 rows per campaign.
fact_purchases(campaign_id, customer_id, purchase_amount, purchased_at). Standard "filter on the window" pattern requires a CTE or subquery — you can’t put window functions in a WHERE clause directly.
Up to 3 rows per campaign. Columns: campaign_id, customer_id, purchase_amount, purchased_at, rank.
Show solution
WITH ranked AS (
SELECT
campaign_id,
customer_id,
purchase_amount,
purchased_at,
ROW_NUMBER() OVER (
PARTITION BY campaign_id
ORDER BY purchase_amount DESC, purchased_at ASC
) AS rn
FROM fact_purchases
)
SELECT
campaign_id,
customer_id,
purchase_amount,
purchased_at,
rn AS rank
FROM ranked
WHERE rn <= 3
ORDER BY campaign_id, rn;The CTE-then-filter pattern is worth burning into your fingers. You’ll use it for every top-N-per-group query, first-per-customer query, and most-recent-per-entity query for the rest of your career. Note that the WHERE clause filters on the computed rn column — SQL doesn’t allow window functions inside WHERE directly, which is why the CTE is necessary rather than optional.
LAG, LEAD, and growth math
The third pattern is the one that quietly earns analysts their seniors stripes: any time you need to compare a row to the row before or after it, you want LAG or LEAD. Month-over-month growth. Week-over-week change. Time between consecutive purchases. Session-to-next-session gap.
Both functions take an offset (default 1) and optionally a default value for when there is no previous row (usually NULL). LAG(x) is "the value of x in the previous row." LEAD(x) is "the value of x in the next row." With a PARTITION BY, "previous" and "next" are scoped to the group. With an ORDER BY, they’re scoped to the sort order.
Month-over-month revenue is the canonical LAG query and one you’ll write dozens of times. The skeleton never changes:
WITH monthly AS (
SELECT STRFTIME('%Y-%m', date_col) AS month,
SUM(metric) AS total
FROM source_table
GROUP BY month
)
SELECT month,
total,
LAG(total) OVER (ORDER BY month) AS prev,
total - LAG(total) OVER (ORDER BY month) AS delta,
ROUND((total - LAG(total) OVER (ORDER BY month)) * 100.0
/ NULLIF(LAG(total) OVER (ORDER BY month), 0), 1) AS pct
FROM monthly
ORDER BY month;Swap the metric and the source table and this is half of every business-review slide deck ever built.
Compute revenue growth per campaign per month: for each (campaign, month) pair, the revenue, the previous month’s revenue for that campaign, and the MoM percent change. A campaign that didn’t run the previous month should show NULL for the prior value.
Two windows: group by campaign_id and month first (a plain GROUP BY), then LAG partitioned by campaign_id ordered by month.
One row per (campaign, month) pair. Columns: campaign_id, month, revenue, prev_month_revenue, mom_pct.
Show solution
WITH monthly AS (
SELECT
campaign_id,
STRFTIME('%Y-%m', purchased_at) AS month,
SUM(purchase_amount) AS revenue
FROM fact_purchases
GROUP BY campaign_id, month
)
SELECT
campaign_id,
month,
revenue,
LAG(revenue) OVER (PARTITION BY campaign_id ORDER BY month) AS prev_month_revenue,
ROUND(
(revenue - LAG(revenue) OVER (PARTITION BY campaign_id ORDER BY month)) * 100.0
/ NULLIF(LAG(revenue) OVER (PARTITION BY campaign_id ORDER BY month), 0),
1
) AS mom_pct
FROM monthly
ORDER BY campaign_id, month;Watch what happens with PARTITION BY campaign_id: January-to-February comparisons stay within the same campaign, not across campaigns. If you forgot the partition, you’d get month-over-month growth comparing Campaign A’s February to Campaign B’s January, which is nonsense. This is the single most common window-function bug — missing partition means bleeding rows across the group boundary you intended.
Window functions are the moment SQL stops feeling like a query language and starts feeling like a modeling language. Once you reach for them by reflex, the queries you write stop needing post-processing in Python or Excel for 90% of analyst work. That’s the career milestone — not knowing the syntax, but knowing which pattern to reach for.
If you want to build window-function reflexes against a realistic dataset with marketing-style time-series quirks (uneven reporting gaps, mid-month backfills, campaigns that end and restart), the Business Intelligence path on caseSQL is designed around exactly these patterns.
Common window-function pitfalls
Four gotchas that bite every analyst who’s new to windows, in rough order of how often they show up in real queries:
- Missing PARTITION BY. A naked
LAG(revenue) OVER (ORDER BY month)across a multi-entity result silently compares across entities. Always ask: is "previous" scoped to a group? If so, add the partition. - Window functions in WHERE. You can’t write
WHERE ROW_NUMBER() OVER (...) = 1. The engine computes windows after the WHERE clause. Always wrap in a CTE (or useQUALIFYin Snowflake/BigQuery, if you’re lucky). - DISTINCT + window.
SELECT DISTINCT col, SUM(x) OVER (...)does not dedupe the way you think. DISTINCT runs on the final row shape, which includes the window output, so rows that differ only in their window value stay separate. Dedupe in a CTE first, then apply the window. - Frame clause surprises. The default frame for a cumulative window (
SUM(x) OVER (ORDER BY ...)) isRANGE UNBOUNDED PRECEDING AND CURRENT ROW— andRANGEtreats ties in the order-by column specially, summing all tied rows into the same running total. If you want strict row-by-row, writeROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWexplicitly.
The frame-clause one is the most subtle and the hardest to debug — numbers come out looking almost right, and only eagle-eyed readers catch that two adjacent rows with the same date share a running total. Once you’ve been burned once, you write ROWS BETWEEN explicitly for every cumulative window for the rest of your career.