The WITH statement — the common table expression, or CTE — is the single biggest readability improvement SQL has seen in the last twenty years. And yet many analysts still write deeply nested subqueries because that’s what their courses taught. This post is the case for CTEs becoming your default, plus a short tour of the recursive variant that unlocks hierarchical data problems every analyst eventually faces.
Why CTEs replaced most subqueries
Both forms below return the same result. One of them is pleasant to read six weeks later.
-- Subquery form: reads inside-out, nested, hard to comment
SELECT segment, AVG(ltv) AS avg_ltv
FROM (
SELECT c.segment, p.ltv
FROM dim_customers c
JOIN (
SELECT customer_id, SUM(amount) AS ltv
FROM fact_purchases
WHERE purchased_at >= '2025-01-01'
GROUP BY customer_id
) p USING (customer_id)
) x
GROUP BY segment;
-- CTE form: reads top-down, each block named, each block comment-able
WITH recent_ltv AS (
-- One row per customer with their 2025 lifetime value
SELECT customer_id, SUM(amount) AS ltv
FROM fact_purchases
WHERE purchased_at >= '2025-01-01'
GROUP BY customer_id
),
segment_ltv AS (
-- Join LTV back to customer segment
SELECT c.segment, r.ltv
FROM dim_customers c
JOIN recent_ltv r USING (customer_id)
)
SELECT segment, AVG(ltv) AS avg_ltv
FROM segment_ltv
GROUP BY segment;The CTE form is longer by a few lines. It’s also the form that survives code review, that a colleague can land on and modify confidently, and that comments usefully when the logic gets complex. For any query longer than three clauses, the CTE form is the correct default.
Chain vs nest: the readability decision
Once you’re using CTEs, the next decision is whether to chain (multiple CTEs in sequence) or nest (one CTE referenced from multiple places). The rule is simpler than it sounds:
- Chain when each step transforms the data further and the next step only needs the previous step’s output. Most analyst work looks like this: filter → aggregate → join → filter again → final select.
- Nest (well: reference a CTE from multiple places) when the same intermediate result is needed in two places. Don’t copy-paste the subquery; define it once as a CTE and reference it twice.
Recursive CTEs: the one pattern analysts need
A recursive CTE references itself. That sounds esoteric; it turns out to be the one SQL feature that makes hierarchical data tractable without writing a loop in Python. The skeleton is always the same shape:
WITH RECURSIVE walk AS (
-- BASE CASE: the starting set of rows
SELECT node_id, parent_id, 0 AS depth
FROM nodes
WHERE parent_id IS NULL -- roots
UNION ALL
-- RECURSIVE CASE: one level deeper each iteration
SELECT n.node_id, n.parent_id, w.depth + 1
FROM nodes n
JOIN walk w ON n.parent_id = w.node_id
)
SELECT * FROM walk;Every recursive CTE has two parts: a base case that seeds the recursion, and a recursive case that generates the next layer from the previous one, joined via UNION ALL. The database keeps iterating until the recursive case returns zero new rows, then stops.
Org charts, category trees, and session chains
Three places recursive CTEs show up in real analyst work. Each one is a hiring-interview-level problem and a real ticket you’ll see.
Org chart: count all reports under a manager
-- How many people (direct + indirect) report to each manager?
WITH RECURSIVE reports AS (
SELECT employee_id, manager_id, employee_id AS top_manager, 0 AS depth
FROM dim_employees
UNION ALL
SELECT r.employee_id, e.manager_id, r.top_manager, r.depth + 1
FROM reports r
JOIN dim_employees e ON e.employee_id = r.manager_id
)
SELECT
top_manager,
COUNT(*) - 1 AS total_reports -- minus 1 to exclude the manager themselves
FROM reports
GROUP BY top_manager;Category tree: all descendants of a parent category
E-commerce category trees are classic recursive territory. "Electronics → Computers → Laptops → Gaming Laptops" is four levels deep; a product tagged to Gaming Laptops should roll up into all three parent categories for reporting.
Session chains: next-page journeys
Less obviously recursive: "starting from landing page X, what’s the most common 5-step user journey?" can be written as a recursive CTE that walks next-page edges from each session. In practice most analyst teams solve this with window functions, but the recursive form handles variable-length journeys that LAG/LEAD can’t.
Given dim_employees(employee_id, name, manager_id), return every employee along with their full chain of management above them — as a comma-separated string like "Alice > Bob > Charlie > CEO".
CEO’s manager_id is NULL. Chain depth is at most ~10 levels in this org.
One row per employee, with a chain column containing the name path from the employee up to the CEO.
Show solution
WITH RECURSIVE chain AS (
SELECT
employee_id,
name,
manager_id,
name AS path,
1 AS depth
FROM dim_employees
UNION ALL
SELECT
c.employee_id,
c.name,
e.manager_id,
c.path || ' > ' || e.name,
c.depth + 1
FROM chain c
JOIN dim_employees e ON e.employee_id = c.manager_id
WHERE c.depth < 15 -- cycle guard
)
SELECT employee_id, name, path AS chain
FROM chain c1
WHERE depth = (
SELECT MAX(depth) FROM chain c2
WHERE c2.employee_id = c1.employee_id
);CTEs are the single biggest lever for writing SQL that other humans can read. Chain them by default, reach for the recursive variant the next time a manager asks you something about hierarchy, and you’ll stop writing the kind of inside-out nested subqueries that separate juniors from seniors in code review.