Join semantics are the first thing you learn in SQL and the last thing you get fully right. Most analysts who have written SQL for a year can still tell you, in the moment, which join they meant — and then type a different one. This post is the practical read-through: when each of the five joins is the correct answer, which default you should reach for before thinking, and the three join mistakes that quietly produce wrong dashboards.
Everything below assumes a star schema that looks like the one used in caseSQL missions: dim_customers, dim_campaigns, fact_sends, fact_purchases. The joins work the same shape against any schema.
The five joins in one sentence each
Before the tactics, the one-liners. Memorize these the way you memorized the first ten prime numbers.
- INNER JOIN — keep rows where both sides match. Drops everything else silently.
- LEFT JOIN — keep every row from the left side; fill the right side with NULL when it doesn’t match.
- RIGHT JOIN — same as LEFT, but reversed. Mostly a code smell; rewrite as LEFT.
- FULL OUTER JOIN — keep every row from both sides; NULL the non-matching halves.
- CROSS JOIN — produce every pair. Useful for calendars and scaffolding, dangerous everywhere else.
INNER JOIN is almost never what you actually want
The modal analyst writes INNER JOIN as their reflex. The modal analyst is wrong. INNER is the right answer when the business question is "show me the rows that exist on both sides." That’s rarer than it sounds. Most questions are "show me every customer, and include their purchases if they have any" — that’s LEFT, not INNER.
-- "How many customers bought something?" — INNER is correct
SELECT COUNT(DISTINCT c.customer_id)
FROM dim_customers c
INNER JOIN fact_purchases p USING (customer_id);
-- "How many customers do we have, and how many have bought?" — INNER is WRONG
-- It silently drops customers with no purchases, which is exactly the
-- population we care about for the answer.
SELECT COUNT(DISTINCT c.customer_id) AS total_customers,
COUNT(DISTINCT p.customer_id) AS purchasing_customers
FROM dim_customers c
LEFT JOIN fact_purchases p USING (customer_id);LEFT JOIN: the “keep everyone” default
LEFT JOIN keeps every row from the left table and fills unmatched right-side columns with NULL. That NULL is the data; it’s how you tell which left-side rows had no match. The two patterns worth burning into your reflex:
- Enrichment: left table is the population you care about; right table adds attributes.
dim_customers LEFT JOIN fact_purchasesgives every customer plus their purchases (NULL if none). - Anti-match: left table is the population; you want the rows that did NOT match.
LEFT JOIN ... WHERE right.id IS NULLreturns exactly that set. Covered in detail in the next section.
Every customer in dim_customers, with the count of purchases they’ve made. Customers with zero purchases should appear with a count of 0, not be missing.
Tables: dim_customers(customer_id, email, signup_date), fact_purchases(customer_id, amount, purchased_at).
One row per customer. Columns: customer_id, email, purchase_count (integer, 0 for non-buyers).
Show solution
SELECT
c.customer_id,
c.email,
COUNT(p.purchase_id) AS purchase_count
FROM dim_customers c
LEFT JOIN fact_purchases p USING (customer_id)
GROUP BY c.customer_id, c.email
ORDER BY purchase_count DESC;Anti-joins and reconciliation with FULL OUTER
An anti-join asks "what’s on the left side but NOT on the right?" It’s one of the three or four most useful patterns in analyst SQL, and the one most likely to be written wrong. The canonical form:
-- Customers who signed up but never purchased
SELECT c.*
FROM dim_customers c
LEFT JOIN fact_purchases p USING (customer_id)
WHERE p.customer_id IS NULL;The key detail: the IS NULL check goes on the JOIN-right side column, after the LEFT JOIN. If you tried to write this as WHERE customer_id NOT IN (SELECT customer_id FROM fact_purchases), the NOT IN returns no rows as soon as any purchase has a NULL customer_id — SQL’s three-valued logic quietly breaks the filter. The LEFT JOIN + IS NULL form is the correct idiom.
FULL OUTER JOIN is the reconciliation tool: rows that exist on the left but not the right, rows that exist on the right but not the left, and rows in both. Budget vs actuals, claims vs payments, source-of-truth comparisons.
-- Reconciling: which budgets have actuals, which actuals have no budget line
SELECT
COALESCE(b.cost_center_id, a.cost_center_id) AS cost_center_id,
b.budget_amount,
a.actual_amount,
CASE
WHEN b.cost_center_id IS NULL THEN 'Actual with no budget line'
WHEN a.cost_center_id IS NULL THEN 'Budget line with no actuals'
ELSE 'Matched'
END AS reconciliation_status
FROM fact_budgets b
FULL OUTER JOIN fact_actuals a USING (cost_center_id, month);The three join bugs that break dashboards
These three are the ones that hit every analyst at least once, usually in a way that doesn’t surface for weeks.
1. The fanout (many-to-many multiplication)
If the right side of a JOIN has multiple rows per join key, the left side gets multiplied. SUM(revenue) across that join returns 3x the real revenue because every purchase row got joined to three campaign rows. The fix is to pre-aggregate the fanning-out table to one row per key before the JOIN.
-- BROKEN: campaigns fans out because a customer can be on multiple campaigns
SELECT SUM(p.amount) AS revenue
FROM fact_purchases p
JOIN fact_campaign_memberships m ON p.customer_id = m.customer_id;
-- FIXED: roll up memberships before joining
SELECT SUM(p.amount) AS revenue
FROM fact_purchases p
JOIN (
SELECT customer_id, MIN(campaign_id) AS primary_campaign
FROM fact_campaign_memberships
GROUP BY customer_id
) m USING (customer_id);2. NULLs eating your rows (the INNER-by-accident bug)
A JOIN predicate on a nullable column silently drops rows where the column is NULL, even in a LEFT JOIN when the NULL is on the left side and you’re filtering on a right-side attribute. Every WHERE right.col = 'x' after a LEFT JOIN converts it back to an INNER JOIN in practice — because NULL = 'x' is NULL, which is falsey.
-- Feels like LEFT JOIN, behaves like INNER JOIN
SELECT c.*, p.amount
FROM dim_customers c
LEFT JOIN fact_purchases p USING (customer_id)
WHERE p.channel = 'email'; -- drops every customer with no purchase
-- If you want "customers whose last purchase was email, plus customers with
-- no purchases at all", move the predicate INTO the join:
SELECT c.*, p.amount
FROM dim_customers c
LEFT JOIN fact_purchases p
ON p.customer_id = c.customer_id AND p.channel = 'email';3. JOIN on the wrong grain
Every fact table has a grain: one row per purchase, or one row per send, or one row per campaign-day. Joining two fact tables at different grains without thinking about the grain product produces results that look right and aren’t. Always ask: what does one row on each side represent, and what does one row of the joined result represent?
Joins are the most deceptively simple part of SQL. Syntax is easy; semantics are easy; the hard part is the discipline to check grain, double-check your left-vs-inner reflex, and remember that NULL is its own kind of row. Every analyst who’s ever shipped a wrong number has shipped it through a join.