Business Intelligence Path · Mission 26 of 26Expert

Why is our leaderboard off by 12x?

Detect and fix grain-confusion fan-out when joining two fact tables at different grains

Back to Business Intelligence

The Brief

Marcus WebbVP of Salessales-ops

I'm pulling my hair out. Victoria wants a career-performance sheet for every active Sales rep: first_name, last_name, total career revenue (all years), and total target dollars assigned across every quarter we've tracked. I wrote what I thought was a simple query joining `fact_sales` to `fact_targets` on `employee_id` and summing revenue, but my top rep came back at $2.5M career revenue — the real number should be closer to $200K. Something is very wrong. Figure out what's happening and send me the correct version. 13 active Sales reps. Sort by total_revenue descending.

You'll practice

Grain reasoningFan-outPre-aggregated CTEs

Tables & columns available

dim_employeesdim10 columns
ColumnTypeKey
employee_idINTPK
first_nameTEXT
last_nameTEXT
emailTEXT
departmentTEXT
titleTEXT
hire_dateTEXT
salaryINT
manager_idINT
is_activeINT
fact_salesfact8 columns
ColumnTypeKey
sale_idINTPK
employee_idINTFK → dim_employees
product_idINTFK → dim_products
region_idINTFK → dim_regions
sale_dateTEXT
quantityINT
revenueREAL
discountREAL
fact_targetsfact6 columns
ColumnTypeKey
target_idINTPK
employee_idINTFK → dim_employees
quarterINT
yearINT
target_amountREAL
actual_amountREAL

Hints (3)

Each hint you reveal reduces the XP you can earn. Try the query first.

Hint 1

Run Marcus's naive approach first and compare against the ground truth: `SELECT SUM(revenue) FROM fact_sales` returns the company-wide total. Join `fact_sales` to `fact_targets` on `employee_id`, sum revenue, and the rep-level totals will be roughly 12x too big.

Hint 2

The root cause is a grain mismatch. `fact_sales` has a per-sale grain (~2000 rows); `fact_targets` has a per-(employee × quarter × year) grain (~168 rows, ~12 per rep). Joining on `employee_id` alone creates a Cartesian product: every sale row gets duplicated for each of that rep's target rows, so revenue is inflated by roughly the target-row count per rep.

Hint 3

Fix with pre-aggregation, not a bigger JOIN. Build a CTE that SUMs revenue per rep, a second CTE that SUMs target_amount per rep, then LEFT JOIN both to `dim_employees` filtered to `is_active = 1 AND department = 'Sales'`. Use COALESCE on any rep missing from either CTE.