Use HAVING to filter aggregated groups and COUNT(DISTINCT) for unique counts
I'm prepping for a SKU rationalization review — we want to consolidate slow-moving categories. For each product category (normalized, one row per category), show the number of distinct products we carry and the total sales transaction count. Only include categories with more than 400 transactions — those are the ones earning their place on the shelf. The ones below 400 are candidates to prune, but that's a separate conversation.
| Column | Type | Key |
|---|---|---|
| sale_id | INT | PK |
| employee_id | INT | FK → dim_employees |
| product_id | INT | FK → dim_products |
| region_id | INT | FK → dim_regions |
| sale_date | TEXT | |
| quantity | INT | |
| revenue | REAL | |
| discount | REAL |
| Column | Type | Key |
|---|---|---|
| product_id | INT | PK |
| product_name | TEXT | |
| category | TEXT | |
| unit_price | REAL | |
| launch_date | TEXT |
Each hint you reveal reduces the XP you can earn. Try the query first.
There's a casing inconsistency in the category column that will cause your GROUP BY to split what should be a single category into multiple rows. Normalize it before grouping.
Use COUNT(*) for total transactions and COUNT(DISTINCT product_id) for unique products
HAVING COUNT(*) > 400 filters after aggregation — WHERE cannot filter on aggregates