Use CASE expressions to clean and standardize inconsistent data
The gender data in dim_customers is a mess — 'M', 'Male', 'male', 'F', 'Female', 'female', and NULLs. Can you write a query that normalizes this into three clean buckets: 'male', 'female', and 'unknown'? Show each bucket and the count.
Each hint you reveal reduces the XP you can earn. Try the query first.
Use a CASE expression: `CASE WHEN LOWER(gender) IN ('m', 'male') THEN 'male' WHEN LOWER(gender) IN ('f', 'female') THEN 'female' ELSE 'unknown' END`
The ELSE 'unknown' catches NULLs and any other unexpected values — without it, NULLs would be silently dropped
Alias the CASE as `clean_gender`, GROUP BY it, and use `COUNT(*) AS count`