Marketing Analytics Path · Mission 18 of 25Hard

Normalize the gender mess

Use CASE expressions to clean and standardize inconsistent data

Back to Marketing Analytics

The Brief

Sarah KimVP of Marketingmarketing-analytics

The gender data in dim_customers is a mess — 'M', 'Male', 'male', 'F', 'Female', 'female', and NULLs. Can you write a query that normalizes this into three clean buckets: 'male', 'female', and 'unknown'? Show each bucket and the count.

You'll practice

CASEData cleaning

Tables available

dim_customers

Hints (3)

Each hint you reveal reduces the XP you can earn. Try the query first.

Hint 1

Use a CASE expression: `CASE WHEN LOWER(gender) IN ('m', 'male') THEN 'male' WHEN LOWER(gender) IN ('f', 'female') THEN 'female' ELSE 'unknown' END`

Hint 2

The ELSE 'unknown' catches NULLs and any other unexpected values — without it, NULLs would be silently dropped

Hint 3

Alias the CASE as `clean_gender`, GROUP BY it, and use `COUNT(*) AS count`