JOIN with COUNT DISTINCT to avoid double-counting
Planning the Q2 direct-mail drop. I need the count of active deposit customers by state. Deposit = Checking or Savings (look at `retail_products.subcategory`). Active = account status is `Open`. Show `state` and a count column called `n_customers`, sorted highest to lowest. Distinct customers — not accounts. One person can hold both a checking and a savings.
| Column | Type | Key |
|---|---|---|
| customer_id | INT | PK |
| household_id | TEXT | |
| first_name | TEXT | |
| last_name | TEXT | |
| date_of_birth | TEXT | |
| state | TEXT | |
| segment | TEXT | |
| fico_current | INT | |
| estimated_income | REAL | |
| kyc_risk | TEXT | |
| acquisition_channel | TEXT | |
| onboarded_date | TEXT | |
| status | TEXT |
| Column | Type | Key |
|---|---|---|
| account_id | INT | PK |
| customer_id | INT | FK → retail_customers |
| product_id | INT | FK → retail_products |
| branch_id | INT | FK → retail_branches |
| open_date | TEXT | |
| close_date | TEXT | |
| status | TEXT | |
| ownership_type | TEXT | |
| interest_rate | REAL | |
| original_balance | REAL | |
| credit_limit | REAL | |
| fico_at_orig | INT | |
| maturity_date | TEXT |
| Column | Type | Key |
|---|---|---|
| product_id | INT | PK |
| product_code | TEXT | |
| product_name | TEXT | |
| category | TEXT | |
| subcategory | TEXT | |
| default_apr | REAL | |
| default_apy | REAL | |
| monthly_fee | REAL | |
| nsf_fee | REAL | |
| od_fee | REAL |
Each hint you reveal reduces the XP you can earn. Try the query first.
Customer state and account product type live in different tables. You'll need to bring them together — three-table join through retail_accounts and retail_products to get from customer demographics to product subcategory.
Filter: `retail_products.subcategory IN ('Checking', 'Savings')` AND `retail_accounts.status = 'Open'`.
GROUP BY state, COUNT DISTINCT on customer_id (because a customer with both a Checking AND a Savings would otherwise be counted twice), and ORDER BY the count descending.