JOIN application fact to a geography dim with CASE bucketing for a 2-row LMI vs Non-LMI comparison
First slide of the fair-lending memo. Compare approval rate between LMI tracts and non-LMI tracts — that's `lending_geography.lmi_flag = 1` vs `0`. Four columns: `tract_type` (the string 'LMI' or 'Non-LMI'), `total_apps`, `approved_count`, `approval_rate` (4 decimals). Sort by `tract_type` descending so 'Non-LMI' (the comparison baseline) lists first.
| Column | Type | Key |
|---|---|---|
| app_id | INT | PK |
| customer_id | INT | FK → retail_customers |
| product_id | INT | FK → retail_products |
| application_date | TEXT | |
| requested_amount | REAL | |
| requested_term | INT | |
| loan_purpose | TEXT | |
| action_taken | INT | |
| denial_reason_1 | INT | |
| aus_recommendation | TEXT | |
| rate_spread | REAL | |
| lien_status | INT | |
| occupancy | INT | |
| ethnicity | INT | |
| race | INT | |
| sex | INT | |
| age_bucket | TEXT | |
| property_tract | TEXT | FK → lending_geography |
| Column | Type | Key |
|---|---|---|
| tract_id | TEXT | PK |
| msa_code | TEXT | |
| county | TEXT | |
| state | TEXT | |
| lmi_flag | INT | |
| minority_pct | REAL |
Each hint you reveal reduces the XP you can earn. Try the query first.
JOIN applications to `lending_geography` on the tract key. INNER JOIN drops apps with NULL property_tract — auto and personal loans, which are out of scope for HMDA-style geographic analysis anyway.
Translate the integer LMI flag into the human-readable label the slide expects. CASE is the typical idiom; the same expression goes in both the SELECT projection and the GROUP BY so they stay in sync.
Conditional aggregation for approved_count, COUNT(*) for total_apps, ratio for approval_rate (same shape as M6 / M11). Sort tract_type descending so the control group leads the slide — that's just how the lex-sort falls out for these two labels.