Three-CTE pipeline + window SUM with PARTITION BY for cumulative cohort loss progression
Vintage curve for the loss-forecasting deck. By vintage year and months-to-charge-off (the gap between funded_date and closed_date for charge_off loans), build a cumulative loss curve. Six columns: `vintage_year`, `months_to_co`, `co_count` (chargeoffs at this exact month-mark), `cum_co` (running total within the vintage), `vintage_size` (total loans in the vintage, denominator), `cum_co_rate` (running CO rate, 4 decimals). Filter `funded_date >= '2024-11-01'`. Sort by vintage_year ascending, then months_to_co ascending.
| Column | Type | Key |
|---|---|---|
| origination_id | INT | PK |
| app_id | INT | FK → lending_applications |
| account_id | INT | FK → retail_accounts |
| funded_date | TEXT | |
| funded_amount | REAL | |
| term_months | INT | |
| interest_rate | REAL | |
| rate_type | TEXT | |
| arm_reset_date | TEXT | |
| fico_at_orig | INT | |
| dti_at_orig | REAL | |
| ltv_at_orig | REAL | |
| property_value | REAL | |
| lo_name | TEXT | |
| channel | TEXT | |
| current_status | TEXT | |
| closed_date | TEXT |
Each hint you reveal reduces the XP you can earn. Try the query first.
Three CTEs decompose the work: extract vintage_year and months-to-CO per charge-off loan (EXTRACT-based date arithmetic); total each vintage's size; aggregate CO counts per (vintage, month-mark).
For the running total, reach for a SUM aggregate window function. Two design choices matter: the partition (so the running total restarts per vintage rather than spilling forward across vintages), and the window frame (a running total reads from the start of the partition up through the current row — ROWS rather than RANGE handles tied month-marks cleanly).
Final JOIN to vintage_size so each row carries its denominator. Compute cum_co_rate by dividing the window-summed cumulative by the vintage total — cast to NUMERIC before ROUND or Postgres throws on `double precision`.