Filter on a status enum and group on a TEXT reason column. Practice WHERE on a categorical status field, GROUP BY on a free-text dimension, and ranked output — the standard data-quality and patient-retention diagnostic.
DSMB meets Friday and they always ask about withdrawals. From fact_enrollments, give me a breakdown of every distinct withdrawal_reason for patients with status = 'withdrawn' — withdrawal_reason and withdrawn_count, sorted by count descending, then reason ascending as tiebreaker. Skip the 'screen_fail' status — those are pre-randomization and tracked separately on the screening report. We only want post-randomization withdrawals here.
| Column | Type | Key |
|---|---|---|
| enrollment_id | INT | PK |
| trial_id | INT | FK → dim_trials |
| protocol_id | INT | FK → dim_protocols |
| patient_id | INT | FK → dim_trial_patients |
| site_id | INT | FK → dim_sites |
| enrolled_date | TEXT | |
| status | TEXT | |
| withdrawal_reason | TEXT |
Each hint you reveal reduces the XP you can earn. Try the query first.
Single-table filtered aggregate. fact_enrollments has a status enum; the briefing's 'withdrawn' is one specific value of it (case-sensitive). Other statuses (screen_fail, completed, active) are excluded.
Group on the withdrawal_reason TEXT column. Aggregate is a row count; name it for the metric.
Sort by the count descending, with the reason text as the alphabetic tiebreaker. Six to eight distinct reasons appear in practice.