Conditional aggregation across two tables: per-trial enrollment by arm, then a window-style total to compute each arm's share. Practice the core randomization-balance audit pattern that biostatisticians run on every trial before unblinding.
DSMB pre-read needs a randomization audit. Find every trial where any single arm is carrying more than 70% of the post-randomization enrollments — that's the threshold the protocols flag as needing investigator review. Join fact_enrollments to dim_protocols to dim_trials. Skip status = 'screen_fail' (those are pre-randomization). Per-trial total = sum of arm enrollments. Arm share = arm count / trial total. I want trial_name, arm_name, arm_n, arm_share_pct (rounded to 1 decimal), sorted by arm_share_pct DESC. Two trials should pop — one we already know about and one that's a surprise.
| Column | Type | Key |
|---|---|---|
| enrollment_id | INT | PK |
| trial_id | INT | FK → dim_trials |
| protocol_id | INT | FK → dim_protocols |
| patient_id | INT | FK → dim_trial_patients |
| site_id | INT | FK → dim_sites |
| enrolled_date | TEXT | |
| status | TEXT | |
| withdrawal_reason | TEXT |
| Column | Type | Key |
|---|---|---|
| protocol_id | INT | PK |
| trial_id | INT | FK → dim_trials |
| arm_name | TEXT | |
| intervention_type | TEXT | |
| atc_class | TEXT | |
| version | TEXT |
| Column | Type | Key |
|---|---|---|
| trial_id | INT | PK |
| nct_id | TEXT | |
| trial_name | TEXT | |
| phase | INT | |
| condition | TEXT | |
| therapeutic_area | TEXT | |
| sponsor | TEXT | |
| status | TEXT | |
| start_date | TEXT | |
| end_date | TEXT |
Each hint you reveal reduces the XP you can earn. Try the query first.
Three-table join: enrollments to protocols (for arm name) to trials (for trial name). Exclude screen_fail enrollments — they're pre-randomization and don't affect arm balance.
Two aggregations are needed: per-arm count and per-trial total. The per-trial total can be computed as a window aggregate over the per-arm GROUP BY result (a window function on top of an aggregate is the canonical pattern here).
Cast the arm/total ratio to a float before comparing against 0.70 — integer division silently rounds the share to 0 and filters out every match.
Round the share to one decimal for display. Sort by share descending. Row count is small — only a couple of trials hit the threshold.