Three-table JOIN with NULL-safe AVG. Practice routing through fact-fact joins (visits → enrollments) into a dim, filtering NULLs out of an aggregate, and ranking sites on an operational drift metric.
Site management call needs a drift report. For every site, give me the average days_off_schedule across all visits at that site — site_name, avg_drift_days (rounded to 2 decimals), sorted by avg_drift_days DESC. Skip NULL drift values (those are no-shows; visit_type and completion live in different columns and we don't want to mix them in). Path is fact_trial_visits → fact_enrollments (on enrollment_id) → dim_sites (on site_id). The top of the list is where I focus this week's calls.
| Column | Type | Key |
|---|---|---|
| visit_id | INT | PK |
| enrollment_id | INT | FK → fact_enrollments |
| scheduled_date | TEXT | |
| actual_date | TEXT | |
| visit_type | TEXT | |
| completed_flag | INT | |
| days_off_schedule | INT |
| Column | Type | Key |
|---|---|---|
| enrollment_id | INT | PK |
| trial_id | INT | FK → dim_trials |
| protocol_id | INT | FK → dim_protocols |
| patient_id | INT | FK → dim_trial_patients |
| site_id | INT | FK → dim_sites |
| enrolled_date | TEXT | |
| status | TEXT | |
| withdrawal_reason | TEXT |
| Column | Type | Key |
|---|---|---|
| site_id | INT | PK |
| site_name | TEXT | |
| city | TEXT | |
| state | TEXT | |
| region | TEXT | |
| site_type | TEXT | |
| enrollment_capacity | INT | |
| pi_name | TEXT | |
| activated_date | TEXT |
Each hint you reveal reduces the XP you can earn. Try the query first.
Visits don't have a direct site FK — they reach the site through the enrollment table. Three-table join: visits → enrollments → sites.
Filter NULL drift values out before averaging (no-shows have NULL drift). AVG ignores NULLs by default, but the explicit filter documents intent and makes the row-count expectation predictable.
Group by site name; round the average to two decimals. Sort by drift descending, with site name as the alphabetic tiebreaker. One row per site.