Each path drops you into a different industry with its own database, stakeholders, and messy-data traps. Pick one to start, or try them all — your progress carries over between paths when you’re signed in.
22 paths · 580 missions · Real stakeholder briefs
Marketing Analytics
Query campaign data, measure ROI, and debug data quality issues for your marketing team.
30 missions|5 tables
“What campaigns are running?”
“Which campaign drove the most revenue?”
“Normalize the gender mess”
Real-World Scenarios — Solve marketing analytics problems from fictional managers, just like a real job.
Messy Data — Work with NULLs, duplicates, inconsistent formats, and refunds — the stuff tutorials skip.
Instant Feedback — 5-tier validation engine compares your results against multiple accepted approaches.
Build executive dashboards, track KPIs, and translate business questions into SQL.
31 missions|5 tables
“Who is hitting their sales targets?”
“Rank the sales team”
“Three-table join: who sold what where?”
Dimensional Modeling in Practice — A 5-table star schema (fact_sales, fact_targets + 3 dimensions) with trailing spaces, ghost regions, inactive employees, and casing drift — the exact patterns you’ll handle modeling real marts.
Window Functions End-to-End — RANK, ROW_NUMBER, LAG, LEAD, NTILE, PERCENT_RANK, SUM() OVER with framed windows, moving averages, YoY pivots, and running totals — the window-function curriculum most interviews screen on.
Grain Reasoning & Fan-Out Bugs — Mission 26 (“Why is our leaderboard off by 12x?”) drills the canonical BI mistake: a clean query, the wrong grain, a 12× inflated total. Pre-aggregated CTEs as the fix.
Practice the SQL patterns hospital hiring managers actually recognize: ICD-10 ↔ CCSR terminology joins, principal vs. secondary diagnosis logic, POA gotchas, HCC risk tiering, and the LEFT-JOIN-vs-INNER-JOIN trap that silently breaks production healthcare dashboards. Curriculum modeled on the patterns named in CMS-HCC, AHRQ CCSR, and the Tuva Project.
25 missions|7 tables
“Pull the chronic disease list”
“Top 3 codes per body system”
“Department CCSR concentration (HHI)”
Real Terminology Joins — Practice ICD-10 → CCSR → HCC the way Cogito and Caboodle analysts do — the foundational pattern absent from every other SQL learning platform.
Principal vs. Secondary Logic — Cohort patients by principal diagnosis, aggregate comorbidities on those encounters — the HCC-recapture pattern run quarterly at every ACO and Medicare Advantage plan.
Production-Grade Gotchas — POA NULL handling, ICD-9 legacy detection, and the LEFT-JOIN-vs-INNER-JOIN trap that’s the #1 silent bug in healthcare reporting.
Practice the SQL patterns clinical research analysts and trial sponsors actually run: enrollment cohort analysis, arm balance checks, MedDRA-coded adverse event signal detection, visit adherence drift, and Kaplan-Meier-style retention curves. Real-world terminology — NCT-format trial IDs, CTCAE v5 severity grades 1-5, ATC drug classes, and MedDRA Preferred Terms — on a fully isolated 7-table dataset modeled on real phase 2 / phase 3 protocols.
25 missions|7 tables
“Enrollments per site”
“First adverse event per enrollment”
“Site HHI — concentration of enrollments per therapeutic area”
Authentic Trial Terminology — NCT-format trial IDs, MedDRA Preferred Terms for adverse events, CTCAE v5 severity grades, ATC drug classes. The vocabulary every clinical research analyst, biostatistician, and trial sponsor uses daily.
Protocol & Cohort Analysis — Arm balance checks, screen-fail rates, eligibility strata, and per-arm safety comparisons — the SQL patterns that drive every interim analysis and DSMB report.
Safety Signal Detection — Severity-by-arm cross-tabs, drug-vs-placebo AE rate deltas, and SAE cohort intersection — the analytics that flag a signal before the next protocol amendment.
Practice the SQL patterns payer claims and appeals analysts actually run: denial rate by CPT, CARC/RARC code frequency analysis, appeal-overturn pivots by level, out-of-network leakage cohorts, and provider-scorecard composites against a real-shape claims ledger. X12 835 standard codes, plan metal tiers, and the adjudication patterns every health-plan analytics team writes daily.
“Provider appeal-success ranking within specialty (RANK PARTITION)”
Authentic Payer Terminology — X12 835 CARC and RARC denial codes, CPT procedure codes, place-of-service, plan metal tiers (Bronze/Silver/Gold/Platinum), and HMO/PPO/EPO plan types. The vocabulary every claims analyst, appeals coordinator, and payer revenue-integrity team uses daily.
Denial & Appeal Mechanics — Denial rate by CPT, CARC frequency × category cross-tabs, overturn% by appeal level, and turnaround-time percentiles — the SQL patterns that drive every monthly denial-management review.
Network Leakage & Adequacy — Out-of-network provider cohorts via anti-join, network adequacy gaps by specialty × region, and provider denial-rate outliers (specialty-adjusted) — the analytics that surface contracting and access risks.
Practice the SQL patterns pharma commercial analysts run at Pfizer / Lilly / Novartis / Veeva: top prescribers by NRx, decile-based HCP segmentation, NRx vs TRx breakdowns, call-to-Rx attribution lag, territory market share HHI, switcher cohorts, persistence curves, and brand-launch performance scorecards. Per-Rx-claim grain on 24 months of data spanning a single-brand launch + in-class competitors. The patterns life-sciences commercial teams interview on.
25 missions|7 tables
“HCPs in cardiology specialty”
“Decile-based prescriber tiers — recompute from Rx volume”
“Territory market share HHI — concentration of Rx volume”
Authentic Pharma Commercial Schema — Per-Rx-claim grain (NRx vs TRx flag, days_supply, copay), HCP/NPI dimension with decile + target flag + specialty, rep activity with sample drops, payer plans by formulary tier, and normalized territory FK that mirrors how real IQVIA / Veeva / Komodo datasets are shaped.
Launch + Competitor Narrative — Single launch brand (Cardiozin, a new statin) entered market month 17 of a 24-month window; 2-3 in-class competitors hold incumbent share; 5-6 unrelated products provide portfolio breadth. Every mission can hook into 'how is our launch tracking vs. competition?' — the same question every brand team asks every Monday.
NRx, TRx & Persistence — Per-fill is_new_rx flag enables the canonical NRx (new prescriptions) vs TRx (total prescriptions including refills) split that drives every commercial dashboard. Days-supply distributions, refill-rate cohorts, and time-to-discontinuation curves — the patterns brand teams use to detect adherence and switching.
Practice the SQL patterns hospital RCM analysts run at HCA / CommonSpirit / Epic shops / Athena: denial rate by CPT, days-in-AR by payer, first-pass yield, contractual underpayment detection, claim-to-remit lag, AR aging buckets, and provider productivity vs collections matrices. Per-charge / per-claim / per-remit grain on 24 months of provider-side billing data with planted denial cohorts and underpayment traps. The patterns billing analysts, denials specialists, and CFOs actually run.
25 missions|7 tables
“Medicare patients by demographic”
“Days-in-AR by payer — open claims aging”
“Payer concentration HHI per service line”
Authentic RCM Schema — Per-charge / per-claim / per-remit grain mirroring how Epic Resolute, Athenahealth, and Cerner billing systems shape data. Patient and payer dims with primary/secondary insurance attribution, provider dim with department + credentialing flag, CPT charge-master with RVU, claim status transitions (submitted → paid / denied / appealed / closed), and remit-level allowed/paid/adjustment splits with denial codes.
Denial + AR Workflow Narrative — Planted denial cohorts (timely-filing, prior-auth, medical-necessity, coordination-of-benefits) drive the denials-management missions. Contractual underpayment plants test the M14 expected-vs-paid detection. AR aging follows realistic 0-30 / 31-60 / 61-90 / 90+ bucket distributions across payer mix.
Real RCM KPIs — Clean Claim Rate (first-pass yield), Days in AR, denial rate by CPT and payer, contractual yield, write-off rate, and the provider productivity views that surface the difference between high-volume billers and high-collections billers.
Write the variance bridges, P&L rollups, and cash-burn queries tech FP&A teams gatekeep on.
30 missions|5 tables
“Cost-center ownership map”
“March budget-variance check”
“YoY revenue growth by month”
Finance Deliverable Scenarios — Answer questions from the CFO, FP&A Manager, Treasury Analyst, Controller, and Senior Auditor — MBR variance packs, budget attainment, revenue seasonality, cash burn, and vendor concentration.
Messy Ledger Data — Duplicate vendors, un-posted journal entries, inconsistent account-type casing, refund leakage, and duplicate payments — the reconciliation traps that quietly wreck a variance narrative.
Finance Analyst SQL Skills — A vs B vs F vs PY variance, gross margin decomposition, running cash burn, YoY revenue growth, percentile ranking, cohort onboarding, and late-posting analysis.
Structuring detection, velocity windows, entity resolution, and rule-tuning backtests — the SQL AML analysts and fraud strategists actually write.
30 missions|6 tables
“What channels do we see?”
“Round-dollar wires — flag a pattern”
“Alert precision — how good is our rule?”
Real AML Typologies — Structuring below the $10K CTR threshold, smurfing across multiple accounts, round-dollar wire cycles, dormant-then-active money mules, and device/IP/email reuse across “separate” customers — every FinCEN / FATF / Wolfsberg pattern maps to a mission.
Velocity + Self-Join SQL — COUNT() OVER with date-range window frames, correlated subqueries for same-counterparty pairs, SOUNDEX / LEVENSHTEIN for entity resolution, and gap-and-island logic for dormant account reactivation.
Rule Tuning + Backtesting — Backtest a proposed threshold against historical alerts, compute rule precision / recall on true-positive labels, and measure false-positive reduction — the analysis every compliance team asks of new hires.
The retail-banking analyst role at a regional or money-center bank: deposit growth decomposition, deposit beta on MMAs, NSF/OD fee revenue post-CFPB, branch consolidation scoring, NII decomposition, primary-banking-relationship classification, churn leading indicators, and FTP-allocated branch P&Ls against a realistic retail bank with planted reconciliation traps.
30 missions|11 tables
“Product taxonomy inventory”
“Deposit beta on MMAs (cycle-to-date)”
“2025 closed-account exit cohort with tenure”
Retail Banking Scenarios — Answer questions from the Retail COO, Branch Ops Manager, Treasurer, and Customer Insights Lead — weekly retail dashboards, ALCO-ready deposit beta, NSF/OD scenario modeling, and branch P&L with FTP allocation.
Authentic Retail Bank Schema — 6-table retail_* core (customers, accounts, products, branches, transactions, daily balances) plus retailops_* specialty tables for branch activity, fees, and ATM events. Numbers anchor to 2024-2026 retail banking realities.
Deposit + Branch + Treasury + Customer Skills — Average daily balance, weighted deposit beta, primary-relationship EXISTS logic, churn lag features, FTP curve allocation, and the multi-CTE rollups that real retail-bank dashboards run on.
The consumer-lending analyst role at a regional or money-center bank: HMDA action-taken decomposition, fair-lending disparity analysis, vintage curves and first-payment-default cohorts, ARM reset wall identification, refi candidate targeting, delinquency aging and roll rates, charge-off recovery, CECL-style loss reserve decomposition, and ALCO-ready portfolio stratification — against a realistic mortgage / auto / HELOC / personal lending dataset with planted HMDA and servicing reconciliation traps.
30 missions|8 tables
“Loan product taxonomy”
“Approval rate by FICO band”
“Refi candidate cohort”
Consumer Lending Scenarios — Answer questions from the Chief Credit Officer, Mortgage Sales Director, Fair Lending Compliance Officer, and Loan Servicing Lead — weekly origination dashboards, HMDA LAR reconstructions, ARM reset walls, and CECL-style loss reserves.
Authentic Lending Schema — lending_applications (HMDA-shaped), lending_originations (funded loans with FICO/DTI/LTV at orig), lending_payments (past installments with days_late), and lending_geography (census tract + LMI). Joins to the shared retail_* core for customer demographics.
HMDA + Vintage + Servicing Skills — HMDA action-taken decomposition, denial-reason rollups, LMI-tract approval gap analysis, vintage curves with cohort EXISTS, ARM reset arithmetic, roll-rate matrices, charge-off recovery ratios, and the multi-CTE rollups that real credit teams ship.
OpportunityHistory SCD2, Closed-Won reopens, stage regressions, forecast accuracy, ARR waterfalls — the SQL RevOps and Sales Ops analysts actually write.
34 missions|8 tables
“What stages exist in the pipeline?”
“Deals that reopened after Closed-Won”
“Closed-Won cohort retention by month”
Salesforce-Shaped Schema — Account, Contact, Lead, Opportunity, OpportunityHistory, User. The exact table shape you’ll see on day one of any Sales Cloud shop, with the field names you’ll recognize from real Salesforce exports.
OpportunityHistory SCD2 — Every stage change, amount change, owner change logged as a row. Point-in-time reconstruction (“what did the forecast look like on March 15?”), stage regression detection, and the Closed-Won reopen anti-pattern that invalidates 80% of naive pipeline reports.
Sales Cycle + Velocity SQL — Stage-duration windows, rep win-rate by region, top-N per group, quarter-over-quarter bookings waterfalls, forecast accuracy as a backtest. The full RevOps analyst toolkit.
Shopify-vs-Stripe-vs-GA4 reconciliation, refund-aware revenue, RFM segmentation, repeat-purchase cohorts — the SQL DTC and retail analysts actually write.
30 missions|6 tables
“What financial statuses exist?”
“Top 10 products by units sold”
“Repeat-purchase cohort retention”
Shopify-Shaped Schema — Customers, products, orders, line items, with the financial_status / fulfillment_status fields you’ll see on day one of any DTC shop. Familiar to anyone who’s opened a Shopify export.
Three-System Reconciliation — GA4 sessions and Stripe charges sit alongside Shopify orders with planted mismatches — partial refunds, currency conversion fees, failed captures, GA4 conversion-value drift. The signature mission ties all three together.
Carrier OTIF, freight invoice audits, detention hot spots, cost-per-unit benchmarking, and multi-metric carrier scorecards — the full logistics & transportation analyst SQL curriculum across 30 missions.
30 missions|5 tables
“Meet the carrier network”
“Monthly shipment volume trend”
“Carrier OTD vs. target gap”
OTIF Compound Metric — Build the on-time AND in-full formula at PO-line grain — the exact logic Walmart enforces with a 3% COGS chargeback.
Freight Invoice Audits — Find overbilled invoices, accessorial overcharges, and contract rate violations across carriers and lanes.
Detention Analysis — Identify facilities where dwell time exceeds 2 hours on more than a third of shipments — the hidden cost every logistics manager wants to cut.
PO cycle time analysis, purchase price variance, maverick spend detection, three-way match exceptions, and supplier risk scorecards — the full procurement & sourcing analyst SQL curriculum across 30 missions.
30 missions|6 tables
“Category catalog”
“Top 10 suppliers by spend”
“Contract compliance rate”
PO Lifecycle Analysis — Trace the full req → PO → ASN → receipt → invoice cycle and measure cycle time at every stage.
Purchase Price Variance — Compare actual vs contracted unit price to surface PPV exceedances by category — the metric that drives finance escalations.
Maverick Spend Detection — Find POs with no contract_id — off-contract spend that bypasses negotiated pricing.
Units per labor hour, dock-to-stock decomposition, pick accuracy by zone, slotting recommendations, and wave analysis — the full warehouse operations analyst SQL curriculum across 30 missions.
30 missions|7 tables
“Worker roster”
“Pick accuracy”
“Labor by function”
UPLH by Zone and Shift — Compute units per labor hour across picking zones and shifts — the metric DC managers review every morning.
Dock-to-Stock Decomposition — Break the inbound cycle into unload, count, and putaway phases to find the bottleneck causing SLA breaches.
Pick Accuracy Heat Map — Rank zones by pick error rate — surface the OVERFLOW zone pattern where error rates are 4× higher than GOLDEN.
WMAPE calculation, forecast bias detection, tracking signal computation, consensus vs statistical forecast value-add, promo lift attribution, and ABC×XYZ segmentation — the full demand planning analyst SQL curriculum across 30 missions.
30 missions|6 tables
“Location count”
“WMAPE”
“Forecast value-add”
WMAPE vs MAPE — Learn why WMAPE (Σ|A−F|/ΣA) is the industry default — it weights by volume and avoids divide-by-zero on intermittent SKUs.
Forecast Bias Detection — Spot the systematic over-forecast on Category C that hides inside an acceptable WMAPE — the pattern demand planners escalate to leadership.
Tracking Signal — Compute rolling tracking signals and flag SKUs where |TS| > 4 — the statistical trigger for model recalibration.