Healthcare

SQL for Healthcare Analysts: The Queries That Matter

HEDIS measures, 30-day readmission logic, claims-vs-clinical reconciliation, and the PHI handling rules that will get you fired if you skip them.

10 min read

Healthcare analytics has the same SQL syntax as every other analyst domain and almost none of the same habits. The data shape is different (encounters, not events; members, not customers), the definitions are regulated (CMS specs, HEDIS rules), and the mistakes are costly in ways a marketing analyst never experiences (HIPAA audits, CMS quality-report rework, misidentified patients). This post is the working shape of the role — the five query patterns a healthcare analyst writes every week.

Examples use a simplified schema: dim_patients, dim_providers, dim_departments, fact_encounters (admit/discharge records), fact_quality_metrics (observations on encounters). This is close to the caseSQL provider path schema; if you’ve done those missions, the shape will be familiar.

Why healthcare SQL is its own discipline

Three things separate healthcare analytics from consumer / product analytics:

  • Regulated definitions. "30-day readmission rate" isn’t whatever the analyst thinks; it’s a CMS-specified measure with inclusion/exclusion rules. Getting the definition wrong isn’t "wrong number" — it’s "rework the CMS report."
  • Member identity is fragile. Patients can appear under different IDs across systems (EHR, claims, registration), and matching rules are imperfect. The dedup logic that a marketing analyst uses on emails doesn’t translate.
  • PHI. Protected Health Information. Exporting a CSV with 10 rows of patient names to your laptop for "quick analysis" is a fireable offense in most organizations and a regulated breach under HIPAA.

The HEDIS measure shape

HEDIS (Healthcare Effectiveness Data and Information Set) is the quality-measurement standard payers report to NCQA. Every measure has the same shape: a denominator (the population eligible for the measure) and a numerator (the subset that received the appropriate care). The rate is numerator / denominator.

-- HEDIS-style: Breast cancer screening rate for women 50-74 in 2025
WITH denominator AS (
  -- Women aged 50-74 with continuous enrollment during the measurement year
  SELECT patient_id
  FROM dim_patients
  WHERE gender = 'F'
    AND EXTRACT(YEAR FROM AGE(CURRENT_DATE, dob)) BETWEEN 50 AND 74
    AND enrollment_start <= '2024-01-01'
    AND enrollment_end   >= '2025-12-31'
),
numerator AS (
  -- The subset of the denominator who had a mammogram in 2024 or 2025
  SELECT DISTINCT d.patient_id
  FROM denominator d
  JOIN fact_encounters e USING (patient_id)
  WHERE e.cpt_code IN ('77057', '77063', '77067')
    AND e.service_date BETWEEN '2024-01-01' AND '2025-12-31'
)
SELECT
  (SELECT COUNT(*) FROM denominator) AS denom,
  (SELECT COUNT(*) FROM numerator)   AS numer,
  (SELECT COUNT(*) FROM numerator) * 100.0 /
    NULLIF((SELECT COUNT(*) FROM denominator), 0) AS screening_rate_pct;

30-day readmissions — the definition actually matters

Readmission rate is the most-reported and most-frequently-miscomputed healthcare metric. "Patient discharged, came back within 30 days" sounds simple; the real CMS definition has:

  • Index admission type matters. Planned readmissions (scheduled chemotherapy) aren’t counted. Certain transfer patterns are excluded.
  • Unit matters. Is it readmissions per patient, or per index admission? The same patient with three readmissions in a year counts three times, not once.
  • Denominator matters. Only specific index-admission diagnoses (heart failure, AMI, pneumonia, COPD, CABG, THA/TKA) count for the CMS Hospital Readmissions Reduction Program.
-- All-cause 30-day readmission rate, self-join on the encounters table
WITH index_admissions AS (
  SELECT
    encounter_id AS index_encounter_id,
    patient_id,
    discharge_date
  FROM fact_encounters
  WHERE encounter_type = 'inpatient'
    AND discharged_alive = true
    AND discharge_date BETWEEN '2025-01-01' AND '2025-11-30'
),
readmits AS (
  SELECT
    i.index_encounter_id,
    i.patient_id,
    MIN(r.admit_date) AS readmit_date
  FROM index_admissions i
  JOIN fact_encounters r
    ON r.patient_id = i.patient_id
   AND r.encounter_type = 'inpatient'
   AND r.admit_date  >  i.discharge_date
   AND r.admit_date <= i.discharge_date + INTERVAL '30 days'
   AND r.encounter_id <> i.index_encounter_id
  GROUP BY i.index_encounter_id, i.patient_id
)
SELECT
  COUNT(*) FILTER (WHERE r.index_encounter_id IS NOT NULL) * 100.0 /
    COUNT(*) AS readmit_rate_pct,
  COUNT(*) AS index_admissions
FROM index_admissions i
LEFT JOIN readmits r USING (index_encounter_id);

Claims vs clinical: the reconciliation problem

Healthcare organizations run on two parallel data systems: claims (what the payer was billed) and clinical (what the EHR recorded). They describe the same encounters and disagree constantly. Reconciling them is a standing analyst task.

  • Timing disagrees. Claim service-date can be the discharge date; clinical admit-date is when the patient came in. For a 7-day stay those are different weeks.
  • Counts disagree. Claims may bundle multiple clinical encounters into one; the EHR may have "visits" that claims doesn’t bill.
  • Diagnoses disagree. Clinical ICD codes include problem list; claim ICD codes are only the billed ones. The claim underrepresents the clinical complexity.

The pattern: FULL OUTER JOIN the two sources on patient_id + encounter date window, then categorize every row as "in both," "claims only," or "clinical only." The non-matching rows ARE the analysis — they tell you where the two systems drift.

PHI: the handling rules that keep you employed

A short list that every healthcare analyst internalizes in week one. These aren’t guidelines; breaking them has gotten analysts walked out by security.

  • Never export patient-identified data to your laptop. Not even "for a quick Excel pivot." The moment raw PHI leaves the warehouse, you’ve created a breach risk that legal will now spend a month chasing.
  • De-identify before sharing with non-clinical staff. Replace patient_id with a hashed surrogate; drop name, DOB, MRN. The HIPAA Safe Harbor list has 18 identifiers to strip.
  • Row counts below 11 round up or are suppressed. In public-facing reports, any cell with n<11 patients is typically reported as "<11" to prevent re-identification. Internal numbers can be exact; external ones can’t.
  • Audit log everything. Most warehouses log queries against PHI-tagged tables. Assume your queries are reviewed. Don’t run "SELECT * FROM patients" out of curiosity.
Exercise

Compute the 30-day readmission rate by admitting department, but apply the small-cell suppression rule (show "<11" as the count when a department has fewer than 11 index admissions in the window).

Schema hint

fact_encounters(encounter_id, patient_id, department_id, encounter_type, admit_date, discharge_date, discharged_alive), dim_departments(department_id, name).

Expected

One row per department. Columns: department_name, index_admissions_display (either the count or "<11"), readmit_rate_pct (NULL if index_admissions < 11).

Show solution
WITH ia AS (
  SELECT encounter_id, patient_id, department_id, discharge_date
  FROM fact_encounters
  WHERE encounter_type = 'inpatient'
    AND discharged_alive = true
    AND discharge_date BETWEEN '2025-01-01' AND '2025-11-30'
),
r AS (
  SELECT ia.encounter_id
  FROM ia
  JOIN fact_encounters f
    ON f.patient_id = ia.patient_id
   AND f.encounter_type = 'inpatient'
   AND f.admit_date  >  ia.discharge_date
   AND f.admit_date <= ia.discharge_date + INTERVAL '30 days'
   AND f.encounter_id <> ia.encounter_id
  GROUP BY ia.encounter_id
),
rates AS (
  SELECT
    ia.department_id,
    COUNT(*) AS n,
    COUNT(r.encounter_id) * 100.0 / COUNT(*) AS readmit_rate_pct
  FROM ia
  LEFT JOIN r USING (encounter_id)
  GROUP BY ia.department_id
)
SELECT
  d.name AS department_name,
  CASE WHEN rates.n < 11 THEN '<11' ELSE rates.n::text END AS index_admissions_display,
  CASE WHEN rates.n < 11 THEN NULL  ELSE rates.readmit_rate_pct END AS readmit_rate_pct
FROM rates
JOIN dim_departments d USING (department_id)
ORDER BY d.name;

Healthcare SQL rewards analysts who internalize the domain constraints before the syntax. A perfectly-written query against a wrong HEDIS definition is worse than a slower query against the right one — because the first one ships a number that legal, compliance, and CMS will all have to live with.

Build these reflexes against real data.

caseSQL runs 100+ missions against a realistic star schema with planted data-quality issues. Free tier runs in your browser; upgrade to Pro for interview prompts.

Keep reading