Data CleaningHard
An upstream team silently changed a column's meaning. How do you catch and handle it?
Seen at: Netflix · Airbnb · Stripe
Your pipeline joins against a product_status column from an upstream team. Last Tuesday, they changed the set of valid values without telling you — "active" is now "live," and "paused" is now two values "manual_paused" and "auto_paused." How would you detect this, and how do you handle upstream drift going forward?
Draft your answer
Saved to this browser only. Try it before you peek at the model answer.
Stuck? Peek at a hint
Reveals the model answer and the self-score rubric.