Data Cleaning · Very Hard

How do you detect silent data loss in a pipeline?

Seen at: Netflix · Airbnb · Stripe

A weekly report has looked fine for months, but it turns out the ingestion pipeline has been silently dropping 3% of events since January because of a serialization bug triggered by a rare edge case. How would you have detected this earlier, and what monitoring would you set up to prevent future silent loss?
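
As a starting point for thinking about the scenario, one commonly used check is count reconciliation: compare how many events the source says it produced with how many the pipeline actually landed, per partition, and alert when the gap exceeds a tolerance. The sketch below is illustrative only. The function name `reconcile_counts`, the `LossAlert` type, the per-day partition keys, and the 0.5% tolerance are all assumptions, not part of the question, and it presumes an independent count is available at the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossAlert:
    """One partition whose ingested count fell short of the produced count."""
    partition: str
    produced: int
    ingested: int
    loss_rate: float


def reconcile_counts(produced, ingested, max_loss_rate=0.005):
    """Compare per-partition event counts at the source and at the sink.

    `produced` and `ingested` map a partition key (here, a date string) to an
    event count. The 0.5% default tolerance is an arbitrary illustrative value.
    """
    alerts = []
    for partition, n_produced in sorted(produced.items()):
        if n_produced == 0:
            continue  # nothing was produced, so nothing can be lost
        # A partition missing from the sink counts as fully lost.
        n_ingested = ingested.get(partition, 0)
        loss_rate = (n_produced - n_ingested) / n_produced
        if loss_rate > max_loss_rate:
            alerts.append(LossAlert(partition, n_produced, n_ingested, loss_rate))
    return alerts


# Hypothetical counts: the second day loses 3% of its events.
produced = {"2024-01-01": 1000, "2024-01-02": 1000}
ingested = {"2024-01-01": 1000, "2024-01-02": 970}
alerts = reconcile_counts(produced, ingested)
for alert in alerts:
    print(f"{alert.partition}: lost {alert.loss_rate:.1%} of events")
```

A steady 3% drop can hide inside normal week-to-week variation in a report, which is why this sketch compares against an independent source count rather than against the pipeline's own history.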
