Data Cleaning · Very Hard

How do you ensure correctness of incremental pipelines?

Seen at: Meta · Airbnb · Netflix

Your pipeline processes data incrementally (new partition each day). How do you guarantee the incremental result matches the full-refresh result?
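One way to make the question concrete is a reconciliation check: run the incremental path and a full refresh over the same partitions and assert that the outputs are equal. The sketch below is only illustrative and assumes a simple per-user sum. The function names (`aggregate`, `full_refresh`, `incremental_run`, `merged`), the in-memory `store`, and the sample rows are hypothetical and do not come from any particular framework.

```python
from collections import defaultdict


def aggregate(rows):
    """Sum amount per user for one batch of (user, amount) rows."""
    totals = defaultdict(int)
    for user, amount in rows:
        totals[user] += amount
    return dict(totals)


def full_refresh(partitions):
    """Recompute the result from scratch over every partition."""
    all_rows = [row for day in sorted(partitions) for row in partitions[day]]
    return aggregate(all_rows)


def incremental_run(store, partitions, day):
    """Process one day's partition, overwriting that day's output.

    Overwriting (rather than appending) keeps a re-run idempotent.
    """
    store[day] = aggregate(partitions[day])


def merged(store):
    """Combine the per-day outputs into the final result."""
    totals = defaultdict(int)
    for per_day in store.values():
        for user, amount in per_day.items():
            totals[user] += amount
    return dict(totals)


# Hypothetical sample data: one partition per day.
partitions = {
    "2024-01-01": [("a", 10), ("b", 5)],
    "2024-01-02": [("a", 3)],
}

store = {}
for day in sorted(partitions):
    incremental_run(store, partitions, day)

# Re-running a day must not double count.
incremental_run(store, partitions, "2024-01-02")

# Reconciliation: the incremental result must equal the full refresh.
incremental_matches_full = merged(store) == full_refresh(partitions)
```

The check only passes here because the aggregation is a sum, which can be merged across partitions, and because each run overwrites its own partition. Late-arriving rows or non-mergeable aggregates (for example distinct counts) would need additional handling that this sketch does not cover.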
