I enjoyed this post!

Jul 15, 2021

I found the motivation part to be illuminating.

One thought: if you've already come to terms with extracting Python modules and testing them, then your notebook is not much more than a simple (linear, naive) pipeline that runs encapsulated steps, plus whatever visualization cells you need to make sense of your data.

In that case, it would not take much effort to author a more production-oriented pipeline in whatever orchestration engine you choose (Kubeflow Pipelines, Airflow, Prefect, etc.).

Since the notebook-as-a-pipeline tends to be linear and naive by definition, maintaining it alongside the production pipeline will not be hard at all.

This way you get pretty much the best of both worlds: a notebook that is easy to maintain and lets you pick up research at any time, and a fully productionized pipeline that offers many benefits, from distribution and parallelization of steps to logging and monitoring.
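To make this a bit more concrete, here is a minimal sketch in plain Python. The step names (`load_data`, `clean`, `summarize`), the `PIPELINE` mapping, and the tiny `run_dag` runner are all made up for illustration; a real orchestration engine would replace `run_dag` with its own task and DAG definitions. The point is only that the same tested functions can back both the notebook and the pipeline.

```python
# Hypothetical steps, standing in for functions already extracted
# into tested Python modules.
def load_data():
    return [3, 1, 2]


def clean(rows):
    return sorted(rows)


def summarize(rows):
    return {"n": len(rows), "max": max(rows)}


# Notebook view: a linear, naive pipeline, one call per cell.
def run_linear():
    rows = load_data()
    rows = clean(rows)
    return summarize(rows)


# Pipeline view: the same steps declared with explicit dependencies.
# Each entry maps a step name to (function, names of upstream steps).
PIPELINE = {
    "load": (load_data, []),
    "clean": (clean, ["load"]),
    "summarize": (summarize, ["clean"]),
}


def run_dag(pipeline):
    """Toy stand-in for an orchestrator: runs each step once,
    after the steps it depends on, and returns all results."""
    results = {}

    def resolve(name):
        if name not in results:
            func, deps = pipeline[name]
            results[name] = func(*[resolve(dep) for dep in deps])
        return results[name]

    for name in pipeline:
        resolve(name)
    return results


if __name__ == "__main__":
    print(run_linear())
    print(run_dag(PIPELINE)["summarize"])
```

Because both views call the same functions, a change to a step shows up in the notebook and in the pipeline without any duplicated logic.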

Just a thought ;-)

Written by Assaf Pinhasi
