--- For Camera-Ready ---

- Spend more time emphasizing:
  - The "first run" problem
  - Dynamic provenance changes with each update
  - Static provenance buys you better scheduling decisions
- Better explain "hot" vs. "cold" cache (e.g., Spark loading data into memory); see the first sketch after these notes.
- Explain that the choice of Spark is due to Vizier having already chosen it; the main things we need it for are Arrow and scheduling.
- Space permitting, spend a bit more time contrasting the microkernel with Jupyter "hacks" like Nodebook.
- Add text emphasizing that, even though Jupyter is not intended for batch ETL processing, that is how a lot of people use it (e.g., cite Netflix, Stitch Fix?) -- and yes, we're aware that this is bad practice.
- Around the point where we describe that Vizier involves explicit dependencies, also point out that we describe how to provide a Jupyter-like experience on top of this model later in the paper. "Keep the mental model."
- Typos:
  - "Not that Vizier" -> "Note that Vizier"
- Add more future work:
  - Direct output to Arrow instead of via parquet (see the second sketch after these notes).
- Add copyright text.
- Check for and remove Type 3 fonts, if any exist.
- Make sure fonts are embedded (should be the default for LaTeX).

--- For Next Paper ---

- Use the Git history to recover the dependency graph:
  - e.g., figure out how much dynamic provenance changes for a single cell over a series of edits.
- Static vs. dynamic provenance: how different are they?
  - e.g., how often do you need to "repair"?
  - How much further from serial does dynamic get you?
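
Sketch 1 (hot vs. cold cache): a minimal PySpark illustration of the distinction the camera-ready note asks us to explain better. The file path and session setup are placeholders, not from the paper; the point is only that the first action on a freshly loaded DataFrame is "cold" (pays the I/O cost), while actions after `cache()` are "hot" (served from Spark's in-memory copy).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hot-cold-cache-sketch").getOrCreate()

# Hypothetical input; any sizable dataset works for the illustration.
df = spark.read.parquet("/data/example.parquet")
df = df.cache()

df.count()   # cold: reads from storage, then materializes the in-memory cache
df.count()   # hot: reuses the cached partitions, no storage access
```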
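Sketch 2 (Arrow via parquet): a minimal sketch of the indirect path that the future-work item would eliminate. The pipeline shape and paths here are assumptions for illustration, not the paper's implementation: a cell's Spark result reaches Arrow today by being written to parquet and re-read with pyarrow; "direct output to Arrow" would skip the on-disk round trip.

```python
import pyarrow.parquet as pq
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("arrow-via-parquet-sketch").getOrCreate()
result = spark.range(1_000_000)  # stand-in for a cell's output DataFrame

# Current (indirect) path: Spark -> parquet on disk -> Arrow table.
result.write.mode("overwrite").parquet("/tmp/cell_output.parquet")
arrow_table = pq.read_table("/tmp/cell_output.parquet")

# Future work: produce the Arrow table directly from the executing cell,
# avoiding the parquet write/read.
```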