paper-ParallelPython-Short/TODO.txt

--- For Camera-Ready ---

- Spend more time emphasizing:
  - The "First run" problem
  - Dynamic provenance changes with each update
  - Static provenance buys you better scheduling decisions

- Better explain "hot" vs "cold" cache (e.g., Spark loading data into memory)

- Explain that we use Spark because Vizier had already chosen it.  The main things we need it for are Arrow and scheduling.

- Space permitting, spend a bit more time contrasting the microkernel with Jupyter "hacks" like Nodebook.

- Add some text emphasizing that even though Jupyter is not intended for batch ETL processing, that is how a lot of people use it (e.g., cite Netflix, Stitch Fix?).  (And yes, we're aware that this is bad practice.)

- Where we describe Vizier's explicit dependencies, also point out that later in the paper we describe how to provide a Jupyter-like experience on top of this model.  "Keep the mental model"

- Typos:
  - " Not that Vizier"

- Add more future work
  - Direct output to Arrow instead of via Parquet.

- Add copyright text

- Check for and remove Type 3 fonts if any exist.

- Make sure fonts are embedded (should be default for LaTeX)

--- For Next Paper ---

- Use Git history to recover the dependency graph
  - e.g., figure out how much dynamic provenance changes for a single cell over a series of edits.

- Static vs Dynamic provenance: How different are they?
  - e.g., how often do you need to "repair"?
  - How much further away from serial does dynamic get you?