Update on Overleaf.

2022-05-02 15:03:17 +00:00 · 2022-05-02 15:03:17 +00:00 · ea377fb0ee
parent 0fc55751b8
commit ea377fb0ee
2 changed files with 57 additions and 0 deletions
--- a/TODO.txt
+++ b/TODO.txt
--- a/main.tex
+++ b/main.tex
@@ -145,3 +145,60 @@
 \bibliography{main}

 \end{document}
+--- For Camera-Ready ---
+
+- Spend more time emphasizing:
+  - The "First run" problem
+  - Dynamic provenance changes with each update
+  - Static provenance buys you better scheduling decisions
+
+- Better explain "hot" vs "cold" cache (e.g., Spark loading data into memory)
+
+- Explain that we use Spark because Vizier had already chosen it.  We mainly need it for Arrow and scheduling.
+
+- Space permitting, spend a bit more time contrasting the microkernel with Jupyter "hacks" like Nodebook.
+
+- Add some text emphasizing that even though Jupyter is not intended for batch ETL processing, that is how a lot of people use it (e.g., cite Netflix, Stitch Fix?).  (And yes, we're aware that this is bad practice.)
+
+- Where we describe that Vizier involves explicit dependencies, also point out that later in the paper we describe how to provide a Jupyter-like experience on top of this model.  "Keep the mental model"
+
+- Typos:
+  - " Not that Vizier"
+
+- Add more future work
+  - Direct output to Arrow instead of via Parquet.
+
+- Add copyright text
+
+- Check for and remove Type 3 fonts if any exist.
+
+- Make sure fonts are embedded (should be default for LaTeX)
+
+- Add:
+\begin{CCSXML}
+<ccs2012>
+   <concept>
+       <concept_id>10011007.10011006.10011041.10011048</concept_id>
+       <concept_desc>Software and its engineering~Runtime environments</concept_desc>
+       <concept_significance>300</concept_significance>
+       </concept>
+   <concept>
+       <concept_id>10002951.10002952.10002953.10010820.10003623</concept_id>
+       <concept_desc>Information systems~Data provenance</concept_desc>
+       <concept_significance>500</concept_significance>
+       </concept>
+ </ccs2012>
+\end{CCSXML}
+
+\ccsdesc[300]{Software and its engineering~Runtime environments}
+\ccsdesc[500]{Information systems~Data provenance}
+
+--- For Next Paper ---
+
+- Use Git history to recover the dependency graph
+  - e.g., figure out how much dynamic provenance changes for a single cell over a series of edits.
+
+- Static vs Dynamic provenance: How different are they?
+  - e.g., how often do you need to "repair"?
+  - How much further away from serial does dynamic get you?
+