diff --git a/src/talks/2024-04-12-UIC.erb b/src/talks/2024-04-12-UIC.erb
index df28bf8c..32ba6da5 100644
--- a/src/talks/2024-04-12-UIC.erb
+++ b/src/talks/2024-04-12-UIC.erb
@@ -221,7 +221,6 @@ end
  1. Static Analysis
  2. Microkernel Notebooks
  3. -
  4. Approximate Dependencies
  5. Inter-Kernel Interop [Work In Progress]
@@ -413,6 +412,7 @@ end
  "Bolt-on, Compact, and Rapid Program Slicing for Notebooks" (Shankar et al.; VLDB 2023)
+ (Similar ideas in Nodebook, etc.)
@@ -423,90 +423,186 @@ end
  • Does a cell need to be re-run based on these changes?
    Dynamic sufficient (assuming deterministic cells).
  • -
  • -
    - What is the minimal set of inputs a cell needs to run? -
    Static required.
    -
    +
  • Which cell last wrote to a variable? +
    Dynamic sufficient.
  • -
  • Which cell last wrote to a variable? -
    Dynamic sufficient.
    +
  • +
    + What is the minimal set of inputs a cell needs to run? +
    Static required.
    +
  • - $$\{\;???\;\}$$ +
    + $\{\;???\;\}$ $\leftarrow \{\;z \rightarrow \textbf{@1}; x \rightarrow \textbf{@2}\;\}$ +
    +
    + $\{\;\;\;\;\;\}$ $\leftarrow \{\;z \rightarrow \textbf{@1}; x \rightarrow \textbf{@2}\;\}$ +
    <%= notebook() do nbcell("if z:\n y = x + 2", idx: 2) end %> -

    We need to be able to recover the notebook from any state.

    +

    We need to be able to recover the kernel to any state.

    - Not same interpreter means: - - No worrying about crashes - - Portability / Resume at any point - - Parallel execution +

    Why have only one kernel?

    + +

    🤷

    - Outline the data model: - - - Interpreter - - "backend 'state database'" - - Lazy-loading interpreter state + <%= + notebook() do + nbcell("x = expensive_initialization()") + nbcell("y = expensive_cloud_training1(x)") + nbcell("z = expensive_cloud_training2(x)") + nbcell("print(compare(y, z))") + end + %>
    - If we have to have the ability to recover a state, does it have to be the same interpreter *version*? +
    - If we have to have the ability to recover a state, does it have to be the same language? +

    When is parallelism allowed?

    +

    When is a cell runnable?
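
    As a rough illustration of the two questions above, the sketch below groups the four-cell notebook into "waves" of cells that could run concurrently. The read/write sets are written by hand and the names `CELLS`, `dependencies`, and `parallel_waves` are hypothetical; it also assumes every write creates a fresh version of a variable, so only read-after-write dependencies matter.

```python
# Hand-written (assumed) read/write sets for the four-cell notebook above.
CELLS = [
    {"reads": set(),      "writes": {"x"}},  # x = expensive_initialization()
    {"reads": {"x"},      "writes": {"y"}},  # y = expensive_cloud_training1(x)
    {"reads": {"x"},      "writes": {"z"}},  # z = expensive_cloud_training2(x)
    {"reads": {"y", "z"}, "writes": set()},  # print(compare(y, z))
]


def dependencies(cells):
    """Map each cell index to the earlier cells whose writes it reads."""
    last_writer = {}
    deps = []
    for idx, cell in enumerate(cells):
        deps.append({last_writer[v] for v in cell["reads"] if v in last_writer})
        for v in cell["writes"]:
            last_writer[v] = idx
    return deps


def parallel_waves(cells):
    """Group cells into waves; a cell joins a wave once all its dependencies are done."""
    deps = dependencies(cells)
    done, waves = set(), []
    while len(done) < len(cells):
        wave = [i for i in range(len(cells)) if i not in done and deps[i] <= done]
        waves.append(wave)
        done.update(wave)
    return waves


print(parallel_waves(CELLS))  # cell indices are 0-based
```

    Under these assumptions the two training cells land in the same wave, since each depends only on the cell that wrote `x`.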

    - Cool things we can do if we lift the "state lives in the kernel" model - - - Deserialize program state into another interpreter - - Graphical widgets for common tasks (data loading) - - 1-3 slides on spreadsheets -
    - - -
    - How to figure out dependencies - - 1. Run the code (exact, after the fact) - 2. Static analysis (imprecise, incomplete) - 3. Both! +

    Static: What variables could be read/written.

    +

    vs

    +

    Dynamic: What variables were read/written.
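
    To make the static/dynamic distinction concrete, here is a minimal sketch for the `if z: y = x + 2` cell. It is not the talk's implementation: `static_reads_writes`, `TracingNamespace`, and `dynamic_reads_writes` are hypothetical names, the static pass only looks at plain variable names, and the dynamic pass only observes top-level name accesses (code inside functions defined by the cell is not traced).

```python
import ast


def static_reads_writes(source):
    """Roughly estimate which variables a cell could read or write."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:  # Store or Del context
                writes.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            reads.add(node.target.id)  # `x += 1` also reads x
    return reads, writes


class TracingNamespace(dict):
    """Local namespace that records which names were actually touched."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads, self.writes = set(), set()

    def __getitem__(self, key):
        value = super().__getitem__(key)  # KeyError falls through to globals/builtins
        self.reads.add(key)
        return value

    def __setitem__(self, key, value):
        self.writes.add(key)
        super().__setitem__(key, value)


def dynamic_reads_writes(source, state):
    """Run the cell once and report the variables it really read and wrote."""
    tracer = TracingNamespace(state)
    exec(compile(source, "<cell>", "exec"), {}, tracer)
    return tracer.reads, tracer.writes


CELL = "if z:\n    y = x + 2"
print(static_reads_writes(CELL))
print(dynamic_reads_writes(CELL, {"z": False, "x": 1}))
```

    With `z` false, the dynamic trace sees only the read of `z`, while the static pass still reports `x` as a possible read and `y` as a possible write.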

    - Idea: use static analysis to create a mask. +

    Actual State

    - Cell state model: - - stable - - unknown - - stale - - runnable (revisit parallelism) + $$\{\;x \rightarrow \textbf{@1}\;\}$$ + +
    +

    Tentative State

    + + $$\{\;x \rightarrow \textbf{@1},\;y \rightarrow \textbf{???}\;\}$$ +
    + $$\{\;* \rightarrow \textbf{???}\;\}$$ +
    +
    - Preliminary results: TAPP +

    Cell Status

    + +
    +
    Complete
    +
    Active if: $\forall (x \rightarrow \textbf{@i}) \in \texttt{DynamicReads} : \texttt{InState}[x] = \textbf{@i}$
    +
    $\texttt{OutState} = \texttt{InState} + \{\;x \rightarrow \textbf{@i}\;|\;\forall (x \rightarrow \textbf{@i}) \in \texttt{DynamicWrites}\;\}$
    + +
    Stale
    +
    Active if: first run or $\exists (x \rightarrow \textbf{@i}) \in \texttt{DynamicReads} : \texttt{InState}[x] \neq \textbf{@i}$
    +
    $\texttt{OutState} = \texttt{InState} + \{\;x \rightarrow \textbf{???}\;|\;\forall x \in \texttt{StaticWrites}\;\}$
    + +
    Runnable
    +
    Active if: $\forall x \in \texttt{StaticReads} : \texttt{InState}[x] \neq \textbf{???}$
    +
    $\texttt{OutState} = \texttt{InState} + \{\;x \rightarrow \textbf{???}\;|\;\forall x \in \texttt{StaticWrites}\;\}$
    + +
    Unknown
    +
    Active otherwise.
    +
    $\texttt{OutState} = \texttt{InState} + \{\;x \rightarrow \textbf{???}\;|\;\forall x \in \texttt{StaticWrites}\;\}$
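
    The four status rules can be read as a small state-propagation procedure. The sketch below is one possible reading, not the system's actual code: the `Cell` class and function names are hypothetical, the rules are checked in the order Complete, Runnable, Stale, Unknown, and a read whose input is `???` is treated as "can't tell" rather than as a definite mismatch.

```python
from dataclasses import dataclass, field
from typing import Optional

UNKNOWN = "???"


@dataclass
class Cell:
    static_reads: set
    static_writes: set
    dynamic_reads: Optional[dict] = None  # None means the cell has never run
    dynamic_writes: dict = field(default_factory=dict)


def status(cell, in_state):
    """Classify a cell given the variable -> version map flowing into it."""
    ran = cell.dynamic_reads is not None
    if ran and all(in_state.get(x) == v for x, v in cell.dynamic_reads.items()):
        return "complete"
    if all(in_state.get(x) != UNKNOWN for x in cell.static_reads):
        return "runnable"
    if not ran or any(in_state.get(x) not in (v, UNKNOWN)
                      for x, v in cell.dynamic_reads.items()):
        return "stale"
    return "unknown"


def out_state(cell, in_state):
    """Complete cells publish their recorded writes; all others publish ???."""
    out = dict(in_state)
    if status(cell, in_state) == "complete":
        out.update(cell.dynamic_writes)
    else:
        out.update({x: UNKNOWN for x in cell.static_writes})
    return out


def propagate(cells):
    """Thread the state through the notebook and collect each cell's status."""
    state, statuses = {}, []
    for cell in cells:
        statuses.append(status(cell, state))
        state = out_state(cell, state)
    return statuses


c1 = Cell(set(), {"x"}, dynamic_reads={}, dynamic_writes={"x": "@1"})
c2 = Cell({"x"}, {"y"}, dynamic_reads={"x": "@1"}, dynamic_writes={"y": "@2"})
c3 = Cell({"y"}, set(), dynamic_reads={"y": "@2"})
print(propagate([c1, c2, c3]))

edited = Cell(set(), {"x"})  # cell 1 was edited and has not been re-run
print(propagate([edited, c2, c3]))
```

    In the second run the edited cell is runnable, and the cells downstream of it are unknown until it finishes and reveals which version of `x` it produces.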
    +
    - +
    + + "The Right Tool for the Job: Data-Centric Workflows in Vizier" (Kennedy et al.; IEEE DEB 2022) +
    - State model. Review: - - State needs to come *out* of the cell that created it - - State needs to go *into* the cell that is about to consume it +

    Serial

    + +

    Parallel

    + + "Runtime Provenance Refinement for Notebooks" (Deo et al.; TaPP 2022) +
    + +
    + + https://openclipart.org +
    + +
    +

    Why have only one Python version?

    + +

    🤷

    +
    + +
    + +
    + +
    +

    Why have only one language?

    + +

    🤷

    +
    + +
    + +
    + +
    +

    Why require code?

    + +

    🤷

    +
    + +
    +

    Repeatable Spreadsheet Dataframe Editing

    + + "Overlay Spreadsheets" (Kennedy et al.; HILDA 2022) +
    + +
    +

    Data Widgets

    + + "Your notebook is not crumby enough, REPLace it" (Brachmann et al.; CIDR 2020) +
    + +
    +

    Data Vis

    + + "Your notebook is not crumby enough, REPLace it" (Brachmann et al.; CIDR 2020) +
    + +
    +

    Data Curation

    + + "Lenses: An On-Demand Approach to ETL" (Yang et al.; VLDB 2015) +
    + + + +
    + +

    ... but this requires migrating state.
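
    One way to picture state migration is a cell runner that loads its inputs from a shared store and exports its outputs back, so no state has to live in any one kernel. This is only a sketch under stated assumptions: the `ArtifactStore` interface and `run_cell` are hypothetical, everything is kept in memory, and `pickle` stands in for whatever serialization a real system would use (it would not cross language boundaries).

```python
import pickle


class ArtifactStore:
    """Hypothetical in-memory store of serialized (variable, version) artifacts."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, version, value):
        self._blobs[(name, version)] = pickle.dumps(value)

    def get(self, name, version):
        return pickle.loads(self._blobs[(name, version)])


def run_cell(source, reads, writes, store, version):
    """Run a cell in a fresh namespace, importing inputs and exporting outputs."""
    # Deserialize exactly the versions this cell depends on.
    namespace = {name: store.get(name, ver) for name, ver in reads.items()}
    exec(source, namespace)
    # Serialize the variables the cell is declared to write.
    for name in writes:
        if name in namespace:
            store.put(name, version, namespace[name])


store = ArtifactStore()
run_cell("x = [1, 2, 3]", {}, ["x"], store, "@1")
run_cell("y = sum(x)", {"x": "@1"}, ["y"], store, "@2")
print(store.get("y", "@2"))
```

    Because each call starts from an empty namespace, the second cell could just as well run in a different process or interpreter, as long as it can reach the store.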

    +
    + +
    +

    State Management

    + +
diff --git a/src/talks/graphics/2024-04-12/14thWarrior-Cartoon-Elephant.svg b/src/talks/graphics/2024-04-12/14thWarrior-Cartoon-Elephant.svg
new file mode 100644
index 00000000..13ba05d1
--- /dev/null
+++ b/src/talks/graphics/2024-04-12/14thWarrior-Cartoon-Elephant.svg
@@ -0,0 +1,4 @@
+ [SVG markup omitted. Metadata: Openclipart, "Cartoon Elephant" by 14thWarrior, 2010-09-03T16:28:14; "Cartoon Elephant. Remixed from Studiofibonacci's Cartoon Rhino."; https://openclipart.org/detail/83479/cartoon-elephant-by-14thwarrior; tags: africa, animal, cartoon, elephant, india, mammal, remix]
\ No newline at end of file
diff --git a/src/talks/graphics/2024-04-12/Dependencies.svg b/src/talks/graphics/2024-04-12/Dependencies.svg
index f30a050b..cb6211e9 100644
--- a/src/talks/graphics/2024-04-12/Dependencies.svg
+++ b/src/talks/graphics/2024-04-12/Dependencies.svg
@@ -37,7 +37,7 @@ id="defs1">
@@ -178,7 +178,7 @@ x="75.052376" y="47.658699">Python state is mutable
@@ -216,7 +216,7 @@ x="-7.6239524" y="35.563065">Dependency tracking is hard
diff --git a/src/talks/graphics/2024-04-12/MultiRunnerBlockDiagram.svg b/src/talks/graphics/2024-04-12/MultiRunnerBlockDiagram.svg
new file mode 100644
index 00000000..8b82a324
--- /dev/null
+++ b/src/talks/graphics/2024-04-12/MultiRunnerBlockDiagram.svg
@@ -0,0 +1,552 @@
+ [SVG markup omitted. Diagram labels: Workflow API, ArtifactStore, Scheduler, Command Implementation, Data Processing Backends]
diff --git a/src/talks/graphics/2024-04-12/Parallel.svg b/src/talks/graphics/2024-04-12/Parallel.svg
new file mode 100644
index 00000000..ce2c3d8a
--- /dev/null
+++ b/src/talks/graphics/2024-04-12/Parallel.svg
@@ -0,0 +1,330 @@
+ [SVG markup omitted. Diagram labels: cells [1], [2], [3], [4]; variables x, y, z]
diff --git a/src/talks/graphics/2024-04-12/VizierDataVis.png b/src/talks/graphics/2024-04-12/VizierDataVis.png
new file mode 100644
index 00000000..2d3e419f
Binary files /dev/null and b/src/talks/graphics/2024-04-12/VizierDataVis.png differ
diff --git a/src/talks/graphics/2024-04-12/VizierLoadData.png b/src/talks/graphics/2024-04-12/VizierLoadData.png
new file mode 100644
index 00000000..57afa8d1
Binary files /dev/null and b/src/talks/graphics/2024-04-12/VizierLoadData.png differ
diff --git a/src/talks/graphics/2024-04-12/VizierMimir.png b/src/talks/graphics/2024-04-12/VizierMimir.png
new file mode 100644
index 00000000..d29dc578
Binary files /dev/null and b/src/talks/graphics/2024-04-12/VizierMimir.png differ
diff --git a/src/talks/graphics/2024-04-12/gantt_parallel.pdf b/src/talks/graphics/2024-04-12/gantt_parallel.pdf
new file mode 100644
index 00000000..a7afeb87
Binary files /dev/null and b/src/talks/graphics/2024-04-12/gantt_parallel.pdf differ
diff --git a/src/talks/graphics/2024-04-12/gantt_parallel.svg b/src/talks/graphics/2024-04-12/gantt_parallel.svg
new file mode 100644
index 00000000..aa4182a3
--- /dev/null
+++ b/src/talks/graphics/2024-04-12/gantt_parallel.svg
@@ -0,0 +1,659 @@
+ [SVG markup omitted. Gantt chart axes: "Cell Position" (ticks 1 to 11), "Runtime (sec)" (ticks 0 to 20)]
diff --git a/src/talks/graphics/2024-04-12/gantt_serial.pdf b/src/talks/graphics/2024-04-12/gantt_serial.pdf
new file mode 100644
index 00000000..2d9951f1
Binary files /dev/null and b/src/talks/graphics/2024-04-12/gantt_serial.pdf differ
diff --git a/src/talks/graphics/2024-04-12/gantt_serial.svg b/src/talks/graphics/2024-04-12/gantt_serial.svg
new file mode 100644
index 00000000..56cba28c
--- /dev/null
+++ b/src/talks/graphics/2024-04-12/gantt_serial.svg
@@ -0,0 +1,660 @@
+ [SVG markup omitted. Gantt chart axes: "Cell Position" (ticks 1 to 11), "Runtime (sec)" (ticks 0 to 80)]