From fcb5cb31d4252abbd5772afc099c646157ac50fa Mon Sep 17 00:00:00 2001
From: Boris Glavic
Date: Thu, 31 Mar 2022 18:50:32 -0500
Subject: [PATCH] aP

---
 sections/approx-prov.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sections/approx-prov.tex b/sections/approx-prov.tex
index 51b6f2c..98324ef 100644
--- a/sections/approx-prov.tex
+++ b/sections/approx-prov.tex
@@ -25,13 +25,13 @@ For our use cases of provenance (parallel execution of cells and limiting automa
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \tinysection{Approximate Provenance through Static Dataflow Analysis}
-
+%
 For static analysis, we build a dataflow graph for the code of Python cells, using Python's AST library and dataflow equations that are standard in static program analysis. We only analyze the user's code and do not extend the analysis to other modules (libraries). While we may therefore miss data dependencies between user-created objects that are caused by such library code, this is acceptable, because such dependencies will be discovered and compensated for at runtime. Like any static dataflow analysis technique, our approach may produce false positives.\footnote{This is because static analysis has to reason about all possible control-flow paths through a program that could arise for some input, while for a concrete input only some of these paths may be taken.} For example, in the code snippet shown at the bottom of \Cref{fig:example-python-code}, the value of the variable \texttt{b} may depend on either \texttt{c} or \texttt{d}, depending on whether \texttt{a < 10} evaluates to true. Since the value of \texttt{a} is only known at runtime, static analysis has to assume that \texttt{b} depends on both \texttt{c} and \texttt{d}.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \tinysection{Refining Approximate Provenance at Runtime}
-
+%
 To deal with the approximate nature of the provenance generated by our static analysis, we allow the educated guesses made by static analysis to be refined at runtime. This is where the benefits of isolated cell execution become clear: cells communicate explicitly through data artifacts that are created, read, and written through an API provided by the system. Because the system keeps track of which cells access which data through this API, new data dependencies not predicted by static analysis are automatically detected at runtime. Similarly, when a cell finishes execution, we know which predicted data dependencies have not materialized. Next, we introduce a scheduling algorithm that dynamically adapts its execution plan when new dependencies are detected or predicted cell dependencies do not materialize.
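
The two steps described in the patched text (a conservative static guess, then a runtime correction) can be illustrated with a small Python sketch. This is not the system's implementation: it replaces the dataflow equations with a much cruder walk over the AST that collects every name read or written on any branch, and the names \texttt{static\_deps}, \texttt{refine}, and \texttt{SNIPPET} are hypothetical. The snippet itself is an assumed reconstruction of the \texttt{a < 10} example, since the figure is not part of this patch.

```python
import ast

# Assumed reconstruction of the branch example discussed in the text.
SNIPPET = """
if a < 10:
    b = c
else:
    b = d
"""


def static_deps(cell_source):
    """Conservatively estimate the names a cell reads and writes.

    Every branch is visited, so the read set over-approximates what a
    concrete run touches (false positives are possible). Builtins such
    as print would also be reported as reads by this crude walk.
    """
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(cell_source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            elif isinstance(node.ctx, ast.Store):
                writes.add(node.id)
    return reads, writes


def refine(predicted_reads, observed_reads):
    """Compare predicted reads with the reads observed at runtime.

    Returns (missed, unmaterialized): dependencies that static analysis
    did not predict, and predicted dependencies that never occurred.
    """
    missed = observed_reads - predicted_reads
    unmaterialized = predicted_reads - observed_reads
    return missed, unmaterialized


predicted_reads, predicted_writes = static_deps(SNIPPET)

# Hypothetical runtime log for a run with a = 5: only a and c are read.
observed_reads = {"a", "c"}
missed, unmaterialized = refine(predicted_reads, observed_reads)
```

For the assumed snippet, the static pass reports that \texttt{b} may depend on \texttt{a}, \texttt{c}, and \texttt{d}. The hypothetical runtime log then shows that the dependency on \texttt{d} did not materialize, which is the kind of information a scheduler could use to adapt its plan.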