Revisions
parent
d0922d87cd
commit
aab72ced0f
|
@ -92,8 +92,11 @@
|
||||||
<section>
|
<section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
<h3>Act 1</h3>
|
||||||
<p>Alice wants to analyze two unaligned time series.</p>
|
<p>Alice wants to analyze two unaligned time series.</p>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
<table style="font-size: 60%; display: inline; padding: 50px;">
|
<table style="font-size: 60%; display: inline; padding: 50px;">
|
||||||
<tr><th>Time</th><th>Reading</th></tr>
|
<tr><th>Time</th><th>Reading</th></tr>
|
||||||
<tr><td>1575731001</td><td>0</td></tr>
|
<tr><td>1575731001</td><td>0</td></tr>
|
||||||
|
@ -159,12 +162,6 @@
|
||||||
</code></pre>
|
</code></pre>
|
||||||
<p class="fragment">Interpolate missing values</p>
|
<p class="fragment">Interpolate missing values</p>
|
||||||
<p class="fragment">Hand tune around the switchover as-needed</p>
|
<p class="fragment">Hand tune around the switchover as-needed</p>
|
||||||
<pre class="fragment"><code class="sql">
|
|
||||||
SELECT a.time, a.reading AS reading_one,
|
|
||||||
b.reading AS reading_two
|
|
||||||
FROM series_one_buckets a, series_two_buckets b
|
|
||||||
WHERE a.time = b.time
|
|
||||||
</code></pre>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
@ -203,7 +200,7 @@
|
||||||
<section>
|
<section>
|
||||||
<section>
|
<section>
|
||||||
<h3>Act 2</h3>
|
<h3>Act 2</h3>
|
||||||
<p class="fragment">Carol gets a dataset from Dave</p>
|
<p>Carol gets a dataset from Dave</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
@ -239,6 +236,7 @@
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<section>
|
<section>
|
||||||
|
<h3>Act 3</h3>
|
||||||
<p>Eve needs to load a CSV file</p>
|
<p>Eve needs to load a CSV file</p>
|
||||||
<img src="graphics/Binary-file-20110715.svg" height="100px" style="vertical-align: middle;">
|
<img src="graphics/Binary-file-20110715.svg" height="100px" style="vertical-align: middle;">
|
||||||
→
|
→
|
||||||
|
@ -255,7 +253,7 @@
|
||||||
I'm sorry, I can't do that, Eve.<br/>
|
I'm sorry, I can't do that, Eve.<br/>
|
||||||
</p>
|
</p>
|
||||||
<p class="fragment" style="font-family: monospace;">
|
<p class="fragment" style="font-family: monospace;">
|
||||||
You have a stray comma on line 1252538.
|
You have a non-numerical value at position 1252538:24.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
@ -308,11 +306,42 @@
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
<h3>Wouldn't it be nice if...</h3>
|
||||||
<img src="graphics/montoya.jpeg" height="400px" />
|
<img src="graphics/montoya.jpeg" height="400px" />
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<h3>There needs to be a better way!</h3>
|
<h3>Wouldn't it be nice if...</h3>
|
||||||
|
<p>... this is what Bob saw:</p>
|
||||||
|
<img src="graphics/time_series_with_errors.svg" />
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<h3>Wouldn't it be nice if...</h3>
|
||||||
|
<p>... this is what Carol saw:</p>
|
||||||
|
<table>
|
||||||
|
<tr>
|
||||||
|
<td style="
|
||||||
|
color: rgb(251, 189, 8);
|
||||||
|
background-color: #eed;
|
||||||
|
text-decoration: none;
|
||||||
|
text-decoration-color: rgb(251, 189, 8);
|
||||||
|
text-decoration-line: none;
|
||||||
|
text-decoration-style: solid;
|
||||||
|
vertical-align: middle;
|
||||||
|
border-radius: 15px 0px 0px 15px;
|
||||||
|
font-size: 150%">⚠</td>
|
||||||
|
<td style="
|
||||||
|
font-size: 70%;
|
||||||
|
background-color: #eee;
|
||||||
|
vertical-align: middle;
|
||||||
|
border-radius: 0px 15px 15px 0px;
|
||||||
|
padding: 20px;">
|
||||||
|
The data included an unexpected value: <b>'Non-Hispanic White'</b><br/>The most similar known value is <b>'White Non-Hispanic'</b>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
</table>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
@ -324,19 +353,33 @@
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<p>Declare a caveat when violating an assumption might...</p>
|
<h3>Why?</h3>
|
||||||
<ul>
|
<h4 class="fragment" data-fragment-index="1">Propagation</h4>
|
||||||
<li class="fragment">... change one or more values</li>
|
<dl>
|
||||||
<li class="fragment">... remove one or more records</li>
|
<dd class="fragment" data-fragment-index="2" style="margin-left: -20px;">Caveats...</dd>
|
||||||
<li class="fragment">... add one or more records</li>
|
|
||||||
<li class="fragment">(rarely) ... change the db schema</li>
|
<div class="fragment" data-fragment-index="2">
|
||||||
</ul>
|
<dt>... can go where the data goes</dt>
|
||||||
|
<dd>Derived values retain caveats on source data.</dd>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div class="fragment" data-fragment-index="3">
|
||||||
|
<dt>... stop where the data stops</dt>
|
||||||
|
<dd>Irrelevant caveats don't get propagated</dd>
|
||||||
|
</div>
|
||||||
|
</dl>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<h3>Wouldn't it be nice if...</h3>
|
||||||
|
<p>... this is what Eve saw:</p>
|
||||||
|
<img src="graphics/caveat-spreadsheet.png"/>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<section>
|
<section>
|
||||||
<p>So what is a caveat?</p>
|
<h3>What is a Caveat?</h3>
|
||||||
|
|
||||||
<p class="fragment">A brief digression...</p>
|
<p class="fragment">A brief digression...</p>
|
||||||
</section>
|
</section>
|
||||||
|
@ -358,49 +401,98 @@
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<p class="fragment"><b>Possible</b> tuples exist in at least one possible world. $$possible(\mathcal R) = \bigcup_{R \in \mathcal R} R$$</p>
|
|
||||||
<p class="fragment"><b>Certain</b> tuples exist in all possible worlds. $$certain(\mathcal R) = \bigcap_{R \in \mathcal R} R$$</p>
|
<p class="fragment"><b>Certain</b> tuples exist in all possible worlds. $$certain(\mathcal R) = \bigcap_{R \in \mathcal R} R$$</p>
|
||||||
|
<p class="fragment"><b>Uncertain</b> tuples exist in at least one, <br/>but not all possible worlds. $$uncertain(\mathcal R) = \bigcup_{R \in \mathcal R} R - certain(\mathcal R)$$</p>
|
||||||
<p style="font-size: 70%;" class="fragment">(not limited to set semantics)</p>
|
<p style="font-size: 70%;" class="fragment">(not limited to set semantics)</p>
|
||||||
</section>
|
</section>
|
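The definitions on this slide can be sketched in a few lines of Python. This is only an illustration under set semantics, assuming each possible world is a non-empty set of tuples; the function names mirror the slide's notation, and the second timestamp and its readings are hypothetical.

```python
def possible(worlds):
    """Tuples that appear in at least one possible world (union)."""
    return set().union(*worlds)


def certain(worlds):
    """Tuples that appear in every possible world (intersection).

    Assumes `worlds` is non-empty.
    """
    return set.intersection(*map(set, worlds))


def uncertain(worlds):
    """Tuples that appear in some, but not all, possible worlds."""
    return possible(worlds) - certain(worlds)


# Hypothetical example: two worlds that disagree on one (time, reading) pair.
worlds = [
    {(1575731001, 0), (1575731002, 5)},
    {(1575731001, 0), (1575731002, 7)},
]

print(certain(worlds))    # the reading both worlds agree on
print(uncertain(worlds))  # the two conflicting readings
```

Bag semantics, which the slide says is also supported, would need multiplicities instead of plain sets.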
||||||
|
|
||||||
|
<section>
|
||||||
|
<p>A caveat is an assumption tied to one or more data elements (cells or rows).</p>
|
||||||
|
<p>If the assumption is wrong, so is the element.</p>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<h3>Alice / Bob</h3>
|
||||||
|
<ul>
|
||||||
|
<li><span style="font-family: monospace;">FIRST</span> may not pick the right value for a bucket with 2+ distinct values.</li>
|
||||||
|
<li>Interpolation may not pick the right value for a bucket with 0 values.</li>
|
||||||
|
</ul>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<h3>Carol / Dave</h3>
|
||||||
|
<ul>
|
||||||
|
<li>The model hyperparameters may not work if the data changes too significantly.</li>
|
||||||
|
<li>New values could indicate new data errors that Carol's ingest script hasn't accounted for.</li>
|
||||||
|
</ul>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<h3>Eve / Hal</h3>
|
||||||
|
<ul>
|
||||||
|
<li>Replacing a parse error with a NULL might not be what Eve expects.</li>
|
||||||
|
</ul>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<p>An element has a caveat → The element is uncertain.</p>
|
||||||
|
|
||||||
|
<p class="fragment">... and btw, here's why.</p>
|
||||||
|
</section>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<section>
|
||||||
|
<h3>Caveats</h3>
|
||||||
|
|
||||||
|
<ol>
|
||||||
|
<li style="color: lightgrey;">Story Time</li>
|
||||||
|
<li style="color: lightgrey;">What is a Caveat?</li>
|
||||||
|
<li class="fragment grow highlight-blue">Applying Caveats</li>
|
||||||
|
<li>Propagating Caveats</li>
|
||||||
|
<li>Caveats Beyond SQL</li>
|
||||||
|
<li>The Vizier Notebook</li>
|
||||||
|
</ol>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<pre><code class="sql">
|
||||||
|
SELECT setting_1, setting_2, estimate
|
||||||
|
FROM Simulation;
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<p>We want to indicate that the estimate column is only accurate if (for example) P ≠ NP.</p>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<p style="font-family: monospace;">caveat(value, assumption)</p>
|
||||||
|
|
||||||
|
<p>returns <span style="font-family: monospace;">value</span>, annotated with <span style="font-family: monospace; ">assumption</span>.</p>
|
||||||
|
</section>
|
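One way to picture `caveat(value, assumption)` outside of SQL is a thin wrapper that carries the assumption alongside the value. The sketch below is an assumption about how such a wrapper could look, not the deck's implementation; the `Caveated` class and the sample estimate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caveated:
    """A value together with the human-readable assumptions it depends on."""
    value: object
    assumptions: tuple = ()


def caveat(value, assumption):
    """Return `value` annotated with `assumption`.

    If the value already carries caveats, the new one is appended.
    """
    if isinstance(value, Caveated):
        return Caveated(value.value, value.assumptions + (assumption,))
    return Caveated(value, (assumption,))


# Hypothetical estimate, annotated the same way as in the SQL example.
estimate = caveat(0.73, "Only correct if P ≠ NP")
print(estimate.value, estimate.assumptions)
```

The value itself is unchanged; only the annotation travels with it.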
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT setting_1, setting_2,
|
SELECT setting_1, setting_2,
|
||||||
caveat(estimate, 'Only correct if phi is 42')
|
caveat(estimate, 'Only correct if P ≠ NP')
|
||||||
AS estimate
|
AS estimate
|
||||||
FROM Simulation;
|
FROM Simulation;
|
||||||
</code></pre>
|
</code></pre>
|
||||||
is the same as
|
<p><span style="font-family: monospace;">assumption</span> is just a human-readable string.</p>
|
||||||
<pre><code class="sql">
|
|
||||||
SELECT setting_1, setting_2, estimate
|
|
||||||
FROM Simulation;
|
|
||||||
</code></pre>
|
|
||||||
<p class="fragment"><b>Caveat: </b>If it turns out that phi ≠ 42, <br/>all <span style="font-family: monospace;">estimate</span> values could be wrong.</p>
|
|
||||||
<p style="font-size: 70%" class="fragment">(The first query annotates all <span style="font-family: monospace;">`estimate`</span> values with the caveat)</p>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<p style="font-family: monospace;">caveat(value, assumption)</p>
|
<h3>Incomplete Databases</h3>
|
||||||
<p class="fragment">Each call fragments reality into multiple possible worlds.</p>
|
<p>
|
||||||
<dl>
|
<span style="font-family: monospace;">caveat()</span> creates 2 sets of possible worlds:
|
||||||
<div class="fragment">
|
<ul>
|
||||||
<dt>value</dt>
|
<li>The assumption holds: <span style="font-family: monospace;">value</span> is correct.</li>
|
||||||
<dd>Indicates the value in <u>one</u> of those worlds.</dd>
|
<li>The assumption does not hold: <span style="font-family: monospace;">value</span> is unknown.</li>
|
||||||
</div>
|
</ul>
|
||||||
|
</p>
|
||||||
<div class="fragment">
|
|
||||||
<dt>assumption</dt>
|
|
||||||
<dd>being wrong indicates that we need a different world.</dd>
|
|
||||||
</div>
|
|
||||||
</dl>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<h3>Applying Caveats</h3>
|
|
||||||
<p class="fragment">a few examples...</p>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
<h3>Alice / Bob</h3>
|
||||||
<p>Mark multi-valued buckets <span class="fragment">(key repair).</span></p>
|
<p>Mark multi-valued buckets <span class="fragment">(key repair).</span></p>
|
||||||
<pre><code class="sql" data-line-numbers="2-3">
|
<pre><code class="sql" data-line-numbers="2-3">
|
||||||
SELECT bucket,
|
SELECT bucket,
|
||||||
|
@ -412,20 +504,24 @@
|
||||||
FIRST(reading) AS reading,
|
FIRST(reading) AS reading,
|
||||||
COUNT(*) AS bucket_size
|
COUNT(*) AS bucket_size
|
||||||
FROM sensor
|
FROM sensor
|
||||||
|
GROUP BY bucket
|
||||||
)
|
)
|
||||||
</code></pre>
|
</code></pre>
|
||||||
<p class="fragment">Interpolation is more complex... but similar.</p>
|
<p class="fragment">Interpolation is more complex... but similar.</p>
|
||||||
</section>
|
</section>
|
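The key-repair idea can also be sketched procedurally: keep the first reading per bucket, and attach a caveat whenever the bucket held two or more distinct readings. This is a rough Python analogue, assuming integer timestamps and a fixed bucket width; the function name and the sample readings are hypothetical.

```python
from collections import defaultdict


def first_per_bucket(readings, bucket_width):
    """Map each bucket to (first_reading, caveat_or_None).

    `readings` is an iterable of (time, reading) pairs in arrival order.
    """
    buckets = defaultdict(list)
    for time, reading in readings:
        buckets[time // bucket_width].append(reading)

    result = {}
    for bucket, values in buckets.items():
        distinct = len(set(values))
        note = None
        if distinct > 1:
            # FIRST picked one of several candidates, so flag the choice.
            note = f"Bucket {bucket} has {distinct} distinct readings; FIRST may be wrong"
        result[bucket] = (values[0], note)
    return result


# Hypothetical readings: bucket 0 gets two conflicting values, bucket 1 gets one.
print(first_per_bucket([(0, 1.0), (3, 2.0), (10, 5.0)], bucket_width=10))
```

Empty buckets never show up here; they are the interpolation case, which needs its own caveat.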
||||||
|
|
||||||
<section>
|
<section>
|
||||||
|
<h3>Carol / Dave</h3>
|
||||||
<p>Mark unexpected values the model wasn't trained on.</p>
|
<p>Mark unexpected values the model wasn't trained on.</p>
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT
|
SELECT
|
||||||
CASE WHEN race_ethnicity
|
CASE WHEN race_ethnicity
|
||||||
IN ('white non-hispanic', 'black non-hispanic', /* ... */)
|
IN ('White Non-Hispanic', 'Black Non-Hispanic', /* ... */)
|
||||||
THEN race_ethnicity
|
THEN race_ethnicity
|
||||||
|
|
||||||
ELSE caveat(race_ethnicity,
|
ELSE caveat(race_ethnicity,
|
||||||
'Unexpected race_ethnicity: ' & race_ethnicity)
|
'Unexpected race_ethnicity: ' & race_ethnicity)
|
||||||
|
|
||||||
END, /* ... */
|
END, /* ... */
|
||||||
FROM R
|
FROM R
|
||||||
</code></pre>
|
</code></pre>
|
||||||
|
@ -433,41 +529,29 @@
|
||||||
</section>
|
</section>
|
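The warning shown to Carol earlier ("the most similar known value is ...") needs some notion of similarity. The sketch below assumes a simple token-overlap (Jaccard) measure, which is a stand-in and not necessarily what the actual system uses; the known values are the ones that appear in the deck's `race_ethnicity` query, and the helper names are hypothetical.

```python
KNOWN_VALUES = [
    "Other Race/ Ethnicity",
    "Black Non-Hispanic",
    "White Non-Hispanic",
    "Hispanic",
    "Asian and Pacific Islander",
    "Not Stated/Unknown",
]


def _tokens(text):
    """Lower-cased whitespace-separated tokens."""
    return set(text.lower().split())


def most_similar(value, known=KNOWN_VALUES):
    """Known value with the highest token overlap (Jaccard) with `value`."""
    def jaccard(a, b):
        ta, tb = _tokens(a), _tokens(b)
        union = ta | tb
        return len(ta & tb) / len(union) if union else 0.0
    return max(known, key=lambda k: jaccard(value, k))


def check(value, known=KNOWN_VALUES):
    """Return (value, caveat_or_None), flagging values not in `known`."""
    if value in known:
        return value, None
    hint = most_similar(value, known)
    return value, f"Unexpected race_ethnicity: {value!r}; most similar known value is {hint!r}"


print(check("Non-Hispanic White"))
```

Token overlap handles reordered words well; misspellings would need an edit-distance measure instead.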
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<p>Spark's CSV loader can augment tables with a $\texttt{parse_error}$ column.</p>
|
<h3>Eve / Hal</h3>
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT * FROM csv_file
|
SELECT /* ... */,
|
||||||
WHERE
|
CASE WHEN CAST(salary AS float) IS NULL THEN
|
||||||
CASE WHEN parse_error IS NULL THEN TRUE ELSE
|
|
||||||
caveat(FALSE, parse_error)
|
caveat(NULL, 'Could not cast [ '&salary&' ] to float.')
|
||||||
END;
|
|
||||||
|
ELSE CAST(salary AS float) END AS salary
|
||||||
|
FROM raw_csv_data;
|
||||||
</code></pre>
|
</code></pre>
|
||||||
</section>
|
</section>
|
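The same cast-or-caveat pattern is easy to mirror in Python. This is a minimal sketch, assuming Python's `float()` as a stand-in for the SQL `CAST` (the two do not accept exactly the same inputs); `parse_salary` is a hypothetical helper.

```python
def parse_salary(raw):
    """Return (value, caveat).

    On success the caveat is None; on failure the value is None and the
    caveat records the raw text that could not be parsed.
    """
    try:
        return float(raw), None
    except (TypeError, ValueError):
        return None, f"Could not cast [ {raw} ] to float."


print(parse_salary("52000.5"))
print(parse_salary("52,000"))  # the stray comma makes the cast fail
```

The failing row is kept rather than rejected, so loading never aborts on line 1252538, and the caveat explains why the value is missing.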
||||||
|
</section>
|
||||||
|
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<h3>Why?</h3>
|
|
||||||
<h4 class="fragment" data-fragment-index="1">Propagation</h4>
|
|
||||||
<dl>
|
|
||||||
<dd class="fragment" data-fragment-index="2" style="margin-left: -20px;">Caveats...</dd>
|
|
||||||
|
|
||||||
<div class="fragment" data-fragment-index="2">
|
|
||||||
<dt>... can go where the data goes</dt>
|
|
||||||
<dd>Derived values retain caveats on source data.</dd>
|
|
||||||
</div>
|
|
||||||
|
|
||||||
<div class="fragment" data-fragment-index="3">
|
|
||||||
<dt>... stop where the data stops</dt>
|
|
||||||
<dd>Irrelevant caveats don't get propagated</dd>
|
|
||||||
</div>
|
|
||||||
</dl>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<h3>Caveats</h3>
|
<h3>Caveats</h3>
|
||||||
|
|
||||||
<ol>
|
<ol>
|
||||||
<li>Propagating Caveats</li>
|
<li style="color: lightgrey;">Story Time</li>
|
||||||
|
<li style="color: lightgrey;">What is a Caveat?</li>
|
||||||
|
<li style="color: lightgrey;">Applying Caveats</li>
|
||||||
|
<li class="fragment grow highlight-blue">Propagating Caveats</li>
|
||||||
<li>Caveats Beyond SQL</li>
|
<li>Caveats Beyond SQL</li>
|
||||||
<li>The Vizier Notebook</li>
|
<li>The Vizier Notebook</li>
|
||||||
</ol>
|
</ol>
|
||||||
|
@ -553,21 +637,6 @@
|
||||||
- Step 1: Which values are affected by a caveat
|
- Step 1: Which values are affected by a caveat
|
||||||
- Step 2: Which caveats affect those values
|
- Step 2: Which caveats affect those values
|
||||||
-->
|
-->
|
||||||
<section>
|
|
||||||
<h3>Caveats</h3>
|
|
||||||
|
|
||||||
<ol>
|
|
||||||
<li class="fragment highlight-blue grow">Propagating Caveats</li>
|
|
||||||
<li>Caveats Beyond SQL</li>
|
|
||||||
<li>The Vizier Notebook</li>
|
|
||||||
</ol>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<p>What semantics do we want?</p>
|
|
||||||
<p class="fragment">Caveatted data elements could be wrong.</p>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<p><b>Certain Data Elements: </b> Elements guaranteed to be in the result <u>in all possible worlds</u>.</p>
|
<p><b>Certain Data Elements: </b> Elements guaranteed to be in the result <u>in all possible worlds</u>.</p>
|
||||||
|
|
||||||
|
@ -578,8 +647,9 @@
|
||||||
<p>If a caveatted element can't affect an output element, don't propagate its caveats!</p>
|
<p>If a caveatted element can't affect an output element, don't propagate its caveats!</p>
|
||||||
<p class="fragment">Propagate caveats to any data elements that could be affected by a change.</p>
|
<p class="fragment">Propagate caveats to any data elements that could be affected by a change.</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<p><b>Challenge: </b> How do we propagate caveats<br/>without penalizing query evaluation.</p>
|
<p><b>Challenge: </b> How do we propagate caveats<br/>without penalizing query evaluation?</p>
|
||||||
|
|
||||||
<p class="fragment">Don't!</p>
|
<p class="fragment">Don't!</p>
|
||||||
</section>
|
</section>
|
||||||
|
@ -649,19 +719,17 @@
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
CREATE VIEW by_language AS
|
CREATE VIEW survey_responses AS
|
||||||
SELECT language,
|
SELECT language,
|
||||||
CASE WHEN CAST(salary AS float) IS NOT NULL THEN
|
CASE WHEN CAST(salary AS float) IS NULL THEN
|
||||||
|
|
||||||
caveat(NULL, 'Could not cast [ '&salary&' ] to float.')
|
caveat(NULL, 'Could not cast [ '&salary&' ] to float.')
|
||||||
|
|
||||||
ELSE CAST(salary AS float) END AS salary
|
ELSE CAST(salary AS float) END AS salary
|
||||||
FROM raw_csv_data;
|
FROM raw_csv_data;
|
||||||
</code></pre>
|
</code></pre>
|
||||||
<div class="fragment">
|
<div class="fragment">
|
||||||
becomes
|
becomes
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
CREATE VIEW by_language AS
|
CREATE VIEW survey_responses AS
|
||||||
SELECT language, CAST(salary AS float) AS salary,
|
SELECT language, CAST(salary AS float) AS salary,
|
||||||
FALSE AS _caveat_field_language,
|
FALSE AS _caveat_field_language,
|
||||||
CAST(salary as float) IS NULL AS _caveat_field_salary
|
CAST(salary as float) IS NULL AS _caveat_field_salary
|
||||||
|
@ -674,7 +742,7 @@
|
||||||
<section>
|
<section>
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT salary
|
SELECT salary
|
||||||
FROM by_language
|
FROM survey_responses
|
||||||
WHERE language = 'Scala'
|
WHERE language = 'Scala'
|
||||||
</code></pre>
|
</code></pre>
|
||||||
<div class="fragment">
|
<div class="fragment">
|
||||||
|
@ -683,7 +751,7 @@
|
||||||
SELECT salary,
|
SELECT salary,
|
||||||
_caveat_field_salary AS _caveat_field_salary,
|
_caveat_field_salary AS _caveat_field_salary,
|
||||||
_caveat_row AND _caveat_field_language AS _caveat_row
|
_caveat_row AND _caveat_field_language AS _caveat_row
|
||||||
FROM by_language
|
FROM survey_responses
|
||||||
WHERE language = 'Scala'
|
WHERE language = 'Scala'
|
||||||
</code></pre>
|
</code></pre>
|
||||||
</div>
|
</div>
|
||||||
|
@ -692,7 +760,7 @@
|
||||||
<section>
|
<section>
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT AVG(salary) AS salary
|
SELECT AVG(salary) AS salary
|
||||||
FROM by_language
|
FROM survey_responses
|
||||||
</code></pre>
|
</code></pre>
|
||||||
<div class="fragment">
|
<div class="fragment">
|
||||||
becomes
|
becomes
|
||||||
|
@ -700,7 +768,7 @@
|
||||||
SELECT AVG(salary) AS salary,
|
SELECT AVG(salary) AS salary,
|
||||||
GROUP_OR(_caveat_field_salary) AS _caveat_field_salary,
|
GROUP_OR(_caveat_field_salary) AS _caveat_field_salary,
|
||||||
FALSE AS _caveat_row
|
FALSE AS _caveat_row
|
||||||
FROM by_language
|
FROM survey_responses
|
||||||
</code></pre>
|
</code></pre>
|
||||||
</div>
|
</div>
|
||||||
</section>
|
</section>
|
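The rewriting above can be mimicked on plain Python rows to see what the extra columns carry. The sketch assumes that each data column gets a boolean `_caveat_field_*` companion and that `GROUP_OR` behaves like Python's `any()`; the sample rows and function names are hypothetical.

```python
raw_csv_data = [
    {"language": "Scala", "salary": "90000"},
    {"language": "Scala", "salary": "n/a"},
    {"language": "Python", "salary": "85000"},
]


def to_float(raw):
    """Stand-in for CAST(... AS float): None when the text does not parse."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def survey_responses(rows):
    """Rewritten view: data columns plus one boolean caveat flag per column."""
    view = []
    for row in rows:
        salary = to_float(row["salary"])
        view.append({
            "language": row["language"],
            "salary": salary,
            "_caveat_field_language": False,
            "_caveat_field_salary": salary is None,
        })
    return view


def avg_salary(view):
    """Rewritten AVG query: the result is caveated if any input salary was."""
    salaries = [r["salary"] for r in view if r["salary"] is not None]  # AVG skips NULLs
    return {
        "salary": sum(salaries) / len(salaries) if salaries else None,
        "_caveat_field_salary": any(r["_caveat_field_salary"] for r in view),  # GROUP_OR
        "_caveat_row": False,
    }


print(avg_salary(survey_responses(raw_csv_data)))
```

The flags are ordinary boolean columns, so the rewritten query runs on an unmodified engine.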
||||||
|
@ -708,21 +776,21 @@
|
||||||
<section>
|
<section>
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT language, AVG(salary) AS salary
|
SELECT language, AVG(salary) AS salary
|
||||||
FROM by_language
|
FROM survey_responses
|
||||||
GROUP BY language
|
GROUP BY language
|
||||||
</code></pre>
|
</code></pre>
|
||||||
<div class="fragment">
|
<div class="fragment">
|
||||||
... first we evaluate
|
... first we evaluate
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT GROUP_OR(_caveat_field_language)
|
SELECT GROUP_OR(_caveat_field_language)
|
||||||
FROM by_language
|
FROM survey_responses
|
||||||
</code></pre>
|
</code></pre>
|
||||||
</div>
|
</div>
|
||||||
<p class="fragment">Can often be evaluated statically.</p>
|
<p class="fragment">Can often be evaluated statically.</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<h3>If TRUE</h3>
|
<h3>If GROUP BY has caveats</h3>
|
||||||
|
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT language, AVG(salary) AS salary
|
SELECT language, AVG(salary) AS salary
|
||||||
|
@ -736,7 +804,7 @@
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<h3>If FALSE</h3>
|
<h3>If no GROUP BY caveats</h3>
|
||||||
|
|
||||||
<pre><code class="sql">
|
<pre><code class="sql">
|
||||||
SELECT language, AVG(salary) AS salary
|
SELECT language, AVG(salary) AS salary
|
||||||
|
@ -749,10 +817,6 @@
|
||||||
</code></pre>
|
</code></pre>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
|
||||||
<p>Ongoing work with Boris Glavic + Su Feng @ IIT</p>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
</section>
|
</section>
|
||||||
<section>
|
<section>
|
||||||
<!--
|
<!--
|
||||||
|
@ -863,8 +927,11 @@
|
||||||
<h3>Caveats</h3>
|
<h3>Caveats</h3>
|
||||||
|
|
||||||
<ol>
|
<ol>
|
||||||
<li style="color: grey;">Propagating Caveats</li>
|
<li style="color: lightgrey;">Story Time</li>
|
||||||
<li class="fragment highlight-blue grow">Caveats Beyond SQL</li>
|
<li style="color: lightgrey;">What is a Caveat?</li>
|
||||||
|
<li style="color: lightgrey;">Applying Caveats</li>
|
||||||
|
<li style="color: lightgrey;">Propagating Caveats</li>
|
||||||
|
<li class="fragment grow highlight-blue">Caveats Beyond SQL</li>
|
||||||
<li>The Vizier Notebook</li>
|
<li>The Vizier Notebook</li>
|
||||||
</ol>
|
</ol>
|
||||||
</section>
|
</section>
|
||||||
|
@ -1009,9 +1076,12 @@
|
||||||
<h3>Caveats</h3>
|
<h3>Caveats</h3>
|
||||||
|
|
||||||
<ol>
|
<ol>
|
||||||
<li style="color: grey;">Propagating Caveats</li>
|
<li style="color: lightgrey;">Story Time</li>
|
||||||
<li style="color: grey;">Caveats Beyond SQL</li>
|
<li style="color: lightgrey;">What is a Caveat?</li>
|
||||||
<li class="fragment highlight-blue grow">The Vizier Notebook</li>
|
<li style="color: lightgrey;">Applying Caveats</li>
|
||||||
|
<li style="color: lightgrey;">Propagating Caveats</li>
|
||||||
|
<li style="color: lightgrey;">Caveats Beyond SQL</li>
|
||||||
|
<li class="fragment grow highlight-blue">The Vizier Notebook</li>
|
||||||
</ol>
|
</ol>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
@ -1020,24 +1090,6 @@
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section>
|
|
||||||
<section>
|
|
||||||
<h3>Other Work in Progress</h3>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<h3>Caveats + Data Vis</h3>
|
|
||||||
<img src="graphics/cli_plot.png" height="400px;">
|
|
||||||
<p><b>Poonam Kumari @ UB</b></p>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
|
||||||
<h3>Handling Hidden Caveats</h3>
|
|
||||||
<p><b>Work with Boris Glavic, Su Feng @ IIT<br/>
|
|
||||||
Atri Rudra, Aaron Huber @ UB</b></p>
|
|
||||||
</section>
|
|
||||||
</section>
|
|
||||||
|
|
||||||
<section>
|
<section>
|
||||||
<h3>
|
<h3>
|
||||||
<img src="graphics/vizier-blue.svg" height="100px" style="vertical-align: middle; margin-right: 20px;" />
|
<img src="graphics/vizier-blue.svg" height="100px" style="vertical-align: middle; margin-right: 20px;" />
|
||||||
|
|
|
@ -13,3 +13,61 @@ FROM salaries
|
||||||
GROUP BY PRIMARY_LANGUAGE_TECHNOLOGY_STACK
|
GROUP BY PRIMARY_LANGUAGE_TECHNOLOGY_STACK
|
||||||
HAVING tot > 2;
|
HAVING tot > 2;
|
||||||
|
|
||||||
|
--- group by race_ethnicity ---
|
||||||
|
|
||||||
|
SELECT year,
|
||||||
|
SUM( CASE WHEN race_ethnicity = 'Other Race/ Ethnicity' THEN deaths ELSE 0 END ) as Other,
|
||||||
|
SUM( CASE WHEN race_ethnicity = 'Black Non-Hispanic' THEN deaths ELSE 0 END ) as Black_NH,
|
||||||
|
SUM( CASE WHEN race_ethnicity = 'White Non-Hispanic' THEN deaths ELSE 0 END ) as White_NH,
|
||||||
|
SUM( CASE WHEN race_ethnicity = 'Hispanic' THEN deaths ELSE 0 END ) as Hispanic,
|
||||||
|
SUM( CASE WHEN race_ethnicity = 'Asian and Pacific Islander' THEN deaths ELSE 0 END ) as Asian_Pacific,
|
||||||
|
SUM( CASE WHEN race_ethnicity = 'Not Stated/Unknown' THEN deaths ELSE 0 END ) as Unknown_Ethnicity
|
||||||
|
FROM causes GROUP BY year
|
||||||
|
|
||||||
|
--- # Import matplotlib, generate a plot, and output it.
|
||||||
|
import matplotlib
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import io
|
||||||
|
#switch to non display backend
|
||||||
|
plt.switch_backend('agg')
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Get object for dataset with given name.
|
||||||
|
ds = vizierdb.get_dataset('causes')
|
||||||
|
|
||||||
|
data = dict()
|
||||||
|
|
||||||
|
for row in ds.rows:
|
||||||
|
    year = row.get_value('YEAR')
|
||||||
|
    ethnicity = row.get_value('RACE_ETHNICITY')
|
||||||
|
    deaths = row.get_value('DEATHS')
|
||||||
|
|
||||||
|
    if deaths is not None:
|
||||||
|
        if ethnicity not in data:
|
||||||
|
            data[ethnicity] = dict()
|
||||||
|
|
||||||
|
        if year not in data[ethnicity]:
|
||||||
|
            data[ethnicity][year] = 0
|
||||||
|
|
||||||
|
        data[ethnicity][year] += deaths
|
||||||
|
|
||||||
|
# Data for plotting
|
||||||
|
fig, ax = plt.subplots()
|
||||||
|
for ethnicity in data:
|
||||||
|
    by_year = data[ethnicity]
|
||||||
|
    years = sorted(by_year.keys())
|
||||||
|
    ax.plot(
|
||||||
|
        years,
|
||||||
|
        [ by_year[year] for year in years ],
|
||||||
|
        label = ethnicity,
|
||||||
|
        linewidth=3
|
||||||
|
    )
|
||||||
|
|
||||||
|
ax.plot()
|
||||||
|
ax.set(xlabel='Year', ylabel='Death Rate')
|
||||||
|
ax.grid()
|
||||||
|
ax.legend(loc = 'upper left')
|
||||||
|
|
||||||
|
with io.BytesIO() as imgbytes:
|
||||||
|
    fig.savefig(imgbytes, format="svg")
|
||||||
|
    print(imgbytes.getvalue().decode("utf-8"))
|
||||||
|
|