501 slides

pull/2/head
Oliver Kennedy 2022-09-19 22:17:00 -04:00
parent d4dc6d813a
commit cab1600312
Signed by: okennedy
GPG Key ID: 3E5F9B3ABD3FDB60
3 changed files with 676 additions and 0 deletions


@ -0,0 +1,177 @@
---
template: templates/talk_slides_v1.erb
title: How to Review a Paper
---
<section>
<h2>How to <s>Write</s> Review a Paper</h2>
</section>
<section>
<h3>Why do you care?</h3>
<ul>
<li>Your advisor will one day ask you to review a paper.</li>
<li class="fragment">You will graduate and one day be asked to review a paper.</li>
<li class="fragment">Your papers will be reviewed like this... get into your reviewer's head!</li>
</ul>
</section>
<section>
<h3>My Process</h3>
<dl>
<dt>Make the paper scribblable</dt>
<dd>I load the paper onto my tablet or print it out. It's important that I'm (a) comfortable, (b) distraction-free, and (c) able to make unrestricted markings.</dd>
<dt>Read-Through</dt>
<dd>I generally read the paper in a single pass. But I write down <b>everything</b>.</dd>
<dt>Review notes</dt>
<dd>Back at a text editor (or CMT, etc.), I go through my notes and make sure that each one is (a) addressed in the paper, and/or (b) reflected in my review.</dd>
</dl>
</section>
<section>
<h3>Criteria</h3>
<ul>
<li>Motivation</li>
<li>Completeness</li>
<li>Validity</li>
<li>Readability</li>
</ul>
<p class="fragment">A good paper doesn't need to win in all categories.</p>
</section>
<section>
<h3>Motivation</h3>
<p>Does the paper address a relevant problem or provide useful insights?</p>
<ul>
<li>The paper identifies a new problem.</li>
<li>The paper provides new insights on an old problem.</li>
<li>The paper identifies new applications of old techniques.</li>
</ul>
<p class="fragment">Why would someone want to read this paper?</p>
</section>
<section>
<h3>Completeness</h3>
<p>Does the paper solve the problem it set out to solve?</p>
<ul>
<li>Validate the list of claims (if present).</li>
<li>Does the paper have a broad/narrow enough scope?</li>
<li>Is the initial motivation addressed? <ul>
<li>... or has a sufficient milestone been met?</li>
</ul></li>
</ul>
<p class="fragment">Is the paper clickbait?</p>
</section>
<section>
<h3>Validity</h3>
<p>Are statements in the paper correct?</p>
<ul style="font-size: 50%">
<li>System design/Algorithms <ul>
<li>Unexpected runtime costs.</li>
<li>Unhandled corner cases.</li>
</ul></li>
<li>Experiments <ul>
<li>Experimental design doesn't test what the authors claim.</li>
<li>Results don't agree with the authors' claims.</li>
</ul></li>
<li>Proofs <ul>
<li>Sanity check</li>
</ul></li>
</ul>
<p class="fragment">How would you yourself solve the problem the authors pose?</p>
</section>
<section>
<h3>Readability</h3>
<p>Is the paper written clearly?</p>
<ul>
<li>There is a clearly stated problem / objective.</li>
<li>All background topics not covered in [grad level class] are outlined.</li>
<li>System design and formalisms are clear, precise, and complete.</li>
<li>The paper is free of English bugs, grammar bugs, typos, etc.</li>
</ul>
<p class="fragment">Do you understand the paper?</p>
</section>
<section>
<h3>Reading the Paper: Milestones</h3>
<dl>
<dt>Introduction</dt>
<dd>Do I understand the problem the paper is solving?</dd>
<dd>How would I go about solving the problem the paper outlines?</dd>
<dd>How would I go about measuring a solution to this problem?</dd>
<dt>Background</dt>
<dd>Do I have a reasonable understanding of the techniques the paper plans to use?</dd>
</dl>
</section>
<section>
<h3>Reading the Paper: Milestones</h3>
<dl>
<dt>Algorithm/Data Structures</dt>
<dd>Do I understand the approach the authors are taking?</dd>
<dd>If the approach doesn't line up with my own, why?</dd>
<dt>Experiments</dt>
<dd>Are all of my expected experiments from earlier addressed?</dd>
<dd>Do the experiments measure what the authors want to measure?</dd>
<dd>Are the datasets reasonable/representative of the motivating workloads?</dd>
<dd>Do the graphs support the paper's claims?</dd>
</dl>
</section>
<section>
<h3>Feedback</h3>
<p>Be Specific</p>
<ul>
<li>Communicate to the authors at least one way to address your concern.</li>
<li>Establish a clear metric that can test whether the concern is addressed.</li>
<li>Include citations where possible.</li>
<li>Differentiate suggestions from criticism.</li>
<li>Refer to specific lines of text.</li>
</ul>
</section>
<section>
<h3>Receiving Feedback</h3>
<p>Why did the reviewer write it?</p>
<ul>
<li>Did the reviewer misunderstand what you wrote?
<div class="fragment">What can you do to make it clearer?</div></li>
<li>Did the reviewer ignore an important point?
<div class="fragment">What can you do to make it more visible?</div></li>
<li>Did the reviewer not have the right background?
<div class="fragment">What can you do to make it more accessible?</div></li>
<li>Did the reviewer disagree with the motivation?
<div class="fragment">"Pitch" the motivation to your fellow students/faculty.</div></li>
</ul>
</section>
<section>
<p>The reviewer is a sample of the people who will be reviewing your paper. They may be wrong, but you still need to communicate to others like them to get an accept!</p>
</section>


@ -0,0 +1,491 @@
---
template: templates/talk_slides_v1.erb
title: "CSE 501: Microkernel Notebooks"
---
<section>
<h2>Microkernel Notebooks</h2>
<h4 style="margin-top: 20px;">Oliver Kennedy</h4>
<p style="font-size: 70%; width: 730px; margin-right: auto; margin-left: auto; margin-top: 100px;" >
<a href="https://vizierdb.info">
<img src="graphics/logos/vizier-blue.svg" height="70px" style="float: left; margin-right: 20px; vertical-align: middle;" />
</a>
Boris Glavic, Juliana Freire, Michael Brachmann, William Spoth, Poonam Kumari, Ying Yang, Su Feng, Heiko Mueller, Aaron Huber, Nachiket Deo, and many more...</p>
</section>
<section>
<section>
<h2>But first...</h2>
</section>
<section>
<h3>Databases?</h3>
<img src="graphics/2022-04-02/er-diagrams.png" height="400px">
</section>
<section>
<img src="graphics/clipart/Female-or-Male-Unisex-Geek-or-Nerd-Light-Skin.svg" height="400px">
<attribution><a href="https://openclipart.org/">openclipart.org</a></attribution>
</section>
<section>
<h2>Data Structures?</h2>
<img src="graphics/2022-04-02/250-textbook.png" height="300px">
</section>
<section>
<img src="graphics/2022-04-02/Macintosh_classic_250.jpg" height="400px">
<attribution>Adapted from <a href="http://creativecommons.org/licenses/by-sa/3.0/" title="Creative Commons Attribution-Share Alike 3.0">CC BY-SA 3.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=10101">Wikimedia Commons</a></attribution>
</section>
<section>
<img src="graphics/2022-04-02/nuclear_dino_db.png">
<attribution>Adapted from <a href="https://www.destroyallsoftware.com/talks/wat">Wat; Gary Bernhardt @ CodeMash 2012</a></attribution>
</section>
<section>
<h3>CSE 562: Database Systems</h3>
<ul>
<li>A bit of operating systems</li>
<li>A bit of hardware</li>
<li>A bit of compilers</li>
<li>A bit of distributed systems</li>
</ul>
<p class="fragment" style="font-weight: bold; margin-top: 50px">Applied Computer Science</p>
</section>
<section>
<img src="graphics/2022-04-02/db-convergence.svg">
</section>
</section>
<section>
<section>
<h3>For example...</h3>
</section>
<section>
<pre><code class="sql">
CREATE VIEW salesSinceLastMonth AS
SELECT l.*
FROM lineitem l, orders o
WHERE l.orderkey = o.orderkey
AND o.orderdate > CURRENT_DATE - INTERVAL '1' MONTH
</code></pre>
<pre><code class="sql">
SELECT partkey FROM salesSinceLastMonth
ORDER BY shipdate DESC LIMIT 10;
</code></pre>
<pre><code class="sql">
SELECT suppkey, COUNT(*)
FROM salesSinceLastMonth
GROUP BY suppkey;
</code></pre>
<pre><code class="sql">
SELECT DISTINCT partkey
FROM salesSinceLastMonth
</code></pre>
</section>
<section>
<pre><code class="python">
def really_expensive_computation():
    return [
        expensive_computation(i)
        for i in range(1, 1000000)
        if expensive_test(i)
    ]
</code></pre>
<pre><code class="python">
print(sorted(really_expensive_computation())[:10])
</code></pre>
<pre><code class="python">
print(len(really_expensive_computation()))
</code></pre>
<pre><code class="python">
print(set(really_expensive_computation()))
</code></pre>
</section>
<section>
<pre><code class="python">
def really_expensive_computation():
    return [
        expensive_computation(i)
        for i in range(1, 1000000)
        if expensive_test(i)
    ]

view = really_expensive_computation()  # computed once, reused below
</code></pre>
<pre><code class="python">
print(sorted(view)[:10])
</code></pre>
<pre><code class="python">
print(len(view))
</code></pre>
<pre><code class="python">
print(set(view))
</code></pre>
</section>
<section>
<p><b>Opportunity:</b> Views are queried frequently</p>
<p><b>Idea: </b> Pre-compute and save the view's contents!</p>
</section>
<section>
<p>Btw... this idea is the essence of CSE 250.</p>
</section>
<section>
<svg data-src="graphics/2022-04-02/DBToQ.svg" />
<attribution>openclipart.org</attribution>
</section>
<section>
<p>When the base data changes, <br/>the view needs to be updated too!</p>
</section>
<section>
<pre><code class="python">
def init():
    global view
    view = query(database)
</code></pre>
<p style="margin-top: 100px;">Our view starts off initialized</p>
</section>
<section>
<p style="margin-top: 100px;"><b>Idea:</b> Recompute the view from scratch when data changes.</p>
</section>
<section>
<pre><code class="python">
def update(changes):
    global database, view
    database = database + changes
    view = query(database)  # includes changes
</code></pre>
</section>
<section>
<img src="graphics/clipart/Snail.jpg" height="400px">
<attribution><a href="http://creativecommons.org/licenses/by-sa/3.0/" title="Creative Commons Attribution-Share Alike 3.0">CC BY-SA 3.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=95926">Wikimedia Commons</a></attribution>
</section>
<section>
<pre><code class="python">
def update(changes):
    global database, view
    view = view + delta(query, database, changes)
    database = database + changes
</code></pre>
<table style="margin-top: 50px;">
<tr class="fragment">
<td style="font-size: 150%;"><tt>delta</tt></td>
<td>(ideally) Small &amp; fast query</td>
</tr>
<tr class="fragment">
<td style="font-size: 150%;"><tt>+</tt></td>
<td>(ideally) Fast "merge" operation</td>
</tr>
</table>
</section>
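<section>
<p>A minimal sketch of <tt>delta</tt> for one easy case. It assumes the view is a plain filter (like <tt>salesSinceLastMonth</tt>) and that <tt>changes</tt> only inserts rows. In that case, filtering just the new rows is enough. The row format and the <tt>recent</tt> flag are hypothetical.</p>
<pre><code class="python">
# Hypothetical view: a selection (filter) over the base data.
def query(database):
    return [row for row in database if row["recent"]]

def delta(query, database, changes):
    # For insert-only changes, a filter only needs to look at the new rows;
    # `database` is not needed here, but kept to match update() above.
    return query(changes)

database = [{"id": 1, "recent": True}, {"id": 2, "recent": False}]
view = query(database)  # init()

changes = [{"id": 3, "recent": True}, {"id": 4, "recent": False}]
view = view + delta(query, database, changes)  # "+" merges: list concatenation
database = database + changes

print(view == query(database))  # True: same answer as recomputing
</code></pre>
</section>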
<section>
<h3>Intuition</h3>
<div>
$$\mathcal{D} = \{\ 1,\ 2,\ 3,\ 4\ \} \hspace{1in} \Delta\mathcal{D} = \{\ 5\ \}$$
$$Q(\mathcal D) = \texttt{SUM}(\mathcal D)$$
</div>
<div style="margin-top: 50px;">
<div class="fragment">$$ 1 + 2 + 3 + 4 + 5 $$</div>
<div class="fragment">$Q(\mathcal D+\Delta\mathcal D)$ <span class="fragment">$\sim O(|\mathcal D| + |\Delta\mathcal D|)$</span></div>
</div>
<div style="margin-top: 50px;">
<div class="fragment">$10$<span class="fragment">$+ 5$</span></div>
<div class="fragment">$\texttt{VIEW} + \texttt{SUM}(\Delta\mathcal D)$ <span class="fragment">$\sim O(|\Delta\mathcal D|)$</span></div>
</div>
</section>
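<section>
<p>The same arithmetic in Python. The saved view is reused, so only the insert-only change is summed.</p>
<pre><code class="python">
D = [1, 2, 3, 4]
delta_D = [5]

# Recompute from scratch: touches every element, O(|D| + |delta_D|)
full = sum(D + delta_D)

# Incremental: reuse the saved view, touch only the change, O(|delta_D|)
view = sum(D)                 # 10, computed once and saved
view = view + sum(delta_D)    # 10 + 5

print(full, view)  # 15 15
</code></pre>
</section>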
<section>
<img src="graphics/2022-04-02/morpheus.jpeg">
<attribution>©1999 Warner Bros. Pictures</attribution>
</section>
</section>
<section>
<h6 style="font-size: 60%" class="fragment" data-fragment-index="2">Get off my database's lawn, punk kids</h6>
<h4 class="fragment" data-fragment-index="1"><span class="fragment strike" data-fragment-index="2">Why Jupyter Sucks</span></h4>
<h2 class="fragment strike" data-fragment-index="1">Microkernel Notebooks</h2>
<h4 style="margin-top: 20px;">Oliver Kennedy</h4>
<p style="font-size: 70%; width: 730px; margin-right: auto; margin-left: auto; margin-top: 100px;" >
<a href="https://vizierdb.info">
<img src="graphics/logos/vizier-blue.svg" height="70px" style="float: left; margin-right: 20px; vertical-align: middle;" />
</a>
Boris Glavic, Juliana Freire, Michael Brachmann, William Spoth, Poonam Kumari, Ying Yang, Su Feng, Heiko Mueller, Aaron Huber, Nachiket Deo, and many more...</p>
</section>
<section>
<section>
<img src="graphics/logos/jupyter.svg" height="500px">
<img src="graphics/2022-04-02/jupyter.png" height="500px">
</section>
<section>
<pre><code class="python">
import pandas as pd
</code></pre>
<pre class="fragment"><code class="python">
df = pd.read_csv("AMS-USDA-Directories-FarmersMarkets.csv")
df
</code></pre>
<img class="fragment" src="graphics/2022-04-02/jupyter-table.png" width="800px">
<pre class="fragment"><code class="python">
df.groupby("County").count()
</code></pre>
<p class="fragment">...</p>
</section>
<section>
<img src="graphics/2022-04-02/oz.jpeg" height="500px">
<attribution>©1939 Metro-Goldwyn-Mayer</attribution>
</section>
<section>
<img src="graphics/2022-04-02/oz_curtain.jpeg" height="500px">
<attribution>©1939 Metro-Goldwyn-Mayer</attribution>
</section>
<section>
<console>
Python 3.9.7 (default, Sep 10 2021, 14:59:43)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> <span class="fragment">import pandas as pd
>>> </span><span class="fragment">df = pd.read_csv("AMS-USDA-Directories-FarmersMarkets.csv")
>>> df</span>
<div class="fragment"> FMID MarketName ... WildHarvested updateTime
0 1000519 Alexandria Bay Farmers Market ... N 2/1/2021 11:02:22 AM
1 1021329 Aurora Farmers Market ... Y 1/30/2021 6:24:08 PM
2 1002064 Belmont Farmers Market ... N 1/27/2021 9:03:15 PM
3 1021262 Broome County Regional Farmers Market ... Y 1/5/2021 10:02:05 AM
4 1021202 Canal Village Farmers' Market ... N 9/9/2020 7:55:23 PM
.. ... ... ... ... ...
82 1020021 Waterloo Rotary Farm Market ... N 8/3/2020 2:28:33 PM
83 1000384 Webster's Joe Obbie Farmers' Market, Inc. ... N 1/5/2021 10:18:30 AM
84 1002177 West Point-Town of Highlands Farmers Market ... N 8/2/2018 12:58:13 AM
85 1019038 Woodstock Farm Festival ... N 4/4/2018 11:27:02 AM
86 1007259 Yates County Cooperative Farm and Craft Market... ... N 2/3/2019 12:29:07 PM
[87 rows x 59 columns]
>>> </div>
</console>
</section>
<section>
<p>Cells are code snippets that get pasted into a long-running <b>kernel</b></p>
</section>
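<section>
<p>A minimal sketch of that model, assuming each cell is a string of Python source. Every cell is <tt>exec</tt>'d into one shared namespace, in whatever order the user runs it. The <tt>Kernel</tt> class is hypothetical and is not Jupyter's actual kernel protocol.</p>
<pre><code class="python">
class Kernel:
    """Hypothetical long-running kernel: one namespace shared by all cells."""

    def __init__(self):
        self.namespace = {}

    def run(self, cell_source):
        # Each cell reads and writes the same global variables
        exec(cell_source, self.namespace)

kernel = Kernel()
kernel.run("x = 2")
kernel.run("y = x * 3")
print(kernel.namespace["y"])  # 6
</code></pre>
</section>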
<section>
<img src="graphics/2022-04-02/joelgrus.png" height="500px">
<attribution><a href="https://www.youtube.com/watch?v=7jiPeIFXb6U">I don't like notebooks.- Joel Grus (Allen Institute for Artificial Intelligence)</a></attribution>
</section>
<section>
<img src="graphics/2022-04-02/joelgrus_hiddenstate.png" height="500px">
<attribution><a href="https://www.youtube.com/watch?v=7jiPeIFXb6U">I don't like notebooks.- Joel Grus (Allen Institute for Artificial Intelligence)</a></attribution>
</section>
<section>
<img src="graphics/2022-04-02/joelgrus_y_is_5.png" height="300px">
<attribution><a href="https://www.youtube.com/watch?v=7jiPeIFXb6U">I don't like notebooks.- Joel Grus (Allen Institute for Artificial Intelligence)</a></attribution>
</section>
<section>
<p>Evaluation Order ≠ Notebook Order</p>
<p style="margin-top: 100px; font-weight: bold;" class="fragment">... but why?</p>
</section>
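<section>
<p>A hypothetical two-cell illustration of the same problem (not the example from the talk above). It uses the shared-namespace model: the user edits the first cell and re-runs only that cell, which leaves <tt>y</tt> stale.</p>
<pre><code class="python">
cells = ["x = 1", "y = x + 1"]   # notebook order

namespace = {}                   # the long-running kernel's state
exec(cells[0], namespace)        # run cell 1
exec(cells[1], namespace)        # run cell 2: y == 2

cells[0] = "x = 4"               # the user edits cell 1...
exec(cells[0], namespace)        # ...and re-runs only cell 1

# The notebook now reads "x = 4", "y = x + 1", but y was never recomputed
print(namespace["x"], namespace["y"])  # 4 2
</code></pre>
</section>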
<section>
<h3>In a monokernel...</h3>
</section>
<section>
<pre><code class="python">
import pandas as pd
</code></pre>
<pre class="fragment"><code class="python">
df = pd.read_csv("really_big_dataset.csv")
</code></pre>
<pre class="fragment"><code class="python">
test = df.iloc[:800]
train = df.iloc[800:]
</code></pre>
<pre class="fragment"><code class="python">
model = train_linear_regression(train, "target")
</code></pre>
<pre class="fragment"><code class="python">
evaluate_linear_regression(model, test, "target")
</code></pre>
</section>
<section>
<pre><code class="python">
import pandas as pd
</code></pre>
<pre><code class="python">
df = pd.read_csv("really_big_dataset.csv")
</code></pre>
<pre style="box-shadow: 0px 0px 12px red; "><code class="python">
test = df.iloc[:500]
train = df.iloc[500:]
</code></pre>
<pre class="fragment red-shadow-current" data-fragment-index="1"><code class="python">
model = train_linear_regression(train, "target")
</code></pre>
<pre class="fragment red-shadow-current" data-fragment-index="1"><code class="python">
evaluate_linear_regression(model, test, "target")
</code></pre>
</section>
<section>
<h3>Q1: Which cells need to be re-evaluated?</h3>
<p style="margin-top: 100px; font-weight: bold;" class="fragment">Idea 1: All of them!</p>
</section>
<section>
<pre><code class="python">
import pandas as pd
df = pd.read_csv("really_big_dataset.csv")
test = df.iloc[:500]
train = df.iloc[500:]
model = train_linear_regression(train, "target")
evaluate_linear_regression(model, test, "target")
</code></pre>
</section>
<section>
<img src="graphics/clipart/Snail.jpg" height="400px">
<attribution><a href="http://creativecommons.org/licenses/by-sa/3.0/" title="Creative Commons Attribution-Share Alike 3.0">CC BY-SA 3.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=95926">Wikimedia Commons</a></attribution>
</section>
<section>
<p>... but <tt>df</tt> is still around, and you can "re-use" it.</p>
<p>Idea 2: Skip cells that haven't changed.</p>
<p style="margin-top: 100px; font-weight: bold;" class="fragment">... but <u class="fragment highlight-red">you</u> need to keep track of this.</p>
</section>
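<section>
<p>A sketch of automating Idea 2, assuming we fingerprint each cell's source with a hash. It correctly spots that cell 3 was edited. However, nothing tells it that cells 4 and 5 read <tt>train</tt> and <tt>test</tt>, so it skips them. The cell sources are only compared as strings, never executed.</p>
<pre><code class="python">
import hashlib

def fingerprint(source):
    """Hash a cell's source text so we can tell whether it was edited."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

# The five cells as last run, then the edit to cell 3
old_cells = [
    'import pandas as pd',
    'df = pd.read_csv("really_big_dataset.csv")',
    'test = df.iloc[:800]\ntrain = df.iloc[800:]',
    'model = train_linear_regression(train, "target")',
    'evaluate_linear_regression(model, test, "target")',
]
new_cells = list(old_cells)
new_cells[2] = 'test = df.iloc[:500]\ntrain = df.iloc[500:]'

last_run = [fingerprint(source) for source in old_cells]
to_rerun = [i + 1 for i, source in enumerate(new_cells)
            if fingerprint(source) != last_run[i]]
print(to_rerun)  # [3]: cells 4 and 5 are wrongly skipped
</code></pre>
</section>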
</section>
<section>
<section>
<p>Idea 3: Pull out your CSE 443 Textbooks</p>
</section>
<section>
<svg data-src="graphics/2022-04-02/data-flow-simple.svg" height="500px" />
</section>
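<section>
<p>A minimal sketch of the data flow idea, assuming each cell's read and write sets are already known. Here they are written by hand for the five cells from the earlier example; a real system would have to infer them. The sketch walks forward from the edited cell and re-runs any cell that reads a variable whose value may have changed.</p>
<pre><code class="python">
# Hypothetical read/write sets for the five cells shown earlier
cells = {
    1: {"reads": set(), "writes": {"pd"}},
    2: {"reads": {"pd"}, "writes": {"df"}},
    3: {"reads": {"df"}, "writes": {"test", "train"}},
    4: {"reads": {"train"}, "writes": {"model"}},
    5: {"reads": {"model", "test"}, "writes": set()},
}

def cells_to_rerun(cells, changed):
    """Cells downstream of `changed` in the data flow graph, in notebook order."""
    dirty = set(cells[changed]["writes"])  # variables whose values may change
    rerun = []
    for cell_id in sorted(c for c in cells if c > changed):
        if not cells[cell_id]["reads"].isdisjoint(dirty):
            rerun.append(cell_id)
            dirty.update(cells[cell_id]["writes"])  # its outputs change too
    return rerun

print(cells_to_rerun(cells, 3))  # [4, 5]
</code></pre>
</section>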
<section>
<h2>Data Flow Graph</h2>
<p>Cell 3 changed, so re-evaluate only cells 4 and 5</p>
<p style="margin-top: 100px; font-weight: bold;" class="fragment">... but</p>
</section>
<section>
<h2>...</h2>
<pre><code class="python">
model = train_linear_regression(train, "target")
</code></pre>
<pre><code class="python">
evaluate_linear_regression(model, test, "target")
</code></pre>
<pre class="fragment"><code class="python">
df = pd.read_csv("another_really_big_dataset.csv")
test = df.iloc[:500]
train = df.iloc[500:]
</code></pre>
<p style="margin-top: 100px; font-weight: bold;" class="fragment"><tt>df</tt> has changed!</p>
</section>
<section>
<p>We want to "snapshot" <tt>df</tt> in between cells.</p>
</section>
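<section>
<p>A sketch of snapshotting inside a single shared kernel, using hypothetical toy cells: plain lists stand in for DataFrames and <tt>sum</tt> stands in for training. Copying every variable after every cell lets us re-run the model cell against the <tt>train</tt> it originally saw, even after a later cell rebinds <tt>df</tt>. Deep-copying everything after every cell would be costly for really big datasets; it is only meant to illustrate the idea.</p>
<pre><code class="python">
import copy

cells = [
    "df = [1, 2, 3, 4, 5, 6]",
    "test = df[:2]; train = df[2:]",
    "model = sum(train)",
    "df = [10, 20, 30]; test = df[:2]; train = df[2:]",  # late redefinition
]

namespace = {}   # one shared, long-running kernel
snapshots = []   # snapshots[i]: copies of every variable after cells[i] ran
for source in cells:
    exec(source, namespace)
    snapshots.append({name: copy.deepcopy(value)
                      for name, value in namespace.items()
                      if name != "__builtins__"})

def rerun(i):
    """Re-run cells[i] against the state as it was just before cells[i]."""
    state = dict(snapshots[i - 1]) if i > 0 else {}
    exec(cells[i], state)
    return state

print(rerun(2)["model"])        # 18: uses the train produced by cells[1]
print(sum(namespace["train"]))  # 30: what the shared kernel would use
</code></pre>
</section>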
<section>
<svg data-src="graphics/2022-04-02/monokernel.svg" height="350px" />
</section>
<section>
<svg data-src="graphics/2022-04-02/microkernel.svg" height="350px" />
</section>
<section>
<p>The kernel runs, snapshots its variables, and quits.</p>
</section>
<section>
<svg data-src="graphics/2022-04-02/microkernel-invalidate.svg" height="500px" />
</section>
<section>
<h2>Microkernel Notebooks</h2>
<ul>
<li>Lots of small "micro-kernels"</li>
<li>Explicit inter-cell messaging</li>
<li>Messages are snapshotted for re-use</li>
</ul>
</section>
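<section>
<p>A minimal sketch of these three ideas, assuming each cell declares its inputs and outputs by name. Every cell gets a fresh namespace, reads its inputs from pickled snapshots, and writes its outputs back as pickles. The <tt>run_cell</tt> helper and the artifact store are hypothetical. The store is keyed only by variable name; a real system would presumably also version each artifact by the cell that produced it.</p>
<pre><code class="python">
import pickle

artifacts = {}   # variable name -> pickled bytes (the snapshot store)

def run_cell(source, inputs, outputs):
    # A fresh "micro-kernel": only the declared inputs are visible
    namespace = {name: pickle.loads(artifacts[name]) for name in inputs}
    exec(source, namespace)
    # Explicit messages out: snapshot only the declared outputs
    for name in outputs:
        artifacts[name] = pickle.dumps(namespace[name])
    # The namespace is discarded here, so nothing leaks into the next cell

run_cell("df = list(range(10))", inputs=[], outputs=["df"])
run_cell("test = df[:5]; train = df[5:]", inputs=["df"], outputs=["test", "train"])
run_cell("model = sum(train) / len(train)", inputs=["train"], outputs=["model"])
print(pickle.loads(artifacts["model"]))  # 7.0
</code></pre>
</section>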
<section>
<svg data-src="graphics/2022-04-02/microkernel-multiarch.svg" height="500px" />
</section>
<section>
<svg data-src="graphics/2022-04-02/microkernel-parallelism.svg" height="500px" />
</section>
<section>
<svg data-src="graphics/2022-04-02/microkernel-parallelism-2.svg" height="500px" />
</section>
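<section>
<p>A sketch of the scheduling side of parallelism, using a hypothetical five-cell dependency graph. Cells whose inputs are all available form a "wave". Cells within one wave do not depend on each other, so they could run concurrently in separate micro-kernels.</p>
<pre><code class="python">
# Hypothetical cell -> set of cells it depends on
deps = {
    "load": set(),
    "clean": {"load"},
    "plot": {"clean"},
    "train": {"clean"},
    "evaluate": {"train"},
}

def waves(deps):
    """Group cells into waves; each wave depends only on earlier waves."""
    done, result = set(), []
    while len(done) != len(deps):
        ready = sorted(c for c in deps
                       if c not in done and deps[c].issubset(done))
        if not ready:
            raise ValueError("cycle in cell dependencies")
        result.append(ready)
        done.update(ready)
    return result

print(waves(deps))  # [['load'], ['clean'], ['plot', 'train'], ['evaluate']]
</code></pre>
</section>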
<section>
<h2>Demo</h2>
<a href="https://vizierdb.info">
<img src="graphics/logos/vizier-blue.svg" height="200px" />
</a>
</section>
<section>
<a href="https://vizierdb.info">
<img src="graphics/logos/vizier-blue.svg" height="200px" />
</a>
<p><a href="https://vizierdb.info">https://vizierdb.info</a></p>
<p><a href="https://github.com/VizierDB/vizier-scala">https://github.com/VizierDB/vizier-scala</a></p>
</section>
</section>


@ -31,6 +31,14 @@ schedule:
talks:
- speaker: Oliver Kennedy
topic: "How to <s>Write</s> Review a Paper"
materials:
talk: /talks/2022-09-20-501-HowToReview.html
- speaker: Oliver Kennedy
topic: "Microkernel Notebooks"
materials:
talk: /talks/2022-09-20-501-Vizier.html
- speaker: Erdem Sariyuce
topic: "TBD"
- date: 09/27/22
talks:
- speaker: Zhuoyue Zhao