Website/slides/talks/2020-2-WorkingWithCSE/index.html
2020-02-18 10:13:02 -05:00

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>How to Start Collaborating with CSE to Solve Global Health Problems</title>
<meta name="description" content="How to Start Collaborating with CSE to Solve Global Health Problems">
<meta name="author" content="Oliver Kennedy">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="../../reveal.js-3.7.0/css/reveal.css">
<link rel="stylesheet" href="ubodin.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="../../reveal.js-3.7.0/lib/css/zenburn.css">
<style type="text/css">
.reveal .slides section .fragment.growbig {
opacity: 1;
visibility: inherit; }
.reveal .slides section .fragment.growbig.visible {
-webkit-transform: scale(7);
transform: scale(7); }
</style>
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? '../../reveal.js-3.7.0/css/print/pdf.css' : '../../reveal.js-3.7.0/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="../reveal.js-3.5.0/lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<div class="header">
<!-- Any Talk-Specific Header Content Goes Here -->
<center>
<a href="http://www.buffalo.edu" target="_blank">
<img src="../graphics/logos/ub-1line-ro-white.png" height="20"/>
</a>
</center>
</div>
<div class="footer">
<!-- Any Talk-Specific Footer Content Goes Here -->
<div style="float: left; margin-top: 15px; ">
Exploring <u><b>O</b></u>nline <u><b>D</b></u>ata <u><b>In</b></u>teractions
</div>
<a href="https://odin.cse.buffalo.edu" target="_blank">
<img src="../graphics/logos/odin-1line-white.png" height="40" style="float: right;"/>
</a>
</div>
<div class="slides">
<!-- Any section element inside of this container is displayed as a slide -->
<section>
<h3>How to Start Collaborating with CSE to Solve Global Health Problems</h3>
<h4>Oliver Kennedy</h4>
</section>
<section>
<ol>
<li>CSE: Research vs Implementation?</li>
<li>Research Highlights.</li>
<li>Behind the Buzzwords.</li>
<li>CSE Resources for Your Benefit.</li>
</ol>
</section>
<section>
<!--
<section>
<p style="font-family: serif">Computer science is not about machines, in the same way that astronomy is not about telescopes.</p>
<p>Michael R. Fellows (often attributed to Edsger Dijkstra)</p>
</section>
-->
<section>
<h3>Computer Science &amp; Engineering</h3>
<ul>
<li>How hard is a particular type of problem to solve?</li>
<li>Can a particular solution be scaled to bigger problems?</li>
<li>How do I solve problems that fit a particular pattern?</li>
</ul>
</section>
<section>
<h3>Abstraction</h3>
<p>CSE research is about creating <u><b>general</b></u> solutions</p>
<p class="fragment">(motivated by specific problems)</p>
</section>
<section>
<h3>CSE Publication</h3>
<p>How does the new solution differ? Is it...</p>
<ul>
<li>... more general?</li>
<li>... more efficient/reliable?</li>
<li>... more scalable?</li>
<li>... easier to use?</li>
</ul>
</section>
<section>
<p>Do you need <u>research</u> or <u>implementation</u>?</p>
<p class="fragment" style="margin-top: 50px;">UB-CSE can help with both, <br/>but each takes a different approach.</p>
<p class="fragment">Have clear, well-defined parameters for the problem.</p>
</section>
<section>
<h3>Typical Implementation Problems</h3>
<ul>
<li>Organizing data into a database.</li>
<li>A mobile app to display information.</li>
<li>Making an R script run faster.</li>
</ul>
</section>
<section>
<h3>Implementation Resources</h3>
<p>CSE-611</p>
<p>Undergraduate Research</p>
<p>Invenst</p>
</section>
<section>
<h3>But...</h3>
<p class="fragment">Implementation often inspires research topics.<br/>(Ailamaki's 7-month rule)</p>
<p class="fragment">Implementation may synergize with existing research.<br/>(e.g., motivating use cases)</p>
<p class="fragment">(so talking to a friendly CSE faculty member can still be useful)</p>
</section>
<!--
<section>
<h3>Automating Repetitive Tasks</h3>
<p>It's more likely to be research when...</p>
<ul>
<li>... the task requires at least some human intuition.</li>
<li>... the task is "hard" for computers (images, video, audio, prose).</li>
<li>... the task requires a lot (10+ TB) of data.</li>
<li>... the task is hard to describe precisely (outlier detection).</li>
</ul>
</section>
<section>
<h3>Solving Bigger Problems Faster</h3>
<p>It's more likely to be research when...</p>
<ul>
<li>... the task requires a lot (10+ TB) of data.</li>
<li>... the task has extremely tight time constraints.</li>
<li>... your code works well for smaller problems (fewer variables, less data)</li>
<li>... the task is naturally computationally complex.</li>
</ul>
</section>
<section>
<h3>Solved Problems on Different Tech.</h3>
<p>It's more likely to be research when...</p>
<ul>
<li>... the new technology is resource constrained (battery, cpu, memory).</li>
<li>... the new technology uses a different interface.</li>
<li>... the new technology violates assumptions made by existing solutions.</li>
<li>... the new technology makes something an existing solution does easier.</li>
</ul>
</section>
-->
</section>
<section>
<section>
<h3>Research Highlights</h3>
<ul>
<li>Reproducible Datasets (Oliver Kennedy)</li>
<li>Wireless Sensor Networks (Chang Wen Chen)</li>
</ul>
</section>
<section>
<h2>Reproducible Datasets</h2>
<h3>Oliver Kennedy</h3>
</section>
<section>
<h3>Data Errors Suck</h3>
<img src="images/data_error.png">
<attribution><a href="https://xkcd.com/2239/">https://xkcd.com/2239/</a></attribution>
</section>
<section>
<p>
<span class="fragment">
<img src="images/female-computer-user.svg" height="70px" style="vertical-align: middle;"/>
<span style="vertical-align: middle; padding-left: 70px; padding-right: 70px"></span>
</span>
<img src="images/db.svg" height="70px" style="vertical-align: middle;"/>
<span style="vertical-align: middle; padding-left: 70px; padding-right: 70px"></span>
<img src="images/male-computer-user.png" height="70px" style="vertical-align: middle;"/>
</p>
<p class="fragment">
<span style="margin-right: 250px; vertical-align: middle;"></span>
<span style="margin-left: 250px; vertical-align: middle;"></span>
<br/>
<span style="margin-right: 100px; vertical-align: middle;">Assumption</span>
<span style="font-size: 300%; vertical-align: middle;" class="fragment"></span>
<span style="margin-left: 100px; vertical-align: middle;">Assumption</span>
</p>
<attribution>freesvg.org</attribution>
</section>
<section>
<h3>Assumptions?</h3>
<ol style="font-size: 70%">
<li>"This outlier is actually a data error"</li>
<li>"There will always be six values in this column"</li>
<li>"The correct fix is to delete erroneous records"</li>
<li>"Unparseable values should be treated as NULL"</li>
<li>"Nobody will analyze this portion of the dataset"</li>
<li>"These subjective field observations are correct"</li>
</ol>
<p class="fragment">Alice needs to document each and every assumption.</p>
<p class="fragment">Bob needs to understand the implications<br/>on every part of his analysis.</p>
</section>
<section>
<img src="images/montoya.jpeg" height="400px" />
<attribution>&copy; 20th Century Fox</attribution>
</section>
<section>
<h3>The Vizier Notebook</h3>
<img src="images/1.1.StagedExecution.png" height="400px">
</section>
<section>
<p>If you're using Python, R, SQL, or Jupyter, we can help...</p>
<ul>
<li>...improve your dataset documentation</li>
<li>...make your workflows more reproducible</li>
<li>...make your code faster</li>
</ul>
<p class="fragment">Talk to me afterwards!</p>
</section>
</section>
<section>
<section>
<h2>Wireless Sensor Networks</h2>
<h3>Chang Wen Chen</h3>
</section>
<section>
<h3>IoT Devices Need To Communicate</h3>
<dl>
<dt>Cellular or LoRa Networks</dt>
<dd>Each device talks to a tower <br>(reliable, but requires infrastructure)</dd>
<dt>Store and Collect</dt>
<dd>Each device stores its data until someone retrieves it <br>(no infrastructure, but requires physical visits)</dd>
<dt>Mesh Networks</dt>
<dd>Each device communicates via nearby devices <br>(bandwidth/power limited)</dd>
</dl>
</section>
<section>
<p>UB-CSE is at the forefront of wireless research</p>
</section>
<section>
<p><b>Application: </b>Monitoring excessive antibiotic discharge into the Missouri River caused by agricultural overuse and runoff</p>
<p><b>Problem: </b>A 1D "mesh" is even more limited</p>
<img src="images/changwen_1d_mesh.png">
</section>
</section>
<section>
<section>
<h2>Buzzwords </h2>
</section>
<!--
<section>
<p>
Accountants use ledgers for keeping track of accounts, inventory status, etc...
Ledgers have basic physical limitations (hard to erase pen marks, hard to insert new entries, only one person can write at a time).
These physical limitations go away when the ledger is on the computer.
Need to trust that everyone participating keeps "track changes" on.
</p>
</section>
<section>
<p>
Blockchain is a collection of techniques that enforce similar limitations in a digital setting <i>without requiring trust</i>.
If you trust whoever's running the computer to play by the rules (e.g., through personal trust, legal force, or crescent wrenches), you probably don't need a blockchain.
</p>
</section>
<section>
<p>As an aside, these techniques are extremely compute-intensive (and intentionally so). Estimates put the power use of Bitcoin alone at 30-60 TWh per year (back of the napkin: a few percent of the electricity used by all homes in the US)</p>
</section>
-->
<section>
<h3>Neural Networks / Deep Learning</h3>
<div style="font-size: 80% ">
<p class="fragment">Linear Regression</p>
<p class="fragment"><br>Spline Fitting</p>
<p class="fragment"><br>Graphical Models</p>
<p class="fragment"><br>Neural Networks</p>
</div>
<p class="fragment">$y=f(x)$ where $f$ has 100s or 1000s (or more) of degrees of freedom</p>
</section>
<section>
<dl>
<dt>The Good</dt>
<dd>• Feasible to fit very complex functions (e.g., "is this a face?")</dd>
<dd>• Minimal knowledge of problem structure required.</dd>
<dt>The Bad</dt>
<dd>• Need <b>huge</b> training data</dd>
<dd>• Very easy to overfit</dd>
<dd>• Not explainable (yet)</dd>
</dl>
</section>
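<section data-markdown>
<script type="text/template">
A minimal sketch of why extra degrees of freedom make overfitting easy. The data and the function names (`fit_line`, `fit_interpolant`, `mean_sq_error`) are made up for illustration, and a polynomial stands in for a neural network.

```python
def fit_line(xs, ys):
    """Least-squares line: a model with 2 degrees of freedom."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return lambda x: my + (sxy / sxx) * (x - mx)

def fit_interpolant(xs, ys):
    """Lagrange polynomial: one degree of freedom per training point."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

def mean_sq_error(f, xs, ys):
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical noisy measurements of a roughly linear trend.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
test_x = [0.5, 4.5, 6.0]
test_y = [2.0, 10.0, 13.0]

for name, fit in [("line", fit_line), ("interpolant", fit_interpolant)]:
    f = fit(train_x, train_y)
    print(name, mean_sq_error(f, train_x, train_y), mean_sq_error(f, test_x, test_y))
```

The interpolant matches every training point exactly, yet does far worse than the line on the held-out points.
</script>
</section>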
<!--
<section>
<dl style="font-size: 70%">
<dt>Layer</dt>
<dd>A function that regresses N variables to predict M variables (typically N=M). </dd>
<dt>Neural Network</dt>
<dd>
A stack of 2 or more layers, with each layer predicting its outputs from the (latent/hidden) variables produced by the previous layer
</dd>
<dt>Recurrent Neural Network (RNN)</dt>
<dd>A Neural Network for timeseries data (NN:RNN :: Markovian Variable:Markov Chain)</dd>
<dt>Convolutional Neural Network (CNN)</dt>
<dd>A Neural Network for image data designed to take advantage of the fact that a feature (like a face) is usually size, position, and/or rotationally invariant (you can make it bigger, move it around, or rotate it and it's still a face).</dd>
<dt>Generative Adversarial Network (GAN)</dt>
<dd>Black wizardry that can generate an image/audio in one "style" while retaining the key features of another. (e.g., Transform a video of a horse into one of a zebra)</dd>
</dl>
</section>
<section>
<h3>Cloud Computing</h3>
<p>
Rent a server from Amazon, Microsoft, or Google (or UB).
Basically, pay someone to do the managerial work of running/security/etc...
Also benefit by not having infrastructure (e.g., access to 100s of servers for precisely as long as it takes you to get your answer)
</p>
</section>
-->
<section>
<h3>Differential Privacy</h3>
<p>
A way to mathematically bound how much personally identifiable information (PII) could leak when an aggregate dataset is released.
</p>
<p class="fragment">
"Can I create a statistically significant effect on the dataset by removing N individuals?"
</p>
</section>
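<section data-markdown>
<script type="text/template">
One common way to get such a guarantee is the Laplace mechanism: add calibrated random noise to each released aggregate. The sketch below is illustrative only. The names `private_count` and `laplace_noise`, the sample data, and the choice of `epsilon` are all assumptions, not part of any specific tool.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one individual changes a count by at most 1,
    # so noise with scale 1/epsilon masks any single person's presence.
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: ages of six study participants.
ages = [34, 51, 29, 62, 45, 58]
print(private_count(ages, lambda age: age >= 50, epsilon=0.5))
```

A smaller `epsilon` means more noise: stronger privacy, but a less accurate count.
</script>
</section>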
<section>
<h3>Blockchain</h3>
<img src="images/blockchain.png">
<attribution><a href="https://xkcd.com/2267/">https://xkcd.com/2267/</a></attribution>
</section>
<section>
<h3>Oliver's Blockchain PSA</h3>
<ul>
<li>It is statistically unlikely that you need a blockchain.<br/>
<span class="fragment" style="font-size: 70%">(I've yet to see a use case apart from cryptocurrency that needs a blockchain)</span></li>
<li>Proof of work is a huge power sink.<br>
<span class="fragment" style="font-size: 70%">(Bitcoin alone is <a href="https://digiconomist.net/bitcoin-energy-consumption">estimated</a> to use 77 TWh this year, roughly the power consumption of Chile)</span></li>
</ul>
<p class="fragment">Please consult a Doctor (of Philosophy in CS)<br> before starting a blockchain project</p>
</section>
<!--
<section>
<h3>Internet of Things</h3>
<p>
Cellphones are cheap. Cellular plans are cheap. Put cellphone radios (modems) in everything.
Cellphone + Sensors = tons of data about anything and everything that you want to measure.
</p>
</section>
-->
</section>
<section>
<section>
<h3>Resources</h3>
<ul style="font-size: 80%">
<li>CSE 611: <a href="https://invenst.cse.buffalo.edu/viewideabank.php">https://invenst.cse.buffalo.edu/viewideabank.php</a></li>
<li>Undergraduates: Any CSE Faculty</li>
<li>Invenst: Alan Hunt</li>
<li>Centers <ul>
<li>CUBS/CEDAR (image/video data)</li>
<li>CARA (structured data)</li>
<li>CMIF (multisource fusion)</li>
</ul></li>
</ul>
</section>
<section>
<h3>NSF CS+X Programs</h3>
<ul>
<li><b>CSSI</b>: Cyberinfrastructure for Sustained Scientific Innovation</li>
<li><b>SCH</b>: Smart and Connected Health</li>
<li><b>SCC</b>: Smart and Connected Communities</li>
</ul>
</section>
</section>
</div></div>
<script src="../reveal.js-3.5.0/lib/js/head.min.js"></script>
<script src="../reveal.js-3.5.0/js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: false,
progress: true,
history: true,
center: true,
slideNumber: true,
transition: 'fade', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: '../../reveal.js-3.7.0/plugin/svginline/data-src-svg.js' },
{ src: '../reveal.js-3.5.0/lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: '../reveal.js-3.5.0/plugin/math/math.js',
condition: function() { return true; },
mathjax: '../reveal.js-3.5.0/js/MathJax.js'
},
{ src: '../reveal.js-3.5.0/plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: '../reveal.js-3.5.0/plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
//{ src: '../reveal.js-3.5.0/plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'tt code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../../reveal.js-3.7.0/plugin/highlight/highlight-9.16.2.js', async: true,
callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: '../reveal.js-3.5.0/plugin/zoom-js/zoom.js', async: true },
{ src: '../reveal.js-3.5.0/plugin/notes/notes.js', async: true }
]
});
</script>
<script>document.write('<script src="http://' + (location.host || 'localhost').split(':')[0] + ':35729/livereload.js?snipver=1"></' + 'script>')</script>
</body>
</html>