slides

@@ -1,5 +1,24 @@
[
  { "title" : "Collaborative Research: III: MEDIUM: U4U - Taming Uncertainty with Uncertainty-Annotated Databases",
    "agency" : "NSF: CISE: IIS: MEDIUM",
    "role" : "Co-PI",
    "amount" : 663996,
    "effort" : "100%",
    "status" : "submitted",
    "start" : "08/15/2020", "end" : "08/14/2024",
    "type" : "grant",
    "commitment" : { "summer" : 0.5 },
    "projects" : ["mimir", "vizier"],
    "copis" : [
      "Atri Rudra"
    ],
    "collaborative" : [
      { "institution" : "Illinois Inst. Tech.",
        "pis" : ["Boris Glavic"],
        "amount" : 535014
      }
    ]
  },
  { "title" : "SCC-IRG Track 1: A Sustainable and Connected Community-Scale Food System to Empower Consumers, Farmers, and Retailers in Buffalo, NY",
    "agency" : "NSF: CISE: CNS: CSR",
    "role" : "Co-I",
@@ -88,7 +88,7 @@ After taking the course, students should be able to:

* **Sep 17** - Program Slicing ([reading 1](https://cse.buffalo.edu/LRG/CSE705/Papers/Weiser-Static-Slicing.pdf) | [reading 2](http://sites.computer.org/debull/A07dec/cheney.pdf))
* **Sep 19** - Learned Index Structures ([slides](slide/2019-09-18-LearnedIndexStructures.pdf))
* **Sep 24** - Program Slicing ([slides](slide/2019-09-24-ProgramSlicing2.html))
* **Oct 1** - Interpretable Deep Learning ([reading](https://ieeexplore-ieee-org.gate.lib.buffalo.edu/abstract/document/8022871))
* **Oct 1** - Interpretable Deep Learning ([reading 1](https://ieeexplore-ieee-org.gate.lib.buffalo.edu/abstract/document/8022871) | [reading 2](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?Id=2939778))

---

src/teaching/cse-662/2019fa/slide/2019-10-01-ProvenanceInNNs.erb (new file, 451 lines)
@@ -0,0 +1,451 @@
---
template: templates/cse662_2019_slides.erb
title: Explaining Machine Learning
date: October 1
---

<section>
<section>
<h3>Machine Learning</h3>
<p>Given a set of sample points: discover their defining function</p>
</section>

<section>
<h3>Classical Regression</h3>
<p>Given $\{(x_1,y_1), \ldots, (x_K,y_K)\}$, find $f(x) = y$</p>
<ul>
<li class="fragment">Linear (Fit $m\cdot x + b$)</li>
<li class="fragment">Polynomial (Fit $m_0\cdot x^0+m_1\cdot x^1+\ldots+m_N\cdot x^N$)</li>
<li class="fragment">Logarithmic (Fit $a\cdot \log(x+b) + c$)</li>
<li class="fragment">Multidimensional Variants (etc...)</li>
</ul>
<p class="fragment">Fit data to a predefined equation / family of equations.</p>
</section>
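As a concrete instance of fitting a predefined family, a minimal least-squares linear fit (the sample points are invented for illustration and happen to lie exactly on $y = 2x + 1$):

```python
import numpy as np

# Hypothetical sample points drawn from y = 2x + 1 (chosen for illustration).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0

# Linear regression: fit f(x) = m*x + b by least squares.
m, b = np.polyfit(xs, ys, deg=1)
```

Because the points lie exactly on the line, the fit recovers $m = 2$, $b = 1$; the polynomial and logarithmic cases differ only in the family of basis functions.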

<section>
<h3>Limitations</h3>
<ul>
<li>Need to select equation(s) to fit upfront.</li>
<li>Difficult to cope with discontinuities.</li>
<li>Simple equations have very few parameters.</li>
</ul>
</section>

<section>
<h3>Neural Networks</h3>
<img style="float: left" width="300px" src="graphics/2019-10-01-BasicNN.svg" />

$$z = RELU(v_{z,1}y_1 + \ldots + v_{z,7}y_7)$$
$$y_i = RELU(v_{y,i,1}x_1 + \ldots + v_{y,i,7}x_7)$$
$$x_i = RELU(v_{x,i,1}w_1 + \ldots + v_{x,i,10}w_{10})$$

<p class="fragment">$RELU(x) = \max(0, x)$: a non-linearity that introduces discontinuities</p>
<p class="fragment">One big mega-function to fit to the data</p>
</section>
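The "mega-function" view can be sketched directly; a toy forward pass in the spirit of the equations above (the layer sizes and weight values are invented, not taken from the slide's figure):

```python
def relu(v):
    # RELU(v) = max(0, v): the non-linearity applied after each weighted sum.
    return max(0.0, v)

def layer(weights, inputs):
    # One dense layer: each output is RELU of a weighted sum of the inputs.
    return [relu(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Hypothetical 2-input, 2-hidden-unit, 1-output network.
W_hidden = [[1.0, -1.0], [0.5, 0.5]]
W_out = [[1.0, 2.0]]

def forward(w):
    x = layer(W_hidden, w)
    return layer(W_out, x)[0]

z = forward([1.0, 2.0])
```

Training fits all the $v$ weights at once, rather than the handful of parameters in a classical regression family.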

<section>
<h3>More complex flavors</h3>
<dl>
<div class="fragment">
<dt>Convolutional NNs</dt>
<dd>Train pattern-matching kernels for images.</dd>
</div>
<div class="fragment">
<dt>Recurrent NNs</dt>
<dd>Feed the output of the NN back into itself for time-series data.</dd>
</div>
</dl>

<p class="fragment"><b>In general: </b>Pre-specify the layering structure as a NN workflow.</p>
</section>
</section>

<section>
<section>
<h3>Today</h3>
<p>Not looking at how to train a NN!</p>
<p class="fragment">Why are NNs dangerous?</p>
</section>

<section>
<h3>NN Use Cases</h3>
<ul>
<li class="fragment">Sentencing / parole decisions</li>
<li class="fragment">Loan decisions</li>
<li class="fragment">Housing / employment opportunities</li>
</ul>

<p class="fragment">Why does the model make the choices it does?</p>
</section>

<section>
<h3>Why is "why" important?</h3>
<dl>
<div class="fragment">
<dt>Legal reasons</dt>
<dd>Protected classes, or variables that can act as proxies for them (redlining).</dd>
</div>
<div class="fragment">
<dt>Training-only signals</dt>
<dd>Image quality (tanks), coincidental features (dogs).</dd>
</div>
<div class="fragment">
<dt>Debugging</dt>
<dd>Anticipate and react to unusual failure modes.</dd>
</div>
<div class="fragment">
<dt>Overfitting</dt>
<dd>Is the model just "memorizing" the training data?</dd>
</div>
</dl>
</section>

<section>
<h3>Contrast with...</h3>
<dl>
<dt class="fragment">Classical Regression</dt>
<dd class="fragment">The formula is the formula!</dd>

<dt class="fragment">Graphical Models</dt>
<dd class="fragment">The model structure clearly connects related variables.</dd>

<dt class="fragment">Decision Trees</dt>
<dd class="fragment">Everything is phrased as yes/no questions.</dd>

<dt class="fragment">Neural Networks</dt>
<dd class="fragment">🤷</dd>
</dl>
</section>
</section>

<section>
<section>
<h3>Understanding NNs</h3>
<p>Today's discussion:</p>
<dl>
<dt>Slicing?</dt>
<dd>Not practical: too many interdependencies.</dd>
<dt>Sensitivity Analysis</dt>
<dd>Figure out which "features" are the most relevant.</dd>
<dt>Direct Debugging</dt>
<dd>Help the user discover patterns in the neurons.</dd>
</dl>
</section>

<section>
<h3>Understanding NNs</h3>
<p><b>Observation 1:</b> Describing the <i>entire</i> model concisely is hard.</p>
<p><b>Observation 2:</b> Describing the model's behavior on a single input is easier.</p>
<p><b>Contrast with dynamic slicing.</b></p>
</section>
</section>

<section>
<section>
<h3>Sensitivity Analysis</h3>

<p>Given a target point, figure out which of the point's features are <i>most</i> responsible for the classification.</p>
</section>

<section>
<p><b>“Why Should I Trust You?” Explaining the Predictions of Any Classifier</b></p>
<p>Ribeiro, Singh, Guestrin</p>
</section>

<section>
<h3>Model</h3>
$$f : \mathbb R^d \rightarrow \mathbb R$$
</section>

<section>
<h3>Model</h3>
<p>Arguments to $f$ are input <i>features</i>. For example:</p>

<dl>
<dt>Probability that an email is spam</dt>
<dd>Each known word is a binary (0-1) feature.</dd>
<dt>0-1 confidence that a greyscale image contains a "3"</dt>
<dd>The brightness of each pixel is a feature.</dd>
<dt>Chance of default on a potential loan</dt>
<dd>Customer characteristics (income, spending, education).</dd>
</dl>

<p><b>Note:</b> The source model doesn't have to be an NN.</p>
</section>

<section>
<h2>Abstractions</h2>
</section>

<section>
<p>Focusing on <u>similar inputs</u>, learn an <u>explainable</u> model $g$.</p>

<p>Define similarity by a distance function:</p>
$$\pi_x : \mathbb R^d \times \mathbb R^d \rightarrow [0,1]$$
</section>

<section>
<h3>Desiderata</h3>

<ul>
<li>$g$ should be simple (complexity function $\Omega(g)$).</li>
<li>$g$ should be similar to $f$ on values similar to $x$ (error function $\mathcal L(f, g, \pi_x)$).</li>
</ul>
<p><b>Overall Goal: </b> Find a $g$ that minimizes $\mathcal L(f, g, \pi_x) + \Omega(g)$</p>
</section>

<section>
<h3>Simpler Model</h3>
$$g : \{0,1\}^d \rightarrow \mathbb R$$
$$g(x') = w_1\cdot x'_1 + \ldots + w_d\cdot x'_d$$
<p class="fragment">Boolean- (not real-)valued features.</p>
</section>

<section>
<h3>Simplification: Thresholding</h3>
$$x'_i = \begin{cases} 1 & \textbf{if } x_i \geq T \\ 0 & \textbf{otherwise} \end{cases}$$
</section>
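The thresholding step is a one-liner; a minimal sketch (the feature values and the threshold $T$ are assumed for illustration):

```python
def binarize(x, T):
    # x'_i = 1 if x_i >= T, else 0: turns real-valued features into booleans.
    return [1 if v >= T else 0 for v in x]

x_prime = binarize([0.2, 0.7, 0.5], T=0.5)
```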

<section>
<h3>Sampling Around $x$</h3>
<ol>
<li class="fragment">Pick a number of features $\# \sim Uniform(1, |\{x'_i \neq 0\}|)$</li>
<li class="fragment">Randomly select features $F \subseteq \{i \;|\; x'_i \neq 0\}$ s.t. $|F| = \#$</li>
<li class="fragment">Pick $z$ (resp., $z'$) s.t. $z_i = \begin{cases} x_i & \textbf{if } i \in F \\ \textit{random} & \textbf{otherwise}\end{cases}$</li>
<li class="fragment">Repeat $K$ times to get a collection of samples $(z,z') \in \mathcal Z$</li>
</ol>
</section>
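The steps above can be sketched for the binary-feature case, where $z = z'$ and "random otherwise" reduces to zeroing out the dropped features; the helper name and sample vector are mine, not from the paper:

```python
import random

def sample_around(x_prime, K, rng):
    # Non-zero features of x' are the candidates to keep (steps 1-2);
    # every feature outside the kept set F is zeroed out (step 3).
    nonzero = [i for i, v in enumerate(x_prime) if v != 0]
    samples = []
    for _ in range(K):                       # step 4: repeat K times
        n = rng.randint(1, len(nonzero))     # step 1: how many features to keep
        F = set(rng.sample(nonzero, n))      # step 2: which features to keep
        samples.append([v if i in F else 0 for i, v in enumerate(x_prime)])
    return samples

Z = sample_around([1, 0, 1, 1, 0, 1], K=5, rng=random.Random(0))
```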

<section>
<h3>Simple Error Function</h3>
$$\mathcal L(f,g,\pi_x) = \sum_{(z,z')\in \mathcal Z} \pi_x(z) (f(z) - g(z'))^2$$
<p>Pick a $g$ (resp., $\{w_i\}$) that minimizes this!</p>
</section>
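Minimizing $\mathcal L$ for the linear $g$ is a weighted least-squares problem; a minimal sketch (the sample matrix, black-box outputs, and uniform $\pi_x$ weights are assumptions for illustration):

```python
import numpy as np

def fit_explainer(Z, fz, pi):
    # Minimize sum_z pi(z) * (f(z) - w . z)^2.
    # Scaling each row and target by sqrt(pi(z)) reduces the weighted
    # problem to ordinary least squares, solved here with lstsq.
    Z = np.asarray(Z, dtype=float)
    s = np.sqrt(np.asarray(pi, dtype=float))
    w, *_ = np.linalg.lstsq(Z * s[:, None], np.asarray(fz) * s, rcond=None)
    return w

# Hypothetical black-box outputs that in fact depend only on feature 0.
Z = [[1, 0], [0, 1], [1, 1], [0, 0]]
fz = [1.0, 0.0, 1.0, 0.0]
w = fit_explainer(Z, fz, pi=[1.0, 1.0, 1.0, 1.0])
```

The recovered weights put all the mass on feature 0, which is exactly the "most responsible features" output the next slides visualize.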

<section>
<h3>Example</h3>
<h4>Classifying Email Subjects</h4>

<p>Each feature is a word: '1' if the word is present, '0' if not. <br/> ($x = x'$)</p>

<p><b>Complexity Function: </b> (at most $K$ features)
$\Omega(g) = \begin{cases} \infty & \textbf{if }|\{w_i \neq 0\}| > K \\ 0 & \textbf{otherwise}\end{cases}$
</p>

<p><b>Simplified model</b>: Find the $K$ features <i>most</i> responsible for differentiating the target class from the rest.</p>
</section>

<section>
<img src="graphics/2019-10-01-Weights.svg" height="400px"/>
<attribution>“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al.)</attribution>
</section>

<section>
<h3>Example</h3>
<h4>Interpreting Image Classification</h4>

<p>Each feature is a "superpixel" (a contiguous region of similarly colored pixels): $x_i$ is the color of the superpixel; $x'_i$ is 1 if the superpixel is identical to the original.</p>

<p><b>Complexity Function: </b> (at most $K$ superpixels)
"Lasso" (as in the lasso tool) pixels together into a contiguous region of no more than $K$ superpixels.
</p>

<p><b>Simplified model</b>: Find the $K$ superpixels <i>most</i> responsible for differentiating the target class from the rest.</p>
</section>

<section>
<img src="graphics/2019-10-01-Images.svg" height="400px"/>
<attribution>“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al.)</attribution>
</section>

<section>
<h3>Summary</h3>
<ul>
<li>Generate a set of "similar" samples.</li>
<li>See how those samples behave on the model.</li>
<li>Train a simpler model $g$ to pick the most relevant features.</li>
</ul>

<p>If the input has the features identified by $g$, it is probably of the target class.</p>
</section>

</section>

<section>
<section>
<p><b>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models</b></p>
<p>Kahng, Andrews, Kalro, Chau</p>
</section>

<section>
<h3>Shortcomings</h3>
<ul>
<li>Tells <i>what</i> the network is doing, but not why.</li>
<li>Of limited use for understanding the inner layers of an NN.</li>
<li>Too automated: hard to test hypotheses.</li>
</ul>
<p><b>Solution: </b> Let users see the nodes themselves.</p>
</section>

<section>
<img src="graphics/2019-10-01-WeightedNN.svg" height="400px"/>
</section>

<section>
<h3>Desiderata</h3>
<dl>
<dt>Can't see what the network is doing</dt>
<dd>Let users <i>compare</i> neuron activations.</dd>

<dt>Can't test hypotheses</dt>
<dd>Let users <i>pick</i> comparison points.</dd>
</dl>
</section>

<section>
<h3>Exploring comparison points</h3>
<ul>
<li>Compare by class / correctness</li>
<li>Compare by activation</li>
<li>Manually select classes</li>
</ul>
</section>

<section>
<h3>Select by Class</h3>
<img src="graphics/2019-10-01-SelectExampleByClass.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>

<section>
<h3>Select by Activation</h3>
<div style="vertical-align: middle">
<img style="vertical-align: middle" src="graphics/2019-10-01-WeightedNN.svg" height="400px"/>
→<img style="vertical-align: middle" src="graphics/2019-10-01-SelectExampleByActivation.svg" height="400px" />
</div>
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>

<section>
<h3>Select Manually</h3>
<img src="graphics/2019-10-01-SelectExampleManually.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>

<section>
<h3>Compare Neurons</h3>
<img src="graphics/2019-10-01-Compare.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
</section>

<section>
<section>
<h3>Compare to Provenance/Slicing</h3>

<ul class="fragment">
<li>Everything contributes (but to varying degrees).</li>
<li>Minimal intuition for what individual "statements" do.</li>
<li>The "program" is more regular: easier to find patterns.</li>
</ul>
</section>
</section>

<!--

- Light background on NNs
  - Forward Propagation
    - Matrix Multiplication
    - Activation Function
- Motivation:
  - What
    - NNs used in sentencing (bias)
    - The wolf/husky example
  - Why
    - Overestimating model accuracy
    - Signal in training data that is not present "in the wild"
    - Overfitting
- Central Challenge
  - Why does a neural network make the choices it does?
  - When the NN breaks, how do we fix it?
  - What does a neuron represent?
  - Why is the represented feature relevant / how does it help classification?
- Complexity Tradeoff
  - Linear/Polynomial Regression: Fully explainable!
  - Bayesian Models: Fully explainable!
  - Decision Trees: Usually explainable (at least at the top)
  - SVMs: Visualizable
  - Neural Networks: Lol
  - Word Embeddings: Lol
- Classes of Summary
  - Activation weights? (too fine-grained)
    - Focus on a specific context
  - Source weights / contribution (too fine-grained)
    - Focus on a specific context
  - So what's a context?
    - What are the main reasons that the specific class is assigned to a specific input/image/document/etc.?
- Pure Black-Box-Based Techniques
  - e.g., Ribeiro: Global explanations are too complex. Focus on explaining a specific case.
  - Steps
    - Treat the model as a black box: features in / class out
    - Sample a set of inputs, biasing towards "nearby" inputs with features similar to the target
    - Train a linear classifier to distinguish nearby-biased inputs
    - Restrict non-zero features based on a "complexity" measure
  - Output:
    - Directly output weights (e.g., for text classification on n-grams)
    - Lasso regions with specific features (e.g., for image classification)
      - Complexity reduced for contiguous regions of similar color (superpixels)
  - Contextualizing Features
    - Define feature importance as the number of samples for which the feature is relevant
    - Pick a set of explanatory samples that cover the maximal importance (weighted set-cover)
- NN Activation Techniques
  - Similar ideas:
    - Trace the execution of a single example, or
    - Contrast the execution of a set of examples.

Trade-off between fidelity and interpretability (aggregation)
- also global vs. local fidelity

- Explain premise of NNs
- Explain visualization techniques
- Relate to Provenance / Slicing

- Exploration models
  - Instance-based
  - Subset-based

[10,26,35,38]

S. Chung, C. Park, S. Suh, K. Kang, J. Choo, and B. C. Kwon. ReVACNN: Steering convolutional neural network via real-time visual analytics. In Future of Interactive Learning Machines Workshop at the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.

M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu. Towards better analysis of deep convolutional neural networks. IEEE Transactions on Visualization and Computer Graphics, 23(1):91–100, 2017.

D. Smilkov, N. Thorat, C. Nicholson, E. Reif, F. B. Viegas, and M. Wattenberg. Embedding Projector: Interactive visualization and interpretation of embeddings. In Workshop on Interpretable Machine Learning in Complex Systems at the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.

J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding neural networks through deep visualization. In Workshop on Visualization for Deep Learning at the 33rd International Conference on Machine Learning (ICML), 2016.
-->

src/teaching/cse-662/2019fa/slide/graphics/2019-10-01-BasicNN.svg (new file)
src/teaching/cse-662/2019fa/slide/graphics/2019-10-01-Images.svg (new file)
src/teaching/cse-662/2019fa/slide/graphics/2019-10-01-WeightedNN.svg (new file)