This commit is contained in:
Oliver Kennedy 2019-09-30 23:14:24 -04:00
parent 70dd474ada
commit 32e278d826
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60
11 changed files with 3783 additions and 2 deletions


@@ -1,5 +1,24 @@
[
{ "title" : "Collaborative Research: III: MEDIUM: U4U - Taming Uncertainty with Uncertainty-Annotated Databases",
"agency" : "NSF: CISE: IIS: MEDIUM",
"role" : "Co-PI",
"amount" : 663996,
"effort" : "100%",
"status" : "submitted",
"start" : "08/15/2020", "end" : "08/14/2024",
"type" : "grant",
"commitment" : { "summer" : 0.5 },
"projects" : ["mimir", "vizier"],
"copis" : [
"Atri Rudra"
],
"collaborative" : [
{ "institution" : "Illinois Inst. Tech.",
"pis" : ["Boris Glavic"],
"amount" : 535014
}
]
},
{ "title" : "SCC-IRG Track 1: A Sustainable and Connected Community-Scale Food System to Empower Consumers, Farmers, and Retailers in Buffalo, NY",
"agency" : "NSF: CISE: CNS: CSR",
"role" : "Co-I",


@@ -88,7 +88,7 @@ After taking the course, students should be able to:
* **Sep 17** - Program Slicing ([reading 1](https://cse.buffalo.edu/LRG/CSE705/Papers/Weiser-Static-Slicing.pdf) | [reading 2](http://sites.computer.org/debull/A07dec/cheney.pdf))
* **Sep 19** - Learned Index Structures ([slides](slide/2019-09-18-LearnedIndexStructures.pdf))
* **Sep 24** - Program Slicing ([slides](slide/2019-09-24-ProgramSlicing2.html))
* **Oct 1** - Interpretable Deep Learning ([reading](https://ieeexplore-ieee-org.gate.lib.buffalo.edu/abstract/document/8022871))
* **Oct 1** - Interpretable Deep Learning ([reading 1](https://ieeexplore-ieee-org.gate.lib.buffalo.edu/abstract/document/8022871) | [reading 2](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?Id=2939778))
---


@@ -0,0 +1,451 @@
---
template: templates/cse662_2019_slides.erb
title: Explaining Machine Learning
date: October 1
---
<section>
<section>
<h3>Machine Learning</h3>
<p>Given a set of sample points: discover their defining function</p>
</section>
<section>
<h3>Classical Regression</h3>
<p>given $\{(x_1,y_1), \ldots, (x_K,y_K)\}$, find $f(x) = y$</p>
<ul>
<li class="fragment">Linear (Fit $m\cdot x + b$)</li>
<li class="fragment">Polynomial (Fit $m_0\cdot x^0+m_1\cdot x^1+\ldots+m_N\cdot x^N$)</li>
<li class="fragment">Logarithmic (Fit $a\cdot \log(x+b) + c$)</li>
<li class="fragment">Multidimensional Variants (etc...)</li>
</ul>
<p class="fragment">Fit data to a predefined equation / family of equations.</p>
</section>
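<section>
<h3>Classical Regression (sketch)</h3>
<p>A minimal sketch (not from any cited source) of fitting the first two families with numpy; the logarithmic family would need a nonlinear solver.</p>
<pre><code class="python"># Minimal sketch: least-squares fits of the families above (numpy assumed).
import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 1.0 + np.random.normal(0.0, 0.5, size=x.shape)

m, b = np.polyfit(x, y, deg=1)     # linear: fit m*x + b
coeffs = np.polyfit(x, y, deg=3)   # polynomial: fit m_3 x^3 + ... + m_0
# logarithmic a*log(x+b) + c would need e.g. scipy.optimize.curve_fit
</code></pre>
</section>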
<section>
<h3>Limitations</h3>
<ul>
<li>Need to select equation(s) to fit upfront.</li>
<li>Difficult to cope with discontinuities.</li>
<li>Simple equations have very few parameters.</li>
</ul>
</section>
<section>
<h3>Neural Networks</h3>
<img style="float: left" width="300px" src="graphics/2019-10-01-BasicNN.svg" />
$$z = RELU(v_{z,1}y_1 + \ldots + v_{z,7}y_7)$$
$$y_i = RELU(v_{y,i,1}x_1 + \ldots + v_{y,i,7}x_7)$$
$$x_i = RELU(v_{x,i,1}w_1 + \ldots + v_{x,i,10}w_{10})$$
<p class="fragment">$RELU$: Discontinuity Function</p>
<p class="fragment">One big mega-function to fit to the data</p>
</section>
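<section>
<h3>Forward Pass (sketch)</h3>
<p>A minimal sketch of the equations above, assuming numpy and randomly initialized (untrained) weights: each layer is a matrix multiply followed by $RELU$.</p>
<pre><code class="python"># Sketch of the forward pass: matrix multiply, then RELU, layer by layer.
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

rng = np.random.default_rng(0)
V_x = rng.normal(size=(7, 10))   # w (10 inputs) -> x (7 units)
V_y = rng.normal(size=(7, 7))    # x -> y
V_z = rng.normal(size=(1, 7))    # y -> z (single output)

w = rng.normal(size=10)          # one input point
x = relu(V_x @ w)
y = relu(V_y @ x)
z = relu(V_z @ y)                # the one big "mega-function" f(w)
</code></pre>
</section>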
<section>
<h3>More complex flavors</h3>
<dl>
<div class="fragment">
<dt>Convolutional NNs</dt>
<dd>Train pattern-matching kernels for images.</dd>
</div>
<div class="fragment">
<dt>Recurrent NNs</dt>
<dd>Feed output of NN back into itself for time-series data.</dd>
</div>
</dl>
<p class="fragment"><b>In general: </b>Pre-specify layering structure as a NN workflow.</p>
</section>
</section>
<section>
<section>
<h3>Today</h3>
<p>Not looking at how to train an NN!</p>
<p class="fragment">Why are NNs dangerous?</p>
</section>
<section>
<h3>NN Use Cases</h3>
<ul>
<li class="fragment">Sentencing / Parole decisions</li>
<li class="fragment">Loan decisions</li>
<li class="fragment">Housing/Employment opportunities</li>
</ul>
<p class="fragment">Why does it make the choices it does?</p>
</section>
<section>
<h3>Why is "why" important?</h3>
<dl>
<div class="fragment">
<dt>Legal reasons</dt>
<dd>Protected classes or variables that can act as proxies (redlining).</dd>
</div>
<div class="fragment">
<dt>Training-only signals</dt>
<dd>Image quality (tanks), Coincidental features (dogs).</dd>
</div>
<div class="fragment">
<dt>Debugging</dt>
<dd>Anticipate, react to unusual failure modes.</dd>
</div>
<div class="fragment">
<dt>Overfitting</dt>
<dd>Is the model just "memorizing" the training data?</dd>
</div>
</dl>
</section>
<section>
<h3>Contrast with...</h3>
<dl>
<dt class="fragment">Classical Regression</dt>
<dd class="fragment">The formula is the formula!</dd>
<dt class="fragment">Graphical Models</dt>
<dd class="fragment">Model structure clearly connects related variables.</dd>
<dt class="fragment">Decision Trees</dt>
<dd class="fragment">Everything is phrased as Yes/No questions.</dd>
<dt class="fragment">Neural Networks</dt>
<dd class="fragment">🤷</dd>
</dl>
</section>
</section>
<section>
<section>
<h3>Understanding NNs</h3>
<p>Today's discussion:</p>
<dl>
<dt>Slicing?</dt>
<dd>Not practical: Too many interdependencies.</dd>
<dt>Sensitivity Analysis</dt>
<dd>Figure out which "features" are the most relevant.</dd>
<dt>Direct Debugging</dt>
<dd>Help the user discover patterns in the neurons.</dd>
</dl>
</section>
<section>
<h3>Understanding NNs</h3>
<p></p>
<p><b>Observation 1:</b> Describing the <i>entire</i> model concisely is hard</p>
<p><b>Observation 2:</b> Describing the model on a single input is easier</p>
<p><b>Contrast with dynamic slicing</b></p>
</section>
</section>
<section>
<section>
<h3>Sensitivity Analysis</h3>
<p>Given a target point, figure out which of the point's features are <i>most</i> responsible for the classification.</p>
</section>
<section>
<p><b>“Why Should I Trust You?” Explaining the Predictions of Any Classifier</b></p>
<p>Ribeiro, Singh, Guestrin</p>
</section>
<section>
<h3>Model</h3>
$$f : \mathbb R^d \rightarrow \mathbb R$$
</section>
<section>
<h3>Model</h3>
<p>Arguments to $f$ are input <i>features</i>. For example:</p>
<dl>
<dt>Probability that an email is spam</dt>
<dd>Each known word is a binary (0-1) feature</dd>
<dt>0-1 confidence that a greyscale image contains a "3"</dt>
<dd>The brightness of each pixel is a feature.</dd>
<dt>Chance of default on a potential loan</dt>
<dd>Customer characteristics (income, spending, education)</dd>
</dl>
<p><b>Note:</b> The source model doesn't have to be an NN.</p>
</section>
<section>
<h2>Abstractions</h2>
</section>
<section>
<p>Focusing on <u>similar inputs</u>, learn an <u>explainable</u> model $g$.</p>
<p>Define similarity by a distance function: </p>
$$\pi_x : \mathbb R^d \times \mathbb R^d \rightarrow [0,1]$$
</section>
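<section>
<h3>Similarity Kernel (sketch)</h3>
<p>One common choice (roughly what the LIME paper uses) is an exponential kernel over a domain-appropriate distance; this sketch assumes numpy and Euclidean distance.</p>
<pre><code class="python"># Sketch: similarity decays exponentially with distance from x.
import numpy as np

def pi_x(x, z, sigma=1.0):
    d = np.linalg.norm(np.asarray(x) - np.asarray(z))  # D(x, z); Euclidean here
    return np.exp(-(d ** 2) / (sigma ** 2))            # in (0, 1], 1 when z == x
</code></pre>
</section>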
<section>
<h3>Desiderata</h3>
<ul>
<li>$g$ should be simple (complexity function $\Omega(g)$)</li>
<li>$g$ should be similar to $f$ on values similar to $x$ (error function $\mathcal L(f, g, \pi_x)$)</li>
</ul>
<p><b>Overall Goal: </b> Find a $g$ that minimizes $\mathcal L(f, g, \pi_x) + \Omega(g)$</p>
</section>
<section>
<h3>Simpler Model</h3>
$$g : \{0,1\}^d \rightarrow \mathbb R$$
$$g(x) = w_1\cdot x_1 + \ldots + w_d\cdot x_d$$
<p class="fragment">Boolean- (not Real-)valued features.</p>
</section>
<section>
<h3>Simplification: Thresholding</h3>
$$x'_i = \begin{cases} 1 & \textbf{if } x_i \geq T \\ 0 & \textbf{otherwise} \end{cases}$$
</section>
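<section>
<h3>Thresholding (sketch)</h3>
<p>The same simplification in one line of (assumed) numpy: real-valued features become 0/1 features.</p>
<pre><code class="python"># Sketch: boolean features x' from real-valued features x.
import numpy as np

def threshold(x, T):
    return (np.asarray(x) >= T).astype(float)   # 1 where x_i is at least T, else 0
</code></pre>
</section>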
<section>
<h3>Sampling Around $x$</h3>
<ol>
<li class="fragment">Pick a number of features $\# \sim Uniform(1, |\{x'_i \neq 0\}|)$</li>
<li class="fragment">Randomly select features $F \subseteq \{i \;|\;x'_i \neq 0\}$ s.t. $|F| = \#$</li>
<li class="fragment">Pick $z$ (resp., $z'$) s.t. $z_i = \begin{cases} x_i & \textbf{if } i \in F \\ \textit{random} & \textbf{otherwise}\end{cases}$</li>
<li class="fragment">Repeat $K$ times to get a collection of samples $(z,z') \in \mathcal Z$</li>
</ol>
</section>
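<section>
<h3>Sampling (sketch)</h3>
<p>A minimal sketch of the sampling loop above, assuming numpy; Gaussian noise stands in for the "random" values (the real perturbation is domain-specific).</p>
<pre><code class="python"># Sketch: build K perturbed samples (z, z') around x.
import numpy as np

def sample_around(x, x_prime, K, seed=0):
    rng = np.random.default_rng(seed)
    nonzero = np.flatnonzero(x_prime)                   # indices i with x'_i != 0
    samples = []
    for _ in range(K):
        n = rng.integers(1, len(nonzero) + 1)           # 1. how many features to keep
        F = rng.choice(nonzero, size=n, replace=False)  # 2. which features to keep
        z = rng.normal(size=len(x))                     # 3. "random" everywhere...
        z[F] = np.asarray(x)[F]                         #    ...except the kept features
        z_prime = np.zeros(len(x)); z_prime[F] = 1.0    #    interpretable version of z
        samples.append((z, z_prime))
    return samples                                      # 4. the collection Z
</code></pre>
</section>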
<section>
<h3>Simple Error Function</h3>
$$\mathcal L(f,g,\pi_x) = \sum_{(z,z')\in \mathcal Z} \pi_x(z) (f(z) - g(z'))^2$$
<p>Pick a $g$ (resp., $\{w_i\}$) that minimizes this!</p>
</section>
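<section>
<h3>Fitting $g$ (sketch)</h3>
<p>A minimal sketch, assuming the helpers from the previous slides (<code>f</code> is the black box, <code>pi</code> is $\pi_x$, <code>samples</code> comes from the sampling loop): weighted least squares picks the $w_i$.</p>
<pre><code class="python"># Sketch: choose g's weights by locally-weighted least squares on the samples.
import numpy as np

def fit_g(f, pi, samples):
    Zp  = np.array([zp for (_, zp) in samples])   # K x d matrix of z' vectors
    y   = np.array([f(z) for (z, _) in samples])  # black-box outputs f(z)
    wts = np.array([pi(z) for (z, _) in samples]) # sample weights pi_x(z)
    A = Zp * np.sqrt(wts)[:, None]                # scale rows by sqrt(weight)...
    b = y * np.sqrt(wts)                          # ...to get weighted least squares
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w                                      # one weight per simplified feature
</code></pre>
</section>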
<section>
<h3>Example</h3>
<h4>Classifying Email Subjects</h4>
<p>Each feature is a word: '1' if the word is present, '0' if not. <br/> ($x = x'$)</p>
<p><b>Complexity Function: </b> (at most $K$ features)
$\Omega(g) = \begin{cases} 0 & \textbf{if }|\{i \;|\; w_i \neq 0\}| \leq K \\ \infty & \textbf{otherwise}\end{cases}$
</p>
<p><b>Simplified model</b>: Find the $K$ features <i>most</i> responsible for differentiating the target class from the others.</p>
</section>
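<section>
<h3>Email Example (sketch)</h3>
<p>The paper enforces $\Omega$ with a LASSO-style procedure; a simpler (assumed) stand-in is to keep the $K$ largest-magnitude word weights from the fit above.</p>
<pre><code class="python"># Sketch: report the K words that contribute most to the classification.
import numpy as np

def top_k_words(weights, vocabulary, K=6):
    order = np.argsort(-np.abs(weights))      # most influential words first
    return [(vocabulary[i], float(weights[i])) for i in order[:K]]
</code></pre>
</section>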
<section>
<img src="graphics/2019-10-01-Weights.svg" height="400px"/>
<attribution>“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al.)</attribution>
</section>
<section>
<h3>Example</h3>
<h4>Interpreting Image Classification</h4>
<p>Each feature is a "superpixel" (a contiguous region of similarly colored pixels): $x_i$ is the color of the pixel; $x_i'$ is 1 if the superpixel is identical to the original.</p>
<p><b>Complexity Function: </b> (at most $K$ superpixels)
"Lasso" (as in the lasso tool) pixels together into a contiguous region of no more than $K$ superpixels.
</p>
<p><b>Simplified model</b>: Find the $K$ superpixels <i>most</i> responsible for differentiating between the target class.</p>
</section>
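<section>
<h3>Image Example (sketch)</h3>
<p>A sketch (not the paper's code) of what "turning off" superpixels means: hidden superpixels are greyed out before the perturbed image is handed to the black-box model. <code>segments</code> is assumed to be a precomputed superpixel map (e.g., from skimage's <code>slic</code>).</p>
<pre><code class="python"># Sketch: x'_i = 1 keeps superpixel i; x'_i = 0 greys it out.
import numpy as np

def perturb_image(image, segments, x_prime, fill=0.5):
    out = np.array(image, dtype=float, copy=True)
    for i, keep in enumerate(x_prime):
        if keep == 0:
            out[segments == i] = fill   # hide this superpixel
    return out                          # feed this to the black-box model f
</code></pre>
</section>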
<section>
<img src="graphics/2019-10-01-Images.svg" height="400px"/>
<attribution>“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al.)</attribution>
</section>
<section>
<h3>Summary</h3>
<ul>
<li>Generate a set of "similar" samples</li>
<li>See how those samples behave on the model</li>
<li>Train a simpler model $g$ to pick the most relevant features.</li>
</ul>
<p>If an input has the features identified by $g$, it is probably of the target class.</p>
</section>
</section>
<section>
<section>
<p><b>ACTIVIS: Visual Exploration of Industry-Scale
Deep Neural Network Models</b></p>
<p>Kahng, Andrews, Kalro, Chau</p>
</section>
<section>
<h3>Shortcomings</h3>
<ul>
<li>Tells <i>what</i> the network is doing, but not why.</li>
<li>Of limited use for understanding inner layers of an NN.</li>
<li>Too automated: Hard to test hypotheses.</li>
</ul>
<p><b>Solution: </b> Let users see the nodes themselves.</p>
</section>
<section>
<img src="graphics/2019-10-01-WeightedNN.svg" height="400px"/>
</section>
<section>
<h3>Desiderata</h3>
<dl>
<dt>Can't see what the network is doing.</dt>
<dd>Let users <i>compare</i> neuron activations.</dd>
<dt>Can't test hypotheses</dt>
<dd>Let users <i>pick</i> comparison points.</dd>
</dl>
</section>
<section>
<h3>Exploring comparison points</h3>
<ul>
<li>Compare by class / correctness</li>
<li>Compare by activation</li>
<li>Manually select classes</li>
</ul>
</section>
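<section>
<h3>Comparison Points (sketch)</h3>
<p>Not from the paper's implementation: a minimal sketch of the aggregation such comparison views rely on, assuming an (instances &times; neurons) activation matrix for one layer.</p>
<pre><code class="python"># Sketch: average each neuron's activation over a user-chosen subset of instances.
import numpy as np

def activation_profile(activations, subset_indices):
    return activations[subset_indices].mean(axis=0)    # one value per neuron

# e.g., contrast correct vs. incorrect instances of one class:
#   diff = activation_profile(acts, ok_idx) - activation_profile(acts, bad_idx)
#   interesting = np.argsort(-np.abs(diff))[:10]        # neurons that differ most
</code></pre>
</section>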
<section>
<h3>Select by Class</h3>
<img src="graphics/2019-10-01-SelectExampleByClass.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
<section>
<h3>Select by Activation</h3>
<div style="vertical-align: middle">
<img style="vertical-align: middle" src="graphics/2019-10-01-WeightedNN.svg" height="400px"/>
→<img style="vertical-align: middle" src="graphics/2019-10-01-SelectExampleByActivation.svg" height="400px" />
</div>
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
<section>
<h3>Select Manually</h3>
<img src="graphics/2019-10-01-SelectExampleManually.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
<section>
<h3>Compare Neurons</h3>
<img src="graphics/2019-10-01-Compare.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
</section>
<section>
<section>
<h3>Compare to Provenance/Slicing</h3>
<ul class="fragment">
<li>Everything contributes (but to varying degrees).</li>
<li>Minimal intuition for what individual "statements" do.</li>
<li>"Program" is more regular: Easier to find patterns.</li>
</ul>
</section>
</section>
<!--
- Light background on NNs
- Forward Propagation
- Matrix Multiplication
- Activation Function
- Motivation:
- What
- NNs used in sentencing (Bias)
- The Wolf/Husky example
- Why
- Overestimating model accuracy
- Signal in training data that is not present "in the wild"
- Overfitting
- Central Challenge
- Why does a neural network make the choices it does?
- When the NN breaks, how to fix it?
- What does a neuron represent
- Why is the represented feature relevant / how does it help classification
- Complexity Tradeoff
- Linear/Polynomial Regression: Fully explainable!
- Bayesian Models: Fully explainable!
- Decision Trees: Usually Explainable (at least at the top)
- SVMs: Visualizable
- Neural Networks : Lol
- Word Embeddings : Lol
- Classes of Summary
- Activation weights? (too fine-grained)
- Focus on a specific context
- Source weights / contribution (too fine grained)
- Focus on a specific context
- So what's a context
- What are the main reasons that the specific class is assigned to a specific input/image/document/etc?
- Pure Black-Box-Based Techniques
- e.g., Ribeiro: Global explanations are too complex. Focus on explaining a specific case
- Steps
- Treat the model as a black box: Features in / Class out
- Sample a set of inputs, biasing towards "nearby" inputs with features similar to target
- Train a linear classifier to distinguish nearby-biased inputs
- Restrict non-zero features based on a "complexity" measure
- Output:
- Directly output weights (e.g., for text classification on n-grams)
- Lasso regions with specific features (e.g., for image classification)
- Complexity reduced for contiguous regions of similar color (superpixels)
- Contextualizing Features
- Define feature importance as the number of samples for which the feature is relevant
- Pick a set of explanatory samples that cover the maximal importance (weighted set-cover)
- NN Activation Techniques
- Similar ideas:
- Trace the execution of a single example, or
- Contrast the execution of a set of examples.
- Trade-off between fidelity and interpretability (aggregation)
- also global vs local fidelity
- Explain Premise of NNs
- Explain Visualization Techniques
- Relate to Provenance / Slicing
- Exploration models
- Instance-based
- Subset-based
[10,26,35,38]
- S. Chung, C. Park, S. Suh, K. Kang, J. Choo, and B. C. Kwon. ReVACNN: Steering convolutional neural network via real-time visual analytics. In Future of Interactive Learning Machines Workshop at the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.
- M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu. Towards better analysis of deep convolutional neural networks. IEEE Transactions on Visualization and Computer Graphics, 23(1):91-100, 2017.
- D. Smilkov, N. Thorat, C. Nicholson, E. Reif, F. B. Viegas, and M. Wattenberg. Embedding Projector: Interactive visualization and interpretation of embeddings. In Workshop on Interpretable Machine Learning in Complex Systems at the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.
- J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding neural networks through deep visualization. In Workshop on Visualization for Deep Learning at the 33rd International Conference on Machine Learning (ICML), 2016.
-->
