This commit is contained in:
Oliver Kennedy 2019-09-30 23:14:24 -04:00
parent 70dd474ada
commit 32e278d826
Signed by: okennedy
GPG key ID: 3E5F9B3ABD3FDB60
11 changed files with 3783 additions and 2 deletions


@@ -1,5 +1,24 @@
[
{ "title" : "Collaborative Research: III: MEDIUM: U4U - Taming Uncertainty with Uncertainty-Annotated Databases",
"agency" : "NSF: CISE: IIS: MEDIUM",
"role" : "Co-PI",
"amount" : 663996,
"effort" : "100%",
"status" : "submitted",
"start" : "08/15/2020", "end" : "08/14/2024",
"type" : "grant",
"commitment" : { "summer" : 0.5 },
"projects" : ["mimir", "vizier"],
"copis" : [
"Atri Rudra"
],
"collaborative" : [
{ "institution" : "Illinois Inst. Tech.",
"pis" : ["Boris Glavic"],
"amount" : 535014
}
]
},
{ "title" : "SCC-IRG Track 1: A Sustainable and Connected Community-Scale Food System to Empower Consumers, Farmers, and Retailers in Buffalo, NY",
"agency" : "NSF: CISE: CNS: CSR",
"role" : "Co-I",


@@ -88,7 +88,7 @@ After taking the course, students should be able to:
* **Sep 17** - Program Slicing ([reading 1](https://cse.buffalo.edu/LRG/CSE705/Papers/Weiser-Static-Slicing.pdf) | [reading 2](http://sites.computer.org/debull/A07dec/cheney.pdf))
* **Sep 19** - Learned Index Structures ([slides](slide/2019-09-18-LearnedIndexStructures.pdf))
* **Sep 24** - Program Slicing ([slides](slide/2019-09-24-ProgramSlicing2.html))
* **Oct 1** - Interpretable Deep Learning ([reading](https://ieeexplore-ieee-org.gate.lib.buffalo.edu/abstract/document/8022871))
* **Oct 1** - Interpretable Deep Learning ([reading 1](https://ieeexplore-ieee-org.gate.lib.buffalo.edu/abstract/document/8022871) | [reading 2](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?Id=2939778))
---


@@ -0,0 +1,451 @@
---
template: templates/cse662_2019_slides.erb
title: Explaining Machine Learning
date: October 1
---
<section>
<section>
<h3>Machine Learning</h3>
<p>Given a set of sample points: discover their defining function</p>
</section>
<section>
<h3>Classical Regression</h3>
<p>given $\{(x_1,y_1), \ldots, (x_K,y_K)\}$, find $f(x) = y$</p>
<ul>
<li class="fragment">Linear (Fit $m\cdot x + b$)</li>
<li class="fragment">Polynomial (Fit $m_0\cdot x^0+m_1\cdot x^1+\ldots+m_N\cdot x^N$)</li>
<li class="fragment">Logarithmic (Fit $a\cdot \log(x+b) + c$)</li>
<li class="fragment">Multidimensional Variants (etc...)</li>
</ul>
<p class="fragment">Fit data to a predefined equation / family of equations.</p>
</section>
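<section>
<h3>Classical Regression (sketch)</h3>
<p>A minimal sketch (not from any cited source) of fitting the first two families with numpy; the logarithmic family would need a nonlinear solver.</p>
<pre><code class="python"># Minimal sketch: least-squares fits of the families above (numpy assumed).
import numpy as np

x = np.linspace(0.0, 10.0, 50)
y = 3.0 * x + 1.0 + np.random.normal(0.0, 0.5, size=x.shape)

m, b = np.polyfit(x, y, deg=1)     # linear: fit m*x + b
coeffs = np.polyfit(x, y, deg=3)   # polynomial: fit m_3 x^3 + ... + m_0
# logarithmic a*log(x+b) + c would need e.g. scipy.optimize.curve_fit
</code></pre>
</section>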
<section>
<h3>Limitations</h3>
<ul>
<li>Need to select equation(s) to fit upfront.</li>
<li>Difficult to cope with discontinuities.</li>
<li>Simple equations have very few parameters.</li>
</ul>
</section>
<section>
<h3>Neural Networks</h3>
<img style="float: left" width="300px" src="graphics/2019-10-01-BasicNN.svg" />
$$z = RELU(v_{z,1}y_1 + \ldots + v_{z,7}y_7)$$
$$y_i = RELU(v_{y,i,1}x_1 + \ldots + v_{y,i,7}x_7)$$
$$x_i = RELU(v_{x,i,1}w_1 + \ldots + v_{x,i,10}w_{10})$$
<p class="fragment">$RELU$: Discontinuity Function</p>
<p class="fragment">One big mega-function to fit to the data</p>
</section>
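<section>
<h3>Forward Pass (sketch)</h3>
<p>A minimal sketch of the equations above, assuming numpy and randomly initialized (untrained) weights: each layer is a matrix multiply followed by $RELU$.</p>
<pre><code class="python"># Sketch of the forward pass: matrix multiply, then RELU, layer by layer.
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

rng = np.random.default_rng(0)
V_x = rng.normal(size=(7, 10))   # w (10 inputs) -> x (7 units)
V_y = rng.normal(size=(7, 7))    # x -> y
V_z = rng.normal(size=(1, 7))    # y -> z (single output)

w = rng.normal(size=10)          # one input point
x = relu(V_x @ w)
y = relu(V_y @ x)
z = relu(V_z @ y)                # the one big "mega-function" f(w)
</code></pre>
</section>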
<section>
<h3>More complex flavors</h3>
<dl>
<div class="fragment">
<dt>Convolutional NNs</dt>
<dd>Train pattern-matching kernels for images.</dd>
</div>
<div class="fragment">
<dt>Recurrent NNs</dt>
<dd>Feed output of NN back into itself for time-series data.</dd>
</div>
</dl>
<p class="fragment"><b>In general: </b>Pre-specify layering structure as a NN workflow.</p>
</section>
</section>
<section>
<section>
<h3>Today</h3>
<p>Not looking at how to train an NN!</p>
<p class="fragment">Why are NNs dangerous?</p>
</section>
<section>
<h3>NN Use Cases</h3>
<ul>
<li class="fragment">Sentencing / Parole decisions</li>
<li class="fragment">Loan decisions</li>
<li class="fragment">Housing/Employment opportunities</li>
</ul>
<p class="fragment">Why does it make the choices it does?</p>
</section>
<section>
<h3>Why is "why" important?</h3>
<dl>
<div class="fragment">
<dt>Legal reasons</dt>
<dd>Protected classes or variables that can act as proxies (redlining).</dd>
</div>
<div class="fragment">
<dt>Training-only signals</dt>
<dd>Image quality (tanks), Coincidental features (dogs).</dd>
</div>
<div class="fragment">
<dt>Debugging</dt>
<dd>Anticipate, react to unusual failure modes.</dd>
</div>
<div class="fragment">
<dt>Overfitting</dt>
<dd>Is the model just "memorizing" the training data?</dd>
</div>
</dl>
</section>
<section>
<h3>Contrast with...</h3>
<dl>
<dt class="fragment">Classical Regression</dt>
<dd class="fragment">The formula is the formula!</dd>
<dt class="fragment">Graphical Models</dt>
<dd class="fragment">Model structure clearly connects related variables.</dd>
<dt class="fragment">Decision Trees</dt>
<dd class="fragment">Everything is phrased as Yes/No questions.</dd>
<dt class="fragment">Neural Networks</dt>
<dd class="fragment">🤷</dd>
</dl>
</section>
</section>
<section>
<section>
<h3>Understanding NNs</h3>
<p>Today's discussion:</p>
<dl>
<dt>Slicing?</dt>
<dd>Not practical: Too many interdependencies.</dd>
<dt>Sensitivity Analysis</dt>
<dd>Figure out which "features" are the most relevant.</dd>
<dt>Direct Debugging</dt>
<dd>Help the user discover patterns in the neurons.</dd>
</dl>
</section>
<section>
<h3>Understanding NNs</h3>
<p></p>
<p><b>Observation 1:</b> Describing the <i>entire</i> model concisely is hard</p>
<p><b>Observation 2:</b> Describing the model on a single input is easier</p>
<p><b>Contrast with dynamic slicing</b></p>
</section>
</section>
<section>
<section>
<h3>Sensitivity Analysis</h3>
<p>Given a target point, figure out which of the point's features are <i>most</i> responsible for the classification.</p>
</section>
<section>
<p><b>“Why Should I Trust You?” Explaining the Predictions of Any Classifier</b></p>
<p>Ribeiro, Singh, Guestrin</p>
</section>
<section>
<h3>Model</h3>
$$f : \mathbb R^d \rightarrow \mathbb R$$
</section>
<section>
<h3>Model</h3>
<p>Arguments to $f$ are input <i>features</i>. For example:</p>
<dl>
<dt>Probability that an email is spam</dt>
<dd>Each known word is a binary (0-1) feature</dd>
<dt>0-1 confidence that a greyscale image contains a "3"</dt>
<dd>The brightness of each pixel is a feature.</dd>
<dt>Chance of default on a potential loan</dt>
<dd>Customer characteristics (income, spending, education)</dd>
</dl>
<p><b>Note:</b> The source model doesn't have to be an NN.</p>
</section>
<section>
<h2>Abstractions</h2>
</section>
<section>
<p>Focusing on <u>similar inputs</u>, learn an <u>explainable</u> model $g$.</p>
<p>Define similarity by a distance function: </p>
$$\pi_x : \mathbb R^d \times \mathbb R^d \rightarrow [0,1]$$
</section>
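<section>
<h3>Similarity Kernel (sketch)</h3>
<p>One common choice (roughly what the LIME paper uses) is an exponential kernel over a domain-appropriate distance; this sketch assumes numpy and Euclidean distance.</p>
<pre><code class="python"># Sketch: similarity decays exponentially with distance from x.
import numpy as np

def pi_x(x, z, sigma=1.0):
    d = np.linalg.norm(np.asarray(x) - np.asarray(z))  # D(x, z); Euclidean here
    return np.exp(-(d ** 2) / (sigma ** 2))            # in (0, 1], 1 when z == x
</code></pre>
</section>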
<section>
<h3>Desiderata</h3>
<ul>
<li>$g$ should be simple (complexity function $\Omega(g)$)</li>
<li>$g$ should be similar to $f$ on values similar to $x$ (error function $\mathcal L(f, g, \pi_x)$)</li>
</ul>
<p><b>Overall Goal: </b> Find a $g$ that minimizes $\mathcal L(f, g, \pi_x) + \Omega(g)$</p>
</section>
<section>
<h3>Simpler Model</h3>
$$g : \{0,1\}^d \rightarrow \mathbb R$$
$$g(x) = w_1\cdot x_1 + \ldots + w_d\cdot x_d$$
<p class="fragment">Boolean- (not Real-)valued features.</p>
</section>
<section>
<h3>Simplification: Thresholding</h3>
$$x'_i = \begin{cases} 1 & \textbf{if } x_i \geq T \\ 0 & \textbf{otherwise} \end{cases}$$
</section>
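<section>
<h3>Thresholding (sketch)</h3>
<p>The same simplification in one line of (assumed) numpy: real-valued features become 0/1 features.</p>
<pre><code class="python"># Sketch: boolean features x' from real-valued features x.
import numpy as np

def threshold(x, T):
    return (np.asarray(x) >= T).astype(float)   # 1 where x_i is at least T, else 0
</code></pre>
</section>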
<section>
<h3>Sampling Around $x$</h3>
<ol>
<li class="fragment">Pick a number of features $\# \sim Uniform(1, |\{x'_i \neq 0\}|)$</li>
<li class="fragment">Randomly select features $F \subseteq \{i \;|\;x'_i \neq 0\}$ s.t. $|F| = \#$</li>
<li class="fragment">Pick $z$ (resp., $z'$) s.t. $z_i = \begin{cases} x_i & \textbf{if } i \in F \\ \textit{random} & \textbf{otherwise}\end{cases}$</li>
<li class="fragment">Repeat $K$ times to get a collection of samples $(z,z') \in \mathcal Z$</li>
</ol>
</section>
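<section>
<h3>Sampling (sketch)</h3>
<p>A minimal sketch of the sampling loop above, assuming numpy; Gaussian noise stands in for the "random" values (the real perturbation is domain-specific).</p>
<pre><code class="python"># Sketch: build K perturbed samples (z, z') around x.
import numpy as np

def sample_around(x, x_prime, K, seed=0):
    rng = np.random.default_rng(seed)
    nonzero = np.flatnonzero(x_prime)                   # indices i with x'_i != 0
    samples = []
    for _ in range(K):
        n = rng.integers(1, len(nonzero) + 1)           # 1. how many features to keep
        F = rng.choice(nonzero, size=n, replace=False)  # 2. which features to keep
        z = rng.normal(size=len(x))                     # 3. "random" everywhere...
        z[F] = np.asarray(x)[F]                         #    ...except the kept features
        z_prime = np.zeros(len(x)); z_prime[F] = 1.0    #    interpretable version of z
        samples.append((z, z_prime))
    return samples                                      # 4. the collection Z
</code></pre>
</section>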
<section>
<h3>Simple Error Function</h3>
$$\mathcal L(f,g,\pi_x) = \sum_{(z,z')\in \mathcal Z} \pi_x(z) (f(z) - g(z'))^2$$
<p>Pick a $g$ (resp., $\{w_i\}$) that minimizes this!</p>
</section>
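<section>
<h3>Fitting $g$ (sketch)</h3>
<p>A minimal sketch, assuming the helpers from the previous slides (<code>f</code> is the black box, <code>pi</code> is $\pi_x$, <code>samples</code> comes from the sampling loop): weighted least squares picks the $w_i$.</p>
<pre><code class="python"># Sketch: choose g's weights by locally-weighted least squares on the samples.
import numpy as np

def fit_g(f, pi, samples):
    Zp  = np.array([zp for (_, zp) in samples])   # K x d matrix of z' vectors
    y   = np.array([f(z) for (z, _) in samples])  # black-box outputs f(z)
    wts = np.array([pi(z) for (z, _) in samples]) # sample weights pi_x(z)
    A = Zp * np.sqrt(wts)[:, None]                # scale rows by sqrt(weight)...
    b = y * np.sqrt(wts)                          # ...to get weighted least squares
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w                                      # one weight per simplified feature
</code></pre>
</section>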
<section>
<h3>Example</h3>
<h4>Classifying Email Subjects</h4>
<p>Each feature is a word: '1' if the word is present, '0' if not. <br/> ($x = x'$)</p>
<p><b>Complexity Function: </b> (at most $K$ features)
$\Omega(g) = \begin{cases} 0 & \textbf{if }|\{i \;|\; w_i \neq 0\}| \leq K \\ \infty & \textbf{otherwise}\end{cases}$
</p>
<p><b>Simplified model</b>: Find the $K$ features <i>most</i> responsible for differentiating the target class from the others.</p>
</section>
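<section>
<h3>Email Example (sketch)</h3>
<p>The paper enforces $\Omega$ with a LASSO-style procedure; a simpler (assumed) stand-in is to keep the $K$ largest-magnitude word weights from the fit above.</p>
<pre><code class="python"># Sketch: report the K words that contribute most to the classification.
import numpy as np

def top_k_words(weights, vocabulary, K=6):
    order = np.argsort(-np.abs(weights))      # most influential words first
    return [(vocabulary[i], float(weights[i])) for i in order[:K]]
</code></pre>
</section>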
<section>
<img src="graphics/2019-10-01-Weights.svg" height="400px"/>
<attribution>“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al.)</attribution>
</section>
<section>
<h3>Example</h3>
<h4>Interpreting Image Classification</h4>
<p>Each feature is a "superpixel" (a contiguous region of similarly colored pixels): $x_i$ is the color of the pixel; $x_i'$ is 1 if the superpixel is identical to the original.</p>
<p><b>Complexity Function: </b> (at most $K$ superpixels)
"Lasso" (as in the lasso tool) pixels together into a contiguous region of no more than $K$ superpixels.
</p>
<p><b>Simplified model</b>: Find the $K$ superpixels <i>most</i> responsible for differentiating between the target class.</p>
</section>
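<section>
<h3>Image Example (sketch)</h3>
<p>A sketch (not the paper's code) of what "turning off" superpixels means: hidden superpixels are greyed out before the perturbed image is handed to the black-box model. <code>segments</code> is assumed to be a precomputed superpixel map (e.g., from skimage's <code>slic</code>).</p>
<pre><code class="python"># Sketch: x'_i = 1 keeps superpixel i; x'_i = 0 greys it out.
import numpy as np

def perturb_image(image, segments, x_prime, fill=0.5):
    out = np.array(image, dtype=float, copy=True)
    for i, keep in enumerate(x_prime):
        if keep == 0:
            out[segments == i] = fill   # hide this superpixel
    return out                          # feed this to the black-box model f
</code></pre>
</section>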
<section>
<img src="graphics/2019-10-01-Images.svg" height="400px"/>
<attribution>“Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al.)</attribution>
</section>
<section>
<h3>Summary</h3>
<ul>
<li>Generate a set of "similar" samples</li>
<li>See how those samples behave on the model</li>
<li>Train a simpler model $g$ to pick the most relevant features.</li>
</ul>
<p>If an input has the features identified by $g$, it is probably of the target class.</p>
</section>
</section>
<section>
<section>
<p><b>ACTIVIS: Visual Exploration of Industry-Scale
Deep Neural Network Models</b></p>
<p>Kahng, Andrews, Kalro, Chau</p>
</section>
<section>
<h3>Shortcomings</h3>
<ul>
<li>Tells <i>what</i> the network is doing, but not why.</li>
<li>Of limited use for understanding inner layers of an NN.</li>
<li>Too automated: Hard to test hypotheses.</li>
</ul>
<p><b>Solution: </b> Let users see the nodes themselves.</p>
</section>
<section>
<img src="graphics/2019-10-01-WeightedNN.svg" height="400px"/>
</section>
<section>
<h3>Desiderata</h3>
<dl>
<dt>Can't see what the network is doing.</dt>
<dd>Let users <i>compare</i> neuron activations.</dd>
<dt>Can't test hypotheses</dt>
<dd>Let users <i>pick</i> comparison points.</dd>
</dl>
</section>
<section>
<h3>Exploring comparison points</h3>
<ul>
<li>Compare by class / correctness</li>
<li>Compare by activation</li>
<li>Manually select classes</li>
</ul>
</section>
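<section>
<h3>Comparison Points (sketch)</h3>
<p>Not from the paper's implementation: a minimal sketch of the aggregation such comparison views rely on, assuming an (instances &times; neurons) activation matrix for one layer.</p>
<pre><code class="python"># Sketch: average each neuron's activation over a user-chosen subset of instances.
import numpy as np

def activation_profile(activations, subset_indices):
    return activations[subset_indices].mean(axis=0)    # one value per neuron

# e.g., contrast correct vs. incorrect instances of one class:
#   diff = activation_profile(acts, ok_idx) - activation_profile(acts, bad_idx)
#   interesting = np.argsort(-np.abs(diff))[:10]        # neurons that differ most
</code></pre>
</section>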
<section>
<h3>Select by Class</h3>
<img src="graphics/2019-10-01-SelectExampleByClass.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
<section>
<h3>Select by Activation</h3>
<div style="vertical-align: middle">
<img style="vertical-align: middle" src="graphics/2019-10-01-WeightedNN.svg" height="400px"/>
→<img style="vertical-align: middle" src="graphics/2019-10-01-SelectExampleByActivation.svg" height="400px" />
</div>
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
<section>
<h3>Select Manually</h3>
<img src="graphics/2019-10-01-SelectExampleManually.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
<section>
<h3>Compare Neurons</h3>
<img src="graphics/2019-10-01-Compare.svg" height="400px" />
<attribution>ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models (Kahng et al.)</attribution>
</section>
</section>
<section>
<section>
<h3>Compare to Provenance/Slicing</h3>
<ul class="fragment">
<li>Everything contributes (but to varying degrees).</li>
<li>Minimal intuition for what individual "statements" do.</li>
<li>"Program" is more regular: Easier to find patterns.</li>
</ul>
</section>
</section>
<!--
- Light background on NNs
- Forward Propagation
- Matrix Multiplication
- Activation Function
- Motivation:
- What
- NNs used in sentencing (Bias)
- The Wolf/Husky example
- Why
- Overestimating model accuracy
- Signal in training data that is not present "in the wild"
- Overfitting
- Central Challenge
- Why does a neural network make the choices it does?
- When the NN breaks, how to fix it?
- What does a neuron represent
- Why is the represented feature relevant / how does it help classification
- Complexity Tradeoff
- Linear/Polynomial Regression: Fully explainable!
- Bayesian Models: Fully explainable!
- Decision Trees: Usually Explainable (at least at the top)
- SVMs: Visualizable
- Neural Networks : Lol
- Word Embeddings : Lol
- Classes of Summary
- Activation weights? (too fine-grained)
- Focus on a specific context
- Source weights / contribution (too fine grained)
- Focus on a specific context
- So what's a context
- What are the main reasons that the specific class is assigned to a specific input/image/document/etc?
- Pure Black-Box-Based Techniques
- e.g., Ribeiro: Global explanations are too complex. Focus on explaining a specific case
- Steps
- Treat the model as a black box: Features in / Class out
- Sample a set of inputs, biasing towards "nearby" inputs with features similar to target
- Train a linear classifier to distinguish nearby-biased inputs
- Restrict non-zero features based on a "complexity" measure
- Output:
- Directly output weights (e.g., for text classification on n-grams)
- Lasso regions with specific features (e.g., for image classification)
- Complexity reduced for contiguous regions of similar color (superpixels)
- Contextualizing Features
- Define feature importance as the number of samples for which the feature is relevant
- Pick a set of explanatory samples that cover the maximal importance (weighted set-cover)
- NN Activation Techniques
- Similar ideas:
- Trace the execution of a single example, or
- Contrast the execution of a set of examples.
- Trade-off between fidelity and interpretability (aggregation)
- also global vs local fidelity
- Explain Premise of NNs
- Explain Visualization Techniques
- Relate to Provenance / Slicing
- Exploration models
- Instance-based
- Subset-based
[10,26,35,38]
- S. Chung, C. Park, S. Suh, K. Kang, J. Choo, and B. C. Kwon. ReVACNN: Steering convolutional neural network via real-time visual analytics. In Future of Interactive Learning Machines Workshop at the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.
- M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu. Towards better analysis of deep convolutional neural networks. IEEE Transactions on Visualization and Computer Graphics, 23(1):91-100, 2017.
- D. Smilkov, N. Thorat, C. Nicholson, E. Reif, F. B. Viegas, and M. Wattenberg. Embedding Projector: Interactive visualization and interpretation of embeddings. In Workshop on Interpretable Machine Learning in Complex Systems at the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.
- J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson. Understanding neural networks through deep visualization. In Workshop on Visualization for Deep Learning at the 33rd International Conference on Machine Learning (ICML), 2016.
-->
