From f19dc629eda2b466e3e8041d4bee9ada14f8a5c9 Mon Sep 17 00:00:00 2001 From: Oliver Date: Mon, 2 Sep 2019 15:09:00 -0600 Subject: [PATCH] slides for tuesday --- src/teaching/cse-662/2019fa/index.md | 4 +- .../cse-662/2019fa/slide/2019-08-29-Seeds.erb | 465 -------- .../2019fa/slide/2019-09-03-FunctionalDS.erb | 764 ++++++------ .../graphics/2018-08-31-AmortizedQueue.svg | 529 +++++++++ .../graphics/2018-08-31-FunctionalStack.svg | 1028 +++++++++++++++++ .../2018-08-31-FunctionalStackMerge.svg | 650 +++++++++++ .../2018-08-31-FunctionalTreeInsertion.svg | 362 ++++++ 7 files changed, 2907 insertions(+), 895 deletions(-) delete mode 100644 src/teaching/cse-662/2019fa/slide/2019-08-29-Seeds.erb create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-AmortizedQueue.svg create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStack.svg create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStackMerge.svg create mode 100644 src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalTreeInsertion.svg diff --git a/src/teaching/cse-662/2019fa/index.md b/src/teaching/cse-662/2019fa/index.md index 8bbdafb9..00d9e075 100644 --- a/src/teaching/cse-662/2019fa/index.md +++ b/src/teaching/cse-662/2019fa/index.md @@ -78,8 +78,8 @@ After the taking the course, students should be able to: ## Lecture Notes * **Aug 27** - Intro and Seeds ([slides](slide/2019-08-27-Introduction.html)) - - +* **Sep 3** - Functional Data Structures ([slides](slide/2019-09-03-FunctionalDS.html)) +* **Sep 5** - Lazy Transactions ([reading](https://dl-acm-org.gate.lib.buffalo.edu/citation.cfm?id=2610529)) --- diff --git a/src/teaching/cse-662/2019fa/slide/2019-08-29-Seeds.erb b/src/teaching/cse-662/2019fa/slide/2019-08-29-Seeds.erb deleted file mode 100644 index d8dd5edb..00000000 --- a/src/teaching/cse-662/2019fa/slide/2019-08-29-Seeds.erb +++ /dev/null @@ -1,465 +0,0 @@ - - - - - - - CSE 662 - Languages and Runtimes for Big Data - - 
- - - - - - - - - - - - - - - - - - - - - -
- - -
-
-
-

Project Seeds

-
- -
-

Reminder

-

Learned Index Structures due Weds (1 week)

-
- -
-

Expectations

- -
-

Checkpoint 1: Project Description (Due Sept 23, 11:59)

-
    -
  • What is the specific challenge that you will solve?
  • -
  • What metrics will you use to evaluate success?
  • -
  • What deliverables will you produce?
  • -
-
-

Checkpoint 2: Progress Report (Due Oct 21, 11:59)

-
    -
  • What challenges have you overcome so far?
  • -
  • How does your existing work compare to other, similar approaches?
  • -
  • What design decisions have you made so far and why?
  • -
  • How have your goals changed from checkpoint 1?
  • -
  • What challenges remain for you to overcome?
  • -
-
-

Checkpoint 3: Final Report (Due Dec 9, 11:59)

-
    -
  • What specific challenges did you solve?
  • -
  • How does your final solution compare to other, similar approaches?
  • -
  • Were the design decisions you made correct and why?
  • -
-
- -
-
- -
- -
-

Decentralized IoT Plumbing

-
- -
- -
- -
- - + - -
- -
- -
- -
- -
- -
-

What IoT Means

- -

Lots of devices with...

-
-
Sensors (Temperature, RFID, Cameras)
-
Inputs from the outside world.
-
-
-
Actuators (Robots, Lightbulbs, Conveyor Belts)
-
Outputs to affect the outside world.
-
-
-
Reasonable Compute Resources
-
The ability to actually decide how.
-
-

-
- -
- -
- -
- -
- -
-

Core Idea

-
-
The user gives you...
-
A list of nodes (sensors/actuators)
-
A list of activities (globally what to do and when)
-
Your code compiles and deploys...
-
Triggers for nodes (locally what to do and when)
-
-
- -
-

Things to Think About...

-
    -
  • How does the user specify activities to your system?
  • -
  • Which node(s) is(/are) responsible for required computation?
  • -
  • How do you get data from where it is to where the compute happens?
  • -
  • What resources (compute, network) will be needed to execute on your plan?
  • -
  • How do you optimize the necessary compute for one activity? across all activities?
  • -
-
- -
- -
- -
-

Uncertainty-Aware Machine Learning

-
- -
-
- -

Not all data sources are created equal.

-
-
- -
- -

Even within one data set, some data may be more trustworthy than others.

-
- -
-

Mixed-Quality Training

-

How do you train a classifier/neural net/markov model/etc... on mixed-quality data?

-
    -
  • Preprocess the data ("fix" the errors)
  • -
  • Train separate models on subsets of the data
  • -
  • Ignore the errors and hope for the best
  • -
-

Problem: Usually easier to "fix" than to label missing data.

-
- -
-

But what if the data is already labeled!

-
- -
-

Core Idea

-
-
You get...
-
A dataset
-
Descriptions of uncertainty (what kind is up to you)
-
You make...
-
A model (of some sort) that is of higher quality using labels than not using them.
-
-

Ideally the model is interpretable as well.

-
- -
-

Things to Think About

- -
    -
  • What statistical properties are you aiming for?
  • -
  • How should you describe uncertain data?
  • -
  • How should the model interact with missing data? ... to less reliable data?
  • -
  • How does uncertainty in the training data affect the model's predictions
  • -
-
- -
- -
- -
-

Web-of-Trust for Crowdsourced Data

-
- -
- -
- -
-

Crowdsourcing

- -

Have a question?

- -

Most people will give you a bad answer.

- -

A few will give you a bad answer.

- -

The average of a bunch of bad answers and a few good answers is a good answer?

-
- -
-

Crowdsourcing with Trust!

-
- -
-

Web of Trust

- - -
- -
- -
- -
-

Core Idea

-
-
You get...
-
A set of participants
-
A set of (possibly contradictory) facts stated by each participant
-
A set of trust levels for each pair of participants
-
You produce...
-
A (weighted?) set of facts for each user.
-
-
- -
-

Things to Think About

- -
    -
  • How do trust levels combine? (Transitively vs Additively)
  • -
  • How do derivations of contradictory facts combine (e.g., average trust vs most trusted wins)
  • -
  • Can the model be maintained incrementally as new facts arrive/users change how much they trust other users?
  • -
  • What happens for pairs of users who don't know how much they trust each other?
  • -
-
- -
- -
- -
-

Sensitivity Analysis in Mimir

-
- -
- -
- -
-

Problem: Often there is a very large number of possible worlds.

- -

Solution: Break down possible worlds by choices.

- -

Question: Which choices have the biggest impact on a query result?

-
- -
-

Sensitivity/Influence

- -

Sensitivity analysis and explanations for robust query evaluation in probabilistic databases.
- Kanagal, Li, Deshpande (SIGMOD 2011)

- -

Tracing data errors with view-conditioned causality
- Meliou, Gatterbauer, Nath, Suciu (SIGMOD 2011)

- -
- -
-

Approach

-

Unit of Choice: Is a tuple (fact) in the source data or not?

-
    -
  1. Compute the "derivative" of the query result with respect to the probability of each source tuple.
  2. -
  3. Find the tuple that maxizes the derivative.

    -
-
- -
-

Mimir

- -

Let queries call a nondeterministic "choice" function that decides which "world" to visit.

- -

-    SELECT CASE VGTerm("A", ROWID) WHEN 1 THEN "FOO" 
-                                          ELSE "BAR" 
-           END AS A, Input.*
-    FROM Input;
-            
- -

VGTerm("A", ROWID) generates a separate value for each row.

-
- - -
-

Core Idea

-
-
You get...
-
A deterministic database
-
A non-deterministic query (and a set of tools for sampling from its outputs).
-
You produce...
-
Which "call" to the query has the biggest influence on the output.
-
-
- -
-

Things to Think About

- -
    -
  • What kind(s) of influence measures make sense?
  • -
  • How to compute influence efficiently for all tuples in parallel?
  • -
  • Early pruning: Can some influence measures be computed exactly?
  • -
-
- -
- -
- -
-

Sandboxed Python

- - - - -
- -
- - - -
- -
- - - - -
- -
- - - -
- -
- -
- -
- -
- - -
-

Core Idea

-
-
You get...
-
Python Code
-
Inputs to the code (or a socket)
-
Your system produces...
-
Output for the code... without calling out of the sandbox.
-
-
- -
-

Things to Think About

- -
    -
  • What security guarantees are you providing?
  • -
  • How can you prove to yourselves that those guarantees are enforced?
  • -
  • What tooling can you use to wrap/execute python?
  • -
-
- -
- -
-

In-Class Assignment

-
    -
  • Form a group of 3-4 people that you'll work with for the duration of the semester.
  • -
  • Come up with a clever group name (or one will be made up for you).
  • -
  • Challenge: Form a group with people you don't know or don't know well.
  • -
-
- - -
- -
- - - - - - - - diff --git a/src/teaching/cse-662/2019fa/slide/2019-09-03-FunctionalDS.erb b/src/teaching/cse-662/2019fa/slide/2019-09-03-FunctionalDS.erb index baaac8fb..9cb5c238 100644 --- a/src/teaching/cse-662/2019fa/slide/2019-09-03-FunctionalDS.erb +++ b/src/teaching/cse-662/2019fa/slide/2019-09-03-FunctionalDS.erb @@ -1,432 +1,340 @@ - - - - - - - CSE 662 - Languages and Runtimes for Big Data - - - - - - - - - - - - - - - - - - - - - - - -
- - -
-
-
-

Functional Data Structures

-
- -
-

Reminder

-

Learned Index Structures due Weds (in class)

-

One person from each group should email me...

    -
  • [CSE-662] in the subject line
  • -
  • A list of group members (Names + UBITs)
  • -
  • The project seed that your group would like to work on
  • -

-
- -
- -
-
-

Mutable vs Immutable Data

-
- -
-

-              X = [ "Alice", "Bob", "Carol", "Dave" ]
-            
- - - - - - - - - -
X : AliceBobCarolDave
- -

-              print(X[2])  // ->  "Carol"
-            
- -

-              X[2] = "Eve"
-            
- - - - - - - - - -
X : AliceBobEveDave
- -

-              print(X[2])  // -> "Eve"
-            
- -
- -
-

-              X = [ Alice, Bob, Carol, Dave ]
-            
- - - - - - - - - -
X : AliceBobCarolDave
- - - - - - - - - - - -
Thread 1Thread 2

-     X[2] = "Eve"
-                

-      print(X[2])
-                
🤔
-
- -
-

Mutable Data Structures

-
    -
  • The programmer's intended ordering is unclear.
  • -
  • Atomicity/Correctness requires locking.
  • -
  • Versioning requires copying the data structure.
  • -
  • Cache coherency is expensive.
  • -
-

Can these problems be avoided?

-
-
- -
-
-

Mutable vs Immutable Data

-
- -
-

-              X = [ "Alice", "Bob", "Carol", "Dave" ]
-            
- - - - - - - - - -
X : AliceBobCarolDave
- -

-              print(X[2])  // ->  "Carol"
-            
- -

-              X[2] = "Eve"
-            
- -

Don't allow writes!

- -

But what if we need to update the structure?

- -
- -
-

Idea 1: Copy

- - - - - - - - - -
X : AliceBobCarolDave
- - - - - - - - - -
X' : AliceBobEveDave
- -

Slooooooooooooooooooooooow!

-
- -
-

Idea 2: Break it Down

- -

Data is always added, not replaced!

-
- -
-

Immutable Data Structures -
(aka 'Functional' or 'Persistent' Data Structures)

-
    -
  • Once an object is created it never changes.
  • -
  • The object persists until all pointers to it go away, at which point it is garbage collected.
  • -
  • Only the "root" pointer is ever allowed to change, to point to a new version.
  • -
-
- -
- -
- -
-

Linked List Stacks

-
- -
- -
-
- -
-
-

Class Exercise 1

- -

How would you implement:

-

-               list update(list, index, new_value)
-            
-
- -
-

Class Exercise 2

- -

Implement a set with:

-

-                 set init()
-                 boolean is_member(set, elem)
-                 set insert(set, elem)
-            
-
-
- -
-
-

Lazy Evaluation

- - -

Can we do better?

-
- -
-

Putting off Work

- - - - - - - - - - - - - - -
x = "expensive()"
Fast
(Just saving a 'todo')
print(x)
Slow
(Performing the 'todo')
print(x)
Fast
('todo' already done)
-
- -
-

Class Exercise 3

- - -

Make it better!

-
- -
-

Putting off Work

-

-              concatenate(a, b) { 
-                a', front = pop(a)
-                if a' is empty {
-                  return (front, b)
-                } else {
-                  return (front, "concatenate(a', b)")
-                }
-              }
-            
- -

What is the time complexity of this concatenate?

-

What happens to reads?

-
- -
-

Lazy Evaluation

-

Save work for later... -

    -
  • ... and avoid work that is never requred.
  • -
  • ... to spread out work over multiple calls.
  • -
  • ... for better "amortized" costs.
  • -
-

-
-
- -
-
-

Amortized Analysis

-

Allow operation A to 'pay it forward' for another operation B that hasn't happened yet.

- -

... or allow an operation B to 'borrow' from another operation A that hasn't happened yet.

- -
    -
  • A's time complexity goes up by X.
  • -
  • B's time complexity goes down by X.
  • -
-
- -
-

Example: Amortized Queues

- - -
- -
-

Example: Amortized Queues

- -

-              queue enqueue(queue, item) {
-                return {
-                  current : queue.current, 
-                  todo : push(queue.todo, item)
-                )
-              }
-            
-

What is the cost?

-
- -
-

Example: Amortized Queues

- -

-  queue dequeue(queue) {
-    if(queue.current != NULL){
-
-      return { current: pop(queue.current), todo: queue.todo }
-
-    } else if(queue.todo != NULL) {
-      
-      return { current: reverse(queue.todo), todo: NULL }
-    
-    } else { 
-      return { current: NULL, todo: NULL }
+---
+template: templates/cse662_2019_slides.erb
+title: Functional Data Structures
+date: Sept. 3
+---
+
+
+
+

Mutable vs Immutable Data

+
+ +
+

+      X = [ "Alice", "Bob", "Carol", "Dave" ]
+    
+ + + + + + + + + +
X : AliceBobCarolDave
+ +

+      print(X[2])  // ->  "Carol"
+    
+ +

+      X[2] = "Eve"
+    
+ + + + + + + + + +
X : AliceBobEveDave
+ +

+      print(X[2])  // -> "Eve"
+    
+ +
+ +
+

+      X = [ "Alice", "Bob", "Carol", "Dave" ]
+    
+ + + + + + + + + +
X : AliceBobCarolDave
+ + + + + + + + + + + +
Thread 1Thread 2

+    X[2] = "Eve"
+        

+     print(X[2])
+        
🤔
+
+ +
+

Mutable Data Structures

+
    +
  • The programmer's intended ordering is unclear.
  • +
  • Atomicity/Correctness requires locking.
  • +
  • Versioning requires copying the data structure.
  • +
  • Cache coherency is expensive.
  • +
+

Can these problems be avoided?

+
+
+ +
+
+

Mutable vs Immutable Data

+
+ +
+

+      X = [ "Alice", "Bob", "Carol", "Dave" ]
+    
+ + + + + + + + + +
X : AliceBobCarolDave
+ +

+      print(X[2])  // ->  "Carol"
+    
+ +

+      X[2] = "Eve"
+    
+ +

Don't allow writes!

+ +

But what if we need to update the structure?

+ +
+ +
+

Idea 1: Copy

+ + + + + + + + + +
X : AliceBobCarolDave
+ + + + + + + + + +
X' : AliceBobEveDave
+ +

Slooooooooooooooooooooooow!

+
+ +
+

Idea 2: Break it Down

+ +

Data is always added, not replaced!

+
+ +
+

Immutable Data Structures +
(aka 'Functional' or 'Persistent' Data Structures)

+
    +
  • Once an object is created it never changes.
  • +
  • The object persists until all pointers to it go away, at which point it is garbage collected.
  • +
  • Only the "root" pointer is ever allowed to change, to point to a new version.
  • +
+
+ +
+ +
+ +
+

Linked List Stacks

+
+ +
+ +
+
+ +
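The linked-list stack pictured above can be sketched in a few lines. This is a minimal illustration, not the course's reference code: the names `Node`, `EMPTY`, and `to_list` are assumptions, and `pop` is assumed to return a `(rest, top)` pair, matching how `pop` is used in the later `concatenate` pseudocode.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a linked-list stack (hypothetical name)."""
    head: Any
    tail: Optional["Node"]


EMPTY = None  # the empty stack


def push(stack, item):
    # Allocates exactly one new cell; the old stack is shared, not copied.
    return Node(item, stack)


def pop(stack):
    # Returns (rest, top). `rest` is the existing tail, so nothing is copied.
    if stack is None:
        raise IndexError("pop from empty stack")
    return stack.tail, stack.head


def to_list(stack):
    # Helper for inspection only: walk the cells front to back.
    out = []
    while stack is not None:
        out.append(stack.head)
        stack = stack.tail
    return out


xs = push(push(push(EMPTY, 2), 1), 0)  # 0 -> 1 -> 2
ys = push(xs, 6)                       # 6 -> 0 -> 1 -> 2, sharing all of xs
rest, top = pop(xs)                    # xs itself is unchanged
```

Both `push` and `pop` run in constant time here, and every older version (`xs`, `rest`) stays valid after the newer ones are created.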
+
+

Class Exercise 1

+ +

How would you implement:

+

+       list update(list, index, new_value)
+    
+
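One possible answer to this exercise, sketched under the assumption that a list is the same kind of immutable cell chain as the stack: copy the cells in front of `index` and share everything behind it. The names `Node` and `from_list` are hypothetical helpers, not part of the slides.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable list cell (hypothetical name)."""
    head: Any
    tail: Optional["Node"]


def from_list(items):
    # Build a cell chain from a Python list, front element first.
    lst = None
    for item in reversed(items):
        lst = Node(item, lst)
    return lst


def update(lst, index, new_value):
    """Return a new list with position `index` replaced; `lst` is untouched."""
    if lst is None:
        raise IndexError("index out of range")
    if index == 0:
        # Reuse the old tail: everything after `index` is shared.
        return Node(new_value, lst.tail)
    # Copy this cell and recurse; only the first `index` cells are copied.
    return Node(lst.head, update(lst.tail, index - 1, new_value))


x = from_list(["Alice", "Bob", "Carol", "Dave"])
x2 = update(x, 2, "Eve")
```

The cost is proportional to `index` rather than to the length of the list, which is the improvement over "Idea 1: Copy".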
+ +
+

Class Exercise 2

+ +

Implement a set with:

+

+         set init()
+         boolean is_member(set, elem)
+         set insert(set, elem)
+    
+
+
+ +
+
+

Lazy Evaluation

+ + +

Can we do better?

+
+ +
+

Putting off Work

+ + + + + + + + + + + + + + +
x = "expensive()"
Fast
(Just saving a 'todo')
print(x)
Slow
(Performing the 'todo')
print(x)
Fast
('todo' already done)
+
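The fast/slow/fast pattern in the table can be reproduced with a memoized thunk. This is a sketch only: `Lazy`, `get`, and the `calls` log are illustrative names, and `expensive` is a stand-in for real work.

```python
class Lazy:
    """A saved 'todo' that runs at most once (hypothetical name)."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def get(self):
        if not self._done:
            self._value = self._compute()  # slow: perform the 'todo'
            self._compute = None           # drop the closure once it has run
            self._done = True
        return self._value                 # fast: 'todo' already done


calls = []  # records how many times the expensive work actually ran


def expensive():
    calls.append("ran")
    return sum(range(1_000_000))


x = Lazy(expensive)   # fast: just saving a 'todo'
assert calls == []    # nothing has been computed yet
first = x.get()       # slow: runs expensive()
second = x.get()      # fast: returns the cached result
```

The cache is the only thing that changes inside the object, and callers cannot observe it except through timing.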
+ +
+

Class Exercise 3

+ + +

Make it better!

+
+ +
+

Putting off Work

+

+      concatenate(a, b) { 
+        a', front = pop(a)
+        if a' is empty {
+          return (front, b)
+        } else {
+          return (front, "concatenate(a', b)")
+        }
+      }
+    
+ +

What is the time complexity of this concatenate?

+

What happens to reads?

+
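The lazy `concatenate` above can be sketched as runnable code. The assumptions here: a stack is either `None` or a `(front, rest)` tuple, `rest` may be a deferred `Lazy` value, and a guard for an empty `a` is added because the pseudocode does not cover that case. `Lazy`, `force`, `from_list`, and `to_list` are hypothetical helpers.

```python
class Lazy:
    """A memoized 'todo': runs `compute` at most once (hypothetical name)."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def get(self):
        if not self._done:
            self._value = self._compute()
            self._compute = None
            self._done = True
        return self._value


def force(stack):
    # Resolve a possibly deferred stack into a concrete cell or None.
    while isinstance(stack, Lazy):
        stack = stack.get()
    return stack


def from_list(items):
    stack = None
    for item in reversed(items):
        stack = (item, stack)
    return stack


def pop(stack):
    # Returns (rest, front), as in the pseudocode.
    cell = force(stack)
    if cell is None:
        raise IndexError("pop from empty stack")
    front, rest = cell
    return rest, front


def concatenate(a, b):
    if force(a) is None:
        return b                      # added guard: empty `a`
    rest, front = pop(a)
    if force(rest) is None:
        return (front, b)             # last cell of `a` points at `b` itself
    # Build one cell now; the remaining cells are a saved 'todo'.
    return (front, Lazy(lambda: concatenate(rest, b)))


def to_list(stack):
    # Reading forces each deferred step exactly once.
    out = []
    stack = force(stack)
    while stack is not None:
        front, rest = stack
        out.append(front)
        stack = force(rest)
    return out


xs = from_list([0, 1, 2])
ys = from_list([3, 4, 5])
zs = concatenate(xs, ys)  # constant work: one cell plus one 'todo'
```

In this sketch `concatenate` itself does constant work, and each read that crosses an unforced cell pays for one deferred step; `ys` is shared rather than copied.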
+ +
+

Lazy Evaluation

+

Save work for later... +

    +
  • ... and avoid work that is never required.
  • +
  • ... to spread out work over multiple calls.
  • +
  • ... for better "amortized" costs.
  • +
+

+
+
+ +
+
+

Amortized Analysis

+

Allow operation A to 'pay it forward' for another operation B that hasn't happened yet.

+ +

... or allow an operation B to 'borrow' from another operation A that hasn't happened yet.

+ +
    +
  • A's time complexity goes up by X.
  • +
  • B's time complexity goes down by X.
  • +
+
+ +
+

Example: Amortized Queues

+ + +
+ +
+

Example: Amortized Queues

+ +

+      queue enqueue(queue, item) {
+        return {
+          current : queue.current, 
+          todo : push(queue.todo, item)
+        }
+      }
+    
+

What is the cost?

+
+ +
+

Example: Amortized Queues

+ +

+    queue dequeue(queue) {
+      if(queue.current != NULL){
+
+        return { current: pop(queue.current), todo: queue.todo }
+
+      } else if(queue.todo != NULL) {
+
+        return { current: pop(reverse(queue.todo)), todo: NULL }
+
+      } else { 
+
+        return { current: NULL, todo: NULL }
+
+      }
     }
-  }
-            
-

What is the cost?

-
+
+

What is the cost?

+
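The two-stack queue from these slides can be sketched as follows. This is an illustration under assumptions: `dequeue` here returns the removed item along with the new queue (the slide version returns only the queue), and `Node`, `Queue`, `EMPTY`, and `drain` are hypothetical names.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable stack cell (hypothetical name)."""
    head: Any
    tail: Optional["Node"]


class Queue(NamedTuple):
    current: Optional[Node]  # front of the queue, in dequeue order
    todo: Optional[Node]     # recently enqueued items, newest first


EMPTY = Queue(None, None)


def reverse(stack):
    # O(N): rebuild the cells in the opposite order.
    out = None
    while stack is not None:
        out = Node(stack.head, out)
        stack = stack.tail
    return out


def enqueue(queue, item):
    # O(1): push onto the todo stack, share `current` unchanged.
    return Queue(queue.current, Node(item, queue.todo))


def dequeue(queue):
    """Return (item, new_queue); the input queue is never modified."""
    if queue.current is None:
        if queue.todo is None:
            raise IndexError("dequeue from empty queue")
        # The occasional expensive step: move todo into dequeue order.
        queue = Queue(reverse(queue.todo), None)
    return queue.current.head, Queue(queue.current.tail, queue.todo)


def drain(queue):
    # Helper for inspection: dequeue until empty.
    items = []
    while queue.current is not None or queue.todo is not None:
        item, queue = dequeue(queue)
        items.append(item)
    return items


q = EMPTY
for i in [0, 1, 2]:
    q = enqueue(q, i)
first, q = dequeue(q)      # triggers the reverse; current is now 1 -> 2
for i in [3, 4, 5]:
    q = enqueue(q, i)      # todo is now 5 -> 4 -> 3
```

Most `dequeue` calls are a single pop; the reverse happens only when `current` runs out.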
-
-

Example: Amortized Analysis

-
-
enqueue(): Push onto todo stack
-
$O(1) + \text{create } 1 \text{ credit}$
+
+

Example: Amortized Analysis

+
+
enqueue(): Push onto todo stack
+
$O(1) + \text{create } 1 \text{ credit}$
-
dequeue(): Pop current OR Reverse todo
-
Either:
    -
  • Pop current queue: $O(1)$
  • -
  • Reverse stack: $O(1) + \text{consume } N \text{ credits}$
  • -
-
+
dequeue(): Pop current OR Reverse todo
+
Either:
    +
  • Pop current queue: $O(1)$
  • +
  • Reverse stack: $O(1) + \text{consume } N \text{ credits}$
  • +
+
-

Critical requirement of amortized analysis: Must ensure that every credit consumed is created.

-
-
- -
- -
- - - - - - - - +

Critical requirement of amortized analysis: Must ensure that every credit consumed was once created.
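The credit argument can be written out as a short worked bound. The notation is an assumption introduced here, not taken from the slides: $c_i$ is the actual cost of the $i$-th operation, $\hat{c}_i$ is its amortized cost, $m$ is the number of operations, and each basic stack step costs one unit.

```latex
\begin{align*}
\hat{c}_{\text{enqueue}} &= \underbrace{1}_{\text{push}} + \underbrace{1}_{\text{credit created}} = 2 \\
\hat{c}_{\text{dequeue (pop only)}} &= 1 \\
\hat{c}_{\text{dequeue (reverse } N\text{)}} &= \underbrace{N + 1}_{\text{reverse, then pop}} - \underbrace{N}_{\text{credits consumed}} = 1 \\
\sum_{i=1}^{m} c_i \;\le\; \sum_{i=1}^{m} \hat{c}_i &\;\le\; 2m
\end{align*}
```

The first inequality holds only because the $N$ credits consumed by a reverse are exactly the ones created by the $N$ enqueues that filled the todo stack. It also assumes each version of the queue is dequeued at most once; reusing an old version repeats the reverse without any new credits to pay for it.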

+ + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-AmortizedQueue.svg b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-AmortizedQueue.svg new file mode 100644 index 00000000..f6b14dc2 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-AmortizedQueue.svg @@ -0,0 +1,529 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + 0 + 1 + 2 + aq + + + + + + + + + + + + + + + + + + + + 5 + 4 + 3 + + + Active Queue + Todo Stack + + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStack.svg b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStack.svg new file mode 100644 index 00000000..9ffe0265 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStack.svg @@ -0,0 +1,1028 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + 0 + 1 + 2 + xs + + + + + + + + + + + + + + + + + + 3 + 4 + 5 + ys + + + xs = pop(xs) + + + + + + + + + + + + + + + + 0 + 1 + 2 + xs + + + + + + ys = push(ys,1) + + + + + + + + + + + + + + + + 3 + 4 + 5 + ys + + + + + + + + + + + 6 + + zs = append(xs,ys) + + + + + + + + + + + + + + + + + + 0 + 1 + 2 + zs + + + + + + + + + + + + + + + + + + 3 + 4 + 5 + + + + + + This entire part needs to be copied + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStackMerge.svg b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStackMerge.svg new file mode 100644 index 00000000..f5fea9a6 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalStackMerge.svg @@ -0,0 +1,650 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + 0 + 1 + 2 + xs + + + + + + + + + + + + 
+ + + + + + 3 + 4 + 5 + ys + + + + + + + + + + + + + + + + + + + 0 + 1 + 2 + zs + + + + diff --git a/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalTreeInsertion.svg b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalTreeInsertion.svg new file mode 100644 index 00000000..364f8d11 --- /dev/null +++ b/src/teaching/cse-662/2019fa/slide/graphics/2018-08-31-FunctionalTreeInsertion.svg @@ -0,0 +1,362 @@ + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + Eve + + + + + + + + + + + + + + + + + + + + + Alice + + + + Bob + + + + + + + + + + + + + Carol + + + + Dave + + + + + + + + + +