CSE 662 - Languages and Runtimes for Big Data

Functional Data Structures

Reminder

Learned Index Structures due Weds (in class)

One person from each group should email me...

[CSE-662] in the subject line
A list of group members (Names + UBITs)
The project seed that your group would like to work on

Mutable vs Immutable Data


              X = [ "Alice", "Bob", "Carol", "Dave" ]

X :

Alice

Bob

Carol

Dave


              print(X[2])  // ->  "Carol"


              X[2] = "Eve"

X :

Alice

Bob

Eve

Dave


              print(X[2])  // -> "Eve"


              X = [ "Alice", "Bob", "Carol", "Dave" ]

X :

Alice

Bob

Carol

Dave

Thread 1	Thread 2
`X[2] = "Eve"`	`print(X[2])`
	🤔

Mutable Data Structures

The programmer's intended ordering is unclear.
Atomicity/Correctness requires locking.
Versioning requires copying the data structure.
Cache coherency is expensive.

Can these problems be avoided?

Mutable vs Immutable Data


              X = [ "Alice", "Bob", "Carol", "Dave" ]

X :

Alice

Bob

Carol

Dave


              print(X[2])  // ->  "Carol"


              X[2] = "Eve"

Don't allow writes!

But what if we need to update the structure?

Idea 1: Copy

X :

Alice

Bob

Carol

Dave

X' :

Alice

Bob

Eve

Dave

Slooooooooooooooooooooooow!
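
A minimal sketch of the copy idea, assuming a Python tuple stands in for the immutable array. The helper name `copy_update` is hypothetical, not from the slides. It shows why copying is slow: every update rebuilds all N elements, even though only one changed.

```python
def copy_update(xs, index, new_value):
    """Return a new tuple equal to xs except at index; xs is untouched.

    Every element is copied into the result, so one update costs O(len(xs)).
    """
    if not 0 <= index < len(xs):
        raise IndexError(index)
    return xs[:index] + (new_value,) + xs[index + 1:]


X = ("Alice", "Bob", "Carol", "Dave")
X_prime = copy_update(X, 2, "Eve")

print(X[2])        # the old version is still readable
print(X_prime[2])  # the new version sees the update
```

Both versions stay valid, which gives versioning for free, but at a full copy per write.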

Idea 2: Break it Down

Data is always added, not replaced!

Immutable Data Structures
(aka 'Functional' or 'Persistent' Data Structures)

Once an object is created it never changes.
The object persists until all pointers to it go away, at which point it is garbage collected.
Only the "root" pointer is ever allowed to change, to point to a new version.

Linked List Stacks
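
One way to picture a linked-list stack under these rules, as a sketch. The names `Node`, `push`, and `pop` are assumptions chosen for illustration. `pop` returns (rest, front), following the `a', front = pop(a)` convention used later in these slides.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell: a value plus a pointer to the rest of the stack."""
    head: Any
    tail: Optional["Node"]


EMPTY = None  # the empty stack


def push(stack, value):
    # O(1): allocate one new cell that points at the existing stack.
    return Node(value, stack)


def pop(stack):
    # O(1): return (rest of the stack, top value); no cell is modified.
    if stack is None:
        raise IndexError("pop from empty stack")
    return stack.tail, stack.head


s1 = push(push(EMPTY, "Alice"), "Bob")
s2 = push(s1, "Carol")   # s2 reuses both of s1's cells
rest, top = pop(s2)      # rest is the very same object as s1
```

Data is only ever added: `s1` remains a valid stack after `s2` is built, and the two versions share their common cells.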

Class Exercise 1

How would you implement:


               list update(list, index, new_value)
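
One possible answer, sketched for an immutable linked list (the `Node` type and the `from_items` helper are assumptions for the demo). The cells in front of `index` are copied, and everything after `index` is shared with the old list, so the cost is O(index) rather than O(N).

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    head: Any
    tail: Optional["Node"]


def from_items(items):
    """Build an immutable linked list from a Python list (demo helper)."""
    lst = None
    for item in reversed(items):
        lst = Node(item, lst)
    return lst


def update(lst, index, new_value):
    """Return a new list with position index replaced; lst never changes."""
    if lst is None:
        raise IndexError("index out of range")
    if index == 0:
        # Replace this cell and share everything after it.
        return Node(new_value, lst.tail)
    # Copy this cell and recurse into the rest of the list.
    return Node(lst.head, update(lst.tail, index - 1, new_value))


X = from_items(["Alice", "Bob", "Carol", "Dave"])
X_prime = update(X, 2, "Eve")
```

The recursion is kept for readability; a long list would need an iterative version to stay within Python's recursion limit.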

Class Exercise 2

Implement a set with:


                 set init()
                 boolean is_member(set, elem)
                 set insert(set, elem)
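
One possible answer, sketched as an unbalanced binary search tree (an assumption; the exercise does not prescribe a representation, and the `Tree` type is hypothetical). `insert` copies only the cells on the path from the root to the new element and shares every other subtree with the old set.

```python
from typing import Any, NamedTuple, Optional


class Tree(NamedTuple):
    left: Optional["Tree"]
    elem: Any
    right: Optional["Tree"]


def init():
    return None  # the empty set


def is_member(s, elem):
    # Ordinary BST search; reads never allocate.
    while s is not None:
        if elem < s.elem:
            s = s.left
        elif elem > s.elem:
            s = s.right
        else:
            return True
    return False


def insert(s, elem):
    if s is None:
        return Tree(None, elem, None)
    if elem < s.elem:
        # Copy this node, rebuild the left side, share the right side.
        return Tree(insert(s.left, elem), s.elem, s.right)
    if elem > s.elem:
        return Tree(s.left, s.elem, insert(s.right, elem))
    return s  # already present: reuse this subtree unchanged


s0 = init()
s1 = insert(insert(insert(s0, "Carol"), "Alice"), "Dave")
s2 = insert(s1, "Bob")
```

Without rebalancing the path can be as long as N, so this sketch is O(depth) per operation, not O(log N) in the worst case.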

Lazy Evaluation

Can we do better?

Putting off Work

x = "expensive()"	Fast (Just saving a 'todo')
print(x)	Slow (Performing the 'todo')
print(x)	Fast ('todo' already done)
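
The table above can be sketched with a small memoizing wrapper. The class name `Lazy` and its `force` method are assumptions for illustration. The cached result is written internally, but callers can never observe a different value.

```python
class Lazy:
    """Holds a 'todo' (a zero-argument function) and runs it at most once."""

    def __init__(self, todo):
        self._todo = todo
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._todo()  # slow: perform the 'todo'
            self._done = True
            self._todo = None           # drop the closure once it has run
        return self._value              # fast on every later call


calls = []


def expensive():
    calls.append("ran")
    return sum(range(1_000_000))


x = Lazy(expensive)   # fast: just saves the 'todo'
first = x.force()     # slow: runs expensive()
second = x.force()    # fast: returns the saved result
```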

Class Exercise 3

Make it better!

Putting off Work


              concatenate(a, b) { 
                a', front = pop(a)
                if a' is empty {
                  return (front, b)
                } else {
                  return (front, "concatenate(a', b)")
                }
              }

What is the time complexity of this concatenate?

What happens to reads?
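
A sketch of the lazy concatenate above, assuming a list cell whose tail may be a suspended computation. `Lazy`, `Cell`, `from_items`, and `to_list` are hypothetical helpers. The guard for an empty `a` is an addition that the slide's pseudocode does not have.

```python
class Lazy:
    """A 'todo' that runs at most once and then remembers its result."""

    def __init__(self, todo):
        self._todo, self._done, self._value = todo, False, None

    def force(self):
        if not self._done:
            self._value, self._done, self._todo = self._todo(), True, None
        return self._value


class Cell:
    """A list cell whose tail is None, a Cell, or a Lazy producing one."""

    def __init__(self, head, tail):
        self.head = head
        self._tail = tail

    @property
    def tail(self):
        # Reads pay for the deferred step the first time they reach it.
        if isinstance(self._tail, Lazy):
            return self._tail.force()
        return self._tail


def pop(lst):
    return lst.tail, lst.head  # (rest, front), as in the slide


def concatenate(a, b):
    if a is None:              # guard added for this sketch
        return b
    rest, front = pop(a)
    if rest is None:
        return Cell(front, b)
    # Defer the remaining work: one cell now, the rest on demand.
    return Cell(front, Lazy(lambda: concatenate(rest, b)))


def from_items(items):
    lst = None
    for item in reversed(items):
        lst = Cell(item, lst)
    return lst


def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


a = from_items([1, 2, 3])
b = from_items([4, 5])
c = concatenate(a, b)   # builds a single cell up front
```

In this sketch `concatenate` does O(1) work up front, and each read that steps past an unevaluated tail pays one extra O(1) step, once.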

Lazy Evaluation

Save work for later...

... and avoid work that is never required.
... to spread out work over multiple calls.
... for better "amortized" costs.

Amortized Analysis

Allow operation A to 'pay it forward' for another operation B that hasn't happened yet.

... or allow an operation B to 'borrow' from another operation A that hasn't happened yet.

A's time complexity goes up by X.
B's time complexity goes down by X.

Example: Amortized Queues


              queue enqueue(queue, item) {
                return {
                  current : queue.current, 
                  todo : push(queue.todo, item)
                }
              }

What is the cost?

Example: Amortized Queues


  queue dequeue(queue) {
    if(queue.current != NULL){

      return { current: pop(queue.current), todo: queue.todo }

    } else if(queue.todo != NULL) {
      
      return { current: pop(reverse(queue.todo)), todo: NULL }
    
    } else { 
      return { current: NULL, todo: NULL }
    }
  }

What is the cost?
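
A runnable sketch of the two-stack queue, assuming immutable linked-list stacks. The type names `Node` and `Queue` are hypothetical. It departs from the pseudocode in two labeled ways: `dequeue` returns the removed item alongside the new queue, and it raises on an empty queue instead of returning an empty one.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    head: Any
    tail: Optional["Node"]


class Queue(NamedTuple):
    current: Optional[Node]  # items ready to dequeue, oldest on top
    todo: Optional[Node]     # newly enqueued items, newest on top


EMPTY = Queue(None, None)


def reverse(stack):
    # O(N): rebuild the stack in the opposite order.
    out = None
    while stack is not None:
        out = Node(stack.head, out)
        stack = stack.tail
    return out


def enqueue(queue, item):
    # O(1): push onto todo, leave current alone.
    return Queue(queue.current, Node(item, queue.todo))


def dequeue(queue):
    """Return (front item, remaining queue); the input queue is unchanged."""
    current, todo = queue
    if current is None:
        # Out of ready items: pay for the reversal now.
        current, todo = reverse(todo), None
    if current is None:
        raise IndexError("dequeue from empty queue")
    return current.head, Queue(current.tail, todo)


q = EMPTY
for name in ["Alice", "Bob", "Carol"]:
    q = enqueue(q, name)
first, q1 = dequeue(q)    # triggers the one reversal
second, q2 = dequeue(q1)  # plain pop from current
```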

Example: Amortized Analysis

enqueue(): Push onto todo stack

$O(1) + \text{create } 1 \text{ credit}$

dequeue(): Pop current OR Reverse todo

Either:

Pop current queue: $O(1)$
Reverse stack: $O(N) \text{ actual work} = O(1) + \text{consume } N \text{ credits}$

Critical requirement of amortized analysis: every credit that one operation consumes must actually be created by some other operation.
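
A worked bound for the queue, as a sketch under two assumptions: the sequence starts from an empty queue, and each queue version is used only once (no old version is dequeued again). Count one unit per stack cell created or popped, with $n$ enqueues and $m \le n$ dequeues:

$$
T(n, m) \;\le\; \underbrace{n}_{\text{push onto todo}} \;+\; \underbrace{n}_{\text{moved by reverse, at most once each}} \;+\; \underbrace{m}_{\text{pop from current}} \;\le\; 3n
$$

$$
\frac{T(n, m)}{n + m} \;\le\; \frac{3n}{n + m} \;\le\; 3 \;=\; O(1) \text{ amortized per operation}
$$

The credits balance because a reverse over $N$ items consumes $N$ credits, and each of those $N$ items was enqueued exactly once, creating one credit each. If an old version with a long todo stack is dequeued repeatedly, the same reversal is paid for more than once and this argument no longer applies as stated.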
