Functional Data Structures

Reminder

Learned Index Structures due Weds (in class)

One person from each group should email me...

  • [CSE-662] in the subject line
  • A list of group members (Names + UBITs)
  • The project seed that your group would like to work on

Mutable vs Immutable Data


              X = [ "Alice", "Bob", "Carol", "Dave" ]
            
X : Alice Bob Carol Dave

              print(X[2])  // ->  "Carol"
            

              X[2] = "Eve"
            
X : Alice Bob Eve Dave

              print(X[2])  // -> "Eve"
            

              X = [ "Alice", "Bob", "Carol", "Dave" ]
            
X : Alice Bob Carol Dave
Thread 1 Thread 2

     X[2] = "Eve"
                

      print(X[2])
                
🤔

Mutable Data Structures

  • The programmer's intended ordering is unclear.
  • Atomicity/Correctness requires locking.
  • Versioning requires copying the data structure.
  • Cache coherency is expensive.

Can these problems be avoided?

Mutable vs Immutable Data


              X = [ "Alice", "Bob", "Carol", "Dave" ]
            
X : Alice Bob Carol Dave

              print(X[2])  // ->  "Carol"
            

              X[2] = "Eve"
            

Don't allow writes!

But what if we need to update the structure?

Idea 1: Copy

X : Alice Bob Carol Dave
X' : Alice Bob Eve Dave

Slooooooooooooooooooooooow!
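
A minimal Python sketch of the copy idea, assuming the array is stored as a tuple (the slides do not name a representation). `copy_update` is a hypothetical helper name. It shows why copying is slow: every update touches every element.

```python
def copy_update(xs: tuple, index: int, new_value) -> tuple:
    """Return a new tuple equal to xs except at index; xs itself is untouched."""
    if not 0 <= index < len(xs):
        raise IndexError(index)
    # All elements are copied into the new tuple: O(n) time and space per update.
    return xs[:index] + (new_value,) + xs[index + 1:]


X = ("Alice", "Bob", "Carol", "Dave")
X_prime = copy_update(X, 2, "Eve")  # X still holds "Carol" at position 2
```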

Idea 2: Break it Down

Data is always added, not replaced!

Immutable Data Structures
(aka 'Functional' or 'Persistent' Data Structures)

  • Once an object is created it never changes.
  • The object persists until all pointers to it go away, at which point it is garbage collected.
  • Only the "root" pointer is ever allowed to change, to point to a new version.

Linked List Stacks
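
One way to picture an immutable linked-list stack, sketched in Python. The names `Node`, `push`, and `pop` are assumptions for illustration; `pop` returns `(rest, front)` to match the order used in the pseudocode later in these slides.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    head: Any                # value stored at the top of the stack
    tail: Optional["Node"]   # rest of the stack; None means empty


EMPTY = None  # the empty stack


def push(stack, value):
    # O(1): one new node that points at the old stack. Nothing is copied.
    return Node(value, stack)


def pop(stack):
    # O(1): returns (rest, front). The stack passed in is still valid afterwards.
    if stack is None:
        raise IndexError("pop from empty stack")
    return stack.tail, stack.head


s1 = push(push(EMPTY, "Alice"), "Bob")
s2 = push(s1, "Carol")   # s2 shares all of s1
rest, front = pop(s2)    # rest is s1 itself, front is "Carol"
```

Both versions `s1` and `s2` stay readable at the same time, which is the "persistent" part.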

Class Exercise 1

How would you implement:


               list update(list, index, new_value)
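
One possible answer, sketched in Python under the assumption that the list is an immutable linked list (`Node`, `from_iterable`, and `to_list` are hypothetical helpers). Only the nodes in front of `index` are copied; everything after it is shared with the old version.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    head: Any
    tail: Optional["Node"]


def from_iterable(items):
    """Build a linked list from a Python iterable (helper for the example)."""
    lst = None
    for item in reversed(list(items)):
        lst = Node(item, lst)
    return lst


def to_list(lst):
    """Convert a linked list back to a Python list (helper for the example)."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


def update(lst, index, new_value):
    """Return a new list with position index replaced. Costs O(index)."""
    if lst is None:
        raise IndexError("index out of range")
    if index == 0:
        return Node(new_value, lst.tail)  # share the whole suffix
    # Copy this node and recurse (recursive for clarity; deep lists would need a loop).
    return Node(lst.head, update(lst.tail, index - 1, new_value))


X = from_iterable(["Alice", "Bob", "Carol", "Dave"])
X2 = update(X, 2, "Eve")
```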
            

Class Exercise 2

Implement a set with:


                 set init()
                 boolean is_member(set, elem)
                 set insert(set, elem)
            

Lazy Evaluation

Can we do better?

Putting off Work

x = "expensive()"
Fast
(Just saving a 'todo')
print(x)
Slow
(Performing the 'todo')
print(x)
Fast
('todo' already done)
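
A minimal Python sketch of the 'todo' idea. `Lazy` and `force` are hypothetical names, and the `calls` list exists only to show how often `expensive()` actually runs. The cached value is filled in once, but the observable value never changes.

```python
class Lazy:
    """A saved 'todo': a computation that runs at most once, on first use."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()  # slow: performing the 'todo'
            self._done = True
            self._compute = None           # the closure is no longer needed
        return self._value                 # fast: 'todo' already done


calls = []  # records each real execution of expensive()


def expensive():
    calls.append(1)
    return sum(range(1000))


x = Lazy(expensive)  # fast: nothing has been computed yet
```

Calling `x.force()` twice runs `expensive()` only the first time.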

Class Exercise 3

Make it better!

Putting off Work


              concatenate(a, b) { 
                a', front = pop(a)
                if a' is empty {
                  return (front, b)
                } else {
                  return (front, "concatenate(a', b)")
                }
              }
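
The pseudocode above can be sketched in Python as follows. The representation is an assumption: a list is `None` or a pair `(front, rest)`, where `rest` may be a deferred `Lazy` value. The empty-`a` case is added here because the slide's version does not cover it.

```python
class Lazy:
    """A deferred computation that runs at most once."""

    def __init__(self, compute):
        self._compute, self._done, self._value = compute, False, None

    def force(self):
        if not self._done:
            self._value, self._done = self._compute(), True
        return self._value


def force(x):
    return x.force() if isinstance(x, Lazy) else x


def pop(lst):
    # Returns (rest, front); any deferred work on the rest is performed here.
    front, rest = lst
    return force(rest), front


def concatenate(a, b):
    if a is None:
        return b
    a_rest, front = pop(a)
    if a_rest is None:
        return (front, b)
    # Only one cell is built now; the remaining cells are a saved 'todo'.
    return (front, Lazy(lambda: concatenate(a_rest, b)))


def to_list(lst):
    out = []
    lst = force(lst)
    while lst is not None:
        lst, front = pop(lst)
        out.append(front)
    return out


a = ("Alice", ("Bob", None))
b = ("Carol", ("Dave", None))
c = concatenate(a, b)  # builds a single cell up front
```

In this sketch `concatenate` itself does a constant amount of work, and each read of one of the first `len(a)` elements pays for one deferred step.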
            

What is the time complexity of this concatenate?

What happens to reads?

Lazy Evaluation

Save work for later...

  • ... and avoid work that is never required.
  • ... to spread out work over multiple calls.
  • ... for better "amortized" costs.

Amortized Analysis

Allow operation A to 'pay it forward' for another operation B that hasn't happened yet.

... or allow an operation B to 'borrow' from another operation A that hasn't happened yet.

  • A's time complexity goes up by X.
  • B's time complexity goes down by X.

Example: Amortized Queues


              queue enqueue(queue, item) {
                // Leave current untouched; record the new item on the todo stack.
                return {
                  current : queue.current, 
                  todo : push(queue.todo, item)
                }
              }
            

What is the cost?

Example: Amortized Queues


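  queue dequeue(queue) {
    if(queue.current != NULL){
      // Fast path: remove the front item from the current stack.
      return { current: pop(queue.current), todo: queue.todo }

    } else if(queue.todo != NULL) {
      // current is empty: reverse todo into dequeue order, then remove the front item.
      return { current: pop(reverse(queue.todo)), todo: NULL }
    
    } else { 
      // Both stacks are empty: nothing to dequeue.
      return { current: NULL, todo: NULL }
    }
  }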
            

What is the cost?

Example: Amortized Analysis

enqueue(): Push onto todo stack
$O(1) + \text{create } 1 \text{ credit}$
dequeue(): Pop current OR Reverse todo
Either:
  • Pop current queue: $O(1)$
  • Reverse stack: $O(1) + \text{consume } N \text{ credits}$ (the $O(N)$ reversal is paid for by the $N$ credits)

Critical requirement of amortized analysis: Must ensure that every credit consumed is created.