Google: Databases suck! Use Map/Reduce Instead
Yahoo: Our Map/Reduce implementation is open source
A function that reads in one row and returns any number of rows.
A function that reads in one row and returns one row.
A function that reads in one row and returns true (keep) or false (toss).
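The three kinds of function above can be sketched in plain Python, with no cluster involved. This is only an illustration: the helper names (`apply_one_to_many`, `apply_one_to_one`, `apply_predicate`) and the choice of a string as a "row" are assumptions made for the example, not anything from the slides. In Spark's RDD API the corresponding operations are called `flatMap`, `map`, and `filter`.

```python
from typing import Callable, Iterable, List

Row = str  # assumption: a "row" here is just a line of text


def split_words(row: Row) -> List[Row]:
    """One row in, any number of rows out (possibly zero)."""
    return row.split()


def to_upper(row: Row) -> Row:
    """One row in, exactly one row out."""
    return row.upper()


def is_long(row: Row) -> bool:
    """One row in, True (keep) or False (toss)."""
    return len(row) > 3


def apply_one_to_many(fn: Callable[[Row], Iterable[Row]], rows: Iterable[Row]) -> List[Row]:
    # Concatenate every output row produced for every input row.
    return [out for row in rows for out in fn(row)]


def apply_one_to_one(fn: Callable[[Row], Row], rows: Iterable[Row]) -> List[Row]:
    # Output has the same number of rows as the input.
    return [fn(row) for row in rows]


def apply_predicate(fn: Callable[[Row], bool], rows: Iterable[Row]) -> List[Row]:
    # Output keeps only the rows for which fn returned True.
    return [row for row in rows if fn(row)]


rows = ["map reduce", "", "open source"]
words = apply_one_to_many(split_words, rows)  # the empty row contributes nothing
upper = apply_one_to_one(to_upper, words)
kept = apply_predicate(is_long, words)
```

Note how the row count changes: the one-to-many step can grow or shrink the data, the one-to-one step preserves the count, and the predicate step can only shrink it.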
RDDs with Schemas: Every row has a set of named attributes, and all rows have the same attributes.
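A rough way to picture "rows with a schema" in plain Python is a list of named tuples. This is a sketch under assumptions: the `Person` type, its fields, and the sample values are hypothetical and chosen only for illustration; a real Spark schema also carries type information that is enforced by the engine.

```python
from typing import List, NamedTuple


class Person(NamedTuple):
    # Hypothetical schema: the attribute names and types every row must have.
    name: str
    age: int


records: List[Person] = [
    Person(name="alice", age=30),
    Person(name="bob", age=25),
]

# The schema is the shared set of attribute names.
schema = Person._fields

# Every record exposes exactly the same attributes.
uniform = all(r._fields == schema for r in records)

# Because the attributes are known up front, a column can be selected by name.
ages = [r.age for r in records]
```

Knowing the attributes in advance is what makes it possible to refer to columns by name rather than treating each row as an opaque object.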