diff --git a/slides/talks/2018-3-DBSpark/graphics/FullText-white.png b/slides/talks/2018-3-DBSpark/graphics/FullText-white.png
new file mode 100644
index 00000000..b3f42b46
Binary files /dev/null and b/slides/talks/2018-3-DBSpark/graphics/FullText-white.png differ
diff --git a/slides/talks/2018-3-DBSpark/graphics/ferrari.jpg b/slides/talks/2018-3-DBSpark/graphics/ferrari.jpg
new file mode 100644
index 00000000..61e9594e
Binary files /dev/null and b/slides/talks/2018-3-DBSpark/graphics/ferrari.jpg differ
diff --git a/slides/talks/2018-3-DBSpark/graphics/hadoop.png b/slides/talks/2018-3-DBSpark/graphics/hadoop.png
new file mode 100644
index 00000000..995a8701
Binary files /dev/null and b/slides/talks/2018-3-DBSpark/graphics/hadoop.png differ
diff --git a/slides/talks/2018-3-DBSpark/graphics/hadoopVSdbs.svg b/slides/talks/2018-3-DBSpark/graphics/hadoopVSdbs.svg
new file mode 100644
index 00000000..adbc8a09
--- /dev/null
+++ b/slides/talks/2018-3-DBSpark/graphics/hadoopVSdbs.svg
@@ -0,0 +1,19834 @@
+ + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + vs + +
diff --git a/slides/talks/2018-3-DBSpark/graphics/mapreduce.pdf b/slides/talks/2018-3-DBSpark/graphics/mapreduce.pdf
new file mode 100644
index 00000000..1ed1aeec
Binary files /dev/null and b/slides/talks/2018-3-DBSpark/graphics/mapreduce.pdf differ
diff --git a/slides/talks/2018-3-DBSpark/graphics/mapreduce.png b/slides/talks/2018-3-DBSpark/graphics/mapreduce.png
new file mode 100644
index 00000000..bcf88d58
Binary files /dev/null and b/slides/talks/2018-3-DBSpark/graphics/mapreduce.png differ
diff --git a/slides/talks/2018-3-DBSpark/graphics/spark.png b/slides/talks/2018-3-DBSpark/graphics/spark.png
new file mode 100644
index 00000000..6b9c939f
Binary files /dev/null and b/slides/talks/2018-3-DBSpark/graphics/spark.png differ
diff --git a/slides/talks/2018-3-DBSpark/graphics/sparkstack.svg b/slides/talks/2018-3-DBSpark/graphics/sparkstack.svg
new file mode 100644
index 00000000..c0c65ff8
--- /dev/null
+++ b/slides/talks/2018-3-DBSpark/graphics/sparkstack.svg
@@ -0,0 +1,255 @@
+ + + + + + + + + + image/svg+xml + + + + + + + + + Text Files + + + + RDDs + + + + + Hadoop + + + + Relational DB + + + + + Data Frames + + + + Raw Access + + + + + SparkSQL + + + + SparkML + + + + +
diff --git a/slides/talks/2018-3-DBSpark/graphics/tank.jpg b/slides/talks/2018-3-DBSpark/graphics/tank.jpg
new file mode 100644
index 00000000..ae4eaf89
Binary files /dev/null and b/slides/talks/2018-3-DBSpark/graphics/tank.jpg differ
diff --git a/slides/talks/2018-3-DBSpark/index.html b/slides/talks/2018-3-DBSpark/index.html
new file mode 100644
index 00000000..11292577
--- /dev/null
+++ b/slides/talks/2018-3-DBSpark/index.html
@@ -0,0 +1,287 @@
+ + + + + + + Spark + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ + Spark +
+ + +
+ +
+
+

Spark

+

(NoSQL, but with SQL)

+
+ +
+

First a little history

+ +
+
+
Early-Mid 1900s
+
Computers used for tabulating data
+
+
+
1970s
+
Relational model, Ingres, System R, Oracle, DB2
+
+
+
1980s
+
Lotus, dBase
+
+
+
1990s
+
Object/Object-Relational Databases, Distributed Databases
+
+
+
2000s
+
The Dark Ages...
+
+
+
+ +
+

Google: Databases suck! Use Map/Reduce Instead

+ +
+ +
+

Yahoo: Our Map/Reduce implementation is open source

+ +
+ +
+

The Good

+
    +
  • Programmer-Friendly Language
  • +
  • Distributed-Computing-Friendly Metaphors
  • +
  • Extremely Resilient Runtime
  • +
+
+ +
+

The Bad

+
    +
  • Programmer-Friendly, but Non-Declarative Language
  • +
  • Distributed-Computing-Friendly, but Programmer-Hostile Metaphors
  • +
  • Extremely Resilient, but Slow Runtime
  • +
+
+ +
+ +
+ +
+ +
+
+ +
+ +
+

Key Features

+
    +
  • High-performance resilience.
  • +
  • Use of metaphors to extract parallelism.
  • +
  • Lots of metaphors for distributed programming.
  • +
  • If you can do it in { Scala, Python, Java, R }, you can do it in Spark.
  • +
  • If you know SQL and { Scala, Python, Java, R }, you know Spark.
  • +
+
+ +
+ +
+
+ +
+
+

Resilient Distributed Datasets (RDDs)

+ +
+
+
Read-Only
+
You can't insert, update, or modify rows...
+
+ +
+
Transformable
+
... but you can (cheaply) create new RDDs by transforming existing RDDs.
+
+ +
+
Opaque
+
Spark just sees a bunch of rows. It doesn't know how to interpret them.
+
+ +
+
Lazy
+
Spark saves how to construct an RDD, but waits to actually do so.
+
+ +
+
Distributed
+
When Spark constructs an RDD, it automatically assigns rows to workers.
+
+
+
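A toy illustration of the Read-Only, Transformable, and Lazy properties above, written in plain Python rather than Spark. The `LazyRows` class and its methods are hypothetical stand-ins, not Spark's API: each transformation returns a new object that only records what to do, and nothing runs until `collect()` is called.

```python
class LazyRows:
    """Toy stand-in for an RDD (hypothetical, not Spark's implementation)."""

    def __init__(self, source, steps=()):
        self._source = source        # zero-argument callable that yields the base rows
        self._steps = tuple(steps)   # recorded recipe: how to derive this dataset

    def map(self, fn):
        # Read-only: never touch self, return a new object with a longer recipe.
        return LazyRows(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyRows(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Lazy: the recorded steps only run here.
        rows = list(self._source())
        for kind, fn in self._steps:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows


calls = []  # records every row the mapping function actually sees


def traced_double(x):
    calls.append(x)
    return 2 * x


base = LazyRows(lambda: [1, 2, 3, 4])
doubled = base.map(traced_double)       # nothing computed yet
big = doubled.filter(lambda x: x > 4)   # still nothing computed
calls_before = len(calls)
result = big.collect()                  # the whole recipe runs now
```

Because every dataset is just a recipe over its parent, a lost result can be rebuilt by replaying the recipe, which is one way to read the "resilient" in the name.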
+ +
+

Where do RDDs come from?

+ +
    +
  • Call "parallelize" on a { Scala, Python, Java, R } array/collection.
  • +
  • Load a text file from disk or HDFS (1 row per line).
  • +
  • Load a database table (1 row per row).
  • +
  • Transform (map, flatMap, filter) an existing RDD.
  • +
+
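A rough sketch of two of the sources listed above: turning a text file into rows (one row per line) and splitting a local collection across workers. This is plain Python, and both helper names are hypothetical. The slides do not say how Spark actually assigns rows to workers, so the contiguous-slice rule here is only an assumption for illustration.

```python
def rows_from_text(text):
    """One row per line, as with a text file loaded from disk or HDFS."""
    return text.splitlines()


def split_into_partitions(rows, num_workers):
    """Split rows into num_workers contiguous slices of near-equal size.

    Hypothetical helper: the real assignment policy may differ.
    """
    rows = list(rows)
    size, extra = divmod(len(rows), num_workers)
    partitions, start = [], 0
    for i in range(num_workers):
        # The first `extra` workers take one additional row each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(rows[start:end])
        start = end
    return partitions


rows = rows_from_text("a,1\nb,2\nc,3\n")
partitions = split_into_partitions(range(10), 3)
```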
+ +
+
+

FlatMap?

+

A function that reads in one row and returns any number of rows.

+
+
+

Map?

+

A function that reads in one row and returns one row.

+
+
+

Filter?

+

A function that reads in one row and returns true (keep) or false (toss).

+
+
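The three definitions above can be sketched in plain Python. The `rdd_*` helper names are hypothetical and only model the row-level contract of each operation (one row in; any number of rows, exactly one row, or a keep/toss decision out). They are not Spark calls.

```python
def rdd_flat_map(rows, fn):
    """fn reads one row and returns any number of rows (including zero)."""
    for row in rows:
        yield from fn(row)


def rdd_map(rows, fn):
    """fn reads one row and returns exactly one row."""
    for row in rows:
        yield fn(row)


def rdd_filter(rows, pred):
    """pred reads one row and returns True (keep) or False (toss)."""
    for row in rows:
        if pred(row):
            yield row


lines = ["spark is fast", "", "hadoop is resilient"]

words = list(rdd_flat_map(lines, str.split))  # each line becomes 0..n word rows
lengths = list(rdd_map(lines, len))           # each line becomes exactly one number
non_empty = list(rdd_filter(lines, bool))     # empty lines are tossed
```

Note that the empty line contributes no rows under flatMap but still contributes one row (a zero) under map.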
+ +
+

Resilient Distributed Datasets (RDDs)

+ +
+
+
Read-Only
+
You can't insert, update, or modify rows...
+
+ +
+
Transformable
+
... but you can (cheaply) create new RDDs by transforming existing RDDs.
+
+ +
+
Opaque
+
Spark just sees a bunch of rows. It doesn't know how to interpret them.
+
+ +
+
Lazy
+
Spark saves how to construct an RDD, but waits to actually do so.
+
+ +
+
Distributed
+
When Spark constructs an RDD, it automatically assigns rows to workers.
+
+
+
+
+ +
+
+

DataFrames

+ +

RDDs with schemas: every row has the same set of named attributes.

+
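A small plain-Python sketch of what the schema adds. The helper names and sample rows are hypothetical, and this is not the DataFrame API: once every row is known to carry the same attributes, a column can be addressed by name instead of treating each row as an opaque blob.

```python
def has_schema(rows, schema):
    """True if every row has exactly the attribute names listed in schema."""
    expected = set(schema)
    return all(set(row) == expected for row in rows)


def select(rows, *columns):
    """Project each row onto the named columns (safe only if has_schema holds)."""
    return [tuple(row[c] for c in columns) for row in rows]


schema = ("name", "score")
table = [{"name": "a", "score": 1}, {"name": "b", "score": 2}]
ragged = table + [{"name": "c"}]  # last row is missing "score"

names = select(table, "name")
```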
+ +
+

Demo

+
+
+
+ +
+ + + + + + + +
diff --git a/slides/talks/2018-3-DBSpark/ubodin.css b/slides/talks/2018-3-DBSpark/ubodin.css
new file mode 100644
index 00000000..379adcd9
--- /dev/null
+++ b/slides/talks/2018-3-DBSpark/ubodin.css
@@ -0,0 +1,369 @@
+@font-face {
+  font-family: 'News Cycle';
+  font-style: normal;
+  font-weight: 400;
+  src: local('News Cycle'), local('NewsCycle'), url(../reveal.js-3.1.0/fonts/9Xe8dq6pQDsPyVH2D3tMQsDdSZkkecOE1hvV7ZHvhyU.ttf) format('truetype');
+}
+@font-face {
+  font-family: 'News Cycle';
+  font-style: normal;
+  font-weight: 700;
+  src: local('News Cycle Bold'), local('NewsCycle-Bold'), url(../reveal.js-3.1.0/fonts/G28Ny31cr5orMqEQy6ljt8BaWKZ57bY3RXgXH6dOjZ0.ttf) format('truetype');
+}
+@font-face {
+  font-family: 'Lato';
+  font-style: normal;
+  font-weight: 400;
+  src: local('Lato Regular'), local('Lato-Regular'), url(../reveal.js-3.1.0/fonts/1EqTbJWOZQBfhZ0e3RL9uvesZW2xOQ-xsNqO47m55DA.ttf) format('truetype');
+}
+@font-face {
+  font-family: 'Lato';
+  font-style: normal;
+  font-weight: 700;
+  src: local('Lato Bold'), local('Lato-Bold'), url(../reveal.js-3.1.0/fonts/MZ1aViPqjfvZwVD_tzjjkwLUuEpTyoUstqEm5AMlJo4.ttf) format('truetype');
+}
+@font-face {
+  font-family: 'Lato';
+  font-style: italic;
+  font-weight: 400;
+  src: local('Lato Italic'), local('Lato-Italic'), url(../reveal.js-3.1.0/fonts/61V2bQZoWB5DkWAUJStypevvDin1pK8aKteLpeZ5c0A.ttf) format('truetype');
+}
+@font-face {
+  font-family: 'Lato';
+  font-style: italic;
+  font-weight: 700;
+  src: local('Lato Bold Italic'), local('Lato-BoldItalic'), url(../reveal.js-3.1.0/fonts/HkF_qI1x_noxlxhrhMQYECZ2oysoEQEeKwjgmXLRnTc.ttf) format('truetype');
+}
+
+
+
+/**@import url(https://fonts.googleapis.com/css?family=News+Cycle:400,700);
+@import url(https://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic);
+**/
+/**
+ * A simple theme for reveal.js presentations, similar
+ * to the default theme. The accent color is darkblue.
+ *
+ * This theme is Copyright (C) 2012 Owen Versteeg, https://github.com/StereotypicalApps. It is MIT licensed.
+ * reveal.js is Copyright (C) 2011-2012 Hakim El Hattab, http://hakim.se
+ */
+/*********************************************
+ * GLOBAL STYLES
+ *********************************************/
+body {
+  background: #fff;
+  background-color: #fff; }
+
+.reveal {
+  font-family: 'Lato', sans-serif;
+  font-size: 36px;
+  font-weight: normal;
+  color: #000; }
+
+::selection {
+  color: #fff;
+  background: rgba(0, 0, 0, 0.99);
+  text-shadow: none; }
+
+.reveal .slides > section, .reveal .slides > section > section {
+  line-height: 1.3;
+  font-weight: inherit; }
+
+/*********************************************
+ * STATIC HEADER/FOOTER
+ *********************************************/
+
+.reveal .header {
+  position: absolute;
+  top: 0px;
+  left: 0px;
+  right: 0px;
+  height: 25px;
+  text-align: center;
+  padding-left: 15px;
+  padding-right: 15px;
+  padding-bottom: 10px;
+  padding-top: 15px;
+  background-color: #041a9b;
+  color: white;
+  font-size: 0.5em;
+  z-index: 100;
+}
+.reveal .footer {
+  position: absolute;
+  bottom: 0px;
+  left: 0px;
+  right: 0px;
+  height: 40px;
+  text-align: center;
+  padding-left: 15px;
+  padding-right: 15px;
+  padding-bottom: 10px;
+  padding-top: 20px;
+  background-color: #041a9b;
+  color: white;
+  font-size: 0.5em;
+  z-index: 100;
+}
+
+
+/*********************************************
+ * HEADERS
+ *********************************************/
+.reveal h1, .reveal h2, .reveal h3, .reveal h4, .reveal h5, .reveal h6 {
+  margin: 0 0 20px 0;
+  color: #000;
+  font-family: 'News Cycle', Impact, sans-serif;
+  font-weight: normal;
+  line-height: 1.2;
+  letter-spacing: normal;
+  text-transform: none;
+  text-shadow: none;
+  word-wrap: break-word; }
+
+.reveal h1 {
+  font-size: 3.77em; }
+
+.reveal h2 {
+  font-size: 2.11em; }
+
+.reveal h3 {
+  font-size: 1.55em; }
+
+.reveal h4 {
+  font-size: 1em; }
+
+.reveal h1 {
+  text-shadow: none; }
+
+/*********************************************
+ * OTHER
+ *********************************************/
+.reveal p {
+  margin: 20px 0;
+  line-height: 1.3; }
+
+.reveal imagecredits {
+  font-size: 12pt;
+  position: absolute;
+  right: -10px;
+  bottom: -10px;
+  text-align: right;
+}
+.reveal citation {
+  font-size: 12pt;
+  position: absolute;
+  right: -10px;
+  bottom: -10px;
+  text-align: right;
+}
+
+/* Ensure certain elements are never larger than the slide itself */
+.reveal img, .reveal video, .reveal iframe {
+  max-width: 95%;
+  max-height: 95%; }
+
+.reveal strong, .reveal b {
+  font-weight: bold; }
+
+.reveal em {
+  font-style: italic; }
+
+.reveal ol, .reveal dl, .reveal ul {
+  display: inline-block;
+  text-align: left;
+  margin: 0 0 0 1em; }
+
+.reveal ol {
+  list-style-type: decimal; }
+
+.reveal ul {
+  list-style-type: disc; }
+
+.reveal ul > li {
+  margin-top: 20px; }
+
+.reveal ul ul {
+  list-style-type: square; }
+
+.reveal ul ul ul {
+  list-style-type: circle; }
+
+.reveal ul ul, .reveal ul ol, .reveal ol ol, .reveal ol ul {
+  display: block;
+  margin-left: 40px; }
+
+.reveal dt {
+  font-weight: bold; }
+
+.reveal dd {
+  margin-left: 40px; }
+
+.reveal q, .reveal blockquote {
+  quotes: none; }
+
+.reveal blockquote {
+  display: block;
+  position: relative;
+  width: 70%;
+  margin: 20px auto;
+  padding: 5px;
+  font-style: italic;
+  background: rgba(255, 255, 255, 0.05);
+  box-shadow: 0px 0px 2px rgba(0, 0, 0, 0.2); }
+
+.reveal blockquote p:first-child, .reveal blockquote p:last-child {
+  display: inline-block; }
+
+.reveal q {
+  font-style: italic; }
+
+.reveal pre {
+  display: block;
+  position: relative;
+  width: 90%;
+  margin: 20px auto;
+  text-align: left;
+  font-size: 0.55em;
+  font-family: monospace;
+  line-height: 1.2em;
+  word-wrap: break-word;
+  box-shadow: 0px 0px 6px rgba(0, 0, 0, 0.3); }
+
+.reveal code {
+  font-family: monospace;
+}
+
+.reveal pre code {
+  display: block;
+  padding: 5px;
+  overflow: auto;
+  max-height: 400px;
+  word-wrap: normal;
+  background: #3F3F3F;
+  color: #DCDCDC; }
+
+.reveal table {
+  margin: auto;
+  border-collapse: collapse;
+  border-spacing: 0; }
+
+.reveal table th {
+  font-weight: bold;
+  border-bottom: 1px solid; }
+
+.reveal table th, .reveal table td {
+  text-align: center;
+  padding: 0.2em 0.5em 0.2em 0.5em;}
+
+.reveal table th[align="left"], .reveal table td[align="left"] {
+  text-align: left; }
+
+.reveal table th[align="right"], .reveal table td[align="right"] {
+  text-align: right; }
+
+.reveal table tr:last-child td {
+  border-bottom: none; }
+
+.reveal sup {
+  vertical-align: super; }
+
+.reveal sub {
+  vertical-align: sub; }
+
+.reveal small {
+  display: inline-block;
+  font-size: 0.6em;
+  line-height: 1.2em;
+  vertical-align: top; }
+
+.reveal small * {
+  vertical-align: top; }
+
+/*********************************************
+ * LINKS
+ *********************************************/
+.reveal a {
+  color: #00008B;
+  text-decoration: none;
+  -webkit-transition: color 0.15s ease;
+  -moz-transition: color 0.15s ease;
+  transition: color 0.15s ease; }
+
+.reveal a:hover {
+  color: #0000f1;
+  text-shadow: none;
+  border: none; }
+
+.reveal .roll span:after {
+  color: #fff;
+  background: #00003f; }
+
+/*********************************************
+ * IMAGES
+ *********************************************/
+.reveal section img {
+  margin: 15px 0px;
+  background: rgba(255, 255, 255, 0.12);
+}
+
+.reveal section img.bordered
+{
+  border: 4px solid #000;
+  box-shadow: 0 0 10px rgba(0, 0, 0, 0.15);
+}
+
+.reveal a img {
+  -webkit-transition: all 0.15s linear;
+  -moz-transition: all 0.15s linear;
+  transition: all 0.15s linear; }
+
+.reveal a:hover img {
+  background: rgba(255, 255, 255, 0.2);
+  border-color: #00008B;
+  box-shadow: 0 0 20px rgba(0, 0, 0, 0.55); }
+
+/*********************************************
+ * NAVIGATION CONTROLS
+ *********************************************/
+.reveal .controls div.navigate-left, .reveal .controls div.navigate-left.enabled {
+  border-right-color: #00008B; }
+
+.reveal .controls div.navigate-right, .reveal .controls div.navigate-right.enabled {
+  border-left-color: #00008B; }
+
+.reveal .controls div.navigate-up, .reveal .controls div.navigate-up.enabled {
+  border-bottom-color: #00008B; }
+
+.reveal .controls div.navigate-down, .reveal .controls div.navigate-down.enabled {
+  border-top-color: #00008B; }
+
+.reveal .controls div.navigate-left.enabled:hover {
+  border-right-color: #0000f1; }
+
+.reveal .controls div.navigate-right.enabled:hover {
+  border-left-color: #0000f1; }
+
+.reveal .controls div.navigate-up.enabled:hover {
+  border-bottom-color: #0000f1; }
+
+.reveal .controls div.navigate-down.enabled:hover {
+  border-top-color: #0000f1; }
+
+/*********************************************
+ * PROGRESS BAR
+ *********************************************/
+.reveal .progress {
+  background: rgba(0, 0, 0, 0.2); }
+
+.reveal .progress span {
+  background: #00008B;
+  -webkit-transition: width 800ms cubic-bezier(0.26, 0.86, 0.44, 0.985);
+  -moz-transition: width 800ms cubic-bezier(0.26, 0.86, 0.44, 0.985);
+  transition: width 800ms cubic-bezier(0.26, 0.86, 0.44, 0.985); }
+
+/*********************************************
+ * SLIDE NUMBER
+ *********************************************/
+.reveal .slide-number {
+  color: #00008B; }