Sustainable Data On-boarding
Images from OpenClipArt
50-80% of a Data Scientist's time is spent on curation
- NY Times
https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
Bad Data Quality costs $3.1 Trillion per year in the US alone
- IBM
https://www.ibmbigdatahub.com/infographic/four-vs-big-data
NYS Open Data Portal
NYS Open Data Portal: Causes of Death in NYC 2008-2014
NYS Open Data Portal: Causes of Death in NYC 2008-2016
VizierDB
Explore
Validate
Curate
Audit
Reuse
Vizier Tracks Data Bugs
Recoverable Errors
Automatic Data Warnings
User-Provided Unit Tests
Error Alerts in the Data
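To make the idea of a user-provided unit test on data concrete, here is a minimal sketch in plain Python. It does not use Vizier's API. The sample table, its column names, and the check_deaths_column helper are all hypothetical, chosen only to resemble a causes-of-death dataset. The check flags missing, non-numeric, or negative counts and reports them per row, which is roughly the kind of signal an in-data error alert could surface.

```python
import csv
import io

# Hypothetical sample standing in for a causes-of-death table.
# Column names and values are made up for illustration.
SAMPLE_CSV = """year,cause,deaths
2008,Heart Disease,120
2009,Heart Disease,
2010,Influenza,-5
2011,Influenza,42
"""


def check_deaths_column(rows):
    """Return (row_number, message) warnings for the 'deaths' column."""
    warnings = []
    for i, row in enumerate(rows, start=1):
        value = row["deaths"].strip()
        if value == "":
            warnings.append((i, "missing death count"))
            continue
        try:
            count = int(value)
        except ValueError:
            warnings.append((i, f"non-numeric death count: {value!r}"))
            continue
        if count < 0:
            warnings.append((i, f"negative death count: {count}"))
    return warnings


# Parse the sample and run the check; row numbers count data rows from 1.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
warnings = check_deaths_column(rows)
for row_number, message in warnings:
    print(f"row {row_number}: {message}")
```

A check like this can be rerun whenever the dataset is refreshed, so a newly introduced problem shows up as a warning instead of silently skewing later analysis.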
Fixing Errors
Python
SQL
Scala
Automatic Suggestions
Spreadsheets
Spreadsheet View
Easily Fix One-Off Errors
Edits are tracked and versioned to make auditing and debugging easy
Available Now
Cloud Deployment
On-Prem with Support
Workflow Appliance
VizierDB
Explore
Validate
Curate
Audit
Reuse
https://vizierdb.info
Demo