Sustainable Data On-boarding

Images from OpenClipArt

50-80% of a Data Scientist's time is spent on curation

- NY Times

https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html

Bad data quality costs $3.1 trillion per year in the US alone

- IBM

https://www.ibmbigdatahub.com/infographic/four-vs-big-data

NYS Open Data Portal

NYS Open Data Portal: Causes of Death in NYC 2008-2014
NYS Open Data Portal: Causes of Death in NYC 2008-2016

VizierDB
  • Explore
  • Validate
  • Curate
  • Audit
  • Reuse

Vizier Tracks Data Bugs

  • Recoverable Errors
  • Automatic Data Warnings
  • User-Provided Unit Tests
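
To make the last bullet concrete, a user-provided unit test over a dataset can be as simple as a predicate applied to every row, with the failing rows reported back. The sketch below is plain Python and does not use VizierDB's API; the helper `check_rows`, the column names `year` and `deaths`, and the sample rows are all made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def check_rows(rows: List[Row], test: Callable[[Row], bool]) -> List[Tuple[int, Row]]:
    """Return (index, row) pairs for every row that fails the test."""
    return [(i, row) for i, row in enumerate(rows) if not test(row)]


def year_in_range(row: Row) -> bool:
    # Hypothetical expectation: the dataset covers 2008 through 2014.
    year = row.get("year")
    return isinstance(year, int) and 2008 <= year <= 2014


def deaths_non_negative(row: Row) -> bool:
    # Hypothetical expectation: counts are present and never negative.
    deaths = row.get("deaths")
    return isinstance(deaths, int) and deaths >= 0


# Toy rows invented for this example (not real data).
rows = [
    {"year": 2008, "deaths": 120},
    {"year": 2015, "deaths": 95},    # year outside the expected range
    {"year": 2010, "deaths": None},  # missing count
]

year_failures = check_rows(rows, year_in_range)
deaths_failures = check_rows(rows, deaths_non_negative)

for index, row in year_failures + deaths_failures:
    print(f"row {index} failed a check: {row}")
```

Returning the failing rows, rather than a single pass/fail flag, keeps each problem tied to the data it came from, which is the kind of information an error alert needs.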

Error Alerts in the Data

Fixing Errors

  • Python
  • SQL
  • Scala
  • Automatic Suggestions
  • Spreadsheets

Spreadsheet View

Easily Fix One-Off Errors

Edits are tracked and versioned to make auditing and debugging easy

Available Now

  • Cloud Deployment
  • On-Prem with Support
  • Workflow Appliance

VizierDB
  • Explore
  • Validate
  • Curate
  • Audit
  • Reuse

https://vizierdb.info

Demo