documentation/publicity/VizierPublicitySummary.txt

The ubiquity of data has greatly impacted our personal and professional lives. Sensors are everywhere: from smartphones to smartwatches, smart homes, smart roads, and even smart factories, they monitor a significant portion of our daily lives. Open government regulations put statistics like health code violations and legislative decision-making within reach of the average person. Today, this data is used by doctors, sociologists, business owners, and even ordinary citizens trying to improve their communities.
Unfortunately, using such data to answer simple questions like “Where do police issue the most traffic tickets?” or “What am I doing when my heart rate goes over 90 bpm?” is still hard. The data might be available, but that does not mean it is “fit for use”: it may exhibit errors, inconsistencies, and other data quality problems. For example, your smartwatch might report a heart rate of 9999 when it has no data (say, because you took the watch off). Data errors like this are everywhere and need to be resolved to ensure that analysis results are correct. In corporate settings, analysts spend days, weeks, or even months “cleaning” their data before asking a single question.
As part of a joint effort between the University at Buffalo, the Illinois Institute of Technology, and New York University, researchers are working to make data cleaning and wrangling easier. The newly formed Vizier project will streamline the data curation process, making it easier and faster to explore and analyze raw data. Vizier provides data scientists with an end-to-end solution that guides them through all stages of data cleaning and analysis. The three-university coalition's efforts begin with a $2.7 million grant from the NSF to develop new software for data exploration, cleaning, curation, and visualization.