A few publicity pieces (including a backup copy of the Google Doc)

master
Oliver Kennedy 2017-01-06 10:23:41 -05:00
parent 48548657a1
commit 0dc202f103
2 changed files with 19 additions and 0 deletions

View File

@@ -0,0 +1,12 @@
For years, companies like Google, Microsoft, and Apple have taken advantage of the power of Big Data. That same power is slowly being democratized. In many countries, including the US, government agencies have public disclosure requirements, and these disclosures are increasingly being published on the web. Open Data Portals for NYC (https://nycopendata.socrata.com/), Boston (https://www.boston.gov/), NYS (https://data.ny.gov/), and the federal government's Data.Gov (https://www.data.gov/) make it possible for anyone to download public data and ask questions about their government.
In New York City, ordinary individuals have used this information to show evidence of police bias in writing parking tickets (https://www.inverse.com/article/15564-how-new-york-city-s-open-data-revealed-the-nypd-was-issuing-illegal-parking-tickets). Groups like New York University's Center for Urban Science and Progress (http://cusp.nyu.edu/) are using it to better understand effects of gentrification, such as changes in the availability of taxi service.
Unfortunately, a lot of that data is messy. Data entry errors, GPS units without signal, and malfunctioning computers all contribute outliers that could seriously mislead someone who tried to use the dataset. For example, a study on NYC taxi data (https://pdfs.semanticscholar.org/a20d/44cb70c3321df06ebc89a5273302c403d341.pdf) discovered errors like negative fares and a 16-million-mile trip, any of which could lead someone asking questions about the data to badly misleading conclusions.
A $2.7 million grant awarded jointly to the University at Buffalo, New York University, and the Illinois Institute of Technology aims to change all that. The universities will join forces to design a new tool called Vizier that will help users understand data errors as they explore and ask questions. In Vizier, users will be able to work with large datasets in the same way that they'd work with a spreadsheet, quickly creating visualizations, deriving results, and correcting errors. However, Vizier also pays attention: "Our goal is to create a tool that lets you work with the data you have, but that unobtrusively makes helpful observations like 'Hmm... have you noticed that two out of a million records make a 10% difference in this average?'" says Oliver Kennedy, Lead PI on the grant. If successful, Vizier will make it easier than ever for ordinary people to work with open data.
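The quoted effect is easy to see with a small back-of-the-envelope sketch. The Python snippet below is purely illustrative, not Vizier code, and every number in it is invented: it averages a million synthetic trip distances and shows how just two wildly out-of-range records shift that average by roughly ten percent.

```python
# Illustrative sketch only (not Vizier code); all values are made up.
import random

random.seed(0)
# One million plausible taxi trips, each between 0.5 and 10 miles.
trips = [random.uniform(0.5, 10.0) for _ in range(1_000_000)]
# Two hypothetical data-entry errors with absurdly large mileages.
errors = [275_000.0, 275_000.0]

clean_avg = sum(trips) / len(trips)
dirty_avg = sum(trips + errors) / (len(trips) + len(errors))

print(f"average without the bad records: {clean_avg:.2f} miles")
print(f"average with the bad records:    {dirty_avg:.2f} miles")
print(f"relative change: {100 * (dirty_avg - clean_avg) / clean_avg:.0f}%")
```

Two records out of a million are enough to move the average by about 10%, which is exactly the kind of quiet observation the tool is meant to surface.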

View File

@@ -0,0 +1,7 @@
The ubiquity of data has greatly impacted our personal and professional lives. There are sensors everywhere: from smartphones, to smart watches, to smart homes, smart roads, and even smart factories, sensors monitor a significant portion of our daily lives. Open government regulations put statistics like health code violations and legislative decision-making within reach of the average person. Today, this data is used by doctors, sociologists, business owners, and even ordinary citizens trying to improve their communities.
Unfortunately, using such data to answer simple questions like “Where do police issue the most traffic tickets?” or “What am I doing when my heart rate goes over 90 bpm?” is still hard. The data might be available, but that does not mean it is “fit for use”: it may exhibit errors, inconsistencies, and other data quality problems. For example, your smartwatch might report a heart rate of 9999 when it has no data (for example, if you took your watch off). Data errors like this are everywhere and need to be resolved to ensure that analysis results are correct. In corporate settings, analysts spend days, weeks, or even months “cleaning” their data before asking a single question.
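To make the smartwatch example concrete, here is a minimal, hypothetical Python sketch, not taken from the Vizier project and using invented readings, that shows how a sentinel value like 9999 corrupts a simple average unless it is filtered out first.

```python
# Hypothetical sketch (not Vizier code); the readings below are invented.
readings = [72, 75, 9999, 81, 78, 9999, 90]   # heart-rate samples in bpm

MISSING = 9999                                 # sentinel reported when the sensor has no data
valid = [bpm for bpm in readings if bpm != MISSING]

naive_avg = sum(readings) / len(readings)      # treats 9999 as a real heart rate
clean_avg = sum(valid) / len(valid)            # drops the sentinel values first

print(f"naive average:   {naive_avg:.0f} bpm")   # wildly implausible
print(f"cleaned average: {clean_avg:.0f} bpm")   # a believable resting heart rate
```

The cleaning step here is a single line, but finding out that 9999 means “no data” in the first place is the hard part, and that discovery is what tools like Vizier aim to make routine.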
As part of a joint effort between the University at Buffalo, the Illinois Institute of Technology, and New York University, researchers are working to make data cleaning and wrangling easier. The newly formed Vizier project will streamline the data curation process, making it easier and faster to explore and analyze raw data. Vizier provides data scientists with an end-to-end solution that guides them through all stages of data cleaning and analysis. The three-university coalition's efforts begin with a $2.7 million grant from the NSF to develop new software for data exploration, cleaning, curation, and visualization.