For years, companies like Google, Microsoft, and Apple have taken advantage of the power of Big Data. That same power is slowly being democratized. In many countries, including the US, government agencies have public disclosure requirements, and these disclosures are increasingly being published on the web. Open data portals for NYC (https://nycopendata.socrata.com/), Boston (https://www.boston.gov/), NYS (https://data.ny.gov/), and the federal government's Data.gov (https://www.data.gov/) make it possible for anyone to download public records and ask questions about their government.
In New York City, ordinary individuals have used this information to uncover evidence that police were writing illegal parking tickets (https://www.inverse.com/article/15564-how-new-york-city-s-open-data-revealed-the-nypd-was-issuing-illegal-parking-tickets). Groups like New York University's Center for Urban Science and Progress (http://cusp.nyu.edu/) are using it to better understand the effects of gentrification, such as changes in the availability of taxi service.
Unfortunately, a lot of that data is messy. Data entry errors, GPS units without a signal, and malfunctioning computers all contribute to outliers that could seriously mislead anyone who uses the dataset. For example, a study on NYC taxi data (https://pdfs.semanticscholar.org/a20d/44cb70c3321df06ebc89a5273302c403d341.pdf) discovered errors like negative fares and a 16-million-mile trip, any of which could badly skew the answers to questions asked of the data.
A $2.7 million grant awarded jointly to the University at Buffalo, New York University, and the Illinois Institute of Technology aims to change all that. The universities will join forces to design a new tool called Vizier that will help users understand data errors as they explore and ask questions. In Vizier, users will be able to work with large datasets in the same way that they'd work with a spreadsheet, quickly creating visualizations, deriving results, and correcting errors. However, Vizier also pays attention: "Our goal is to create a tool that lets you work with the data you have, but that unobtrusively makes helpful observations like 'Hmm... have you noticed that two out of a million records make a 10% difference in this average?'" says Oliver Kennedy, lead PI on the grant. If successful, Vizier will make it easier than ever for ordinary people to work with open data.