{\rtf1\ansi\ansicpg1252\cocoartf1404\cocoasubrtf340
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\b\fs24 \cf0 Dynamic Relational Views for In-Situ Hierarchical Data
\b0 \
\
The Mimir project has been working to limit the need for upfront data wrangling by enabling an \'91On-Demand\'92 form of data curation that lets analysts defer data wrangling costs until they are actually required. We introduced \'91lenses\'92, relational operators that each apply a data cleaning heuristic (e.g., data interpolation) but require little to no
\i upfront
\i0 configuration or validation. Instead of asking the user to perform data wrangling tasks manually before analyzing her data, the Mimir system makes a best-effort heuristic guess. Mimir annotates the output of lenses with provenance markers that track which query results are affected by these guesses and how large an effect each guess has. \
\
Our initial efforts focused on the data cleaning aspects of curation: repairing missing values, data interpolation, and related issues. Our next target is in-situ query processing over nested data models such as JSON and XML and, if time permits, graph data. The primary challenges in this setting are identifying a relevant set of relations to extract from each JSON/XML object, assigning appropriate schemas to these relations, and unifying data across JSON/XML documents from different sources. Useful signals include existing relationships in the data (parent/child relationships, edges) and information drawn from queries, which assume specific schemas for the source data. Our goal is to create a 360-degree lens that implements \'91best-guess zero-configuration\'92 query evaluation over nested data models. }