---
title: Mimir
acronym: Modular Interface for Managing Incomplete Records
---
<a href="https://github.com/UBOdin/mimir"><img style="position: absolute; top: 30; right: 0; border: 0;" class="img-responsive" src="https://camo.githubusercontent.com/a6677b08c955af8400f44c6298f40e7d19cc5b2d/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f72696768745f677261795f3664366436642e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_right_gray_6d6d6d.png"></a>
<center style="margin-top:100px; margin-bottom:100px">
<img src="../../assets/logos/mimir_logo_final.png" class="img-responsive" alt="mimir_logo_final" style="margin-bottom: 80px;" width="809" height="321" id="pageTop"/>
</center>
<div class="jumbotron">
<h1>Don't Wrangle, Guess</h1>
<p>One of the biggest costs in analytics is data wrangling: getting your messy, mislabeled, disorganized data together so you can actually ask your questions. Most data wrangling tools force you to do all of this work upfront, before you even know what you want to do with the data. Mimir lets you get at your data sooner by tracking your cleaning to-dos. Ask first, clean later, with Mimir.</p>
<p style="text-align: right"><a class="btn btn-primary btn-lg" href="https://github.com/UBOdin/mimir/blob/master/README.md#quick-start" role="button">Get Mimir</a></p>
</div>
<p class="lead">Mimir is about getting you to your analysis as fast as possible. It lets you harness the raw power of SQL, StackOverflow's <a href="https://insights.stackoverflow.com/survey/2016#technology">second-most popular</a> language for four years running. Mimir then adds a ton of powerful SQL extensions designed to make dealing with messy data easier:
<div class="row show-grid">
<div class="col-sm-6 col-md-4">
<div class="thumbnail">
<a href="screenshots/cli_load.png" class="thumbnail">
<img src="screenshots/cli_load.png" alt="LOAD" />
</a>
<div class="caption">
<h3>LOAD</h3>
<p>Stop messing with data import and relational schema design. The versatile <a href="https://github.com/UBOdin/mimir/wiki/Mimir-SQL#load">LOAD</a> command allows you to quickly transform documents into relational tables without the muss and fuss of upfront schema design or defining complex transformation operators.</p>
</div>
</div>
</div>
<div class="col-sm-6 col-md-4">
<div class="thumbnail">
<a href="screenshots/cli_plot.png" class="thumbnail">
<img src="screenshots/cli_plot.png" alt="PLOT"/>
</a>
<div class="caption">
<h3>PLOT</h3>
<p>Stop writing messy scripts to visualize your data. The (soon™ to be released) <a href="https://github.com/UBOdin/mimir/wiki/Mimir-SQL#plot">PLOT</a> command lets you take SQL queries and see their results directly: notebook style, as PDF/PNG, or in JavaScript. Take your pick. Mimir even keeps track of unknowns in your data.</p>
</div>
</div>
</div>
<div class="col-sm-6 col-md-4">
<div class="thumbnail">
<a href="screenshots/cli_analyze.png" class="thumbnail">
<img src="screenshots/cli_analyze.png" alt="ANALYZE"/>
</a>
<div class="caption">
<h3>ANALYZE</h3>
<p>Mimir keeps track of your wrangling to-dos, marking query results that might have errors. When you need to be more precise, the <a href="https://github.com/UBOdin/mimir/wiki/Mimir-SQL#analyze">ANALYZE</a> command zeroes in on the specific wrangling you need <b>right now</b>.</p>
</div>
</div>
</div>
</div>
</p>
<p class="lead">Unlike most other SQL-based systems, Mimir lets you make decisions during and after data exploration. All of Mimir's functionality is based on three ideas: (1) Mimir provides sensible best-guess defaults, (2) Mimir warns you when one of its guesses is going to affect what it's telling you, and (3) Mimir lets you easily inspect what it's doing to your data with <a href="https://github.com/UBOdin/mimir/wiki/Mimir-SQL#analyze">ANALYZE</a>.</p>
<p class="lead">Better still, you don't need any new infrastructure. Mimir attaches to ordinary relational databases through JDBC (we currently support SQLite, with SparkSQL and Oracle support in progress). If you don't care which database you use, Mimir just puts everything in a super portable SQLite database by default.</p>
<hr/>
<h2 id="how">Documentation</h2>
<div class="container-fluid">
<div class="row">
<div class="col-md-4">
<center><h4>If you want to use Mimir...</h4>
<h5><a href="https://github.com/UBOdin/mimir/blob/master/README.md#quick-start">Get Mimir</a></h5>
<h5><a href="whitepaper.html">5 minute overview</a></h5>
<h5><a href="https://github.com/UBOdin/mimir/wiki/Mimir-SQL">Mimir SQL</a></h5>
<h5><a href="https://github.com/UBOdin/mimir/wiki/Lenses-and-Adaptive-Schemas">Mimir's Lenses</a></h5>
</center>
</div>
<div class="col-md-4">
<center><h4>If you're having problems...</h4>
<h5><a href="https://github.com/UBOdin/mimir/issues">Issue Tracker</a></h5>
</center>
</div>
<div class="col-md-4">
<center><h4>If you want to hack on Mimir...</h4>
<h5><a href="https://github.com/UBOdin/mimir/wiki/Development">Setting Up a Dev Environment</a></h5>
<h5><a href="https://github.com/UBOdin/mimir/wiki/Concepts">Conceptual Introduction to Mimir</a></h5>
<h5><a href="https://github.com/UBOdin/mimir/wiki/Web%20Interface">Conceptual Introduction to the UI</a></h5>
<h5><a href="http://doc.odin.cse.buffalo.edu/mimir">ScalaDocs</a></h5>
<h5><a href="https://github.com/UBOdin/mimir/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20label%3A%22ramp-up%20project%22%20no%3Aassignee%20-label%3A%22Claimed%22%20-label%3A%22Awaiting%20Merge%22%20">Easy Projects to Start With</a></h5>
</center>
</div>
</div>
</div>
<hr/>
<h2 id="who">Who Are We?</h2>
<dl>
<dt>The Team</dt>
<dd><%=
(["Mike Brachman"] + LabMetadata::members_on_project("mimir")).map { |m| LabMetadata::link_for(m) }.join(", ") %></dd>
<dt>Research Advisors</dt>
<dd>Oliver Kennedy, Boris Glavic</dd>
<dt>Industry Advisors</dt>
<dd>Ronny Fehling (Airbus), Dieter Gawlick (Oracle), Zhen Hua Liu (Oracle), Beda Hammerschmidt (Oracle)</dd>
<dt>Alumni</dt>
<dd><%= LabMetadata::alumni_on_project("mimir").map { |m| LabMetadata::link_for(m) }.join(", ") %></dd>
</dl>
<p><i>Mimir is supported by gifts from Oracle, as well as grants from the NSF and the Naval Postgraduate School.</i></p>
<hr/>
<h2>Presentations</h2>
<div class="presentation"><a href="https://www.youtube.com/watch?v=jow4JmDOxPs">Video Demo</a> (2015)</div>
<div class="presentation"><a href="https://odin.cse.buffalo.edu/slides/talks/2015-2-Mimir">Overview Slides</a> (2015)</div>
<div class="presentation"><a href="{{rootPath}}rants/2015-08-13-incorrect-dbs.html">Rant: What if Databases Could Answer Incorrectly</a> (2015)</div>
<hr/>
<h2>Publications</h2>
<%= LabMetadata.render_pubs(
$db["publications"].
where { |pub| pub.fetch("projects", []).include? "mimir" }.
where { |pub| %w[conference journal workshop].include?(LabMetadata::complete_venue(pub)["type"]) }.
sort { |a, b| b["year"].to_i <=> a["year"].to_i }.
take(10)) %>