---
title: CIDR Recap
projects:
- mimir
author: William Spoth
---
How big is BIG and how fast is FAST? This seemed to be a recurring theme of the
CIDR 2017 conference. A general consensus across many presentations was that
RDBMSs were the kings of scaling to large data twenty years ago but, for some
inexplicable reason, have since become lost to the ever-changing scope of BIG
and FAST. Multiple papers attacked this problem from different directions,
adding to the many tools already on the market for stream processing and
large-scale computation such as Spark, but there seemed to be no silver bullet.
Adding to the theme that big data is too big, keynote talks by Emily Galt and
Sam Madden drove the point home, offering different real-world scenarios and
outlooks on the problem.

To break this theme apart, I'll split the papers into groups and explain the
different outlooks the authors took and how they addressed this common problem.
The first group of papers, *Prioritizing Attention in Analytic Monitoring*,
*The Myria Big Data Management and Analytics System and Cloud Services*,
*Weld: A Common Runtime for High Performance Data Analytics*, *A Database
System with Amnesia*, and *Releasing Cloud Databases from the Chains of
Performance Prediction Models*, focused on the theme that databases are not
keeping pace with the rate at which data is growing. Sam Madden raised the
interesting point that hardware components like the bus are not the bottleneck
here. With advances in big data computing like Apache Spark, it can feel like
RDBMSs are the end of the line, where data goes to die. Each paper addressed
this in its own way: *A Database System with Amnesia*, for example, looked at
throwing out unused data, since most data in an RDBMS gets put in and never
touched again, and with the increasing use of data streams the problem of not
being able to process and store data fast enough is only amplified.
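
To make the "amnesia" idea a bit more concrete, here is a minimal sketch of a
store that simply forgets rows that have not been touched within a retention
window. To be clear, this is not the mechanism from *A Database System with
Amnesia*; the class, its parameters, and the eviction rule are all invented for
illustration, purely to show the flavor of treating cold data as disposable so
the working set stays small and fast.

```python
# Illustrative sketch only: a toy "forgetting" store that evicts rows not
# accessed within a retention window. Not the algorithm from the paper.
import time


class ForgetfulStore:
    def __init__(self, retention_seconds=3600):
        self.retention = retention_seconds
        self.rows = {}         # key -> value
        self.last_access = {}  # key -> timestamp of last read or write

    def put(self, key, value):
        self.rows[key] = value
        self.last_access[key] = time.time()

    def get(self, key):
        if key in self.rows:
            self.last_access[key] = time.time()
            return self.rows[key]
        return None  # the row may have been forgotten

    def forget_cold_rows(self):
        """Drop every row not touched within the retention window."""
        cutoff = time.time() - self.retention
        cold = [k for k, t in self.last_access.items() if t < cutoff]
        for k in cold:
            del self.rows[k]
            del self.last_access[k]
        return len(cold)


if __name__ == "__main__":
    store = ForgetfulStore(retention_seconds=0.1)
    store.put("sensor-1", 42)
    time.sleep(0.2)
    store.put("sensor-2", 7)           # recent, survives eviction
    print(store.forget_cold_rows())    # 1 row forgotten
    print(store.get("sensor-1"))       # None: it is gone
    print(store.get("sensor-2"))       # 7
```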

The second common problem is that even if you can efficiently store and query
your data lakes, humans often lack the ability to write the right queries or
the necessary insight into how the data is formatted. The papers *The Data
Civilizer System*, *Establishing Common Ground with Data Context*, *Adaptive
Schema Databases*, and *Combining Design and Performance in a Data
Visualization Management System* all try to address this problem, but from
slightly different angles. *The Data Civilizer System* and *Adaptive Schema
Databases* look at aiding an analyst in schema and table exploration and at
helping them discover unknown or desired qualities of their data sources. These
papers provide the kind of insight that would otherwise exist only as internal
middleware at large companies; the problem is that big data and messy data
lakes are becoming more and more prevalent for everyone else. Medium-sized
businesses can be buried in data following user surges or new product upgrades,
and government agencies can sit on large amounts of uncleaned sensor and
user-submitted data that they lack the staff or tools to manage.
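
As a rough illustration of the kind of legwork these systems try to automate,
here is a small sketch that infers a loose, best-effort "schema" from a pile of
messy, semi-structured records. This is not the approach taken by *Adaptive
Schema Databases* or *The Data Civilizer System*; the function and the example
rows are hypothetical, and real systems do far more (matching, cleaning,
lineage), but it shows the sort of insight an analyst needs before they can
even write a query.

```python
# Illustrative sketch only: summarize the columns, value types, and missing
# values found in messy, semi-structured records (schema-on-read flavor).
from collections import defaultdict


def infer_schema(records):
    """Return, per column, the observed value types and a count of missing values."""
    types = defaultdict(set)
    nulls = defaultdict(int)
    columns = set()
    for rec in records:
        columns.update(rec)
        for col, val in rec.items():
            if val is None or val == "":
                nulls[col] += 1
            else:
                types[col].add(type(val).__name__)
    return {col: {"types": sorted(types[col]), "nulls": nulls[col]}
            for col in sorted(columns)}


if __name__ == "__main__":
    # Heterogeneous rows like you might pull out of a data lake.
    rows = [
        {"id": 1, "temp": 71.3, "unit": "F"},
        {"id": "2", "temp": None, "unit": "F", "note": "sensor offline"},
        {"id": 3, "temp": 22.1, "unit": "C"},
    ]
    for col, summary in infer_schema(rows).items():
        print(col, summary)
```

Even this trivial pass surfaces the kind of surprises (an `id` column that is
sometimes a string, a `note` column that only occasionally exists) that an
analyst would otherwise trip over mid-query.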

To me, a big takeaway from this conference was that databases need a better way
to handle big data. Databases are the hero big data needs AND the one it
deserves. To get there, databases are going to need to relax their constraints
on rigid schemas and perfect data, which opens up a large amount of research
opportunity along with the realization that there might not currently be a
right answer to this problem. Either way, it should be interesting to see what
sacrifices RDBMSs make to compete with the growing amount of data, and whether
they can apply decades' worth of research to a hot field that is looking for an
answer.