Big Data Papers

2014 - Stanford - Mining of Massive Datasets.
2013 - AMPLab - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
2013 - AMPLab - MLbase: A Distributed Machine-learning System.
2013 - AMPLab - Shark: SQL and Rich Analytics at Scale.
2013 - AMPLab - GraphX: A Resilient Distributed Graph System on Spark.
2013 - Google - HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm.
2013 - Microsoft - Scalable Progressive Analytics on Big Data in the Cloud.
2013 - Metamarkets - Druid: A Real-time Analytical Data Store.
2013 - Google - Online, Asynchronous Schema Change in F1.
2013 - Google - F1: A Distributed SQL Database That Scales.
2013 - Google - MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
2013 - Facebook - Scuba: Diving into Data at Facebook.
2013 - Facebook - Unicorn: A System for Searching the Social Graph.
2013 - Facebook - Scaling Memcache at Facebook.

2012 - Twitter - The Unified Logging Infrastructure for Data Analytics at Twitter.
2012 - AMPLab - Blink and It’s Done: Interactive Queries on Very Large Data.
2012 - AMPLab - Fast and Interactive Analytics over Hadoop Data with Spark.
2012 - AMPLab - Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
2012 - Microsoft - Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
2012 - Microsoft - Paxos Made Parallel.
2012 - AMPLab - BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
2012 - Google - Processing a trillion cells per mouse click.
2012 - Google - Spanner: Google’s Globally-Distributed Database.
2011 - AMPLab - Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters.
2011 - AMPLab - Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
2011 - Google - Megastore: Providing Scalable, Highly Available Storage for Interactive Services.

2010 - Facebook - Finding a needle in Haystack: Facebook’s photo storage.
2010 - AMPLab - Spark: Cluster Computing with Working Sets.
2010 - Google - Pregel: A System for Large-Scale Graph Processing.
2010 - Google - Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
2010 - Google - Dremel: Interactive Analysis of Web-Scale Datasets.
2010 - Yahoo - S4: Distributed Stream Computing Platform.
2009 - Yale - HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
2008 - AMPLab - Chukwa: A large-scale monitoring system.
2007 - Amazon - Dynamo: Amazon’s Highly Available Key-value Store.
2006 - Google - The Chubby lock service for loosely-coupled distributed systems.
2006 - Google - Bigtable: A Distributed Storage System for Structured Data.
2004 - Google - MapReduce: Simplified Data Processing on Large Clusters.
2003 - Google - The Google File System.

Resource