2015 - 2016
2013 - 2014
- 2014 - Stanford - Mining of Massive Datasets.
- 2013 - AMPLab - Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices.
- 2013 - AMPLab - MLbase: A Distributed Machine-learning System.
- 2013 - AMPLab - Shark: SQL and Rich Analytics at Scale.
- 2013 - AMPLab - GraphX: A Resilient Distributed Graph System on Spark.
- 2013 - Google - HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm.
- 2013 - Microsoft - Scalable Progressive Analytics on Big Data in the Cloud.
- 2013 - Metamarkets - Druid: A Real-time Analytical Data Store.
- 2013 - Google - Online, Asynchronous Schema Change in F1.
- 2013 - Google - F1: A Distributed SQL Database That Scales.
- 2013 - Google - MillWheel: Fault-Tolerant Stream Processing at Internet Scale.
- 2013 - Facebook - Scuba: Diving into Data at Facebook.
- 2013 - Facebook - Unicorn: A System for Searching the Social Graph.
- 2013 - Facebook - Scaling Memcache at Facebook.
2011 - 2012
- 2012 - Twitter - The Unified Logging Infrastructure for Data Analytics at Twitter.
- 2012 - AMPLab - Blink and It’s Done: Interactive Queries on Very Large Data.
- 2012 - AMPLab - Fast and Interactive Analytics over Hadoop Data with Spark.
- 2012 - AMPLab - Shark: Fast Data Analysis Using Coarse-grained Distributed Memory.
- 2012 - Microsoft - Paxos Replicated State Machines as the Basis of a High-Performance Data Store.
- 2012 - Microsoft - Paxos Made Parallel.
- 2012 - AMPLab - BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data.
- 2012 - Google - Processing a Trillion Cells per Mouse Click.
- 2012 - Google - Spanner: Google’s Globally-Distributed Database.
- 2011 - AMPLab - Scarlett: Coping with Skewed Popularity Content in MapReduce Clusters.
- 2011 - AMPLab - Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center.
- 2011 - Google - Megastore: Providing Scalable, Highly Available Storage for Interactive Services.
2001 - 2010
- 2010 - Facebook - Finding a Needle in Haystack: Facebook’s Photo Storage.
- 2010 - AMPLab - Spark: Cluster Computing with Working Sets.
- 2010 - Google - Pregel: A System for Large-Scale Graph Processing.
- 2010 - Google - Large-scale Incremental Processing Using Distributed Transactions and Notifications (the basis of Percolator and Caffeine).
- 2010 - Google - Dremel: Interactive Analysis of Web-Scale Datasets.
- 2010 - Yahoo - S4: Distributed Stream Computing Platform.
- 2009 - Yale - HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
- 2008 - AMPLab - Chukwa: A Large-Scale Monitoring System.
- 2007 - Amazon - Dynamo: Amazon’s Highly Available Key-value Store.
- 2006 - Google - The Chubby Lock Service for Loosely-Coupled Distributed Systems.
- 2006 - Google - Bigtable: A Distributed Storage System for Structured Data.
- 2004 - Google - MapReduce: Simplified Data Processing on Large Clusters.
- 2003 - Google - The Google File System.
Resource