The 42nd IPP Symposium

Yahoo! Cloud Serving Benchmark

Adam Silberstein, Yahoo!

A prominent trend in cloud computing is the explosion in the number of systems targeting data serving. Unlike MapReduce systems (e.g., Hadoop), which target large-scale data analysis, cloud data serving systems support simple query types (insert/update/get/delete). A common use case for such systems is storing user profiles. When a user arrives at, for example, a news website, the site uses profile information to decide what content to present. In isolation this type of query is simple; the challenges are to scale the system to store massive amounts of data, field large numbers of parallel queries, and answer queries with extremely low latency.

At Yahoo! we are very interested in cloud data serving. We reach 500 million unique users per month through a wide variety of applications. By making cloud serving available to our applications, we allow them to innovate at the user-facing layer, and free them from worrying about where to store their data, how to make it durable, and how to scale their solution.

One of the challenges we have at Yahoo!, and in the community at large, is that there are a large number of cloud serving systems, and it is hard to do an apples-to-apples comparison among them. Each advertises performance numbers, but for different workloads and hardware, and each has its own set of features. In this talk I will discuss the open-source Yahoo! Cloud Serving Benchmark (YCSB), which we have developed to facilitate comparison among cloud serving systems. I will describe our core benchmark and present experimental results from four systems: Cassandra, HBase, Yahoo!'s PNUTS, and a sharded MySQL implementation. I will also discuss some of the challenging comparison axes that remain open problems.

Adam Silberstein is a Research Scientist at Yahoo! Research in Santa Clara, CA. His research interests are in the general area of large-scale data management. Specifically, these include online transaction processing, analytics, and bridging the gap between the two, as well as techniques for generating user feeds in social networks. Past research areas include sensor network query processing and XML processing. Prior to joining Yahoo! in 2007, Adam completed his Ph.D. at Duke University.