New England Database Symposium


New England Database Society

Friday, May 8, 2009

sponsored by Sun Microsystems


NEDS

Mixed Workload Management for Enterprise-Scale Business Intelligence Systems

Umeshwar Dayal
HP Labs

Friday, May 8, 2009, 4PM
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:00 pm, and followed by dinner at 6:00 pm)

Abstract:

Enterprises rely on business intelligence technologies (data integration, data warehousing, data mining, and analytics) to gain an understanding of how their business is performing. The enterprise BI architecture typically consists of a data warehouse that consolidates data from several operational databases and serves a variety of front-end querying, reporting, and analytic tools. Increasingly, as enterprises become more automated, data-driven, and real-time, the BI architecture is evolving to support operational decision making.

Operational BI workloads place stringent performance requirements on enterprise data warehouses and are notoriously difficult to manage, particularly since queries exhibit a huge variance in response times, ranging from fractions of a second to several hours. It is not well understood how effective existing database workload management policies are in the face of such complex, mixed workloads. Factors such as inaccurate cardinality estimates, data skew, and resource contention all make it difficult to predict how queries will behave. Experience has shown that a few "problem" queries can have drastic effects on system performance.

Our goal is to automate the adaptive tuning and management of complex, mixed workloads on enterprise-scale data warehouses. There are many challenges in doing this. The first challenge is to estimate accurately how long a query will take and what resources it will consume. Second, we must have effective strategies for admission control (which queries should be allowed to run), scheduling (which queries to run and when), and execution control (what to do when a problem query is detected). Third, we must be able to evaluate the performance of these strategies under different conditions, so we can design a system that automatically and dynamically selects the best policies to use.

This talk will describe the approaches we are taking at HP Labs to address these challenges and the promising results we have obtained. We use machine learning techniques to predict query execution times and resource usage, and we are developing an experimental framework to understand the impact of existing and emerging workload management policies under different conditions.

Speaker Bio:

Umeshwar Dayal is an HP Fellow in Intelligent Information Management at Hewlett-Packard Laboratories, Palo Alto, California. In this role, he leads research programs in enterprise-scale data warehousing, business intelligence, scalable analytics, and information visualization. Umesh has 30 years of research experience in data management and has made significant contributions to the field, especially in data integration, federated databases, active rule-based systems, query optimization, long-running transactions, business process management, and database design. He has published over 160 research papers and holds over 25 patents. In 2001, he received (with two co-authors) the 10-year best paper award from the International Conference on Very Large Data Bases for his paper on a transactional model for long-running activities.

Prior to joining HP Labs, Umesh was a senior researcher at DEC's Cambridge Research Lab, Chief Scientist at Xerox Advanced Information Technology and Computer Corporation of America, and on the faculty at the University of Texas at Austin. He received his PhD from Harvard University.

Umesh has served on the editorial boards of several international journals, and has chaired and served on the program committees of numerous conferences. Currently, he serves as a General Co-Chair of ICDE 2010 in Long Beach, CA. He has been a member of the Board of the VLDB Endowment, the Board of the International Foundation for Cooperative Information Systems, the Executive Committee of the IEEE Technical Committee on Electronic Commerce, and the Steering Committee of the SIAM International Conference on Data Mining. He is an ACM Fellow.

Maintained by Olga Papaemmanouil olga AT cs.brandeis.edu