DATA CENTERS: Profile-Driven Data Management
Stan Zdonik, PI, Dept. of Computer Science, Brown University
Mitch Cherniack, PI, Dept. of Computer Science, Brandeis University
Mike Franklin, PI, Dept. of Computer Science, University of California at Berkeley
Contact Information
Stan Zdonik
Brown University, Dept. of Computer Science, P.O. Box 1910, Providence, RI 02912
Phone: (401) 863-7648
Fax: (401) 863-7657 Email: sbz@cs.brown.edu
URL: http://www.cs.brown.edu/people/sbz/
Mitch Cherniack
Brandeis University, Department of Computer Science
415 South St., Mailstop 018, Waltham, MA, 02454
Phone: (781) 736-2738
Fax: (781) 736-2741 Email: mfc@cs.brandeis.edu
Mike Franklin
University of California at Berkeley, Dept. of Computer Science
687 Soda Hall, Berkeley, CA 94720-1776
Phone: 510-642-1662
Fax: 510-642-5775 Email: franklin@cs.berkeley.edu
WWW PAGE
http://www.cs.brown.edu/research/pddm/
List of Supported Students and Staff
(optional)
Brown: Greg Seidman (PhD student), Ying Xing (PhD student), Nesime Tatbul (PhD student)
Brandeis: Eduardo Galvez (PhD student), David Brooks (MS student)
Berkeley: Mathew Denny (PhD student), Yanlei Diao (PhD student), Danny Tom (MS student)
Project Award Information
-
Award Number: IIS-0086057
-
Duration: 9/30/2000 - 8/30/2005
-
Title: Data Centers: Profile-Driven Data Management
Keywords
network data management, profiles, continuous queries, information utility,
data staging, data management
Project Summary
Networked information is often unmanaged. In a wide-area setting
like the Internet, data sources are autonomous and provide little opportunity
for application-dependent data management. In a mobile environment,
data is often unavailable or out of date. A modern DBMS provides
data management tools (e.g., indices, clustering) that a DBA uses and tunes
based on an overall understanding of the needs of the application
mix. Having a DBA in the network environments mentioned above is
impractical. We therefore introduce the notion of a profile. A
profile is a declarative specification of a user's interests, together with a
specification of the utility that a specific set of objects would provide
to that user. We use profiles to drive data management decisions
in a middleware service that sits between the data sources and the users.
This middleware service uses profiles gathered from the user community
to make decisions about data gathering, data management, and data delivery.
Relevant data management concerns include prestaging, indexing, clustering,
declustering, replication, and precomputation.
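As an illustration only, the notion of a profile described above can be
sketched in Python. The names DataObject and Profile, their fields, and the
additive utility model are assumptions made for this example; they are not
the project's profile language, which is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DataObject:
    """Hypothetical data object: a name plus one descriptive attribute."""
    name: str
    topic: str


@dataclass
class Profile:
    """Hypothetical profile: what a user cares about and how much it is worth."""
    user: str
    interest: Callable[[DataObject], bool]   # which objects the user wants
    utility: Callable[[DataObject], float]   # value of one wanted object

    def utility_of(self, objects: Iterable[DataObject]) -> float:
        # Assumption: the utility of a set is the sum over its wanted members.
        return sum(self.utility(o) for o in objects if self.interest(o))


# Usage: a user interested only in traffic data, each item worth 2.0.
objs = [
    DataObject("a", "traffic"),
    DataObject("b", "sports"),
    DataObject("c", "traffic"),
]
alice = Profile(
    user="alice",
    interest=lambda o: o.topic == "traffic",
    utility=lambda o: 2.0,
)
total = alice.utility_of(objs)
```

An additive model is the simplest choice; a real profile could just as well
assign utility to combinations of objects rather than to each one separately.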
Publications and Products
Publications:
-
Mitch Cherniack, Michael J. Franklin and Stan Zdonik, "Expressing User
Profiles for Data Recharging", IEEE Personal Computing: Special Issue on
Pervasive Computing, July, 2001, To Appear.
-
Mitch Cherniack, Michael J. Franklin and Stan Zdonik, "Profile-Driven Data
Management", (Submitted).
-
M. Franklin. Challenges in Ubiquitous Data Management, Informatics: 10
Years Back, 10 Years Ahead, the 10th Anniversary Conference, R. Wilhelm
ed., Lecture Notes in Computer Science, #2000, Springer Verlag, 2001, pp. 24-33.
-
D. Tom. Data Recharging: User Profiles and Download Scheduling. M.S. Research
Report, UC Berkeley, January 2001.
-
M. Denny and M. Franklin, Edison: Database-Supported Synchronization, (Submitted).
-
Yanlei Diao, Hao Zhang, Michael J. Franklin, NFA-based Filtering for Efficient
and Scalable XML Routing (Submitted).
Talks:
-
Mitch Cherniack, "Profile-Driven Data Management", University of Waterloo
(3/12/01), University of Toronto (3/13/01), Concordia University
(4/12/01)
-
M. Franklin, Data Dissemination and Synchronization, CS Department, Stanford
University, April 2001.
-
M. Franklin, Tutorial on Synchronization and Dissemination, 2nd International
Conference on Mobile Data Management, Hong Kong, January 2001.
-
M. Franklin, Challenges in Ubiquitous Data Management, Aether Systems Distinguished
Lecture Series, University of Maryland Baltimore County, December, 2000.
-
M. Franklin, Large-Scale Data Dissemination, GriPhyN project All
Hands meeting, Argonne National Laboratories, October, 2000.
-
M. Franklin, XFilter - Efficient Filtering for Dissemination of XML Documents.
Boeing Corporation, September, 2000.
Project Impact
Education:
Cherniack is teaching a seminar course at Brandeis on query optimization,
covering (1) Persistent Queries, (2) Dynamic/Adaptive Query Optimization,
and (3) Decision Support, all of which bear on this project.
Zdonik is teaching a graduate seminar at Brown in which the class is writing
a book on the topic of Network Data Services. Student teams are writing
chapters on topics such as web proxy caching, web search engines, and customization
services.
Industry:
Cherniack is pursuing a collaboration with Arnie Rosenthal and Len Seligman
at Mitre on a project that would contribute to the "Data Gathering" phase of
profile-driven data management, specifically targeting Data Integration.
A proposal for this work has been submitted internally at Mitre. Zdonik
is collaborating with the US Army Research Institute for Environmental
Medicine (USARIEM) to develop software architectures for sensor-based systems
including the use of profiles to manage data flow. He has also been
working with Verizon Technologies on the topic of web proxy cache performance.
Goals, Objectives, and Targeted Activities
Our most immediate goal is to design and test a profile language, since
such a language is an enabler for most of this activity. We have
an initial design, but feel that it must be tested in a real setting.
The Women Writers Project at Brown has a large corpus of works by women
that have been marked up in XML and made available to over 100 research
groups. They also have a log of the queries that users have posed
over the last two years. We are working with them to analyze these
access patterns in the hope of better understanding the kinds of things
that their users might want to express in their profiles. This activity
will provide input for further refinement of our profile language. The design
of a good profile language must balance the need for expressiveness with
the need for efficient processing. We are simultaneously working
on profile processing algorithms for problems such as static caching and
recharging the cache of a potentially disconnected mobile computer.
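To make the static caching problem concrete, here is a minimal sketch in
Python of one possible approach. It is not the project's algorithm. It assumes
that each profile reduces to a per-object utility number, that utilities add
across users, and that a greedy utility-per-size heuristic is acceptable
(it is not guaranteed to be optimal). The function name and the data layout
are hypothetical.

```python
from typing import Dict, List


def choose_cache_contents(
    profiles: Dict[str, Dict[str, float]],  # user -> {object name -> utility}
    sizes: Dict[str, int],                  # object name -> size (must be > 0)
    capacity: int,                          # total cache capacity, same units
) -> List[str]:
    # Sum each object's utility over all user profiles.
    total: Dict[str, float] = {}
    for weights in profiles.values():
        for name, u in weights.items():
            total[name] = total.get(name, 0.0) + u

    # Rank by aggregate utility per unit of size; break ties by name.
    ranked = sorted(total, key=lambda n: (-total[n] / sizes[n], n))

    # Greedily admit objects that still fit in the remaining capacity.
    chosen: List[str] = []
    used = 0
    for name in ranked:
        if used + sizes[name] <= capacity:
            chosen.append(name)
            used += sizes[name]
    return chosen


# Usage with two hypothetical users and three objects.
profiles = {
    "alice": {"a": 3.0, "b": 1.0},
    "bob": {"a": 1.0, "c": 4.0},
}
sizes = {"a": 2, "b": 1, "c": 4}
cached = choose_cache_contents(profiles, sizes, capacity=4)
```

With these inputs, object "a" is admitted first because both users value it
and it is small; "c" has high utility for one user but does not fit once "a"
and "b" are cached.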
Project References
Mitch Cherniack, Michael J. Franklin and Stan Zdonik, "Expressing User
Profiles for Data Recharging", IEEE Personal Computing: Special Issue on
Pervasive Computing, July, 2001, To Appear.
Mitch Cherniack, Michael J. Franklin and Stan Zdonik, "Profile-Driven Data
Management", (Submitted).
Area Background
Database systems store information at a higher semantic level than file
systems. As such, they are capable of capturing application-level
semantics and of exploiting these semantics to provide better access capabilities
(e.g., high-level query languages) as well as more efficient access paths
(e.g., indices) to support highly optimized query evaluation.
The area of network data management is emerging as an attempt to bring
the same benefits that we saw in the traditional DBMS world to the new
world of network-based information systems. Special topics within
this new area include web search engines, proxy caches, data integration
services, web server technology, and wireless data management. It
should be pointed out that many of these topic areas exist as point solutions.
In this project, we are attempting to provide some of the
basic principles and glue that can tie them all together. We believe
that profiles can play a central role in all of this.
Area References
-
[AAB+99] M. Altinel, D. Aksoy, T. Baby, M. Franklin, W. Shapiro, S. Zdonik,
"DBIS Toolkit: Adaptable Middleware for Large Scale Data Delivery", Proc.
ACM SIGMOD Conf., Philadelphia, PA, June, 1999.
-
[CDTW00] J. Chen, D. DeWitt, F. Tian, Y. Wang, "NiagaraCQ: A Scalable Continuous
Query System for Internet Databases", Proc. ACM SIGMOD Conf., Dallas, June,
2000.
-
[OPSS93] B. Oki, M. Pfluegl, A. Siegel, D. Skeen, "The Information Bus
- An Architecture for Extensible Distributed Systems", Proc. 14th SOSP,
Asheville, NC, December, 1993.
-
[YGM99] Tak W. Yan, Hector Garcia-Molina: The SIFT Information Dissemination
System. ACM Transactions on Database Systems, 24 (4): 529-565 (1999).
*All award information can be found on the NSF on-line
Awards Abstracts system http://www.fastlane.nsf.gov/a6/A6Start.htm.