DATA CENTERS: Profile-Driven Data Management
Stan Zdonik, PI                            Mitch Cherniack, PI                        Mike Franklin, PI
Dept. of Computer Science        Dept. of Computer Science            Dept. of Computer Science
Brown University                        Brandeis University                        University of California at Berkeley

Contact Information

Stan Zdonik
Brown University, Dept. of Computer Science, P.O. Box 1910, Providence, RI  02912
Phone:  (401) 863-7648       Fax:  (401) 863-7657    Email:  sbz@cs.brown.edu
URL:  http://www.cs.brown.edu/people/sbz/
 
Mitch Cherniack
Brandeis University, Department of Computer Science
415 South St., Mailstop 018, Waltham, MA, 02454
Phone: (781) 736-2738       Fax: (781) 736-2741    Email:  mfc@cs.brandeis.edu
 
Mike Franklin
University of California at Berkeley, Dept. of Computer Science
687 Soda Hall, Berkeley, CA 94720-1776
Phone:  510-642-1662       Fax:   510-642-5775    Email:  franklin@cs.berkeley.edu
 
WWW PAGE
 http://www.cs.brown.edu/research/pddm/ 
 
List of Supported Students and Staff
Brown:  Greg Seidman (PhD student), Ying Xing (PhD student), Nesime Tatbul (PhD student)
Brandeis:  Eduardo Galvez (PhD student), David Brooks (MS student)
Berkeley:  Mathew Denny (PhD student), Yanlei Diao (PhD student), Danny Tom (MS student)
 
Project Award Information
Keywords
network data management, profiles, continuous queries, information utility, data staging,  data management
Project Summary
Networked information is often unmanaged.  In a wide-area setting like the Internet, data sources are autonomous and provide little opportunity for application-dependent data management.  In a mobile environment, data is often unavailable or out of date.  A modern DBMS provides data management tools (e.g., indices, clustering) that are used and tuned by a DBA based on an overall understanding of the needs of the application mix.  Having a DBA in the network environments mentioned above is impractical.  Thus, we introduce the notion of a profile.  A profile is a declarative specification of a user's interest as well as a specification of the utility that a specific set of objects would provide to that user.  We use profiles to drive data management decisions in a middleware service that sits between the data sources and the users.  This middleware service uses profiles gathered from the user community to make decisions about data gathering, data management, and data delivery.  Relevant data management concerns include prestaging, indexing, clustering, declustering, replication, and precomputation.
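
As a rough illustration of the idea (not the project's actual profile language), a profile can be modeled as an interest predicate paired with a utility function, and a middleware service can sum utilities across the user community to see which objects matter most. The names DataObject, Profile, and aggregate_utility are hypothetical, and the sketch assumes utility is assigned per object rather than per set of objects, which is a simplification of the definition above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataObject:
    # Hypothetical description of one object offered by a data source.
    oid: str
    topic: str
    size_kb: int


@dataclass
class Profile:
    # Hypothetical profile: which objects a user cares about, and how much.
    user: str
    interested: Callable[[DataObject], bool]  # declarative interest, as a predicate
    utility: Callable[[DataObject], float]    # value of one object to this user


def aggregate_utility(profiles: List[Profile],
                      objects: List[DataObject]) -> Dict[str, float]:
    """Sum, per object, the utility reported by every interested profile."""
    totals = {obj.oid: 0.0 for obj in objects}
    for profile in profiles:
        for obj in objects:
            if profile.interested(obj):
                totals[obj.oid] += profile.utility(obj)
    return totals


objects = [
    DataObject("a", "traffic", 40),
    DataObject("b", "weather", 10),
    DataObject("c", "traffic", 25),
]
profiles = [
    Profile("alice", lambda o: o.topic == "traffic", lambda o: 2.0),
    Profile("bob", lambda o: True, lambda o: 1.0),
]
totals = aggregate_utility(profiles, objects)
```

The aggregated totals are the kind of signal the middleware could use when deciding what to prestage, index, or replicate.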
 
Publications and Products
Publications: Talks:
Project Impact
Education:
Cherniack is teaching a seminar course at Brandeis on query optimization that covers (1) Persistent Queries, (2) Dynamic/Adaptive Query Optimization, and (3) Decision Support, all of which bear on this project.  Zdonik is teaching a graduate seminar at Brown in which the class is writing a book on the topic of Network Data Services.  Student teams are writing chapters on topics such as web proxy caching, web search engines, and customization services.

Industry:
Cherniack is pursuing a collaboration with Arnie Rosenthal and Len Seligman at Mitre on a project that would contribute to the "Data Gathering" phase of profile-driven data management, specifically targeting Data Integration.  A proposal for this work has been submitted internally at Mitre.  Zdonik is collaborating with the US Army Research Institute for Environmental Medicine (USARIEM) to develop software architectures for sensor-based systems, including the use of profiles to manage data flow.  He has also been working with Verizon Technologies on the topic of web proxy cache performance.

Goals, Objectives, and Targeted Activities
Our most immediate goal is to design and test a profile language, since such a language is an enabler for most of this activity.  We have an initial design, but feel that it must be tested in a real setting.  The Women Writers Project at Brown has a large corpus of works by women that have been marked up in XML and made available to over 100 research groups.  They also have a log of the queries that users have posed over the last two years.  We are working with them to analyze these access patterns in the hope of better understanding the kinds of things that their users might want to see in their profiles.  This activity will produce input for further refinement of our profile language.  The design of a good profile language must balance the need for expressiveness with the need for efficient processing.  We are simultaneously working on profile processing algorithms for problems such as static caching and for recharging the cache of a potentially disconnected mobile computer.
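
To make the caching problem concrete, the sketch below fills a fixed-size cache from per-object utilities using a greedy utility-per-kilobyte heuristic. This is an assumed stand-in for illustration only, not the project's algorithm: the function name choose_cache_contents, the input format, and the choice of heuristic are all hypothetical, and the greedy rule does not guarantee an optimal selection.

```python
from typing import Dict, List, Tuple


def choose_cache_contents(candidates: Dict[str, Tuple[float, int]],
                          capacity_kb: int) -> List[str]:
    """Pick object ids to cache, given id -> (utility, size_kb).

    Assumes every size is positive. Objects are considered in decreasing
    order of utility per kilobyte and taken whenever they still fit.
    """
    ranked = sorted(candidates.items(),
                    key=lambda item: item[1][0] / item[1][1],
                    reverse=True)
    chosen: List[str] = []
    used_kb = 0
    for oid, (_utility, size_kb) in ranked:
        if used_kb + size_kb <= capacity_kb:
            chosen.append(oid)
            used_kb += size_kb
    return chosen


# Hypothetical utilities and sizes for three objects, with a 40 KB cache.
candidates = {"a": (3.0, 40), "b": (1.0, 10), "c": (3.0, 25)}
selection = choose_cache_contents(candidates, capacity_kb=40)
```

With these numbers the heuristic prefers the two smaller objects over the single large one, which shows why the utility specification in a profile, and not just object popularity, shapes what a disconnected client ends up holding.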

Project References
  • Mitch Cherniack, Michael J. Franklin and Stan Zdonik, "Expressing User Profiles for Data Recharging", IEEE Personal Computing: Special Issue on Pervasive Computing, July, 2001, To Appear.
  • Mitch Cherniack, Michael J. Franklin and Stan Zdonik, "Profile-Driven Data Management", (Submitted).

Area Background
Database systems store information at a higher semantic level than file systems.  As such, they are capable of capturing application-level semantics and of exploiting these semantics to provide better access capabilities (e.g., high-level query languages) as well as more efficient access paths (e.g., indices) to support highly optimized query evaluation.  The area of network data management is emerging as an attempt to bring the same benefits that we saw in the traditional DBMS world to the new world of network-based information systems.  Special topics within this new area include web search engines, proxy caches, data integration services, web server technology, and wireless data management.  It should be pointed out that many of these topic areas exist as point solutions.  In this project, we are attempting to provide some of the basic principles and glue that can tie them all together.  We believe that profiles can play a central role in all of this.

Area References

     
*All award information can be found on the NSF on-line Awards Abstracts system, http://www.fastlane.nsf.gov/a6/A6Start.htm.
     
