
Stochastic Models for Web Agents and the Web Environment
Project Highlight
Web agents are complex software systems that operate on the World Wide Web, the Internet, and related corporate, government, or military intranets. They are designed to perform a variety of tasks, from caching and routing to searching, categorizing, and filtering.
Our goal is to develop a theoretically well-founded framework for the design and analysis of Web agents and agent systems based on mathematical models of their environment. Our approach has three major building blocks:
- Stochastic models of the Web graph that take into account the distribution and connectivity of Web pages and provide important general guidelines for agent design by capturing unique properties of the environment in which these agents must operate.
- Statistical learning techniques to enable Web agents to learn about their environment by inferring stochastic models of Web page content and local link structure.
- Algorithms for autonomous planning and decision making in the Web environment to enable agents to pursue goals and adapt to their changing environment.
Members
Publications
- G. Pandurangan, P. Raghavan, and E. Upfal, Using PageRank to Characterize Web Structure. Proceedings of the 8th International Computing and Combinatorics Conference (COCOON), 2002.
- G. Pandurangan, P. Raghavan, and Eli Upfal, Building Low-Diameter P2P Networks. Proceedings of the 42nd IEEE Symp. on Foundations of Computer Science, 2001.
- S.R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and Eli Upfal,
The Web as a graph. Proceedings of the 19th ACM Symposium on Principles of Database Systems, pp 1-10, 2000.
- Thomas Hofmann, Learning Probabilistic Models of the Web, ACM SIGIR 2000
- R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and Eli Upfal, Stochastic models for the Web graph. Proceedings of the 41st IEEE Symp. on Foundations of Computer Science, 2000.
Bibliography
Learning from Link Topology
- Jon Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999). Also appears as IBM Research Report RJ 10076, May 1997. [pdf]
- S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, Hypersearching the Web. Scientific American, June 1999 [html]
- D. Gibson, J. Kleinberg, P. Raghavan. Inferring Web communities from link topology. Proc. 9th ACM Conference on Hypertext and Hypermedia, 1998 [pdf]
- Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, Trawling the web for emerging cyber-communities, WWW8 [html]
- Krishna Bharat and Monika R. Henzinger, Improved algorithms for topic distillation in a hyperlinked environment, SIGIR 98 [pdf]
- Ramesh Sarukkai, Link Prediction and Path Analysis Using Markov Chains, WWW9 [html]
- Lee Giles, Kurt Bollacker, Steve Lawrence, CiteSeer: An Automatic Citation Indexing System, Proceedings of the 3rd ACM Conference on Digital Libraries, pp. 89-98, 1998 [pdf]
Kleinberg's influential paper (first in the list) started a whole new research area. It presents the Hubs and Authorities algorithm (HITS), which performs an SVD of the adjacency matrix of the Web graph to identify authoritative Web pages and Web communities. The list also includes follow-up papers that use the HITS algorithm for searching and connectivity analysis on the Web.
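The hub and authority computation mentioned above can be illustrated with a short power iteration. This is a minimal sketch, not code from any of the listed papers: the function names `hits` and `normalize`, the dictionary-based graph representation, and the fixed iteration count are all assumptions made for illustration. Repeating the two mutually reinforcing updates converges toward the principal singular vectors of the adjacency matrix, which is the connection to the SVD.

```python
import math


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Approximate hub and authority scores by power iteration.

    links: dict mapping each page to the list of pages it links to
    (an assumed toy representation of a small link graph).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score: sum of the authority scores of the pages it links to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Toy graph: three pages that point at two candidate authorities.
toy_links = {"h1": ["a", "b"], "h2": ["a", "b"], "h3": ["a"]}
hub, auth = hits(toy_links)
print(sorted(auth, key=auth.get, reverse=True)[:2])
```

In the toy graph, page `a` receives links from all three hubs and so ends up with the largest authority score, while `h3`, which links to only one authority, scores lower as a hub than `h1` and `h2`.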
Small World Networks
- Duncan Watts, Small Worlds: The Dynamics of Networks Between Order and Randomness (Princeton Studies in Complexity), 1999. [amazon] (copy available @TH)
- Jon Kleinberg. The small-world phenomenon: An algorithmic perspective. Cornell Computer Science Technical Report 99-1776, October 1999. [ps]
- Lada A. Adamic, The Small World Web, 2000 [abstract]
The theory of small-world networks is used to model various dynamical processes and systems, from social systems to electricity networks and the Web. The crucial question is how the dynamics and global system behavior depend on the (local) network topology.
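One way to see how a global property depends on local topology is to rewire a few edges of a regular ring lattice and measure the average shortest-path length before and after. The sketch below follows the general idea of the Watts-Strogatz rewiring model; it is an illustration under stated assumptions, not code from the listed works. The function names, the parameters `n`, `k`, and `p`, and the choice to average only over reachable pairs are all hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k // 2 nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """With probability p, move one endpoint of each edge to a random new node.

    Self-loops and duplicate edges are avoided, so the edge count is preserved.
    """
    n = len(adj)
    new = {i: set(nbrs) for i, nbrs in adj.items()}
    for i in range(n):
        for j in sorted(adj[i]):
            if i < j and rng.random() < p:
                candidates = [t for t in range(n) if t != i and t not in new[i]]
                if not candidates:
                    continue
                t = rng.choice(candidates)
                new[i].discard(j)
                new[j].discard(i)
                new[i].add(t)
                new[t].add(i)
    return new


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS per node)."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


lattice = ring_lattice(20, 4)
shortcut_graph = rewire(lattice, 0.2, random.Random(0))
print(average_path_length(lattice), average_path_length(shortcut_graph))
```

Comparing the two printed values for different choices of `p` gives a feel for how a small number of long-range links can change typical distances while most of the local structure stays intact.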
Web Searching and Crawling
- Soumen Chakrabarti, Martin van den Berg, Byron Dom, Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery, WWW8 [html]
- Steve Lawrence and Lee Giles, Searching the World Wide Web, Science 1998 [pdf]
- Jeffrey Dean, Monika R. Henzinger, Finding Related Pages in the World Wide Web, WWW8 [html]
- Oren Zamir and Oren Etzioni, Grouper: A Dynamic Clustering Interface to Web Search Results, WWW8 [html]
Web Caching
- David Karger, Alex Sherman, Andy Berkheimer, Bill Bogstad, Rizwan Dhanidina, Ken Iwamoto, Brian Kim, Luke Matkins, Yoav Yerushalmi, Web Caching with Consistent Hashing, WWW8 [html]
- Boris Chidlovskii, Claudia Roncancio and Marie-Luise Schneider, Semantic Cache Mechanism for Heterogeneous Web Querying, WWW8 [html]
- Kun-Lung Wu, Philip S. Yu, Latency-Sensitive Hashing for Collaborative Web Caching, WWW9 [html]
Web Measurements and Sampling
- Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc Najork, Measuring Index Quality Using Random Walks on the Web, WWW8 [html]
- Lada A. Adamic, The Small World Web, 2000 [abstract]
- Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, Marc Najork, On Near-Uniform URL Sampling, WWW9 [html]
- Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet Wiener, Graph structure in the web: experiments and models, WWW9 [html]
Web Surfing and User Modeling
- Rajan M. Lukose and Bernardo A. Huberman, Surfing as a Real Option [abstract]
- Bernardo A. Huberman and Rajan M. Lukose, Social Dilemmas and Internet Congestion [abstract]
- Bernardo A. Huberman, Peter L.T. Pirolli, James E. Pitkow, and Rajan M. Lukose, Strong Regularities in World Wide Web Surfing, Nature ??? [abstract]
- William W. Cohen, Wei Fan, Web-Collaborative Filtering: Recommending Music By Spidering the Web, WWW9 [html]
- Mike Perkowitz and Oren Etzioni, Towards Adaptive Web Sites: Conceptual Framework and Case Study, WWW8 [html]
- Marc Langheinrich, Atsuyoshi Nakamura, Naoki Abe, Tomonari Kamba, Yoshiyuki Koseki, Unintrusive Customization Techniques for Web Advertising, WWW8 [html]
Web Mining and Clustering
- Neel Sundaresan, Jeonghee Yi, Mining the Web for Relations, WWW9 [html]
- Mourad Mechkour, David J. Harper and Gheorghe Muresan, The WebCluster project. Using clustering for mediating access to the World Wide Web, SIGIR 98 [pdf]
- Oren Zamir and Oren Etzioni, Web document clustering: a feasibility demonstration, SIGIR 98 [pdf]
Web Economies
- Lada A. Adamic and Bernardo A. Huberman, The Nature of Markets in the World Wide Web [abstract]
- Sebastian M. Maurer and Bernardo A. Huberman, Competitive Dynamics of Web Sites [abstract]
Web Robots
- Hayato Yamana, Kent Tamura, Hiroyuki Kawano, Satoshi Kamei, Masanori Harada, Hideki Nishimura, Isao Asai, Hiroyuki Kusumoto, Yoichi Shinoda and Yoichi Muraoka, Experiments of collecting WWW information using distributed WWW robots, SIGIR 98 [pdf]
Metadata
- Charlotte Jenkins, Mike Jackson, Peter Burden, Jon Wallis, Automatic RDF Metadata Generation for Resource Discovery, WWW8 [html]
General Interest Articles
- Jim Hendler, Is there an intelligent agent in your future? Nature, Web matters, 11 March 1999. [html]
- Steve Kirsch, The future of Internet search (keynote address), SIGIR 1999 [pdf]
Software
- NIST Leider prototype for automatic hypertext generation [rtf]
Links
Organizations, Researchers and Research Groups
Conferences & Workshops
Journals & Books
Misc