Communication and Document-Management System for an Electronic Notebook

Liye Ma
Department of Computer Science
Brown University
May 17, 2000

1 Introduction

Mobile computing technology is growing very fast today, due to the increasing need to access information anywhere and at any time. Currently, many portable computers, often called Personal Digital Assistants (PDAs), such as handheld PCs and Palm Pilots, are being used in different places for a variety of purposes[1]. Because of their small size and wireless connection, these devices can easily be taken anywhere, and thus can help with information management when a desktop computer is not available.

Many exciting scenarios have been proposed in this area. However, there are still many unresolved problems. Compared with desktop computers, mobile devices usually have slow processors and small memories. In addition, the connection of a mobile computer to the network, since it is wireless, has limited bandwidth and is often vulnerable. This low computation and communication power is the main challenge in using these devices, and dealing with it effectively has become one of the most important research issues[2]. Because of this limited power, mobile devices are normally used only as adjuncts to desktop computers.

As an extremely information-intensive locale, the university is a perfect place to do research in mobile computing, since mobile computing mainly deals with information management. Many systems have been designed and tested there. Coda[3][4], one of the earliest systems dealing with mobile computing, is a file system extended from the Andrew File System (AFS) that incorporates features like disconnected operation, data reintegration, and bandwidth adaptation. The drawback of Coda is that, as a sophisticated file system, it requires a fair amount of computation power and memory space on client computers. A laptop computer might benefit from this system, but a PDA is simply too low-powered (Coda was designed before PDAs became popular) to benefit. Furthermore, as the WWW continues to become the largest information reservoir, a file system is often not exactly what the user wants.

Another system, Pebbles[5], is currently being developed at Carnegie Mellon University. The goal of this system is to investigate uses of one or more PDAs connected to a PC. In this system, mobile devices are used simply as input/output devices rather than computers.

In this paper, we discuss a communication and document-management system we developed in the Computer Science Department of Brown University. This system is designed as the back-end support for an electronic notebook, which is utilized in a classroom scenario. In this scenario, students take handheld computers into the classroom. Using the electronic notebook system (including this back-end support and a front-end web-browser) installed there, they can download and view lecture slides, which are web pages, make annotations on them while listening to the instructor's lecture, and store those annotations for review later. Using this system, students can also group correlated web pages and annotations into document folders and directories and manage them efficiently. Web-pages and annotations can be backed up onto a server that is a desktop computer, and can be reloaded to any computer, so that after class, students can also view information on desktop computers. In the system, we provided a multi-way communication system to maximize the availability of information, as well as a two-level document-management system to handle various documents.

Though designed to support the electronic notebook, this system can actually be used for far more than that, since essentially it supports the management of any files and web pages, as seen in the discussion below.

The paper is organized as follows. We first give a system overview, then introduce the client system, which consists of a communication subsystem and a document management subsystem, and then the server system, which has a communication component and a storage component. We then briefly discuss the implementation and testing and some possible future extensions, and finally we state some conclusions.

1.1 Terminology

Some terms that are frequently used in this paper are defined here.

1.2 Motivation

In general, our goal is to offer users an environment in which they can easily access information anywhere at any time, and manage documents easily and efficiently. To achieve this, we will address the following issues:

2 System Overview

The system runs in a wireless LAN environment, with a client-server structure. Each client, which is typically a mobile computer, runs the client system. One desktop computer runs the server system, functioning as a base server. In this system, we require only an intermittent connection between clients and the server, which means a client can still function well when it loses connection to the server for a fair amount of time (e.g. 2 days). This is crucial since the connection in the mobile world is vulnerable. Figure 2.1 shows the architecture of the system.

A client is typically a handheld computer, though it can be a desktop computer as well, which is connected to the Internet through a wireless LAN. The client system has two major components: a communication subsystem and a document management subsystem. The former manages the communication of a mobile computer with the base server, web servers, and other mobile computers. Through this multi-way communication it provides high availability and fair speed. The latter does the information management, and itself has three components: a web cache that stores web pages, an annotation store that stores annotation files or other user files, and a high-level file-system-like view, known as CMDS (Customizable Mobile Document Store), that enables users to group and handle documents hierarchically, as they handle files and directories in a general-purpose file system. The client system also provides a GUI/API as the interface to users or other applications, as well as a main control which coordinates all these components to function as an integrated system. Figure 2.2 shows the architecture of the client system.

A server is a desktop computer which is much more powerful than a mobile computer. As we said, a client is not required to be connected to it all the time. A server consists of two components: a communication component which accepts and handles requests from clients, and a storage component which saves files or web pages. When the server is up, it facilitates the operations of clients in a variety of ways: caching web pages, broadcasting files of common interest, etc. It can also back up users' annotation files and CMDS files. Figure 2.3 shows the architecture of a server system.

3 Client System

A client system typically runs on a handheld computer that is connected to the Internet through a wireless LAN. It is a fully autonomous system in that, though there might be a base server to facilitate its operation, it can still function well when the server is not available. This contrasts with an NFS client, which, if it loses the connection to its server, will stop functioning within minutes.

The architecture of a client system, shown in Figure 2.2, has the following components: a communication subsystem, a web cache that stores web pages, an annotation store that stores annotation files or any other user files, and a CMDS system that offers users a file-system-like view of all documents. The last three form the document-management subsystem. There is also an API/GUI that provides interfaces to users or other programs, as well as a main control to coordinate all these components.

3.1 Web Cache

Since the wireless connection of a mobile computer is vulnerable, in order to maintain information availability, we need to cache important web pages, i.e. those web pages that are likely to be accessed in the near future, in the local memory of the mobile computers, so that information is still available during connection loss. We implemented a web cache to do this work (see Figure 3.1). There is a detailed discussion on caching in [6], from which we borrowed some intuition.

Each mobile computer has a local web cache to store web pages. The key to accessing this cache is the URL of web pages, and the corresponding object is the document itself. The web cache is organized in a tree structure, i.e., it corresponds to a directory tree in the underlying file system: basically, we have a root directory for the cache in which each web server corresponds to a subdirectory. Each web page is cached as a file in the file system, and the path name of the file (relative to the root directory corresponding to the server) is the same as the path-name part of the URL. In other words, the web cache is a partial view of the whole WWW, a very small part, of course, in which web pages are stored just as they are in the original server. A similar system is discussed in [7].
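The mapping from a URL to its place in the cache tree can be sketched as follows. This is only an illustration of the layout described above, not the code of our system: the class name WebCachePaths, the method cacheFileFor, the use of "index.html" for URLs that end in a slash, and the example host are all assumptions introduced here.

```java
import java.net.URI;
import java.nio.file.Path;

/** Hypothetical helper that maps a URL to the file holding its cached copy. */
final class WebCachePaths {
    private WebCachePaths() {}

    /**
     * Each web server gets one subdirectory under cacheRoot; below it, the
     * file path mirrors the path-name part of the URL.
     */
    static Path cacheFileFor(Path cacheRoot, String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            // Assumption: directory-style URLs are stored under a default file name.
            path += "index.html";
        }
        Path serverDir = cacheRoot.resolve(host).normalize();
        // Drop the leading '/' so the URL path resolves below the server directory.
        Path file = serverDir.resolve(path.substring(1)).normalize();
        if (!file.startsWith(serverDir)) {
            throw new IllegalArgumentException("URL path escapes the cache: " + url);
        }
        return file;
    }
}
```

With a cache root named cache, the URL http://www.example.edu/cs/lecture1.html would map to the file cache/www.example.edu/cs/lecture1.html, so the cache directory can be browsed like a small mirror of the original servers.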

Different web pages in the web cache have different levels of importance: some are not important, and thus can be thrown away at any time, others are important, and thus need to be kept at the system's best effort, and still others are crucial and thus cannot be removed from the cache unless users explicitly specify that. We assign each web page in the cache a priority out of three levels: hoard, high, and low. This information is used when we flush the cache. The priority of a web page can be modified. The idea here is borrowed from Coda, which faces similar situations[4].

Like other caches, the web cache has finite size, so it must remove some pages from time to time. Unlike processor caches that store machine instructions, the web cache has no obvious temporal locality, so LRU is not helpful. In the web cache, the flushing is done as follows: traversing the whole cache, we delete pages with low priority. We delete those with high priority with a probability calculated according to the time when they were written into the cache: a web page which came in earlier has a higher probability of being deleted. We keep those with hoard priority in the cache, i.e., pages with hoard priority can only be manually deleted by users.
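A minimal sketch of such a flush pass is given below. The text above does not fix the formula for the deletion probability, so the linear rule used here (age divided by a maximum age, clamped to the range 0 to 1) is an assumption, as are the names CacheFlusher, Entry, flush, and deletionProbability.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical flush pass over a web cache with three priority levels. */
final class CacheFlusher {
    private CacheFlusher() {}

    enum Priority { HOARD, HIGH, LOW }

    static final class Entry {
        final String url;
        final Priority priority;
        final long cachedAtMillis; // time the page was written into the cache

        Entry(String url, Priority priority, long cachedAtMillis) {
            this.url = url;
            this.priority = priority;
            this.cachedAtMillis = cachedAtMillis;
        }
    }

    /** Traverses the whole cache once and returns the URLs that were removed. */
    static List<String> flush(List<Entry> cache, long nowMillis, long maxAgeMillis, Random rng) {
        List<String> removed = new ArrayList<>();
        for (Iterator<Entry> it = cache.iterator(); it.hasNext();) {
            Entry e = it.next();
            boolean delete;
            switch (e.priority) {
                case LOW:
                    delete = true; // low priority: always deleted
                    break;
                case HIGH:
                    // high priority: deleted with an age-dependent probability
                    delete = rng.nextDouble()
                            < deletionProbability(nowMillis - e.cachedAtMillis, maxAgeMillis);
                    break;
                default:
                    delete = false; // hoard: only the user may delete it
            }
            if (delete) {
                removed.add(e.url);
                it.remove();
            }
        }
        return removed;
    }

    /** Assumed rule: probability grows linearly with age, clamped to [0, 1]. */
    static double deletionProbability(long ageMillis, long maxAgeMillis) {
        if (maxAgeMillis <= 0) {
            throw new IllegalArgumentException("maxAgeMillis must be positive");
        }
        double p = (double) ageMillis / maxAgeMillis;
        return Math.max(0.0, Math.min(1.0, p));
    }
}
```

Passing the random number generator as a parameter keeps the pass repeatable in tests; any other rule that increases with age could be substituted in deletionProbability without touching the traversal.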

3.2 Annotation Store

In class, students may want to make annotations while listening to the instructor's lecture. Our handheld PC has a touch screen, which enables students to draw directly on it. A front-end HTML browser made by other students in our group is also available. Using this browser, students can make annotations directly on lecture slides, by either typing or drawing on the screen. The browser then automatically puts the annotations into a file and passes it to this back-end system through the interface provided. To handle the saving, loading, and other operations for annotation files, we implemented an annotation store component in the client system.

The annotation store in each mobile computer is organized on a per-user basis, i.e., each user has his or her own space for loading/saving annotations. At first glance this seems unnecessary, since an H/PC is a personal computer that supposedly will be used by only one person. However, we believe that multi-user support is very helpful, because it offers a global namespace without conflicts, so that annotation files can easily be backed up onto the base server and loaded onto other computers. A simple scenario that exploits this is: a student makes annotations in class, then saves them to the base server and loads them onto his workstation for review after class.

Annotations can be saved to and loaded from the base server, so that data won't be lost due to power failure of mobile computers or other factors. These operations are manually triggered by users.
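The per-user namespace and the manual backup can be illustrated with a small in-memory sketch. It is not the actual component: the class AnnotationStore, its methods, and the "user/name" key format are assumptions made for illustration, and a real store would keep files on disk and talk to the base server over the network.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory annotation store with one namespace per user. */
final class AnnotationStore {
    // Keys have the form "user/name", so equal file names of different users never collide.
    private final Map<String, byte[]> files = new HashMap<>();

    private static String key(String user, String name) {
        if (user.isEmpty() || user.contains("/")) {
            throw new IllegalArgumentException("invalid user name: " + user);
        }
        return user + "/" + name;
    }

    void save(String user, String name, byte[] content) {
        files.put(key(user, name), content.clone());
    }

    /** Returns the stored content, or null if this user has no such file. */
    byte[] load(String user, String name) {
        byte[] content = files.get(key(user, name));
        return content == null ? null : content.clone();
    }

    /** Copies all files of one user into another store, e.g. the one on the base server. */
    void backUpTo(AnnotationStore target, String user) {
        String prefix = key(user, "");
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                target.files.put(e.getKey(), e.getValue().clone());
            }
        }
    }
}
```

Because the user name is part of every key, the store on the base server can hold the files of all users side by side, and loading them onto another computer is the same copy in the opposite direction.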

In our system, the annotation store is designed for storing students' annotation files, but it is not restricted to this. In fact, any document, like a paper, a review, a homework, or other files, can be stored into and retrieved from it, in the same way as annotations. In other words, the annotation store is a rather general-purpose component that can potentially support a more complicated system.

3.3 CMDS-Customizable Mobile Document Store

Using the web cache and the annotation store, we can easily save and load documents, but these components are not sufficient for managing them. When users manage documents, they do not want to deal with lots of lengthy URLs and file names. Instead, they want an effective way to group correlated documents together, put them into some hierarchical structure, and manage them in different levels of granularity. For this, we need some file-system-like component that is both effective and easy to use.

CMDS (Customizable Mobile Document Store) is such a component. Like a file system, a CMDS consists of directories and files that are grouped together hierarchically. A directory in CMDS is just the same as a directory in a file system. It contains a set of other directories and files in the CMDS system. A file in CMDS, unlike a byte-stream file in a typical file system such as UNIX, is record-based. Each record in the file is the reference to a back-end document, i.e., a web page or an annotation file. Because of this, we call a file in CMDS a document folder. Figure 3.2 shows a sample CMDS structure and a document folder. CMDS supports common directory/file operations such as creating/deleting/moving a directory/file, editing a file, etc.

Both directories and document folders have various attributes, the most important of which is the status. The status of a directory or a document folder shows the status of the corresponding back-end documents, which might be normal (in the cache or annotation store of this mobile computer), absent (not in the mobile computer), or stale (in the mobile computer but in an outdated version). By checking the status, users can easily find out the status of those documents in the local computer, and handle them accordingly.
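The structure and its status attribute can be sketched as a small tree. How the status of a folder or directory is derived from several documents is not spelled out above, so this sketch assumes a "worst status wins" rule (absent over stale over normal); the class and method names are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of CMDS directories and record-based document folders. */
final class Cmds {
    private Cmds() {}

    /** Declared from best to worst; the ordering is an assumption of this sketch. */
    enum Status { NORMAL, STALE, ABSENT }

    interface Node {
        String name();
        Status status();
    }

    /** One record of a document folder: a reference to a back-end document. */
    static final class Reference {
        final String key;    // URL of a web page or name of an annotation file
        final Status status; // state of that document on this mobile computer

        Reference(String key, Status status) {
            this.key = key;
            this.status = status;
        }
    }

    static final class DocumentFolder implements Node {
        private final String name;
        private final List<Reference> records = new ArrayList<>();

        DocumentFolder(String name) { this.name = name; }

        DocumentFolder add(String key, Status status) {
            records.add(new Reference(key, status));
            return this;
        }

        public String name() { return name; }

        public Status status() {
            Status worst = Status.NORMAL;
            for (Reference r : records) {
                worst = worse(worst, r.status);
            }
            return worst;
        }
    }

    static final class Directory implements Node {
        private final String name;
        private final List<Node> children = new ArrayList<>();

        Directory(String name) { this.name = name; }

        Directory add(Node child) {
            children.add(child);
            return this;
        }

        public String name() { return name; }

        public Status status() {
            Status worst = Status.NORMAL;
            for (Node child : children) {
                worst = worse(worst, child.status());
            }
            return worst;
        }
    }

    static Status worse(Status a, Status b) {
        return a.compareTo(b) >= 0 ? a : b;
    }
}
```

Under this rule, a directory reads as absent as soon as one document below it is missing, which tells the user at a glance that something has to be loaded before going offline.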

CMDS is a powerful system that greatly simplifies information management. By creating a document folder and putting references into it, users can group correlated documents together and handle them as a single entity. By creating directories, users can put different types of information into a hierarchical structure and easily manage them, as they can in a file system. From the status of directories/document folders, users can easily find out the current status of those documents and handle them accordingly.

We provided a GUI for users to view and edit the CMDS system. Using this GUI, users can check the content of a directory or a document folder, and make modifications such as creating/deleting a directory/document folder or editing a document folder. The Appendix shows a sample view of this GUI.

To see how CMDS can help users in managing documents, let us look at a simple scenario: a user groups (the URL or filename of) all lecture slides and related documents for one class into a document folder, and puts the document folders for different classes of one course into a directory. In the morning, before going out of town to a place where no connection is available, he or she loads all documents for lecture 1 onto his/her mobile computer for review during the day. After returning at night, he or she can remove all corresponding documents (not the document folder) to make more space for another lecture. Since the document folder is still there, he or she can easily reload all relevant documents anytime afterwards. Finally, when the semester ends and those documents are no longer needed, both the document folders and the actual documents can be deleted. With CMDS, all operations except grouping documents into a folder can be done through a few mouse-clicks, while without CMDS those operations would be very troublesome.

3.4 Communication Subsystem

The above components together form the document management subsystem, which solves the problem of managing documents on a mobile computer. In addition to this subsystem, we also need a powerful communication system to ensure maximum availability of information. We designed and implemented a communication subsystem for this purpose.

The communication subsystem has several components that provide multiple ways of communicating among computers. Figure 3.3 below shows the architecture of a communication subsystem.

3.4.1 Proxy Server

To take advantage of the document management subsystem, we have to intercept the HTTP request from local web browsers, typically Pocket IE, and handle it accordingly. A proxy server does this work. When a proxy server is running, it constantly listens to a port for incoming requests. When a request for some web page comes in, it first checks the local web cache for it, and responds directly if there is a hit in cache. If the requested page is not currently in cache, it will contact the central controller of the communication subsystem to retrieve it in various ways.

As mentioned in section 3.2, another web browser has been implemented that enables students to make annotations on slides and save/load them as files. Thus the proxy server also provided an interface supporting those operations. This interface includes two commands: a save-annotation-file command and a load-annotation-file command.

3.4.2 Broadcast Listener

The bandwidth of a wireless LAN is limited (1-2 Mbps in our case). If many mobile computers retrieve web pages at the same time, congestion problems will occur and the whole system will be saturated.

To save bandwidth, the base server broadcasts files of common interest as a sequence of UDP packets, the format of which is discussed in section 4.1. The task of a broadcast listener on a mobile computer is to receive these broadcast files. It listens to a specific port, receives incoming packets sent to that port, and reassembles related packets into a file when all have been received.

After the broadcast listener receives a file, it puts it into a swap space. The swap space is a fixed-size buffer shared by the broadcast listener and the central controller: the broadcast listener writes files into it and the central controller reads files from it. The swap space is flushed in strict FIFO order: all files are put into a queue, and whenever the space is full the file at the head of the queue is removed. If the central controller explicitly fetches a file, this file will also be removed from the swap space.
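The behaviour of the swap space can be sketched with an insertion-ordered map. The class SwapSpace and its methods are hypothetical, and two details are assumptions not stated above: a file larger than the whole buffer is simply dropped, and a file that is received again replaces its older copy.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical fixed-size swap space that is flushed in strict FIFO order. */
final class SwapSpace {
    private final long capacityBytes;
    private long usedBytes = 0;
    // LinkedHashMap iterates in insertion order, so its first entry is the head of the queue.
    private final LinkedHashMap<String, byte[]> files = new LinkedHashMap<>();

    SwapSpace(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Called by the broadcast listener once a file has been reassembled. */
    synchronized void put(String url, byte[] content) {
        if (content.length > capacityBytes) {
            return; // assumption: a file that can never fit is dropped
        }
        byte[] old = files.remove(url);
        if (old != null) {
            usedBytes -= old.length;
        }
        Iterator<Map.Entry<String, byte[]>> head = files.entrySet().iterator();
        while (usedBytes + content.length > capacityBytes && head.hasNext()) {
            usedBytes -= head.next().getValue().length; // evict the oldest file
            head.remove();
        }
        files.put(url, content);
        usedBytes += content.length;
    }

    /** Called by the central controller; a fetched file leaves the swap space. */
    synchronized byte[] fetch(String url) {
        byte[] content = files.remove(url);
        if (content != null) {
            usedBytes -= content.length;
        }
        return content;
    }

    synchronized boolean contains(String url) {
        return files.containsKey(url);
    }
}
```

The methods are synchronized because the broadcast listener and the central controller would use the buffer from different threads.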

3.4.3 Central Controller

The central controller is the central component of the communication subsystem. It coordinates the operations of each communication component, and links all those components together to form an integrated entity. Also, most communication requests from the local system to other computers go through it.

The central controller primarily does two things. First, it retrieves a web page in a variety of ways. The algorithm for fetching a requested web page is:

    1. Check the local swap space. If the requested web page is in it, return its content.
    2. If it is not in the swap space but is expected to arrive there imminently, then wait till it arrives, and return its content. We say a web page is expected to arrive soon if some of its packets have been received by the broadcast listener.
    3. If the base server is up now, connect to it, fetch the web page, and return.
    4. Connect to the original web server, fetch the web page, and return.
    5. If all the above steps fail, then try to fetch the web page from a peer computer through the peer-to-peer communication component, as discussed in the next section.
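The steps above amount to trying a fixed list of sources in order until one of them delivers the page. A minimal sketch of that control flow follows; the interface Source and the class PageFetcher are hypothetical names, and step 2 (waiting for a page whose packets are already arriving) is assumed to be handled inside the swap-space source rather than shown separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical controller that tries each source of web pages in a fixed order. */
final class PageFetcher {
    /** One place a page may come from: swap space, base server, web server, or peers. */
    interface Source {
        /** Returns the page if this source has it; may throw if the source is unreachable. */
        Optional<byte[]> fetch(String url) throws Exception;
    }

    private final List<Source> sourcesInOrder;

    PageFetcher(List<Source> sourcesInOrder) {
        this.sourcesInOrder = new ArrayList<>(sourcesInOrder);
    }

    /** Returns the page from the first source that delivers it, or empty if all fail. */
    Optional<byte[]> fetch(String url) {
        for (Source source : sourcesInOrder) {
            try {
                Optional<byte[]> page = source.fetch(url);
                if (page.isPresent()) {
                    return page;
                }
            } catch (Exception e) {
                // An unreachable source (e.g. the base server is down) is treated
                // like a miss: fall through to the next source in the list.
            }
        }
        return Optional.empty();
    }
}
```

Keeping the order in a list makes it easy to experiment with other orders, for example asking peers before contacting the original web server.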

The other main task of the central controller is to transfer documents between the local mobile computer and the base server. As we have discussed, the documents handled by this system are not only web pages, but also annotation files and CMDS directories/document folders. Since mobile computers have a limited power supply, data on them can get lost. Thus those files need to be backed up on the base server and loaded from it now and then. The central controller handles operations related to this, including saving/loading annotation files, saving/loading CMDS directories/document folders, etc., as discussed in more detail shortly.

3.4.4 Peer-to-Peer Communication

There are conditions in which neither base servers nor web servers are available, e.g., when the base unit, with which mobile computers are synchronized, is down. To make information still available under these circumstances, we implemented a peer-to-peer communication component. The idea here is that we group those web caches on many mobile computers, each of which has a small size, into a larger cache to increase the hit-rate (here the hit-rate means the availability of information). Similar issues are discussed in [8].

The peer-to-peer communication component has four subcomponents: a main unit, a listening thread, a message queue, and a server thread. The listening thread listens to a specific port for messages broadcast by peers; after receiving a message, it appends the message to the message queue. The server thread listens to a port, accepts incoming connection requests for some web page, and sends the corresponding data. The main unit handles messages in the message queue one by one. It also provides an interface for the central controller to request web pages.

By coordinating these subcomponents, the peer-to-peer communication component can retrieve a web page requested by the central controller from a peer computer. This is the algorithm we use:

    1. The main unit broadcasts a message to peer computers requesting a web page. Then it sets a timer and waits for response.
    2. The listening thread on a peer computer, upon receiving a message, appends it to the message queue.
    3. The main unit on the peer computer fetches this message from the queue and checks the local web cache. If the requested web page exists, it responds to that message, otherwise it ignores the message.
    4. If the main unit cannot receive a response from peers and times out, the request fails.
    5. If the main unit receives a response message from a peer computer, it connects to the server thread of that computer and fetches the web page. There might be many response messages from different peer computers. The main unit will handle only the first one and ignore all others.
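The exchange above can be imitated inside a single process to show its shape. In this sketch the broadcast message and the connection to the server thread are replaced by plain method calls, so it is a simulation under that assumption rather than network code; PeerLookup, Peer, onRequest, serve, and fetchFromPeers are hypothetical names. Because the simulated peers answer synchronously, the timer only matters when nobody has the page.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process simulation of the peer-to-peer page lookup. */
final class PeerLookup {
    private PeerLookup() {}

    /** A peer computer, reduced to its local web cache (URL to page content). */
    static final class Peer {
        final String id;
        private final Map<String, byte[]> webCache;

        Peer(String id, Map<String, byte[]> webCache) {
            this.id = id;
            this.webCache = webCache;
        }

        /** Steps 2 and 3: answer only if the page is cached, otherwise ignore the request. */
        void onRequest(String url, BlockingQueue<Peer> responses) {
            if (webCache.containsKey(url)) {
                responses.offer(this);
            }
        }

        /** Stands in for the server thread that sends the page to a requester. */
        byte[] serve(String url) {
            return webCache.get(url);
        }
    }

    /** Returns the page from the first peer that responds, or null on timeout. */
    static byte[] fetchFromPeers(String url, List<Peer> peers, long timeoutMillis) {
        BlockingQueue<Peer> responses = new LinkedBlockingQueue<>();
        for (Peer peer : peers) {
            peer.onRequest(url, responses); // step 1: "broadcast" the request
        }
        try {
            // Wait for the first response until the timer expires.
            Peer first = responses.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first == null) {
                return null; // step 4: no response in time, the request fails
            }
            return first.serve(url); // step 5: use the first responder, ignore the rest
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

In a real deployment the responses would arrive asynchronously from other machines, which is when taking only the first entry of the queue and discarding the others becomes significant.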

Traditionally, a URL indicates the location of a web page on the Internet, so that a client knows where to fetch it. We extended this concept in our system. In the peer-to-peer communication here, we use the URL simply as an identifier for a document (each web page has its own unique URL), without caring about where this document is. In our whole system, we use the URL both as a location indicator and as an identifier: usually the URL is just an identifier for a web page that can be retrieved from various places such as the local cache, the base server, or a peer computer. However, when the web page is nowhere within the system, we retrieve it from the original web server, in which case the URL acts as a location indicator.

Peer-to-peer communication greatly increases information availability and thus is a very important topic in mobile computing. Because of hardware restrictions, and also because our system has no stringent requirement on this, we implemented only a simple component here that provides best-effort service for retrieving web pages in an anonymous way. However, this can be extended in a variety of ways. For example, we can implement a connection-oriented guaranteed service, or also enable the request for annotation files instead of just for web pages, etc. More research can be carried out in this category.

3.5 Main Control/GUI

There is a main control in the client system that coordinates the operations of all the above components and integrates them into one whole entity. A GUI is also provided to give users a friendly interface to the system.

4 Server System

The server system runs on a base server that is a desktop computer, to facilitate the operations of client systems on mobile computers. It functions somewhat like a file server in that it responds to various client requests for loading or saving documents. However, unlike common file servers such as NFS servers, our server system is not crucial for the functioning of the whole system, in that the client system on mobile computers can still work normally even after it loses connection to the base server for a certain amount of time, with only a few functions disabled and with some performance impact. This characteristic is necessary because the connection in the mobile world is vulnerable. The server system has two components: a communication component and a backup-storage component. The structure of a server system was shown in Figure 2.3.

4.1 Communication Component

The task of the communication component in the server system, as suggested by the name, is to communicate with clients, handling clients' requests for saving/loading documents. The server component opens a TCP server socket, listens for incoming requests from client systems, and then handles them accordingly. Currently the requests it handles are:

Another function of the communication component is broadcasting. Sometimes there is predictable congestion in the environment. For example, at the beginning of a class all students want to download lecture slides to their mobile computers, and this could definitely cause congestion. However, this congestion, since it is predictable, can be resolved. A push-based technique, namely broadcasting, is a good way to do this, and therefore is implemented in the communication component. The broadcasting of a file is done in this way: first the file is partitioned into a sequence of disjoint UDP packets of the same size (except the last one), then these packets are broadcast in round-robin order. As we discussed in the client system, each client has a broadcast listening thread that receives these packets and reassembles them into a whole file when all packets are received. The sender and receiver must agree on the packet format, of course.
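The partitioning and reassembly can be sketched as follows. The packet layout used here (a packet index and the total number of packets travelling with each payload) is an assumption made for illustration and is not the format of our system; the names BroadcastFile, Packet, split, and Reassembler are hypothetical, and the UDP sending and receiving themselves are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical partitioning of a file into packets and reassembly on the client. */
final class BroadcastFile {
    private BroadcastFile() {}

    /** Assumed packet layout: position of the payload plus the total packet count. */
    static final class Packet {
        final int index;
        final int total;
        final byte[] payload;

        Packet(int index, int total, byte[] payload) {
            this.index = index;
            this.total = total;
            this.payload = payload;
        }
    }

    /** Splits a file into disjoint payloads of equal size (the last one may be shorter). */
    static List<Packet> split(byte[] file, int payloadSize) {
        if (payloadSize <= 0) {
            throw new IllegalArgumentException("payloadSize must be positive");
        }
        int total = Math.max(1, (file.length + payloadSize - 1) / payloadSize);
        List<Packet> packets = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int from = i * payloadSize;
            int to = Math.min(file.length, from + payloadSize);
            packets.add(new Packet(i, total, Arrays.copyOfRange(file, from, to)));
        }
        return packets;
    }

    /** Collects packets in any order; duplicates from later rounds are ignored. */
    static final class Reassembler {
        private byte[][] parts;
        private int received = 0;

        /** Returns the whole file once every packet has been seen, otherwise null. */
        byte[] accept(Packet p) {
            if (parts == null) {
                parts = new byte[p.total][];
            }
            if (p.total != parts.length || p.index < 0 || p.index >= parts.length) {
                return null; // packet does not belong to this file
            }
            if (parts[p.index] == null) {
                parts[p.index] = p.payload;
                received++;
            }
            if (received < parts.length) {
                return null;
            }
            int size = 0;
            for (byte[] part : parts) {
                size += part.length;
            }
            byte[] file = new byte[size];
            int pos = 0;
            for (byte[] part : parts) {
                System.arraycopy(part, 0, file, pos, part.length);
                pos += part.length;
            }
            return file;
        }
    }
}
```

Since the packets are sent round-robin, a client that tunes in mid-file simply starts at some later index and completes the file when the sequence wraps around, which is why the reassembler accepts packets in any order.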

This broadcasting algorithm, however, is not fault-tolerant, in that in order to reconstruct the file, a client has to receive all relevant UDP packets. As we know, UDP transmission is not reliable. Thus if the client misses a packet, it has to wait for a whole round for another chance, which can potentially cause high latency. This problem can be resolved by introducing redundancy, so that clients don't have to receive all packets in order to reconstruct the file. We tried a simple algorithm here: first the file is partitioned into a sequence of disjoint chunks; then we put the first two chunks into the first UDP packet, the second and third chunks into the second packet, and so on; then we broadcast the UDP packets in round-robin order. However, our test shows that this new algorithm is not better than the one without redundancy; in fact, it is even worse. Apparently it is not a good redundancy algorithm. In theory, there are algorithms in which we can partition a file into N packets, and reconstruct it by receiving any N/2 of them. But the implementation of such an algorithm is not trivial. Since the performance here is not our main concern now, we leave this for future work.
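The overlapping scheme we tried can be expressed as a small coverage check: which chunks does each packet carry, and does a given set of received packets cover the whole file? The text does not say what the last packet carries, so the wrap-around used here (the last packet carries the last and the first chunk) is an assumption, as are the names OverlapScheme, chunksInPacket, and canReconstruct.

```java
import java.util.BitSet;

/** Hypothetical coverage check for the overlapping two-chunks-per-packet scheme. */
final class OverlapScheme {
    private OverlapScheme() {}

    /** Packet i carries chunk i and the following chunk (wrap-around is assumed). */
    static int[] chunksInPacket(int i, int chunkCount) {
        return new int[] { i, (i + 1) % chunkCount };
    }

    /** True if the received packets together carry every one of the chunks. */
    static boolean canReconstruct(int chunkCount, int... receivedPackets) {
        BitSet have = new BitSet(chunkCount);
        for (int p : receivedPackets) {
            for (int c : chunksInPacket(p, chunkCount)) {
                have.set(c);
            }
        }
        return have.cardinality() == chunkCount;
    }
}
```

With four chunks, losing any single packet is tolerated, but losing two adjacent packets still forces the client to wait for the next round. Each packet also carries two chunks, so a full round transmits twice as many bytes as the plain scheme; this may be one reason the test showed no improvement, although we did not investigate the cause further.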

4.2 Backup Storage

One of the main purposes of the server system is to back up user files. To do this, we need a component that can store files somewhere safely and retrieve them when needed. This component is called a backup-storage component. Currently we have implemented three subcomponents according to the types of files we want to back up: a web cache, an annotation store, and a CMDS store.

A web cache is used to cache web pages used by clients. Whenever a web page is retrieved by the communication component from the web server, it is saved into this web cache, so that subsequent requests for it can be answered without contacting the original web server again and again (contacting it might take a longer time, since web servers are usually farther from clients and get congested easily).

An annotation store is used to store users' annotation files. It is a multi-user storage, in that each user can load/save his own files without influencing other users. Both the web cache and annotation store here are similar to their counterparts in client systems, though larger, of course.

The CMDS store, however, is different from the CMDS system in clients. It is simply a storage system that treats all CMDS directories and document folders as common byte-stream files. Thus it is more like an annotation store rather than a CMDS system on mobile computers. It also provides multi-user support.

5 Implementation and Testing

Both the server side and the client side are written in Java, due to its excellent inter-operability. The system was tested in the Computer Science Department of Brown University, and partially used in the Computer Networks class. The base server used is a Sun workstation (Ultra 10). The mobile computers are NEC Mobile Pro 800 machines running Microsoft Windows CE Version 2.11, with Proxim cards as network cards.

The peer-to-peer communication component in client systems should work without synchronizing to the base unit. Since current network cards do not support direct peer-to-peer communication, all tests on it are done in a simulated environment (through the base unit). However, since this component is not hardware-dependent, we consider that it works.

Since mobile computers have limited computational power and the wireless LAN has limited bandwidth and variable connection, our emphasis in testing is not on the speed. Rather, we care more about the system functionality, robustness, and ease of use. We carried out a fair amount of testing on this system, and basically it is functioning well.

6 Future Directions

This system, though somewhat complicated, should be viewed only as a basis or a testbed for future research in mobile computing. It can be extended in a variety of ways.

Group management was my first thought when considering system extensions. In the current system, we assume all users have similar interests, namely we assume there is only one group in the environment. In reality, however, there might exist many different groups, users in each of which have their own common interests. One user can also belong to different groups. How to manage groups is still an open question now, and the group management in UNIX could give us some useful intuitions.

Synchronization control is another interesting topic. In our system, we assume no shared-write operations, since each user will only write on his own annotations or other files. In reality, however, shared write is desired sometimes, e.g. when several people collaborate on a paper. It is true that we can achieve shared write in other ways, such as utilizing the underlying file system or WWW and managing those files in our system. But it will be much better if we can integrate this into our system.

A further topic is peer-to-peer communication. In the current system we only did preliminary work in this area, fetching web pages from peer computers. This did not demonstrate the importance it should have. A lot more work can be carried out here.

Mobile computing is still a fast-growing technology, in which there are many open questions. In the current project, this system is used to serve as an electronic notebook. But we intentionally made it a general-purpose system, in order to support future research work.

7 Conclusion

In this paper we discussed a Communication and Document-Management System that provides the back-end support for an Electronic Notebook System. This system gives users easy access to and easy maintenance of documents. The whole Electronic Notebook System can be used in an academic department to improve classroom teaching. Furthermore, the back-end system discussed in this paper is actually a general-purpose system that provides a basis for future research work. We introduced its design and implementation in detail. We also proposed several directions in which it might be extended.

8 Acknowledgements

I am grateful to my advisor, Professor Thomas W. Doeppner, for his direction and help throughout my work in this project. I am thankful to Rebecca S. Schultz and Scott Lewandowski for their help on environment setup. I also want to thank my colleagues, including Michael Boilen, Ryan J. Evans, Benjamin Garrett, and Neelu Bedi, for their helpful suggestions.

9 References

[1] Tomasz Imielinski and B.R. Badrinath, "Mobile Wireless Computing", Communications of the ACM, October 1994.

[2] M. Satyanarayanan, "Fundamental Challenges in Mobile Computing", Carnegie Mellon University, 1998.

[3] Peter J. Braam, "The Coda Distributed File System", LINUX Journal, June 1998.

[4] Mahadev Satyanarayanan, "Mobile Information Access", IEEE Personal Communications, February 1996.

[5] Brad A. Myers, Herb Stiel, and Robert Gargiulo, "Collaboration Using Multiple PDAs Connected to a PC", Carnegie Mellon University, November 1998.

[6] R. Alonso, D. Barbara, and H. Garcia-Molina, "Data Caching Issues in an Information Retrieval System", Princeton University, 1990.

[7] Qun Ren and Margaret H. Dunham, "Using Clustering for Effective Management of a Semantic Cache in Mobile Computing", ACM 1999.

[8] Kun-Lung Wu and Philip S. Yu, "Local Replication for Proxy Web Caches with Hash Routing", IBM T.J. Watson Research Center, 1999.

10 Appendix: Sample GUI