Director's Pool
Telecollaboration for Mechanical Design and Manufacturing

Andries van Dam, Brown University, Donald Greenberg, Cornell University, Henry Fuchs, University of North Carolina - Chapel Hill, Richard Riesenfeld, University of Utah

The Long-Term Vision

Together we dream about the day when we can work as a multi-disciplinary distributed team of telecollaborators using an immersive shared environment to design and manufacture a wide variety of products. Such a facility would allow one to draw on design and manufacturing expertise from all around the globe, developing products better, cheaper, and faster than is currently possible. While telecollaboration systems using talking heads and shared white boards have improved, the 2D through-the-window paradigm itself often prevents much of the interaction that would otherwise take place if the (remote) collaborators were together in the same room. In particular we want to:

Feel as if our real individual environments are physically joined along some common junctions such as shared walls, or as if we are in the same virtual shared space as our collaborators;
See and interact with collaborators as naturally as we do when we are in the same physical room, gesturing, pointing, using all of the subtle nuances of both verbal and nonverbal communication;
Create (design), manipulate, and interact with shared objects of interest, both real and synthetic, in 3D.

Our vision is long term and is probably only fully realizable in twenty years or so. To implement the vision we will share our complementary knowledge, and learn what we can about working together remotely, using the best tools we currently have available. In addition, we will pursue our long-term goals in two particular aspects: improving our existing telecollaboration infrastructure, and collaborating to build products or artifacts. In the former we learn by building a working system; in the latter we learn by actually collaborating on a real product as a driving application for testing and improving our infrastructure.

To follow our vision in the short term we will use various displays with a wide spectrum of immersiveness, from the non-immersive desktop display, to a semi-immersive "fish tank" display and Active Desk (a Responsive Workbench variant that has a drafting table with a computer screen on the top face, where user interaction is done directly on the screen), to an immersive FakeSpace BOOM and a head-mounted display (HMD).

We will create a gestural interface to a full-featured design modeling system and use it to design a new high-resolution, wide field-of-view video camera.

We will design and implement a structured light depth extraction scheme to achieve limited real-time remote scene reconstruction to allow remote display both of 3D avatars of designers, and of their physical context.

Structure of This and Related Proposals

We believe that the research leading toward the preceding long-term vision covers many distinct areas. Because several of these areas are sufficiently challenging on their own, we have identified and proposed to attack them separately. In particular, we note the following related Director's Pool proposals:

"Exploring the use of Gestures for Constructing Parametric Mechanical CAD Models" [Riesenfeld97],
"Unifying Image Synthesis and Analysis on a More Solid Scientific Basis" [Fuchs97], and
"Single Imaging Point, Multiple Imager, Wide Field of View, Camera System" [Keller97].

This proposal, "Telecollaboration for Mechanical Design and Manufacturing," has two purposes. The first is to include additional work that is not covered in separate individual proposals, but is part of our plans for an improved telecollaboration infrastructure. The second is to integrate the above individual efforts and the additional work into a complete working telecollaboration system, a system that we believe is the next realistic step toward our vision.

Status

Telecollaboration Infrastructure

Previously the STC developed the vrapp system for telecollaboration. We demonstrated this system at the June 1996 site visit, which took place at UNC, showing a simulated three-site design session. The system used the STC's T1-based telecollaboration infrastructure for data transmission as well as for transmission of video and audio. While the system demonstrated integration of video, audio, computer generated objects, and an industrial-strength modeler, vrapp has several shortcomings that we plan to address in the coming year.

Pros and Cons of Current Display Devices

The current display devices and their respective viewing paradigms, e.g. BOOM, HMD, and high-resolution monitors, are deficient in varying degrees in the areas of resolution, field-of-view, and their sensitivity to poor head tracking. HMD's have many advantages. Most importantly they provide a user with a sense of total immersion, allowing him or her to look and walk around. In addition, they free the user's hands for model manipulation. However, the (typically) low display resolution makes precise viewing difficult. Furthermore there is the added discomfort of wearing an HMD.

The BOOM mechanical display paradigm offers better resolution, and is somewhat less encumbering than an HMD, but requires user "steering" with the hands, making it difficult to manipulate a model or object, both conceptually and physically. Freedom of movement is restricted because the BOOM is connected to a fixed central mechanical tower. On the other hand, tracking using a BOOM is less problematic than with an HMD because with the BOOM tracking is mechanical in nature. This form of tracking has relatively low latency and is relatively robust. The Push variant trades mobility for a lighter display that can be steered with the head alone.

A 2D monitor provides high resolution along with high frame rates and minimal display latency, but the user is not immersed. This "through-the-window" paradigm makes it difficult to use concurrently for both precise model interaction and shared environment or collaborator viewing. We want a display device that better meets both needs concurrently. A participant should be able to view and concentrate on the high-resolution details of a modeled object without being distracted by the remainder of the environment (real or synthetic) or the infrastructure, but should also be able to "look up at" a remote collaborator to make eye contact during verbal exchanges, and to maintain a sense of the presence of the remote collaborator.

Our Next Display Device

Collaboration participants are currently limited to interaction with and display of only synthetic (computer modeled and rendered) objects in the shared environment. We also want them to be able to share, show, and manipulate real objects in the same environment. For example, a mechanical engineer at one site might want to discuss a new prototype with an engineer at a foundry, and do a side-by-side comparison of the real prototype and the Alpha_1 model. We want to incorporate a new display paradigm that supports both close-in two-handed collaborative work on synthetic models or objects, and also (concurrently) realistic viewing of a remote collaborator, a portion of that collaborator's surrounding environment, and any real objects that the remote collaborator chooses to hold within the visible portion of his or her environment.

Pros and Cons of the Current Reconstruction Technique

In vrapp we used avatars to represent the individual participants in a shared collaboration session. Each avatar had live video of the user texture mapped onto one face of a cube. This worked fairly well when a user looked directly at an avatar, but it became less effective as the texture mapped face became more tilted with respect to the user viewing the avatar. In addition, the video source was low quality and a user could not see the remote user's hands or have any idea how that user was interacting with the environment. We believe that the 2D video-based avatars do not effectively give the sense of being with another person that is necessary for effective interpersonal communication.

Reconstruction of Remote Environments

As implied in the long-term vision, we want some day to be able to present individuals with a realistic and accurate 3D reconstruction of a remote collaborator's environment. We have been pursuing the remote reconstruction problem on two fronts. First, we have been working on completely passive reconstruction techniques: image-based rendering and passive, computer-vision-based geometry. Second, we have been pursuing active reconstruction methods that combine live video textures with geometry obtained from structured light--these methods may yield useful results in the short term.

We see the former, the completely passive reconstruction techniques, as ideal but long-term solutions to the problem. While image-based rendering methods (see for example [McMillan95]) have tremendous potential for providing individuals with realistic and accurate renderings of arbitrary scenes, they all rely on knowing something about the geometry of the scene. When the reconstructions are synthetic, the problem is trivial because the geometry is known. But for arbitrary real environments, the accurate acquisition of scene geometry (depth) is a very difficult problem, one that has kept computer vision researchers busy for a long time. Researchers at Cornell University, the University of North Carolina, and the University of Pennsylvania have been working together to apply specialized knowledge of lighting and surface BRDF information to traditional computer vision techniques with the goal of improving and eventually automating geometry acquisition, even for arbitrary uncontrolled scenes. Using Cornell's precisely controlled light measurement lab we have been measuring and characterizing various lighting and surface properties and working to synthesize arbitrary images from this information. With the University of Pennsylvania we have been working on applying such information to traditional computer-vision methods, and on including measurement uncertainty with the geometry results. Finally, the University of North Carolina has been working on building image-based rendering tools in preparation for the anticipated receipt of scene depth (disparity) data obtained using traditional computer vision techniques aided by a priori knowledge about scene lighting and surface BRDF's.

Because real-time versions of the completely passive reconstruction techniques are at the moment long-term goals, we are in the short term pursuing active structured light depth extraction methods to obtain scene geometry for a limited working volume, and combining the results with live video-based textures. This method has the promise of operating in real-time, albeit for a limited working volume, and with less realistic and accurate results.

Collaboration Artifacts

Distinct from our interest in telecollaboration infrastructures, there has always been a joint interest among the team members in system building, and sometimes these systems require complicated physical objects. As such we are committed to working together to develop useful objects that we could not develop individually. Besides producing useful products, this work teaches us more and more about collaboration in general, e.g. what aspects are most important.

This past year we designed and developed a new compact video-based see-through HMD. The new HMD provides the user with unprecedented visual integration of images on the display surfaces and objects in the real world. Collaboration on this particular artifact (project) served well to teach us about remote collaboration in general, both from a technological and an interpersonal standpoint. The multidisciplinary nature of the HMD meant that we had to draw heavily on individual expertise at each site, i.e. no one site could have easily "pulled it off" alone. We propose a similarly challenging multidisciplinary driving problem this year.

Other Related Work

Shared Telecollaboration Environments

There is much work related to wide-area collaboration, but we feel that the proposed vrapp system is a combination that cannot be found elsewhere. vrapp will be used in a variety of environments, from non-immersive (desktop) to semi-immersive (Active Desk) to immersive (BOOM and HMD). We will have a gestural interface (instead of the traditional pointer interface) to design real world artifacts. In addition, the proposed methods in vrapp for object reconstruction have less start up time and fewer restrictions than the techniques used in other wide-area collaborative systems. By "start up time" we mean the amount of time it takes for users to be outfitted so that they can take part in the system and reconstruction can occur.

Some systems work only on desktop, such as the Shastra project [Bajaj94] and the system described in [Ahlers95]. Shastra supports cooperative work like geometric modeling, simulation, interrogative visualization and design prototyping. The latter system combines the ideas of distributed virtual environments and collaboration. That system allows users to cooperate on furniture placement for interior design, but it has no reconstruction of real world objects in the virtual environment.

NPSNET [Macedonia95] is a system used for battlefield simulation that supports semi-immersive wide projection screens and immersive HMD's. Since the application is a battlefield simulation, NPSNET concentrates on different problems than those that exist in a collaborative design system, such as the problems of dealing with a large number of users in a large virtual environment instead of a few users in a much smaller environment doing fine manipulation. Some participants in the battlefield simulations wear suits that track body position, which requires a much longer start up time than what our depth extraction techniques will need.

The ATR group has produced a system [Yoshida95] that provides a good collaborative environment but only uses a semi-immersive display (wide-projection screen), has a long start up time for users, has interaction unsuited for mechanical design, and has no way to mix synthetic and real items in the environment. ATR's system only uses a semi-immersive wide-projection screen instead of the wide spectrum of displays available with the vrapp system. Start up time is increased by having to capture face geometry for each participant along with having to affix dots on each user's face before a session can begin. During a collaboration session, the system warps the stored face geometry using the movement of the tracked dots. This system used speech recognition and gestures from a CyberGlove for input. Unfortunately, gesture recognition based on noisy tracking is not exact enough for mechanical design. In the ATR system, the facial expressions and body movement of the participants were mapped into the environment. In our system, we will not be restricted to human geometry, but will allow any geometry that fits in the area that is being depth extracted to be mapped into the environment.

Interaction and Modeling

Interaction with virtual models in an immersive environment has, for the most part, consisted of architectural walk-throughs, or modifying the viewpoints of a model, i.e., translating and rotating objects. Even multiparty immersive "games" have allowed only pre-computed models in the environment. A small amount of research has been done into actually creating and modifying models immersively. While it seems that this intrinsic 3D environment would be ideal for modifying 3D models, there are a large number of roadblocks. The earliest research was Jim Clark's thesis in which he moved B-spline control points for surfaces using a mechanical wand in a head mounted immersive environment. The hardware and lag problems dictated that only one simple surface could be manipulated at a time. There was a complete lack of haptic feedback. Nonetheless, interacting with and modifying objects in the immersive environment has inspired movement towards our goal. More recently, the positions of a VPL glove have been used to create ordered 3D interpolation points which define tensor product interpolating spline surfaces. The lack of precision limited the usefulness of both the data and the resulting surfaces.

For interactive creation and manipulation of surfaces, there has been work like [Banks90, Banks93], using B-splines and sampling, and [Snibbe92], using widgets (racks and ...). More recently, there has been research in interactive feature based sketching of solid models [Sturgill95, Sturgill97], and gestural sketching in architectural situations [Zeleznik96]. There is also recent research in haptic feedback for touching, tracing, and moving sculptured models, although not yet for their interactive shape modification. Shape modification in an immersive environment remains an important and unsolved research problem, and adapting existing computational methods is not straightforward. Limited bandwidth, systems lag, lack of interactive high quality rendering, poor understanding of what to show and what not to show, questions about when clutter takes over, and the like, make even simple issues of meaningful display difficult. 2D screen methods mostly do not carry over to 3D immersive space, so new techniques are required. For example, what does "snap to grid" mean if the designer is in the same environment? Other major research issues arise when more than one participant is in the environment and more than one design action is occurring simultaneously. There are difficult issues concerning lag and refresh in updating and recomputing a design, and also in its distributed display.

3D Reconstruction Techniques

A 3D telecollaboration system is greatly enhanced by real-time acquisition, transmission, and display of a remote user and the user's surroundings. The problem of real-time acquisition of three-dimensional models has been studied by many groups. Nayar and Watanabe [Watanabe95] have developed a system which produces 512x480 depth maps of a scene at 30Hz. Two cameras detect two differently defocused images of the same scene and use a depth from defocus algorithm to determine depth at each point in the 512x480 grid. As this system was designed for computer vision and robotics research, the three-dimensional models are simple polygonal models with no texture maps. Kanade designed a video-rate stereo machine [Kanade95] which not only generates a dense (at least 200x200) depth map but aligns it with a color image of the scene--all at 30Hz. The system uses the input of up to six cameras to determine depth using the multi-baseline stereo method but requires high-frequency textures to determine corresponding points. Neither of these systems has been applied to three-dimensional teleconferencing.

It is worth mentioning the structured light system for three-dimensional measurement of industrial parts by Valkenburg and McIvor [Valkenburg96]. While this system is not in real-time, it measures depth in a manner very similar to the method we are investigating. Using a projector, successive patterns of light are cast onto an object. A camera records each pattern as it appears on the object. By processing these successive frames, the depth of each pixel can be determined by triangulation (it is assumed the camera is some distance away from the projector) and previous calibration measurements.
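The triangulation step described above can be sketched as a ray-plane intersection. This is a minimal illustration, not the implementation of [Valkenburg96]: it assumes that calibration has already produced a camera ray for each pixel and a plane of light for each identified projector stripe, and the function name and the example geometry are hypothetical.

```python
def intersect_ray_plane(origin, direction, plane_normal, plane_d):
    """Return the 3D point where the ray origin + t * direction meets the
    plane {x : plane_normal . x = plane_d}, or None if they are parallel."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(plane_normal, direction)
    if abs(denom) < 1e-12:
        return None  # ray runs parallel to the light plane: no unique depth
    t = (plane_d - dot(plane_normal, origin)) / denom
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical setup: camera at the origin looking along +z, projector
# offset by one unit along x. The light plane contains the projector
# centre (1, 0, 0) and the vertical direction (0, 1, 0).
camera_origin = (0.0, 0.0, 0.0)
pixel_ray = (0.0, 0.0, 1.0)
light_plane_normal = (2.0, 0.0, 1.0)
light_plane_d = 2.0  # normal . (1, 0, 0)

surface_point = intersect_ray_plane(
    camera_origin, pixel_ray, light_plane_normal, light_plane_d
)
```

The parallel-ray case illustrates why the camera must sit some distance away from the projector: without a baseline, the ray and the plane no longer intersect at a well-defined depth.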

Saied Moezzi's group at the University of California at San Diego [Moezzi96] developed a system to composite multiple video streams onto a dynamic environment model. Several cameras capture video which is analyzed to determine a voxel representation of the scene. The system utilizes a priori knowledge about the geometry within the scene, as well as camera calibration parameters. Moving objects are found by finding the difference between successive frames. Once a voxel model is obtained, an isosurface algorithm derives a boundary surface representation. From the boundary representation, the system maps video onto each surface in the scene. Each camera is considered in turn and the best view of each pixel is chosen. The resulting system can synthesize a view from any camera direction. The scene reconstruction is not performed in real-time. Since the reconstruction must lie within the field of view of several cameras, the system works best on a large area with low resolution rather than closer with a high resolution.

Proposed Work (This Year)

Telecollaboration Infrastructure

Rather than waiting for others to build systems in the future, we want to build and try something now. Given today's technology, we continue to work together to use existing technologies to build the best possible shared telecollaborative environment that we can. We plan to address several shortcomings of the vrapp system in the coming year in order to provide both a sense of personal presence and a shared work space that supports detailed model viewing and manipulation. We will address these shortcomings by: improving interaction, improving audio, adding a new display paradigm, and adding new techniques to increase the fidelity of the vrapp environment.

The Semi-Immersive Active Desk (a Responsive Workbench variant)

We intend to build a system that provides both a sense of personal presence and a shared work space that supports detailed model viewing and manipulation. The better the sense of presence, the more the user "buys into" the virtual environment and can communicate via visual cues. We propose to improve our wide-area T1-based telecollaboration system by adding a new display paradigm that is better suited to collaborative mechanical design.

We feel that the combination of a close-up hands-on display paradigm, and a more immersive reconstruction of the remote environment, will offer an improved feeling of presence while also promoting easier interaction by users.

One avenue is suggested by the Clear Board project [Ishii94], where two users manipulate a shared 2D workspace directly on a large drafting-table sized screen. Each has a view of the other user in the background, which includes a view of the user's hands as they draw on the screen. This can easily be extended to 3D by using an Active Desk with a separate display mounted above the Active Desk. In Clear Board, video of the remote user is projected in the background of the 2D work area. In the extension to 3D, the remote user could be viewed via teleconference video on a monitor mounted on top of the Active Desk. At first a shared work area would be created by having an icon represent the remote user's pointer location. Later the geometry of the "hand avatar" could be changed depending on what operation the remote user is currently doing. Finally, the hand avatar could be 3D geometry captured from a Data Glove.

"Hand avatars" provide dynamic geometry that represents remote users and also draw the users into the shared work area. But the work area view and the remote user view are disjoint; a better combination of both would be beneficial.

The addition of an Active Desk to vrapp provides ergonomics that are better suited for collaboration than the other environments that we have already tried. Users will sit in front of a desk and gesturally interact with the models directly on the Active Desk. The approach extends the gestural interface technology already available [Zeleznik96] that has been well received. Hand avatars are simpler than "virtual" people and may be more effective.

Improved Interaction

We have proposed separately [Riesenfeld97] to incorporate a gestural interface reminiscent of Sketch [Zeleznik96] into Utah's Alpha_1 design system. We plan on incorporating this system into vrapp to provide easy manipulation and creation of geometry.

In addition, we plan on investigating various other possibilities to improve interaction in vrapp. There are non-obtrusive techniques (using a camera, for example) for tracking the user's head and hand that we need to incorporate into the system. There are also techniques for gestural annotation that need to be explored. Finally, there are user interface methods for masking network lag, already examined in [Conner97], that need to be explored further.

Improved Modeling

A vrapp systems architecture design decision has made interactive modification in the immersive environment a rather difficult problem. We chose to implement vrapp in Open Inventor. By providing an available neutral representation format, that decision made it possible for modules from several sites to interoperate and to be plugged into the environment for model and environment creation, interaction, and distribution. Originally, we created this capability simply for viewing unchanging 3D objects. Most immersive "modeling" environments being created today are for viewing, i.e., reviewing, unchanging 3D objects. Some use polygonal formats, some use VRML, some use STEP. We have chosen Open Inventor. These immersive environments have allowed modified models to flow in one direction only: model updates have flowed from a CAD system, or in-house modeling environment, into an immersive environment.

Hence, we are proposing to do something quite unique. We are proposing to create a capability for a two-way dialog with the modeling system in the vrapp immersive environment. We want to be able to modify the model (in the modeling system) from vrapp, as well as show a model in vrapp.

In the long term, immersively creating design objects will require results from the project connecting the Sketch gesture inferencing system to Alpha_1 object constructions. However, we believe that in a shorter time frame we can introduce modification of design models within vrapp. Initially, we will start with being able to modify already created feature based parametric models. Later, we will extend it to creating new models. The new research here is to link vrapp and the modeler, two separate subsystems (the modeler is over 1M lines of code and will not be reimplemented in Open Inventor). This scenario is more realistic than supposing that there will be one monolithic immersive collaborative design system. The link must be made in a manner that supports interactive modification rates and multiple user control. We have to find methods for manifesting application semantics that lend themselves to intuitive display.

We plan to initiate this effort using the feature based design handle approach of Sturgill [Sturgill95], related to the work of [Snibbe92]. The term "handle" often is used in modeling in an analogous way to "widget" in graphics. However, a "design handle" is clearly linked to the feature and shape, not necessarily the operation, and must not hide the design information. Not only will all design objects intended for vrapp have to be translated into Open Inventor, but we will also have to automatically create design handles for display and manipulation in vrapp, as well as provide appropriate communications for sending the model server updated values. The design handles will need to be displayable and modifiable in the immersive environment, but they must not be a part of the model and must not obscure the model object.

One scenario of parametric model manipulation involves bringing a model of an artifact being designed into the vrapp environment, and then having distributed design collaboration participants at multiple sites adjust the parameters of the proposed design to properly proportion aspects of the design artifact, according to the constraints of their design expert role. This approach offers a means to support multi-disciplinary design, where considerations from several fields, such as mechanical, optical, and manufacturing engineering, must all be satisfied by a single design solution. One challenge is to integrate calculation and display "views" specific to the design discipline into the immersive environment, such as optical path calculations or mechanical torque versus speed trade-offs. Another challenge is to offer meaningful adjustment handles that fundamentally and intuitively relate to the design calculations and constraints, rather than just dragging parts of the machine around in space.
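As a rough illustration of the design handle idea, the sketch below ties a handle to one parameter of one feature, rejects values outside an allowed range, and produces the update that would be sent to the model server. Everything here is hypothetical: the class, its fields, the message layout, and the example feature are assumptions made for illustration and are not part of vrapp, Alpha_1, or Open Inventor.

```python
from dataclasses import dataclass


@dataclass
class DesignHandle:
    """Hypothetical handle bound to a single parameter of a model feature."""
    feature: str      # name of the feature the handle is attached to
    parameter: str    # parameter of that feature the handle adjusts
    value: float      # current parameter value
    minimum: float    # lower bound imposed by a design constraint
    maximum: float    # upper bound imposed by a design constraint

    def propose(self, new_value, role):
        """Validate a change requested by a participant in a given design
        role. Returns the update message for the model server, or None if
        the value violates the handle's constraint range."""
        if not (self.minimum <= new_value <= self.maximum):
            return None
        self.value = new_value
        return {
            "feature": self.feature,
            "parameter": self.parameter,
            "value": new_value,
            "role": role,
        }


# Hypothetical example: an optical designer adjusts a bore diameter.
handle = DesignHandle("lens_mount", "bore_diameter", 25.0, 20.0, 30.0)
accepted = handle.propose(28.0, role="optical")
rejected = handle.propose(35.0, role="mechanical")
```

Keeping the handle separate from the model it adjusts mirrors the requirement that handles be displayable and modifiable in the immersive environment without being part of the model itself.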

Real-Time 3D Reconstruction, a Structured Light Approach

We have been pursuing the use of structured light to extract 3D geometry in real time, with the goal of adding "normal" video as real-time texture to vrapp. Through various experiments we have made progress on controlling the timing of our structured light source (a digital light projector or DLP) and a camera in terms of illumination and acquisition. We are currently working on extracting the light from the images so that we can generate sparse depth maps.

While depth extraction is often performed by using image feature correspondence or triangulation, we are pursuing a method that employs projective geometry. A light ray projected from the DLP and a view ray from the camera meet at a unique point. We use a camera to observe images produced by light from a DLP as it terminates on the subject, and then compare points in the images with reference data collected during a calibration phase to estimate the depth at that point.

To solve the correspondence problem (which ray from the projector is being imaged by the camera at a given point), we use binary coding for each ray. In fact, each vertical column of the DLP is given a code. If 256 columns are being projected we need 8 bits to code them all. By looking at 8 successive vertical-bar images we can uniquely identify each column. We use a gray-code scheme whereby the first pattern is a half-dark and half-light image, the next is two of each (interleaved), then four, then eight, and so on to code each of the 256 columns. After the system is calibrated, the DLP sends out these 8 vertical-bar patterns, and the synchronized geometry camera processes the 8 resulting images to determine the projector column seen at each pixel. Using a large lookup table generated during calibration, one can then compute the intersection of each camera ray and the plane projected by one vertical column of the DLP. An additional texture camera is then used to capture live color images to be superimposed on the depth data. This second camera shares the optical axis with the geometry camera.
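The column coding can be sketched as follows. This is an illustrative sketch that assumes the standard reflected binary Gray code, whose stripe layout after the first pattern differs somewhat from a plain binary subdivision; the function names are hypothetical, and real camera images would first have to be thresholded into light and dark bits.

```python
def gray_encode(n):
    """Convert a column index to its reflected binary Gray code."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Convert a reflected binary Gray code back to a column index."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def make_patterns(num_columns=256):
    """Build one vertical-bar pattern per bit, most significant bit first.
    Each pattern is a list with one 0 (dark) or 1 (light) per column."""
    bits = (num_columns - 1).bit_length()
    return [
        [(gray_encode(column) >> bit) & 1 for column in range(num_columns)]
        for bit in range(bits - 1, -1, -1)
    ]


def decode_column(observed_bits):
    """Recover the projector column from the light/dark bits that one
    camera pixel observed across the pattern sequence (MSB first)."""
    code = 0
    for bit in observed_bits:
        code = (code << 1) | bit
    return gray_decode(code)


patterns = make_patterns(256)
# Bits seen by a camera pixel that images projector column 137.
observed = [pattern[137] for pattern in patterns]
column = decode_column(observed)
```

One reason to prefer a Gray code over plain binary is that adjacent columns differ in exactly one bit, so a pixel that straddles a stripe boundary can be misidentified by at most one column.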

The complete process then involves two main tasks. First we perform a one-time off-line calibration where we fill a 3D lookup table with projective depth information for the entire working volume. After this calibration we begin the real-time operation of extracting, combining, and displaying geometry and texture. For the single-camera-rig system (see above), which is what we plan on having for the site visit (one texture camera, one geometry or depth camera, both sharing an optical axis), we expect to have a texture video stream (640x480, 30 fps) and then a stream of probably 256x256 depth data at something like 5 fps. Note that if this reconstruction data is going to be sent over wires to a remote collaborator, the separation of depth and texture allows for some standard compression scheme(s) to be used with the video, e.g. H.261, and likewise for the depth. (The depth data is relatively sparse.)
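A back-of-the-envelope calculation shows why compression matters for these streams. The sketch below assumes 24-bit color for the texture stream and 16 bits per depth sample; neither figure is stated above, so both are assumptions. The T1 line rate of 1.544 Mbit/s is the standard figure for the kind of link the infrastructure uses.

```python
T1_BITS_PER_SECOND = 1_544_000  # standard T1 line rate


def raw_bit_rate(width, height, frames_per_second, bits_per_sample):
    """Uncompressed bit rate of a stream of width x height samples."""
    return width * height * frames_per_second * bits_per_sample


# Assumed sample sizes: 24-bit color texture, 16-bit depth values.
texture_rate = raw_bit_rate(640, 480, 30, 24)
depth_rate = raw_bit_rate(256, 256, 5, 16)

# Factor by which the combined raw streams exceed a single T1 link.
compression_needed = (texture_rate + depth_rate) / T1_BITS_PER_SECOND
```

Under these assumptions the texture stream dominates, and the combined raw data exceeds a single T1 link by roughly two orders of magnitude, which is why separating texture from depth so that each can use a suitable compression scheme is attractive.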

Advantages of the DLP Approach

The projective DLP approach offers several advantages over other methods. First and probably foremost, the approach provides real-time, full-color, 3D digitization that does not rely on explicitly placed facial landmarks, models, pre-scans, or other infrastructure to do its job. This means that among other things, a user can spontaneously add and remove things to and from the digitization working space without thinking about the digitization process. In other words the digitization is "transparent" to the user. While there exist commercial products which can digitize full-color 3D objects, e.g. the Virtuoso Shape Camera (Visual Interface, Inc.) and the InSpeck-3D system (Laser InSpeck, Inc.), these products do not operate anywhere near real time.

Another advantage is that, because of the brightness of the projectors, we believe we can perform the depth extraction with better resolution over a larger area than other techniques allow. Furthermore, the system is scalable: multiple projector-camera pairs can be used to increase the working volume.

Finally, and most interesting with respect to potential future work, the DLP offers complete digital control over image generation, both in terms of timing and pixel addressing. This means, for example, that we can synchronize a camera with the DLP for the purpose of imperceptible depth extraction, and that we can very precisely render structured light patterns dynamically in the scene. For example, we might project one type of pattern on smooth surfaces, another on surfaces with high-frequency content, etc. We might also choose to concentrate the structured light in a particular portion of the working volume, e.g. a region with lots of motion (as detected by differencing camera images).
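As a rough illustration of the last idea, a region of motion can be located by differencing two successive camera images and taking the bounding box of the pixels that changed. This is only a sketch under simple assumptions: frames are grayscale images stored as lists of rows, and the function name and threshold value are hypothetical.

```python
def motion_bounding_box(prev_frame, curr_frame, threshold=16):
    """Bounding box of pixels whose intensity changed by more than threshold.

    Frames are equally sized lists of rows of grayscale values.
    Returns (row_min, col_min, row_max, col_max), or None if nothing moved.
    """
    rows, cols = [], []
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (before, after) in enumerate(zip(prev_row, curr_row)):
            if abs(after - before) > threshold:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Two 4x4 frames in which two pixels change between captures.
previous = [[0] * 4 for _ in range(4)]
current = [row[:] for row in previous]
current[1][2] = 100
current[2][3] = 50

region = motion_bounding_box(previous, current)
```

The resulting box could then be mapped, through the calibration, to the set of projector columns on which finer or more frequent patterns are concentrated.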

Combining 3D Reconstruction and the Semi-Immersive Active Desk

We propose to combine the Active Workbench ideas with new sketching and modeling ideas, and a structured light-based reconstruction of a remote collaborator and his or her immediate environment, including facial expression. We propose to augment the high-resolution, close-up, hands-on view provided by the Active Desk setup to show the remote user in the background. Using the DLP-based technique we will display the real-time 3D reconstructions of both a remote collaborator and his or her immediate environment.

In this way, the Active Bench participant can look at a 3D reconstruction of the remote collaborator and the mechanical artifact at the same time. This person enjoys all of the benefits of the new Active Workbench paradigm, including our new collaborative sketching and gestural interaction techniques, while at the same time enjoying all of the subtle nuances of both verbal and nonverbal interpersonal communication. Additional participants, using for example HMDs or a BOOM, can then view the two main collaborators from the outside, looking in toward a synthetic bench with the respective avatar and 3D reconstruction.

Passive Reconstruction and Image-Based Rendering

Along with implementing the short-term structured light-based 3D reconstruction system, we plan to continue to work toward our long-term goal of a completely passive, remote reconstruction system. In particular, we feel that within the year we should be able to perform a single, completely passive reconstruction of a static scene from within the Cornell cube environment. The scene would be restricted to objects of certain known materials, and the lighting would be restricted to that environment, which has been very carefully characterized. We anticipate that a single reconstruction will take a relatively long time, and as such we do not expect to achieve anything near real-time operation, even for the very restricted circumstances of our scene. This work will involve Cornell University, the University of North Carolina, and the University of Pennsylvania [Fuchs97].

We also plan to continue our image-based rendering work, in particular with the goal of better understanding the requirements for depth (disparity) data. We plan to use Cornell's unified measurement and rendering system to evaluate the accuracy of image-based renderings with varying numbers and positions of "tie points", the scene points for which depth or disparity must be known in order to warp between reference images. A better understanding of the tie-point requirements will help us to develop the systems that we hope will eventually extract such points from the scene automatically. This work will mostly involve UNC, with considerable help (software, images, and expertise) from Cornell.
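The sensitivity of a warp to the number and position of tie points can be illustrated with a deliberately simplified sketch. It assumes a single scanline of a rectified image pair, a warp of the form x' = x + s * disparity, and linear interpolation of disparity between tie points. None of this is taken from the measurement and rendering system mentioned above; the names and the test scanline are hypothetical.

```python
def interpolate_disparity(tie_points, x):
    """Linearly interpolate disparity at x from (x, disparity) tie points.

    Tie points must have distinct x positions; values are clamped
    outside the range the tie points cover.
    """
    pts = sorted(tie_points)
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, d0), (x1, d1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return d0 + t * (d1 - d0)


def warp_x(x, disparity, s):
    """Position of pixel x in a view a fraction s along the baseline."""
    return x + s * disparity


def max_warp_error(true_disparity, tie_xs, s=1.0):
    """Largest pixel displacement error when only tie_xs carry known disparity."""
    tie_points = [(x, true_disparity[x]) for x in tie_xs]
    return max(
        abs(warp_x(x, true_disparity[x], s)
            - warp_x(x, interpolate_disparity(tie_points, x), s))
        for x in range(len(true_disparity))
    )


# A 16-pixel scanline with a depth discontinuity between pixels 7 and 8.
scanline = [2] * 8 + [6] * 8

coarse_error = max_warp_error(scanline, [0, 15])       # ends only
placed_error = max_warp_error(scanline, [0, 7, 8, 15]) # tie points at the edge
```

In this toy case two tie points at the ends of the scanline smear the discontinuity across the whole line, while two more placed on either side of it remove the error entirely, which suggests that position can matter as much as count.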

Collaboration Artifacts

We are committed to jointly developing useful objects that we could not develop individually. Besides producing useful products, this work teaches us more and more about collaboration in general, e.g. which aspects are most important. While ideally we would use our T1-based telecollaboration infrastructure to do all of the artifact development, the technology is not yet good enough. Our approach to artifact development is therefore not to let the shortcomings of the infrastructure get in the way; instead, we continue to use all means at our disposal on our collaboration projects.

A Wide-Field-of-View, High-Resolution Video Camera Cluster

In particular, we propose to work together to build a wide-field-of-view (WFOV), yet high-resolution, video camera cluster [Keler97]. The WFOV camera uses several conventional cameras in a single cluster and attempts to reconstruct a virtual camera with a 150-degree or greater field of view. All of the cameras share the same imaging point, which is made possible by the use of angled mirrors. This allows for a seamless and optically correct image that, with a novel aperture clipping device, will also be correct across the seams. The WFOV virtual camera forms a high-resolution image that is updated at real-time rates.
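The shared imaging point rests on a basic property of planar mirrors: a camera viewing the scene through a mirror behaves like a virtual camera located at the reflection of its center of projection across the mirror plane. The sketch below checks that two physically separate cameras can share one virtual center. The camera positions and mirror planes are hypothetical values chosen for illustration and are not the geometry of the proposed cluster.

```python
def reflect_point(point, normal, offset):
    """Mirror image of point across the plane normal . x = offset.

    The normal does not need to be unit length.
    """
    norm_sq = sum(n * n for n in normal)
    signed = (sum(n * p for n, p in zip(normal, point)) - offset) / norm_sq
    return tuple(p - 2 * signed * n for p, n in zip(point, normal))


# Camera A looks into a mirror lying in the plane x = 1.
camera_a = (2.0, 0.0, 0.0)
virtual_a = reflect_point(camera_a, normal=(1.0, 0.0, 0.0), offset=1.0)

# Camera B looks into a tilted mirror lying in the plane x + y = 2.
camera_b = (2.0, 2.0, 0.0)
virtual_b = reflect_point(camera_b, normal=(1.0, 1.0, 0.0), offset=2.0)

# Both virtual centers of projection coincide at the origin.
assert virtual_a == virtual_b == (0.0, 0.0, 0.0)
```

Because the virtual centers coincide, the individual images are related by pure rotations about a common point, which is what allows them to be joined without parallax at the seams.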

As an artifact, the WFOV camera cluster is particularly well matched to the Active Desk because it is physically a good size: the camera cluster itself is small and can be held in the hands, where a high-resolution display will provide a useful view. Using our proposed infrastructure, a remote collaborator could, for example, hold the model or the real object in his or her hands and

adjust the placement of cameras in the cluster;
identify and adjust mirror surfaces; and
observe the corresponding changes in representative light rays.

Bibliography

[Ahlers95] Klaus H. Ahlers, Andre Kramer, David E. Breen, Pierre-Yves Chevalier, Chris Crampton, Eric Rose, Mihran Tuceryan, Ross T. Whitaker, Douglas Greer. "Distributed Augmented Reality for Collaborative Design Applications." in Computer Graphics Forum, Vol. 14, No. 3 (Eurographics '95, Maastricht, The Netherlands August 28 - September 1, 1995). pp. C-4 to C-14.

[Bajaj94] C. Bajaj, V. Anupam "SHASTRA - An Architecture for Development of Collaborative Applications", International Journal of Intelligent and Cooperative Information Systems, 3, 2, (1994), 155-172.

[Banks90] Banks, M., and Cohen, E., "Realtime Spline Curves from Interactively Sketched Data," 1990 Symposium on Interactive 3D Graphics, 1990.

[Banks93] Banks, M., Cohen, E., and Mueller, T., "An Envelope Approach to a Sketching Editor for Hierarchical Free-form Curve Design and Modification," in Proc. Knot Insertion and Deletion Algorithms (R. Goldman and T. Lyche, eds.), SIAM, 1993.

[Cohen97] Cohen, Elaine, Henry Fuchs, Rich Riesenfeld, "Single Imaging Point, Multiple Imager, Wide Field of View, Camera System," 1997 Director's pool proposal for the NSF Science and Technology Center for Computer Graphics and Scientific Visualization

[Conner97] Conner, D.B. and Holden, L.S., Providing a Low-Latency User Experience in a High-Latency Application. To appear in Proceedings of 1997 Symposium on Interactive 3D Graphics (Providence, Rhode Island, April 27-30, 1997).

[Fuchs97] Fuchs, Henry, Don Greenberg, Ruzena Bajcsy, "Unifying Image Synthesis and Analysis on a More Solid Scientific Basis," 1997 Director's pool proposal for the NSF Science and Technology Center for Computer Graphics and Scientific Visualization.

[Ishii94] Ishii, Hiroshi, Minoru Kobayashi, Kazuho Arita, "Iterative Design of Seamless Collaboration Media," CACM, Volume 37, Number 8, August 1994, 83-97

[Kanade95] Kanade, T., et al., "Development of a Video-Rate Stereo Machine," Proc. of International Robotics and Systems Conference (IROS-95), Pittsburgh, PA, August 7-9, 1995.

[Macedonia95] Macedonia, Michael R., Brutzman, Donald P., Zyda, Michael J., Pratt, David R., Barham, Paul T., Falby, John and Locke, John. "NPSNET: A Multi-Player 3D Virtual Environment Over the Internet," in the Proceedings of the 1995 Symposium on Interactive 3D Graphics, 9-12 April, 1995, Monterey, California.

[McMillan95] McMillan, Leonard, and Gary Bishop. "Plenoptic Modeling," Proceedings of SIGGRAPH 95, (Los Angeles, CA), August 6-11, 1995. pp 39-46.

[Moezzi96] Moezzi, Saied, Katkere, Arun, Kuramura, Don Y. and Jain, Ramesh. "Immersive Video," Proceedings of VRAIS 1996, pp. 17-24.

[Ohya93] Ohya, Jun, Kitamura, Yasuichi, Takemura, Haruo, et al. "Real-time Reproduction of 3D Human Images in Virtual Space Teleconferencing." IEEE Virtual Reality International Symposium. September 1993.

[Snibbe92] Snibbe, S.S., Herndon, K.P., Robbins, D.C., Conner, D.B. and van Dam, A., "Using deformations to explore 3D widget design," Computer Graphics (Proceedings of SIGGRAPH '92), 26(2), ACM SIGGRAPH, July 1992, pp. 351-352.

[Riesenfeld97] Riesenfeld, R., van Dam, A., "Exploring the use of Gestures for Constructing Parametric Mechanical CAD Models," 1997 Director's pool proposal for the NSF Science and Technology Center for Computer Graphics and Scientific Visualization

[Sturgill95] Sturgill, M., Cohen, E., and Riesenfeld, R., "Feature-Based 3-D Sketching For Early Stage Design," Proceedings of the 1995 ASME Computers in Engineering Conference.

[Sturgill97] Sturgill M. "A feature-based, conceptual approach to early stages of part design", PhD Dissertation, University of Utah 1997.

[Valkenburg96] Valkenburg, R.J., and A.M. McIvor, "Accurate 3D measurement using a Structured Light System," SPIE 1996.

[Watanabe95] Watanabe, M., S. K. Nayar, and M. Noguchi, "Real-Time Implementation of Depth from Defocus," Proceedings of SPIE Conference, Philadelphia, October 1995.

[Yoshida95] Yoshida, Miko, Yuri A. Tijerino, Shinji Abe and Fumio Kishino, "A Virtual Space Teleconferencing System that Supports Intuitive Interaction for Creative and Cooperative Work", 1995 Symposium on Interactive 3D Graphics, Monterey, California, April 9-12, 1995, pp. 115-122.

[Zeleznik96] Zeleznik, R.C., Herndon, K.P., and Hughes, J.F., "SKETCH: An interface for sketching 3D scenes," Computer Graphics (Proceedings of SIGGRAPH '96), August 1996.

[Zeleznik97] Zeleznik, R.C., Forsberg, A.S., and Strauss, P.S., "Two pointer input for 3D interaction," to appear in Proceedings of 1997 Symposium on Interactive 3D Graphics (Providence, Rhode Island, April 27-30, 1997).
