Director's Pool
Inverse and Interpolation Methods for Image Synthesis: Renewal Proposal

J. Arvo, A. Barr, J. Hughes, P. Perona, K. Torrance

Summary

Several important course corrections have been made within this project since it began, as reflected in the new title. First, the recent success of image-based methods has made it vitally important to broaden the scope of the project to include interpolation as well as inversion, and to explore potential connections between the two approaches. There are sound theoretical reasons for this broadening of focus: most notable is the fact that interpolation problems are far easier to pose correctly; that is, to ensure the existence of a unique solution, which is a prerequisite for rapid and robust solution. In recognition of this fact, a new goal of this project will be to examine the process of light field interpolation from a mathematical perspective, and to establish limits on what it can achieve. For example, how much information (both measured and a priori) is required to attain a given uniform accuracy over a range of interpolated views? How well can we hope to do with n views in a given configuration? Given n cameras, which configuration minimizes the maximum interpolation error? These questions can be easily formalized and attacked using standard tools of approximation theory and information based complexity, which is the quantitative study of approximate computation that employs incomplete and/or contaminated data.

The second course correction has been a narrowing of focus within the class of inversion methods. A new approach to object recovery is being pursued that is quite promising, and builds upon one of the strengths of the graphics community. Given the intrinsic difficulty of inverting the rendering equation, we are pursuing a useful and apparently achievable intermediate step which will be far less susceptible to the severe mathematical difficulties of general geometry capture. The new approach focuses on a parametrized class of geometries defined by a generative model, such as the class of generalized cylinders, the class of human hands, or the class of human faces. The task of inversion then becomes one of parameter fitting, which remains highly non-linear, and potentially ill-posed, yet far more tractable by virtue of its finite dimension. The goal of this type of inversion would be to capture highly constrained geometries; that is, to infer the configuration of a hand, for example, given the a priori knowledge that a set of images corresponds to an element of the known universe of geometries, generated by a compact parametric representation (a generative model). This problem is far simpler than general object recognition, yet could be tremendously useful in the context of human-computer interaction and immersive teleconferencing, through the "reconstruction" of hands and faces. The technical challenges entailed in this approach, and likely solution techniques, are described below.

Introduction

Although the inversion problem was acknowledged at the outset to be extremely challenging, its connection with the goals of the vision community were perhaps not fully appreciated within the STC. Further research and collaboration has made the extent and nature of the connections far more obvious, and clarified some of the subtle distinctions.

While the rendering equation [5] has proven to be a powerful organizing principle within the computer graphics community, it has also had a number of unfortunate negative effects, some of which were brought out by attempts at inversion. Most importantly, the articulation of the rendering equation as a concept unique to image synthesis has contributed to a rift between global illumination and disciplines such as radiative transfer, illumination engineering, and computer vision. This rift is now slowly closing. A concrete instance of this disconnect is that the notion of "inverting" the rendering equation to recover geometry from images by its very phrasing appeared to be subtly distinct from the problems confronting computer vision. This is most emphatically false, as the dozens of algorithms and hundreds of papers produced by the vision community on this very problem make abundantly clear. While this does not preclude contributions from the computer graphics community, it does emphasize the importance of targeting our efforts wisely. Areas in which our contributions can have the greatest influence lie primarily within the following categories:

Tools for measuring, modeling, and simulating non-Lambertian reflection.
Geometric modeling.
Image-based rendering (as distinct from inversion).
Applications in human-computer interaction and immersive displays.

All of these come into play at some level in the proposed work.

With respect to the initial goals of the project, several interesting results have been achieved. For example, it has been shown that some of the sources of difficulty in image synthesis turn out to be beneficial in related inverse problems. For example, interreflection near concavities makes approximation of global illumination by finite element methods very difficult, yet this very phenomenon creates an opportunity to infer shape from interreflection [7] (which is an inverse problem). These results came from analysis of the linear transport operators that govern radiative transfer and were done in collaboration with Eric Veach at Stanford University. (A paper on this topic is in preparation.) Other ideas, such as piecewise-smooth 4-manifold approximations to the light field still appear promising and are being pursued. Work with Pietro Perona's vision group at Caltech has begun, as we explore vision-based methods for understanding human gestures [4].

Error Analysis for Image-Based Rendering

Because image-based rendering appears to be an extremely viable alternative to inversion when geometry capture is not the explicit objective (i.e. when the goal is simply to generate new images), it is important to understand its theoretical limits. Toward this end, we propose to analyze the process using a number of tools from approximation theory and information-based complexity theory [9, 16], in a similar spirit to previous work in global illumination [1]. Such tools have already been applied to a number of problems in computer vision [2], but not yet in the context of image-based methods.

It is likely that measurements obtained from the measurement lab will be helpful in closing the gap between theory and practice. For example, in verifying error bounds for a range of interpolated images constructed from n measurements, it will be necessary to characterize several aspects of actual BRDFs, such as the maximum reflectivity and the spread of the specular peak. Physical measurements will be the key to obtaining meaningful bounds, and in assessing the accuracy of the resulting bounds (i.e. determining how tight they are in practice). This seems to be an excellent place to bring together analysis and measurement.

Finite-Dimensional Inversion

Nearly every way in which the rendering equation can be "inverted" is highly non-linear and, more importantly, illposed in the sense of Hadamard [15]. This implies that some means of regularization is required before we can expect to get any type of useful answer; that is, a reformulation of the problem that retains the essential properties of the original problem while guaranteeing the existence of a solution and its continuous dependence on the problem instance. One approach to regularization is to incorporate a priori information about the scene as additional constraints, such as continuity, smoothness, or some variational principle (such as energy minimization). These approaches have all been explored by the vision community [10, 14]. A more aggressive form of regularization is to restrict the problem solution to a finite-dimensional space, such as a parametric family of surfaces. This too has been explored by the vision community to some extent [6]. It is this approach that seems most likely to be both achievable and useful in the near term.

There are four major challenges to this approach:
Constructing a parametric model for a given geometry class.
Estimating geometric parameters from images.
Recognizing instances of a model class.
Constructing model classes ab initio from images.

We propose to attack only the first two of these, since the final two are significantly harder, and need not be solved in order to derive some practical benefits from the approach.

For the first phase (constructing a parametric model), we propose to use the generative modeling technique introduced by Snyder [11]. This approach is extremely flexible, and results in compact parametric surface representations. The second phase (parameter estimation) corresponds to a very challenging non-linear global optimization problem. It can be partitioned into three sub-problems:

Constructing feasible regions.
Finding an initial guess.
Local optimization.

The first sub-problem is concerned with eliminating large regions of the parameter space from consideration so that the optimization need only search a relatively small range of parameters. This is important, since the parameter space may have hundreds of dimensions. Several approaches are likely candidates for this phase, including coarse octree construction [13] and interval analysis [12].

The second sub-problem is crucial since it is likely that octree and interval methods will still leave large regions of parameter space containing a large number of local minima. Choosing an initial guess that is close the the correct (global) minimum is necessary, and is the crux of non-linear optimization. Several ideas shall be pursued here, including Krawczyk's interval-based Newton's method [8], continuation methods [3], which seek to efficiently locate all minima (and are in a sense complementary to interval methods), and potentially even methods developed for identifying radar cross-sections of various geometries. The latter approach would seek to classify features in the frequency domain of the image.

The final sub-problem is mathematically tedious, yet conceptually simple. The idea is to use gradient descent in the parameter space to rapidly converge to a closest minimum (i.e. the best parameter fit). Because the number of parameters may be quite large, there will be a tremendous computational advantage in doing some or all of this step symbolically rather than numerically through finite differences. Here is where the advantage of a generative model is most evident, since the relative simplicity of its mathematical description lends itself to symbolic differentiation.

This approach will be the Ph.D. project for a new graduate student at Caltech (Ravi Ramamoorthi), supervised by Jim Arvo. Ravi has an excellent background in the required areas, and has already done a considerable amount of work on the project, including some very preliminary results using John Snyder's generative modeling software. Participation from Ken Torrance will be required in pursuing methods based on electro-magnetic scattering (i.e. radar cross-sections) and eigen modes (fundamental configurations that characterize the space). Al Barr will contribute his expertise in interval methods, manifolds, generative modeling, and electro-magnetic theory. Pietro Perona will assist in image capture, image processing, and fundamental vision algorithms. Spike Hughes will be crucial in developing many of the mathematical elements, including Fourier analysis, image processing, and both local and global optimization methods.

Home Research Outreach Televideo Admin Education