Iconic Representations of Buildings

(i.e. "Lego-ization")


Given the facade of a building, identify key features such as windows and doors, and construct an iconic representation that preserves these features, using a library of architectural primitives with discrete dimensions and a limited color palette. Features with similar appearance should share the same representation, and objects with a known classification should, where possible, use primitives from the matching category.

Pipeline Overview

Our algorithm begins with a labeled, rectified facade image. We further cluster the scene elements within each label to identify visually similar elements, and generate a representative for each cluster. We then snap the objects (and image bounds) onto a grid compatible with the primitives, subject to the following constraints:
  • All objects that were clustered together must maintain the same dimensions.
  • Objects can't overlap, and must remain in the image bounds.
After finding object locations satisfying these constraints, the image content must be reshuffled accordingly. We then subdivide the non-object areas of the image into rectangular regions. Each object region and each non-object region is then solved for independently.
The project is implemented in Scala using the JVM bindings for OpenCV, plus OscaR and JSAT. It is available on GitHub. Test images were drawn from the ICG Graz50 and ECP facade datasets.

Section 1: Object labeling and clustering

Automated labeling of windows and doors is difficult. Aspect ratios vary widely, fine edges are often important details, and glass surfaces are frequently present, so reflections add further difficulty. An SVM trained on a dataset of approximately 4000 windows and 20000 counterexamples, using a "HOG + image dimensions" feature (with binned dimensions), produced unusable output: [figure: SVM output] The remainder of the project therefore used the labels provided with the training set.
For clustering we used mean shift with the same feature, since the number of clusters is not known in advance. The feature appears to have been insufficiently discriminative beyond image dimensions, but worked reasonably well for the test cases. To generate a representative for each cluster, the cluster members were averaged together, which appeared to work well. An example image is shown here before and after the clustering process, along with the labels used: [figures: original, labels, averaged with clusters]
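To illustrate the clustering step, here is a minimal one-dimensional mean-shift sketch (a stand-in for the JSAT-based implementation; the object name, the iteration count, and the mode-merging tolerance are all assumptions of this sketch, not taken from the project):

```scala
// Simplified 1-D mean shift: each mode repeatedly moves to the mean of
// the input points within `bandwidth` of it, so the number of clusters
// is discovered rather than specified in advance.
object MeanShiftSketch {
  def shift(points: Seq[Double], bandwidth: Double, iters: Int = 50): Seq[Double] = {
    var modes = points
    for (_ <- 0 until iters) {
      modes = modes.map { m =>
        val near = points.filter(p => math.abs(p - m) <= bandwidth)
        near.sum / near.size
      }
    }
    modes
  }

  // Group points whose converged modes coincide (crudely, to within a
  // tenth of the bandwidth) and average each group into one
  // representative, as described above.
  def representatives(points: Seq[Double], bandwidth: Double): Seq[Double] = {
    val modes = shift(points, bandwidth)
    points.zip(modes)
      .groupBy { case (_, m) => math.round(m / (bandwidth / 10)) }
      .values.toSeq
      .map(g => g.map(_._1).sum / g.size)
  }
}
```

The real feature vectors are higher-dimensional, but the structure is the same: shift to convergence, merge coincident modes, average each cluster into a representative.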

Section 2: Snap-to-Grid and Subdivision

Before running this step, the source image may be resized to the desired scale. Any resizing must happen before snapping to grid, to avoid rounding errors later in the process. The snap-to-grid constraint program is solved using large-neighborhood search in OscaR. We require that the image boundary and all object boundaries be grid-aligned, that objects not overlap, and that objects be fully contained by the image boundary. Additionally, objects belonging to the same cluster must have the same dimensions. The objective minimizes how far each object's center is moved, together with the change in each object's side lengths. While this is a hard problem in general, most instances we tested were under-constrained and solved easily. Here we see the image before and after the snap-to-grid procedure: [figure: objects snapped to grid]
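A greatly simplified sketch of the snapping step is below. It is not the OscaR model: it rounds each box independently instead of optimizing jointly, and it does not enforce the non-overlap or containment constraints; only the shared-dimensions-per-cluster rule is shown. `SnapSketch` and `Box` are hypothetical names.

```scala
// Snap boxes to a grid of spacing `grid`. Boxes in the same cluster
// must end up with identical dimensions, so each cluster's mean size
// is rounded once and shared by all of its members.
object SnapSketch {
  case class Box(x: Double, y: Double, w: Double, h: Double, cluster: Int)

  def snap(boxes: Seq[Box], grid: Double): Seq[Box] = {
    def q(v: Double) = math.round(v / grid) * grid
    // One shared (rounded, at least one cell) size per cluster.
    val sizes = boxes.groupBy(_.cluster).map { case (c, bs) =>
      val mw = bs.map(_.w).sum / bs.size
      val mh = bs.map(_.h).sum / bs.size
      c -> (math.max(q(mw), grid), math.max(q(mh), grid))
    }
    // Each corner snaps to its nearest grid line.
    boxes.map { b =>
      val (w, h) = sizes(b.cluster)
      Box(q(b.x), q(b.y), w, h, b.cluster)
    }
  }
}
```

The constraint solver earns its keep precisely where this sketch fails: when independent rounding would make two snapped objects overlap or push one outside the image.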
Once the new sizes and locations are calculated, the image content must be reshuffled to match. We treat each vertex of the objects and of the image border as a keypoint, and perform a remap operation using the following function:
  • Classify every point by object containment
  • If contained: shift with parent object
  • Else: shift by distance-weighted average of the movement of the 3 nearest neighbors (drawn from keypoints)
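The out-of-object branch of this rule can be sketched as follows (a minimal sketch, not the project's implementation; `RemapSketch`, `Pt`, and `Keypoint` are hypothetical names, and the small epsilon guarding division by zero is an assumption):

```scala
// Displacement for a point outside every object: the inverse-distance-
// weighted average of the displacements of its 3 nearest keypoints.
object RemapSketch {
  case class Pt(x: Double, y: Double)
  // A keypoint pairs an original vertex position with its displacement.
  case class Keypoint(pos: Pt, dx: Double, dy: Double)

  def displace(p: Pt, keypoints: Seq[Keypoint]): (Double, Double) = {
    val nearest = keypoints
      .sortBy(k => math.hypot(k.pos.x - p.x, k.pos.y - p.y))
      .take(3)
    val weights = nearest.map { k =>
      1.0 / (math.hypot(k.pos.x - p.x, k.pos.y - p.y) + 1e-9)
    }
    val wSum = weights.sum
    val dx = nearest.zip(weights).map { case (k, w) => k.dx * w }.sum / wSum
    val dy = nearest.zip(weights).map { case (k, w) => k.dy * w }.sum / wSum
    (dx, dy)
  }
}
```

Points inside an object skip this entirely and move rigidly with their parent object.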
An example map is visualized here, with the red channel representing the transformation on row coordinates and blue representing the transformation on column coordinates. The value at a given coordinate represents a normalized source coordinate. The relatively smooth appearance of the map suggests that it should preserve the image appearance well; however, transition artifacts from changes in nearest-neighbor relations and from object boundaries are visible. [figure: remap visualization]
The non-object regions are subdivided rectangularly according to the following algorithm:
  • Sweep downwards to the top of the next object to form one region,
  • Subdivide and recurse to the left, right, and below the current object
An example subdivision is shown here: [figure: subdivision] We note that the current algorithm produces some undesirable subdivisions of vertically contiguous regions, because it "cuts through" other objects. An improved result could be produced by a trivial post-processing stage that patches these up, or by a more involved recursive step. Several displeasing artifacts from the previous example are repaired here: [figure: fixed subdivision]
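The recursion above can be sketched with integer rectangles (a simplified illustration, not the project's code; `SubdivSketch` and `Rect` are hypothetical names, and objects straddling a cut are simply ignored in this sketch):

```scala
// Subdivide the background of `region` around the given objects:
// sweep down to the top of the highest object, emit that strip as one
// region, then recurse into the areas to the object's left, right,
// and below it.
object SubdivSketch {
  case class Rect(x: Int, y: Int, w: Int, h: Int)

  def subdivide(region: Rect, objects: List[Rect]): List[Rect] = {
    // Only objects fully contained in this region are considered.
    val inside = objects.filter(o =>
      o.x >= region.x && o.y >= region.y &&
      o.x + o.w <= region.x + region.w && o.y + o.h <= region.y + region.h)
    if (inside.isEmpty) List(region)
    else {
      val top   = inside.minBy(_.y)
      val strip = Rect(region.x, region.y, region.w, top.y - region.y)
      val left  = Rect(region.x, top.y, top.x - region.x, top.h)
      val right = Rect(top.x + top.w, top.y,
                       region.x + region.w - (top.x + top.w), top.h)
      val below = Rect(region.x, top.y + top.h, region.w,
                       region.y + region.h - (top.y + top.h))
      val rest  = inside.filterNot(_ == top)
      List(strip).filter(_.h > 0) ++
        List(left, right, below)
          .filter(r => r.w > 0 && r.h > 0)
          .flatMap(r => subdivide(r, rest))
    }
  }
}
```

Because the top strip always spans the full region width, it can slice through regions beside a taller object, which is exactly the artifact noted above.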

Section 3: Primitive placement

Object and non-object primitives have different requirements and must be treated differently. To handle non-object primitives, we solve a dynamic program that rewards accurate color reproduction and favors primitives with large area over primitives with small area. To keep the problem tractable, we further require that all primitives within the same row of a region being solved have the same height. The reward function in question is shown here: [figure: DP reward function] The k parameter allows rescaling according to pixel format. Useful values of λ were expected to be small, 0 < λ < 1; however, this did not reward larger-area bricks strongly enough. In our test cases, values λ > 1 worked well, but this may prove too strong a correction for larger regions. An example output from the dynamic-programming stage is shown here: [figure: solved region]
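The structure of the row-filling dynamic program can be sketched as follows. The exact reward expression lives in the figure above; this sketch substitutes a plausible stand-in, a color term plus pow(width, λ), which is superlinear in brick size for λ > 1 (matching the observation that λ > 1 works well). `BrickDpSketch`, `cellReward`, and the omission of the k rescaling are all assumptions of this sketch.

```scala
// DP over one fixed-height row of `rowWidth` grid cells: best(i) is
// the best total reward for filling cells [0, i) with bricks drawn
// from `widths`, each brick scored by a color term plus an area term.
object BrickDpSketch {
  def bestTiling(rowWidth: Int, widths: Seq[Int],
                 cellReward: (Int, Int) => Double, // (start, width) => color score
                 lambda: Double): Double = {
    val best = Array.fill(rowWidth + 1)(Double.NegativeInfinity)
    best(0) = 0.0
    for (end <- 1 to rowWidth; w <- widths
         if w <= end && best(end - w) > Double.NegativeInfinity) {
      // Place a brick covering cells [end - w, end).
      val candidate = best(end - w) + cellReward(end - w, w) + math.pow(w, lambda)
      if (candidate > best(end)) best(end) = candidate
    }
    best(rowWidth)
  }
}
```

With a flat color term, λ = 2 makes one width-4 brick (reward 16) beat two width-2 bricks (reward 8), which is the large-brick preference the area term is meant to encode.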
Object primitive placement is not yet implemented, but we anticipate that SIFT keypoints will be useful for matching against the object-primitives database, since windows and doors have sharp corners and appear at a variety of scales and aspect ratios.