This algorithm implements the seam-carving method for intelligent image resizing. In standard image resizing, pixels are removed or interpolated between at a fixed interval that reflects the desired scale factor, yielding an image that is uniformly smaller. Image Retargeting seeks to identify and preserve the most interesting and important contents of an image, instead removing less important surrounding pixels to achieve the same desired dimensions. To calculate each pixel's importance, or energy, there are many possible methods, most involving gradient comparisons. As we have previously observed, human visual processing is especially sensitive to intensity gradients, so it follows that the least interesting pixels in an image are likely the pixels that are most similar to their neighbors.
The main steps of the algorithm are laid out below, and optimizations are discussed after.
Finding the minimum-cost path through the energy image could be a complex problem, requiring exponential time to complete using an exhaustive depth-first search search. However, utilizing dynamic programming reduces the search to linear time in the number of image pixels. To minimize image distortion, we require that each pixel in the seam must be adjacent (laterally or diagonally) to the next pixel in the seam. Therefore, for each potential seam pixel, P, there are only three possible locations for the previous seam pixel, as shown below.
The dynamic approach proceeds as follows:
For each row of pixels in the image: For each pixel of a row: Find the minimum-cost path to reach that pixel by taking the minimum of the costs to reach each of the three possible preceding pixels and adding that to the cost of the current pixel. The updated cost of the current pixel now reflects the total cost of the minimum-cost path to reach that pixel.
By this approach, each pixel is visited only once, and at each visit, at most three other pixels are examined, yielding linear time performance in the number of image pixels.
To further optimize the calculation of seam costs, the find_min_path() function takes as optional arguments the previous seam-cost image and the x coordinates of the last seam removed. With this extra information, the function only updates the pixels in the seam-cost image that could have changed with the removal of the seam, on average halving the runtime of this operation. The number of pixels that need to be updated also depends on the radius of the energy calculation filter. As the function travels row by row, the updated range is expanded by 1 pixel, to account for the propagation of previous updates. If values at the ends of the range are unchanged, then the range is shrunk to reprocess as few pixels as necessary.
The baseline implementation of the image energy function used a local gradient filter, simply summing the intensity differences between each pixel and its immediate neighbors. This energy calculation yielded reasonable results. As we have discussed, human visual perception is sensitive to gradient changes. The most extreme gradient changes occur at edges, so the energy calculation was reformulated using edge-emphasizing filters.
First, a simple sobel filter, passed over the image both vertically and horizontally, acted to emphasize edges and gradient inconsistencies in textures, however, the sobel filter is vulnerable to noise. Next, a derivative of gaussian filter was tried, first smoothing the image and then emphasizing edges. The derivative of gaussian filter proved more resistant to noise, and still emphasized edges well. In the final energy calculation method tried, the responses of the derivative of gaussian filter were squared, yielding much stronger edge responses than any of the previous filters. By emphasizing edges, the seam-finding operation tended to avoid whole objects or well-defined regions, preserving what was likely important and noticeable content in the image. The energy images and final retargeted images for each energy calculation method are displayed below in Results.
The results of the algorithm's image-retargeting operation are displayed below. The original image and the image scaled to the target dimensions are displayed first. Below that, the energy image and retargeting results for each energy calculation method are displayed side-by-side. Failure cases are discussed at the end.
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
This image demonstrates that the retargeting operation may be applied in both the x and y dimensions.
Original and Scaled
Derivative of Gaussian
In this first case, we see the importance of emphasizing edges, as the gradient energy method yields a grotesque face. Human visual perception is also very sensitive to faces, and even small manipulations to facial structure can appear startling. As a first measure, edge emphasis will protect the shapes of faces, given enough contrast between the face and the background, as evidenced with the edge-emphasizing energy methods. Explicit face detection and reweighting of the energy function would be the most reliable method to preserve faces.
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
This failure case demonstrates the difficulty of preserving straight lines. Even when lines are emphasized, if they occupy most of the image, they must be cut, and any cut through a slanted line will create a noticeable sheared junction. In this case, retargeting performance could be improved by implementing a "forward-looking" energy calculation that finds the seam that "inserts" the most energy into the image, rather than just the seam that contains the least energy. This augmented method tends to better preserve lines.
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
This case also demonstrates that parallel lines are difficult to preserve. The violation of parallel lines is especially apparent to our visual perception, as they are often indicators of perspective in an image.
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
Parallel lines are also disrupted in this image, though we can see that by uniformly de-emphasizing the lines, as in the gradient energy image, they are less distorted in the retargeted image.
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2
This failure case demonstrates the difficulties that strong texture poses. By our energy calculations, the foliage actually appears to be the most important content, and the train is chipped away. This is perhaps a case where user annotation could benefit the algorithm, so that the train could be designated as a high importance area.
Original and Scaled
Gradient
Sobel
Derivative of Gaussian
Derivative of Gaussian2