The goal of my project is to generate non-photorealistic images. The field of NPR is well developed, and there are many styles of NPR, such as cartoon-like images, bitonal images and other stylized images. In my final project, I have implemented two different styles of NPR image: a cartoon-like image, and a gray-scale image with different textures mapped according to color. The two techniques are implemented separately, so my project is divided into two parts.
The first part of my project is the implementation of cartoon-like images. A cartoon-like image is a simplified illustration of the original image. It does not have fine textures or rich edges; instead, it uses simple color regions and bold lines along the boundaries between objects in the image. I divide my work on this style of NPR into two parts: the first is color quantization, which aims to reduce the color complexity of the original image; the second is to add edges to the image, making the contrast between different color regions more apparent. This style abstracts the original image, turning it into a non-photorealistic image. I follow the work of Holger Winnemöller et al., and the results turn out well.
1. Convert the image from RGB space to CIE LAB space.
2. Use a fast bilateral filter to iteratively filter the image.
3. Do color quantization.
4. Use DoG to detect edges.
5. Warp the edges to make them sharper.
6. Overlay the edges on the color image.
First, we convert the image from RGB space to LAB space. LAB space is designed to approximate human vision, so I can easily separate the luminance channel (L) from the chrominance channels (a, b). In my project, I do color quantization only on the luminance channel, while the fast bilateral filter is applied to all of the L, a and b channels.
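A minimal sketch of this conversion in Python with OpenCV (the library choice and file name are illustrative assumptions, not taken from the project):

import cv2

bgr = cv2.imread("input.jpg")                 # example file name only
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)    # OpenCV loads images in BGR order
L, a, b = cv2.split(lab)                      # L = luminance, a/b = chrominance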
The second step is to apply a fast bilateral filter iteratively (in my project, three times). The reason for using a bilateral filter is simple: we want to blur away local contrast while preserving the contrast across strong edges. A regular Gaussian filter would destroy the edges, and the edges are essential since we will later run edge detection on the blurred image. On the other hand, filtering once does not produce a sufficiently blurry image; since we will do color quantization later, local contrast must be reduced enough, so we filter the image iteratively. A naïve bilateral filter would cost too much time, so I use the fast bilateral filter that I implemented in project 5. Here are the original image and the images after blurring once, twice and three times.
Note that the edges are preserved very well. The image that has been filtered three times has become almost textureless, which is ideal for color quantization.
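As a rough stand-in for the project's own fast bilateral filter, here is a sketch of the iterative filtering using OpenCV's standard bilateral filter (the parameter values are illustrative, not the project's):

import cv2

def iterated_bilateral(lab, n_iter=3, d=9, sigma_color=25, sigma_space=7):
    # Apply the bilateral filter n_iter times; each pass removes more local
    # contrast while the range term keeps strong edges intact.
    out = lab.copy()
    for _ in range(n_iter):
        out = cv2.bilateralFilter(out, d, sigma_color, sigma_space)
    return out

blurred = iterated_bilateral(lab)   # 'lab' is the LAB image from the previous step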
The third step is color quantization. Regular color quantization divides the full range of a channel into several bins, assigns each pixel to its corresponding bin, and sets the pixel's channel value to the value of that bin. This method is problematic because the bin boundaries are hard, which can leave sharp banding in the processed image. Instead of that approach, I add a tanh() function and a free parameter to make the quantization smoother. I use the formula sketched below to quantize the luminance channel of the blurred image from the previous step.
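A sketch of the soft quantization from Winnemöller et al., which this step presumably follows (the notation is assumed: L(x) is the blurred luminance, q_nearest the bin value closest to it, Δq the bin width, and φ_q the sharpness parameter):

Q(x) \;=\; q_{\text{nearest}} \;+\; \frac{\Delta q}{2}\,\tanh\!\bigl(\varphi_q\,(L(x) - q_{\text{nearest}})\bigr)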
The free parameter lets me adjust the softness of the tanh() transition. Here is a comparison between the regular approach to color quantization and my approach.
Diagrams from Holger Winnemöller's slides.
A larger value of the sharpness parameter makes the result sharper: the image below on the left is generated with a smaller value, and the one on the right with a larger value.
There are many operators that can be used to detect edges, such as Canny, Sobel and DoG. In my project I use the DoG operator. It has several advantages: it is computationally efficient and, unlike Canny, it is not prone to producing disconnected edges. Instead of the regular DoG function, I use an extended version with several parameters. Modifying these parameters produces different results, so I can easily adjust the thickness of the edges or the amount of noise.
My DoG operator uses the extended formula sketched below, in which the smoothed images are obtained by Gaussian filtering the luminance channel at two nearby scales.
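A sketch of the extended DoG edge function from Winnemöller et al., which is presumably the form used here (notation assumed: S_σ is the luminance smoothed by a Gaussian of standard deviation σ, σ_r is slightly larger than σ_e with a ratio of about 1.6, τ controls noise sensitivity, and φ_e controls edge sharpness):

D(x) \;=\;
\begin{cases}
1 & \text{if } S_{\sigma_e}(x) - \tau\,S_{\sigma_r}(x) > 0,\\
1 + \tanh\!\bigl(\varphi_e\,(S_{\sigma_e}(x) - \tau\,S_{\sigma_r}(x))\bigr) & \text{otherwise.}
\end{cases}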
Here is the image after applying DoG.
After finishing the above four steps, it would already be possible to overlay the edge image on the color image. However, I add an image-based warp to make the edges less blurry and to sharpen them. Image-based warping moves each pixel along the direction of its gradient; if the gradient is zero, the pixel is not moved. We can think of image-based warping as a displacement map. The method can be used either to sharpen or to attenuate edges.
The approach is simple. First we extract the gradient at each pixel with horizontal and vertical Sobel operators. Then we blur this gradient map to attenuate its range. Finally, we move each pixel along the direction of its gradient.
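A minimal sketch of this warp in Python with OpenCV (the displacement strength and blur size are illustrative assumptions):

import cv2
import numpy as np

def warp_along_gradient(edge_img, strength=1.5, blur_ksize=5):
    # edge_img: single-channel edge image. Pushes each pixel along its
    # (blurred) gradient direction; the sign of 'strength' decides whether
    # edges are sharpened or attenuated.
    gray = edge_img.astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # horizontal Sobel
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # vertical Sobel
    gx = cv2.GaussianBlur(gx, (blur_ksize, blur_ksize), 0)
    gy = cv2.GaussianBlur(gy, (blur_ksize, blur_ksize), 0)
    mag = np.sqrt(gx * gx + gy * gy) + 1e-6           # avoid division by zero
    xs, ys = np.meshgrid(np.arange(gray.shape[1], dtype=np.float32),
                         np.arange(gray.shape[0], dtype=np.float32))
    # Pixels with (near-)zero gradient get a displacement of ~0 and stay put.
    map_x = (xs + strength * gx / mag).astype(np.float32)
    map_y = (ys + strength * gy / mag).astype(np.float32)
    return cv2.remap(edge_img, map_x, map_y, cv2.INTER_LINEAR)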
Here is a comparison between the original edges and the edges after warping; the warped result is the rightmost one.
Here is another example from Holger Winnemöller's slides:
The last step is simply to overlay the edges on the color image.
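One simple way to do the overlay, assuming the edge map is near 0 on edges and near 1 elsewhere (the helper name and scaling are my assumptions):

import numpy as np

def overlay_edges(color_bgr, edge_map):
    # Multiply the color image by the edge map, which darkens pixels
    # exactly where edges were detected.
    e = edge_map.astype(np.float32)
    if e.max() > 1.0:                      # accept [0, 255] edge maps too
        e = e / 255.0
    out = color_bgr.astype(np.float32) * e[..., None]
    return np.clip(out, 0, 255).astype(np.uint8)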
There are no 'absolute' failure cases. Due to the limitations of DoG, the results often suffer from noise. I cannot use erosion to remove the noise, since that would also destroy the genuine edges. A better edge detector would produce a better result, I believe, but I have not found one: I tested the images with Canny, and the results turned out to be much worse.
The method above generates cartoon-like images nicely, and I believe it could help artists working in the cartoon industry to some degree: they could produce certain scenes by using this automatic algorithm to turn ordinary images into cartoon-like ones.
As for the algorithm itself, I think many parts can still be improved. The color quantization approach gives good results in my project, while the edge detection often suffers from noise or loses important edges. The simple DoG operator cannot extract the important edges automatically: if I adjust the DoG parameters to make the edges sharper and thicker, the noise becomes apparent. An ideal edge detector would preserve the important edges even when they are weaker than the unimportant edges or the noise.
The second part of my project generates a gray-scale image with different textures mapped to different regions. The approach is completely different from the one described above. In order to map textures, we first need a texture library; I generate the textures manually using a commercial application called 'Manga Studio'. I follow the work of Yingge Qu et al. The basic texture-mapping algorithm can be divided into several parts: segmentation, k-means clustering, and texture feature matching.
1. Build a texture library.
2. Build texture features for each style of texture.
3. Do segmentation.
4. Do k-means to cluster colors.
5. Do texture mapping.
The first step is to build a texture library; textures from this library will later be selected and mapped onto the image. We need several different types of textures. In my project, I generate some common textures using 'Manga Studio'; all of the textures are gray-scale images. In addition, for each style we must generate textures of different densities, because luminance has to be taken into account when choosing an appropriate texture: a darker region should receive a denser texture and a brighter region a sparser one.
The library is therefore two-dimensional: along the horizontal axis, one style of texture varies in luminance (density), and along the vertical axis, the style of texture changes.
Segments that exhibit apparent texture characteristics should be assigned a texture based on texture similarity. To quantify these characteristics, we compute texture features using Gabor wavelets, which are well suited to texture identification. In my project, I compute Gabor wavelets at 8 orientations and 3 scales, and for each texture I build a 48-dimensional feature vector: every time I filter the image with a Gabor of a given orientation and scale, I record the mean and standard deviation of the response. Since there are 8 * 3 = 24 Gabor filters, the feature vector has 48 dimensions.
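A sketch of this feature computation in Python with OpenCV (kernel sizes, sigmas and wavelengths below are illustrative choices, not the project's values):

import cv2
import numpy as np

def gabor_feature(gray):
    # 48-D texture feature: mean and standard deviation of the responses to
    # 24 Gabor filters (8 orientations x 3 scales).
    gray = gray.astype(np.float32)
    feats = []
    for sigma, lambd in [(2.0, 4.0), (4.0, 8.0), (8.0, 16.0)]:   # 3 scales
        for k in range(8):                                        # 8 orientations
            theta = k * np.pi / 8.0
            kern = cv2.getGaborKernel((31, 31), sigma, theta, lambd, 0.5, 0)
            resp = cv2.filter2D(gray, cv2.CV_32F, kern)
            feats.extend([float(resp.mean()), float(resp.std())])
    return np.asarray(feats)                                      # length 48

For a segment rather than a whole texture patch, the mean and standard deviation would be taken only over the segment's pixels (for example via a mask).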
Here is a visualization of the 24 Gabor wavelets, composed of 8 orientations and 3 scales. A Gabor wavelet has a real part and an imaginary part; this is the visualization of the real part:
We must segment the image into regions; after segmentation, we can select an appropriate texture to map onto each region.
In my project, I use mean shift to segment the image. It is a highly color-based method: pixels with similar colors are clustered into the same segment.
I did not implement mean shift myself; instead, I used the EDISON implementation for better performance.
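EDISON itself is a separate C++ tool; as a rough, purely illustrative stand-in for the same kind of color-based, mode-seeking segmentation, scikit-image's quickshift could be used (this is my substitution, not the project's method):

import cv2
from skimage.segmentation import quickshift

bgr = cv2.imread("input.jpg")                     # example file name only
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
# quickshift is a mode-seeking, color-based segmenter related to mean shift;
# it returns an integer label (segment id) per pixel.
labels = quickshift(rgb, kernel_size=5, max_dist=20, ratio=0.8)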
After segmentation, I compute the texture feature of each segment for later use, using the same method as above.
Here is a visualization of the segments, each filled with a random color.
In this step, we cluster the average colors of all segments in the image using the k-means algorithm. If there are n texture styles, then k equals n. I first convert the image from RGB space to CIE LAB space, and then run k-means on the (a, b) plane, ignoring the L channel (luminance), because only the chrominance matters here. There are many possible distance metrics for k-means; I use cosine distance, because the (a, b) vectors lie in a plane whose origin corresponds to an achromatic color. After clustering, each segment belongs to exactly one cluster.
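A small sketch of k-means with cosine distance on the per-segment mean (a, b) vectors (the function and its defaults are my assumptions; common k-means libraries use Euclidean distance, so a tiny custom loop is shown instead):

import numpy as np

def kmeans_cosine(ab, k, n_iter=50, seed=0):
    # ab: (n_segments, 2) mean (a, b) per segment, centred so that (0, 0) is
    # achromatic (with OpenCV's 8-bit LAB, subtract 128 from both channels).
    rng = np.random.default_rng(seed)
    x = ab / (np.linalg.norm(ab, axis=1, keepdims=True) + 1e-8)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    assign = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        assign = (x @ centers.T).argmax(axis=1)        # highest cosine similarity
        for j in range(k):
            members = x[assign == j]
            if len(members):
                c = members.mean(axis=0)
                centers[j] = c / (np.linalg.norm(c) + 1e-8)
    return assign, centers

Here k would be set to the number of texture styles, as described above.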
In my project, I use two texture-mapping strategies. First, I compute the distance between the texture feature of each segment and the feature of each texture; a segment that is similar enough to a texture in the library is mapped to that texture style, where a threshold controls what counts as similar. If a segment cannot find a similar texture style, we look at which cluster the segment belongs to and randomly assign one texture style to all of the segments in that cluster.
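A sketch of this two-strategy assignment (the function name, array layout and threshold are hypothetical; one representative feature per style is assumed):

import numpy as np

def assign_styles(seg_feats, tex_feats, clusters, threshold, seed=0):
    # seg_feats: (n_segments, 48) Gabor features of the segments.
    # tex_feats: (n_styles, 48) one representative feature per texture style.
    # clusters:  k-means cluster id of each segment.
    rng = np.random.default_rng(seed)
    # Fallback: every segment in a cluster shares one random style.
    cluster_style = {c: int(rng.integers(len(tex_feats))) for c in set(clusters)}
    styles = []
    for feat, cluster in zip(seg_feats, clusters):
        dists = np.linalg.norm(tex_feats - feat, axis=1)
        best = int(dists.argmin())
        # Use the nearest style only if it is similar enough.
        styles.append(best if dists[best] < threshold else cluster_style[cluster])
    return styles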
Note that we should preserve the luminance of each segment. As mentioned, the texture library is two-dimensional: different texture styles have clearly different feature vectors, while textures of the same style that differ only in luminance (density) have similar feature vectors.
Once a style has been chosen for a segment, we compute the segment's average luminance and map a texture of the appropriate luminance onto it. If the average luminance of a segment is too high or too low, we simply paint that segment white or black.
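A sketch of this luminance-driven choice (the thresholds, ordering convention and helper name are my assumptions):

import numpy as np

def pick_texture(style_row, mean_lum, tile_shape, lo=30, hi=225):
    # style_row: textures of one style, ordered from densest (darkest) to
    # sparsest (brightest). mean_lum: the segment's mean luminance in [0, 255].
    if mean_lum < lo:
        return np.zeros(tile_shape, dtype=np.uint8)        # solid black
    if mean_lum > hi:
        return np.full(tile_shape, 255, dtype=np.uint8)    # solid white
    idx = int(mean_lum / 256.0 * len(style_row))
    return style_row[min(idx, len(style_row) - 1)]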
After texture mapping, we simply overlay the edge-detected image on the result.
This method is highly color-based. If the color distribution of an image is too sparse, k-means fails to cluster similar colors and mean shift segments the image badly. Furthermore, mean shift cannot segment 'smartly', since it only groups regions of similar color; sometimes a single object is split into many segments.
Both of the algorithms I used have limitations and need to be improved in future work.
In addition, everything I have done applies to single images; I have not tried to extend it to video. More problems would obviously appear there, since temporal coherence must be preserved.