I implemented the basic algorithm with few to no additional refinements. One thing that helped my matching a lot was lowering the threshold on interest points. The matching process did a good job of filtering out bad points, and having more interest points overall gave a lot more matches: 25 good and 7 bad, versus 6 good and 3 bad with the higher threshold. For the matching itself, I found that requiring the best match to be twice as close as the second-best match was a good threshold; with a looser requirement I saw far more false matches. Running this on two images of the Sleeping Beauty Castle in Paris produced better results. I don't have precise accuracy statistics, but I ended up with almost 600 matches, and the top 100 look pretty reasonable.
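
To make the "twice as close" criterion concrete, here is a minimal sketch of a nearest-neighbor ratio test in NumPy. It is not my actual code: the function name `match_features`, the parameter `max_ratio`, the use of Euclidean distance, and the confidence score used for ranking are all assumptions made for illustration. It also assumes the second image has at least two descriptors.

```python
import numpy as np

def match_features(desc1, desc2, max_ratio=0.5):
    """Match rows of desc1 to rows of desc2 using a ratio test.

    A match is kept only if the nearest descriptor is closer than
    max_ratio times the distance to the second-nearest one
    (max_ratio=0.5 means "twice as close"). Hypothetical helper.
    """
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)

    # Pairwise Euclidean distances, shape (len(desc1), len(desc2)).
    diff = desc1[:, None, :] - desc2[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))

    # Indices of the two nearest descriptors in desc2 for each row of desc1.
    nearest = np.argsort(dists, axis=1)[:, :2]
    rows = np.arange(desc1.shape[0])
    best = dists[rows, nearest[:, 0]]
    second = dists[rows, nearest[:, 1]]

    # Ratio test: strict inequality guarantees second > 0 for kept rows.
    keep = best < max_ratio * second
    matches = np.column_stack([rows[keep], nearest[keep, 0]])

    # Assumed ranking score: a smaller distance ratio means higher confidence.
    confidence = 1.0 - best[keep] / second[keep]
    order = np.argsort(-confidence)
    return matches[order], confidence[order]
```

With a ranking like this, something like "the top 100" would be `matches[:100]`. Note that the broadcasted distance computation holds every pairwise difference in memory at once, so for large descriptor sets it may need to be done in chunks.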