CSCI2951-T Data-driven Computer Vision

Class Blog »

Spring 2016, TR 9:00 to 10:20am, CIT 477.
Instructor: Genevieve Patterson

Figure from: Deep Visual-Semantic Alignments for Generating Image Descriptions. Andrej Karpathy and Li Fei-Fei, CVPR 2015.

Final Projects (May 10, 2016)

Rapid content based image retrieval by Gustavo Marques Netto

Determining artifact date and culture from images by Christine Whalen

A House Share Price Predictor using a Deep Neural Network by Adam Lesnikowski

Deep Learning for Natural Image Segmentation Priors by Gabe Hope

Invariant Superpixel Features for Object Detection & Localization by Sam Kelly

Course Description

Course Catalog Entry
Investigates current research topics in data-driven object detection, scene recognition, and image-based graphics. We will examine data sources, features, and algorithms useful for understanding and manipulating visual data. We will pay special attention to methods that harness large-scale or Internet-derived data. There will be an overview of the current crowdsourcing techniques used to acquire massive image datasets. Vision topics such as scene understanding and object detection will be linked to graphics applications such as photo editing. These topics will be pursued through independent reading, class discussion and presentations, and projects involving current research problems in Computer Vision.

The goal of this course is to give students the background and skills necessary to perform research in computer vision for image detection. Students should understand the strengths and weaknesses of current approaches to research problems and identify interesting open questions and future research directions. Students should also improve their critical reading and communication skills.

Course Requirements

Reading and Summaries

Students will be expected to read one paper for each class. For each assigned paper, students must write a two- or three-sentence summary and identify at least one question or topic of interest for class discussion. Interesting topics for discussion could relate to strengths and weaknesses of the paper, possible future directions, connections to other research, uncertainty about the conclusions of the experiments, etc. Reading summaries must be posted to the class blog by 11:59pm the day before each class. Feel free to reply to other comments on the blog and help each other understand confusing aspects of the papers. The blog discussion will be the starting point for the class discussion. If you are presenting, you don't need to post a summary to the blog.

Class participation

All students are expected to take part in class discussions. If you do not fully understand a paper, that is OK; we can work through the unclear aspects together in class. If you are unable to attend a specific class, please let me know ahead of time (and have a good excuse!). Many of the papers covered in this course have publicly available code and tutorials for running their systems and experiments. For these papers, students will be expected to run the basic versions of the systems. Students are not expected to re-implement an entire system or set of experiments. The purpose of these tutorial exercises is to familiarize students with running code written by other researchers. Students will be expected to identify strengths and weaknesses of the systems they attempt to run.


Presentations

Depending on enrollment, students will lead the discussion of one or two papers during the semester. Ideally, students would implement some aspect of the presented material and perform experiments that help understand the algorithms. Presentations and all supplemental material should be ready one week before the presentation date so that students can meet with the instructor, go over the presentation, and possibly iterate before the in-class discussion. For the presentations, it is fine to use slides and code from outside sources (for example, the paper authors), but be sure to give credit.

Semester projects

Students are expected to complete a state-of-the-art research project on topics relevant to the course. Students will propose a research topic part way through the semester. After a project topic is finalized, students will meet occasionally with the instructor to discuss progress. Students will present their progress on their semester project twice during the course, and the course will end with final project presentations. Students will also produce a conference-formatted write-up of their project. Projects will be published on this web page. The ideal project is something with a clear enough direction to be completed in a couple of months, and enough novelty that it could be published in a peer-reviewed venue with some refinement and extension.


Prerequisites

Strong mathematical skills (linear algebra, calculus, probability and statistics) and previous imaging (graphics, vision, or computational photography) courses are needed. It is strongly recommended that students have taken one of the following courses (or equivalent courses at other institutions):

If you aren't sure whether you have the background needed for the course, you can try reading some of the papers below or you can simply come to class during the shopping period.


Textbook

We will not rely on a textbook, although the free, online textbook "Computer Vision: Algorithms and Applications" by Richard Szeliski is a helpful resource.


Your final grade will be made up from

Office Hours:

Genevieve Patterson, Tuesday and Thursday 1:00-2:30pm, CIT 551

Tentative Schedule

Date Paper Paper, Project page Presenter
Thurs, Jan 28 Introduction; the state of vision and crowdsourcing. Genevieve
Tues, Feb 2 Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. ECCV 2014. project page, paper Genevieve
Tues, Feb 2 Tropel: Crowdsourcing Detectors with Minimal Training. Genevieve Patterson, Grant Van Horn, James Hays, Serge Belongie, Pietro Perona. Human Computation (HCOMP) 2015. pdf Genevieve
Thurs, Feb 4 CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee. Read only the first two sets of slides, labeled Introduction and Supervised learning. CVPR 2014 tutorial Genevieve
Tues, Feb 9 ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012. pdf Genevieve
Thurs, Feb 11 The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding. Genevieve Patterson, Chen Xu, Hang Su, James Hays. IJCV 2014. project page Genevieve
Tues, Feb 16 Object Detectors Emerge in Deep Scene CNNs. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba. ICLR, 2015. project page, arXiv Sam
also read Learning Deep Features for Scene Recognition using Places Database. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014. project page, pdf, demo n/a
Thurs, Feb 18 Understanding Deep Image Representations by Inverting Them. Aravindh Mahendran, Andrea Vedaldi. CVPR 2015. arXiv Christine
Tues, Feb 23 No class. Everyone
Thurs, Feb 25 Diagnosing error in object detectors. Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai. ECCV 2012. project page Genevieve
Tues, Mar 1 Project Status Updates. Everyone
Thurs, Mar 3 DeepBox: Learning Objectness with Convolutional Networks. Weicheng Kuo, Bharath Hariharan, Jitendra Malik. ICCV 2015. arXiv Gabe
also read Selective Search for Object Recognition. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. IJCV 2013. project page n/a
Tues, Mar 8 Fast R-CNN. Ross Girshick. ICCV 2015. arXiv, code Gustavo
also read Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. NIPS 2015. pdf n/a
Thurs, Mar 10 Fully Convolutional Networks for Semantic Segmentation. Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015. arXiv Sam
Tues, Mar 15 Learning Visual Similarity for Product Design with Convolutional Neural Networks. Sean Bell, Kavita Bala. Siggraph 2015. author page, pdf Gustavo
also read Learning Deep Representations for Ground-to-Aerial Geolocalization. Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays. CVPR 2015. pdf n/a
Thurs, Mar 17 What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012. project page Adam
Tues, Mar 22 Special Presentation by Zhile Ren. Three-Dimensional Object Detection and Layout using Clouds of Oriented Gradients. Zhile Ren and Erik B. Sudderth. Zhile Ren
Thurs, Mar 24 Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015. project page Christine
Mar 26 - Apr 3 Spring Break. Everyone
Tues, Apr 5 Project Status Updates. Everyone
Thurs, Apr 7 Deep Neural Decision Forests. Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. ICCV 2015. project page Gabe
Tues, Apr 12 Vision for Robotics. Presentation from the Tellex Lab. n/a Stefanie Tellex and John Oberlin
Thurs, Apr 14 VQA: Visual Question Answering. S. Antol*, A. Agrawal*, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. ICCV, 2015. project page, arXiv Christine
also read Visual Turing test for computer vision systems. Geman, Donald, et al. Proceedings of the National Academy of Sciences 112.12 (2015): 3618-3623. PNAS page n/a
Tues, Apr 19 Exploring Nearest Neighbor Approaches for Image Captioning. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C Lawrence Zitnick. arXiv, 2015. arXiv Adam
Thurs, Apr 21 How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. Siggraph 2012. project page Gabe
Tues, Apr 26 Quizz: Targeted crowdsourcing with a billion (potential) users. Ipeirotis, Panagiotis G., and Evgeniy Gabrilovich. Proceedings of the 23rd international conference on World wide web. ACM, 2014. pdf Gustavo
Thurs, Apr 28 Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes. Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, James Hays. Siggraph 2014. project page Genevieve
Tues, May 3 Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015. project page, arXiv Sam
Tues, May 10 Final Project Presentations 9am - approx. 11:30am Everyone
* Note: Final Project reports due 11:59 PM EST on Tues May 10.

Suggested Topics

Date Paper Paper, Project page Presenter
Crowdsourcing and Human Computation
? Micro Perceptual Human Computation for Visual Tasks. Yotam Gingold, Ariel Shamir, Daniel Cohen-Or. ACM Transactions on Graphics (ToG) 2012 project page ?
? Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. R. Girshick, J. Donahue, T. Darrell, J. Malik. CVPR 2014. arXiv ?
Learned Representations, ConvNets, Visualizations
? Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. 2014. arXiv ?
Object Proposals
ConvNet detection and segmentation
? Visualizing and Understanding Convolutional Networks. Matthew D Zeiler, Rob Fergus. ECCV 2014. pdf ?
Weakly Supervised and Unsupervised ConvNets
? Unsupervised Visual Representation Learning by Context Prediction. Carl Doersch, Abhinav Gupta, Alexei A. Efros. ICCV 2015. project page ?
Images and Words
? Visual Madlibs: Fill in the blank Description Generation and Question Answering. Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg. ICCV, 2015. project page, pdf ?
Generative ConvNets
? Learning to Generate Chairs, Tables and Cars with Convolutional Networks. Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox. CVPR 2015. arXiv ?
? A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. 2015. implementation, arXiv ?
? Aggregating local descriptors into a compact image representation (VLAD). H. Jegou, M. Douze, C. Schmid, and P. Perez. In Proc. CVPR, 2010. pdf ?
Siamese / Ranking / Triplet ConvNets
? Joint Embeddings of Shapes and Images via CNN Image Purification. Yangyan Li, Hao Su, Charles Ruizhongtai Qi, Noa Fish, Daniel Cohen-Or, Leonidas Guibas. Siggraph Asia 2015. project page ?
Attribute-based Representations
? Automatic attribute discovery and characterization from noisy web data. Berg, Tamara L., Alexander C. Berg, and Jonathan Shih. Computer Vision – ECCV 2010. Springer Berlin Heidelberg, 2010. 663-676. pdf ?
? Discovering the Spatial Extent of Relative Attributes. Fanyi Xiao, Yong Jae Lee. ICCV 2015. pdf ?
Discriminative Feature Discovery
? Learning to predict where humans look. T. Judd, K. Ehinger, F. Durand, and A. Torralba. IEEE International Conference on Computer Vision (ICCV), 2009. project page ?
? Learning a Discriminative Model for the Perception of Realism in Composite Images. Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, Alexei A. Efros. ICCV 2015. project page ?
? Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks. Fang Wang, Le Kang, Yi Li. CVPR 2015. arXiv ?
? Multi-view Convolutional Neural Networks for 3D Shape Recognition. Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller. ICCV 2015. project page ?
? Sketch2Photo: Internet Image Montage. ACM SIGGRAPH ASIA 2009, ACM Transactions on Graphics. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. project page ?
? Eulerian video magnification for revealing subtle changes in the world. Wu, Hao-Yu, et al. ACM Trans. Graph. 31.4 (2012): 65. project page ?
? A High Performance CRF Model for Clothes Parsing. E Simo-Serra, S Fidler, F Moreno-Noguer, R Urtasun. Computer Vision – ACCV 2014. pdf, code ?

Previous topics (which you should know)

Date Paper Paper, Project page Presenter
Fundamental representations
? Object recognition from local scale-invariant features, David Lowe, ICCV 1999. pdf, project page ?
? Video Google: A Text Retrieval Approach to Object Matching in Videos. Sivic, J. and Zisserman, A. Proceedings of the International Conference on Computer Vision (2003) pdf, project page ?
? Histograms of Oriented Gradients for Human Detection. Navneet Dalal and Bill Triggs. In Proceedings of IEEE Conference Computer Vision and Pattern Recognition, 2005. pdf ?
? Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006. pdf, slides ?
? ImageNet: A Large-Scale Hierarchical Image Database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. IEEE Computer Vision and Pattern Recognition (CVPR), 2009 pdf, project page ?
? LabelMe: a Database and Web-based Tool for Image Annotation. B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman. International Journal of Computer Vision, 2008. pdf, project page ?
? 80 million tiny images: a large dataset for non-parametric object and scene recognition. A. Torralba, R. Fergus, W. T. Freeman. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30(11), 2008. pdf, project page ?
? Describing Objects by Their Attributes. A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth. CVPR 2009 project page ?
? SUN Database: Exploring a Large Collection of Scene Categories J. Xiao, K. Ehinger, J. Hays, A. Oliva, and A. Torralba. IJCV 2014. project page, pdf ?

Other previous topics

Date Paper Paper, Project page Presenter
? Painting-to-3D Model Alignment Via Discriminative Visual Elements. Mathieu Aubry, Bryan Russell, Josef Sivic. ToG 2013. project page ?
? DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell. 2013. arXiv ?
? Image Melding: combining inconsistent images using patch-based synthesis. Soheil Darabi, Eli Shechtman, Connelly Barnes, Dan B Goldman, Pradeep Sen. Siggraph 2012. project page ?
? Ground-truth dataset and baseline evaluations for intrinsic image algorithms. R. Grosse, M.K. Johnson, E.H. Adelson and W.T. Freeman. ICCV 2009 project page ?
? Intrinsic Images in the Wild. Sean Bell, Kavita Bala, Noah Snavely. Siggraph 2014. project page ?
? First Person Hyperlapse Videos. Johannes Kopf, Michael Cohen, Richard Szeliski. Siggraph 2014. project page ?
? Depixelizing Pixel Art. Johannes Kopf and Dani Lischinski. Siggraph 2011. project page ?
? Photo tourism: Exploring photo collections in 3D. Noah Snavely, Steven M. Seitz, Richard Szeliski. Siggraph 2006. pdf, project page ?


This course was originally created by James Hays, and is also being taught this semester at Georgia Tech. Ideas for the organization and content of this course came from many other researchers such as Svetlana Lazebnik, Kristin Grauman, Antonio Torralba, Derek Hoiem, and Alexei Efros.

Related Graduate Seminars at other Universities