CSCI2951-T Data-driven Computer Vision
Class Blog »
Spring 2016, TR 9:00 to 10:20am, CIT 477.
Instructor: Genevieve Patterson
Figure from: Deep Visual-Semantic Alignments for Generating Image Descriptions. Andrej Karpathy and Li Fei-Fei, CVPR 2015.
Final Projects (May 10, 2016)
Rapid content based image retrieval by Gustavo Marques Netto
Determining artifact date and culture from images by Christine Whalen
A House Share Price Predictor using a Deep Neural Network by Adam Lesnikowski
Deep Learning for Natural Image Segmentation Priors by Gabe Hope
Invariant Superpixel Features for Object Detection & Localization by Sam Kelly
Course Description
Course Catalog Entry
Investigates current research topics in data-driven object detection, scene recognition, and image-based graphics. We will examine data sources, features, and algorithms useful for understanding and manipulating visual data. We will pay special attention to methods that harness large-scale or Internet-derived data. There will be an overview of the current crowdsourcing techniques used to acquire massive image datasets. Vision topics such as scene understanding and object detection will be linked to graphics applications such as photo editing. These topics will be pursued through independent reading, class discussion and presentations, and projects involving current research problems in Computer Vision.
The goal of this course is to give students the background and skills necessary to perform research in computer vision for image detection. Students should understand the strengths and weaknesses of current approaches to research problems and identify interesting open questions and future research directions. Students will hopefully improve their critical reading and communication skills, as well.
Course Requirements
Reading and Summaries
Students will be expected to read one paper for each class. For each assigned paper, students must write a two- or three-sentence summary and identify at least one question or topic of interest for class discussion. Interesting topics for discussion could relate to strengths and weaknesses of the paper, possible future directions, connections to other research, uncertainty about the conclusions of the experiments, etc. Reading summaries must be posted to the class blog by 11:59pm the day before each class. Feel free to reply to other comments on the blog and help each other understand confusing aspects of the papers. The blog discussion will be the starting point for the class discussion. If you are presenting, you don't need to post a summary to the blog.
Class participation
All students are expected to take part in class discussions. If you do not fully understand a paper that is OK. We can work through the unclear aspects of a paper together in class. If you are unable to attend a specific class please let me know ahead of time (and have a good excuse!). Many of the papers covered in this course will have publicly available code and tutorials for running their systems and experiments. For these papers, students will be expected to run the basic versions of the systems. Students are not expected to re-implement an entire system or set of experiments. The purpose of these tutorial exercises is to familiarize students with running code written by other researchers. Students will be expected to identify strengths and weaknesses of the systems they attempt to run.
Presentation(s)
Depending on enrollment, students will lead the discussion of one or two papers during the semester. Ideally, students would implement some aspect of the presented material and perform experiments that help understand the algorithms. Presentations and all supplemental material should be ready one week before the presentation date so that students can meet with the instructor, go over the presentation, and possibly iterate before the in-class discussion. For the presentations it is fine to use slides and code from outside sources (for example, the paper authors) but be sure to give credit.
Semester projects
Students are expected to complete a state-of-the-art research project on topics relevant to the course. Students will propose a research topic part way through the semester. After a project topic is finalized, students will meet occasionally with the instructor to discuss progress. Students will present their progress on their semester project twice during the course and the course will end with final project presentations. Students will also produce a conference-formatted write-up of their project. Projects will be published on this web page. The ideal project is something with a clear enough direction to be completed in a couple of months, and enough novelty such that it could be published in a peer-reviewed venue with some refinement and extension.
Prerequisites
Strong mathematical skills (linear algebra, calculus, probability and statistics) and previous imaging (graphics, vision, or computational photography) courses are needed. It is strongly recommended that students have taken one of the following courses (or equivalent courses at other institutions):
- CSCI 1230, Introduction to Computer Graphics
- CSCI 1290, Computational Photography
- CSCI 1430, Introduction to Computer Vision
- CSCI 2240, Interactive Computer Graphics
- ENGN 1610, Image Understanding
Textbook
We will not rely on a textbook, although the free, online textbook "Computer Vision: Algorithms and Applications" by Richard Szeliski is a helpful resource.
Grading
Your final grade will be made up of:
- 20% Reading summaries posted to class blog
- 20% Classroom participation and attendance, including completion of coding tutorials and project progress reports
- 20% Paper presentation(s), including partial system implementation or testing
- 40% Semester project
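As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes each component is scored on a 0-100 scale and that the final grade is a plain weighted sum; neither assumption is stated on this page, and the component names and the `final_grade` helper are hypothetical, chosen only for this example.

```python
# Weights come from the grading breakdown above.
# The dictionary keys are illustrative names, not official course terminology.
WEIGHTS = {
    "reading_summaries": 0.20,
    "participation": 0.20,
    "presentations": 0.20,
    "semester_project": 0.40,
}


def final_grade(scores):
    """Return the weighted sum of component scores (assumed 0-100 each)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for a hypothetical student.
example = {
    "reading_summaries": 90,
    "participation": 100,
    "presentations": 80,
    "semester_project": 85,
}
print(final_grade(example))
```

Because the semester project carries twice the weight of any other component, a given change in the project score moves the final grade twice as far as the same change elsewhere.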
Office Hours:
Genevieve Patterson, Tuesday and Thursday 1:00-2:30pm, CIT 551
Tentative Schedule
Date | Paper | Paper, Project page | Presenter |
Thurs, Jan 28 | Introduction; the state of vision and crowdsourcing. | | Genevieve |
Tues, Feb 2 | Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. ECCV 2014. | project page, paper | Genevieve |
Tues, Feb 2 | Tropel: Crowdsourcing Detectors with Minimal Training. Genevieve Patterson, Grant Van Horn, James Hays, Serge Belongie, Pietro Perona. Human Computation (HCOMP) 2015. | | Genevieve |
Thurs, Feb 4 | CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee. Read only the first two sets of slides, labeled Introduction and Supervised learning. | CVPR 2014 tutorial | Genevieve |
Tues, Feb 9 | ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012. | | Genevieve |
Thurs, Feb 11 | The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding. Genevieve Patterson, Chen Xu, Hang Su, James Hays. IJCV 2014. | project page | Genevieve |
Tues, Feb 16 | Object Detectors Emerge in Deep Scene CNNs. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba. ICLR, 2015. | project page, arXiv | Sam |
also read | Learning Deep Features for Scene Recognition using Places Database. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014. | project page, pdf, demo | n/a |
Thurs, Feb 18 | Understanding Deep Image Representations by Inverting Them. Aravindh Mahendran, Andrea Vedaldi. CVPR 2015. | arXiv | Christine |
Tues, Feb 23 | No class. | | Everyone |
Thurs, Feb 25 | Diagnosing error in object detectors. Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai. ECCV 2012. | project page | Genevieve |
Tues, Mar 1 | Project Status Updates. | | Everyone |
Thurs, Mar 3 | DeepBox: Learning Objectness with Convolutional Networks. Weicheng Kuo, Bharath Hariharan, Jitendra Malik. ICCV 2015. | arXiv | Gabe |
also read | Selective Search for Object Recognition. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. IJCV 2013. | project page | n/a |
Tues, Mar 8 | Fast R-CNN. Ross Girshick. ICCV 2015. | arXiv, code | Gustavo |
also read | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. NIPS 2015. | | n/a |
Thurs, Mar 10 | Fully Convolutional Networks for Semantic Segmentation. Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015. | arXiv | Sam |
Tues, Mar 15 | Learning Visual Similarity for Product Design with Convolutional Neural Networks. Sean Bell, Kavita Bala. Siggraph 2015. | author page, pdf | Gustavo |
also read | Learning Deep Representations for Ground-to-Aerial Geolocalization. Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays. CVPR 2015. | | n/a |
Thurs, Mar 17 | What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012. | project page | Adam |
Tues, Mar 22 | Special Presentation by Zhile Ren. Three-Dimensional Object Detection and Layout using Clouds of Oriented Gradients. Zhile Ren and Erik B. Sudderth. | | Zhile Ren |
Thurs, Mar 24 | Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015. | project page | Christine |
Mar 26 - Apr 3 | Spring Break. | | Everyone |
Tues, Apr 5 | Project Status Updates. | | Everyone |
Thurs, Apr 7 | Deep Neural Decision Forests. Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. ICCV 2015. | Project page | Gabe |
Tues, Apr 12 | Vision for Robotics. Presentation from the Tellex Lab. | n/a | Stefanie Tellex and John Oberlin |
Thurs, Apr 14 | VQA: Visual Question Answering. S. Antol*, A. Agrawal*, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. ICCV, 2015. | project page, arXiv | Christine |
also read | Visual Turing test for computer vision systems. Geman, Donald, et al. Proceedings of the National Academy of Sciences 112.12 (2015): 3618-3623. | PNAS page | n/a |
Tues, Apr 19 | Exploring Nearest Neighbor Approaches for Image Captioning. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C Lawrence Zitnick. arXiv, 2015. | arXiv | Adam |
Thurs, Apr 21 | How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. Siggraph 2012. | project page | Gabe |
Tues, Apr 26 | Quizz: Targeted crowdsourcing with a billion (potential) users. Ipeirotis, Panagiotis G., and Evgeniy Gabrilovich. Proceedings of the 23rd international conference on World wide web. ACM, 2014. | | Gustavo |
Thurs, Apr 28 | Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes. Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, James Hays. Siggraph 2014. | project page | Genevieve |
Tues, May 3 | Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015. | project page, arXiv | Sam |
Tues, May 10 | Final Project Presentations 9am - approx. 11:30am | | Everyone |
Suggested Topics
Date | Paper | Paper, Project page | Presenter |
? | Micro Perceptual Human Computation for Visual Tasks. Yotam Gingold, Ariel Shamir, Daniel Cohen-Or. ACM Transactions on Graphics (ToG) 2012 | project page | ? |
? | Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. R. Girshick, J. Donahue, T. Darrell, J. Malik. CVPR 2014. | arXiv | ? |
? | Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. 2014. | arXiv | ? |
? | Visualizing and Understanding Convolutional Networks. Matthew D Zeiler, Rob Fergus. ECCV 2014. | | ? |
? | Unsupervised Visual Representation Learning by Context Prediction. Carl Doersch, Abhinav Gupta, Alexei A. Efros. ICCV 2015. | project page | ? |
? | Visual Madlibs: Fill in the blank Description Generation and Question Answering. Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg. ICCV, 2015. | project page, pdf | ? |
? | Learning to Generate Chairs, Tables and Cars with Convolutional Networks. Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox. CVPR 2015. | arXiv | ? |
? | A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. 2015. | implementation, arXiv | ? |
? | Aggregating local descriptors into a compact image representation (VLAD). H. Jegou, M. Douze, C. Schmid, and P. Perez. In Proc. CVPR, 2010. | | ? |
? | Joint Embeddings of Shapes and Images via CNN Image Purification. Yangyan Li, Hao Su, Charles Ruizhongtai Qi, Noa Fish, Daniel Cohen-Or, Leonidas Guibas. Siggraph Asia 2015. | project page | ? |
? | Automatic attribute discovery and characterization from noisy web data. Berg, Tamara L., Alexander C. Berg, and Jonathan Shih. Computer Vision – ECCV 2010. Springer Berlin Heidelberg, 2010. 663-676. | | ? |
? | Discovering the Spatial Extent of Relative Attributes. Fanyi Xiao, Yong Jae Lee. ICCV 2015. | | ? |
? | Learning to predict where humans look. T. Judd, K. Ehinger, F. Durand, and A. Torralba. IEEE International Conference on Computer Vision (ICCV), 2009. | project page | ? |
? | Learning a Discriminative Model for the Perception of Realism in Composite Images. Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, Alexei A. Efros. ICCV 2015. | project page | ? |
? | Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks. Fang Wang, Le Kang, Yi Li. CVPR 2015. | arXiv | ? |
? | Multi-view Convolutional Neural Networks for 3D Shape Recognition. Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller. ICCV 2015. | project page | ? |
? | Sketch2Photo: Internet Image Montage. ACM SIGGRAPH ASIA 2009, ACM Transactions on Graphics. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. | project page | ? |
? | Eulerian video magnification for revealing subtle changes in the world. Wu, Hao-Yu, et al. ACM Trans. Graph. 31.4 (2012): 65. | project page | ? |
? | A High Performance CRF Model for Clothes Parsing. E Simo-Serra, S Fidler, F Moreno-Noguer, R Urtasun. Computer Vision – ACCV 2014. | pdf, code | ? |
Previous topics (which you should know)
Date | Paper | Paper, Project page | Presenter |
? | Object recognition from local scale-invariant features, David Lowe, ICCV 1999. | pdf, project page | ? |
? | Video Google: A Text Retrieval Approach to Object Matching in Videos. Sivic, J. and Zisserman, A. Proceedings of the International Conference on Computer Vision (2003) | pdf, project page | ? |
? | Histograms of Oriented Gradients for Human Detection. Navneet Dalal and Bill Triggs. In Proceedings of IEEE Conference Computer Vision and Pattern Recognition, 2005. | | ? |
? | Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006. | pdf, slides | ? |
? | ImageNet: A Large-Scale Hierarchical Image Database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. IEEE Computer Vision and Pattern Recognition (CVPR), 2009 | pdf, project page | ? |
? | LabelMe: a Database and Web-based Tool for Image Annotation. B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman. International Journal of Computer Vision, 2008. | pdf, project page | ? |
? | 80 million tiny images: a large dataset for non-parametric object and scene recognition. A. Torralba, R. Fergus, W. T. Freeman. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30(11), 2008. | pdf, project page | ? |
? | Describing Objects by Their Attributes. A. Farhadi, I. Endres, D. Hoiem, and D.A. Forsyth. CVPR 2009 | project page | ? |
? | SUN Database: Exploring a Large Collection of Scene Categories. J. Xiao, K. Ehinger, J. Hays, A. Oliva, and A. Torralba. IJCV 2014. | project page, pdf | ? |
Other previous topics
Date | Paper | Paper, Project page | Presenter |
? | Painting-to-3D Model Alignment Via Discriminative Visual Elements. Mathieu Aubry, Bryan Russell, Josef Sivic. ToG 2013. | project page | ? |
? | DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell. 2013. | arXiv | ? |
? | Image Melding: combining inconsistent images using patch-based synthesis. Soheil Darabi, Eli Shechtman, Connelly Barnes, Dan B Goldman, Pradeep Sen. Siggraph 2012. | project page | ? |
? | Ground-truth dataset and baseline evaluations for intrinsic image algorithms. R. Grosse, M.K. Johnson, E.H. Adelson and W.T. Freeman. ICCV 2009 | project page | ? |
? | Intrinsic Images in the Wild. Sean Bell, Kavita Bala, Noah Snavely. Siggraph 2014. | project page | ? |
? | First Person Hyperlapse Videos. Johannes Kopf, Michael Cohen, Richard Szeliski. Siggraph 2014. | project page | ? |
? | Depixelizing Pixel Art. Johannes Kopf and Dani Lischinski. Siggraph 2011. | project page | ? |
? | Photo tourism: Exploring photo collections in 3D. Noah Snavely, Steven M. Seitz, Richard Szeliski. Siggraph 2006. | pdf, project page | ? |