AlexNet / VGG-F network visualized by mNeuron.
This project is divided into three weeks. Though we will release all of the project's resources at once, here is how we recommend managing the three weeks:
We will design and train convolutional neural networks (CNNs) for scene recognition using TensorFlow. Remember scene recognition with bag of words, which achieved 50 to 70% accuracy on 15-way scene classification? We're going to tackle the same task on the 15 scenes database with deep learning and obtain higher accuracy.
Task 1: Design a CNN architecture with fewer than 15 million parameters, and train it on a small dataset of 1,500 training examples. This isn't really enough data, so we will use:
You will implement standardization and data augmentation in preprocess.py; regularization via dropout layers goes in YourModel. It's a good idea to have an (at least preliminary) preprocessing routine set up before building your model, which you can fine-tune later. You can see some results of your preprocessing function visualized during/after training under the "IMAGES" tab in TensorBoard.
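As a starting point, here is a minimal sketch of standardization plus on-the-fly augmentation using tf.keras's ImageDataGenerator. The channel statistics are placeholder values (the common ImageNet means/stds), and the names are illustrative; the stencil in preprocess.py may expect a different structure.

import numpy as np
import tensorflow as tf

# Placeholder dataset statistics; in preprocess.py you would compute
# these once over your 1,500 training images.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def standardize(img):
    # Zero-center and scale each channel using training-set statistics.
    return (img / 255.0 - MEAN) / STD

# Random augmentations are applied on the fly each epoch, so the network
# effectively never sees the exact same image twice.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,        # small random rotations
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True,     # most scenes are left/right symmetric
    preprocessing_function=standardize,
)

Augmentation only helps if the transformed images remain plausible scenes, so avoid transformations like vertical flips.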
Task 2: Write and train a classification head for a pre-trained VGG16 CNN to recognize scenes, where the CNN was pre-trained on ImageNet. With the weights of the pre-trained network frozen, there should be no more than 15 million trainable parameters in this model. To download the pre-trained VGG16 weights (trained on the ImageNet dataset), navigate to your project's code directory from the command line, then enter the following command:
wget "https://cs.brown.edu/courses/csci1430/proj4/vgg16_imagenet.h5"
These are the two most common approaches to recognition problems in computer vision today: either train a deep network from scratch, if you have enough data, or fine-tune a pre-trained network.
In your submission to Gradescope, you will include your best-performing weights for YourModel (you will not have to include weights for VGGModel). The createSubmissionZip.py script will do this automatically by searching for the best weights in the directory your_model_checkpoints/. Make sure not to rename weight files: their names include the accuracy on the test set, which is how the script tells which checkpoint is best.
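For reference, checkpoint files with a metric embedded in the name can be produced with tf.keras.callbacks.ModelCheckpoint, along these lines (the filename pattern is illustrative, and val_accuracy stands in for whatever metric the stencil actually records):

import tensorflow as tf

# Embed the epoch and the monitored metric into each checkpoint's filename.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath='your_model_checkpoints/weights.e{epoch:03d}-acc{val_accuracy:.4f}.h5',
    monitor='val_accuracy',
    save_best_only=True,     # only write a file when the metric improves
    save_weights_only=True,  # save weights, not the whole model
)
# Passed to training via: model.fit(..., callbacks=[checkpoint])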
The only files you need to edit for the assignment are preprocess.py, your_model.py, vgg_model.py, and possibly hyperparameters.py. The locations in these files that need editing are marked by TODO comments.
Each time the program is run, a summary of the network will be printed, including the number of trainable and non-trainable parameters. Pay attention to these numbers so that you don't exceed the limit enforced on each network.
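If you'd rather not eyeball the summary, you can also check the budget programmatically. A small helper sketch (check_param_budget is a hypothetical name, not part of the stencil):

import numpy as np

def check_param_budget(model, limit=15_000_000):
    # Sum the element counts of all trainable variables in the model.
    n = int(np.sum([np.prod(v.shape) for v in model.trainable_variables]))
    print(f'{n:,} trainable parameters (limit {limit:,})')
    assert n < limit, 'over the trainable-parameter budget!'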
We provide you with a LaTeX template at writeup/writeup.tex. Please compile it into a PDF and submit it along with your code. We conduct anonymous TA grading, so please don't include your name or ID in your writeup or code.
Extra credit:
Be sure to analyze in your writeup.pdf whether your extra credit has improved classification accuracy. Each item is worth "up to" some number of points because trivial implementations may not merit full extra credit. Some ideas:
We will use Google Cloud Platform; see the GCP guide for setup. Each account can receive a coupon only once, so please make sure you are using your brown.edu account, not cs.brown.edu or an account from another domain. You will receive a $50 coupon. To avoid unnecessary charges, please shut down the VM instance after each use.
If you have used up your GCP coupon, you can also use Google Colab, which provides free GPUs. See the Colab Tutorial.
We will use TensorFlow through Python. This is all set up on the departmental machines. For a personal machine, please visit the TensorFlow Website for installation instructions. Usually this can be achieved in your computer's terminal via the command:
pip3 install --upgrade tensorflow
If you have an NVIDIA GPU and want to use it, the setup is a little more involved; we leave that for you to explore on your own.
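Either way, you can quickly confirm that TensorFlow is installed (and whether it can see a GPU) with:

import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))  # an empty list means no GPU is visible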
Project description and code written by Isa Milefchik, Aaron Gokaslan, James Tompkin, and James Hays. Originally written solely by James Hays, then translated from MatConvNet to TensorFlow by Aaron, and to TensorFlow 2.0 by Isa. GCP guide by George Lee and Isa Milefchik. Colab guide by Ruizhao Zhu, Zhoutao Lu, and Jiawei Zhang. We also referenced materials from CSCI 1470 Deep Learning.