AlexNet / VGG-F network visualized by mNeuron.
This project is divided into three weeks. Though we will release all of the project's resources at once, here is how we recommend managing the three weeks:
We will design and train convolutional neural networks (CNNs) for scene recognition using TensorFlow. Remember scene recognition with bag of words, which achieved 50 to 70% accuracy on 15-way scene classification? We're going to tackle the same task on the 15 scenes database with deep learning and obtain higher accuracy.
Task 1: Design a CNN architecture with fewer than 15 million parameters, and train it on a small dataset of 1,500 training examples. This isn't really enough data, so we will use:
You will implement standardization and data augmentation in preprocess.py; regularization via dropout layers goes in YourModel. It's a good idea to have an (at least preliminary) preprocessing routine set up before building your model, which you can fine-tune later. You can see some results of your preprocessing function visualized during/after training under the "IMAGES" tab in TensorBoard.
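As a starting point, here is a minimal sketch of standardization plus on-the-fly augmentation using tf.keras's ImageDataGenerator. The channel statistics are placeholder values (the common ImageNet means/stds), and the names are illustrative; the stencil in preprocess.py may expect a different structure.

import numpy as np
import tensorflow as tf

# Placeholder dataset statistics; in preprocess.py you would compute
# these once over your 1,500 training images.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def standardize(img):
    # Zero-center and scale each channel using training-set statistics.
    return (img / 255.0 - MEAN) / STD

# Random augmentations are applied on the fly each epoch, so the network
# effectively never sees the exact same image twice.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,        # small random rotations
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True,     # most scenes are left/right symmetric
    preprocessing_function=standardize,
)

Augmentation only helps if the transformed images remain plausible scenes, so avoid transformations like vertical flips.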
Task 2: Write and train a classification head for a pre-trained VGG16 CNN to recognize scenes, where the CNN was pre-trained on ImageNet. With the weights of the pre-trained network frozen, there should be no more than 15 million trainable parameters in this model. To download the pre-trained VGG16 weights (trained on the ImageNet dataset), navigate to your project's code directory from the command line, then enter the following command:
wget "https://cs.brown.edu/courses/csci1430/proj4/vgg16_imagenet.h5"
These are the two most common approaches to recognition problems in computer vision today: either train a deep network from scratch, if you have enough data, or fine-tune a pre-trained network.
In your submission to Gradescope, you will include your best-performing weights for YourModel (you will not have to include weights for VGGModel). The createSubmissionZip.py script will do this automatically by searching for the best weights in the directory your_model_checkpoints/. Make sure not to rename weight files: their names include the accuracy on the test set, which is how the script tells which checkpoint is best.
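For reference, checkpoint files with a metric embedded in the name can be produced with tf.keras.callbacks.ModelCheckpoint, along these lines (the filename pattern is illustrative, and val_accuracy stands in for whatever metric the stencil actually records):

import tensorflow as tf

# Embed the epoch and the monitored metric into each checkpoint's filename.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath='your_model_checkpoints/weights.e{epoch:03d}-acc{val_accuracy:.4f}.h5',
    monitor='val_accuracy',
    save_best_only=True,     # only write a file when the metric improves
    save_weights_only=True,  # save weights, not the whole model
)
# Passed to training via: model.fit(..., callbacks=[checkpoint])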
The only files you need to edit for the assignment are preprocess.py, your_model.py, vgg_model.py, and possibly hyperparameters.py. The locations in these files that need editing are marked by TODO comments.
Each time the program is run, a summary of the network will be printed, including the number of trainable and non-trainable parameters. Pay attention to these numbers so that you don't exceed the limit enforced on each network.
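If you'd rather not eyeball the summary, you can also check the budget programmatically. A small helper sketch (check_param_budget is a hypothetical name, not part of the stencil):

import numpy as np

def check_param_budget(model, limit=15_000_000):
    # Sum the element counts of all trainable variables in the model.
    n = int(np.sum([np.prod(v.shape) for v in model.trainable_variables]))
    print(f'{n:,} trainable parameters (limit {limit:,})')
    assert n < limit, 'over the trainable-parameter budget!'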
We provide you with a LaTeX template at writeup/writeup.tex. Please compile it into a PDF and submit it along with your code. We conduct anonymous TA grading, so please don't include your name or ID in your writeup or code.
Extra credit:
Be sure to analyze in your writeup.pdf whether your extra credit has improved classification accuracy. Each item is worth "up to" some number of points because trivial implementations may not merit full extra credit. Some ideas:
We will use Google Cloud Platform; see the GCP guide for setup. Each account can receive a coupon only once, so please make sure you are using your brown.edu account, not cs.brown.edu or an account from another domain. You will receive a $50 coupon. To avoid unnecessary charges, please shut down the VM instance after each use.
If you have used up your GCP coupon, you can also use Google Colab, which provides free GPUs. See the Colab Tutorial.
We will use TensorFlow through Python. This is all set up on the departmental machines. For a personal machine, please visit the TensorFlow Website for installation instructions. Usually this can be achieved in your computer's terminal via the command:
pip3 install --upgrade tensorflow
If you have an NVIDIA GPU and want to use it, the setup is a little more involved; we leave that for you to explore on your own.
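Either way, you can quickly confirm that TensorFlow is installed (and whether it can see a GPU) with:

import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))  # an empty list means no GPU is visible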
Project description and code written by Isa Milefchik, Aaron Gokaslan, James Tompkin, and James Hays. Originally written solely by James Hays, then translated from MatConvNet to TensorFlow by Aaron, and to TensorFlow 2.0 by Isa. GCP guide by George Lee and Isa Milefchik. Colab guide by Ruizhao Zhu, Zhoutao Lu, and Jiawei Zhang. We also referenced materials from CSCI 1470 Deep Learning.