AlexNet / VGG-F network visualized by mNeuron.

Deep Learning with TensorFlow
Introduction to Computer Vision

Please bear with us...

This is a new assignment, so please expect a few bumps in the mechanics. TensorFlow code (with TensorPack functions) looks very different from MATLAB, and much of this project is about familiarizing yourself with these systems. If you get stuck, please post on Piazza or ask a TA, and we will do our best to help you through! Also, this project involves some waiting, as CNNs take a long time to train.

Logistics

Files: projTFCNNs.zip (83 MB)—dataset included.
Files: projTFCNNs_NoData.zip (83 MB)—for space-constrained dept. machine access.
Part 1: Questions
- Questions + template: Now in the zip: questions/
- Hand-in process: Gradescope as PDF. Submit anonymous materials please!
- Due: Friday 3rd Nov. 2017, 9pm.
Part 2: Code
- Writeup template: In the zip: writeup/
- Required files: Use 'createSubmissionZIP.m'
- Hand-in process: Gradescope as ZIP file. Submit anonymous materials please!
- Due: Friday 10th Nov. 2017, 9pm.

Tasks and Rubric

We will design and train convolutional neural networks (CNNs) for scene recognition using TensorFlow. Remember scene recognition with bag of words, which achieved 50 to 70% accuracy on 15-way scene classification? We're going to complete the same task on the 15 scene database with deep learning and obtain higher accuracy. We will try the two most common approaches to recognition problems in computer vision today: training a deep network from scratch (if you have enough data) and fine-tuning a pre-trained network.

Task 0: Install TensorFlow and TensorPack, and familiarize yourself with the stencil code. This will take time, so take it slow and learn to follow the code flow.

Task 1: Train a CNN to recognize scenes with the provided architecture and a dataset of 1,500 training examples. This isn't really enough data to achieve high accuracy given the number of parameters, so we will try to:

Add standardization (feature normalization): run.py, class Scene15.
Add dropout regularization: your_model.py, function _build_graph().
Add data augmentation to 'fake' more data: run.py, function get_data().
Change the architecture to be deeper: your_model.py, function _build_graph().
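The first two items can be sketched in plain NumPy as follows. This is an illustrative sketch, not the stencil code: it assumes images arrive as an (N, H, W, C) float array, and every function name is hypothetical. Standardization uses statistics from the training set only, and inverted dropout rescales the surviving units during training so that nothing needs to change at test time.

```python
import numpy as np


def compute_channel_stats(train_images):
    """Per-channel mean and std over a training set shaped (N, H, W, C)."""
    mean = train_images.mean(axis=(0, 1, 2))
    std = train_images.std(axis=(0, 1, 2))
    return mean, std


def standardize(images, mean, std, eps=1e-8):
    """Zero-center and scale images using the *training* statistics."""
    return (images - mean) / (std + eps)


def inverted_dropout(activations, keep_prob, rng, training=True):
    """Zero each unit with probability 1 - keep_prob during training and
    rescale survivors by 1 / keep_prob; act as the identity at test time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(16, 8, 8, 3))
test = rng.uniform(0, 255, size=(4, 8, 8, 3))
mean, std = compute_channel_stats(train)   # statistics from training data only
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)    # reuse the training statistics
hidden = inverted_dropout(np.ones((2, 6)), keep_prob=0.5, rng=rng)
```

If you use TensorFlow's built-in layer instead, note that in TensorFlow 1.x `tf.layers.dropout` takes `rate` (the fraction of units to drop) and defaults to `training=False`, so it only drops units when you pass a training flag, whereas `tf.nn.dropout` takes `keep_prob`.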

Training for this part might take 20-30 minutes (about 40 seconds per epoch on James' laptop CPU).

Possibly-helpful links: http://tensorpack.readthedocs.io/en/latest/modules/dataflow.imgaug.html, https://www.tensorflow.org/tutorials/layers
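The imgaug module linked above provides ready-made augmentors that plug into TensorPack dataflows. To show what two common augmentations actually do to an image, here is a plain NumPy sketch. It is not TensorPack's API, and the function names and the 72-pixel to 64-pixel crop sizes are assumptions for illustration.

```python
import numpy as np


def random_flip(image, rng):
    """Mirror an (H, W, C) image left-right with probability 0.5.
    A mirrored scene is still the same scene category."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image


def random_crop(image, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of an (H, W, C) image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)    # upper bound is exclusive
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]


def augment_batch(images, crop_h, crop_w, rng):
    """Apply flip then crop independently to every image in an (N, H, W, C) batch."""
    return np.stack([random_crop(random_flip(img, rng), crop_h, crop_w, rng)
                     for img in images])


rng = np.random.default_rng(0)
batch = rng.uniform(0, 1, size=(4, 72, 72, 3))
augmented = augment_batch(batch, 64, 64, rng)
print(augmented.shape)  # (4, 64, 64, 3)
```

Augmentation is normally applied to training images only; evaluation images are typically resized or center-cropped deterministically.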

+50pts: Achieve at least 50% test accuracy (for any training epoch) on the 15 scene database, and complete each of the requested features.

Task 2: Fine-tune the VGG-16 CNN, pre-trained on ImageNet, to recognize scenes. To begin, download the VGG-16.npy model and place it in your code directory. Or, if you're running on a department machine, feel free to uncomment the department file system location of this file in the __main__ block of run.py.

Training for this part will take many hours (about two hours per epoch on James' laptop CPU). Leave it to run overnight, and use the function to resume training.
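Conceptually, fine-tuning keeps the pretrained layers and swaps the 1,000-way ImageNet classifier for a new 15-way one. The NumPy sketch below illustrates that swap under loudly stated assumptions: it assumes VGG-16.npy is a pickled dict mapping layer names to [weights, biases], and that the last layer is named 'fc8'. Both are hypothetical; check vgg_model.py for how the stencil actually loads the file.

```python
import os
import tempfile

import numpy as np

NUM_SCENE_CLASSES = 15  # the 15 scene categories


def load_pretrained(path):
    # ASSUMPTION: the file is a pickled dict {layer_name: [weights, biases]}
    # saved with np.save. This format is hypothetical here.
    return np.load(path, encoding="latin1", allow_pickle=True).item()


def replace_classifier(weights, head="fc8", num_classes=NUM_SCENE_CLASSES, seed=0):
    # Copy every pretrained layer except the final ImageNet classifier...
    new_weights = {name: params for name, params in weights.items() if name != head}
    # ...and re-initialize that layer with num_classes outputs.
    # The head name 'fc8' is a hypothetical choice.
    fan_in = weights[head][0].shape[0]
    rng = np.random.default_rng(seed)
    new_weights[head] = [
        rng.normal(0.0, 0.01, size=(fan_in, num_classes)).astype(np.float32),
        np.zeros(num_classes, dtype=np.float32),
    ]
    return new_weights


# Demo with a tiny fake weights file standing in for VGG-16.npy.
fake = {
    "conv1_1": [np.ones((3, 3, 3, 4), np.float32), np.ones(4, np.float32)],
    "fc8": [np.ones((32, 1000), np.float32), np.ones(1000, np.float32)],
}
path = os.path.join(tempfile.mkdtemp(), "fake_vgg16.npy")
np.save(path, fake)
weights = replace_classifier(load_pretrained(path))
print(weights["fc8"][0].shape)  # (32, 15)
```

Whether you then train only the new layer or all layers (usually with a smaller learning rate for the pretrained ones) is a design choice worth reporting in your writeup.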

+20pts: Achieve at least 85% test accuracy (for any training epoch) on the 15 scene database.

Task 3: TensorBoard. TensorBoard is a locally hosted, web-based interface for inspecting models. Use TensorBoard to visualize your loss and error over the course of training. Each training session is saved as a set of logs and model weights (which you can then use on new examples).

$> tensorboard --logdir=train_log/run
Open a web browser and navigate to http://localhost:6006/
Explore : )

+5 pts: Use this information and the visual outputs in your writeup.

Task 4: Write up. We provide you with a LaTeX template writeup/writeup.tex. Please compile it into a PDF and submit it along with your code. We conduct anonymous TA grading, so please don't include your name or ID in your writeup or code.

Describe your process and algorithm, show your results, describe any extra credit, and tell us any other information you feel is relevant.
Report your classification performance (validation error) for each change/improvement you make to Task 1. Report for Task 2, too.
From TensorBoard, include graphs of your loss function over time during training.
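Validation error is one minus accuracy, and a confusion matrix can show which scenes are being mixed up. A minimal NumPy sketch, assuming integer predictions and labels in the range 0 to 14; the function names are hypothetical.

```python
import numpy as np


def accuracy_and_error(predicted, true):
    """Fraction of correct predictions, and the complementary error."""
    acc = float(np.mean(predicted == true))
    return acc, 1.0 - acc


def confusion_matrix(predicted, true, num_classes=15):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (true, predicted), 1)  # handles repeated (true, pred) pairs
    return cm


predicted = np.array([0, 1, 1, 2])
true = np.array([0, 1, 2, 2])
acc, err = accuracy_and_error(predicted, true)  # 0.75, 0.25
cm = confusion_matrix(predicted, true)
```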

+5 pts: For your writeup.

-5*n pts: Lose 5 points each time you do not follow the hand-in instructions.

Starter Code Outline

The following is an outline of the stencil code:

run.py. The top-level script for data loading and network training. This is what you will run, e.g., $> python run.py --task 1 --gpu -1. Arguments are explained at the top of the file, and you can also inspect them in __main__. If you run this starter code unmodified, it will train a simple network that achieves ~40% accuracy after 30 epochs: somewhat better than the tiny images and nearest neighbor baseline, but not as good as HOG + bag of words + linear SVM.
parameters.py. Contains all the tweakable parameters; feel free to add your own if you wish.
Then we have two 'models' for your network:
1. your_model.py. This is your model for Task 1, to be trained from scratch.
2. vgg_model.py. This is the VGG-16 model for Task 2, to be fine tuned.

Extra Credit

Be sure to analyze in your writeup.pdf whether your extra credit has improved classification accuracy. Each item is "up to" some amount of points because trivial implementations may not be worthy of full extra credit. Some ideas:

up to 10 pts: Gather additional scene training data (e.g., from the SUN database or the Places database) and train a network from scratch. Report performance on those datasets and then use the learned networks for the 15 scene database with fine tuning.
up to 10 pts: Try a completely different recognition task. For example, try to recognize human object sketches (download the .png files). Or try to predict scene attributes. The scene attributes are not one-vs-all (an image simultaneously has many attributes) so you'll need to configure TensorFlow accordingly. There are many other recognition data sets available.
up to 10 pts: Produce visualizations using your own code or methods such as mNeuron, Understanding Deep Image Representations by Inverting Them, DeepVis, or DeepDream.
up to 10 pts: 1 point for every percent accuracy over 70% when training from scratch on the 15 scene database. The highest accuracy we've achieved is 66%, so we expect this to require many bells and whistles such as extensive jittering of the training data, carefully tuned network structure and per-layer training rates, etc.
up to 10 pts: 1 point for every percent accuracy over 90% when fine-tuning from VGG-16. You don't get extra credit for switching to another network (like one trained on the Places database). The challenge here is to adapt a very big network to a relatively small training set.
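For the scene-attributes idea above, the main change is the loss. A softmax makes the 15 scene classes compete, so exactly one label wins, whereas independent sigmoids let one image carry several attributes at once. A minimal NumPy sketch of both losses follows; the function names are hypothetical. In TensorFlow, the multi-label loss corresponds to `tf.nn.sigmoid_cross_entropy_with_logits`, and the sketch uses the same numerically stable formula that function documents.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    """Single-label loss: classes compete through a softmax."""
    shifted = logits - logits.max()                     # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def sigmoid_cross_entropy(logits, targets):
    """Multi-label loss: each attribute is an independent yes/no decision.
    Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)), averaged."""
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))


def predict_attributes(logits, threshold=0.5):
    """Every attribute whose sigmoid probability reaches the threshold is 'on'."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs >= threshold


logits = np.array([2.0, -1.0, 0.0])   # scores for 3 hypothetical attributes
targets = np.array([1.0, 0.0, 1.0])   # this image has attributes 0 and 2
loss = sigmoid_cross_entropy(logits, targets)
present = predict_attributes(logits)  # [True, False, True]
```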

GPUs in the department

A GPU isn't required to complete the project. If you're happy to multitask, then you can complete this project by 'checking in' on your training. TensorPack even has a function to send you an email or SMS when your training is done (add to the TrainConfig callbacks).

However, a GPU will speed up training dramatically. CIT Sun Lab has machines in the 6th row with GPUs that can be used for training. Further, you can investigate how to schedule jobs on the department's grid GPU machines. These are a limited resource, so please play nice.

Setting up TensorFlow

We will use TensorFlow through Python. This is set up on the departmental machines via Python virtual environments.

$> source /course/cs1430/tf_cpu/bin/activate
On a GPU machine: $> source /course/cs1430/tf_gpu/bin/activate

For a personal machine, please visit the TensorFlow Website for installation instructions. We will also use an additional library called TensorPack, which provides convenience functions.
Windows: James' installation process on Windows with Python 3.6, through PowerShell:

Install Python 3.6
Use Python 3's package manager pip3 to install TensorFlow:
$> pip3 install --upgrade tensorflow
Use Python 3's package manager pip3 to install TensorPack:
$> pip3 install --upgrade tensorpack
Test the install:
$> python
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))

If you have a personal NVIDIA GPU and want to use it to speed up processing, then it's a little more complicated to set up; please venture for yourself using the TensorFlow documentation.

TensorFlow Tutorials

Recommended:

Getting Started with TensorFlow.
MNIST tutorials: Slower pace, with more explanation, and Faster, with no underlying explanation.
TensorBoard.

Credits

Project description and code by Aaron Gokaslan, James Tompkin, James Hays. Originally solely by James Hays, but translated to TensorFlow from MatConvNet by Aaron.
