# A House Share Price Predictor using a Deep Neural Network

### Adam Lesnikowski

### Problem

What is the market price of a house or room share, given only a picture of a room?

### Motivations

This problem has definite commercial applications in building a system that can algorithmically detect
over-priced or under-priced assets. The generalizability of deep convolutional neural networks, the model we use for this problem, means that not only house share prices would be open to such an approach, but anything that has both a picture and a price: houses, cars, internet products, and perhaps even other assets like financial instruments and companies. The problem also holds academic interest, both in the challenge of getting a deep net to work with a heterogeneous, messy data set and in exploring how well features from a net trained on one kind of data perform on an unseen kind of data.
### Approach

A deep convolutional neural network was chosen as the price predictor for this problem.
The choice was motivated by the following considerations. First, there is a plethora of labelled data available
for this problem, and deep neural nets have been shown to achieve top performance against other models, with the proviso
that enough training data is available to train them. Enough data is important because these networks often contain millions of parameters and rely on regularization techniques, such as dropout and data set augmentation, whose theoretical properties are currently not thoroughly understood. Second is the promise of neural networks to eliminate the need for hand-engineered features. For the price regression problem, it is not at all clear which features of a photo matter most for price. Hence we forgo the problem of feature selection and use a deep convolutional network for our problem.
### Dataset

I collected a dataset of 117,000 user-submitted photos from a popular house share website. The dataset covered six American cities and three European cities. The number of photos per city ranged from 2,500 for Boston to 28,000 for New York City. The minimum image dimension was 600 pixels on a side, so this dataset allows for a variety of interesting data augmentation and multi-scale training possibilities by taking various crops of the training data.
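As one example of the crop-based augmentation this makes possible, here is a minimal random-crop sketch; the helper function is my own illustration, and 227×227 is the input size AlexNet expects in Caffe.

```python
import numpy as np

def random_crop(img, size):
    """Randomly crop an (H, W, C) image down to (size, size, C)."""
    h, w = img.shape[:2]
    top = np.random.randint(0, h - size + 1)
    left = np.random.randint(0, w - size + 1)
    return img[top:top + size, left:left + size]

# A fake 600x800 RGB image; 600px is this dataset's minimum side length.
img = np.zeros((600, 800, 3), dtype=np.uint8)
crop = random_crop(img, 227)  # 227x227 is AlexNet's Caffe input size
print(crop.shape)  # (227, 227, 3)
```

Taking several such crops per photo multiplies the effective training set without collecting new images.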
### Simplifying Assumptions

For my initial set of experiments, I worked with entire house and apartment shares in New York City. I used the AlexNet deep net available as part of Caffe, the deep learning package from the Berkeley Vision and Learning Center. This network was trained on ImageNet, a large collection of images across one thousand categories. Deep networks trained on one dataset have been shown to generalize quite well to types of images not seen in training, so I argue that this is a relatively mild simplifying step for my initial run of experiments.
### Modifying a Classification Net for Regression

AlexNet was trained to perform classification, and its topology is set up with 1000 output nodes, one for each ImageNet category. How do we turn this into a regressor, that is, a model that gives us a real-valued price prediction? I used a technique that has been applied to similar problems before: sample intermediate outputs of the network and train a side regressor on them to output a price. In particular, I ran a forward pass of AlexNet on each of my 28,000 New York images and took the 58 six-by-six convolutional activation maps computed just before the fully connected layers. These are called *deep features*. Interestingly, these deep features were about 170KB each, larger than the typical 50-70KB input image. I then trained a Support Vector Regressor (SVR) on these deep features.
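The flattening step can be sketched as follows. Since no network is run here, random arrays stand in for real AlexNet activations, and the 256×6×6 volume shape is an assumption chosen for illustration; use whatever shape the chosen layer actually produces.

```python
import numpy as np

# Assumed activation shape for illustration only; random arrays stand
# in for real forward passes through AlexNet.
CHANNELS, H, W = 256, 6, 6

def to_deep_feature(activation):
    """Flatten one (channels, h, w) activation volume into a single
    fixed-length vector usable as a regression feature."""
    return np.asarray(activation, dtype=np.float32).ravel()

rng = np.random.default_rng(0)
volumes = [rng.standard_normal((CHANNELS, H, W)) for _ in range(4)]
X = np.stack([to_deep_feature(v) for v in volumes])
print(X.shape)  # (4, 9216): four images, one 9216-dim deep feature each
```

With real features, each row of `X` would be one image's deep feature, paired with that listing's nightly price as the regression target.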
### Support Vector Regressors

A Support Vector Regressor, or SVR, with a linear kernel is essentially linear regression with a linear loss instead of a squared loss, plus an epsilon parameter that we can tune so that regression mistakes within that epsilon range are not penalized. For example, we can set epsilon so that we don't penalize predictions that are within $1 of the correct price. The motivation for an SVR, I would argue, is that a linear loss better tracks what a price predictor should be doing than a squared loss. SVRs with increasingly flexible regression curves were tried: a linear kernel, polynomial kernels of degrees 2, 4, 8, and 16, and a radial basis function (RBF) kernel. The linear kernel was found to perform nearly as well as its more flexible cousins, with the advantage of faster training times, so for my initial experiments only linear kernels were explored. Parameter fitting for C and epsilon was performed using cross-validation and a grid search, with C between 10^-10 and 10^10 and epsilon between 0 and 25. I noticed a dramatic slowdown in training my linear SVRs with more than 8,000 training examples of deep features, which I suspect is a memory issue. The metric I optimized for was mean absolute error, or MAE, which I chose because the psychology of price prediction suggests that a price that is off by $20 is twice, not four times, as bad as a price that is off by $10. I performed a train-test split of 8,000 training images and 1,000 test images.
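A minimal version of this fitting procedure using scikit-learn is sketched below. Random arrays stand in for the deep features and prices, and the deliberately coarse grid only illustrates the C and epsilon ranges described above; the library and values are my choices, not necessarily the original setup.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Random stand-ins for deep features and nightly prices.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 40 * X[:, 0] + 170 + rng.normal(0, 10, 200)  # fake prices near $170

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validated grid search over C and epsilon, scored by MAE.
grid = GridSearchCV(
    SVR(kernel="linear"),
    param_grid={"C": [1e-2, 1.0, 1e2], "epsilon": [0.0, 5.0, 25.0]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X_train, y_train)
mae = mean_absolute_error(y_test, grid.predict(X_test))
print(grid.best_params_, round(mae, 2))
```

For training sets much larger than a few thousand examples, `sklearn.svm.LinearSVR` avoids building the kernel matrix and may sidestep the slowdown noted above.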
### Data Preparation

The prices of house shares in New York were found to have wild outliers, from $30/night up to an $8,000/night penthouse overlooking Central Park. I cleaned the data set by filtering outliers more than three standard deviations from the mean price of $170/night.
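A sketch of this filtering step, with made-up prices (the real data set and its exact statistics are not reproduced here):

```python
import numpy as np

def filter_outliers(prices, n_std=3.0):
    """Keep only prices within n_std standard deviations of the mean."""
    prices = np.asarray(prices, dtype=float)
    mean, std = prices.mean(), prices.std()
    return prices[np.abs(prices - mean) <= n_std * std]

# 200 plausible nightly prices around $170, plus one penthouse outlier.
rng = np.random.default_rng(0)
prices = np.concatenate([rng.normal(170, 60, 200), [8000.0]])
filtered = filter_outliers(prices)
print(len(prices), len(filtered))  # the $8000 outlier is dropped
```

One caveat of this single-pass rule: a large enough outlier inflates the standard deviation itself, so with very few data points it may escape the three-sigma cut.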
### Results

The optimal linear kernel SVR had a mean absolute error (MAE) of $68.26 on the 1,000 test images. This predictor also correctly classified 64% of house shares as either above or below the mean price. I found this an extremely encouraging performance for my initial run of experiments.
### Further Steps

Here is an outline of future experiments. Train the SVR on all the New York data, and then on all 118K images to maximize training experience. In the latter cross-city case, the regressor would predict z-scores instead of prices, which can be calibrated to a local market by fitting a distribution to the local house share prices. More careful parameter selection for the SVRs, and a closer look at the more flexible polynomial and RBF kernel regressors in the larger data set case, is expected to improve performance. Another data set to run the same experiments on is pictures of room shares instead of whole house or apartment shares. Computing deep features through another deep net such as VGG or GoogLeNet would be extremely interesting. Fine-tuning on my labelled dataset, and then training an SVR on these fine-tuned deep features, is another intriguing direction. I have been very excited by these initial results, and I hope to run these additional experiments in the near future.
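The z-score calibration idea can be sketched as follows; the per-city means and standard deviations here are made-up numbers for illustration, and a normal price distribution is assumed for simplicity.

```python
# The cross-city regressor predicts a z-score; each local market
# converts it to a price using its own statistics.
def price_to_z(price, city_mean, city_std):
    return (price - city_mean) / city_std

def z_to_price(z, city_mean, city_std):
    return city_mean + z * city_std

# Assumed market statistics, for illustration only.
ny = {"mean": 170.0, "std": 50.0}
boston = {"mean": 130.0, "std": 35.0}

z = price_to_z(220.0, ny["mean"], ny["std"])  # one std above NY's mean
print(z_to_price(z, boston["mean"], boston["std"]))  # 165.0 in Boston
```

A fitted local distribution (rather than an assumed normal) would play the role of `city_mean` and `city_std` here.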
### Figures

Distribution of pixel intensities of a sample deep feature of AlexNet at the fifth convolutional layer.
### Try It!

A demo of my price predictor using the latest trained predictor is in the works at http://pricepredictor.adamlesnikowski.com/.

### Contact

Questions, more info, or latest results?
Email me at first name dot last name at gmail, or visit my website at http://www.adamlesnikowski.com.