AHA: Part II


Deep Learning, Fine-tuning, TF-Slim

While using AHA, I took notes on what I encountered to understand my preferences. I was able to identify themes that I tended to swipe left or right on. For instance, poor quality images, gated windows, a facade that looked like a hotel, dim lighting, and carpeting tended to get downvoted. On the other hand, hard flooring, private patios, scenic vistas, strong natural lighting, bookshelves, and dog parks were among the attributes of photos I tended to like.

In short, for places in a fairly narrow price range, I looked for bright, spacious, modern, private, and secure spots.

Fine-tuning Inception V3

The challenge then becomes: how can I build a classifier to learn a complex and subtle rule set like this? Karpathy recommends "don't be a hero". I do not have many samples and would surely overfit if I tried to train from scratch. It makes sense to leverage a model pretrained on ImageNet, since many objects that appear in apartment photos are already accurately classified by Inception or VGG.

However, the problem depends on more global context than classifying the subject of a photo does. The rough heuristics above were not strictly followed; there were often exceptions. The overall sentiment for a place was difficult to attribute to any specific quality.

Nonetheless, fine-tuning a high quality image classifier seems to be the most reasonable course of action. Fortunately, TF-Slim simplifies this task.

Following their pattern from the docs to fine-tune a flower classifier, I created a directory structure with images that were positively or negatively labeled. Then I was able to use their scripts to produce TFRecords and establish the train/validation split. After setting the relevant environment variables, I was able to run the script.
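The labeling and splitting step can be sketched in plain Python. This is not the TF-Slim conversion script itself, only an illustration of the idea: the `root/<label>/<image>.jpg` layout, the function name `split_labeled_images`, and the 20% validation fraction are all assumptions made for the example.

```python
import random
from pathlib import Path


def split_labeled_images(root, validation_fraction=0.2, seed=0):
    """Return (train, validation) lists of (path, label) pairs.

    Assumes a hypothetical layout root/<label>/<image>.jpg, for example
    root/positive/... and root/negative/...
    """
    root = Path(root)
    # Sort class names so the label ids are stable across runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}

    examples = [
        (str(path), label_of[name])
        for name in class_names
        for path in sorted((root / name).glob("*.jpg"))
    ]

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(examples)
    n_validation = int(len(examples) * validation_fraction)
    return examples[n_validation:], examples[:n_validation]
```

Each (path, label) pair would then be serialized into a TFRecord; fixing the seed keeps the validation set identical between runs, so accuracy numbers stay comparable.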

As in the example, I restored all but the final two layers and began training from there. I allowed the remaining layers to be trainable and I chose a very small learning_rate of 1e-5 and increased the batch_size to 64.
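Restoring "all but the final two layers" amounts to filtering checkpoint variables by scope prefix. Below is a minimal plain-Python sketch of that filtering, without TensorFlow. The helper `variables_to_restore` is hypothetical, the variable names are a small illustrative sample, and the two excluded scopes are the ones I understand the flowers example to exclude for InceptionV3.

```python
def variables_to_restore(variable_names, exclude_scopes):
    """Keep only the variables that fall outside every excluded scope prefix."""
    return [
        name
        for name in variable_names
        if not any(name.startswith(scope) for scope in exclude_scopes)
    ]


# Illustrative variable names; a real InceptionV3 graph has many more.
all_variables = [
    "InceptionV3/Conv2d_1a_3x3/weights",
    "InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights",
    "InceptionV3/AuxLogits/Conv2d_2b_1x1/weights",
    "InceptionV3/Logits/Conv2d_1c_1x1/weights",
]

# Assumed scopes for the two classification heads, which are sized for the
# 1000 ImageNet classes and so must be re-initialized for a two-class task.
exclude = ["InceptionV3/Logits", "InceptionV3/AuxLogits"]

restored = variables_to_restore(all_variables, exclude)
```

Everything in `restored` starts from the pretrained weights, while the excluded heads start from a fresh initialization and are learned from the apartment labels.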

Because I am using the InceptionV3 checkpoint, there are really not many other parameters to explore. Indeed, the premise of the fine-tuning approach is that I do not want to stray too far from the parameter weights of such a strong model. This is also why I chose a very conservative learning rate in fine-tuning.

Training is going quite slowly, but so far the model achieves ~81% accuracy on the validation set. Here is the training curve.


Considering a swipe acceptance rate of ~68%, the model is doing much better than the 'all-yes' classifier. Already, this model may reasonably reduce the apartment hunting workload. It could be used to filter out the 'hard negatives' when I wish to schedule apartment viewings without wasting too much time sorting through Craigslist.
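To make the comparison with the 'all-yes' classifier concrete, here is a small sketch of the arithmetic. It assumes the ~68% acceptance rate also holds on the validation set, which I have not verified; both function names are mine, introduced only for illustration.

```python
def majority_baseline_accuracy(positive_rate):
    """Accuracy of always predicting the more common class."""
    return max(positive_rate, 1.0 - positive_rate)


def relative_error_reduction(model_accuracy, baseline_accuracy):
    """Fraction of the baseline's errors that the model eliminates."""
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - model_accuracy
    return (baseline_error - model_error) / baseline_error


baseline = majority_baseline_accuracy(0.68)  # 'all-yes' classifier
reduction = relative_error_reduction(0.81, baseline)

print(f"baseline accuracy: {baseline:.2f}")
print(f"relative error reduction: {reduction:.1%}")
```

Under that assumption, the error rate falls from about 32% to about 19%, so the model removes roughly 40% of the mistakes the 'all-yes' classifier would make.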

Finally, this model doesn't even take the post description into consideration. Another worthwhile experiment would be to build a text classifier on the descriptions and combine it with this image model.