10/24/17: Kaggle, Remote Sensing, Computer Vision, Unsupervised Pretraining
We've found that convnets are pretty versatile, with applications in computer vision, signal processing, and even NLP. Looking to test my convnet repo in live competition, I found the Statoil/C-CORE remote sensing challenge hosted on Kaggle.
In this binary classification task, the participant has access to radar backscatter readings and must distinguish ships from icebergs. The data has two channels, one per polarization: transmit horizontally, receive horizontally (HH), and transmit horizontally, receive vertically (HV). The readings are float values on the dB scale. Also included is the angle of incidence for each reading.
The classes are fairly well balanced, and it is not too hard to discern ships from icebergs by visually inspecting a channel coerced into a grayscale image.
Observe that radar backscatter is more focused for ships, perhaps due to their geometric regularity compared to the complex faceted surface of an iceberg. It seems plausible that convolution kernels can learn signatures from the raw HH/HV channels that distinguish ships from icebergs.
Related work suggests that a strong correlation between the HH and HV channels may indicate the presence of a ship. Indeed, this feature alone, fed to a logistic regression, can achieve an AUC of ~0.6.
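Such a per-sample correlation feature is easy to compute before handing it to a classifier. A minimal NumPy sketch (the 75x75 shape and the helper name are assumptions, not from the original pipeline):

```python
import numpy as np

def hh_hv_correlation(hh, hv):
    """Pearson correlation between the flattened HH and HV channels
    of a single sample -- one scalar feature per image."""
    return np.corrcoef(hh.ravel(), hv.ravel())[0, 1]

# Toy example: a perfectly correlated pair vs. an independent pair.
rng = np.random.default_rng(0)
hh = rng.normal(size=(75, 75))                        # stand-in reading
feat_high = hh_hv_correlation(hh, 2.0 * hh + 1.0)     # near 1.0
feat_low = hh_hv_correlation(hh, rng.normal(size=(75, 75)))  # near 0.0
```

The resulting scalars would then be stacked into a design matrix for the logistic regression.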
I build a convnet and treat the input data like images with respect to data augmentation. At train time, I generate random shifts and crops of the input, assuming the usual symmetries.
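A train-time augmentation along these lines can be sketched in NumPy; the flip probabilities, shift range, and the use of a circular roll as a cheap stand-in for pad-and-crop are all assumptions:

```python
import numpy as np

def augment(x, rng, max_shift=4):
    """Random flips and shifts of a (2, H, W) sample (HH and HV planes)."""
    if rng.random() < 0.5:
        x = x[:, :, ::-1]                      # horizontal flip
    if rng.random() < 0.5:
        x = x[:, ::-1, :]                      # vertical flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    x = np.roll(x, (dy, dx), axis=(1, 2))      # shift both planes together
    return x

rng = np.random.default_rng(0)
sample = rng.normal(size=(2, 75, 75))
aug = augment(sample, rng)
```

Flips and rolls only rearrange pixels, so the augmented sample keeps the original value distribution.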
I explore numerous architectures, stacking 5x5 kernels with stride 2 as well as 3x3 kernels with stride 1.
Tuning learning rate decay schedules and architectures took the loss from around 0.5 down to 0.2, yielding a promising single model and a top leaderboard placement. The AUC hovers near 0.85. Models tend to be confident about predicting ships but misclassify some icebergs.
I also explore deeper architectures with larger kernels, e.g. 9x9-2-128, 7x7-1-64, 5x5-1-64, 3x3-1-32 (x6) in kernel-stride-filters notation, plus wider fc layers, reaching similar performance.
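As a sanity check on that notation, we can trace the spatial size through the stack. The sketch below assumes valid (unpadded) convolutions and a 75x75 input, neither of which is stated above:

```python
def conv_out(n, k, s):
    """Spatial output size of a valid (unpadded) convolution."""
    return (n - k) // s + 1

# (kernel, stride) for 9x9-2-128, 7x7-1-64, 5x5-1-64, then 3x3-1-32 six times
layers = [(9, 2), (7, 1), (5, 1)] + [(3, 1)] * 6
size = 75
sizes = []
for k, s in layers:
    size = conv_out(size, k, s)
    sizes.append(size)
# sizes -> [34, 28, 24, 22, 20, 18, 16, 14, 12]
```

So under these assumptions a 12x12 feature map reaches the fc layers, which bounds how wide those layers need to be.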
Interestingly, plotting confusion matrices during evaluation helps track the kinds of mistakes a model makes; I find that the deeper architecture tends to be more confident about predicting icebergs.
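For a binary task the confusion matrix is a small counting exercise; a minimal sketch (the label convention 0 = ship, 1 = iceberg is an assumption):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy evaluation: one iceberg (row 1) mistaken for a ship (column 0).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 0])
cm = confusion_matrix(y_true, y_pred)
# cm -> [[3, 0],
#        [1, 2]]
```

Tracking the off-diagonal cells over epochs is what surfaces the ship-vs-iceberg confidence asymmetry described above.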
As is often the case, blending these models will likely lead to an overall improvement. However, we have not yet brought in battle-tested, tree-based models like GBMs; implementations for several libraries can be found in the contest Kernels. It turns out that ships increase the image skewness. Incorporating GBM models built on array statistics for a more global view, along with applied image filters, a simple blend between trees and nets drives performance into the top 10%.
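The skewness observation suggests global array statistics as GBM features. A NumPy sketch of one such feature vector (this particular feature set is an assumption, not the contest pipeline):

```python
import numpy as np

def skewness(x):
    """Sample skewness of a flattened backscatter array."""
    x = np.asarray(x, dtype=float).ravel()
    mu, sigma = x.mean(), x.std()
    return ((x - mu) ** 3).mean() / sigma ** 3

def array_stats(plane):
    """Global statistics of one channel, suitable as tree-model inputs."""
    p = plane.ravel()
    return [p.min(), p.max(), p.mean(), p.std(), skewness(p)]
```

A symmetric array has skewness zero, while a plane dominated by a few bright returns (the ship case) skews positive, which is what makes this a usable signal.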
Note that our CNN was trained from scratch. Perhaps we can adapt low-level features from models pretrained on 3-channel corpora like ImageNet to our 2-channel images. Some have suggested pretrained NNs as feature extractors for GBMs. Another idea: given that the train/test data is split 1604/8424, we could train an autoencoder on the full dataset and use its embedding as features for something like a GBM or SVM.
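The autoencoder idea can be illustrated with a tiny tied-weight linear autoencoder trained by gradient descent on pooled, unlabeled samples. Everything here (the dimensions, learning rate, and random stand-in data) is an assumption; a real run would use the flattened HH/HV readings of all 1604 + 8424 samples:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))              # stand-in for flattened samples
W = rng.normal(scale=0.1, size=(64, 8))     # 8-dim embedding, tied weights

def reconstruct(X, W):
    return X @ W @ W.T                      # encode, then decode with W.T

def mse(X, W):
    return ((reconstruct(X, W) - X) ** 2).mean()

lr = 0.01
before = mse(X, W)
for _ in range(300):
    Z = X @ W                               # embeddings
    R = reconstruct(X, W) - X               # residual
    grad = 2.0 / X.size * (X.T @ (R @ W) + R.T @ Z)  # d(mse)/dW
    W -= lr * grad
after = mse(X, W)
embedding = X @ W                           # features for a GBM / SVM
```

Since the objective never touches labels, the test split contributes to the representation, which is the whole point of using the 8424 unlabeled samples.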
Perhaps by concatenating the angle of incidence to our first fc layer, we can measure its effect.
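Mechanically, this is one extra column on the fc input. A sketch (the batch size, feature width, and angle range are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
conv_features = rng.normal(size=(16, 128))   # flattened conv output for a batch
angle = rng.uniform(30.0, 46.0, size=16)     # incidence angle per sample

# Normalize the angle and append it as one extra input to the first fc layer.
angle_feat = (angle - angle.mean()) / angle.std()
fc_input = np.concatenate([conv_features, angle_feat[:, None]], axis=1)
```

Comparing validation loss with and without the extra column would measure the effect directly.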
The tuning was done manually, and further hyperparameter exploration could boost performance. Likewise, a more principled approach to blending models could help eke out additional gains.
During training, we may decide to upsample misclassified instances, since we are already performing random augmentation.
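One way to realize this is to resample each epoch's indices with probability proportional to per-sample loss, so hard examples are drawn (and re-augmented) more often. A sketch under that assumption, with random stand-in losses:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.uniform(size=100)   # per-sample losses from the previous epoch

# Draw the next epoch's training indices proportionally to loss, so
# misclassified (high-loss) instances are seen more often.
p = losses / losses.sum()
indices = rng.choice(len(losses), size=len(losses), replace=True, p=p)
```

The resampled set has a higher mean loss than the full set, confirming the skew toward hard examples.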
Finally, we should consider how the train and test data may differ. Perhaps we can apply adversarial training to perturb our predictions somewhat for an improved leaderboard position.
Our simple convnet, trained on augmented HH/HV planes, is quite capable of differentiating ships from icebergs in this remote sensing challenge. We also have a few ideas for further improvements to consider.