SmellsLikeML

Porto Seguro: The Shakeup

11/29/17

Big Batches & Boosting

With the Porto Seguro competition coming to a close, contestants finally have the all-important feedback needed to solidify intuitions from the experience. Interestingly, the release of the final scores buoyed most contestants up the leaderboard, though to varying degrees. My chief concern in this challenge was avoiding overfitting. In particular, because nobody was able to break through a Gini coefficient of 0.292 on the public leaderboard, I took this to imply a low signal-to-noise ratio.

As the saying goes, "hindsight is 20/20". With the final scoring, I see I should have invested additional time developing a promising XGB model. Though it scored somewhat lower than some of my blends and stacks, selecting that single model would have shot me up the leaderboard by a few hundred slots.

Cross-validation issues were noted early in the contest, and for this reason I chose to hedge by selecting one high-scoring blend and one lower-scoring but more diverse stack. Incidentally, this was my first experience implementing 2-stage stacked models. What I gained from this contest was a reinforced appreciation for simple models. Additionally, I found that unconventionally large batch sizes can prove profitable when training a neural net, possibly due to the class imbalance.
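To make the large-batch point concrete, here is a minimal Keras sketch. The architecture, batch size, class weights, and synthetic data are illustrative assumptions, not the exact configuration I used.

```python
# Sketch: training a small feed-forward net with an unusually large
# batch size on an imbalanced binary target. All values are placeholders.
import numpy as np
from tensorflow import keras

def build_net(n_features):
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=[keras.metrics.AUC(name='auc')])
    return model

# Placeholder data with ~4% positives, mimicking a heavy class imbalance
X_train = np.random.rand(100000, 40).astype('float32')
y_train = (np.random.rand(100000) < 0.04).astype('float32')

model = build_net(X_train.shape[1])
model.fit(
    X_train, y_train,
    batch_size=4096,                  # unconventionally large batch
    epochs=10,
    class_weight={0: 1.0, 1: 25.0},   # crude re-weighting for imbalance
    validation_split=0.2,
)
```

With minority examples this rare, a small batch may contain no positives at all; a large batch gives each gradient step a more stable view of both classes.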

Generally, my strategy was to diversify my ensemble with logistic regression, LDA, KNN on LDA projections, entity embeddings, Naive Bayes, FFMs, FTRL, regularized greedy forests, several variants of XGBoost, and one LightGBM model. Like others, I dropped the 'ps_calc*' features, one-hot encoded the categoricals, and applied target encodings. Additionally, I log-transformed skewed features and used StandardScaler for normalization.
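A minimal sketch of that shared preprocessing follows, assuming the standard Porto Seguro column conventions ('ps_calc_*' prefix for the dropped block, '_cat' suffix for categoricals, a 'target' label). The target encoding is simplified to a global mean here; per-model details differed.

```python
# Sketch: drop ps_calc_* columns, target-encode and one-hot encode
# categoricals, then log-transform and standardize numeric features.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(train: pd.DataFrame, test: pd.DataFrame, target='target'):
    y = train[target]
    X_tr = train.drop(columns=[target])
    X_te = test.copy()

    # Drop the noisy ps_calc_* block
    calc_cols = [c for c in X_tr.columns if c.startswith('ps_calc_')]
    X_tr = X_tr.drop(columns=calc_cols)
    X_te = X_te.drop(columns=calc_cols, errors='ignore')

    cat_cols = [c for c in X_tr.columns if c.endswith('_cat')]

    # Target (mean) encoding -- simplified; in practice this should be
    # computed out-of-fold to avoid leakage.
    for c in cat_cols:
        means = y.groupby(X_tr[c]).mean()
        X_tr[c + '_te'] = X_tr[c].map(means)
        X_te[c + '_te'] = X_te[c].map(means).fillna(y.mean())

    # One-hot encode the raw categoricals and align train/test columns
    X_tr = pd.get_dummies(X_tr, columns=cat_cols)
    X_te = pd.get_dummies(X_te, columns=cat_cols)
    X_tr, X_te = X_tr.align(X_te, join='left', axis=1, fill_value=0)

    # Log-transform non-negative numeric features, then standardize
    num_cols = X_tr.select_dtypes(include=np.number).columns
    pos_cols = [c for c in num_cols if X_tr[c].min() >= 0]
    X_tr[pos_cols] = np.log1p(X_tr[pos_cols])
    X_te[pos_cols] = np.log1p(X_te[pos_cols].clip(lower=0))
    scaler = StandardScaler()
    X_tr[num_cols] = scaler.fit_transform(X_tr[num_cols])
    X_te[num_cols] = scaler.transform(X_te[num_cols])
    return X_tr, X_te, y
```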

Some models minimized log loss while others targeted the evaluation metric, AUC/Gini, explicitly. Still other, stronger models used the XGBoost objective 'count:poisson' to great effect, modeling the skewed target distribution.
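For reference, the competition's normalized Gini is just a rescaling of AUC (Gini = 2*AUC - 1), so a custom XGBoost eval is a one-liner. The sketch below shows that eval alongside the two objectives mentioned above; the hyperparameter values are placeholders, not my tuned settings.

```python
# Sketch: normalized Gini as a custom XGBoost eval, plus the logistic
# and Poisson objective variants. Parameter values are illustrative.
import xgboost as xgb
from sklearn.metrics import roc_auc_score

def gini_xgb(preds, dtrain):
    """Normalized Gini = 2 * AUC - 1, computed on the eval set."""
    labels = dtrain.get_label()
    return 'gini', 2 * roc_auc_score(labels, preds) - 1

# Log-loss / AUC-oriented variant
params_logistic = {
    'objective': 'binary:logistic',
    'eta': 0.02, 'max_depth': 4,
    'subsample': 0.8, 'colsample_bytree': 0.8,
}

# Poisson-count variant for the heavily skewed target
params_poisson = dict(params_logistic, objective='count:poisson')

# Usage (dtrain/dvalid are xgb.DMatrix objects built from the features above):
# bst = xgb.train(params_poisson, dtrain, num_boost_round=1000,
#                 evals=[(dvalid, 'valid')], feval=gini_xgb, maximize=True)
```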

With the benefit of this experience behind me, I now have a codebase to rapidly implement and test boosting algorithms, stacking, and hyperparameter search via genetic algorithms or simple random search. I have experimented with workflows around Jupyter notebooks aimed at building a final monolithic stack. Though I was mindful about saturating compute resources, I will likely iterate faster in future competitions for a more thorough search, pushing on the few models that offer the most juice without the time sink of exploring many new tools. Though I was disciplined in sticking with 5 seeds and generating OOF predictions, sampling and cross-validation clearly warrant more thorough investigation so that I make better choices next time. Finally, I will try to be a tiny bit more intrepid with respect to overfitting: it isn't always necessary to build a complex stack, and more effective leaderboard probing can reveal this.
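As a sketch of the OOF bookkeeping behind the stacking, here is the pattern of averaging out-of-fold predictions over several seeds. The fold count, seed values, and stand-in base learner are assumptions for illustration.

```python
# Sketch: out-of-fold (OOF) predictions averaged over multiple seeds,
# the building block for a 2-stage stack. X, y, X_test are numpy arrays.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

def oof_predictions(X, y, X_test, seeds=(0, 1, 2, 3, 4), n_splits=5):
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    for seed in seeds:
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for tr_idx, va_idx in skf.split(X, y):
            model = LogisticRegression(max_iter=1000)  # stand-in base learner
            model.fit(X[tr_idx], y[tr_idx])
            oof[va_idx] += model.predict_proba(X[va_idx])[:, 1]
            test_pred += model.predict_proba(X_test)[:, 1]
    # Average across seeds (train) and across seeds * folds (test)
    oof /= len(seeds)
    test_pred /= len(seeds) * n_splits
    return oof, test_pred  # oof feeds the 2nd-stage model; test_pred is stacked
```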

Though I am satisfied with my placement, we are here to optimize. I certainly feel that I have leveled up during this competition, and there are so many interesting new ones on the horizon.