CNNs for EEG


Last time, we considered the state of the art in seizure prediction from raw EEG readings via machine learning. Over the course of some months, Kagglers competed to drive up the AUC on this highly imbalanced classification problem.

Standard signal processing techniques were brought to bear for feature engineering, while ensembles of standby ML algorithms like SVMs and random forests took contestants the rest of the way. Still, the amount of post-processing and specialization to individuals required is disappointing for applications to real-world systems.

Assume local patterns most important

Here, I explore convnets to extract time-frequency patterns from the short-time Fourier transformed EEG matrices. This approach moves away from the kitchen sink of exotic statistics considered by top competitors and toward an end-to-end model. We benefit from three regularizing effects: passing from the time domain to the frequency domain through Fourier transforms, downsampling, and parameter sharing.

To deal with the 95/5 percent split between negative and positive classes, we upsample the minority positive class by a factor of 10-12 to balance the training batches. This puts the "all-0's" classifier at 53%-59% accuracy.
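
A minimal sketch of this kind of resampling follows, assuming the minority positive class is the one being repeated. The function names and the toy label list are illustrative assumptions, not the contest data, and the resulting baseline depends on the true class counts.

```python
import random

def oversample_positives(labels, factor, seed=0):
    """Return shuffled sample indices with every positive (label 1) repeated `factor` times."""
    indices = []
    for i, y in enumerate(labels):
        indices.extend([i] * (factor if y == 1 else 1))
    random.Random(seed).shuffle(indices)
    return indices

def all_zero_accuracy(labels, indices):
    """Accuracy of a classifier that always predicts the negative class."""
    return sum(labels[i] == 0 for i in indices) / len(indices)

# Hypothetical toy labels: 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
resampled = oversample_positives(labels, factor=10)
baseline = all_zero_accuracy(labels, resampled)
```

Here `baseline` is the fraction of negatives left in the resampled set, which is the accuracy floor any trained model has to beat.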

Alternatively, you can compute the class frequencies and use them to rescale the loss, like:

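import tensorflow as tf

# Assumed defined: num_positives, num_total, logits [batch, 2], one-hot labels [batch, 2].
ratio = num_positives / num_total
# Diagonal weights: scale the class-0 logit by ratio, the class-1 logit by 1 - ratio.
class_weights = tf.constant([[ratio, 0.0],
                             [0.0, 1.0 - ratio]])
weighted_logits = tf.matmul(logits, class_weights)
xent = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=weighted_logits)
loss = tf.reduce_mean(xent)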

After performing short-time Fourier transforms, we are left with large arrays, which we downsample by taking every 10th time slice so that batches fit into memory. The nine remaining offsets (time slices 1 through 9 mod 10) may be usable for data augmentation, though we haven't performed this experiment.
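
As a rough illustration of this step, the sketch below computes a magnitude STFT with NumPy and keeps every 10th time slice. The sampling rate, window size, hop, and test signal are assumptions chosen for the example, not the settings used in these experiments. A library routine such as `scipy.signal.stft` would normally replace the hand-rolled transform.

```python
import numpy as np

def stft_magnitude(signal, window_size, hop):
    """Magnitude STFT of a 1-D signal: rows are frequency bins, columns are time slices."""
    window = np.hanning(window_size)
    starts = range(0, len(signal) - window_size + 1, hop)
    frames = np.stack([signal[s:s + window_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def downsample_time(spec, step=10, offset=0):
    """Keep every `step`-th time slice, starting at column `offset`."""
    return spec[:, offset::step]

# Hypothetical clip: 10 s of a 10 Hz sine sampled at 400 Hz.
fs = 400
t = np.arange(10 * fs) / fs
signal = np.sin(2 * np.pi * 10 * t)

spec = stft_magnitude(signal, window_size=256, hop=16)
small = downsample_time(spec)               # slices 0, 10, 20, ...
shifted = downsample_time(spec, offset=3)   # slices 3, 13, 23, ...
```

Each `offset` from 1 to 9 yields a different, non-overlapping view of the same clip, which is what would make the skipped slices candidates for augmentation.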

Can a human do it?

Reviewing many samples, we find some visually discernible patterns in lower frequencies:

Henceforth, we make the simplifying assumption that higher frequencies are less important and explore cropping to the range 0-50 Hz for dimensionality reduction.
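
The crop itself is a boolean mask over the frequency axis. The sketch below assumes a 400 Hz sampling rate and a 300-sample window purely for illustration, and uses a random array as a stand-in for a real spectrogram.

```python
import numpy as np

def crop_frequencies(spec, freqs, f_max=50.0):
    """Keep only the rows of `spec` whose bin frequency is at most `f_max` Hz."""
    keep = freqs <= f_max
    return spec[keep], freqs[keep]

# Hypothetical setup: 400 Hz sampling and a 300-sample analysis window.
fs, window_size = 400.0, 300
freqs = np.fft.rfftfreq(window_size, d=1.0 / fs)   # bin centres from 0 to 200 Hz

# Random stand-in for a (frequency, time) spectrogram.
spec = np.random.default_rng(0).random((freqs.size, 24))
cropped, kept_freqs = crop_frequencies(spec, freqs)
```

With these assumed settings the crop keeps roughly a quarter of the frequency bins, so the input to the network shrinks by the same factor.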

Finally, by averaging across all channels, we reduce the problem to 2-D convolution in time-frequency space. We essentially disregard spatial information on the assumption that it is inconsistent between individuals; this regularizes the EEG signal and simplifies computation.
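
In array terms, the channel average collapses a stack of per-channel spectrograms into a single image. The channel count and array sizes below are hypothetical, and the random data is only a placeholder.

```python
import numpy as np

def average_channels(specs):
    """Collapse a (channels, freq, time) stack into one (freq, time) image."""
    return specs.mean(axis=0)

# Hypothetical stack: 16 channels, 38 frequency bins, 24 time slices.
specs = np.random.default_rng(0).random((16, 38, 24))
image = average_channels(specs)

# Add a batch axis and a single input-channel axis, the layout a 2-D convnet expects.
batch = image[np.newaxis, :, :, np.newaxis]
```

The alternative would be to keep the channels as the input depth of the first convolution, which preserves spatial information at the cost of more parameters.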

High Capacity Models on small data

We are chiefly concerned about overfitting to the small number of samples. We explore stacks of 5x5 kernels with stride 2 and 3x3 kernels with stride 1, at depths of 32 or 64 and 4 to 6 layers, followed by 4 to 5 densely connected layers funneling down to 2 outputs. We take an initial learning rate of 1e-5 and offer the following summaries:
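
As a rough aid for sizing stacks like these, the arithmetic for feature-map sizes and weight counts can be sketched in plain Python. The 38x24 input and the particular four-layer stack are assumptions for illustration, not the exact configuration trained here. The size formulas follow TensorFlow's SAME and VALID padding conventions.

```python
import math

def conv_output_size(size, kernel, stride, padding="SAME"):
    """Spatial size after one convolution under TensorFlow's padding conventions."""
    if padding == "SAME":
        return math.ceil(size / stride)
    return math.ceil((size - kernel + 1) / stride)   # VALID

def conv_weight_count(kernel, in_depth, out_depth):
    """Weights plus biases in one 2-D convolution layer."""
    return kernel * kernel * in_depth * out_depth + out_depth

# Hypothetical stack of (kernel, stride, depth) layers on a 38x24x1 input.
layers = [(5, 2, 32), (5, 2, 32), (3, 1, 64), (3, 1, 64)]
height, width, depth = 38, 24, 1
total_weights = 0
for kernel, stride, out_depth in layers:
    height = conv_output_size(height, kernel, stride)
    width = conv_output_size(width, kernel, stride)
    total_weights += conv_weight_count(kernel, depth, out_depth)
    depth = out_depth

# Number of features handed to the first dense layer.
flattened = height * width * depth
```

Comparing `total_weights` and `flattened` against the number of training samples gives a quick sense of how much capacity the model has relative to the data.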

On a validation set of 300 randomly chosen samples, we reach an AUC of 0.69 and an overall accuracy of 87%, which breaks down into a recall of 73% on preictal samples but a precision of only 9.5%. Compared to reported performance, this individual model is no game changer. However, there remain many avenues for eking out additional performance from the reductions above.
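
For reference, these metrics are related as in the sketch below. The confusion counts and scores are made-up toy values, not the validation results quoted above. They are chosen only to show how accuracy and recall can look healthy while precision stays low when positives are rare.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, recall and precision from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts: 10 positives among 100 samples.
metrics = binary_metrics(tp=8, fp=20, tn=70, fn=2)

# Hypothetical scores for four samples.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Because the negatives vastly outnumber the positives, even a modest false-positive rate produces more false alarms than true detections, which is what drags precision down.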

This raises the question: what if deep learning libraries like TensorFlow had been in wider use at the time? Convnets may have provided a framework for automatic feature extraction, possibly adding to the performance of ensembles.

Final Assessment

In these experiments, we have not reached sufficient precision for this model to go into application. Its weak precision would certainly frustrate users, like "the boy who cried wolf". Nonetheless, its high recall and capacity for adaptive feature learning may improve an ensemble. This avenue deserves a larger corpus, where we might apply still higher capacity models.

Finally, the contest and the wider EEG research considered other features, such as pairwise correlation matrices between EEG channels. However, some research suggests that channel synchronization is an after-effect of seizures rather than a precursor to seizure onset.
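
For readers unfamiliar with that feature, a pairwise channel correlation matrix can be computed as in the sketch below. The 16-channel random clip is a placeholder, and this is only one plausible way such features might be built, not a reconstruction of any competitor's pipeline.

```python
import numpy as np

def channel_correlations(eeg):
    """Pearson correlation between every pair of channels; `eeg` is (channels, samples)."""
    return np.corrcoef(eeg)

# Hypothetical 16-channel clip of random noise standing in for real EEG.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((16, 4000))
corr = channel_correlations(eeg)

# The matrix is symmetric, so the upper triangle holds the unique pairwise features.
features = corr[np.triu_indices_from(corr, k=1)]
```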