10/17/17

Roughly three years ago, Kaggle hosted a contest for the American Epilepsy Society. The challenge was to develop an algorithm that classifies 10-minute segments of multi-channel intracranial EEG recordings according to whether or not they precede a seizure. Roughly 100 GB of training data from 5 dogs and 2 human patients was provided as .mat files. From the Kaggle page:

Most participants engineered lab-tested features based on the FFT or on correlations between channels (16 for dogs, 24 for humans). References to more exotic statistics pepper the forums, but many Kagglers still had the previous contest fresh in mind.

Ultimately, these methods would prove effective yet again.

Despite the relatively large files, the number of samples is deceptively small. The negative/positive classes of interest go by "interictal" and "preictal" phases in the literature. For dogs, we have hours in (# of Interictal, # of Preictal):

- Dog 1: (80, 4)
- Dog 2: (83, 7)
- Dog 3: (240, 12)
- Dog 4: (134, 16)
- Dog 5: (75, 5)

Each sample hour was split into 10-minute segments, giving 6 times as many .mat files as the counts above. Despite concern that discernible EEG patterns may not manifest over the entire hour before onset, the challenge was framed this way, and contestants were evaluated by AUC on nearly 4 thousand test samples.
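To make the class imbalance concrete, the arithmetic can be tallied in a few lines. The hour counts are the ones listed for the dogs; the names `hours`, `SEGMENTS_PER_HOUR`, and `segment_counts` are only illustrative.

```python
# Hours of (interictal, preictal) recordings per dog, as listed above.
hours = {
    "Dog 1": (80, 4),
    "Dog 2": (83, 7),
    "Dog 3": (240, 12),
    "Dog 4": (134, 16),
    "Dog 5": (75, 5),
}
SEGMENTS_PER_HOUR = 6  # each hour is split into 10-minute segments


def segment_counts(hours_by_dog):
    """Convert (interictal, preictal) hours into 10-minute segment counts."""
    return {
        dog: (inter * SEGMENTS_PER_HOUR, pre * SEGMENTS_PER_HOUR)
        for dog, (inter, pre) in hours_by_dog.items()
    }


segments = segment_counts(hours)
total_inter = sum(inter for inter, _ in segments.values())
total_pre = sum(pre for _, pre in segments.values())
preictal_fraction = total_pre / (total_inter + total_pre)

print(segments)
print(total_inter, total_pre, round(preictal_fraction, 3))
```

For the dogs this comes to 264 preictal segments against 3,672 interictal ones, so fewer than 7% of the training segments are positives.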

Those who made it to the top of the leaderboard used logistic regression, random forests (though many reported difficulties training them), SVMs, and often ensembles. These were typically applied after dimensionality reduction, for example PCA or binning the FFT frequencies with a range of filter banks. Surprisingly, coarse reductions and simple models like these suffice to bring AUC near 0.8.
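As a rough illustration of what "binning the frequencies of FFT" can look like, here is a minimal sketch that reduces each channel to a handful of log band powers. It is not any particular contestant's pipeline: the band edges, the function name `band_power_features`, and the synthetic input are all assumptions made for the example.

```python
import numpy as np


def band_power_features(eeg, fs, band_edges):
    """Mean FFT power per frequency band.

    eeg: array of shape (channels, samples)
    Returns log10 band powers of shape (channels, n_bands).
    """
    power = np.abs(np.fft.rfft(eeg, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(eeg.shape[-1], d=1.0 / fs)
    feats = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(power[:, mask].mean(axis=-1))
    # Small constant guards against log(0) on silent channels.
    return np.log10(np.stack(feats, axis=-1) + 1e-12)


# Synthetic stand-in for one segment: 16 channels, 10 s at 400 Hz,
# white noise plus a 10 Hz rhythm shared by every channel.
rng = np.random.default_rng(0)
fs = 400
t = np.arange(fs * 10) / fs
eeg = 0.1 * rng.standard_normal((16, t.size)) + np.sin(2 * np.pi * 10 * t)

# Hypothetical band edges in Hz, chosen only for illustration.
band_edges = [0.5, 4, 8, 12, 30, 70, 180]
features = band_power_features(eeg, fs, band_edges)
print(features.shape)
```

Each segment becomes a small channels-by-bands matrix that can be flattened and fed to a linear model or a random forest. With so few preictal samples, keeping the feature count this low is part of the appeal.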

One popular idea was to use the short-time Fourier transform (STFT) to localize frequencies in time. Since the EEG sampling rate for dogs is 400 Hz, we can resolve frequencies up to the Nyquist limit of 200 Hz. SciPy provides the function `stft` in its `signal` module. We build a simple top-level script:

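```
import sys

import numpy as np
import scipy.io as sio
from scipy.signal import stft


def mean_stft(fl):
    """Return the channel-averaged STFT magnitude of one .mat segment."""
    mat = sio.loadmat(fl)
    # The last key holds the segment struct; its first field is taken
    # to be the (channels x samples) EEG array.
    eeg_mat = mat[list(mat.keys())[-1]][0][0][0]
    # 400 Hz sampling rate (dogs), 500-sample windows.
    freqs, tms, specs = stft(eeg_mat, 400, nperseg=500)
    pow_mat = np.abs(specs)  # magnitudes, shape (channels, freqs, times)
    return np.mean(pow_mat, axis=0)  # average over channels


if __name__ == '__main__':
    fl = sys.argv[1]
    avg_channel_stft = mean_stft(fl)
    # Plot/save image, loading data
```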
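To see what these STFT settings produce without touching the contest files, here is a small sketch on synthetic data. The random array is a stand-in for a real segment, and I am assuming the usual (channels, samples) layout.

```python
import numpy as np
from scipy.signal import stft

fs = 400        # sampling rate for dogs, in Hz
nperseg = 500   # window length used in the script above

# Synthetic stand-in: 16 channels, 10 seconds of white noise.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((16, fs * 10))

freqs, tms, specs = stft(eeg, fs, nperseg=nperseg)
avg = np.abs(specs).mean(axis=0)  # channel-averaged magnitude

print(specs.shape)                      # (channels, freq bins, time frames)
print(freqs[1] - freqs[0], freqs[-1])   # bin spacing and top frequency in Hz
print(avg.shape)                        # (freq bins, time frames)
```

A 500-sample window at 400 Hz spans 1.25 s, so the frequency bins are 0.8 Hz apart and there are 251 of them between 0 and 200 Hz. Longer windows sharpen the frequency resolution at the cost of blurring events in time.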

With this script, I can compute the STFT once per file and write the result to disk with something like:

find . -name '*.mat' | parallel -j0 python my_mean_stft.py {}

which keeps all available cores busy, in this case 32 ;)

Loading a few training samples and rendering a channel-averaged spectrogram, we can inspect for differences in time-frequency patterns between preictal and interictal samples from dogs. Here, I have plotted several in sequence for ease in comparison:

Going down each vertical axis, frequencies range from 0 to 200 Hz. From left to right we have the entire 10-minute segment, where the black-to-white scale indicates magnitude in [0, 1] after normalizing each time slice by its maximum value.

Before normalization, the interictal spectrograms seemed more diffuse. Also visible were some bizarrely shaped and highly concentrated structures in the spectrum of the preictal phase. This normalization reveals the many fine cross-hatched patterns above and can loosely be thought of as rendering, for each time step, the relative likelihood of selecting a particular frequency.
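The per-time-slice normalization is a one-liner in NumPy. The sketch below assumes a (freqs, times) magnitude array like the channel-averaged STFT; the function names are mine. The second function is a variation I am adding for comparison, not something used for the plots: dividing by the column sum instead of the maximum is what would make each time slice a proper distribution over frequencies.

```python
import numpy as np


def normalize_time_slices(spec, eps=1e-12):
    """Divide each time slice (column) by its maximum, giving values in [0, 1]."""
    col_max = spec.max(axis=0, keepdims=True)
    return spec / np.maximum(col_max, eps)


def to_frequency_distribution(spec, eps=1e-12):
    """Variation: divide each column by its sum so it sums to 1."""
    col_sum = spec.sum(axis=0, keepdims=True)
    return spec / np.maximum(col_sum, eps)


# Synthetic non-negative "spectrogram": 251 frequency bins, 17 time frames.
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((251, 17)))

normed = normalize_time_slices(spec)
dist = to_frequency_distribution(spec)
print(normed.max(axis=0)[:3], dist.sum(axis=0)[:3])
```

Either way, the absolute loudness of each time step is discarded and only the shape of its spectrum is kept, which is why faint structure becomes visible.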

Perhaps averaging over channels is heavy-handed. We must balance the tradeoffs in separating or averaging channels, downsampling, normalizing, and splitting these arrays into features. Inspection shows that some channels are simply more active than others. In fact, Dog 5 had a channel that went bad and was omitted. Doesn't this also highlight the robustness of the averaging strategy? Could geometric regularities justify a few extra parameters?

There is an intriguing electrode mapping to consider. After trawling the forums, I found this plausible, though not official, map of electrode arrays on dogs. It seems like a worthwhile test.

You can see how this geometrically inspired viewpoint can be used to block channels and localize in space.
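A middle ground between averaging everything and keeping every channel is to average within spatial blocks. The sketch below shows the mechanics only. The grouping in `CHANNEL_BLOCKS` (four strips of four consecutive channels) is purely hypothetical and is not the forum map. The `bad_channels` argument is likewise my own addition, to show how a dropped electrode could be handled.

```python
import numpy as np

# HYPOTHETICAL grouping: 16 channels as 4 strips of 4 adjacent electrodes.
# Replace with a real electrode map if one is available.
CHANNEL_BLOCKS = [list(range(start, start + 4)) for start in range(0, 16, 4)]


def block_average(specs, blocks, bad_channels=()):
    """Average spectrogram magnitudes within each block of channels.

    specs: array of shape (channels, freqs, times)
    Returns an array of shape (n_blocks, freqs, times).
    """
    bad = set(bad_channels)
    out = []
    for block in blocks:
        keep = [ch for ch in block if ch not in bad]
        if not keep:
            raise ValueError(f"block {block} has no usable channels")
        out.append(specs[keep].mean(axis=0))
    return np.stack(out)


# Synthetic magnitudes standing in for np.abs(stft(...)).
rng = np.random.default_rng(0)
specs = np.abs(rng.standard_normal((16, 251, 17)))

blocked = block_average(specs, CHANNEL_BLOCKS)
blocked_skip = block_average(specs, CHANNEL_BLOCKS, bad_channels=[0])
print(blocked.shape)
```

This keeps some spatial resolution for a fourfold increase in features, and a single bad channel only thins out its own block instead of forcing a special case.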

Convolving along the time axis would support efficient parameter sharing when we expect time translation symmetries. Perhaps a convnet can learn locally contiguous structures in time-frequency associated with an impending seizure. With so few samples and such fine patterns, we have an investigation to mount.
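To illustrate what parameter sharing along time buys us, here is a minimal NumPy sketch of a single filter slid along the time axis of a spectrogram. It is a toy, not a convnet: the function name `conv_along_time`, the kernel, and the synthetic inputs are assumptions for the example. As in most deep learning libraries, the "convolution" is really a cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_along_time(spec, kernel):
    """Slide one (freqs, width) kernel along the time axis of a (freqs, times) array.

    Returns a response of length times - width + 1 ("valid" mode).
    """
    n_freqs, width = kernel.shape
    if spec.shape[0] != n_freqs:
        raise ValueError("kernel must span the same frequency bins as spec")
    # windows has shape (freqs, times - width + 1, width)
    windows = sliding_window_view(spec, width, axis=1)
    return np.einsum("ftk,fk->t", windows, kernel)


# A small time-frequency pattern placed at two different times.
rng = np.random.default_rng(0)
pattern = rng.random((8, 3))

spec_a = np.zeros((8, 40))
spec_a[:, 5:8] = pattern
spec_b = np.zeros((8, 40))
spec_b[:, 12:15] = pattern

resp_a = conv_along_time(spec_a, pattern)
resp_b = conv_along_time(spec_b, pattern)
print(resp_a.argmax(), resp_b.argmax())
```

The same 8-by-3 set of weights responds to the pattern wherever it occurs, and shifting the input in time simply shifts the response. That is the translation symmetry a convolutional layer exploits, and it matters here because the number of free parameters stays small relative to the handful of preictal examples.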