https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/
- Sample one of the valid labels (plus unknown, silence)
- Pick one of the clips or...
- ...if 'silence' is picked, generate a silence clip from the provided background noise
- Randomly mix the sample with the provided background noise, transform pitch/speed/volume
- Compute mel-scaled spectrogram
- Scale with a pre-fit scaler to match mean and std dev
- ...
- profit!
- Output model activations (after softmax) to CSV for multiple training runs/model variations
- Generate submission with voting/averaging strategy
- Predict the same file many times with different transformations and average/vote the results (?, if performance allows)
- Record more noise
- Generate holdout set
- Generate 10 folds from filenames
- Generate training set excl. holdout set
- Train 10 L1 models, predict on test and holdout sets
- Train L2 model from predictions on holdout set
- Predict using L1 test predictions as inputs
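
A rough sketch of the holdout/fold generation and the averaging step, not the actual implementation. Assumptions: file names follow the dataset's `<speaker>_nohash_<n>.wav` pattern, so hashing the part before `_nohash_` keeps all clips from one speaker in the same split; the 10% holdout size, the salt strings, and the helper names (`speaker_id`, `bucket`, `split_files`, `average_predictions`) are hypothetical.

```python
import hashlib


def speaker_id(filename):
    """Return the part of the file name before '_nohash_' (assumed speaker ID)."""
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    return base.split("_nohash_")[0]


def bucket(filename, n_buckets, salt=""):
    """Map a file to a stable bucket in [0, n_buckets) by hashing its speaker ID."""
    key = (salt + speaker_id(filename)).encode("utf-8")
    return int(hashlib.sha1(key).hexdigest(), 16) % n_buckets


def split_files(filenames, n_folds=10, holdout_pct=10):
    """Split file names into a holdout list and n_folds training folds.

    holdout_pct is an assumed value; the notes do not specify a size.
    """
    holdout = []
    folds = [[] for _ in range(n_folds)]
    for name in filenames:
        # Different salts so the holdout decision and fold index are independent.
        if bucket(name, 100, salt="holdout") < holdout_pct:
            holdout.append(name)
        else:
            folds[bucket(name, n_folds, salt="fold")].append(name)
    return holdout, folds


def average_predictions(runs):
    """Average softmax outputs across runs.

    runs[r][i] is the list of class probabilities run r gave for clip i.
    """
    n_runs = len(runs)
    return [[sum(p) / n_runs for p in zip(*rows)] for rows in zip(*runs)]
```

Because the split depends only on the file name, it is reproducible across training runs without storing fold lists. The L2 model would then be trained on L1 predictions for the `holdout` files only.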