myurasov/Kaggle-TF-Speech

Kaggle: TensorFlow Speech Recognition Challenge

https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/

Level 1 models training graphs

V1 Flow

Generating training data

  1. Sample one of the valid labels (plus unknown and silence)
  2. Pick one of the clips for that label, or...
  3. ...if 'silence' was picked, generate a silence clip from the provided background noise
  4. Randomly mix the sample with the provided background noise and transform pitch/speed/volume
  5. Compute the mel-scaled spectrogram
  6. Scale to match the mean and standard deviation using a pre-fit scaler
  7. ...
  8. profit!
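
Steps 1-4 and 6 above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the names `make_sample`, `random_crop` and `standardize`, the gain ranges, and the representation of clips as plain lists of floats are all assumptions made for the example. Pitch/speed transformation and the mel-spectrogram step are left out because they need an audio library.

```python
import random


def random_crop(noise, length, rng):
    """Cut a random window of `length` samples out of a longer noise recording."""
    start = rng.randrange(len(noise) - length + 1)
    return noise[start:start + length]


def make_sample(clips_by_label, noise, length, rng):
    """Sample a class, pick a clip (or build silence), then mix in background noise.

    clips_by_label: dict mapping label -> list of clips (hypothetical layout).
    noise: one background-noise recording, longer than `length`.
    """
    # Step 1: sample a class; 'silence' has no clips of its own.
    label = rng.choice(sorted(clips_by_label) + ["silence"])
    if label == "silence":
        # Step 3: a silence clip is attenuated background noise.
        gain = rng.uniform(0.0, 1.0)  # assumed range
        clip = [gain * x for x in random_crop(noise, length, rng)]
    else:
        # Step 2: pick a clip and pad/trim it to a fixed length.
        clip = rng.choice(clips_by_label[label])
        clip = (list(clip) + [0.0] * length)[:length]
    # Step 4: random volume change plus additive background noise.
    volume = rng.uniform(0.7, 1.3)      # assumed range
    noise_gain = rng.uniform(0.0, 0.3)  # assumed range
    background = random_crop(noise, length, rng)
    mixed = [volume * s + noise_gain * n for s, n in zip(clip, background)]
    return label, mixed


def standardize(features, mean, std):
    """Step 6: apply a pre-fit scaler (mean and std computed beforehand)."""
    return [(x - mean) / std for x in features]


rng = random.Random(0)
clips = {"yes": [[0.5] * 8], "unknown": [[0.1] * 4]}
noise = [0.01 * i for i in range(32)]
label, mixed = make_sample(clips, noise, 8, rng)
```

In a real pipeline `standardize` would be applied to the spectrogram values, using statistics fitted once on a sample of the training data.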

Inference

  1. Output model activations (after softmax) to CSV for multiple training runs/model variations
  2. Generate submission with voting/averaging strategy
  3. Predict the same file many times with different transformations and average/vote the results (?, if performance allows)
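
The two combination strategies mentioned above (averaging and voting) might look like this for a single clip. It is a sketch under assumptions: each run is taken to be a list of softmax probabilities in the same class order, and the function names are made up for the example.

```python
from collections import Counter


def argmax(probs):
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def average_probs(runs):
    """Average the per-class probabilities across runs (soft voting)."""
    n = len(runs)
    return [sum(column) / n for column in zip(*runs)]


def majority_vote(runs):
    """Each run votes for its top class; the most common class wins (hard voting)."""
    votes = Counter(argmax(probs) for probs in runs)
    return votes.most_common(1)[0][0]


# Three runs, two classes: one very confident run disagrees with two lukewarm ones.
runs = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
averaged_class = argmax(average_probs(runs))
voted_class = majority_vote(runs)
```

With these inputs averaging picks class 0 while voting picks class 1, which shows that the two strategies can disagree when one run is much more confident than the others. The same functions apply to step 3 if the "runs" are predictions for differently transformed copies of one file.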

Ideas

  • Record more noise

V2 Flow

  1. Generate holdout set
  2. Generate 10 folds from filenames
  3. Generate the training set, excluding the holdout set
  4. Train 10 L1 models, predict on test and holdout sets
  5. Train L2 model from predictions on holdout set
  6. Predict using L1 test predictions as inputs
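
The bookkeeping for this flow could be sketched as below. Everything here is an assumption for illustration: the 10% holdout fraction, the seed, the function names, and the choice to build the L2 input by concatenating each L1 model's probability vector per clip.

```python
import random


def split_holdout_and_folds(filenames, holdout_fraction=0.1, n_folds=10, seed=0):
    """Steps 1-3: set aside a holdout set, then split the rest into folds."""
    names = sorted(filenames)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(names)
    n_holdout = int(len(names) * holdout_fraction)
    holdout, rest = names[:n_holdout], names[n_holdout:]
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def train_val_split(folds, k):
    """Step 4: L1 model k validates on fold k and trains on the other folds."""
    val = folds[k]
    train = [name for i, fold in enumerate(folds) if i != k for name in fold]
    return train, val


def stack_features(l1_predictions):
    """Steps 5-6: l1_predictions[m][i] is model m's probability vector for clip i.

    Returns one row per clip with all models' vectors concatenated.
    """
    return [sum(rows, []) for rows in zip(*l1_predictions)]


files = ["clip_%d.wav" % i for i in range(100)]
holdout, folds = split_holdout_and_folds(files)
train, val = train_val_split(folds, 0)
l2_rows = stack_features([[[0.9, 0.1]], [[0.2, 0.8]]])
```

The L2 model would be fit on `stack_features` output for the holdout set and then applied to the same features built from the test set. If the filenames encode a speaker ID, grouping by speaker before splitting may be preferable so that one speaker does not land in both training and validation data.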