- Run the notebook: mtkd4ser.ipynb
1. Clone the Repository
git clone https://github.com/aalto-speech/mtkd4ser.git
cd mtkd4ser
2. Create the Environment
conda env create -f environment.yml
3. Activate the Environment
conda activate ser_venv
1. Multi-Teacher Language-Aware Knowledge Distillation for English Speech Emotion Recognition Using the Monolingual Setup
python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 5 --TRAINING 1 --PARADIGM "MTKD" --LANGUAGE "EN" --LINGUALITY "Monolingual"
2. Conventional Knowledge Distillation for Finnish Speech Emotion Recognition Using the Multilingual Setup
python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 9 --TRAINING 1 --PARADIGM "KD" --LANGUAGE "FI" --LINGUALITY "Multilingual"
3. Vanilla Fine-Tuning for French Speech Emotion Recognition Using the Multilingual Setup
python main.py --LEARNING_RATE 3e-5 --BATCH_SIZE 16 --N_EPOCHS 20 --SESSION 1 --TRAINING 1 --PARADIGM "FT" --LANGUAGE "FR" --LINGUALITY "Multilingual"
4. Available Configurations and Choices
`main.py` supports a range of configurable parameters for training, validation, and evaluation. The table below lists each configuration and its options; select the ones that fit your use case.
Configuration | Options |
---|---|
LINGUALITY | Monolingual or Multilingual |
LANGUAGE | EN or FI or FR |
PARADIGM | MTKD or KD or FT |
TRAINING | 1 or 0 |
SESSION | EN: 1-5 or FI: 1-9 or FR: 1 |
N_EPOCHS | ℤ⁺ |
BATCH_SIZE | ℤ⁺ |
LEARNING_RATE | ℝ⁺ |
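
The constraints in the table can be checked before a run is launched. The following is a minimal sketch, assuming the flags are passed exactly as in the example commands above; it is not the repository's own argument handling, and the names `SESSIONS`, `positive_int`, `positive_float`, `build_parser`, and `parse_config` are hypothetical helpers introduced only for illustration.

```python
import argparse

# Valid session ranges per language, copied from the configuration table.
SESSIONS = {"EN": range(1, 6), "FI": range(1, 10), "FR": range(1, 2)}


def positive_int(text):
    """Parse a strictly positive integer (N_EPOCHS, BATCH_SIZE, SESSION)."""
    value = int(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {text!r}")
    return value


def positive_float(text):
    """Parse a strictly positive real number (LEARNING_RATE)."""
    value = float(text)
    if value <= 0:
        raise argparse.ArgumentTypeError(f"expected a positive number, got {text!r}")
    return value


def build_parser():
    # Hypothetical parser mirroring the table; the real main.py may differ.
    parser = argparse.ArgumentParser(description="Validate flags before calling main.py")
    parser.add_argument("--LINGUALITY", choices=["Monolingual", "Multilingual"], required=True)
    parser.add_argument("--LANGUAGE", choices=["EN", "FI", "FR"], required=True)
    parser.add_argument("--PARADIGM", choices=["MTKD", "KD", "FT"], required=True)
    parser.add_argument("--TRAINING", type=int, choices=[0, 1], required=True)
    parser.add_argument("--SESSION", type=positive_int, required=True)
    parser.add_argument("--N_EPOCHS", type=positive_int, required=True)
    parser.add_argument("--BATCH_SIZE", type=positive_int, required=True)
    parser.add_argument("--LEARNING_RATE", type=positive_float, required=True)
    return parser


def parse_config(argv):
    """Parse argv and reject SESSION values outside the range for LANGUAGE."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.SESSION not in SESSIONS[args.LANGUAGE]:
        parser.error(f"SESSION {args.SESSION} is out of range for LANGUAGE {args.LANGUAGE}")
    return args
```

Calling `parse_config` with the flags from the first example (`--SESSION 5 --LANGUAGE "EN"`) returns the parsed namespace, while a combination such as `--SESSION 2 --LANGUAGE "FR"` exits with an error message, since the table lists only session 1 for French.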
- MTKD-based monolingual SER methods for English, Finnish, and French.
- Adapt the method for a new language (e.g., Chinese).
- MTKD-based multilingual SER method for English, Finnish, and French.
- Extend the multilingual method to include a resource-scarce language (e.g., Bangla).
- Incorporate heterogeneous Large Audio-Language Models in the MTKD method.
- Distill the internal knowledge of heterogeneous models to the student.