Emotion Recognition from Brazilian Portuguese Informal Spontaneous Speech

rmarcacini/ser-coraa-pt-br

Here, we present the Brazilian Portuguese Speech Emotion Recognition (SER) task. This task aims to motivate SER research in our community, mainly to discuss theoretical and practical aspects of speech emotion recognition, pre-processing and feature extraction, and machine learning models for Brazilian Portuguese.

We provide a dataset called CORAA SER version 1.0, composed of approximately 50 minutes of audio segments labeled with three classes: neutral, non-neutral female, and non-neutral male. While the neutral class represents audio segments with no well-defined emotional state, the non-neutral classes represent segments associated with one of the primary emotional states in the speaker's speech. This dataset was built from the C-ORAL-BRASIL I corpus.

The available corpus consists of audio segments representing Brazilian Portuguese informal spontaneous speech. The non-neutral emotion classes were labeled considering paralinguistic elements (laughing, crying, etc.). Participants can use pre-trained models and external data, as long as the original C-ORAL-BRASIL corpus (or variants of it) is not used for model training.

In this task, participants must train their own models using acoustic audio features. A training set is available. The models trained by the participants will be evaluated on a test set, which will be made publicly available after the challenge.

Training Data

Train audio segments are available in the data_train.zip file.

Audio files are named according to their label: <file-id>_<label>.wav. Check the baselines for examples of reading and pre-processing the training set.
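Since the label is encoded in the filename, it can be recovered by splitting on the last underscore of the file stem. A minimal sketch — the example filenames below are hypothetical, and the exact label spelling (e.g., how "non-neutral female" is written in filenames) should be checked against the actual files in data_train.zip:

```python
from pathlib import Path

def parse_label(path):
    """Extract the emotion label from a '<file-id>_<label>.wav' filename.

    Assumes the label is everything after the last underscore in the
    file stem; adjust the split if the real labels contain underscores.
    """
    return Path(path).stem.rsplit("_", 1)[1]

# Hypothetical example filename:
print(parse_label("audio_0001_neutral.wav"))  # -> neutral
```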

Test Data

Test audio segments are available in the test_ser.zip file.

The ground truth and other metadata are available in the test_ser_metadata.csv file.
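The metadata file can be read with any CSV parser. A minimal stdlib sketch — the column names ('file', 'label') are hypothetical; inspect the header of test_ser_metadata.csv for the actual fields:

```python
import csv
from io import StringIO

def load_metadata(csv_text):
    """Parse metadata CSV text into a list of row dictionaries,
    keyed by the column names in the header row."""
    return list(csv.DictReader(StringIO(csv_text)))

# Hypothetical two-row example with assumed column names:
sample = "file,label\na.wav,neutral\nb.wav,non-neutral female\n"
rows = load_metadata(sample)
print(rows[0]["label"])  # -> neutral
```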

Baselines

We present two simple baselines as examples of pre-processing audio segments for feature extraction and model training for emotion recognition.

The first baseline uses a set of prosodic audio features for emotion classification.
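To illustrate what frame-level prosodic-style features look like, here is a minimal sketch computing short-time energy and zero-crossing rate over a raw sample list. This is only a toy example; the repository's baseline likely uses a richer prosodic feature set (pitch, intensity contours, etc.) extracted with dedicated audio tooling:

```python
import math

def frame_features(samples, frame_len=400, hop=160):
    """Compute two simple frame-level features from raw samples:
    short-time energy and zero-crossing rate (ZCR).

    frame_len/hop default to 25 ms / 10 ms at 16 kHz (an assumption;
    match them to the corpus sample rate).
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        ) / (frame_len - 1)
        feats.append((energy, zcr))
    return feats

# Synthetic 1 kHz sine at 16 kHz: constant energy near 0.5, nonzero ZCR.
sine = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(1600)]
feats = frame_features(sine)
print(round(feats[0][0], 3))  # -> 0.5
```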

In the second baseline, we use the Wav2Vec model to extract features (i.e., embeddings) from the audio segments. These features can then be used to train a speech emotion recognition classifier.
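Once per-segment embedding vectors are available (e.g., mean-pooled Wav2Vec hidden states), any classifier can be trained on them. Below is a minimal nearest-centroid sketch on hypothetical 2-D "embeddings" — not the baseline's actual classifier, just an illustration of the embeddings-to-labels step:

```python
def nearest_centroid_fit(X, y):
    """Compute one mean embedding vector (centroid) per class."""
    sums, counts = {}, {}
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def nearest_centroid_predict(centroids, vec):
    """Predict the class whose centroid is closest (Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[c]))
    return min(centroids, key=dist2)

# Hypothetical toy embeddings for two of the task's classes:
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
y = ["neutral", "neutral", "non-neutral female", "non-neutral female"]
cents = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(cents, [0.05, 0.05]))  # -> neutral
```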

Evaluation

Each participant can submit up to three models. Models will be evaluated using the macro F1 score.

More information

The S&ER 2022 Workshop is co-located with the 15th edition of the International Conference on the Computational Processing of Portuguese (PROPOR 2022).

Workshop website: https://sites.google.com/view/ser2022/home
