
@sbmaruf (Collaborator) commented Sep 22, 2021

As requested by @TevenLeScao.

We want to:
1. train on a mix of languages
2. do validation on English-only

By default, Megatron-DeepSpeed carves the validation set out of a fraction of the training set, so at the moment we can't train on multilingual data while validating on English-only data. To launch experiments, we just need a quick hack that lets us pass a separate English-only validation set.
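To make the default behaviour concrete, here is a minimal sketch of how a weight string such as `"969,30,1"` (the style used by Megatron's `--split` argument) can partition a single corpus into contiguous train/valid/test ranges. The helper `split_indices` is hypothetical and written for illustration; it is not Megatron-DeepSpeed's own code, although the rounding-drift correction follows the same idea.

```python
from typing import List, Tuple


def split_indices(split: str, num_docs: int) -> List[Tuple[int, int]]:
    """Partition documents [0, num_docs) into contiguous train/valid/test ranges.

    `split` holds comma-separated relative weights, e.g. "969,30,1".
    Hypothetical helper for illustration; not Megatron-DeepSpeed's own code.
    """
    weights = [float(w) for w in split.split(",")]
    # Pad with zeros so there are always exactly three weights.
    weights = (weights + [0.0, 0.0, 0.0])[:3]
    total = sum(weights)
    if total <= 0:
        raise ValueError("split weights must sum to a positive number")

    bounds = [0]
    for w in weights:
        bounds.append(bounds[-1] + int(round(w / total * num_docs)))
    # Shift the boundaries so the last range ends exactly at num_docs
    # despite rounding drift.
    drift = bounds[-1] - num_docs
    bounds = [bounds[0]] + [b - drift for b in bounds[1:]]
    return [(bounds[i], bounds[i + 1]) for i in range(3)]


# The validation range is drawn from the same corpus as training, so a
# multilingual training set automatically yields a multilingual validation set.
print(split_indices("969,30,1", 1000))  # [(0, 969), (969, 999), (999, 1000)]
```

Because every range indexes the same corpus, there is no way to make only the validation slice English-only without a separate input.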

  • Add an argument for a separate validation dataset
  • Implement the validation data loader
  • Run a dummy test
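The first two tasks could look roughly like the sketch below, which assumes a standalone `argparse` parser rather than Megatron-DeepSpeed's real argument setup. `--data-path` and `--split` mirror Megatron's existing flag names, but `--valid-data-path`, `DataPlan`, `plan_data`, and the corpus names `oscar-ml` / `oscar-en` are hypothetical names chosen for illustration. When the new flag is present, all training documents stay in training and the validation loader reads the separate corpus; otherwise the default split behaviour is kept.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass
class DataPlan:
    train_paths: List[str]
    split: str              # train/valid/test weights applied to train_paths
    valid_paths: List[str]  # corpora the validation loader should read


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-path", nargs="+", required=True)
    parser.add_argument("--split", default="969,30,1")
    # Hypothetical flag (name is an assumption): separate validation corpora.
    parser.add_argument("--valid-data-path", nargs="+", default=None)
    return parser


def plan_data(args: argparse.Namespace) -> DataPlan:
    if args.valid_data_path:
        # Keep every training document for training and validate on the
        # separately supplied (e.g. English-only) corpora instead.
        return DataPlan(args.data_path, "100,0,0", args.valid_data_path)
    # Default behaviour: validation is a slice of the training corpora.
    return DataPlan(args.data_path, args.split, args.data_path)


args = build_parser().parse_args(
    ["--data-path", "oscar-ml", "--valid-data-path", "oscar-en"]
)
print(plan_data(args))
```

Making the flag optional keeps existing launch scripts working unchanged, which suits a quick hack that should not disturb other experiments.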

@sbmaruf requested a review from @TevenLeScao on September 22, 2021 01:08
@sbmaruf (Collaborator, Author) commented Sep 29, 2021

@TevenLeScao Did you get a chance to look at this pull request?

@sbmaruf requested a review from @ibeltagy on September 29, 2021 06:02
@TevenLeScao (Collaborator) commented

Hey Maruf, sorry, not yet. I'm a bit swamped at the moment, and the priority has switched to doing additional cleaning of OSCAR-ml ourselves before launching anything on it. Maybe @ibeltagy can review in the meantime?
