This repository has been archived by the owner on Jan 15, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 538
Horovod support for pretraining and fune-tuning squad #1276
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
commit 35a586676036f627bffd0d3c753c6cd0a70d63cf Author: ZheyuYe <[email protected]> Date: Fri Jul 17 10:10:14 2020 +0800 Squashed commit of the following: commit 673344d Author: ZheyuYe <[email protected]> Date: Wed Jul 15 22:43:07 2020 +0800 CharTokenizer commit 8dabfd6 Author: ZheyuYe <[email protected]> Date: Wed Jul 15 15:47:24 2020 +0800 lowercase commit f5c94a6 Author: ZheyuYe <[email protected]> Date: Tue Jul 14 17:45:28 2020 +0800 test commit dc55fc9 Author: ZheyuYe <[email protected]> Date: Tue Jul 14 05:45:01 2020 +0800 tiny update on run_squad commit 4defc7a Author: ZheyuYe <[email protected]> Date: Mon Jul 13 23:18:08 2020 +0800 update testings commit 2719e81 Author: ZheyuYe <[email protected]> Date: Mon Jul 13 23:08:32 2020 +0800 re-upload xlmr commit cd0509d Author: ZheyuYe <[email protected]> Date: Mon Jul 13 22:30:47 2020 +0800 fix get_pretrained commit 8ed8a72 Author: ZheyuYe <[email protected]> Date: Mon Jul 13 22:28:13 2020 +0800 re-upload roberta commit 5811d40 Author: ZheyuYe <[email protected]> Date: Mon Jul 13 18:27:23 2020 +0800 update commit 44a09a3 Author: ZheyuYe <[email protected]> Date: Sat Jul 11 15:06:33 2020 +0800 fix commit 4074a26 Author: ZheyuYe <[email protected]> Date: Fri Jul 10 16:08:49 2020 +0800 inference without horovod commit 31cb953 Author: ZheyuYe <[email protected]> Date: Thu Jul 9 18:41:55 2020 +0800 update commit 838be2a Author: ZheyuYe <[email protected]> Date: Thu Jul 9 15:14:39 2020 +0800 horovod for squad commit 1d374a2 Author: ZheyuYe <[email protected]> Date: Thu Jul 9 12:09:19 2020 +0800 fix commit e4fba39 Author: ZheyuYe <[email protected]> Date: Thu Jul 9 10:35:08 2020 +0800 remove multiply_grads commit 007f07e Author: ZheyuYe <[email protected]> Date: Tue Jul 7 11:26:38 2020 +0800 multiply_grads commit b8c85bb Author: ZheyuYe <[email protected]> Date: Mon Jul 6 12:28:56 2020 +0800 fix ModelForQABasic commit 0e13a58 Author: ZheyuYe <[email protected]> Date: Sat Jul 4 18:42:12 2020 +0800 clip_grad_global_norm with zeros max_grad_norm commit bd270f2 Author: ZheyuYe <[email protected]> Date: Fri Jul 3 20:21:31 2020 +0800 fix roberta commit 4fc564c Author: ZheyuYe <[email protected]> Date: Fri Jul 3 19:36:08 2020 +0800 update hyper-parameters of adamw commit 59cffbf Author: ZheyuYe <[email protected]> Date: Fri Jul 3 16:25:46 2020 +0800 try commit a84f782 Author: ZheyuYe <[email protected]> Date: Thu Jul 2 20:39:03 2020 +0800 fix mobilebert commit 4bc3a96 Author: ZheyuYe <[email protected]> Date: Thu Jul 2 11:14:39 2020 +0800 layer-wise decay commit 07186d5 Author: ZheyuYe <[email protected]> Date: Thu Jul 2 02:14:43 2020 +0800 revise commit a5a6475 Author: ZheyuYe <[email protected]> Date: Wed Jul 1 19:50:20 2020 +0800 topk commit 34ee884 Author: ZheyuYe <[email protected]> Date: Wed Jul 1 19:25:09 2020 +0800 index_update commit 74178e2 Author: ZheyuYe <[email protected]> Date: Wed Jul 1 00:48:32 2020 +0800 rename commit fa011aa Author: ZheyuYe <[email protected]> Date: Tue Jun 30 23:40:28 2020 +0800 update commit 402d625 Author: ZheyuYe <[email protected]> Date: Tue Jun 30 21:40:30 2020 +0800 multiprocessing for wiki commit ddbde75 Author: ZheyuYe <[email protected]> Date: Tue Jun 30 20:41:35 2020 +0800 fix bookcorpus commit 6cc5ccd Author: ZheyuYe <[email protected]> Date: Tue Jun 30 16:39:12 2020 +0800 fix wiki commit 9773efd Author: ZheyuYe <[email protected]> Date: Tue Jun 30 15:52:13 2020 +0800 fix openwebtext commit 1fb8eb8 Author: ZheyuYe <[email protected]> Date: Mon Jun 29 19:51:25 2020 +0800 upload gluon_electra_small_owt commit ca83fac Author: ZheyuYe <[email protected]> Date: Mon Jun 29 18:09:48 2020 +0800 revise train_transformer commit 1450f5c Author: ZheyuYe <[email protected]> Date: Mon Jun 29 18:07:04 2020 +0800 revise commit b460bbe Author: ZheyuYe <[email protected]> Date: Mon Jun 29 17:24:00 2020 +0800 repeat for pretraining commit 8ee381b Author: ZheyuYe <[email protected]> Date: Mon Jun 29 17:06:43 2020 +0800 repeat commit aea936f Author: ZheyuYe <[email protected]> Date: Mon Jun 29 16:39:22 2020 +0800 fix mobilebert commit eead164 Author: ZheyuYe <[email protected]> Date: Sun Jun 28 18:44:28 2020 +0800 fix commit 8645115 Author: ZheyuYe <[email protected]> Date: Sun Jun 28 17:27:43 2020 +0800 update commit 2b7f7a3 Author: ZheyuYe <[email protected]> Date: Sun Jun 28 17:18:00 2020 +0800 fix roberta commit 86702fe Author: ZheyuYe <[email protected]> Date: Sun Jun 28 16:27:43 2020 +0800 use_segmentation commit 6d03d7a Author: ZheyuYe <[email protected]> Date: Sun Jun 28 15:52:40 2020 +0800 fix commit 5c0ca43 Author: ZheyuYe <[email protected]> Date: Sun Jun 28 15:49:48 2020 +0800 fix token_ids commit ff7aae8 Author: ZheyuYe <[email protected]> Date: Sun Jun 28 13:56:07 2020 +0800 fix xlmr commit 2070b86 Author: ZheyuYe <[email protected]> Date: Sun Jun 28 13:54:26 2020 +0800 fix roberta commit 70a1887 Author: Leonard Lausen <[email protected]> Date: Fri Jul 17 00:07:08 2020 +0000 Update for Block API (dmlc#1261) - Remove params and prefix arguments for MXNet 2 and update parameter sharing implementation - Remove Block.name_scope() for MXNet 2 - Remove self.params.get() and self.params.get_constant() commit ea9152b Author: Xingjian Shi <[email protected]> Date: Thu Jul 16 15:42:04 2020 -0700 Fixes to make the CI more stable (dmlc#1265) * Some fixes to make the CI more stable * add retries * Update tokenizers.py commit a646c34 Author: ht <[email protected]> Date: Sun Jul 12 02:49:53 2020 +0800 [FEATURE] update backtranslation and add multinomial sampler (dmlc#1259) * back translation bash * split "lang-pair" para in clean_tok_para_corpus * added clean_tok_mono_corpus * fix * add num_process para * fix * fix * add yml * rm yml * update cfg name * update evaluate * added max_update / save_interval_update params * fix * fix * multi gpu inference * fix * update * update multi gpu inference * fix * fix * split evaluate and parallel infer * fix * test * fix * update * add comments * fix * remove todo comment * revert remove todo comment * raw lines remove duplicated '\n' * update multinomaial sampler * fix * fix * fix * fix * sampling * update script * fix * add test_case with k > 1 in topk sampling * fix multinomial sampler * update docs * comments situation eos_id = None * fix Co-authored-by: Hu <[email protected]> commit 83e1f13 Author: Leonard Lausen <[email protected]> Date: Thu Jul 9 20:57:55 2020 -0700 Use Amazon S3 Transfer Acceleration (dmlc#1260) commit cd48efd Author: Leonard Lausen <[email protected]> Date: Tue Jul 7 17:39:42 2020 -0700 Update codecov action to handle different OS and Python versions (dmlc#1254) codecov/codecov-action#80 (comment) commit 689eba9 Author: Sheng Zha <[email protected]> Date: Tue Jul 7 09:55:34 2020 -0700 [CI] AWS batch job tool for GluonNLP (Part I) (dmlc#1251) * AWS batch job tool for GluonNLP * limit range Co-authored-by: Xingjian Shi <[email protected]> commit e06ff01 Author: Leonard Lausen <[email protected]> Date: Tue Jul 7 08:36:24 2020 -0700 Pin mxnet version range on CI (dmlc#1257)
Codecov Report
@@ Coverage Diff @@
## numpy #1276 +/- ##
==========================================
- Coverage 84.21% 83.38% -0.83%
==========================================
Files 42 42
Lines 6316 6375 +59
==========================================
- Hits 5319 5316 -3
- Misses 997 1059 +62
|
leezu
reviewed
Jul 24, 2020
sxjscience
reviewed
Jul 27, 2020
sxjscience
reviewed
Jul 27, 2020
sxjscience
reviewed
Jul 29, 2020
sxjscience
reviewed
Jul 29, 2020
sxjscience
reviewed
Jul 29, 2020
sxjscience
approved these changes
Aug 1, 2020
@szhengac @ZiyueHuang This is merged and you may try pretraining with Electra + Horovod |
zheyuye
commented
Aug 28, 2020
@@ -496,13 +494,17 @@ def dynamic_masking(self, F, input_ids, valid_lengths): | |||
valid_candidates = valid_candidates.astype(np.float32) | |||
num_masked_position = F.np.maximum( | |||
1, F.np.minimum(N, round(valid_lengths * self._mask_prob))) | |||
# The categorical distribution takes normalized probabilities as input | |||
# softmax is used here instead of log_softmax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log(sample_probs)
instead of sample_prob
s should be used for npx.topk
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Thanks @eric-haibin-lin for providing Horovod support in MXNet 2.0, see https://github.com/eric-haibin-lin/horovod/tree/mx2. Based on this, a sight modification were made to support the use of Horovod in the GluonNLP number version.
Changes
prepare_wikipedia.py
scripts with multiprocessing and uplaod the 2020-06-20 version to S3--data 'pretraining_data/pretraining_data/*.train,pretraining_data/prepared_wikipedia/*.txt'
mx.npx.index_update
implemented in add op npx.index_update apache/mxnet#18545Comments
@sxjscience @hymzoque