From 375550a549db91810f395b56094975c023dbd1fd Mon Sep 17 00:00:00 2001
From: Conglong Li
Date: Tue, 16 Aug 2022 11:25:14 -0700
Subject: [PATCH] add doc for new bert example

---
 docs/_tutorials/bert-pretraining.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/_tutorials/bert-pretraining.md b/docs/_tutorials/bert-pretraining.md
index e3771b7fdad2..a0943949f9bc 100755
--- a/docs/_tutorials/bert-pretraining.md
+++ b/docs/_tutorials/bert-pretraining.md
@@ -4,6 +4,10 @@ excerpt: ""
 tags: training pre-training
 ---
 
+**Note:**
+On 08/15/2022 we added another BERT pre-training/fine-tuning example at [github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/bert_with_pile](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/bert_with_pile), which includes a README.md that describes how to use it. Compared to the example described below, the new example in Megatron-DeepSpeed adds support for ZeRO and tensor-slicing model parallelism (and thus supports larger model scales), uses the public and richer [Pile dataset](https://github.com/EleutherAI/the-pile) (users can also use their own data), and makes some changes to the model architecture and training hyperparameters as described in [this paper](https://arxiv.org/abs/1909.08053). As a result, the BERT models trained by the new example achieve better MNLI results than the original BERT, but with a slightly different model architecture and larger computation requirements. If you want to train a larger-scale or higher-quality BERT-style model, we recommend following the new example in Megatron-DeepSpeed. If your goal is to strictly reproduce the original BERT model, we recommend following the example under DeepSpeedExamples/bing_bert as described below. On the other hand, the tutorial below helps explain how to integrate DeepSpeed into a pre-training codebase, regardless of which BERT example you use.
+{: .notice--info}
+
 In this tutorial we will apply DeepSpeed to pre-train the BERT
 (**B**idirectional **E**ncoder **R**epresentations from **T**ransformers),
 which is widely used for many Natural Language Processing (NLP) tasks. The