Covid-19 Pre-training 11B model on TPU Pod v3-512 and v3-1024 #253
@nshazeer to answer 1 and 3. For 2:
Note that we typically pre-train without the Model API, so you may want to use those instructions (https://github.com/google-research/text-to-text-transfer-transformer#training) instead.
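For reference, a pre-training invocation following those instructions looks roughly like the sketch below (the TPU variables, bucket path, task name, gin files, and step count are placeholders, and the exact flags may differ between versions):

```sh
# Rough sketch of a t5_mesh_transformer pre-training run (placeholder values).
t5_mesh_transformer \
  --tpu="${TPU_NAME}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="gs://my-bucket/models/covid_11B" \
  --gin_file="dataset.gin" \
  --gin_file="models/bi_v1.gin" \
  --gin_param="MIXTURE_NAME = 'my_covid_task'" \
  --gin_param="utils.run.train_steps = 1000000" \
  --module_import="my_tasks"  # module that registers the custom Task
```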
@adarob Thanks a lot for your quick reply. Thanks for clarifying "2"; I will wait for @nshazeer's feedback on "1" and "3". Regarding your recommendation to switch to "t5_mesh_transformer", I was initially planning to do that, but could you help me with converting the Model API code to "t5_mesh_transformer" in order to integrate a new task? This is my code for the new task:
The dataset consists of several txt files, where each line contains a single sequence.
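A minimal sketch of what such a task registration might look like (assuming a mid-2020 t5 API; the task name, file pattern, and preprocessor chain are placeholders and may need adjusting for your t5 version):

```python
import functools
import t5

# Placeholder file pattern -- each line of these txt files is one sequence.
DATA_PATTERNS = {"train": "gs://my-bucket/data/train_*.txt"}

t5.data.TaskRegistry.add(
    "my_covid_task",  # placeholder task name
    t5.data.TextLineTask,
    split_to_filepattern=DATA_PATTERNS,
    text_preprocessor=[
        # Wrap each raw line as {"text": ...}, then move it to "targets"
        # so the unsupervised objective can consume it.
        functools.partial(
            t5.data.preprocessors.parse_tsv, field_names=["text"]),
        functools.partial(
            t5.data.preprocessors.rekey,
            key_map={"inputs": None, "targets": "text"}),
    ],
    # Standard T5 span-corruption pre-training objective.
    token_preprocessor=t5.data.preprocessors.unsupervised,
    # Depending on the t5 version, also pass the SentencePiece vocabulary,
    # e.g. via sentencepiece_model_path=... or output_features=...
    metric_fns=[])
```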
One option is to create a local …
@adarob I have followed your advice, and I have to say it is much better. I tested it on Colab with the small model before running my large-scale training, as follows:
My current task file is:
It would be great, @nshazeer, if you could confirm that the above command is correct and that I simply need to make the following changes for the large-scale training with the 11B model (sketched below):
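As an illustration only, overrides of this kind might look like the following flags appended to the training command; the gin file path, topology, parallelism, and batch-size values are placeholders rather than recommended settings:

```sh
# Hypothetical extra flags for the 11B / TPU Pod run (placeholder values).
  --gin_file="path/to/11B_operative_config.gin" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-512'" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 8" \
  --gin_param="utils.run.batch_size = ('tokens_per_batch', 1048576)" \
  --gin_param="utils.run.sequence_length = {'inputs': 512, 'targets': 114}"
```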
@nshazeer Could you please confirm our training setup here?
Hi Ahmed, your command looks good, though you will need to set …
Here is the exact command we used for T5.1.1 XXL C4 unsupervised training:
HTH
Thanks a lot @craffel @adarob, I will make sure to add both of you to our paper's acknowledgments :) One last question: in our case the sequence length could go up to 40k. In T5 you are using "relative_attention_type", which is set to "bias_shared". My questions are:
Hi, the relative position bias allows you to use arbitrary sequence lengths. All relative distances above "max_distance" are mapped into the same bucket.
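To illustrate the idea, here is a simplified sketch of the bucketing (not the exact Mesh TensorFlow implementation; the bucket count and log-spacing details are approximations):

```python
import math

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Simplified sketch of T5-style relative position bucketing.

    Small distances each get their own bucket; larger distances share
    logarithmically sized buckets; anything at or beyond max_distance
    falls into the last bucket.
    """
    n = abs(relative_position)
    max_exact = num_buckets // 2
    if n < max_exact:
        return n
    # Logarithmic bucketing between max_exact and max_distance.
    bucket = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact))
    return min(bucket, num_buckets - 1)

# Distances of 5,000 and 40,000 both map to the last bucket (31), so a
# 40k-token sequence does not require any new position parameters.
print([relative_position_bucket(d) for d in (1, 8, 64, 128, 5000, 40000)])
```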
@craffel Thanks for the clarification. In this case, how do we change the "max_distance" with "gin_param"? Or do we have to hard-code it in Mesh TensorFlow?
All calls to …
Good luck!
Hi,
We will start large-scale training on two new unsupervised datasets on TPU Pod v3-512 and v3-1024. This research is in support of Covid-19 efforts.
We want to make sure that our configuration is correct, and we have several questions:
This is the official gin file for the 11B model that we will use:
This is the model initialization that we will use:
This is the model training that we will use:
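A rough sketch of what the Model API initialization and training call typically look like (the paths, TPU name, and hyperparameter values here are placeholders, not the settings actually used; the task name matches the placeholder registration sketched earlier):

```python
import t5

# Placeholder values throughout -- adjust to the actual run.
model = t5.models.MtfModel(
    model_dir="gs://my-bucket/models/covid_11B",
    tpu="my-tpu-name",
    tpu_topology="v3-512",
    model_parallelism=8,
    batch_size=1024,
    sequence_length={"inputs": 512, "targets": 512},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    keep_checkpoint_max=5,
    iterations_per_loop=100,
)

# Pre-train on the registered task for a given number of steps.
model.train(mixture_or_task_name="my_covid_task", steps=1000000)
```

With this API, "tpu_topology", "batch_size", and "model_parallelism" are just constructor arguments, which is where the "tpu_topology" / "train_batch_size" question below comes in.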
In this gin file we made the following changes:
My questions are:
"tpu_topology" and "train_batch_size" for TPU Pod v3-512 and V3-1024 ?
@adarob @craffel @sharannarang @nshazeer, your feedback is highly appreciated.