This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNet] - [BERT] #14864

Closed
araitats opened this issue May 2, 2019 · 0 comments · Fixed by #14868

araitats commented May 2, 2019

dmlc/gluon-nlp#690

Description

Custom BERT model training stalls with later builds of MXNet 1.5.0 (observed with mxnet-cu90).
mlm_loss stops improving at around 7.3 and nsp_acc stops improving at around 54.
mxnet-cu90 builds older than 1.5.0b20190425 do not have this issue.
Builds from 1.5.0b20190426 onward do.
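
To check quickly whether a given build falls in the range reported above, a minimal sketch along these lines may help. It assumes the version string has the form `1.5.0bYYYYMMDD`, as printed by `pip show mxnet-cu90`; the helper names are hypothetical and not part of MXNet. The report does not state the status of 1.5.0b20190425 itself, so the sketch returns `None` for that build.

```python
import re
from typing import Optional

# Boundaries taken from the report above (assumed YYYYMMDD build suffixes).
LAST_UNSTATED_BUILD = 20190425   # builds older than this are reported fine
FIRST_AFFECTED_BUILD = 20190426  # builds from this date onward are reported affected


def build_date(version: str) -> Optional[int]:
    """Return the YYYYMMDD suffix of a string like '1.5.0b20190426', else None."""
    match = re.fullmatch(r"1\.5\.0b(\d{8})", version)
    return int(match.group(1)) if match else None


def is_reported_affected(version: str) -> Optional[bool]:
    """True/False per the report; None if the report does not cover this version."""
    date = build_date(version)
    if date is None:
        return None  # not a 1.5.0 pre-release build string
    if date >= FIRST_AFFECTED_BUILD:
        return True
    if date < LAST_UNSTATED_BUILD:
        return False
    return None  # 1.5.0b20190425 itself is not explicitly covered
```

Pass in the `Version:` field from `pip show mxnet-cu90`, for example `is_reported_affected("1.5.0b20190426")`.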

Environment info (Required)

Amazon SageMaker Notebook (ml.p3.16xlarge)
CUDA version: 9.0

Package used (Python/R/Scala/Julia):
Python 3.6
