This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[v0.10.x] Softmax optimization & bertpass refactor #1565

Merged
merged 15 commits into dmlc:v0.10.x
May 27, 2021

Conversation

bgawrych
Contributor

This PR adds a graph pass that optimizes CPU softmax for BERT.
Currently, for BERT-large, the length tensor is created by the following chain of operations: expand_dims -> broadcast_axis -> Reshape,
and this tensor is created 24 times. The pass replaces softmax (with length) with a regular softmax over a masked input; the mask is created only once and then passed to elemwise_sum to mask the input. Applying the pass in the scripts is optional.

Original:
[graph screenshot]

Masked softmax:
[graph screenshot]

Throughput in samples/s:

| batches | batch_size | fp32 | quantized | quantized + mha_interleave | quantized + mask_softmax | quantized + mask_softmax + mha_interleave |
|---|---|---|---|---|---|---|
| 1000 | 24 | 19.74 | 25.08 | 25.26 | 34.36 | 34.80 |
| 500 | 1 | 14.06 | 16.97 | 18.02 | 20.24 | 21.81 |

Accuracy:

| metric | fp32 | quantized | quantized + mha_interleave | quantized + mask_softmax | quantized + mask_softmax + mha_interleave |
|---|---|---|---|---|---|
| EM | 80.99 | 80.91 | 80.44 | 80.73 | 80.44 |
| F1 | 88.60 | 88.33 | 88.06 | 88.29 | 88.06 |

There is also a bug fix in the interleaved MHA pass.
Accuracy without the mha_interleave bug fix: {'exact_match': 79.62157048249763, 'f1': 87.75497143592598}

@bgawrych bgawrych requested a review from a team as a code owner April 27, 2021 08:34
@bartekkuncer

LGTM

Review comments (resolved):
- scripts/bert/bertpass.cc
- scripts/bert/index.rst
@bgawrych
Contributor Author

@szha Can you help with CI? I'm not sure why it's failing and I don't know how to rerun it.
website-build seems to fail on a notebook I haven't edited.

@barry-jin
Contributor

Hi @bgawrych, could you try merging with v0.10.x? We ported the new CI settings from v0.x to v0.10.x. Thanks!

@bgawrych
Contributor Author

@barry-jin, @szha there is still an issue with the notebook, which is a little strange as the bert.md file hasn't been changed for a year. Should I fix it, or is it a CI issue?

@leezu
Contributor

leezu commented May 13, 2021

@barry-jin I see the following error in the log, pointing out that there's a CI issue:

[2021-05-10T05:59:14.916Z] umount: /dev/shm: must be superuser to unmount.
[2021-05-10T05:59:14.917Z] mount: /dev/shm: permission denied.
[2021-05-10T05:59:14.917Z] ./gluon_nlp_job.sh: line 33: sudo: command not found

@barry-jin
Contributor

> @barry-jin, @szha there is still an issue with the notebook, which is a little strange as the bert.md file hasn't been changed for a year. Should I fix it, or is it a CI issue?

Hi @bgawrych, we have ported the changes in bert.md from the v0.x branch; you could try merging with the current v0.10.x.

@barry-jin
Contributor

> @barry-jin I see the following error in the log, pointing out that there's a CI issue:
>
> [2021-05-10T05:59:14.916Z] umount: /dev/shm: must be superuser to unmount.
> [2021-05-10T05:59:14.917Z] mount: /dev/shm: permission denied.
> [2021-05-10T05:59:14.917Z] ./gluon_nlp_job.sh: line 33: sudo: command not found

Thanks, I will fix this issue.

@bgawrych
Contributor Author

@barry-jin

[2021-05-19T09:19:56.267Z] Exception occurred:
[2021-05-19T09:19:56.267Z]   File "/workspace/gluon-nlp/docs/conf.py", line 237, in setup
[2021-05-19T09:19:56.267Z]     app.add_javascript('google_analytics.js')
[2021-05-19T09:19:56.267Z] AttributeError: 'Sphinx' object has no attribute 'add_javascript'

@barry-jin
Contributor

> @barry-jin
>
> [2021-05-19T09:19:56.267Z] Exception occurred:
> [2021-05-19T09:19:56.267Z]   File "/workspace/gluon-nlp/docs/conf.py", line 237, in setup
> [2021-05-19T09:19:56.267Z]     app.add_javascript('google_analytics.js')
> [2021-05-19T09:19:56.267Z] AttributeError: 'Sphinx' object has no attribute 'add_javascript'

Will be fixed in #1575

@szha szha merged commit b4d7c0f into dmlc:v0.10.x May 27, 2021
@szha
Member

szha commented May 27, 2021

Merged. Thanks @bgawrych!

@ptrendx
Contributor

ptrendx commented Jun 18, 2021

I'm surprised that the elimination of the 24x mask tensor creation gave you any speedup (as opposed to using masked softmax, which should) - MXNet already has a common expression elimination pass (I wrote it: apache/mxnet#15657). Does that not work for you?

@bgawrych
Contributor Author

@ptrendx I didn't know about this feature, but I wrote this small graph pass to test it:

#if MX_LIBRARY_VERSION <= 7
MXReturnValue TEST(const std::string& in_graph, const std::string** out_graph,
                   const std::unordered_map<std::string, std::string>& options,
                   const std::unordered_map<std::string, MXTensor>& args,
                   const std::unordered_map<std::string, MXTensor>& aux,
                   const PassResource& res) {
  Graph* g = Graph::fromString(in_graph);
#else
MXReturnValue TEST(mxnet::ext::Graph* g,
                   const std::unordered_map<std::string, std::string>& options) {
#endif

  // Find the mask input (second input) of the first softmax node.
  Node* commonnode = nullptr;
#if MX_LIBRARY_VERSION <= 7
  for (Node* n : g->nodes) {
#else
  for (int i = 0; i < g->size(); i++) {
    Node* n = g->getNode(i);
#endif
    if (n->op.compare("softmax") == 0) {
      commonnode = n->inputs[1].node;
      break;
    }
  }

  // Rewire every softmax to share that single mask node.
#if MX_LIBRARY_VERSION <= 7
  for (Node* n : g->nodes) {
#else
  for (int i = 0; i < g->size(); i++) {
    Node* n = g->getNode(i);
#endif
    if (commonnode != nullptr && n->op.compare("softmax") == 0) {
      n->inputs[1].node = commonnode;
    }
  }

#if MX_LIBRARY_VERSION <= 7
  // convert back to JSON string from Graph/Node
  *out_graph = new std::string(g->toString());
#endif
  return MX_SUCCESS;
}

The overhead from these operators is negligible, but it seems common expression elimination doesn't work in this case:
With the graph pass:
[screenshot]

Without:
[screenshot]
