
Conversation


@guillaume-be (Contributor) commented on Jan 7, 2021

What does this PR do?

This PR proposes an optimization for the ProphetNet model. The current implementation builds an attention bias mask by looping over the positions to unmask, performing a large number of assignments (ngram * sequence_length), which can be on the order of ~1000. Single-tensor assignments, especially on accelerators, are inefficient.

This PR proposes a vectorized implementation that performs at most ngram assignments, significantly fewer than ngram * sequence_length.
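For illustration, here is a minimal sketch contrasting the two approaches. The function name ngram_attention_bias and its (sequence_length, ngram, device, dtype) signature follow the ProphetNet modeling code; the vectorized variant below is one possible realization of the idea described above, not necessarily the exact code in this PR:

```python
import torch


def ngram_attention_bias_loop(sequence_length, ngram, device, dtype):
    # Loop-based construction: ngram * sequence_length small assignments,
    # each of which becomes a separate kernel launch on an accelerator.
    bias = torch.ones((ngram, sequence_length, 2 * sequence_length), device=device, dtype=dtype) * float("-inf")
    for stream_idx in range(ngram):
        for i in range(sequence_length):
            bias[stream_idx, i, sequence_length + i] = 0
            bias[stream_idx, i, : max(i - stream_idx, 0) + 1] = 0
    return bias


def ngram_attention_bias_vectorized(sequence_length, ngram, device, dtype):
    # Vectorized construction: only a handful of in-place ops per ngram stream,
    # independent of sequence_length.
    left_block = torch.ones((ngram, sequence_length, sequence_length), device=device, dtype=dtype) * float("-inf")
    right_block = left_block.clone()
    for stream_idx in range(ngram):
        # Each predict-stream position attends to exactly one position in the right (predict) block ...
        right_block[stream_idx].fill_diagonal_(0, wrap=False)
        # ... and to its causal prefix in the left (main) block.
        left_block[stream_idx].triu_(-stream_idx + 1)
    # Every position can always attend to the first token of the main stream.
    left_block[:, :, 0] = 0
    return torch.cat([left_block, right_block], dim=-1)
```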

A quick experiment (https://gist.github.com/guillaume-be/e6b862c701fac1b54765e7af7e71c641) shows that:

  1. the ngram_attention_bias calculation is very expensive, taking close to 230 ms (!) on a GPU
  2. the vectorized implementation is several orders of magnitude faster: the same calculation takes less than 1 ms on the same example (a rough way to reproduce such a timing comparison is sketched below)
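A sketch of how such a comparison might be run (assumptions: the two functions from the sketch above, and hypothetical ngram/sequence_length values chosen only for illustration):

```python
import time

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ngram, sequence_length = 2, 512  # hypothetical values for illustration


def timed_ms(fn, *args):
    # Synchronize around the call so queued GPU kernels are included in the measurement.
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return out, (time.perf_counter() - start) * 1000


loop_bias, loop_ms = timed_ms(ngram_attention_bias_loop, sequence_length, ngram, device, torch.float32)
vec_bias, vec_ms = timed_ms(ngram_attention_bias_vectorized, sequence_length, ngram, device, torch.float32)

# Sanity check that both constructions produce an identical mask.
assert torch.equal(loop_bias, vec_bias)
print(f"loop: {loop_ms:.1f} ms, vectorized: {vec_ms:.1f} ms")
```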

Who can review?

@patrickvonplaten, maybe you would be a good candidate? I could not find anyone assigned to ProphetNet.

edit: pushed a further optimization, accelerating the calculation by another ~40%


@patrickvonplaten (Contributor) left a comment


Thanks a lot @guillaume-be! Very nice to remove the for i in range(sequence_length) loop -> this does indeed look very suboptimal.

I'm fine with the PR! When all changes are made I'll run the slow tests of ProphetNet locally to make sure nothing broke unexpectedly, and then merge :-)

@patrickvonplaten (Contributor) commented

All slow tests are passing! Very nice PR - thanks a million @guillaume-be
