Optimize Gelu operator for caffe2 export
Summary:
TIL ONNX->Caffe2 conversion is very memory inefficient: it creates an intermediate blob for every intermediate output. So the Gelu operator creates a lot of intermediate blobs, since its forward pass is a chain of elementwise math ops.
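To make that concrete (this snippet is not part of the commit), tracing the math version counts how many elementwise ops it lowers to; each op's output becomes its own blob after ONNX -> Caffe2 conversion. A minimal sketch:

import math
import torch

def gelu_math(x):
    return 0.5 * x * (
        1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * (x * x * x)))
    )

# Trace the approximation and count the aten ops in the resulting graph;
# each op's output would get its own blob in the converted Caffe2 net.
traced = torch.jit.trace(gelu_math, torch.randn(2, 2))
print(sum(1 for n in traced.graph.nodes() if n.kind().startswith("aten::")))  # ~9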

The fix is to use the Caffe2 Gelu operator directly, so all of that computation is captured in a single op.

https://pxl.cl/HzGf

Differential Revision: D16849396

fbshipit-source-id: 4903c614833ae4ad8a84c6eddc2382b2a24872f3
geof90 authored and facebook-github-bot committed Aug 16, 2019
1 parent a170dd4 commit 6d6f1da
Showing 1 changed file with 15 additions and 5 deletions.
20 changes: 15 additions & 5 deletions pytext/optimizer/activations.py
@@ -20,11 +20,21 @@ class GeLU(nn.Module):
     """
 
     def forward(self, x):
-        return (
-            0.5
-            * x
-            * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * (x * x * x))))
-        )
+        if torch.onnx.is_in_onnx_export():
+            # ONNX -> Caffe2 conversion will create an intermediate blob for
+            # each intermediate math output, which is very memory inefficient.
+            # We use the Gelu operator directly to reduce the memory footprint
+            # in the exported model.
+            return torch.ops._caffe2.Gelu(x, True)
+        else:
+            return (
+                0.5
+                * x
+                * (
+                    1
+                    + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * (x * x * x)))
+                )
+            )
 
 
 def get_activation(name):
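As a quick sanity check (also not part of the commit), the two code paths can be compared in eager mode, assuming a PyTorch build with the Caffe2 ops registered; the second argument to the fused op is taken here to select the same tanh approximation:

import math
import torch

x = torch.randn(2, 8)
# Tanh approximation used on the eager path (same formula as in the diff).
eager = 0.5 * x * (
    1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * (x * x * x)))
)
# Fused Caffe2 op used on the export path, called exactly as in the commit.
fused = torch.ops._caffe2.Gelu(x, True)
print(torch.allclose(eager, fused, atol=1e-6))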
