feat: implement LSTM and GRU operators for torchlib #2674
Conversation
Implement aten_lstm and aten_gru operators to enable torch.onnx.export for PyTorch LSTM and GRU layers. This addresses issue microsoft#2546.

Key features:
- Full support for multi-layer RNNs (num_layers > 1)
- Bidirectional support (forward and backward directions)
- Handles both biased and non-biased configurations
- batch_first parameter support with automatic transposition
- Dropout support between layers (nondeterministic seeded)
- Proper gate reordering for ONNX compatibility (sketched below):
  * LSTM: PyTorch [i,f,g,o] -> ONNX [i,o,f,g]
  * GRU: PyTorch [r,z,n] -> ONNX [z,r,n]

Implementation details:
- Uses ONNX LSTM/GRU operators with proper parameter formatting
- Handles weight matrix transposition and reshaping
- Correctly concatenates biases using op.Concat
- Processes each layer independently with proper state management
- Returns outputs in PyTorch-compatible format

Closes: microsoft#2546
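The gate permutation described above can be illustrated standalone. The sketch below is a plain-PyTorch illustration, not the PR's code: the function names and the assumption that gate blocks are stacked along dim 0 are illustrative, and the actual implementation builds the equivalent permutation with onnxscript ops such as op.Concat.

```python
import torch

def lstm_weights_to_onnx(w: torch.Tensor, hidden_size: int) -> torch.Tensor:
    """Permute PyTorch LSTM gate blocks [i, f, g, o] into ONNX order [i, o, f, g].

    Works for weight_ih/weight_hh of shape [4 * hidden_size, *] and for the
    1-D bias vectors of shape [4 * hidden_size].
    """
    i, f, g, o = torch.split(w, hidden_size, dim=0)
    return torch.cat([i, o, f, g], dim=0)

def gru_weights_to_onnx(w: torch.Tensor, hidden_size: int) -> torch.Tensor:
    """Permute PyTorch GRU gate blocks [r, z, n] into ONNX order [z, r, n]."""
    r, z, n = torch.split(w, hidden_size, dim=0)
    return torch.cat([z, r, n], dim=0)
```

The same permutation has to be applied consistently to the input weights, the recurrent weights, and both bias vectors of each layer and direction.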
@microsoft-github-policy-service agree
Thanks for your contribution! Could you help create some tests in https://github.com/microsoft/onnxscript/blob/main/tests/function_libs/torch_lib/e2e_ops_tests.py?
Update to aten::lstm.input and aten::gru.input
Move aten_gru function to appear after aten_ger for alphabetical ordering with other aten_g* functions. Also add hidden_size attribute computation to GRU for consistency with LSTM implementation.
Hi @justinchuby, I've addressed all comments.

Regarding tests: I ran into a problem with the hidden_size attribute while adding them. Currently I compute it from the weight matrix:

```python
W_shape = op.Shape(W)
hidden_size_times_4 = op.Slice(W_shape, [1], [2], axes=[0])
hidden_size_attr = op.Div(hidden_size_times_4, op.Constant(value_ints=[4]))
```

The result is a tensor/SymbolicValue, not the static integer that ONNX attributes require. Thank you!
Thanks! Can the hidden_size be obtained from any of the function inputs? Is it static? If so, you may use that directly.
lintrunner found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Codecov Report ❌ Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##              main    #2674      +/-   ##
==========================================
- Coverage    70.46%   70.11%   -0.36%
==========================================
  Files          224      224
  Lines        26812    26956     +144
  Branches      2686     2700      +14
==========================================
+ Hits         18893    18899       +6
- Misses        6987     7126     +139
+ Partials       932      931       -1

☔ View full report in Codecov by Sentry.
Use initial_h.shape[2] for LSTM and hx.shape[2] for GRU to get static hidden_size values instead of computing from weight matrices. This allows the attribute to be a Python integer rather than a SymbolicValue.
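A small standalone check of why this works (the model sizes below are illustrative, not taken from the PR): for torch.nn.LSTM the initial hidden state has shape [num_layers * num_directions, batch, hidden_size], so its last dimension is already the static Python integer the ONNX hidden_size attribute needs.

```python
import torch

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, num_layers=2, bidirectional=True)
x = torch.randn(5, 3, 8)          # [seq_len, batch, input_size]
h0 = torch.zeros(2 * 2, 3, 16)    # [num_layers * num_directions, batch, hidden_size]
c0 = torch.zeros_like(h0)
out, (hn, cn) = lstm(x, (h0, c0))

hidden_size = h0.shape[2]         # plain Python int (16), usable as an ONNX attribute
print(hidden_size)
```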
Add comprehensive tests for LSTM operator covering:
- Unidirectional single-layer
- Bidirectional single-layer
- Multi-layer (3 layers)

All tests pass successfully with the fixed hidden_size attribute.
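A rough standalone approximation of one such test is sketched below; it does not reproduce the exact harness in e2e_ops_tests.py, and the model sizes, file name, and tolerances are illustrative.

```python
import numpy as np
import onnxruntime as ort
import torch

class BiLstm(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(
            input_size=8, hidden_size=16, num_layers=3,
            bidirectional=True, batch_first=True,
        )

    def forward(self, x):
        out, _ = self.lstm(x)
        return out

model = BiLstm().eval()
x = torch.randn(2, 5, 8)
expected = model(x)

# Export through the dynamo-based exporter, which lowers aten::lstm via torchlib.
torch.onnx.export(model, (x,), "lstm.onnx", dynamo=True)

sess = ort.InferenceSession("lstm.onnx")
(actual,) = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})
np.testing.assert_allclose(actual, expected.detach().numpy(), rtol=1e-4, atol=1e-5)
```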
Thanks! That worked. I've updated the code to use initial_h.shape[2] (and hx.shape[2] for GRU) so hidden_size is a static integer. I added 3 LSTM tests and they all pass: unidirectional single-layer, bidirectional single-layer, and multi-layer (3 layers).

However, I'm having issues with GRU. The tests are failing with numerical accuracy errors: the outputs don't match PyTorch. I've double-checked the gate reordering (PyTorch [r,z,n] to ONNX [z,r,n]) and it looks correct, but something is causing the results to differ. Could you help me figure out what's wrong with the GRU implementation? All changes are pushed. Thanks for your help!
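For context on this kind of mismatch: one standard source of GRU disagreement that is independent of gate order is where the reset gate is applied. ONNX GRU exposes this as the linear_before_reset attribute, and PyTorch's GRU equations correspond to linear_before_reset=1. The self-contained check below (not code from this PR) compares both variants against torch.nn.GRUCell:

```python
import torch

torch.manual_seed(0)
H = 3
cell = torch.nn.GRUCell(input_size=4, hidden_size=H)
x, h = torch.randn(1, 4), torch.randn(1, H)

# PyTorch stores gates in [r, z, n] order along dim 0.
W_ir, W_iz, W_in = cell.weight_ih.split(H)
W_hr, W_hz, W_hn = cell.weight_hh.split(H)
b_ir, b_iz, b_in = cell.bias_ih.split(H)
b_hr, b_hz, b_hn = cell.bias_hh.split(H)

r = torch.sigmoid(x @ W_ir.T + b_ir + h @ W_hr.T + b_hr)
z = torch.sigmoid(x @ W_iz.T + b_iz + h @ W_hz.T + b_hz)

# linear_before_reset = 0 (ONNX default): reset gate multiplies h *before*
# the recurrent matmul.
n0 = torch.tanh(x @ W_in.T + b_in + (r * h) @ W_hn.T + b_hn)
# linear_before_reset = 1: reset gate multiplies *after* the recurrent matmul,
# which matches PyTorch's GRU equations.
n1 = torch.tanh(x @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))

h0 = (1 - z) * n0 + z * h
h1 = (1 - z) * n1 + z * h
ref = cell(x, h)
print(torch.allclose(ref, h1, atol=1e-6))  # True: PyTorch ~ linear_before_reset=1
print(torch.allclose(ref, h0, atol=1e-6))  # generally False
```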
Feel free to commit the tests and I can take a look.
Add comprehensive tests for GRU operator covering:
- Unidirectional single-layer
- Bidirectional single-layer
- Multi-layer (3 layers)

Note: GRU tests currently fail with numerical accuracy issues.
Thanks! I've added all 6 tests and pushed them. The LSTM tests all pass, but the GRU tests are failing. The outputs don't match PyTorch within tolerance even though the gate reordering looks correct.

@ombrdr47 the tests seem ok according to the CI? You may want to fix the lint errors
titaiwangms left a comment:
Suggested to add dynamic tests (or even just promote all tests to be dynamic). Other than that, the PR looks good.
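For reference, a sketch of what a dynamic-shape variant of these tests could look like, assuming torch.export can trace the model with a symbolic batch dimension (class names and sizes are illustrative):

```python
import torch

class Gru(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = torch.nn.GRU(input_size=8, hidden_size=16, batch_first=True)

    def forward(self, x):
        out, _ = self.gru(x)
        return out

x = torch.randn(2, 5, 8)
batch = torch.export.Dim("batch")
onnx_program = torch.onnx.export(
    Gru().eval(),
    (x,),
    dynamic_shapes={"x": {0: batch}},  # mark the batch dimension as dynamic
    dynamo=True,
)
```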
Hi @titaiwangms,
I think it's ok to skip dynamic shapes testing for now if the error comes from torch.export, thanks!
Have not seen this error message for a while. Is the CI using an old torch? (before oblivious backed size?)
titaiwangms left a comment:
We can merge for now. Thank you!
Right, we do test with torch 2.7.
@ombrdr47 Thank you very much!

Thank you @justinchuby and @titaiwangms for your guidance. Happy to be a contributor to the project!
Also resolves: nn.GRU fails to torch.export due to unimplemented operator (pytorch/pytorch#120626, GRU export)