[Relay, TOPI] Add numpy style cumsum op #7334
Conversation
Force-pushed from 0eadcc6 to 7612cfd
A few nitpicks, but overall LGTM
@tkonolige @mbrookhart Comments were addressed. CPU cumsum is now done in parallel over non-scan axes.
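The parallelization mentioned here works because a cumulative sum along one axis is independent across all other axes. A minimal numpy sketch (numpy standing in for the actual TOPI kernel) illustrates why the non-scan axes can be processed in parallel:

```python
import numpy as np

x = np.arange(12, dtype=np.int32).reshape(3, 4)

# Scan along the last axis, one row at a time: each row's cumsum
# depends only on that row, so rows can be computed in parallel.
rows = np.stack([np.cumsum(x[i]) for i in range(x.shape[0])])

assert np.array_equal(rows, np.cumsum(x, axis=1))
```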
Looks good! I put a couple minor nits for documentation, but otherwise this is good to go.
Co-authored-by: Tristan Konolige <[email protected]>
Thanks @tkonolige @mbrookhart
Hi @masahi, but it still does not work.
typo -> "CumSum": AttrCvt("cumsum", {"axis": "axis", "dtype": "dtye"}),
@ybai62868 please post this to the discuss forum or open an issue, clarifying what exactly you mean by "it still does not work" |
@masahi Hi, what I mean by "it still does not work" is that it reports the message "tvm.error.OpNotImplemented: The following operators are not supported for frontend ONNX: CumSum", even though I added AttrCvt("cumsum", {"axis": axis, "dtype": dtye}) and renamed CumSum to cumsum.
@ybai62868 This PR adds cumsum support at the topi level. The work to add support to onnx has not been done yet. |
* Add cumsum relay/topi op
* relay tests working
* add torch frontend converter
* fix for importing detr
* fix bad merge
* begin cuda cumsum
* support non innermost axis
* support rank higher than 3
* making binop parameter
* fix overflow issue in thrust scan
* generic binop parameter working
* relay test working
* fixed for bool input
* remove pytorch change
* fix pylint
* doc update
* Update python/tvm/topi/cumsum.py (Co-authored-by: Tristan Konolige <[email protected]>)
* Update tests/python/relay/test_op_level3.py (Co-authored-by: Tristan Konolige <[email protected]>)
* add example outputs
* add supported input and output dtype in thrust log
* adding more loop var names
* fix cpplint
* fix missing check for the cuda target in nms thrust sort
* parallelize cpu cumsum
* making binop argument tir function
* update doc for binop
* doc update

Co-authored-by: Tristan Konolige <[email protected]>
This adds a numpy-style cumsum op to Relay/TOPI. The spec is identical to the numpy one, except that there is no promotion of the output dtype from int32 to int64 when the input is an int32 tensor and no output dtype is provided.
https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html
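For reference, the promotion behavior that this op deliberately omits can be seen in numpy itself: on typical 64-bit platforms numpy widens int32 input to the platform default integer, whereas passing `dtype` explicitly (mirroring the Relay op's default of keeping the input dtype) suppresses the widening. A small sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int32)

# numpy may widen int32 -> int64 here (platform default integer)
out = np.cumsum(a)

# explicit dtype keeps int32, matching the Relay op's default behavior
out32 = np.cumsum(a, dtype=np.int32)

assert out.tolist() == [1, 3, 6, 10]
assert out32.dtype == np.int32
```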
Both CPU and GPU are supported, and it is especially efficient when thrust is available. I updated the TIR scan IR to support cumsum on tensors of any rank. Scanning is still done only on the innermost axis, so a transpose is required when the scan axis is not the innermost one.
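The transpose workaround can be sketched in numpy terms (the real lowering happens in TIR, so this is only an illustration): move the scan axis to the end, scan the now-innermost axis, then move it back.

```python
import numpy as np

def cumsum_innermost_only(x, axis):
    # Pretend the backend can only scan the last (innermost) axis.
    x = np.moveaxis(x, axis, -1)      # bring scan axis to the end
    out = np.cumsum(x, axis=-1)       # innermost-axis scan
    return np.moveaxis(out, -1, axis) # restore original layout

x = np.arange(24).reshape(2, 3, 4)
assert np.array_equal(cumsum_innermost_only(x, 1), np.cumsum(x, axis=1))
```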
please review @jwfromm @mbrookhart @zhiics @kevinthesun @junrushao1994 @antinucleon