Runtime stitching APIs and sanity tests, ttnn runtime submit refactor #1301
Conversation
Clang-Tidy found issue(s) with the introduced code (1/1)
@pilkicTT / @rpavlovicTT, would really like your input on these runtime API changes.
At the moment it looks good overall. A few general questions --
@nsmithtt Sorry, I haven't had a chance yet to take a look. Hopefully later today or tomorrow (in the worst case)... @jnie-TT I've noticed there are multiple unrelated changes in this PR; this should be avoided, especially with larger changes, for at least two reasons:
I am not blocking on this, I just think it's good to avoid these situations in the future. Thanks!
Thanks. When you say
Are you saying we'll be able to poll whether a tensor is ready or not?
I asked this question but don't have a concrete answer; it's more of a question for the TPRT folks. cc @mrakitaTT @AleksKnezevic
From what I see in the code, it doesn't look like the inputs to the program are allowed to be either on host or on device (i.e. they must always be on host, if the binary expects it that way)? I couldn't find the place where inputs to the program are handled specially if they're already on the device.
It would be good if we added tests for these kinds of basic API scenarios. Not necessarily with this PR...
I'll add some runtime stitching tests that pre-move tensors to device and run with tensors already on device, though right now the flatbuffer binaries have all input tensors hardcoded to host. We're currently looking at how we can canonicalize the input layout; it would also be great to get your feedback there from an FE perspective.
Yes, the APIs will block and poll for tensor readiness, and return the info when the tensor is ready.
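To make the pre-moving idea concrete, here is a minimal sketch of staging an input on device before a submit, assuming the APIs listed in the PR description below; the variable names, the program/input indices, and the setup of `device`, `binary`, and `hostInput` are illustrative assumptions, not code from this PR.

```cpp
#include <vector>

// Hedged sketch, not code from this PR: stage a host tensor on device in the
// layout the program expects, then submit with it already resident on device.
// Error handling is omitted; `device`, `binary`, and `hostInput` are assumed
// to have been created elsewhere.
void runWithPreMovedInput(Device device, Binary binary, Tensor hostInput) {
  // Query the layout program 0 expects for its first input.
  Layout inputLayout = getLayout(binary, /*programIndex=*/0, /*inputIndex=*/0);

  // Move/convert the host tensor to that layout on device ahead of time.
  Tensor deviceInput = toLayout(hostInput, device, inputLayout);

  // Submit with the pre-moved tensor.
  std::vector<Tensor> outputs =
      submit(device, binary, /*programIndex=*/0, {deviceInput});
}
```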
@jnie-TT Thanks for the confirmation. Can you please update the commit & PR description (for future reference)? Just to clarify: at this point the APIs are prepared, but they cannot yet be used effectively to stitch multiple programs together by keeping the inputs on the device (since we always assume the inputs are on host). As for how we can implement this, I am really not sure (I'll throw out some ideas)... I was thinking that for v0 we would just handle these cases in the runtime (without reflecting it in the binary or the IR). Obviously not ideal, but the simplest way. For a proper solution, I was wondering whether there is an option to extend the dialects (both TTIR and TTNNIR) in the following way:
Not sure if this makes sense in the context of MLIR, but it looks to me that we'd have complete transparency about what happens in the program (at least from the standpoint of moving tensors between the host and the device).
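For illustration only, a rough sketch of what the "handle it in the runtime" v0 could look like; `isOnDevice` and `prepareProgramInput` are hypothetical names, and nothing here is taken from the actual runtime implementation.

```cpp
// Hypothetical v0 sketch: decide per input whether a host-to-device move is
// needed, without reflecting anything in the binary or the IR.
// `isOnDevice` is an assumed helper; `toLayout` is one of the APIs in this PR.
Tensor prepareProgramInput(Tensor input, Device device, Layout expectedLayout) {
  if (isOnDevice(input)) {
    // The tensor was left on device by a previous submit; reuse it directly
    // instead of assuming it starts on host.
    return input;
  }
  // Host-resident input: move/convert it to the layout the program expects.
  return toLayout(input, device, expectedLayout);
}
```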
Awesome, thank you for adding tests! Just as a sanity check before we land, can we kick off a tt-forge-fe CI run rebased on this change?
Let's try to pre-empt any fallout.
Forge CI green: https://github.com/tenstorrent/tt-forge-fe/actions/runs/12149172667
closes #671 #103 #366
Runtime support for APIs that enable stitching separate runtime submit calls together.
APIs (see the usage sketch after this list):
- std::vector<Tensor> submit(Device, Binary, programIndex, inputTensors)
- Layout getLayout(Binary, ProgramIndex, InputIndex)
- Tensor toLayout(Tensor, Device, Layout)
- Tensor createTensor(Device device, Layout layout, std::vector<std::uint32_t> const &shape, std::vector<std::uint32_t> const &stride, std::uint32_t itemsize)
- void deallocateTensor(Tensor, bool force)
- void memcpy(Tensor dst, Tensor src)
- Tensor toHost(Tensor, bool untilize)
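A minimal sketch of how these APIs are meant to compose when stitching two submit calls together; the program/input indices, variable names, and the setup of `device`, `binary`, and `inputs0` are assumptions for illustration, not code from this PR.

```cpp
// Hedged end-to-end sketch: run program 0, feed one of its outputs straight
// into program 1 without a host round trip, then read the result back.
std::vector<Tensor> outs0 = submit(device, binary, /*programIndex=*/0, inputs0);

// Convert program 0's output into the layout program 1 expects for input 0.
Layout in1Layout = getLayout(binary, /*programIndex=*/1, /*inputIndex=*/0);
Tensor in1 = toLayout(outs0[0], device, in1Layout);

std::vector<Tensor> outs1 = submit(device, binary, /*programIndex=*/1, {in1});

// Bring the final output back to host (untilizing it) and release the
// intermediate device tensor.
Tensor result = toHost(outs1[0], /*untilize=*/true);
deallocateTensor(in1, /*force=*/true);
```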
Testing
- runtime/test/python/ttnn/test_runtime_api.py
- .getLayout until compiler support is added.
- test/ttmlir/Runtime, which contains the hacked ttnn dialect mlir for the runtime stitching test. Adding it under Silicon didn't work because ttrt will run all flatbuffers generated there, and the runtime stitching mlir needs a different flow. I'm open to suggestions for better locations for adding this, but it would be great to have a dedicated location for mlir files that the compiler may not support yet and that are used specifically for runtime testing.
Backwards Compatibility
Misc