Add mechanism for remapping device-specific module imports #4539

int3 · 2024-08-19T15:02:49Z

This is motivated by #4509. The crux of the problem is that the Triton code generator needs to inspect a function's arguments / attributes / types in order to determine how it should be called. This meant that "implementation details" like whether a function is a builtin needed to be exposed in the "interface" `tl.extra.libdevice` module, instead of just residing in `tl.extra.cuda.libdevice`. Moreover, this meant that libdevice functions marked as `@core.extern` in the interface could not be implemented via JitFunctions.

Allowing each backend to provide its own module map solves this problem as the code generator can inspect the actual function implementation.
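To illustrate the idea, here is a minimal sketch of a module map applied to a function's globals. It is not the code from this PR: the module names `demo.libdevice` and `demo.cuda.libdevice`, the helper `remap_globals`, and the `is_builtin` attribute are all hypothetical stand-ins, and plain `types.ModuleType` objects are used in place of real Triton modules.

```python
import types

# Hypothetical "interface" module: what user code imports.
interface = types.ModuleType("demo.libdevice")
# Hypothetical backend module: where the real implementation lives.
cuda = types.ModuleType("demo.cuda.libdevice")


def interface_exp(x):
    raise NotImplementedError("interface stub; no backend selected")


def cuda_exp(x):
    return f"cuda_exp({x})"


# Implementation detail that only the backend function carries (assumed name).
cuda_exp.is_builtin = True

interface.exp = interface_exp
cuda.exp = cuda_exp

# Each backend provides a map from interface module name to its own module.
module_map = {"demo.libdevice": cuda}


def remap_globals(gscope, module_map):
    """Return a copy of gscope with mapped modules swapped for backend ones."""
    remapped = {}
    for name, value in gscope.items():
        if isinstance(value, types.ModuleType) and value.__name__ in module_map:
            remapped[name] = module_map[value.__name__]
        else:
            remapped[name] = value
    return remapped


# Globals of a kernel that did `import demo.libdevice as libdevice`.
kernel_globals = {"libdevice": interface, "BLOCK": 128}
resolved = remap_globals(kernel_globals, module_map)
```

After remapping, a lookup such as `resolved["libdevice"].exp` yields the backend function, so anything that inspects it sees the backend's attributes rather than the interface stub's.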

int3 · 2024-08-19T15:11:32Z

I initially tried to tackle #4509 with a purely "userspace" solution by changing @dispatch from a runtime-rebinding decorator to a declaration-time-rebinding one. But this wasn't possible because we don't know which backend will be used at declaration time. I then drafted https://github.com/int3/triton-cpu/tree/resolve which allows for user-defined dynamic symbol resolvers. Ultimately I settled on this PR because it seems simpler and doesn't actually change the semantics of the language.

int3 · 2024-08-19T18:38:27Z

fixed test


…pytorch#134774)

In triton-lang/triton#4539 the `make_ir` API was modified to accept a new `module_map` parameter. Update the Inductor callsite accordingly, preserving backwards compatibility following the existing code.

Fixes pytorch#134674
Pull Request resolved: pytorch#134774
Approved by: https://github.com/EikanWang, https://github.com/zou3519, https://github.com/jansel
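One way a caller could stay compatible with both versions of such an API is to check the callee's signature before passing the new argument. The sketch below is an assumption for illustration only: `make_ir_old`, `make_ir_new`, and `call_make_ir` are hypothetical stand-ins, their parameter lists are invented for the demo, and this is not necessarily how the Inductor callsite does it.

```python
import inspect


def make_ir_old(options, codegen_fns, context):
    # Hypothetical stand-in for an API without `module_map`.
    return ("old", options, codegen_fns, context)


def make_ir_new(options, codegen_fns, module_map, context):
    # Hypothetical stand-in for an API that accepts `module_map`.
    return ("new", options, codegen_fns, module_map, context)


def call_make_ir(make_ir, options, codegen_fns, module_map, context):
    """Pass module_map only when the callee declares that parameter."""
    params = inspect.signature(make_ir).parameters
    if "module_map" in params:
        return make_ir(options, codegen_fns, module_map, context)
    return make_ir(options, codegen_fns, context)


old_result = call_make_ir(make_ir_old, "opts", {}, {"m": "impl"}, "ctx")
new_result = call_make_ir(make_ir_new, "opts", {}, {"m": "impl"}, "ctx")
```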


Context: In `CodeGenerator.__init__`, globals for a given triton function are modified to handle remapping the libdevice module to cuda or hip (from triton-lang#4539). In particular, this logic:

```python
for k, v in gscope.items():  # gscope is a dict of fn.__globals__
    ...
    self.gscope[k] = getattr(module_map[module_name], k)
```

was failing if you do this in the global scope: `from triton.language.extras.libdevice import fast_dividef as my_fast_dividef`.
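The failure mode can be reproduced outside Triton. In the sketch below, all module names and the helpers `make_module`, `remap_by_key`, and `remap_by_name` are hypothetical; looking the symbol up by the object's own `__name__` is shown as one plausible remedy, not necessarily the change made in #5081.

```python
import types


def make_module(name, **members):
    """Build a throwaway module and mark its functions as belonging to it."""
    mod = types.ModuleType(name)
    for attr, fn in members.items():
        fn.__module__ = name
        setattr(mod, attr, fn)
    return mod


def fast_dividef(x, y):
    return "interface"


def cuda_fast_dividef(x, y):
    return "cuda"


interface = make_module("demo.libdevice", fast_dividef=fast_dividef)
cuda = make_module("demo.cuda.libdevice", fast_dividef=cuda_fast_dividef)
module_map = {"demo.libdevice": cuda}

# Equivalent of: from demo.libdevice import fast_dividef as my_fast_dividef
gscope = {"my_fast_dividef": interface.fast_dividef}


def remap_by_key(gscope, module_map):
    # Looks up the alias `k` in the backend module, which does not exist there.
    out = {}
    for k, v in gscope.items():
        module_name = getattr(v, "__module__", None)
        out[k] = getattr(module_map[module_name], k) if module_name in module_map else v
    return out


def remap_by_name(gscope, module_map):
    # Looks up the object's original name instead of the local alias.
    out = {}
    for k, v in gscope.items():
        module_name = getattr(v, "__module__", None)
        out[k] = getattr(module_map[module_name], v.__name__) if module_name in module_map else v
    return out


try:
    remap_by_key(gscope, module_map)
    key_lookup_failed = False
except AttributeError:
    key_lookup_failed = True

fixed = remap_by_name(gscope, module_map)
```

The alias `my_fast_dividef` exists only in the importing module's globals, so the key-based lookup raises `AttributeError`, while the name-based lookup finds `fast_dividef` in the backend module and keeps it bound under the alias.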


int3 requested review from antiagainst, ptillet and zhanglx13 as code owners August 19, 2024 15:02

int3 mentioned this pull request Aug 19, 2024

How should we approach libdevice calls that don't map to a libmvec function? triton-lang/triton-cpu#111

Closed

ThomasRaoux mentioned this pull request Aug 19, 2024

Allow third-party backends to add submodules to triton.language.extra #4503

Merged

7 tasks

int3 force-pushed the module-map branch from c921cb6 to ffc68da Compare August 19, 2024 18:37

int3 force-pushed the module-map branch from ffc68da to f3c4b73 Compare August 19, 2024 18:46

ThomasRaoux approved these changes Aug 22, 2024


ThomasRaoux merged commit 2ea4890 into triton-lang:main Aug 22, 2024

jlotthammer mentioned this pull request Aug 25, 2024

Fix incorrect import for triton 3 linkedin/Liger-Kernel#79

Merged

alexbaden mentioned this pull request Aug 29, 2024

[Inductor] Support passing module map parameter to Triton make_ir API pytorch/pytorch#134774

Closed

jlebar mentioned this pull request Sep 3, 2024

Build LLVMAarch64CodeGen if CMAKE_OSX_ARCHITECTURES is arm64. #4637

Merged

davidberard98 mentioned this pull request Nov 6, 2024

[FRONTEND] Fix handling of from m import x as y in CodeGenerator #5081

Merged
