[BACKEND] Improve printf. by jlebar · Pull Request #2532 · triton-lang/triton

jlebar · 2023-10-23T19:00:15Z

[BACKEND] Improve printf.

Previously, we printed all of a GPU thread's values in a single printf()
call, and this, plus the user-specified prefix, was all we printed.

This caused a few problems.

- nvptx printf can only handle 32 arguments; if you pass more than
  that, it prints garbage. So if a thread had more than 32 values, you
  couldn't print them (issue #2486).
- The order of the values within the Triton program (GPU thread block)
  is an implementation detail -- it depends on the layout the compiler
  assigns to a tensor. So this also prevented you from interpreting
  the printed output.

To address this, we now print the Triton pid and multi-dimensional
Tensor index for each value. And each value gets its own line to avoid
passing too many args to printf.

Example output:

```
pid (0, 1, 2) idx (36, 127) x: 42
```
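For illustration only, the line format shown above can be mimicked on the host in plain Python. This is a sketch, not the LLVM lowering in this PR: `format_print_lines` is a hypothetical helper, the spacing is assumed from the example output, and `values` is assumed to be a flat, row-major list.

```python
from itertools import product


def format_print_lines(pid, shape, values, label):
    """Emit one line per tensor element: program id, multi-dim index, label, value.

    Hypothetical host-side helper; `values` is a flat, row-major list.
    """
    lines = []
    pid_str = ", ".join(str(p) for p in pid)
    # Walk every multi-dimensional index of the tensor in row-major order.
    for flat, idx in enumerate(product(*(range(n) for n in shape))):
        idx_str = ", ".join(str(i) for i in idx)
        # The label always gets ": " appended, as described in the PR.
        lines.append(f"pid ({pid_str}) idx ({idx_str}) {label}: {values[flat]}")
    return lines


lines = format_print_lines((0, 1, 2), (2, 2), [10, 11, 12, 13], "x")
```

Because each element produces its own line, the number of arguments per line depends only on the tensor rank, not on how many values a thread holds.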

If you want to observe all the values in a tensor in order, you can grep
and then sort the output.
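As a rough alternative to grep and sort in the shell, the same filtering and reordering could be done in Python. The regex assumes lines shaped exactly like the example output, and `sort_print_lines` is a hypothetical helper, not part of Triton.

```python
import re

# Assumed line shape: "pid (a, b, c) idx (i, j, ...) label: value"
LINE_RE = re.compile(r"pid \(([\d, ]+)\) idx \(([\d, ]+)\) (.*)")


def sort_print_lines(lines, label):
    """Keep lines carrying `label` and order them by (pid, idx), numerically."""
    keyed = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or not m.group(3).startswith(label + ": "):
            continue  # the "grep" step: drop unrelated lines and other labels
        pid = tuple(int(s) for s in m.group(1).split(","))
        idx = tuple(int(s) for s in m.group(2).split(","))
        keyed.append((pid, idx, line))
    keyed.sort(key=lambda t: (t[0], t[1]))
    return [line for _, _, line in keyed]


raw = [
    "pid (0, 0, 0) idx (10, 2) x: 7",
    "pid (0, 0, 0) idx (2, 5) x: 3",
    "unrelated log line",
    "pid (0, 0, 0) idx (2, 5) y: 9",
]
ordered = sort_print_lines(raw, "x")
```

Parsing the indices as integers avoids the lexicographic trap where `idx (10, 2)` would sort before `idx (2, 5)`.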

We also make a UX enhancement to print: The printed label always ends
with ": "; you don't have to add it yourself.

Fixes #2486.

PR chain

👉 [BACKEND] Improve printf. #2532 👈 YOU ARE HERE

ThomasRaoux · 2023-10-23T19:21:47Z

+  // Cache the llvm Value for each format string to avoid making duplicates in
+  // the module.  Sorry about `mutable`, but the top-level matchAndRewrite
+  // function is const, so it virally comes down here.
+  mutable llvm::StringMap<Value> formatStrCache;


Caching will break if the new print function is not dominated by the value that has been cached. For instance, it will always fail if we are printing from different functions. In general, caching during rewrite patterns is not trivial. We could cache only the global value, but this is probably a premature optimization.

jlebar · 2023-10-23

addStringToModule adds the string as a global, so shouldn't this work? https://github.com/openai/triton/blob/020f43d5a37c2f1cce5c70fcda17ec57b000755c/lib/Conversion/TritonGPUToLLVM/Utility.cpp#L354

(The reason I wanted to cache this is that otherwise we get BLOCK_SIZE copies of the format string. Alternatively, we would have to rewrite it so that we generate the format string only once per block, but that's also bug-prone and harder to read.)

ThomasRaoux · 2023-10-23

But it creates operations that are within a block:

    Value stringStart = rewriter.create<LLVM::GEPOp>(
        UnknownLoc::get(ctx), ptr_ty(i8_ty), globalPtr,
        SmallVector<Value>({zero, zero}));
    return stringStart;

If those are used by an instruction that isn't dominated by the cached ops, it would break SSA. Can you change the code so that this gets called outside the loop?

jlebar · 2023-10-23

Indeed, you're completely correct. Done, thanks.

Jokeren · 2023-10-23T20:37:07Z

Another comment regarding the output format. Is it possible to make the format consistent with the output of the assert which also annotates block ids and thread ids?

jlebar · 2023-10-23T22:45:36Z

> Another comment regarding the output format. Is it possible to make the format consistent with the output of the assert which also annotates block ids and thread ids?

The assert message, e.g.

    /root/code/triton/python/test/unit/language/assert_helper.py:51: kernel_device_assert: block: [0,0,0], thread: [31,0,0] Assertion `x != 0` failed.

is actually coming from nvptx. The only part that we control is `x != 0`.

The block/thread-id here is CUDA, not Triton pid. AIUI Triton pid is only equivalent to the block id pre-Hopper; on Hopper, the Triton pid seems to be equal to the cluster ID.

Since we don't have control over the assertion message and since we can't get it to print out the actual Triton PID, I'm inclined not to try to match it. WDYT, @Jokeren ?

Jokeren · 2023-10-24T00:50:24Z

> The block/thread-id here is CUDA, not Triton pid. AIUI Triton pid is only equivalent to the block id pre-Hopper; on Hopper, the Triton pid seems to be equal to the cluster ID.

OK, that sounds reasonable.

ptillet · 2023-10-24T03:29:43Z

Thanks @jlebar! It is a great improvement to our printer :)

ThomasRaoux

Looks great!


jlebar requested a review from ptillet as a code owner October 23, 2023 19:00

This was referenced Oct 23, 2023

[BACKEND] Delete dead vprintf and vprintf_array functions #2531

Merged

[BUILD] Add TRITON_BUILD_WITH_ASAN envvar #2533

Merged

ThomasRaoux reviewed Oct 23, 2023


Jokeren changed the title ~~Improve printf.~~ [BACKEND] Improve printf. Oct 23, 2023

jlebar changed the base branch from jlebar/dead-vprintf to main October 23, 2023 22:46

jlebar force-pushed the jlebar/improve-printf branch from 5424cc4 to 9794ae7 Compare October 23, 2023 22:46

jlebar changed the base branch from main to jlebar/asan-build October 23, 2023 22:46

jlebar force-pushed the jlebar/asan-build branch from 8e47509 to 295550e Compare October 23, 2023 22:51

jlebar force-pushed the jlebar/improve-printf branch 2 times, most recently from ad55ad0 to 28ae3e6 Compare October 23, 2023 23:40

Base automatically changed from jlebar/asan-build to jlebar/dead-vprintf October 24, 2023 02:42

jlebar force-pushed the jlebar/dead-vprintf branch from f3c919c to 8b50cfd Compare October 24, 2023 03:07

jlebar mentioned this pull request Oct 24, 2023

Add TRITON_BUILD_WITH_ASAN envvar. #2536

Closed

jlebar changed the base branch from jlebar/dead-vprintf to jlebar/asan-build October 24, 2023 03:07

jlebar changed the base branch from jlebar/asan-build to main October 24, 2023 03:10

jlebar force-pushed the jlebar/improve-printf branch from 28ae3e6 to 4844a4a Compare October 24, 2023 03:10

jlebar changed the base branch from main to jlebar/dead-vprintf October 24, 2023 03:10

jlebar requested review from ThomasRaoux and removed request for ptillet October 24, 2023 04:51

jlebar force-pushed the jlebar/dead-vprintf branch from 05876d1 to 6308407 Compare October 24, 2023 05:09

jlebar force-pushed the jlebar/improve-printf branch from 4844a4a to 529141a Compare October 24, 2023 05:09

jlebar mentioned this pull request Oct 24, 2023

[BUILD] Add TRITON_BUILD_WITH_ASAN envvar. #2537

Merged

ThomasRaoux approved these changes Oct 24, 2023


jlebar force-pushed the jlebar/improve-printf branch from 529141a to 00d8de7 Compare October 24, 2023 17:30

jlebar force-pushed the jlebar/dead-vprintf branch from 6308407 to 7fcaed7 Compare October 24, 2023 17:30

jlebar force-pushed the jlebar/improve-printf branch from 00d8de7 to 75fa8a2 Compare October 24, 2023 19:16

jlebar force-pushed the jlebar/dead-vprintf branch from 7fcaed7 to da2abfa Compare October 24, 2023 19:16

Base automatically changed from jlebar/dead-vprintf to main October 25, 2023 00:22

jlebar force-pushed the jlebar/improve-printf branch from 75fa8a2 to 74564a5 Compare October 25, 2023 04:17

jlebar enabled auto-merge (squash) October 25, 2023 04:17

jlebar merged commit e70e11e into main Oct 25, 2023

jlebar deleted the jlebar/improve-printf branch October 25, 2023 08:47
