LayersModel#predict() results in all zeros when using WebGPU backend in Deno #6842

vicary · 2022-09-20T19:00:48Z

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Monterey 12.6
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
TensorFlow.js installed from (npm or script link): https://cdn.skypack.dev/@tensorflow/tfjs
TensorFlow.js version (use command below): 3.20.0
Browser version: Deno (built from source at commit 2929ec9f)
Tensorflow.js Converter Version: N/A

Describe the current behavior

I came from #6746 and opened denoland/deno#15853 to make Deno compatible with the WebGPU backend. Model predictions sometimes result in an all-zeros tensor; the problem does not occur with the CPU backend.

Describe the expected behavior

Output tensors should contain non-zero numbers.

Standalone code to reproduce the issue

import * as tf from "https://cdn.skypack.dev/@tensorflow/tfjs?dts";
import "https://cdn.skypack.dev/@tensorflow/tfjs-backend-webgpu";

await tf.ready();
const model = tf.sequential();
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
const output = await model.predict(tf.tensor([1])).array();

console.log(output); // prints [[0]]

Other info / logs
I am on an Apple M1 laptop.

Backstory and tracking issues:

Deno uses deno_webgpu from gfx-rs/wgpu, which in turn uses gfx-rs/naga for all shader related things.
WGSL const-declarations gfx-rs/wgpu#2994
[wgsl-in] allow functions to be declared literally after entry point gfx-rs/naga#2072
Global item does not support 'const' gfx-rs/naga#2071
[wgsl-in] Implement const expressions gfx-rs/naga#1829
It turns out that Naga uses lexical scoping instead of function/module-level scoping, and they are changing that in [wgsl-in] Implement module-level scoping gfx-rs/naga#2075

qjia7 · 2022-09-21T01:21:30Z

@haoyunfeix Please take a look, thanks.

haoyunfeix · 2022-09-21T05:25:53Z

@vicary Thanks for your effort on denoland/deno#15853 and for reporting this bug! Could you please show me the steps to build Deno from source at commit 2929ec9f? I would like to use a local build of Deno to reproduce this issue the same way you did.

@qjia7 Alternatively, I can reproduce this issue with a local build of the webgpu backend by skipping the feature check, like this:

-    const supportTimeQuery = adapter.features.has('timestamp-query');
+    const supportTimeQuery = false;

Here is the code:

import * as tf from 'https://cdn.skypack.dev/@tensorflow/tfjs'
import '../dist/bin/tfjs-core/tfjs-core_pkg/dist/tf-core.es2017.js'
import '../dist/bin/tfjs-backend-webgpu/tfjs-backend-webgpu_pkg/dist/tf-backend-webgpu.es2017.js'

async function test(backend) {
  // initialize tensorflow
  if(await tf.setBackend(backend)){
    await tf.ready()

    const model = tf.sequential();
    model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
    const output = await model.predict(tf.tensor([1])).array();
    console.log(`The output of ${backend} is ${output}`); // webgpu prints 0, cpu prints a non-zero value
  }
  else{
    console.log(`${backend} is not set successfully!`);
  }
}

await test('webgpu');
await test('cpu');

command and output:

wp >>> ~/.deno/bin/deno run --allow-write --allow-read --allow-net --unstable mod3.ts                  22-09-21 13:03 
libEGL warning: pci id for fd 12: 102b:0522, driver (null)

WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
The output of webgpu is 0
The output of cpu is -0.8412973284721375

deno version:

wp >>> ~/.deno/bin/deno --version                                                                      22-09-21 13:04 
deno 1.25.0 (release, x86_64-unknown-linux-gnu)
v8 10.6.194.5
typescript 4.7.4
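
The mismatch in the log above (0 from webgpu, a non-zero value from cpu) can also be checked programmatically. The following is a minimal sketch and not part of TensorFlow.js: the helper names isAllZeros, maxAbsDiff and looksLikeSilentFailure, as well as the tolerance, are assumptions made for illustration. The sample values are copied from the output above.

```typescript
// Hypothetical helpers for illustration; not part of TensorFlow.js.

/** True when the array is non-empty and every value is zero (+0 or -0). */
function isAllZeros(values: ArrayLike<number>): boolean {
  for (let i = 0; i < values.length; i++) {
    if (values[i] !== 0) return false;
  }
  return values.length > 0;
}

/** Largest absolute element-wise difference between two equally sized outputs. */
function maxAbsDiff(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}

/** Flags the failure mode from this issue: one backend silently returns zeros. */
function looksLikeSilentFailure(
  candidate: ArrayLike<number>,
  reference: ArrayLike<number>,
  tolerance = 1e-3, // assumed threshold, tune for the model at hand
): boolean {
  return isAllZeros(candidate) && !isAllZeros(reference) &&
    maxAbsDiff(candidate, reference) > tolerance;
}

// Values taken from the log above.
const webgpuOutput = [0];
const cpuOutput = [-0.8412973284721375];
console.log(looksLikeSilentFailure(webgpuOutput, cpuOutput)); // true
```

In a real script the two arrays would come from calling data() on the output tensor of each backend.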

vicary · 2022-09-21T05:59:45Z

Could you please show me the steps to build Deno from source at commit 2929ec9f? I would like to use a local build of Deno to reproduce this issue the same way you did.

I followed the steps in their docs: https://deno.land/[email protected]/contributing/building_from_source

brew install rust llvm
git clone --recurse-submodules https://github.com/denoland/deno.git
cd deno
git checkout 2929ec9f  # optional, because the current HEAD should behave the same
I am using M1 and I don't care about debug info & profiling, so I built with two more options:
cargo build -vv --target=aarch64-apple-darwin --release
If the default target doesn't work on your Linux machine, you can look up the available build targets via:
rustc --print target-list
You may now run the resulting binary at ./target/*/release/deno

You may use -A to allow all permissions: deno run -A --unstable mod3.ts

vicary · 2022-09-29T07:23:58Z

Gently pinging @haoyunfeix, is there anything I can do to help move this forward?

haoyunfeix · 2022-09-29T19:15:16Z

@vicary Sorry for the delay!

There seem to be some shader validation differences between the browser and Deno.
Here is a shader generated by tfjs: shader.txt. It runs well on Chrome Canary 108.0.5329.0 (V8 10.8.79), but it hits several validation errors in Deno and produces zeros. The shader was obtained with this code:

import '../dist/bin/tfjs-core/tfjs-core_pkg/dist/tf-core.es2017.js'
import '../dist/bin/tfjs-backend-webgpu/tfjs-backend-webgpu_pkg/dist/tf-backend-webgpu.es2017.js'

import * as tf from 'https://cdn.skypack.dev/@tensorflow/tfjs'

async function test(backend) {
  // initialize tensorflow
  if (await tf.setBackend(backend)) {
    await tf.ready()

    const a = tf.tensor2d([1, 2, -3, -4], [2, 2]);
    const b = tf.tensor1d([1, 2]);

    tf.env().set('WEBGPU_CPU_FORWARD', false);
    let c = tf.add(a, b);
    console.log(await a.data());
    console.log(await b.data());
    console.log(await c.data());
    console.log(tf.getBackend());
  } else {
    console.log(`${backend} is not set successfully!`);
  }
}

await test('webgpu');
await test('cpu');

Error	Code Example
1	`const workGroupSizeX = 256u; const workGroupSizeY = 1u; const workGroupSizeZ = 1u;`
2	`@compute @workgroup_size(workGroupSizeX, workGroupSizeY, workGroupSizeZ)`
3	Calling another function from the entry point function

The errors happen at https://github.com/tensorflow/tfjs/blob/master/tfjs-backend-webgpu/src/webgpu_program.ts#L59
After I fixed these shader compile errors, I got the correct result, but I don't think it's a good idea to change the current shaders.

@qjia7 WDYT? Is this issue caused by a different version of WGSL?
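
For readers who want to try a local workaround for the first two errors, one option is to inline the constants into the workgroup_size attribute before the shader reaches the compiler. The sketch below is an assumption-laden illustration and not the TFJS code: the function names collectUintConsts and inlineWorkgroupSize are hypothetical, the regular expressions only handle simple unsigned integer constants, and comments inside the shader are not parsed.

```typescript
// Hypothetical preprocessing helpers; not part of TensorFlow.js, wgpu or Naga.

/** Collects declarations such as `const workGroupSizeX = 256u;` as name -> literal. */
function collectUintConsts(shader: string): Map<string, string> {
  const consts = new Map<string, string>();
  const pattern = /\bconst\s+(\w+)\s*=\s*(\d+)u?\s*;/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(shader)) !== null) {
    consts.set(match[1], match[2]);
  }
  return consts;
}

/**
 * Replaces constant names inside `@workgroup_size(...)` with their literal
 * values, then drops every collected declaration whose name is not
 * referenced anywhere else in the shader.
 */
function inlineWorkgroupSize(shader: string): string {
  const consts = collectUintConsts(shader);
  let result = shader.replace(
    /@workgroup_size\(([^)]*)\)/g,
    (_whole: string, args: string) => {
      const resolved = args.split(",").map((arg) => {
        const name = arg.trim();
        return consts.get(name) ?? name;
      });
      return `@workgroup_size(${resolved.join(", ")})`;
    },
  );
  consts.forEach((_value, name) => {
    const declaration = new RegExp(`\\bconst\\s+${name}\\s*=\\s*\\d+u?\\s*;\\s*`);
    const withoutDeclaration = result.replace(declaration, "");
    // Keep the declaration if the name is still used somewhere else.
    if (!new RegExp(`\\b${name}\\b`).test(withoutDeclaration)) {
      result = withoutDeclaration;
    }
  });
  return result;
}

const source = [
  "const workGroupSizeX = 256u; const workGroupSizeY = 1u; const workGroupSizeZ = 1u;",
  "@compute @workgroup_size(workGroupSizeX, workGroupSizeY, workGroupSizeZ)",
  "fn _start() {}",
].join("\n");

console.log(inlineWorkgroupSize(source));
// @compute @workgroup_size(256, 1, 1)
// fn _start() {}
```

This is string post-processing on the generated shader, so it leaves the shader generator itself untouched.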

qjia7 · 2022-09-30T00:31:11Z

@haoyunfeix @vicary I think Deno is relying on an older snapshot of WebGPU (WGSL), while tfjs-webgpu follows the latest WebGPU/WGSL spec. To resolve this issue, Deno needs to upgrade its underlying WebGPU implementation on your side.

qjia7 · 2022-09-30T00:34:18Z

@vicary Please let us know whether this issue can be fixed after upgrading webgpu in deno.

vicary · 2022-09-30T03:16:07Z

@qjia7 Deno is using deno_webgpu, a Rust implementation of WebGPU, and I think it is already on the latest version. Quoting from their README.md:

The spec is still very bare bones, and is still missing many details. As the spec becomes more concrete, we will implement to follow the spec more closely.

The Shader Support section in the monorepo also mentions partial support of the draft.

Note that the WGSL specification is still under development, so the draft specification does not exactly describe what wgpu supports. See below for details.

Sorry for my lack of knowledge of the WebGPU standard. It would be very kind of you to point out the earliest compatible version or the specific features required. I believe this would create a legitimate use case for them to work on those specific implementations.

qjia7 · 2022-09-30T04:04:55Z

@vicary From the error message provided by @haoyunfeix, this happens because wgpu is behind the latest WGSL spec. So maybe a better way is to report a bug to https://github.com/gfx-rs/wgpu/ so that it supports TFJS WebGPU's shaders, instead of working around this in TFJS-webgpu. What do you think?
@haoyunfeix Can you help file an issue with wgpu asking it to support the features we need, like const?

vicary · 2022-09-30T06:02:34Z

Thanks for the reply @qjia7, also please let me thank @haoyunfeix ahead of time for creating the issue upstream.

From the first two errors I gather that it's about the missing const keyword; I have updated the issue description with the related tracking issues in the wgpu repo.

I am not sure I understand the last error though. Maybe the Rust implementation requires the main function to be declared before the entry point? I want to get a deeper understanding of this issue and hopefully make a PR there if it's within my capabilities. @haoyunfeix would you mind sharing your temporary fix while you are preparing the upstream issue?

haoyunfeix · 2022-09-30T06:51:40Z

@qjia7 Since wgpu relies on naga for wgsl-in compilation, I submitted the const issue as gfx-rs/naga#2071

haoyunfeix · 2022-09-30T07:00:23Z

@haoyunfeix would you mind sharing your temporary fix while you are preparing the upstream issue?

Sure. My temporary fix is below: move the main implementation into the _start entry point.
Before:

        fn _start(@builtin(local_invocation_id) LocalId : vec3<u32>,
                  @builtin(global_invocation_id) GlobalId : vec3<u32>,
                  @builtin(num_workgroups) NumWorkgroups : vec3<u32>) {
          localId = LocalId;
          globalId = GlobalId;
          numWorkgroups = NumWorkgroups;
          main(getGlobalIndex());
        }

        fn main(index : i32)
       {
          // Fill in the shared memory buffer.
          let localIndex = i32(localId.x);
          if(localIndex < 2) {
            sharedBuf[localIndex] = f32(B[localIndex]);
          }
          workgroupBarrier();

          if(index < uniforms.size) {
            let coords = getCoordsFromIndex(index);
            let a = getAByOutputIndex(index);
          let b = sharedBuf[coords[1]];
            setOutputAtIndex(index, binaryOperation(a, b));
          }
        }

After:

        fn _start(@builtin(local_invocation_id) LocalId : vec3<u32>,
                  @builtin(global_invocation_id) GlobalId : vec3<u32>,
                  @builtin(num_workgroups) NumWorkgroups : vec3<u32>) {
          localId = LocalId;
          globalId = GlobalId;
          numWorkgroups = NumWorkgroups;
          var index = getGlobalIndex();
          let localIndex = i32(localId.x);
          if(localIndex < 2) {
            sharedBuf[localIndex] = f32(B[localIndex]);
          }
          workgroupBarrier();

          if(index < uniforms.size) {
            let coords = getCoordsFromIndex(index);
            let a = getAByOutputIndex(index);
          let b = sharedBuf[coords[1]];
            setOutputAtIndex(index, binaryOperation(a, b));
          }
        }

You can see that other functions outside _start, like getCoordsFromIndex and getAByOutputIndex, compile and work well, so maybe we shouldn't call a function named main? I'll double-check.

haoyunfeix · 2022-09-30T07:14:40Z

Oh, it seems the function must be declared before the entry point AND before @compute @workgroup_size(256, 1, 1).
The shader below works.

...
        var<workgroup> sharedBuf : array<f32, 2>;
        fn main(index : i32){
          // Fill in the shared memory buffer.
          let localIndex = i32(localId.x);
          if(localIndex < 2) {
            sharedBuf[localIndex] = f32(B[localIndex]);
          }
          workgroupBarrier();

          if(index < uniforms.size) {
            let coords = getCoordsFromIndex(index);
            let a = getAByOutputIndex(index);
          let b = sharedBuf[coords[1]];
            setOutputAtIndex(index, binaryOperation(a, b));
          }
        }


  @compute @workgroup_size(256, 1, 1)
  //@compute @workgroup_size(workGroupSizeX, workGroupSizeY, workGroupSizeZ)

        fn _start(@builtin(local_invocation_id) LocalId : vec3<u32>,
                  @builtin(global_invocation_id) GlobalId : vec3<u32>,
                  @builtin(num_workgroups) NumWorkgroups : vec3<u32>) {
          localId = LocalId;
          globalId = GlobalId;
          numWorkgroups = NumWorkgroups;
          main(getGlobalIndex());
        }
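
The reordering shown above can be sketched as a small string transformation that moves the entry point to the end of the shader text, so that every helper it calls is declared earlier. This is only an illustration under stated assumptions: moveEntryPointLast is a hypothetical name, the sketch assumes a single @compute entry point, and it does not parse comments (a commented-out @compute or a brace inside a comment would confuse it). It is not the change that was made in TFJS.

```typescript
// Hypothetical helper for illustration; not part of TensorFlow.js or Naga.

/**
 * Moves the `@compute` entry point (attribute plus function body) to the end
 * of the shader so that helpers such as `main` appear before it in the text.
 */
function moveEntryPointLast(shader: string): string {
  const start = shader.indexOf("@compute");
  if (start === -1) return shader; // no entry point found

  // Find the opening brace of the entry point body, then its matching close.
  const open = shader.indexOf("{", start);
  if (open === -1) return shader;
  let depth = 0;
  let end = -1;
  for (let i = open; i < shader.length; i++) {
    if (shader[i] === "{") {
      depth++;
    } else if (shader[i] === "}") {
      depth--;
      if (depth === 0) {
        end = i + 1;
        break;
      }
    }
  }
  if (end === -1) return shader; // unbalanced braces: leave the input untouched

  const entryPoint = shader.slice(start, end).trim();
  const rest = (shader.slice(0, start) + shader.slice(end)).trim();
  return rest.length > 0 ? `${rest}\n\n${entryPoint}\n` : `${entryPoint}\n`;
}

const before = [
  "@compute @workgroup_size(256, 1, 1)",
  "fn _start() {",
  "  main(0);",
  "}",
  "",
  "fn main(index : i32) {",
  "  if (index < 1) { }",
  "}",
].join("\n");

const after = moveEntryPointLast(before);
console.log(after.indexOf("fn main") < after.indexOf("fn _start")); // true
```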

gyagp · 2022-09-30T07:43:01Z

Maybe this is a bug in Mozilla's Naga? I think we call _start (the entry point) after the declarations of main and _start, which should be fine. At least Tint also thinks it's OK. @haoyunfeix or @vicary, could you please also file a bug with them?

vicary · 2022-09-30T09:14:14Z

Issue created. I don't have time to dig into naga yet. If it turns out that lexical scoping is the root cause, fixing it may be a breaking change for them, and we may not see it happen as soon as const.

Interpreters with function scoping should be compatible with lexical scoping, so moving fn main upwards should increase compatibility without apparent downsides.

What do you think? @gyagp @haoyunfeix
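
The scoping difference discussed above can be illustrated with a rough check that lists functions called textually before their declaration. A compiler with module-level scoping accepts such calls, while one that resolves names top to bottom rejects them. This is a sketch under assumptions: findForwardCalls is a hypothetical name, the check is regex-based rather than a real WGSL parser, and comments are not skipped.

```typescript
// Hypothetical helper for illustration; not part of TensorFlow.js or Naga.

/**
 * Returns the names of functions that are called before the position of
 * their `fn` declaration in the shader text.
 */
function findForwardCalls(shader: string): string[] {
  // Record where each function is declared (first declaration wins).
  const declaredAt = new Map<string, number>();
  const declPattern = /\bfn\s+(\w+)\s*\(/g;
  let decl: RegExpExecArray | null;
  while ((decl = declPattern.exec(shader)) !== null) {
    if (!declaredAt.has(decl[1])) declaredAt.set(decl[1], decl.index);
  }

  // Any `name(` that appears before the declaration of `name` is a forward call.
  const forward: string[] = [];
  const callPattern = /\b(\w+)\s*\(/g;
  let call: RegExpExecArray | null;
  while ((call = callPattern.exec(shader)) !== null) {
    const name = call[1];
    const position = declaredAt.get(name);
    if (
      position !== undefined && call.index < position &&
      forward.indexOf(name) === -1
    ) {
      forward.push(name);
    }
  }
  return forward;
}

const generated = [
  "fn getGlobalIndex() -> i32 { return 0; }",
  "fn _start() {",
  "  main(getGlobalIndex());",
  "}",
  "fn main(index : i32) { }",
].join("\n");

console.log(findForwardCalls(generated).join(", ")); // main
```

An empty result means the shader text is already ordered in a way that a top-to-bottom resolver can handle.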

…6918)
* [webgpu] Update shader to support non module-level scoping function
  FIXES #6842 To support shader translation library which does not implement module scoping like naga
* Unify kernels use main() to generate user function and getStartHenderString() to make entry point function
* Use isFlatPatchLayout to determine main header And address comments
* remove unnecessary scope checking

haoyunfeix · 2022-10-13T09:03:50Z

Interpreters with function scoping should be compatible with lexical scoping, so moving fn main upwards should increase compatibility without apparent downsides.

What do you think? @gyagp @haoyunfeix

@vicary Done with #6918. For shader compiler issues 1 and 2 mentioned above, we do not intend to fix them in TFJS but will track them in the naga project (gfx-rs/naga#2071 and gfx-rs/naga#2080).

BTW, I did try to fix 1 and 2 in TFJS (https://github.com/tensorflow/tfjs/compare/master...haoyunfeix:tfjs:test_6842?expand=1) and was glad to see that WebGPU on Deno gets the same result as the CPU backend. I have posted all resources (updated webgpu build and test code) here in case you are interested.

upload.zip

vicary added the type:bug Something isn't working label Sep 20, 2022

rthadur assigned haoyunfeix Sep 21, 2022

vicary mentioned this issue Sep 30, 2022

[wgsl-in] allow functions to be declared literally after entry point gfx-rs/naga#2072

Closed

This was referenced Oct 9, 2022

[webgpu] Update shader to support non module-level scoping function #6918

Merged

workgroup_size attribute does not support variable expression gfx-rs/naga#2080

Closed

haoyunfeix added a commit to haoyunfeix/tfjs that referenced this issue Oct 12, 2022

[webgpu] Update shader to support non module-level scoping function

1f5c878

FIXES tensorflow#6842 To support shader translation library which does not implement module scoping like naga

qjia7 closed this as completed in #6918 Oct 13, 2022

haoyunfeix mentioned this issue Dec 9, 2022

tfjs-models: make pose-detection deno compatible #6746

Closed

vicary mentioned this issue Jan 29, 2023

Update wgpu for TersorflowJS WebGPU support denoland/deno#17580

Closed
