@junrushao
Member

In TE, a unit loop can be introduced by fusing an empty list of loops on a stage. This PR adds the TIR counterpart, but makes it more explicit: a new schedule primitive adds a unit loop (a loop of extent 1) without affecting any existing functionality.
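To illustrate why adding a unit loop is safe, here is a minimal sketch in plain Python. It does not use TVM; `Block`, `Loop`, `wrap_in_unit_loop`, `run`, and `loops_of` are hypothetical names invented for this example, and the PR's actual primitive may differ in name and signature. The point is only that wrapping a statement in a loop of extent 1 leaves its behavior unchanged while giving the schedule a loop to refer to.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Block:
    """A leaf computation; `body` is called once each time the block runs."""
    name: str
    body: Callable[[], None]


@dataclass
class Loop:
    """A serial loop that runs its body `extent` times."""
    var: str
    extent: int
    body: Union["Loop", Block]


Stmt = Union[Loop, Block]


def wrap_in_unit_loop(stmt: Stmt, var: str = "u") -> Loop:
    """Hypothetical helper (not TVM's API): wrap `stmt` in a loop of extent 1."""
    return Loop(var=var, extent=1, body=stmt)


def run(stmt: Stmt) -> None:
    """Interpret the toy loop nest."""
    if isinstance(stmt, Block):
        stmt.body()
        return
    for _ in range(stmt.extent):
        run(stmt.body)


def loops_of(stmt: Stmt) -> List[str]:
    """Collect loop variable names from outermost to innermost."""
    names = []
    while isinstance(stmt, Loop):
        names.append(stmt.var)
        stmt = stmt.body
    return names
```

In this model a bare block reports no loops, while the wrapped version reports exactly one, and running either executes the block body once.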

@junrushao junrushao merged commit 9d2c9a7 into apache:main Jun 5, 2022
junrushao added a commit to junrushao/tvm that referenced this pull request Jun 5, 2022
Following apache#11575, this PR allows CUDA thread binding for TIR programs
like

```python
@T.prim_func
def zero_dim_add(
    A: T.Buffer[(), "float32"],
    B: T.Buffer[(), "float32"],
    C: T.Buffer[(), "float32"],
) -> None:
    with T.block("C"):
        vi = T.axis.spatial(1, 0)
        C[()] = A[()] + B[()]
```

where there is no loop available to be bound to threadIdx/blockIdx.
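The problem described above can be sketched with a toy model in plain Python. This is not TVM code: `ScheduledBlock`, `Loop`, `bind`, `add_unit_loop`, and `bound_thread_count` are hypothetical names used only for illustration, under the assumption that thread binding always targets a loop. A block with no loops has nothing to bind, and adding a loop of extent 1 supplies a target without changing how many times the block runs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Loop:
    var: str
    extent: int
    thread_tag: Optional[str] = None  # e.g. "threadIdx.x" once bound


@dataclass
class ScheduledBlock:
    name: str
    loops: List[Loop] = field(default_factory=list)  # outermost first


def bind(block: ScheduledBlock, loop_index: int, thread_tag: str) -> None:
    """Toy binding: tag an existing loop; fails if the block has no loops."""
    if not block.loops:
        raise ValueError(f"block {block.name!r} has no loop to bind to {thread_tag}")
    block.loops[loop_index].thread_tag = thread_tag


def add_unit_loop(block: ScheduledBlock) -> Loop:
    """Hypothetical helper: add an outermost loop of extent 1 around the block."""
    loop = Loop(var=f"u{len(block.loops)}", extent=1)
    block.loops.insert(0, loop)
    return loop


def bound_thread_count(block: ScheduledBlock) -> int:
    """Product of the extents of all thread-bound loops."""
    count = 1
    for loop in block.loops:
        if loop.thread_tag is not None:
            count *= loop.extent
    return count
```

For a zero-dimensional block like `C` above, calling `bind` first raises `ValueError` in this model; after `add_unit_loop`, the same call succeeds and the block is launched with a single thread.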
spectrometerHBH pushed a commit that referenced this pull request Jun 5, 2022

3 participants