
Commit 6c35f3f

[TIR][Schedule] Scoped CacheRead/Write producing compact region
This PR enhances CacheRead/Write so that when a cache operation is performed under an inner block, the generated cache buffer has a shape that is as compact as possible, determined by region consumption analysis.

The motivation for this change comes from dynamic-shape TIR scheduling, where we may isolate a "static shape" internal block using blockize and do further scheduling inside that internal block. In such cases, the current CacheRead/Write inside the static-shape block still produces dynamic-shape cache buffers, which is not ideal for analysis and subsequent scheduling.

One thing worth noting is that, to ensure IR correctness after inserting the cache block, we only compact the cache buffer when all the consumer blocks of the read buffer (for CacheRead) or the write buffer (for CacheWrite) are children blocks of the cache block insertion location. Otherwise we fall back to allocating the full-size cache buffer.

Co-authored-by: Bohan Hou <[email protected]>
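The compaction rule described in the commit message can be illustrated with a small standalone sketch. This is not TVM's implementation: the names `union_region` and `cache_buffer_shape`, the half-open `(min, max)` range representation, and the use of strings for symbolic dimensions are all assumptions made for illustration. It shows only the decision itself: take the bounding box of the consumed regions when every consumer lives under the insertion location, otherwise keep the full buffer shape.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical representation: one half-open [min, max) range per dimension.
Region = List[Tuple[int, int]]
# A dimension is either a static int or a symbolic name such as "n".
Dim = Union[int, str]


def union_region(regions: Sequence[Region]) -> Region:
    """Return the smallest bounding box that covers every given region."""
    if not regions:
        raise ValueError("need at least one consumed region")
    ndim = len(regions[0])
    return [
        (min(r[d][0] for r in regions), max(r[d][1] for r in regions))
        for d in range(ndim)
    ]


def cache_buffer_shape(
    full_shape: Sequence[Dim],
    consumer_regions: Sequence[Region],
    all_consumers_in_scope: bool,
) -> List[Dim]:
    """Pick the cache buffer shape under the rule sketched above.

    Compact only when every consumer block is a child of the insertion
    location; otherwise fall back to the full (possibly dynamic) shape.
    """
    if not all_consumers_in_scope:
        return list(full_shape)
    box = union_region(consumer_regions)
    return [hi - lo for lo, hi in box]


# Two consumers inside a static-shape inner block read overlapping tiles
# of a buffer whose full shape is dynamic ("n" by "m").
regions = [[(0, 16), (0, 32)], [(8, 24), (0, 32)]]
compact = cache_buffer_shape(["n", "m"], regions, all_consumers_in_scope=True)
full = cache_buffer_shape(["n", "m"], regions, all_consumers_in_scope=False)
```

In this toy case the compact shape is static even though the original buffer is dynamic, which is the property the commit message is after for scheduling inside a blockized static-shape block.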
1 parent 516c56b commit 6c35f3f


3 files changed, +403 -73 lines changed

src/tir/schedule/primitive.h

Lines changed: 2 additions & 2 deletions
@@ -105,7 +105,7 @@ TVM_DLL std::vector<int64_t> SamplePerfectTile(
  * The sampled tile size will be partitioned into two parts. The second part has a guarantee
  * that their extent's product have a factor of `innerpart_factor`. The first part is loops at
  * [0, partition_pos); the second part is loops at [partition_pos, n) and we will have
- * `innerpart_factor` | \prod_{l=partition_pos}^{n-1} l.extent
+ * `innerpart_factor` | prod_{l=partition_pos}^{n-1} l.extent
  *
  * \param rand_state The random state
  * \param extent The loop extent to be tiled
@@ -123,7 +123,7 @@ TVM_DLL std::vector<int64_t> SamplePartitionedTile(
  * The sampled tile size will be partitioned into two parts. The second part has a guarantee
  * that their extent's product have a factor of `innerpart_factor`. The first part is loops at
  * [0, partition_pos); the second part is loops at [partition_pos, n) and we will have
- * `innerpart_factor` | \prod_{l=partition_pos}^{n-1} l.extent
+ * `innerpart_factor` | prod_{l=partition_pos}^{n-1} l.extent
  *
  * \param rand_state The random state
  * \param loop_sref The loop to be tiled
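
The divisibility guarantee stated in the doc comments above (`innerpart_factor` divides the product of the extents at positions [partition_pos, n)) can be checked with a few lines of plain Python. This is only a sketch of the stated condition, not the sampling routine itself; the function name `satisfies_innerpart_factor` and the list-of-ints tile representation are assumptions for illustration.

```python
from math import prod
from typing import Sequence


def satisfies_innerpart_factor(
    tile_sizes: Sequence[int], partition_pos: int, innerpart_factor: int
) -> bool:
    """Check that innerpart_factor divides the product of the inner part.

    The inner part is the tile sizes at positions [partition_pos, n).
    """
    inner_product = prod(tile_sizes[partition_pos:])
    return inner_product % innerpart_factor == 0


# A 4-way tiling split after the second loop: the inner part is 8 * 4 = 32.
tiles = [2, 4, 8, 4]
ok_16 = satisfies_innerpart_factor(tiles, partition_pos=2, innerpart_factor=16)
ok_64 = satisfies_innerpart_factor(tiles, partition_pos=2, innerpart_factor=64)
```

Here `ok_16` holds because 16 divides 32, while `ok_64` does not, since 64 does not divide 32.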
