rfcs/0022-tir-non-scalar-constants.md

- Feature Name: tir_non_scalar_constants
- Start Date: 2021-06-01
- RFC PR: https://github.com/apache/tvm-rfcs/pull/22
- GitHub Issue: TBD

# 1. Summary

This RFC proposes how non-scalar constants could be represented in TIR and used by passes in the lowering process.

# 2. Motivation

Currently, non-scalar constants are represented in Relay (relay.Constant) and used by Relay passes, but they have no representation in TIR. Therefore, when lowering with TIR passes, we have to maintain a side channel mapping each tir::Var to its constant non-scalar data in order to perform transformations that could use the knowledge that some of the data are constants.

A few example scenarios serve as further motivation:

## Weight compression

When lowering for accelerators (see [Arm(R) Ethos(TM)-U NPU](https://github.com/apache/tvm-rfcs/pull/11)), certain operations need to be tiled to co-optimize performance and memory utilization. Such tiling patterns create slices of weights that need compression and that end up with varying sizes. Therefore, the knowledge that some tir::Vars refer to constants is critical at the TIR level to perform this.

## Memory Planning

The TIR program can express both inter- and intra-operator memory requirements post-scheduling, as explained further by the [Unified Static Memory Planning RFC](https://github.com/apache/tvm-rfcs/pull/9). It would be better if the constants could be embedded in the TIR PrimFunc, because the memory for constants then becomes visible to the memory planner. Moreover, this allows various [target-dependent lowerings](https://github.com/apache/tvm-rfcs/pull/10) to produce TIR PrimFuncs with target-specific constants in them.

## Winograd Constants

The Winograd transformation (used for fast GEMMs) involves multiplication by a hard-coded constant tensor. This is currently accomplished using a complicated TE compute expression with many nested selects. Being able to directly express a constant tensor here would significantly simplify this code. See https://github.com/apache/tvm/blob/9df2ae8eaa8b394013182a7ad09ac57fe401f80e/python/tvm/topi/utils.py#L320-L350.
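For illustration only, a minimal plain-Python sketch (hypothetical helper name, not the actual topi code) of the nested-select encoding: each matrix element is picked by a chain of index comparisons, mimicking the nested `Select` expressions that the linked `const_matrix` helper emits today.

```python
def const_matrix_select(matrix):
    """Build a function (i, j) -> matrix[i][j] out of a chain of
    conditional expressions, mimicking the nested Select chain that a
    TE compute expression uses to encode a constant matrix."""
    rows, cols = len(matrix), len(matrix[0])

    def select(i, j):
        expr = 0.0  # fallback value, like the innermost Select arm
        # Build the chain from the last element backwards so that the
        # matching (i, j) comparison wins, as nested Selects do.
        for r in reversed(range(rows)):
            for c in reversed(range(cols)):
                expr = matrix[r][c] if (i == r and j == c) else expr
        return expr

    return select
```

With a tir.allocate_const node, the whole comparison chain collapses into a single constant buffer plus an ordinary indexed load.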


# 3. Guide-level explanation

This is not particularly a user-facing feature; it allows constants to be 'linked' into TIR. Initially, tir.allocate_const nodes will only be created during scheduling when -link-params is included in the Target (e.g. passed to relay.build or to TVMC).

# 4. Reference-level explanation

The proposal is quite simple and can be explained as follows:

```python
@tvm.script.tir
def myfunc():
    param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10])
```

This closely follows the semantics of tir.allocate, the difference being that it represents a buffer filled with constants.
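As a rough analogy only (plain Python with hypothetical helper names, not the TVM API), the difference can be pictured as writable scratch storage versus pre-filled, read-only storage:

```python
def allocate(dtype, extents):
    """Analogy for tir.allocate: a scratch buffer whose contents are
    undefined until written (zero-init stands in for 'uninitialized')."""
    n = 1
    for e in extents:
        n *= e
    return [0] * n

def allocate_const(values, dtype, extents):
    """Analogy for tir.allocate_const: the data is known at compile
    time and immutable, so a tuple stands in for read-only storage."""
    n = 1
    for e in extents:
        n *= e
    assert len(values) == n, "constant data must match the extents"
    return tuple(values)
```

A pass that sees the constant form can rely on the values themselves, e.g. to compress weight slices, which is exactly the knowledge the side channel currently carries.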
**Comment (Member):**

Perhaps it is worthwhile to discuss a bit more about the semantics :-)

Will the constants be allocated on stack or on heap? Is this designed for small matrices (e.g. the small matrix in winograd), or relatively larger matrices (e.g. the weight that needs prefetching)? How will lowering and code generation be affected? Does it work for GPU and other devices? How does it affect linkers' job?

**Reply (@manupak, Contributor Author, Aug 31, 2021):**

I don't think where constants are allocated needs to be decided here based on how we decide to represent constants in TIR.

The proposal here is to represent constants directly in the TIR PrimFunc, so that each target can lower them the way it wants (if it supports that).

> Will the constants be allocated on stack or on heap?

So currently, where we want to 'link' in the parameters, they will be generated as part of the runtime.Module and linked via the executor: https://github.com/apache/tvm/pull/6917/files.
For CPU targets, they will go to a .rodata section, where the constants are held to keep compatibility with the linked-param code generation that exists today.

I'm not sure we want them on the stack or the heap; however, we might want them in different sections if the system has more non-volatile memories.

> Is this designed for small matrices (e.g. the small matrix in winograd), or relatively larger matrices (e.g. the weight that needs prefetching)?

This is size-agnostic; therefore, I'd not expect a difference here.

> How will lowering and code generation be affected?

This will only be supported (at least initially, as agreed with @tqchen) by targets that support code generation for constants (currently this uses the link-params target argument). Therefore, if a target has this capability (and it is enabled in a given compilation flow), we go with the assumption that the target knows how to generate code for the constants used by the operator.

> Does it work for GPU and other devices?

I don't think GPU is a target that (currently) supports code generation for constants; therefore, the constants will live in the tvm_main function (as relay.Constants).

> How does it affect linkers' job?

So there are mainly two ways the linkers' job could be affected, AFAIK:

1. If code generation for constants is supported by the respective target, we'll assume the code will be generated with appropriate sections (if it is C-like) or consumed in whatever other artifact expects the constants to be embedded. If the target supports neither, then it is not a target that requires constants 'link'ed into the TIR.

2. If the USMP is invoked, it will pool all the constants, pull them out of tvm_main, and expose them to the application layer.

For basic and most common use cases, the constant pools will be generated in the metadata module (see U1 of https://github.com/apache/tvm-rfcs/blob/c520a3912279bcd3c432dafd4070661f614882cf/rfcs/0009_Unified_Static_Memory_Planning.md).

For certain use cases (see U3 of https://github.com/apache/tvm-rfcs/blob/c520a3912279bcd3c432dafd4070661f614882cf/rfcs/0009_Unified_Static_Memory_Planning.md), this would happen where the user writes the application.

**Comment (Contributor):**

@junrushao1994 does this answer your questions?

It does seem like we could specify the placement of the parameters in terms of memory pool/scope in the tir.allocate_const node if we wanted to, though. The main use case I could see for this is marking weights to be loaded into e.g. accelerator memory. I do think that sort of thing is related to this proposal, but I don't think anything here stops that feature from being implemented later on, so I think we can continue with the proposal as written.

**Comment (@junrushao, Member, Sep 8, 2021):**

Thanks for answering my questions! It is super helpful for me to understand the motivation and scope of this RFC :-)

I like the idea of having tir.allocate_constant, which I have dreamed of before, and of course I am not going to be against it. I was just really cautious about extending an IR with new nodes, because it usually means breaking most of the passes, which usually don't handle unknown node types; this might impact the overall TVM stack, as well as the compilation workflow for various architectures and use cases.

Here are my thoughts (I'm sure I am limited in certain aspects, so please feel free to disagree and propose alternatives):

A. To be consistent with the unified IR effort, what if we used the IRModule-level attributes that @electriclilies has been pushing for recently, and put those parameters in as IRModule attributes? They could be directly translated into the .rodata section in LLVM-based codegen, which both the CPU target and the CUDA host target use. We could do some small refactors to move both Relay- and TIR-level tensor constants to IRModule attributes in a unified approach.

B. tir.allocate_constant needs proper support from the TVMScript parser and printer. Otherwise, the round-trippable workflow will be broken once this node is introduced. In this case, we might need to think about supporting parsing of tensors (or meta info of tensors if the actual parameters are not linked in yet).

C. On architectures other than CPUs, like GPUs, my quick thought is: at least we can lower this into a call that copies a chunk of data onto the GPU using TVM's C API. Let me know if it makes sense.

D. If I understand correctly, because the data is generated into the .rodata section, which is loaded directly into certain memory, it's not going to be on the stack. Also, there is probably a zero-copy mechanism so that we don't waste any extra copy, which sounds cool to me.

E. On Winograd weights, my second thought is that if we are able to identify that a buffer is constant in tir.Load, then we can actually simplify the tir.Load into its corresponding scalar constant, which IMO perfectly resolves the question I asked above. What do you think?
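The simplification suggested in point E can be sketched as follows. This is a hypothetical, plain-Python illustration over a toy expression tree, not the actual TIR node types: if a load's buffer is known constant and the index is a literal, the load folds to the scalar value.

```python
def fold_constant_loads(expr, const_buffers):
    """Fold loads from known-constant buffers into scalar constants.
    `expr` is a tiny toy AST of nested tuples:
      ("load", name, index) | ("add", lhs, rhs) | ("int", value)."""
    kind = expr[0]
    if kind == "load":
        _, name, idx = expr
        # The key step: a load from a constant buffer at a literal
        # index is just the stored scalar.
        if name in const_buffers and isinstance(idx, int):
            return ("int", const_buffers[name][idx])
        return expr
    if kind == "add":
        lhs = fold_constant_loads(expr[1], const_buffers)
        rhs = fold_constant_loads(expr[2], const_buffers)
        if lhs[0] == "int" and rhs[0] == "int":
            # Both operands became constants, so fold the add too.
            return ("int", lhs[1] + rhs[1])
        return ("add", lhs, rhs)
    return expr
```

Applied to the Winograd case, this would let the nested-select constant matrix dissolve into plain scalars at compile time.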

**Reply (Contributor Author):**

Since this is more like a summary of your thoughts, let me answer below as a main comment.

There are mainly two ways constants are created during lowering:

A1. Linking the params of the model (relay.Constants; currently, the model params would be in Relay as relay.Constant nodes).

A2. Creation/mutation of constants during lowering; these may differ from the original constants prior to scheduling the Relay into TIR.

For A1, this should only be done if the target supports code generation of the constant data (i.e. supports --link-params) as part of the operator runtime.Module. Therefore, this is executor-independent.

For A2, when lowering for targets that support constants as part of the operators, new (differently sized) constants may be created due to optimizations such as weight compression, as required by the target.


### IRNode Definition

```c++
class AllocateConstNode : public StmtNode {
 public:
  /*! \brief The buffer variable. */
  Var buffer_var;
  /*! \brief The optional data associated with the constant.
      This is mutually exclusive with irmod_storage_idx. */
  Optional<NDArray> data;
  /*! \brief If the PrimFunc containing the Stmt is added to an IRModule,
      this is an optional index indicating the position within the
      "Constants" attribute (an Array<NDArray>) of the IRModule. */
  Optional<Integer> irmod_storage_idx;
  /*! \brief The type of the buffer. */
  DataType dtype;
  /*! \brief The extents of the buffer. */
  Array<PrimExpr> extents;
  /*! \brief The body to be executed. */
  Stmt body;
};

// The constructor to create an IRNode with constant data.
// Depending on the type of ObjectRef, it will create an
// AllocateConstNode with either irmod_storage_idx or data.
AllocateConst(Var buffer_var,
              DataType dtype,
              Array<PrimExpr> extents,
              ObjectRef data_or_idx,
              Stmt body,
              Span span);
```

**Comment (@mbs-octoml, Contributor, Sep 28, 2021):**

nit: Suggest we make `data` and `irmod_storage_idx` mutually exclusive. I assume you'll want a Pass to hoist (and share?) the data into the IRModule "Constants" attribute, in which case you'll rewrite from `data` to `irmod_storage_idx`.

This is almost identical to how the Relay parser uses the 'meta' syntax and MetaTable to refer to and resolve relay.Constants, except:

- The array is keyed by 'relay.Constant'.
- The array holds Relay Constants, not NDArrays.
- The MetaTable is just there to help parsing and is discarded.

This suggests we should similarly move the contents of the MetaTable into IRModule attributes (keyed by "Constants"), and allow a Relay Constant to similarly represent the NDArray immediately or via an index.

If that were already in place, I'd suggest we replace `data` and `irmod_storage_idx` with just a Relay Constant so that constants can easily make the hop from Relay to TIR. Could you capture that under the alternatives considered so I don't forget it?

**Reply (Contributor Author):**

Hi @mbs-octoml, thanks for taking a look.

I've made a change to make them mutually exclusive.

I think the world we want to move to is one in which we hold NDArrays and cross-reference them from both Relay and TIR. Therefore, it would be a bit confusing to discuss relay.Constant in this RFC, as we would be taking a 'future' precedent on what relay.Constants should become. I'd suggest we discuss that in the future RFC in which we make relay.Constants actually work that way.

It would be great to finalize this so we can amend the PRs.


### Storage of constants

Due to concerns about future expansion of centralized storage of constants and about adding alternative methods of parsing in constants (other than parsing TVMScript), we have decided to store the constants as an IRModule attribute, *when the PrimFunc is added to the IRModule*.

This will go under a "Constants" key in the DictAttrs, where the value is an Array\<NDArray>. However, the constants are only meant to be accessed via tir.allocate_const(...) nodes in TIR.


* If the constants are created within passes, IRModule::Add(...) for a PrimFunc needs to traverse the Stmts to pick up each NDArray, add it to the "Constants" IRModule attribute (Array\<NDArray>), and populate *irmod_storage_idx*.

* If the constants are present in the IRModule before the PrimFunc is created, then the ObjectRef (for the NDArray) and the index of the constant in the "Constants" IRModule attribute (Array\<NDArray>) have to be populated.
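The hoisting step in the first bullet can be sketched as follows. This is a hypothetical plain-Python model (`AllocateConst` and `hoist_constants` are illustrative names, not the actual C++ implementation): inline `data` is moved into the module-level "Constants" array and each node is rewritten to reference it by index.

```python
class AllocateConst:
    """Toy stand-in for AllocateConstNode: holds either inline data
    or an index into the module's "Constants" attribute."""

    def __init__(self, data=None, irmod_storage_idx=None):
        # data and irmod_storage_idx are mutually exclusive.
        assert (data is None) != (irmod_storage_idx is None)
        self.data = data
        self.irmod_storage_idx = irmod_storage_idx

def hoist_constants(stmts, module_attrs):
    """Traverse AllocateConst stmts, append inline data to the
    module-level "Constants" array, and rewrite each node to refer
    to its entry via irmod_storage_idx."""
    constants = module_attrs.setdefault("Constants", [])
    for stmt in stmts:
        if stmt.data is not None:
            constants.append(stmt.data)
            stmt.irmod_storage_idx = len(constants) - 1
            stmt.data = None
    return module_attrs
```

The second bullet is the inverse situation: the index is already known, so no traversal-time rewriting is needed.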




# 5. Drawbacks

* Not all targets need or benefit from handling code generation differently for constants.

  If we had to 'link' constants into TIR all the time, a subsequent pass might be needed to pull them out. However, it is clearer to 'link' constants only where the target supports, and benefits from, having them expressed in TIR.

* IRModule::Add(...) for TIR PrimFuncs needs to traverse the statements to add the constants to the IRModule if they do not already reference constants present in the IRModule "Constants" attribute.

# 6. Alternatives and Discussion

## Different way of representations

This is initiated from the discussion on [#8472](https://github.com/apache/tvm/pull/8472).

C1:
```python
@tvm.script.tir
def myfunc():
    tir.attrs({
        "link_params": {"model0": array}
    })
    my_param_var = tir.get_link_param("model0")
```
C2:
```python
@tvm.script.tir
def myfunc():
    tir.attrs({
        "link_params": {my_param_var: array}
    })
```
C3:
```python
@tvm.script.tir
def myfunc():
    param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10])
```

C1 and C2 do not need the addition of an IR node; however, they need special handling in the passes to figure out whether something is a constant.

C3 adds a new IR node, but seems the most straightforward way to represent constants close to the compute.

## Different IR node names

D1: tir.constant
D2: tir.allocate_const

D1 matches relay.Constant more closely, while D2 shows the similarity to the tir.allocate node, the difference being that the data is constant.