[0052] Experimental DXIL Ops #698
Merged
bce13da
[xxxx] Experimental DXIL
V-FEXrt 7e09e2e
Address comments
V-FEXrt b3aab53
Update with examples
V-FEXrt 527a339
Apply suggestions from code review
V-FEXrt fe4dfac
address comment
V-FEXrt 6dfa23c
Address comments
V-FEXrt 36cdeb0
word wrap
V-FEXrt d95bb70
Reserve proposal number
V-FEXrt 9a9b372
Reserve proposal number, but different
V-FEXrt 72fd31d
Update proposals/0052-experimental-dxil-ops.md
V-FEXrt
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,283 @@ | ||
| --- | ||
| title: "XXXX - Experimental DXIL" | ||
| params: | ||
| authors: | ||
| - V-FEXrt: Ashley Coleman | ||
| - llvm-beanz: Chris Bieneman | ||
| - tex3d: Tex Riddell | ||
| sponsors: | ||
| - V-FEXrt: Ashley Coleman | ||
| status: Under Consideration | ||
| --- | ||
|
|
||
| * Planned Version: SM 6.10 | ||
|
|
||
| ## Introduction | ||
|
|
||
| This proposal introduces a method for denoting and tracking experimental DXIL | ||
| operations that minimizes churn when an operation is rejected or delayed to a | ||
| later DXIL version. | ||
|
|
||
| ## Motivation | ||
|
|
||
| During iterative development of the shader compiler it is beneficial to | ||
| implement real lowering into real opcodes to validate that a proposal actually | ||
| solves real world use cases. Traditionally this has been done by adding new | ||
| opcodes right after the last released opcode in the prior DXIL version. | ||
| In some cases this is sufficient, when feature development is unified, opcodes | ||
| don't change after being added, and all opcodes in a contiguous block starting | ||
| from the prior release are accepted into the next release. | ||
| But challenges arise during parallel feature development, from experimental | ||
| feature evolution requiring opcode changes, or when a feature and its opcodes | ||
| are excluded from the release while the opcodes following them are not. | ||
| Excluded opcodes must either be turned into reserved opcodes or a breaking DXIL | ||
| change must be synchronized between the compiler, tests, and drivers. | ||
| This proposal seeks to implement a systematic method to handle these issues. | ||
|
|
||
| ## Goals | ||
| This proposal seeks to address the following points: | ||
| * Needless churn when experimental ops are delayed or rejected | ||
| * Experimental feature boundaries are rigid and unaffected by SM updates | ||
| * Enable long term experiments and potentially extensions | ||
| * Focused on core API system (HLSL intrinsics and DXIL ops) | ||
| * Works within the current intrinsics/DXIL op mechanisms | ||
| * Minimizes overall changes to the system and IHV processes | ||
| * Straightforward transition route from experimental to stable | ||
| * Soft transitions between versions of experimental ops and final ops simplify migrations | ||
| * IHV drivers can support multiple experimental versions and the final version of a set of ops in the same driver | ||
|
|
||
| ## Non-goals | ||
| Future proposals may address the topics below but this proposal seeks to be a | ||
| smaller isolated change. It intends to solve immediate-term challenges | ||
| without investing significant engineering efforts into a generalized solution. | ||
| That said, an attempt is made to avoid proposals that preclude a generalized | ||
| solution. Thus this proposal explicitly avoids addressing these issues: | ||
| * Full scale generalized extension system | ||
| * Process development to enable asynchronous non-colliding development | ||
| * Metadata/RDAT/PSV0/Custom lowering are out of scope for this document | ||
|
|
||
|
|
||
|
|
||
| ## Existing DXIL Op and HLSL Intrinsic Infrastructure | ||
|
|
||
| In DXC, there exists a large amount of infrastructure for handling DXIL ops as special types of functions throughout the compiler. From definition to lowering to passes to validation and consumption, any solution that doesn't fit into this system will face significant challenges from development through to the transition of operations from experimental to final official DXIL ops in a shader model, both in the compiler and in drivers consuming the ops. | ||
|
|
||
| There is also a high-level intrinsic system which uses its own set of opcodes in the generated enum: `hlsl::IntrinsicOp`. Though these are internal to the DXC compilation pipeline, stability of these opcodes impacts any tests with high-level IR, such as tests for lowering. | ||
|
|
||
| This section outlines key areas of this system for clarity in reasoning about solutions. | ||
|
|
||
| ### DXIL Op Definitions | ||
|
|
||
| DXIL Ops are defined in `hctdb.py`, which is used by `hctdb_instrhelp.py` to generate header and cpp code used directly by drivers to consume the operations, as well as generate a variety of other code for the compiler, validator, DXIL spec, etc... | ||
|
|
||
| `DxilOperations.h/cpp` implements the core of the system for handling DXIL operations in a DxilModule. | ||
|
|
||
| DXIL OpCodes, which are always passed as a literal in the first argument of a DXIL operation call, are a contiguous set of values starting at 0, such that they may be used to directly index a table of opcode definitions at the core of this infrastructure. This OpCode argument in the DXIL Op call is the sole identifier of the operation being called. Function names reflect OpCodeClass and overloads, but this is only a means to prevent collisions between functions used by operations requiring different signatures and attributes. | ||
|
|
||
| The contiguous nature of DXIL OpCodes used to index into a table is the first key hurdle in defining experimental ops. If an operation at a particular index is changed in any significant way, the interpretation of IR across that change boundary produces undefined behavior (crash if you're lucky), with no automatic mechanism to guard against this. | ||
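The hazard described above can be illustrated with a minimal Python sketch. The table contents and the `lookup` helper are hypothetical and are not taken from `DxilOperations.h/cpp`; the sketch only assumes that consumers resolve an opcode by indexing a table directly.

```python
# Hypothetical opcode tables indexed directly by OpCode (all names are made up).
TABLE_BUILD_A = ["StableOp0", "StableOp1", "ExperimentalFoo"]
# A later build drops ExperimentalFoo and reuses index 2 for a different op.
TABLE_BUILD_B = ["StableOp0", "StableOp1", "ExperimentalBar"]


def lookup(table, opcode):
    """Resolve an opcode by direct indexing, as a consumer of the table would."""
    if not 0 <= opcode < len(table):
        raise ValueError(f"opcode {opcode} out of range")
    return table[opcode]


# IR produced by build A uses opcode 2 to mean ExperimentalFoo...
produced = lookup(TABLE_BUILD_A, 2)
# ...but a consumer built from B interprets the same literal as ExperimentalBar.
consumed = lookup(TABLE_BUILD_B, 2)
mismatch = produced != consumed
```

Nothing in the IR itself records which table it was produced against, so in this sketch the mismatch is only visible because both tables are in hand.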
|
|
||
| ### HLSL IntrinsicOp definitions | ||
|
|
||
| Intrinsic operations are normally defined in `gen_intrin_main.txt`, which is parsed by `hctdb.py` and used by `hctdb_instrhelp.py` to generate the `hlsl::IntrinsicOp` enum, and a bunch of tables used by custom intrinsic overload handling code in `Sema.cpp`. | ||
|
|
||
| There is infrastructure that tracks previously assigned HL op indices by intrinsic name in `hlsl_intrinsic_opcodes.json`. This can be a merge conflict point between any parallel feature development. | ||
|
|
||
| While indices are separated between functions and methods, all functions or all methods with the same name will share the same HL opcode. Generally this isn't a problem as the arguments (which would include an object) allow you to differentiate things when handling opcode calls. Recently a `class_prefix` attribute was added to the intrinsic definition syntax for `gen_intrin_main.txt` to prepend a class name, used for `DxHitObject`. This is just an example of how this system can be extended to separate out ops if necessary. | ||
|
|
||
| `HLOperationLower.cpp` uses a direct table lookup from the (unsigned) `IntrinsicOp` value to the lowering function and arguments. This creates another merge point for any experimental features (and potentially extensions), which integrate into the same intrinsic table. | ||
|
|
||
| There is an extension mechanism defined through the `IDxcLangExtensions` interface on the DXC compiler API object. It allows you to define a separate intrinsic table with predefined lowering strategies to produce extended ops as external function calls outside the recognized DXIL operations. It's meant to enable target extensions (extra intrinsics within certain limited definitional bounds) in HLSL for a custom backend. Modules using extensions wouldn't be accepted by the DXIL validator (unmodified). The way extensions must be defined, used, and interpreted differs significantly from adding built-in HLSL intrinsics and DXIL operations, which means it will introduce significant burdens and limitations to initial op definitions, lowering and compiler interaction, and make the transition to final DXIL operations painful. For these reasons, I don't think we should consider this extension mechanism as part of our solution at this time. | ||
|
|
||
| While this document focuses on a solution for DXIL ops, the HL opcodes can lead to difficult conflicts between independent feature development branches as well. Avoiding these requires synchronizing `hlsl_intrinsic_opcodes.json` and pre-allocated lowering table entries in `HLOperationLower.cpp` in a common branch as a very first step whenever adding any new HLSL intrinsics. | ||
|
|
||
| ### IR Tests | ||
|
|
||
| Tests that contain DXIL will have DXIL operation calls passing a literal `i32` OpCode value in as the first argument. If these opcodes are to change between experimental and final versions, there should be an easy way to update the tests accordingly. The same applies to the IntrinsicOp numbers in any high-level IR. | ||
|
|
||
| There are two places where hard-coded numbers appear in tests: source IR and FileCheck statements for checking output IR. | ||
|
|
||
| There isn't any known solution that doesn't involve a change to at least the DXIL OpCodes when transitioning from experimental to final DXIL ops. | ||
|
|
||
| That requires either updating these across all tests (potentially with scripted regex replacement - matching could be error-prone) or adding some tool (or tool option) to translate symbolic opcodes to literal numbers as a first step. | ||
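A minimal sketch of the second option (translating symbolic opcodes to literal numbers as a first step) might look like the following. The `OPCODE(Name)` placeholder syntax, the `expand_opcodes` helper, and the number 300 are all assumptions made for illustration; no such tool or syntax exists in DXC today.

```python
import re

# Hypothetical symbolic-name to number mapping; real values would come from
# the generated opcode tables.
OPCODES = {"ExperimentalFoo": 300}

# Hypothetical placeholder syntax: OPCODE(Name) in source IR or CHECK lines.
PLACEHOLDER = re.compile(r"OPCODE\((\w+)\)")


def expand_opcodes(text, opcodes):
    """Replace each OPCODE(Name) placeholder with its literal opcode number."""
    def substitute(match):
        name = match.group(1)
        if name not in opcodes:
            raise KeyError(f"unknown opcode name: {name}")
        return str(opcodes[name])

    return PLACEHOLDER.sub(substitute, text)


line = "%r = call i32 @dx.op.unary.i32(i32 OPCODE(ExperimentalFoo), i32 %a)"
expanded = expand_opcodes(line, OPCODES)
```

With a step like this, moving an op from an experimental number to a final number would mean changing one mapping entry rather than editing every test.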
|
|
||
| ### Summary of key elements a solution should address | ||
|
|
||
| - DXIL Op property table indexed by OpCode | ||
| - HLOperationLower table indexed by IntrinsicOp | ||
| - A way to update and deprecate experimental opcodes during development without a new opcode overlapping an old one, leading to undefined behavior in a driver if mismatched IR is used. | ||
| - A way for the same driver to accept multiple versions of ops without undefined behavior. | ||
| - A way to easily transition tests from experimental ops to final DXIL ops | ||
| - Potentially: A way to avoid some of the more difficult HL opcode conflicts between independent feature development branches | ||
| - Minimal, or ideally no, changes required to source code interacting with or consuming DXIL ops when transitioning from experimental to final ops. | ||
|
|
||
| ## Potential DXIL Op Solutions | ||
|
|
||
| ### Top 1 bit as "is experimental" flag | ||
|
|
||
| The top bit of all opcodes is a flag stating if the opcode is experimental. | ||
|
|
||
| No structural or shape changes to the DXIL occur, simply the fact that the opcode | ||
| has the high bit set informs that it is experimental. This makes it very easy | ||
| for the compiler and drivers to detect experimental opcodes. When an opcode is | ||
| transitioned to stable the opcode needs to be assigned a stable number. | ||
| This splits the 4 billion opcode space into two 2 billion partitions: one for | ||
| stable, one for experimental. | ||
|
|
||
| This is the simplest proposal with the least invasive set of changes. | ||
|
||
|
|
||
| Pros: | ||
| * Very simple | ||
| * Quick to implement | ||
| * Could be implemented "by hand" today by hard coding opcodes | ||
| Cons: | ||
| * Not a solution for extensions | ||
| * transition from experimental to stable isn't just unsetting the bit | ||
| * other stable ops may have already taken that number | ||
| * complicates the experimental->stable mapping | ||
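As a rough sketch of the flag scheme, assuming a 32-bit opcode: the helper names and the example numbers below are hypothetical, and the `STABLE_ASSIGNMENT` table only illustrates why the stable number has to be assigned explicitly rather than derived by clearing the bit.

```python
EXPERIMENTAL_BIT = 1 << 31  # top bit of a 32-bit opcode


def is_experimental(opcode):
    """True when the top bit marks the opcode as experimental."""
    return bool(opcode & EXPERIMENTAL_BIT)


def make_experimental(index):
    """Build an experimental opcode from an index in the lower 31 bits."""
    if not 0 <= index < EXPERIMENTAL_BIT:
        raise ValueError("index does not fit in 31 bits")
    return EXPERIMENTAL_BIT | index


exp_op = make_experimental(5)

# Clearing the bit yields 5, which may already belong to a stable op, so the
# final number comes from an explicit (here hypothetical) assignment instead.
cleared = exp_op & ~EXPERIMENTAL_BIT
STABLE_ASSIGNMENT = {exp_op: 310}
```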
|
|
||
| ### Top 8 bits as "opcode partition" value | ||
| This is pretty much identical to the 1 bit flag proposal except there are 256 | ||
| partitions with 16 million opcodes each. The key difference is that it unlocks | ||
| extension potential as extension developers such as IHVs could reserve a | ||
| partition for their own use without collision with other opcodes. | ||
|
|
||
| | Partition | Use | | ||
| |-----------|-----| | ||
| | 0 | stable | | ||
| | 1 | experimental | | ||
| | 2 | extension foo | | ||
| | .. | extension .. | | ||
| | 255 | extension 255 | | ||
|
|
||
|
|
||
| Pros: | ||
| * Fairly simple | ||
| * Quick to implement | ||
| * Enables basic opcodes extension system | ||
| Cons: | ||
| * transition from experimental to stable isn't just clearing the partition | ||
| * other stable ops may have already taken that number | ||
| * complicates the experimental->stable mapping | ||
|
|
||
| ### Top 16 bits as "opcode partition" value | ||
| Identical concept as above but with 64k partitions, each with 64k opcodes. | ||
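The 8-bit and 16-bit partition layouts differ only in where the opcode is split, which a small sketch can make concrete. `make_codec` and the constants below are hypothetical names, and a 32-bit opcode is assumed.

```python
def make_codec(partition_bits):
    """Return (encode, decode) for a 32-bit opcode with the given partition width."""
    index_bits = 32 - partition_bits
    index_mask = (1 << index_bits) - 1

    def encode(partition, index):
        if not 0 <= partition < (1 << partition_bits):
            raise ValueError("partition out of range")
        if not 0 <= index <= index_mask:
            raise ValueError("index out of range")
        return (partition << index_bits) | index

    def decode(opcode):
        # Split the opcode back into (partition, partition-local index).
        return opcode >> index_bits, opcode & index_mask

    return encode, decode


STABLE, EXPERIMENTAL = 0, 1  # partition numbers from the table above

encode8, decode8 = make_codec(8)     # 256 partitions, 16 million opcodes each
encode16, decode16 = make_codec(16)  # 64k partitions, 64k opcodes each

exp_op8 = encode8(EXPERIMENTAL, 7)
exp_op16 = encode16(EXPERIMENTAL, 7)
```

Existing stable opcodes decode to partition 0 under either width, so they keep their current numeric values in this sketch.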
|
|
||
| ### Split the opcode in half | ||
| Lower 16 bits are the core/stable opcodes, Upper 16 bits are the experimental opcodes. | ||
|
|
||
| Gives 64k opcodes for stable then the upper 64k can either be chunked manually | ||
| leaving all numbers available for opcodes, or it can be partitioned as 256 | ||
| chunks of 256 opcodes with the partition encoded into the opcode itself. | ||
|
|
||
| Very similar concept as before but keeping track of opcodes is complicated. | ||
| Also enables a weird situation where two opcodes "could" be encoded into a | ||
| single value. | ||
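One possible reading of that situation, sketched in Python under the assumption that each half is decoded independently (the `split` helper and the values are hypothetical):

```python
HALF_MASK = 0xFFFF


def split(value):
    """Return (stable_half, experimental_half) of a 32-bit opcode value."""
    return value & HALF_MASK, (value >> 16) & HALF_MASK


# Nothing in the encoding prevents both halves from being non-zero at once.
value = (9 << 16) | 4
stable_half, experimental_half = split(value)
ambiguous = stable_half != 0 and experimental_half != 0
```

A consumer would need an extra rule, such as rejecting values where both halves are set, to keep the encoding unambiguous.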
|
|
||
| ### Introduce dx.opx.opcodeclass for experimental/extended ops | ||
| Denotes the experimental status in the actual opcode. Potentially doubles the | ||
| opcode space depending on implementation; however, it doesn't make the transition | ||
| to stable any easier and complicates the integration with the current intrinsics | ||
| system. | ||
|
|
||
| Pros: | ||
| * Enables fairly robust extension system | ||
| * Doesn't consume large portions of the current opcode space | ||
| * obvious from reading the DXIL that experimental/extension is being used | ||
| Cons: | ||
| * transition from experimental to stable isn't just dropping the `x` | ||
| * other stable ops may have already taken that number | ||
| * complicates the experimental->stable mapping | ||
| * Not well integrated into the current system, would require notable dev work | ||
| * Unclear how to allocate extension vs experimental ops in the opx space | ||
|
|
||
| ### Extension/Experimental Feature Opcode | ||
| Relaxing the restriction that DXIL opcodes are immediate constants would allow | ||
| a call that returns a value representing a special operation. The call creates | ||
| the value from a feature ID and feature-local opcode. Uniquifying information | ||
| could be stored in the call directly or in metadata. | ||
|
|
||
| ```llvm | ||
| %feature_id = i32 123 | ||
| %cool_operation = i32 456 | ||
| %opcode = i32 dx.create.extensionop(%feature_id, %cool_operation) | ||
| %result = i32 dx.op.binary(%opcode, %a, %b) | ||
| ``` | ||
|
|
||
| Pros: | ||
| * Enables very robust extension system | ||
| * Doesn't consume any of the current opcode space | ||
| * Obvious from reading the DXIL that experimental/extension is being used | ||
| Cons: | ||
| * Transition from experimental to stable is non-trivial. [See here](#stabilizing-with-opcode-subsets) | ||
| * Not integrated into the current system, would require notable dev work | ||
| * Breaks a pretty fundamental DXIL assumption | ||
|
|
||
| ### Single Specific Experimental Opcode with varargs | ||
| A new opcode class `dx.op.extension` is introduced as a core stable opcode in | ||
| which named opcode subsets can be called directly. | ||
|
|
||
|
|
||
| ```llvm | ||
| %opcode_set = str "My Cool Experiment" | ||
| %opcode = i32 123 | ||
| %res = i32 dx.op.extension(i32 12345, %opcode_set, %opcode, operands...) | ||
| ``` | ||
|
|
||
| The opcode set name and specific opcode are just arbitrary values from other | ||
| parts of the compiled shader. | ||
|
|
||
| Pros: | ||
| * Doesn't consume any of the current opcode space | ||
| * Obvious from reading the DXIL that experimental/extension is being used | ||
| * Very flexible | ||
| * Maintains first args as immediate constant | ||
| * All the information is encoded in the call | ||
| Cons: | ||
| * Transition from experimental to stable is non-trivial. [See here](#stabilizing-with-opcode-subsets) | ||
| * Unclear how well the current system will handle varargs | ||
| * More complex to implement and integrate | ||
| * `dx.op.extension` will need to support any arbitrary overload | ||
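How a consumer might dispatch such a call can be sketched as a lookup keyed by set name and set-local opcode. Everything below is hypothetical: the `HANDLERS` registry, the `call_extension` helper, and the made-up behaviour of opcode 123 (integer addition) exist only to make the sketch executable, and 12345 is the placeholder value from the example above.

```python
EXTENSION_OPCODE = 12345  # placeholder core opcode from the example above

# Hypothetical registry: (opcode set name, set-local opcode) -> handler.
HANDLERS = {
    ("My Cool Experiment", 123): lambda a, b: a + b,
}


def call_extension(core_opcode, opcode_set, opcode, *operands):
    """Dispatch a dx.op.extension-style call to the handler for its subset op."""
    if core_opcode != EXTENSION_OPCODE:
        raise ValueError("not a dx.op.extension call")
    try:
        handler = HANDLERS[(opcode_set, opcode)]
    except KeyError:
        raise LookupError(f"unknown op {opcode} in set {opcode_set!r}") from None
    return handler(*operands)


result = call_extension(12345, "My Cool Experiment", 123, 2, 3)
```

Because the set name is part of the key, two sets can reuse the same set-local number without colliding.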
|
|
||
| #### Stabilizing with opcode subsets | ||
| Some proposals in this doc create new opcode sets that reuse existing numbers | ||
| nested under a set name or feature ID. These proposals have a more complex route | ||
| for transitioning from experimental to stable. There are two potential routes | ||
| to be considered. | ||
|
|
||
| * Create a new stable opcode from scratch using the normal mechanisms that | ||
| currently exist then migrate lowering paths to use it | ||
| * Maintain a notion of experimental and non-experimental opcode subsets then | ||
| update the specific subset to no longer be considered experimental keeping | ||
| all lowering the same | ||
|
|
||
| The first option has a larger churn burden but maintains the status quo and keeps | ||
| the generated code relatively dense while the second option is likely the easiest | ||
| transition system from any proposal in this document at the cost of code density | ||
| and introducing a second way for stable operations to exist in DXIL. | ||
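The second route can be sketched as a flag on the opcode subset, assuming a registry of named sets along the lines of the proposals above. `OpcodeSet`, `is_usable`, and the example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OpcodeSet:
    """A named opcode subset; all names and numbers here are hypothetical."""
    name: str
    experimental: bool = True
    ops: dict = field(default_factory=dict)  # set-local opcode -> op name


def is_usable(opcode_set, allow_experimental):
    """A stable set is always usable; an experimental one needs an opt-in."""
    return allow_experimental or not opcode_set.experimental


cool = OpcodeSet("My Cool Experiment", ops={123: "CoolOp"})
before = is_usable(cool, allow_experimental=False)

# Stabilize by flipping the flag; set-local opcode numbers are left untouched,
# so lowering that emits (set name, 123) does not need to change.
ops_before = dict(cool.ops)
cool.experimental = False
after = is_usable(cool, allow_experimental=False)
```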
|
|
||
| ## Potential HLSL Intrinsic Solutions | ||
| There are two types of intrinsic solutions that can be imagined. One where an | ||
| extension author provides external code that has a custom lowering to an | ||
| arbitrary extension DXIL op and one that is prebaked into the compiler and | ||
| conditionally enabled/disabled as appropriate. | ||
|
|
||
| As HLSL intrinsics are more flexible and can be reordered/renamed without | ||
| burning some finite resource only the second type is being considered at the | ||
| moment. The first type falls under "general purpose extension system" which is | ||
| out of scope for this document. | ||
|
|
||
| Intrinsic functions should be handled in a reasonable way. Ideally this means | ||
| that an intrinsic is only available if the experimental/extension op is also | ||
| available. Likely this means updating gen_intrin_main to mark an intrinsic as | ||
| experimental/extension then generating code that errors if it is used in a | ||
| non-experimental/non-extension environment. But that is subject to change based | ||
| on the DXIL solution chosen. Once a proposal is selected this section will be | ||
| updated to reflect that. | ||
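A minimal sketch of the gating idea, assuming intrinsics carry an experimental marker and the compiler knows whether an experimental environment is enabled. The intrinsic names, the `INTRINSICS` table, and `check_intrinsic` are hypothetical and do not reflect the actual `gen_intrin_main.txt` syntax or generated code.

```python
class ExperimentalIntrinsicError(Exception):
    """Raised when an experimental intrinsic is used without opting in."""


# Hypothetical table: intrinsic name -> whether it is marked experimental.
INTRINSICS = {"StableIntrinsic": False, "CoolExperimentalIntrinsic": True}


def check_intrinsic(name, experimental_enabled):
    """Return the name if the intrinsic may be used, otherwise raise."""
    if name not in INTRINSICS:
        raise KeyError(f"unknown intrinsic: {name}")
    if INTRINSICS[name] and not experimental_enabled:
        raise ExperimentalIntrinsicError(
            f"'{name}' requires an experimental environment"
        )
    return name
```

How `experimental_enabled` would be set is one of the outstanding questions in the next section, so the sketch takes it as a plain parameter.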
|
|
||
| ## Outstanding Questions | ||
|
|
||
| * Should DXC have some kind of --experimental flag that turns on/off | ||
| experimental intrinsics and DXIL ops? | ||
| * Related, when/how are experimental ops exposed in the compiler, when are they | ||
| errors to use? | ||
| * Should the validator warn on experimental op usage? | ||