diff --git a/CMakeLists.txt b/CMakeLists.txt
index 448f0e3937..e16ab3f192 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -10,6 +10,11 @@ option(TTMLIR_ENABLE_RUNTIME "Enable runtime" OFF)
 option(TTMLIR_ENABLE_STABLEHLO "Enable StableHLO support" OFF)
 option(TTMLIR_ENABLE_OPMODEL "Enable OpModel support" OFF)
 option(TTMLIR_ENABLE_SHARED_LIB "Enable Shared lib building" ON)
+option(TTMLIR_ENABLE_DEBUG_STRINGS "Enable debug strings in flatbuffer" ON)
+
+if (TTMLIR_ENABLE_DEBUG_STRINGS)
+  add_compile_definitions(TTMLIR_ENABLE_DEBUG_STRINGS)
+endif()
 
 if (NOT TTMLIR_ENABLE_RUNTIME)
   set (TTMLIR_ENABLE_SHARED_LIB OFF)
diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
index 64a132680e..fcc96410d6 100644
--- a/docs/src/SUMMARY.md
+++ b/docs/src/SUMMARY.md
@@ -11,7 +11,7 @@
 - [ttmlir-translate](./ttmlir-translate.md)
 - [ttrt](./ttrt.md)
 - [tt-explorer](./tt-explorer.md)
-  - [API](./tt-explorer-api.md)
+  - [Usage & API](./tt-explorer-usage-api.md)
   - [Roadmap](./tt-explorer-roadmap.md)
 - [Flatbuffers](./flatbuffers.md)
 - [CI](./ci.md)
diff --git a/docs/src/tt-explorer-api.md b/docs/src/tt-explorer-api.md
deleted file mode 100644
index 9987c6bbcd..0000000000
--- a/docs/src/tt-explorer-api.md
+++ /dev/null
@@ -1,171 +0,0 @@
-# TT-Explorer
-
-The following is a listed reference for the API in using TT-Explorer, check the TT-Adapter API reference below.
-
-# `TTExplorer`
-## Overview
-The `TTExplorer` class is responsible for interacting with the model_explorer server, including uploading models, initializing settings, and executing models.
-
-## Initialization
-### **`__init__(self, port=8080, url="http://localhost", server=False, config=None)`**
-Initializes the TTExplorer instance.
-
-- Parameters:
-  - `port (int)`: The port number for the model_explorer server. Default is 8080.
-  - `url (str)`: The base URL for the model_explorer server. Default is `"http://localhost"`.
-  - `server (bool)`: Flag to indicate if the server should be created. If this is set to true, ensure an environment where the `ttrt` and `ttmlir` python bindings is used. Default is False.
-  - `config (dict)`: Configuration for the model_explorer server. Default is None.
-
-## Methods
-### `get_model_path(self, file) -> str`
-Uploads a model file to the model_explorer server and returns the temporary path provided by the server.
-- Parameters:
-  - `file (file-like object)`: The model file to be uploaded.
-- Returns:
-  - `str`: The temporary path of the uploaded model file.
-
-### **`initialize(self, settings={})`**
-Initializes the server-side `TT-Explorer` by assigning a System Descriptor for future operations, **needed** to execute models.
-
-- Parameters:
-  - `settings (dict)`: Settings for initialization, currently none. Default is an empty dictionary.
-- Returns:
-  - `dict`: dict with `system_desc_path` key pointing to server-path to System Descriptor
-
-### **`execute_model(self, model_path: str, settings={})`**
-Executes a model on the model_explorer server with the provided settings.
-
-- Parameters:
-  - `model_path (str)`: Server path to `ttir` module to be executed, ensure that module has been uploaded first.
-  - `settings (dict)`: Settings for execution. Default is an empty dictionary.
-    - `"ttir_to_ttnn_options": List[str]` Pipeline options to be fed into `ttir-to-ttnn-backend-pipeline`'s String Parser
-    - `"artifact_dir": str(Path)` A valid Server-Path to store artifacts from execution, if this flag is set then artifacts are not automatically deleted after execution is complete.
-- Returns:
-  - `dict`: Relevant emitted files from Execution
-    - `"log_file": str`: Log-File from `ttrt perf` call
-    - `"stdout": str`: STDOUT from `ttrt perf` call, utf-8 decoded.
-    - `"perf_trace": str`: CSV Performance Trace from module run.
-
-## Example Usage
-
-```py
-# Initialize TTExplorer
-explorer = TTExplorer(server=True)
-# Explorer instance now running on thread on http://localhost:8080
-# Make sure you wait until the thread has started the Flask server, you can check by viewing STDOUT.
-
-# Upload a model file
-file = open('my-module.mlir', 'r')
-model_path = explorer.get_model_path(file)
-# Since local==server, the model_path is moved to a tempfile on the same machine
-
-# Initialize the SystemDesc on Machine for execution purposes
-explorer.initialize()
-
-# Execute the model, store artifacts permanently in home directory.
-resp = explorer.execute_model(model_path, settings={'artifact_dir': '/home//ttrt-artifacts'})
-
-csv = resp['perf_trace'] # Do with the CSV trace as you would like to view the performance results!
-```
-
-# TT-Adapter
-The following is a reference for the "REST" API provided by TT-Adapter. First, a short info-dump on how an extensible API can be built on top of Model Explorer.
-
-## Building an API using Model Explorer
-The `/apipost/v1/send_command` endpoint provides an extensible platform with which commands are sent to be executed directly by the adapter specified. This becomes the main endpoint through which communication is facilitated between the server and client, the commands respond with an "adapter response".
-
-### Sending Commands
-The body of the command must be JSON, and only the following fields are fed into the adapter functions:
-```js
-cmd = {
-  "extensionId": "tt_adapter", // tt_adapter to invoke functions from TT-Adapter
-  "cmdId": "", // Name of function to be run, "convert" is built into all adapters to convert some model to graph
-  "modelPath": "", // Path to model on server to be fed into function
-  "deleteAfterConversion": False, // True if file at modelPath is to be deleted after function run
-  "settings": {...}, // Object holding custom settings to be fed into function
-}
-```
-
-More often than not, functions do not need all of these fields, but they must all be present to properly process the command sent into the function. Speaking of function, the function signature that all commands have to follow is as such:
-
-```py
-class TTAdapter(Adapter):
-  ...
-  def my_adapter_fn(self, model_path: str, settings: dict):
-    pass # Parse model_path and settings objects as they are fed from send_command endpoint.
-```
-
-This function is invoked and called from a new instance every time. This is important to understand for the idea of persisting information on the server. The onus is often on the end-user to store and preserve important information such as the path of a model they've uploaded, or the paths of important artifacts that the server has produced. `TTExplorer` aims to make this as easy as possible.
-
-Information can be processed in this function however the user would like to define, and often settings becomes a versatile endpoint to provide more information and context for the execution of some function. As an example, refer to `TTAdapter:initialize`, this function to load a SystemDesc into the environment has little to do with `modelPath` or `deleteAfterConversion`, as such these variables are not processed at all, and the function only executes a static initialization process regardless of the parameters passed into the command.
-
-### Adapter Response
-Model Explorer was probably not made to allow for such an extensible framework to be tacked onto it. As such, the adapter response is processed in a very particular way before it is sent back to the user. In particular, refer to [`model_explorer.utils.convert_adapter_response`](https://github.com/google-ai-edge/model-explorer/blob/main/src/server/package/src/model_explorer/utils.py#L40) which is run on the output of every function. This means that responses can **only** be in JSON format and are constructed as:
-
-```js
-{
-  "graphs": [
-    {/* response goes here */},
-  ]
-}
-```
-
-for custom adapter responses. This limits the transfer of raw bytes data through different MIME Types, and requires the `tt_adapter.utils.to_adapter_format` which turns any `dict` object into a model explorer adapter compatible response. While this framework works well for graphs, it makes an "extensible" API difficult to implement.
-
-## Current API Reference:
-
-### Initialize
-Called from `TTExplorer.initialize`, used to Load SystemDesc into environment.
-```js
-cmd = {
-  "extensionId": "tt_adapter",
-  "cmdId": "initialize",
-  "modelPath": "", // Irrelevant
-  "deleteAfterConversion": False,
-  "settings": {}, // None at the moment
-}
-
-// RESPONSE
-
-{"system_desc_path": ""}
-```
-
-### Execute
-Called from `TTExplorer.execute_model`, executes a model.
-```js
-cmd = {
-  "extensionId": "tt_adapter",
-  "cmdId": "execute",
-  "modelPath": "",
-  "deleteAfterConversion": False, // Can be set to True if TTIR module is to be deleted after run
-  "settings": {
-    "ttir_to_ttnn_options": List[str], // Pipeline Options to feed into ttir_to_ttnn_backend_pipeline
-    "artifact_dir": str, // Path on server to store TTRT Artifacts to, artifacts are not deleted after perf if set.
-  },
-}
-
-// RESPONSE
-
-{
-  "stdout": "",
-  "log_file": "",
-  "perf_trace": "",
-}
-```
-
-### Convert
-Standard built-in conversion function, converts TTIR Module into Model Explorer Graph. Also provides `settings` as a platform for overrides to be applied to the graph.
-
-```js
-cmd = {
-  "extensionId": "tt_adapter",
-  "cmdId": "convert",
-  "modelPath": "",
-  "deleteAfterConversion": True/False,
-  "settings": {/* Overrides */}, // Undefined at the moment
-}
-
-// RESPONSE
-
-
-```
diff --git a/docs/src/tt-explorer-roadmap.md b/docs/src/tt-explorer-roadmap.md
index e1d71cc168..596a4d1a28 100644
--- a/docs/src/tt-explorer-roadmap.md
+++ b/docs/src/tt-explorer-roadmap.md
@@ -22,12 +22,12 @@ The primary function of TT-Explorer is to visualize **and edit** the model accor
 
 Tasks:
 
-- [ ] Flesh out and test locations ID such that operations can be tracked through the compiler stack.
-- [ ] Use Loc IDs to bind TTIR Ops with Tracy Perf Trace Artifact, and send to Model-Explorer to visualize.
-- [ ] Implement Overrides Functionality into TT-MLIR, tracking based on Loc IDs.
-- [ ] Overhaul UI to enable editing node attributes, use these updated fields to send information back to TT-Explorer via REST API (in the form of an Overrides JSON)
-- [ ] Parse Overrides JSON and apply Overrides over a REST API Call, visualize re-compiled graph now.
-- [ ] Provide REST API endpoint to track “legal” configurations and provide “legal” options attached to Graph JSON.
+- [x] ~~Flesh out and test locations ID such that operations can be tracked through the compiler stack.~~
+- [x] ~~Use Loc IDs to bind TTIR Ops with Tracy Perf Trace Artifact, and send to Model-Explorer to visualize.~~
+- [x] ~~Implement Overrides Functionality into TT-MLIR, tracking based on Loc IDs.~~
+- [x] ~~Overhaul UI to enable editing node attributes, use these updated fields to send information back to TT-Explorer via REST API (in the form of an Overrides JSON)~~
+- [x] ~~Parse Overrides JSON and apply Overrides over a REST API Call, visualize re-compiled graph now.~~
+- [x] ~~Provide REST API endpoint to provide “legal” options attached to Graph JSON.~~
 
 # Milestone 3 (v0.3+)
 
@@ -40,5 +40,5 @@ Tasks:
 - [ ] Begin researching autogenerated Python bindings for pipelines and transformations defined in C++.
 - [ ] Create modular frontend capabilities out of Flask app in Model-Explorer
 - [ ] Create a “mono-adapter” which holds the paths to invoke dialect-specific adapters for each dialect to be supported by TT-Explorer
-- [ ] Begin adding new dialects like `.ttm`, `.ttnn` to Model Explorer so that complied results can be inspected and analyzed to optimize at different steps of the compiler.
+- [x] ~~Begin adding new dialects like `.ttm`, `.ttnn` to Model Explorer so that compiled results can be inspected and analyzed to optimize at different steps of the compiler.~~
 - [ ] *To be defined later, depending on the growth of the MLIR Project*
diff --git a/docs/src/tt-explorer-usage-api.md b/docs/src/tt-explorer-usage-api.md
new file mode 100644
index 0000000000..59d26bfc0a
--- /dev/null
+++ b/docs/src/tt-explorer-usage-api.md
@@ -0,0 +1,284 @@
+# TT-Explorer
+This section provides details about the usage of TT-Explorer.
+
+## CLI
+The CLI for `tt-explorer` provides a simple suite of options to start the UI:
+
+```bash
+tt-explorer -p <port> -u <url> -q
+```
+
+### Options:
+- `-p, --port`: Port that the model-explorer server will be exposed to. Default is 8080.
+- `-u, --url`: Host URL address for the server. Default is "localhost".
+- `-q, --no-browser`: Create the server without opening a browser tab.
+
+Example usage:
+
+```bash
+tt-explorer -p 8000 -u 0.0.0.0 -q
+```
+
+This command will start the TT-Explorer server on port 8000, accessible at the address 0.0.0.0, without opening a browser tab.
+
+## UI
+For a general reference of the UI, refer to the [model-explorer wiki](https://github.com/google-ai-edge/model-explorer/wiki). This section highlights the specific UI elements added to the Tenstorrent fork of model-explorer.
+
+### Model Execution
+An additional element has been added to the top bar, in the top right of the screen. It contains the UI elements that invoke the execution functionality.
+
+#### Opt. Policy
+This dropdown provides a list of **Optimization Policies** which will be used when the model is executed. These policies are applied when lowering from a `ttir` module to an executable `ttnn` module.
+
+#### "Upload" Button
+Once overridden fields have been changed, this button becomes available to send the overrides to the backend. The overrides are then processed and the module recompiled to include these new changes.
+
+#### "Play" Button
+This button invokes the `execute` function, which compiles and executes the model. The button shows a "loading" state until execution is finished. Once execution is finished, the graph should reload with a performance trace overlaid on it.
+
+#### "Comment" Button
+This button opens a window to view the shell logs while execution is running. If any errors occur, they will be displayed here.
+
+### Overridden Fields
+Certain nodes on the graph have attributes that are presented as a dropdown. These are fields for which overrides are available. The values can be changed and then sent for recompilation; invalid configurations will result in errors.
+
+# TT-Adapter
+The following is a reference for the REST API provided by TT-Adapter. First, a short info-dump on how an extensible API can be built on top of Model Explorer.
+
+## Building an API using Model Explorer
+The `/apipost/v1/send_command` endpoint provides an extensible platform with which commands are sent to be executed directly by the adapter specified. This becomes the main endpoint through which communication is facilitated between the server and client; the commands respond with an "adapter response".
+
+### Sending Commands
+The body of the command must be JSON, and conform to the following interface (described below as a [TypeScript interface](https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#interfaces)). Specific commands may narrow the field types or extend this interface.
+
+```typescript
+interface ExtensionCommand {
+  cmdId: string;
+  extensionId: string;
+  modelPath: string;
+  settings: Record<string, any>;
+  deleteAfterConversion: boolean;
+}
+```
+
+More often than not, functions do not need all of these fields, but they must all be present to properly process the command sent into the handling function on the server.
+
+Speaking of functions, the signature that all functions that handle commands on the server have to follow is as such:
+
+```python
+class TTAdapter(Adapter):
+  # ...
+  def my_adapter_fn(self, model_path: str, settings: dict):
+    # Parse model_path and settings objects as they are fed from the send_command endpoint.
+    pass
+```
+
+This function is invoked on a new instance every time. This is important to understand for the idea of persisting information on the server. As all requests to the server are _stateless_, the onus is often on the end-user to store and preserve important information such as the path of a model they've uploaded, or the paths of important artifacts that the server has produced. `TTExplorer` aims to make this as easy as possible.
+
+Information can be processed in this function however the user would like to define, and settings often becomes a versatile mechanism to provide more information and context for the execution of some function. As an example, refer to `TTAdapter:initialize`: this function, which loads a SystemDesc into the environment, has little to do with `modelPath` or `deleteAfterConversion`, so these variables are not processed at all, and the function executes a static initialization process regardless of the parameters passed in the command.
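+
+As a rough sketch of the transport only (the endpoint and field names come from the interface above; the host, port, and model path below are assumed placeholders), a command can be posted to a running server from Python with `requests`:
+
+```python
+import requests
+
+# Assumed local tt-explorer instance; adjust host/port to your setup.
+URL = "http://localhost:8080/apipost/v1/send_command"
+
+cmd = {
+    "extensionId": "tt_adapter",         # route the command to TT-Adapter
+    "cmdId": "convert",                  # built-in model-to-graph conversion
+    "modelPath": "/tmp/my-module.mlir",  # hypothetical path already on the server
+    "settings": {},                      # command-specific settings
+    "deleteAfterConversion": False,
+}
+
+resp = requests.post(URL, json=cmd, timeout=60)
+resp.raise_for_status()
+print(resp.json()["graphs"])  # adapter responses are wrapped in a "graphs" array
+```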
+
+#### Example request
+
+Below is an example of the JSON request sent from the UI to the server:
+
+```json
+{
+  // tt_adapter to invoke functions from TT-Adapter
+  "extensionId": "tt_adapter",
+  // Name of function to be run, "convert" is built into all adapters to convert some model to graph
+  "cmdId": "convert",
+  // Path to model on server to be fed into function
+  "modelPath": "/tmp/tmp80eg73we/mnist_sharding.mlir",
+  // Object holding custom settings to be fed into function
+  "settings": {
+    "const_element_count_limit": 16,
+    "edge_label_font_size": 7.5,
+    "artificial_layer_node_count_threshold": 1000,
+    "keep_layers_with_a_single_child": false,
+    "show_welcome_card": false,
+    "disallow_vertical_edge_labels": false,
+    "show_op_node_out_of_layer_edges_without_selecting": false,
+    "highlight_layer_node_inputs_outputs": false,
+    "hide_empty_node_data_entries": false
+  },
+  // `true` if file at `modelPath` is to be deleted after function run
+  "deleteAfterConversion": true
+}
+```
+
+### Adapter Response
+Model Explorer was probably not made to allow for such an extensible framework to be tacked onto it. As such, the adapter response is processed in a very particular way before it is sent back to the user. In particular, refer to [`model_explorer.utils.convert_adapter_response`](https://github.com/google-ai-edge/model-explorer/blob/main/src/server/package/src/model_explorer/utils.py#L40), which is run on the output of every function.
+
+This means that, for compatibility reasons (i.e. to not stray too far from the upstream implementation that we are based on), responses sent from the server must be in JSON format **only** and wrap the data in a `graphs` property.
+
+Below is the base TypeScript interface that the UI expects for the JSON response. Commands can define custom data _inside_ the `graphs` property.
+
+```typescript
+/** A response received from the extension. */
+interface ExtensionResponse<
+  G extends Array<Graph> = Graph[],
+  E extends unknown = string
+> {
+  graphs: G;
+  error?: E;
+}
+```
+
+For custom adapter responses, this limits the transfer of raw byte data through different MIME types, and requires `tt_adapter.utils.to_adapter_format`, which turns any `dict` object into a Model Explorer adapter-compatible response. While this framework works well for graphs, it makes an "extensible" API difficult to implement.
+
+## Current API Reference:
+
+### Convert
+Standard built-in conversion function, converts a TTIR module into a Model Explorer graph. Also provides `settings` as a platform for overrides to be applied to the graph.
+
+#### Request
+
+```typescript
+// As this is the base request everything is based off,
+// this interface only narrows down the command to be "convert".
+interface AdapterConvertCommand extends ExtensionCommand {
+  cmdId: 'convert';
+}
+```
+
+#### Response
+```typescript
+// As this is the base response everything is based off,
+// it is exactly the same as `ExtensionResponse`.
+type AdapterConvertResponse = ExtensionResponse;
+```
+
+```json
+{
+  "graphs": [{
+    // Model Explorer Graph JSON Object
+  }]
+}
+```
+
+### Initialize
+Called from `TTExplorer.initialize`, used to load a SystemDesc into the environment.
+
+#### Request
+
+```typescript
+interface InitializeCommand extends ExtensionCommand {
+  cmdId: 'initialize';
+}
+```
+
+#### Response
+
+```typescript
+type AdapterInitializeResponse = ExtensionResponse<[{
+  system_desc_path: string
+}]>;
+```
+
+```json
+{
+  "graphs": [{
+    "system_desc_path": ""
+  }]
+}
+```
+
+### Execute
+Called from `TTExplorer.execute_model`, executes a model.
+
+#### Request
+
+```typescript
+interface AdapterExecuteCommand extends ExtensionCommand {
+  cmdId: 'execute';
+}
+```
+
+#### Response
+```typescript
+// When the request is successful, we don't expect any response back.
+// Thus, an empty array is returned for `graphs`.
+type AdapterExecuteResponse = ExtensionResponse<[]>;
+```
+
+```json
+{
+  "graphs": []
+}
+```
+
+### Status Check
+
+Called from `...`, it is used for checking the execution status of a model and updating the UI accordingly.
+
+#### Request
+
+```typescript
+interface AdapterStatusCheckCommand extends ExtensionCommand {
+  cmdId: 'status_check';
+}
+```
+
+#### Response
+```typescript
+type AdapterStatusCheckResponse = ExtensionResponse<[{
+  isDone: boolean;
+  progress: number;
+  total?: number;
+  timeElapsed?: number;
+  currentStatus?: string;
+  error?: string;
+  stdout?: string;
+  log_file?: string;
+}]>;
+```
+
+```json
+{
+  "graphs": [{
+    "isDone": false,
+    "progress": 20,
+    "total": 100,
+    "timeElapsed": 234,
+    "stdout": "Executing model...\nPath: /path/to/model",
+    "log_file": "/path/to/log/on/the/server"
+  }]
+}
+```
+### Override
+
+Called from `...` to send overrides made through the UI to the server for processing.
+
+#### Request
+
+```typescript
+interface KeyValue {
+  key: string;
+  value: string;
+}
+
+interface AdapterOverrideCommand extends ExtensionCommand {
+  cmdId: 'override';
+  settings: {
+    graphs: Graph[];
+    // Assumed shape: maps an override target to its key/value attribute pairs.
+    overrides: Record<string, KeyValue[]>;
+  };
+}
+```
+
+#### Response
+```typescript
+type AdapterOverrideResponse = ExtensionResponse<[{
+  success: boolean;
+}]>;
+```
+
+```json
+{
+  "graphs": [{
+    "success": true
+  }]
+}
+```
diff --git a/docs/src/tt-explorer.md b/docs/src/tt-explorer.md
index a8073d25be..2be4a13ed1 100644
--- a/docs/src/tt-explorer.md
+++ b/docs/src/tt-explorer.md
@@ -3,20 +3,18 @@
 Welcome to the tt-explorer wiki! The Wiki will serve as a source for documentation, examples, and general knowledge related to the TT-MLIR visualization project. The sidebar will provide navigation to relevant pages. If this is your first time hearing about the project, take a look at Project Architecture for an in-depth introduction to the tool and motivations behind it :)
 
 ## Quick Start
-TT-Explorer is made to be as painless as possible, as such the installation on top of the pre-existing [`tt-mlir`](https://github.com/tenstorrent/tt-mlir) project is as minimal as possible.
-
-1. Build `tt-mlir`
-2. Run `source env/activate` to be in `tt-mlir` virtualenv for the following steps
-3. Install [`tt-adapter`](https://github.com/vprajapati-tt/tt-adapter) using `pip install -e .` in tt-adapter root directory.
-4. Install `tt-explorer` using `pip install -e .` in tt-explorer root directory
-5. Run `tt-explorer` in terminal to start tt-explorer instance. (Refer to CLI section in API for specifics)
-6. Ensure server has started in `tt-explorer` shell instance (check for message below)
+TT-Explorer comes packaged as a tool in the `tt-mlir` repo.
+
+1. Run `source env/activate` to be in the `tt-mlir` virtualenv for the following steps
+2. Build the `explorer` target in `tt-mlir` using `cmake --build build -- explorer`
+3. Run `tt-explorer` in the terminal to start a tt-explorer instance. (Refer to the CLI section in the Usage & API page for specifics.)
+4. Ensure the server has started in the `tt-explorer` shell instance (check for the message below)
 
 ```sh
 Starting Model Explorer server at:
 http://localhost:8080
 ```
 
-Visualizer tool for `ttmlir`-powered compiler results. Visualizes from emitted `.mlir` files to display compiled model, attributes, performance results, and provide a platform for human-driven overrides to _gameify_ model tuning.
+Visualizer tool for `ttmlir`-powered compiler results. Visualizes from emitted `.mlir` files to display compiled model, attributes, performance results, and provides a platform for human-driven overrides to _gameify_ model tuning.
 
 ## TT-Explorer - Project Architecture
 
@@ -60,9 +58,9 @@ TT-RT is the runtime library for TT-Forge, which provides an API to run Flatbuff
 **Ingests**: Flatbuffers
 **Emits**: Performance Trace, Model Results
 
-#### [Model-Explorer](https://github.com/google-ai-edge/model-explorer)
+#### [Model-Explorer](https://github.com/tenstorrent/model-explorer)
 
-Model Explorer is the backbone of the client and visualization of these models. It is deceptively placed in the “Client” portion of the diagram, but realistically TT-Explorer will be run on the host, and so will the model-explorer instance. The frontend will be a client of the REST API created by TT-Adapter and will use URLs from the model-explorer server to visualize the models.
+Model Explorer is the backbone of the client and visualization of these models. It is deceptively placed in the “Client” portion of the diagram, but realistically TT-Explorer will be run on the host, and so will the model-explorer instance. The frontend will be a client of the REST API created by TT-Adapter and will use URLs from the model-explorer server to visualize the models. Currently, TT maintains a fork of model-explorer which has overridden UI elements for overrides and for displaying performance traces.
 
 **Ingests**: Model Explorer Graph, User-Provided Overrides (UI), Performance Trace
 **Emits**: Overrides JSON, Model Visualization
diff --git a/include/ttmlir/Dialect/TTIR/IR/TTIROps.td b/include/ttmlir/Dialect/TTIR/IR/TTIROps.td
index 710f88cfef..3bfbe8c45a 100644
--- a/include/ttmlir/Dialect/TTIR/IR/TTIROps.td
+++ b/include/ttmlir/Dialect/TTIR/IR/TTIROps.td
@@ -897,6 +897,58 @@ def TTIR_Conv2dOp : TTIR_DPSOp<"conv2d"> {
   let hasVerifier = 1;
 }
 
+def TTIR_ConvTranspose2dOp : TTIR_DPSOp<"conv_transpose2d"> {
+  let summary = "ConvTranspose2d operation.";
+  let description = [{
+    Applies a 2D transposed convolution operator over an input image composed of several input planes.
+
+    Inputs:
+      - `input` AnyRankedTensor: NHWC format (batch_size x height x width x channels)
+      - `weight` AnyRankedTensor: IOHW format (input_channels x output_channels x height x width)
+      - `bias` Optional: (1 x 1 x 1 x output_channels)
+      - `output` AnyRankedTensor: NHWC format (batch_size x height x width x channels)
+
+    Attributes:
+      - `stride` (i32 | array<i32>): Controls the stride for the cross-correlation.
+      - `padding` (i32 | array<i32>): Controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points.
+      - `output_padding` (i32 | array<i32>): Controls the additional size added to one side of the output shape.
+      - `dilation` (i32 | array<i32>): Controls the spacing between the kernel points.
+      - `groups` i32: Controls the connections between inputs and outputs. The number of input channels and the number of output channels must both be divisible by groups.
+
+    The expected output size per spatial dimension (checked by the verifier) is:
+      H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_height - 1) + output_padding + 1
+      W_out = (W_in - 1) * stride - 2 * padding + dilation * (kernel_width - 1) + output_padding + 1
+
+    Example:
+      %input = tensor.empty() : () -> tensor<1x8x8x256xbf16>
+      %weight = tensor.empty() : () -> tensor<256x256x3x3xbf16>
+      %bias = tensor.empty() : () -> tensor<1x1x1x256xbf16>
+      %output = tensor.empty() : () -> tensor<1x10x10x256xbf16>
+      %0 = "ttir.conv_transpose2d"(%input, %weight, %bias, %output)
+            <{
+              stride = array<i32: 1, 1>,
+              padding = 0: i32,
+              output_padding = 0: i32,
+              dilation = 1: i32,
+              groups = 1: i32
+            }> : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+  }];
+
+  let arguments = (ins AnyRankedTensor:$input,
+                       AnyRankedTensor:$weight,
+                       Optional<AnyRankedTensor>:$bias,
+                       AnyRankedTensor:$output,
+                       AnyAttrOf<[I32Attr, DenseI32ArrayAttr]>:$stride,
+                       AnyAttrOf<[I32Attr, DenseI32ArrayAttr]>:$padding,
+                       AnyAttrOf<[I32Attr, DenseI32ArrayAttr]>:$output_padding,
+                       AnyAttrOf<[I32Attr, DenseI32ArrayAttr]>:$dilation,
+                       I32Attr:$groups);
+
+  let results = (outs AnyRankedTensor:$result);
+
+  let extraClassDeclaration = [{
+    MutableOperandRange getDpsInitsMutable() { return getOutputMutable(); }
+  }];
+
+  let hasVerifier = 1;
+}
+
 def TTIR_ConvolutionOp : TTIR_DPSOp<"convolution"> {
   let summary = "Generalized convolution op.";
   let description = [{
diff --git a/include/ttmlir/Dialect/TTNN/IR/TTNNOps.td b/include/ttmlir/Dialect/TTNN/IR/TTNNOps.td
index ba2484ac5f..5609810f12 100644
--- a/include/ttmlir/Dialect/TTNN/IR/TTNNOps.td
+++ b/include/ttmlir/Dialect/TTNN/IR/TTNNOps.td
@@ -861,6 +861,57 @@ def TTNN_Conv2dOp : TTNN_NamedDPSOp<"conv2d"> {
   let hasVerifier = 1;
 }
 
+def TTNN_ConvTranspose2dOp : TTNN_NamedDPSOp<"conv_transpose2d"> {
+  let summary = "ConvTranspose2d operation.";
+  let description = [{
+    Applies a 2D transposed convolution operator over an input image composed of several input planes.
+
+    Inputs:
+      - `input` AnyRankedTensor: NHWC format (batch_size x height x width x channels)
+      - `weight` AnyRankedTensor: IOHW format (input_channels x output_channels x height x width)
+      - `bias` Optional: (1 x 1 x 1 x output_channels)
+      - `output` AnyRankedTensor: (1 x 1 x (batch_size * height * width) x channels)
+
+    Attributes:
+      - `in_channels` i32: The number of input channels.
+      - `out_channels` i32: The number of output channels.
+      - `batch_size` i32: The batch size.
+      - `input_height` i32: The input height.
+      - `input_width` i32: The input width.
+      - `kernel_size` array<i32>: The kernel size.
+      - `stride` array<i32>: Controls the stride for the cross-correlation.
+      - `padding` array<i32>: Controls the amount of implicit zero padding on both sides for dilation * (kernel_size - 1) - padding number of points.
+      - `output_padding` array<i32>: Controls the additional size added to one side of the output shape.
+      - `dilation` array<i32>: Controls the spacing between the kernel points.
+      - `groups` i32: Controls the connections between inputs and outputs. The number of input channels and the number of output channels must both be divisible by groups.
+  }];
+
+  let arguments = (ins AnyRankedTensor:$input,
+                       AnyRankedTensor:$weight,
+                       Optional<AnyRankedTensor>:$bias,
+                       AnyRankedTensor:$output,
+                       TT_Device:$device,
+                       I32Attr:$in_channels,
+                       I32Attr:$out_channels,
+                       I32Attr:$batch_size,
+                       I32Attr:$input_height,
+                       I32Attr:$input_width,
+                       DenseI32ArrayAttr:$kernel_size,
+                       DenseI32ArrayAttr:$stride,
+                       DenseI32ArrayAttr:$padding,
+                       DenseI32ArrayAttr:$output_padding,
+                       DenseI32ArrayAttr:$dilation,
+                       I32Attr:$groups);
+
+  let results = (outs AnyRankedTensor:$result);
+
+  let extraClassDeclaration = [{
+    MutableOperandRange getDpsInitsMutable() { return getOutputMutable(); }
+  }];
+
+  let hasVerifier = 1;
+}
+
 def TTNN_MaxPool2dOp : TTNN_NamedDPSOp<"max_pool2d"> {
   let summary = "Applies a 2D max pooling over an input signal composed of several input planes.";
   let description = [{
diff --git a/include/ttmlir/Target/TTNN/program.fbs b/include/ttmlir/Target/TTNN/program.fbs
index 0838c629d0..442f9f7258 100644
--- a/include/ttmlir/Target/TTNN/program.fbs
+++ b/include/ttmlir/Target/TTNN/program.fbs
@@ -269,6 +269,25 @@ table Conv2dOp {
   groups: uint32;
 }
 
+table ConvTranspose2dOp {
+  input: tt.target.TensorRef;
+  weight: tt.target.TensorRef;
+  bias: tt.target.TensorRef;
+  out: tt.target.TensorRef;
+  device: tt.target.DeviceRef;
+  in_channels: uint32;
+  out_channels: uint32;
+  batch_size: uint32;
+  input_height: uint32;
+  input_width: uint32;
+  kernel_size: [int32];
+  stride: [int32];
+  padding: [int32];
+  output_padding: [int32];
+  dilation: [int32];
+  groups: uint32;
+}
+
 table MaxPool2dOp {
   in: tt.target.TensorRef;
   out: tt.target.TensorRef;
@@ -346,6 +365,7 @@ union OpType {
   SoftmaxOp,
   TransposeOp,
   Conv2dOp,
+  ConvTranspose2dOp,
   ConcatOp,
   ReshapeOp,
   SliceOp,
diff --git a/include/ttmlir/Target/Utils/FuncOpToProgram.h b/include/ttmlir/Target/Utils/FuncOpToProgram.h
index a28f2f5e9a..8072ed0cad 100644
--- a/include/ttmlir/Target/Utils/FuncOpToProgram.h
+++ b/include/ttmlir/Target/Utils/FuncOpToProgram.h
@@ -25,10 +25,14 @@ struct Program {
 
 inline std::string getOpDebugString(mlir::Operation *op,
                                     OpPrintingFlags printFlags) {
+#ifdef TTMLIR_ENABLE_DEBUG_STRINGS
   std::string str;
   llvm::raw_string_ostream os(str);
   op->print(os, printFlags);
   return str;
+#else
+  return "";
+#endif
 };
 
 inline std::string getOpLocInfo(mlir::Operation *op) {
diff --git a/lib/Conversion/TTIRToTTNN/TTIRToTTNN.cpp b/lib/Conversion/TTIRToTTNN/TTIRToTTNN.cpp
index dbd1e17e5c..16656df85b 100644
--- a/lib/Conversion/TTIRToTTNN/TTIRToTTNN.cpp
+++ b/lib/Conversion/TTIRToTTNN/TTIRToTTNN.cpp
@@ -12,6 +12,7 @@
 #include "ttmlir/Dialect/TTNN/Types/Types.h"
 #include "ttmlir/Dialect/TTNN/Utils/TransformUtils.h"
 #include "ttmlir/Dialect/TTNN/Utils/Utils.h"
+#include "ttmlir/Utils.h"
 
 #include "mlir/Dialect/Tensor/IR/Tensor.h"
 #include "mlir/IR/Attributes.h"
@@ -26,6 +27,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/ErrorHandling.h"
+#include <functional>
 
 #include <optional>
 
@@ -893,6 +895,105 @@ class Conv2dOpConversionPattern : public OpConversionPattern<ttir::Conv2dOp> {
   }
 };
 
+class ConvTranspose2dOpConversionPattern
+    : public OpConversionPattern<ttir::ConvTranspose2dOp> {
+public:
+  using OpConversionPattern<ttir::ConvTranspose2dOp>::OpConversionPattern;
+
+  LogicalResult
+  matchAndRewrite(ttir::ConvTranspose2dOp op, OpAdaptor adaptor,
+                  ConversionPatternRewriter &rewriter) const override {
+    auto device = ::ttnn::utils::getOrInsertDevice(rewriter, op);
+
+    auto inputTy = mlir::cast<RankedTensorType>(adaptor.getInput().getType());
+    auto kernelTy = mlir::cast<RankedTensorType>(adaptor.getWeight().getType());
+    auto outputTy = mlir::cast<RankedTensorType>(adaptor.getOutput().getType());
+
+    std::function<int64_t(const RankedTensorType &, int)> getLastDim =
+        [](const RankedTensorType &ty, int offset = 1) {
+          return ty.getShape()[ty.getRank() - offset];
+        };
+
+    auto inChannelsAttr = rewriter.getI32IntegerAttr(getLastDim(inputTy, 1));
+    auto outChannelsAttr = rewriter.getI32IntegerAttr(getLastDim(outputTy, 1));
+    auto batchSizeAttr = rewriter.getI32IntegerAttr(getLastDim(inputTy, 4));
+    auto inputHeightAttr = rewriter.getI32IntegerAttr(getLastDim(inputTy, 3));
+    auto inputWidthAttr = rewriter.getI32IntegerAttr(getLastDim(inputTy, 2));
+
+    auto kernelSizeAttr = rewriter.getDenseI32ArrayAttr(
+        {static_cast<int32_t>(getLastDim(kernelTy, 2)),
+         static_cast<int32_t>(getLastDim(kernelTy, 1))});
+
+    auto strideAttr = attrToDenseI32ArrayAttr(adaptor.getStride(), rewriter);
+    if (auto error = strideAttr.takeError()) {
+      return LogicalResult::failure();
+    }
+
+    auto paddingAttr = attrToDenseI32ArrayAttr(adaptor.getPadding(), rewriter);
+    if (auto error = paddingAttr.takeError()) {
+      return LogicalResult::failure();
+    }
+
+    auto outputPaddingAttr =
+        attrToDenseI32ArrayAttr(adaptor.getOutputPadding(), rewriter);
+    if (auto error = outputPaddingAttr.takeError()) {
+      return LogicalResult::failure();
+    }
+
+    auto dilationAttr =
+        attrToDenseI32ArrayAttr(adaptor.getDilation(), rewriter);
+    if (auto error = dilationAttr.takeError()) {
+      return LogicalResult::failure();
+    }
+
+    auto groupsAttr = rewriter.getI32IntegerAttr(adaptor.getGroups());
+
+    // Transposed convolution in ttnn returns a tensor in a flattened shape
+    // (1 x 1 x N * H * W x C)
+    llvm::ArrayRef<int64_t> output_shape = outputTy.getShape();
+    llvm::SmallVector<int64_t> flattenedOutputShape = {
+        1, 1, output_shape[0] * output_shape[1] * output_shape[2],
+        output_shape[3]};
+    outputTy = mlir::cast<RankedTensorType>(getTypeConverter()->convertType(
+        outputTy.cloneWith(flattenedOutputShape, outputTy.getElementType())));
+
+    // Using a tensor::EmptyOp so that the rewriter for EmptyOp can handle the
+    // attribute determination
+    auto convDPSOutput = rewriter.replaceOpWithNewOp<tensor::EmptyOp>(
+        adaptor.getOutput().getDefiningOp(), flattenedOutputShape,
+        outputTy.getElementType());
+
+    // Must set the type to the output type to maintain the layout attributes
+    convDPSOutput.getResult().setType(outputTy);
+
+    ttnn::ConvTranspose2dOp new_conv = rewriter.create<ttnn::ConvTranspose2dOp>(
+        op.getLoc(), outputTy, adaptor.getInput(), adaptor.getWeight(),
+        adaptor.getBias(), convDPSOutput, device, inChannelsAttr,
+        outChannelsAttr, batchSizeAttr, inputHeightAttr, inputWidthAttr,
+        kernelSizeAttr, *strideAttr, *paddingAttr, *outputPaddingAttr,
+        *dilationAttr, groupsAttr);
+
+    // Restore the normal shape (N x H x W x C)
+    Value output =
+        ttir_to_ttnn::utils::generateReshape(new_conv, output_shape, rewriter);
+
+    rewriter.replaceOp(op, output);
+    return success();
+  }
+
+private:
+  llvm::Expected<DenseI32ArrayAttr>
+  attrToDenseI32ArrayAttr(mlir::Attribute attr,
+                          ConversionPatternRewriter &rewriter) const {
+    auto pair = ttmlir::utils::getPairOfInteger<int32_t>(attr);
+    if (auto error = pair.takeError()) {
+      return error;
+    }
+
+    return rewriter.getDenseI32ArrayAttr({pair->first, pair->second});
+  }
+};
+
 class MaxPool2dOpConversionPattern
     : public OpConversionPattern<ttir::MaxPool2dOp> {
 public:
@@ -1233,6 +1334,7 @@ void populateTTIRToTTNNPatterns(MLIRContext *ctx, RewritePatternSet &patterns,
            LinearOpConversionPattern,
            MatmulOpConversionPattern,
            Conv2dOpConversionPattern,
+           ConvTranspose2dOpConversionPattern,
            MaxPool2dOpConversionPattern,
            SubtractOpConversionPattern,
            MeshShardOpConversionPattern,
diff --git a/lib/Conversion/TTNNToEmitC/TTNNToEmitC.cpp b/lib/Conversion/TTNNToEmitC/TTNNToEmitC.cpp
index f92e730baf..93940c0d5f 100644
--- a/lib/Conversion/TTNNToEmitC/TTNNToEmitC.cpp
+++ b/lib/Conversion/TTNNToEmitC/TTNNToEmitC.cpp
@@ -851,6 +851,8 @@ void populateTTNNToEmitCPatterns(mlir::MLIRContext *ctx,
   // Conv ops
   //
   patterns.add<DefaultOpConversionPattern<ttnn::Conv2dOp>>(typeConverter, ctx);
+  patterns.add<DefaultOpConversionPattern<ttnn::ConvTranspose2dOp>>(
+      typeConverter, ctx);
   patterns.add<DefaultOpConversionPattern<ttnn::MaxPool2dOp>>(typeConverter, ctx);
diff --git a/lib/Dialect/TTIR/IR/TTIROps.cpp b/lib/Dialect/TTIR/IR/TTIROps.cpp
index 0b58ed860e..1145a27822 100644
--- a/lib/Dialect/TTIR/IR/TTIROps.cpp
+++ b/lib/Dialect/TTIR/IR/TTIROps.cpp
@@ -187,6 +187,154 @@ ::mlir::LogicalResult mlir::tt::ttir::Conv2dOp::verify() {
   return success();
 }
 
+//===----------------------------------------------------------------------===//
+// ConvTranspose2dOp
+//===----------------------------------------------------------------------===//
+
+// ConvTranspose2dOp verification
+mlir::LogicalResult mlir::tt::ttir::ConvTranspose2dOp::verify() {
+  mlir::RankedTensorType inputType = getInput().getType();
+  mlir::RankedTensorType weightType = getWeight().getType();
+  mlir::RankedTensorType outputType = getOutput().getType();
+  std::optional<mlir::RankedTensorType> bias =
+      getBias().getImpl() ? std::make_optional(getBias().getType())
+                          : std::nullopt;
+
+  if (inputType.getRank() != 4) {
+    return emitOpError("Input must be a 4D tensor");
+  }
+
+  if (outputType.getRank() != 4) {
+    return emitOpError("Output must be a 4D tensor");
+  }
+
+  if (weightType.getRank() != 4) {
+    return emitOpError("Weight must be a 4D tensor");
+  }
+
+  if (bias.has_value()) {
+    if (bias->getRank() != 4) {
+      return emitOpError("Bias must be a 4D tensor");
+    }
+  }
+
+  if (inputType.getShape()[0] != outputType.getShape()[0]) {
+    return emitOpError("Batch size of input and output tensors must match");
+  }
+
+  auto stride = ttmlir::utils::getPairOfInteger<int32_t>(getStride());
+  if (auto error = stride.takeError()) {
+    return emitOpError() << llvm::toString(std::move(error)) << " for stride";
+  }
+  if (stride->first < 1 || stride->second < 1) {
+    return emitOpError("Stride values must be greater than 0");
+  }
+
+  auto padding = ttmlir::utils::getPairOfInteger<int32_t>(getPadding());
+  if (auto error = padding.takeError()) {
+    return emitOpError() << llvm::toString(std::move(error)) << " for padding";
+  }
+  if (padding->first < 0 || padding->second < 0) {
+    return emitOpError("Padding values must be greater than or equal to 0");
+  }
+
+  auto outputPadding =
+      ttmlir::utils::getPairOfInteger<int32_t>(getOutputPadding());
+  if (auto error = outputPadding.takeError()) {
+    return emitOpError() << llvm::toString(std::move(error))
+                         << " for output padding";
+  }
+  if (outputPadding->first < 0 || outputPadding->second < 0) {
+    return emitOpError(
+        "Output padding values must be greater than or equal to 0");
+  }
+
+  auto dilation = ttmlir::utils::getPairOfInteger<int32_t>(getDilation());
+  if (auto error = dilation.takeError()) {
+    return emitOpError() << llvm::toString(std::move(error)) << " for dilation";
+  }
+  if (dilation->first < 1 || dilation->second < 1) {
+    return emitOpError("Dilation values must be greater than 0");
+  }
+
+  llvm::ArrayRef<int64_t> kernelShape = weightType.getShape();
+
+  int32_t inputChannels = inputType.getDimSize(inputType.getRank() - 1);
+  int32_t outputChannels = outputType.getDimSize(outputType.getRank() - 1);
+  uint32_t groups = getGroups();
+
+  if (inputChannels % groups != 0) {
+    return emitOpError() << "Number of input channels from input tensor must "
+                            "be divisible by the number of groups. "
+                         << "Got " << inputChannels << " input channels and "
+                         << groups << " groups.";
+  }
" + << "Got " << inputChannels << " input channels and " + << groups << " groups."; + } + + if (outputChannels % groups != 0) { + return emitOpError() << "Number of output channels from output tensor must " + "be divisible by the number of groups. " + << "Got " << outputChannels << " output channels and " + << groups << " groups."; + } + + if (inputChannels != kernelShape[0]) { + return emitOpError() << "Number of input channels from input tensor must " + "match the first dimension of the weight tensor. " + << "Got " << inputChannels << " input channels and " + << kernelShape[0] << " in the weight tensor."; + } + + if (outputChannels / groups != kernelShape[1]) { + return emitOpError() << "Number of output channels per group must match " + "the second dimension of the weight tensor. " + << "Got " << (outputChannels / groups) + << " output channels per group and " << kernelShape[1] + << " in the weight tensor."; + } + + if (bias) { + if (bias->getDimSize(bias->getRank() - 1) != outputChannels) { + return emitOpError() << "Mismatch in bias tensor dimensions. " + << "Bias tensor has " + << bias->getDimSize(bias->getRank() - 1) + << " channels, " + << "but the output tensor has " << outputChannels + << " channels."; + } + } + + int32_t kernelHeight = kernelShape[2]; + int32_t kernelWidth = kernelShape[3]; + + int32_t Hin = inputType.getDimSize(inputType.getRank() - 3); + int32_t Win = inputType.getDimSize(inputType.getRank() - 2); + + int32_t expectedHOut = (Hin - 1) * stride->first - 2 * padding->first + + dilation->first * (kernelHeight - 1) + + outputPadding->first + 1; + int32_t expectedWOut = (Win - 1) * stride->second - 2 * padding->second + + dilation->second * (kernelWidth - 1) + + outputPadding->second + 1; + if (expectedHOut < 0 || expectedWOut < 0) { + return emitOpError() << "Given input size per channel: (" << Hin << " x " + << Win << "). " + << "Calculated output size per channel: (" + << expectedHOut << " x " << expectedWOut << "). " + << "Output size is too small"; + } + + int32_t HOut = outputType.getDimSize(outputType.getRank() - 3); + int32_t WOut = outputType.getDimSize(outputType.getRank() - 2); + if (HOut != expectedHOut || WOut != expectedWOut) { + return emitOpError() << "Mismatch between expected output size per channel " + "and got output tensor dimensions. " + << "Expected: (" << expectedHOut << " x " + << expectedWOut << "), " + << "got: (" << HOut << " x " << WOut << ")."; + } + + return success(); +} + //===----------------------------------------------------------------------===// // ConvolutionOp //===----------------------------------------------------------------------===// diff --git a/lib/Dialect/TTNN/IR/TTNNOps.cpp b/lib/Dialect/TTNN/IR/TTNNOps.cpp index eccb1e9ba7..2560170528 100644 --- a/lib/Dialect/TTNN/IR/TTNNOps.cpp +++ b/lib/Dialect/TTNN/IR/TTNNOps.cpp @@ -81,6 +81,168 @@ ::mlir::LogicalResult mlir::tt::ttnn::Conv2dOp::verify() { return success(); } +//===----------------------------------------------------------------------===// +// ConvTranspose2dOp +//===----------------------------------------------------------------------===// + +// ConvTranspose2dOp verification +::mlir::LogicalResult mlir::tt::ttnn::ConvTranspose2dOp::verify() { + mlir::RankedTensorType inputType = getInput().getType(); + mlir::RankedTensorType weightType = getWeight().getType(); + mlir::RankedTensorType outputType = getOutput().getType(); + std::optional bias = + getBias().getImpl() ? 
+
+  if (inputType.getRank() != 4) {
+    return emitOpError("Input must be a 4D tensor");
+  }
+
+  if (outputType.getRank() != 4) {
+    return emitOpError("Output must be a 4D tensor");
+  }
+
+  if (weightType.getRank() != 4) {
+    return emitOpError("Weight must be a 4D tensor");
+  }
+
+  if (bias.has_value()) {
+    if (bias->getRank() != 4) {
+      return emitOpError("Bias must be a 4D tensor");
+    }
+  }
+
+  std::function<mlir::LogicalResult(llvm::ArrayRef<int32_t> &, const char *,
+                                    int32_t)>
+      checkBiggerThan = [&](llvm::ArrayRef<int32_t> &values, const char *name,
+                            int32_t minValue) -> mlir::LogicalResult {
+    for (int32_t value : values) {
+      if (value < minValue) {
+        return emitOpError() << "Attribute '" << name
+                             << "' contains a value less than: " << minValue;
+      }
+    }
+    return mlir::success();
+  };
+
+  uint32_t inChannels = getInChannels();
+  if (inChannels != inputType.getDimSize(inputType.getRank() - 1)) {
+    return emitOpError("Input channels attribute must match "
+                       "the last dimension of the input tensor");
+  }
+
+  uint32_t outChannels = getOutChannels();
+  if (outChannels != outputType.getDimSize(outputType.getRank() - 1)) {
+    return emitOpError("Output channels attribute must match "
+                       "the last dimension of the output tensor");
+  }
+
+  uint32_t batchSize = getBatchSize();
+  if (batchSize != inputType.getDimSize(0)) {
+    return emitOpError("Batch size attribute must match the first "
+                       "dimension of the input tensor");
+  }
+
+  uint32_t inputHeight = getInputHeight();
+  if (inputHeight != inputType.getDimSize(inputType.getRank() - 3)) {
+    return emitOpError("Input height attribute must match the second "
+                       "dimension of the input tensor");
+  }
+
+  uint32_t inputWidth = getInputWidth();
+  if (inputWidth != inputType.getDimSize(inputType.getRank() - 2)) {
+    return emitOpError("Input width attribute must match the third "
+                       "dimension of the input tensor");
+  }
+
+  llvm::ArrayRef<int32_t> stride = getStride();
+  if (failed(checkBiggerThan(stride, "stride", 1))) {
+    return mlir::failure();
+  }
+
+  llvm::ArrayRef<int32_t> padding = getPadding();
+  if (failed(checkBiggerThan(padding, "padding", 0))) {
+    return mlir::failure();
+  }
+
+  llvm::ArrayRef<int32_t> outputPadding = getOutputPadding();
+  if (failed(checkBiggerThan(outputPadding, "output padding", 0))) {
+    return mlir::failure();
+  }
+
+  llvm::ArrayRef<int32_t> dilation = getDilation();
+  if (failed(checkBiggerThan(dilation, "dilation", 1))) {
+    return mlir::failure();
+  }
+
+  llvm::ArrayRef<int64_t> kernelShape = weightType.getShape();
+
+  int32_t inputChannels = inputType.getDimSize(inputType.getRank() - 1);
+  int32_t outputChannels = outputType.getDimSize(outputType.getRank() - 1);
+  uint32_t groups = getGroups();
+
+  if (inputChannels % groups != 0) {
+    return emitOpError() << "Number of input channels from input tensor must "
+                            "be divisible by the number of groups. "
+                         << "Got " << inputChannels << " input channels and "
+                         << groups << " groups.";
+  }
+
+  if (outputChannels % groups != 0) {
+    return emitOpError() << "Number of output channels from output tensor must "
+                            "be divisible by the number of groups. "
+                         << "Got " << outputChannels << " output channels and "
+                         << groups << " groups.";
+  }
+
+  if (inputChannels != kernelShape[0]) {
+    return emitOpError() << "Number of input channels from input tensor must "
+                            "match the first dimension of the weight tensor. "
+                         << "Got " << inputChannels << " input channels and "
+                         << kernelShape[0] << " in the weight tensor.";
+  }
" + << "Got " << inputChannels << " input channels and " + << kernelShape[0] << " in the weight tensor."; + } + + if (outputChannels / groups != kernelShape[1]) { + return emitOpError() << "Number of output channels per group must match " + "the second dimension of the weight tensor. " + << "Got " << (outputChannels / groups) + << " output channels per group and " << kernelShape[1] + << " in the weight tensor."; + } + + if (bias) { + if (bias->getDimSize(bias->getRank() - 1) != outputChannels) { + return emitOpError() << "Mismatch in bias tensor dimensions. " + << "Bias tensor has " + << bias->getDimSize(bias->getRank() - 1) + << " channels, " + << "but the output tensor has " << outputChannels + << " channels."; + } + } + + int32_t kernelHeight = kernelShape[2]; + int32_t kernelWidth = kernelShape[3]; + + int32_t Hin = inputType.getDimSize(inputType.getRank() - 3); + int32_t Win = inputType.getDimSize(inputType.getRank() - 2); + + int32_t expectedHOut = (Hin - 1) * stride[0] - 2 * padding[0] + + dilation[0] * (kernelHeight - 1) + outputPadding[0] + + 1; + int32_t expectedWOut = (Win - 1) * stride[1] - 2 * padding[1] + + dilation[1] * (kernelWidth - 1) + outputPadding[1] + 1; + if (expectedHOut < 0 || expectedWOut < 0) { + return emitOpError() << "Given input size per channel: (" << Hin << " x " + << Win << "). " + << "Calculated output size per channel: (" + << expectedHOut << " x " << expectedWOut << "). " + << "Output size is too small"; + } + + return success(); +} + //===----------------------------------------------------------------------===// // MaxPool2dOp //===----------------------------------------------------------------------===// diff --git a/lib/Dialect/TTNN/Transforms/TTNNLayout.cpp b/lib/Dialect/TTNN/Transforms/TTNNLayout.cpp index c9e9c04819..9dd6163f36 100644 --- a/lib/Dialect/TTNN/Transforms/TTNNLayout.cpp +++ b/lib/Dialect/TTNN/Transforms/TTNNLayout.cpp @@ -284,11 +284,13 @@ class TTNNLayoutDPSOperandsRewriter bool modified = false; for (OpOperand &operand : op->getOpOperands()) { // Check if the operand is a dps result - bool isResult = op.isDpsInit(&operand); + bool isDPSResult = op.isDpsInit(&operand); // TTNN Conv2d moves input, weight, and bias from host to device // itself. Inserting the ToLayoutOp on these operands is thus problematic. - if (mlir::isa(op.getOperation()) && !isResult) { + if (!isDPSResult && + (mlir::isa(op.getOperation()) || + mlir::isa(op.getOperation()))) { // For the weight input of the conv2d op, it specifically needs to be on // host, so we create a host to layout op (issue // https://github.com/tenstorrent/tt-mlir/issues/1528). 
@@ -320,7 +322,7 @@ class TTNNLayoutDPSOperandsRewriter
           modified = true;
           op->setOperand(operand.getOperandNumber(), *desiredLayout);
           // If operand is dps result, update the result type on current op
-          if (isResult) {
+          if (isDPSResult) {
            op->getResult(0).setType(desiredLayout->getType());
           }
         });
diff --git a/lib/Target/TTNN/TTNNToFlatbuffer.cpp b/lib/Target/TTNN/TTNNToFlatbuffer.cpp
index 1709443113..7cf8adb527 100644
--- a/lib/Target/TTNN/TTNNToFlatbuffer.cpp
+++ b/lib/Target/TTNN/TTNNToFlatbuffer.cpp
@@ -467,6 +467,40 @@ createOp(FlatbufferObjectCache &cache, Conv2dOp op) {
       op.getGroups());
 }
 
+::flatbuffers::Offset<::tt::target::ttnn::ConvTranspose2dOp>
+createOp(FlatbufferObjectCache &cache, ConvTranspose2dOp op) {
+  auto in0 =
+      cache.at<::tt::target::TensorRef>(getOperandThroughDPSOps(op.getInput()));
+  auto in1 = cache.at<::tt::target::TensorRef>(
+      getOperandThroughDPSOps(op.getWeight()));
+  auto in2 = op.getODSOperands(2).empty()
+                 ? flatbuffers::Offset<::tt::target::TensorRef>()
+                 : cache.at<::tt::target::TensorRef>(
+                       getOperandThroughDPSOps(op.getBias()));
+  auto output = cache.at<::tt::target::TensorRef>(
+      getOperandThroughDPSOps(op.getResult()));
+
+  auto device = getOperandThroughDPSOps(op.getDevice());
+
+  ::flatbuffers::Offset<::flatbuffers::Vector<int32_t>> kernelSize =
+      toFlatbuffer(cache, op.getKernelSize());
+  ::flatbuffers::Offset<::flatbuffers::Vector<int32_t>> stride =
+      toFlatbuffer(cache, op.getStride());
+  ::flatbuffers::Offset<::flatbuffers::Vector<int32_t>> padding =
+      toFlatbuffer(cache, op.getPadding());
+  ::flatbuffers::Offset<::flatbuffers::Vector<int32_t>> outputPadding =
+      toFlatbuffer(cache, op.getOutputPadding());
+  ::flatbuffers::Offset<::flatbuffers::Vector<int32_t>> dilation =
+      toFlatbuffer(cache, op.getDilation());
+
+  return ::tt::target::ttnn::CreateConvTranspose2dOp(
+      *cache.fbb, in0, in1, in2, output,
+      cache.at<::tt::target::DeviceRef>(device), op.getInChannels(),
+      op.getOutChannels(), op.getBatchSize(), op.getInputHeight(),
+      op.getInputWidth(), kernelSize, stride, padding, outputPadding, dilation,
+      op.getGroups());
+}
+
 ::flatbuffers::Offset<::tt::target::ttnn::AllGatherOp>
 createOp(FlatbufferObjectCache &cache, AllGatherOp op) {
   auto input =
@@ -1142,6 +1176,11 @@ emitTTNNOperation(FlatbufferObjectCache &cache, Operation *op,
     return createOperation(cache, createOp(cache, conv2dOp), debugString,
                            locInfo);
   }
+  if (auto conv_transpose2dOp = dyn_cast<ConvTranspose2dOp>(op);
+      conv_transpose2dOp) {
+    return createOperation(cache, createOp(cache, conv_transpose2dOp),
+                           debugString, locInfo);
+  }
   if (auto allGatherOp = dyn_cast<AllGatherOp>(op); allGatherOp) {
     return createOperation(cache, createOp(cache, allGatherOp), debugString,
                            locInfo);
diff --git a/runtime/include/tt/runtime/detail/ttmetal.h b/runtime/include/tt/runtime/detail/ttmetal.h
index 187fa5542f..aa7808bac0 100644
--- a/runtime/include/tt/runtime/detail/ttmetal.h
+++ b/runtime/include/tt/runtime/detail/ttmetal.h
@@ -35,7 +35,8 @@ tt::target::DataType getTensorDataType(Tensor tensor);
 
 size_t getNumAvailableDevices();
 
-Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs = 1);
+Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs = 1,
+                  std::optional<size_t> l1SmallSize = std::nullopt);
 
 void closeDevice(Device device);
 
diff --git a/runtime/include/tt/runtime/detail/ttnn.h b/runtime/include/tt/runtime/detail/ttnn.h
index b1007d4057..081ef02fe6 100644
--- a/runtime/include/tt/runtime/detail/ttnn.h
+++ b/runtime/include/tt/runtime/detail/ttnn.h
@@ -86,7 +86,8 @@ tt::target::DataType getTensorDataType(Tensor tensor);
 
 size_t getNumAvailableDevices();
 
-Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs = 1);
+Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs = 1,
+                  std::optional<size_t> l1SmallSize = std::nullopt);
 
 void closeDevice(Device device);
 
diff --git a/runtime/include/tt/runtime/runtime.h b/runtime/include/tt/runtime/runtime.h
index 2f278ffc1c..7725e4b565 100644
--- a/runtime/include/tt/runtime/runtime.h
+++ b/runtime/include/tt/runtime/runtime.h
@@ -71,7 +71,8 @@ tt::target::DataType getTensorDataType(Tensor tensor);
 
 size_t getNumAvailableDevices();
 
-Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs = 1);
+Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs = 1,
+                  std::optional<size_t> l1SmallSize = std::nullopt);
 
 void closeDevice(Device device);
 
diff --git a/runtime/lib/common/system_desc.cpp b/runtime/lib/common/system_desc.cpp
index ad86edf47f..f11dcc09c8 100644
--- a/runtime/lib/common/system_desc.cpp
+++ b/runtime/lib/common/system_desc.cpp
@@ -262,7 +262,9 @@ std::pair<::tt::runtime::SystemDesc, DeviceIds> getCurrentSystemDesc() {
   ::tt::tt_metal::distributed::MeshShape meshShape = {1, numDevices};
   std::shared_ptr<::tt::tt_metal::distributed::MeshDevice> meshDevice =
       ::tt::tt_metal::distributed::MeshDevice::create(
-          meshShape, DEFAULT_L1_SMALL_SIZE, DEFAULT_TRACE_REGION_SIZE, 1,
+          ::tt::tt_metal::distributed::MeshDeviceConfig{.mesh_shape =
+                                                            meshShape},
+          DEFAULT_L1_SMALL_SIZE, DEFAULT_TRACE_REGION_SIZE, 1,
           ::tt::tt_metal::DispatchCoreType::WORKER);
   std::exception_ptr eptr = nullptr;
   std::unique_ptr<::tt::runtime::SystemDesc> desc;
@@ -271,7 +273,7 @@ std::pair<::tt::runtime::SystemDesc, DeviceIds> getCurrentSystemDesc() {
   } catch (...) {
     eptr = std::current_exception();
   }
-  meshDevice->close_devices();
+  meshDevice->close();
   if (eptr) {
     std::rethrow_exception(eptr);
   }
diff --git a/runtime/lib/runtime.cpp b/runtime/lib/runtime.cpp
index c25cfed51b..b0ac1ee43e 100644
--- a/runtime/lib/runtime.cpp
+++ b/runtime/lib/runtime.cpp
@@ -216,16 +216,17 @@ size_t getNumAvailableDevices() {
   LOG_FATAL("runtime is not enabled");
 }
 
-Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs) {
+Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs,
+                  std::optional<size_t> l1SmallSize) {
 #if defined(TT_RUNTIME_ENABLE_TTNN)
   if (getCurrentRuntime() == DeviceRuntime::TTNN) {
-    return ::tt::runtime::ttnn::openDevice(deviceIds, numHWCQs);
+    return ::tt::runtime::ttnn::openDevice(deviceIds, numHWCQs, l1SmallSize);
   }
 #endif
 
 #if defined(TT_RUNTIME_ENABLE_TTMETAL)
   if (getCurrentRuntime() == DeviceRuntime::TTMetal) {
-    return ::tt::runtime::ttmetal::openDevice(deviceIds, numHWCQs);
+    return ::tt::runtime::ttmetal::openDevice(deviceIds, numHWCQs, l1SmallSize);
   }
 #endif
   LOG_FATAL("runtime is not enabled");
diff --git a/runtime/lib/ttmetal/runtime.cpp b/runtime/lib/ttmetal/runtime.cpp
index 9cca242a58..f2702b52a3 100644
--- a/runtime/lib/ttmetal/runtime.cpp
+++ b/runtime/lib/ttmetal/runtime.cpp
@@ -66,13 +66,16 @@ size_t getNumAvailableDevices() {
   return ::tt::tt_metal::GetNumAvailableDevices();
 }
 
-Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs) {
+Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs,
+                  std::optional<size_t> l1SmallSize) {
   LOG_ASSERT(deviceIds.size(), "No devices specified");
   ::tt::tt_metal::distributed::MeshShape grid = {1, deviceIds.size()};
+  size_t l1SmallSizeValue = l1SmallSize.value_or(DEFAULT_L1_SMALL_SIZE);
   std::shared_ptr<::tt::tt_metal::distributed::MeshDevice> meshDevice =
       ::tt::tt_metal::distributed::MeshDevice::create(
-          grid, DEFAULT_L1_SMALL_SIZE, DEFAULT_TRACE_REGION_SIZE, numHWCQs,
+          ::tt::tt_metal::distributed::MeshDeviceConfig{.mesh_shape = grid},
+          l1SmallSizeValue, DEFAULT_TRACE_REGION_SIZE, numHWCQs,
           ::tt::tt_metal::DispatchCoreType::WORKER);
 
   return Device(std::static_pointer_cast<void>(meshDevice),
@@ -90,7 +93,7 @@ void closeDevice(Device device) {
     ::tt::tt_metal::detail::DumpDeviceProfileResults(ttmetalDevice);
   }
 #endif
-  ttmetalMeshDevice.close_devices();
+  ttmetalMeshDevice.close();
 }
 
 void deallocateBuffers(Device deviceHandle) {
diff --git a/runtime/lib/ttnn/operations/CMakeLists.txt b/runtime/lib/ttnn/operations/CMakeLists.txt
index d65c01b21c..a5066ed60e 100644
--- a/runtime/lib/ttnn/operations/CMakeLists.txt
+++ b/runtime/lib/ttnn/operations/CMakeLists.txt
@@ -8,6 +8,7 @@ set(TTNN_OPS_SRCS
   ${CMAKE_CURRENT_SOURCE_DIR}/ccl/mesh_shard.cpp
   ${CMAKE_CURRENT_SOURCE_DIR}/conv/conv2d.cpp
   ${CMAKE_CURRENT_SOURCE_DIR}/creation/arange.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/conv/conv_transpose2d.cpp
   ${CMAKE_CURRENT_SOURCE_DIR}/creation/empty.cpp
   ${CMAKE_CURRENT_SOURCE_DIR}/creation/ones.cpp
   ${CMAKE_CURRENT_SOURCE_DIR}/creation/full.cpp
diff --git a/runtime/lib/ttnn/operations/ccl/mesh_shard.cpp b/runtime/lib/ttnn/operations/ccl/mesh_shard.cpp
index 71fa5b8438..87c38c7749 100644
--- a/runtime/lib/ttnn/operations/ccl/mesh_shard.cpp
+++ b/runtime/lib/ttnn/operations/ccl/mesh_shard.cpp
@@ -17,8 +17,9 @@ void FullToShardShape(const ::ttnn::Tensor &input, ::ttnn::Tensor &out,
                       const std::vector<int64_t> &shardShape) {
   if (shardType == ::tt::target::MeshShardType::Replicate) {
     out = ::ttnn::distributed::distribute_tensor(
-        input, meshDevice,
-        *::ttnn::distributed::replicate_tensor_to_mesh_mapper(meshDevice));
+        input,
+        *::ttnn::distributed::replicate_tensor_to_mesh_mapper(meshDevice),
+        meshDevice);
   } else {
     LOG_ASSERT(
         input.get_shape().rank() > 1,
@@ -48,9 +49,10 @@ void FullToShardShape(const ::ttnn::Tensor &input, ::ttnn::Tensor &out,
     }
 
     out = ::ttnn::distributed::distribute_tensor(
-        input, meshDevice,
+        input,
         *::ttnn::distributed::shard_tensor_to_2d_mesh_mapper(
-            meshDevice, meshDevice.shape(), shard2dConfig));
+            meshDevice, meshDevice.shape(), shard2dConfig),
+        meshDevice);
   }
 }
diff --git a/runtime/lib/ttnn/operations/conv/conv_transpose2d.cpp b/runtime/lib/ttnn/operations/conv/conv_transpose2d.cpp
new file mode 100644
index 0000000000..e711594d22
--- /dev/null
+++ b/runtime/lib/ttnn/operations/conv/conv_transpose2d.cpp
@@ -0,0 +1,58 @@
+// SPDX-FileCopyrightText: (c) 2024 Tenstorrent AI ULC
+//
+// SPDX-License-Identifier: Apache-2.0
+
+#include "operations/conv/conv_transpose2d.h"
+#include "tt/runtime/detail/logger.h"
+#include "tt/runtime/detail/ttnn.h"
+#include "tt/runtime/ttnn/operations/utils.h"
+#include "tt/runtime/ttnn/utils.h"
+#include "ttmlir/Target/TTNN/program_generated.h"
+#include "ttnn/operations/conv/conv_transpose2d/conv_transpose2d.hpp"
+#include "ttnn/types.hpp"
+
+namespace tt::runtime::ttnn::operations::conv {
+void run(const ::tt::target::ttnn::ConvTranspose2dOp *op,
+         ProgramContext &context) {
+  ProgramTensorPool &tensorPool = context.getTensorPool();
+  const ::ttnn::Tensor &input = tensorPool.at(op->input()->global_id());
+  const ::ttnn::Tensor &weight = tensorPool.at(op->weight()->global_id());
+  DEBUG_ASSERT(input.is_allocated());
+  DEBUG_ASSERT(weight.is_allocated());
+
+  std::optional<::ttnn::Tensor> bias =
+      op->bias() ? std::make_optional(tensorPool.at(op->bias()->global_id()))
+                 : std::nullopt;
std::make_optional(tensorPool.at(op->bias()->global_id())) + : std::nullopt; + + std::array kernelSize, stride, padding, outputPadding, dilation; + std::copy(op->kernel_size()->begin(), op->kernel_size()->end(), + kernelSize.begin()); + std::copy(op->stride()->begin(), op->stride()->end(), stride.begin()); + std::copy(op->padding()->begin(), op->padding()->end(), padding.begin()); + std::copy(op->output_padding()->begin(), op->output_padding()->end(), + outputPadding.begin()); + std::copy(op->dilation()->begin(), op->dilation()->end(), dilation.begin()); + + auto config = ::ttnn::operations::conv::Conv2dConfig(); + config.dtype = utils::getDataType(op->input()); + config.weights_dtype = utils::getDataType(op->weight()); + config.shard_layout = ::ttnn::TensorMemoryLayout::WIDTH_SHARDED; + ::ttnn::MemoryConfig outMemConfig = + ::tt::runtime::ttnn::utils::createMemoryConfig(op->out()); + + DeviceVariant targetDevice = + context.getTargetDevice(op->device()->global_id()); + ::ttnn::Tensor out = std::visit( + [&](auto &&targetDevice) -> ::ttnn::Tensor { + return std::get<0>(::ttnn::conv_transpose2d( + ::ttnn::DefaultQueueId, input, weight, &(targetDevice.get()), + op->in_channels(), op->out_channels(), op->batch_size(), + op->input_height(), op->input_width(), kernelSize, stride, padding, + outputPadding, dilation, op->groups(), bias, config)); + }, + targetDevice); + + tensorPool.insert_or_assign(op->out()->global_id(), out); +} + +} // namespace tt::runtime::ttnn::operations::conv diff --git a/runtime/lib/ttnn/operations/conv/conv_transpose2d.h b/runtime/lib/ttnn/operations/conv/conv_transpose2d.h new file mode 100644 index 0000000000..a3be8431c9 --- /dev/null +++ b/runtime/lib/ttnn/operations/conv/conv_transpose2d.h @@ -0,0 +1,17 @@ +// SPDX-FileCopyrightText: (c) 2024 Tenstorrent AI ULC +// +// SPDX-License-Identifier: Apache-2.0 + +#ifndef RUNTIME_LIB_TTNN_OPERATIONS_CONV_CONVTRANSPOSE2D_H +#define RUNTIME_LIB_TTNN_OPERATIONS_CONV_CONVTRANSPOSE2D_H + +#include "tt/runtime/ttnn/types.h" +#include "ttmlir/Target/TTNN/program_generated.h" + +namespace tt::runtime::ttnn::operations::conv { +void run(const ::tt::target::ttnn::ConvTranspose2dOp *op, + ProgramContext &context); + +} // namespace tt::runtime::ttnn::operations::conv + +#endif diff --git a/runtime/lib/ttnn/program.cpp b/runtime/lib/ttnn/program.cpp index 8c47bfb20e..c210ce447b 100644 --- a/runtime/lib/ttnn/program.cpp +++ b/runtime/lib/ttnn/program.cpp @@ -6,6 +6,7 @@ #include "operations/ccl/reduce_scatter.h" #include "operations/context/get_device.h" #include "operations/conv/conv2d.h" +#include "operations/conv/conv_transpose2d.h" #include "operations/creation/arange.h" #include "operations/creation/empty.h" #include "operations/creation/full.h" @@ -221,6 +222,9 @@ void ProgramExecutor::runOperation(const ::tt::target::ttnn::Operation *op) { case ::tt::target::ttnn::OpType::Conv2dOp: { return operations::conv::run(op->type_as_Conv2dOp(), context); } + case ::tt::target::ttnn::OpType::ConvTranspose2dOp: { + return operations::conv::run(op->type_as_ConvTranspose2dOp(), context); + } case ::tt::target::ttnn::OpType::DeallocateOp: { return operations::deletion::run(op->type_as_DeallocateOp(), context); } diff --git a/runtime/lib/ttnn/runtime.cpp b/runtime/lib/ttnn/runtime.cpp index c527a94d7e..72378f881c 100644 --- a/runtime/lib/ttnn/runtime.cpp +++ b/runtime/lib/ttnn/runtime.cpp @@ -178,11 +178,14 @@ size_t getNumAvailableDevices() { return ::tt::tt_metal::GetNumAvailableDevices(); } -Device openDevice(DeviceIds const &deviceIds, 
size_t numHWCQs) { +Device openDevice(DeviceIds const &deviceIds, size_t numHWCQs, + std::optional l1SmallSize) { LOG_ASSERT(deviceIds.size(), "No devices specified"); ::tt::tt_metal::distributed::MeshShape grid = {1, deviceIds.size()}; + size_t l1SmallSizeValue = l1SmallSize.value_or(kL1SmallSize); std::shared_ptr<::ttnn::MeshDevice> meshDevice = ::ttnn::MeshDevice::create( - grid, kL1SmallSize, DEFAULT_TRACE_REGION_SIZE, numHWCQs, + ::tt::tt_metal::distributed::MeshDeviceConfig{.mesh_shape = grid}, + l1SmallSizeValue, DEFAULT_TRACE_REGION_SIZE, numHWCQs, ::tt::tt_metal::DispatchCoreType::WORKER); bool enableAsync = debug::Env::get().enableAsyncTTNN; @@ -203,7 +206,7 @@ void closeDevice(Device device) { } #endif - ttnnMeshDevice.close_devices(); + ttnnMeshDevice.close(); } void deallocateBuffers(Device deviceHandle) { diff --git a/runtime/tools/python/ttrt/runtime/module.cpp b/runtime/tools/python/ttrt/runtime/module.cpp index 4c3eb8c690..f55b6b81e1 100644 --- a/runtime/tools/python/ttrt/runtime/module.cpp +++ b/runtime/tools/python/ttrt/runtime/module.cpp @@ -99,6 +99,7 @@ PYBIND11_MODULE(_C, m) { "Get the number of available devices"); m.def("open_device", &tt::runtime::openDevice, py::arg("device_ids"), py::arg("num_hw_cqs") = size_t{1}, + py::arg("l1_small_size") = py::none(), "Open a mesh of devices for execution"); m.def("close_device", &tt::runtime::closeDevice, "Close a mesh device"); m.def("to_host", &tt::runtime::toHost, py::arg("tensor"), diff --git a/test/ttmlir/Dialect/TTIR/conv_transpose2d/conv_transpose2d_tests_negative.mlir b/test/ttmlir/Dialect/TTIR/conv_transpose2d/conv_transpose2d_tests_negative.mlir new file mode 100644 index 0000000000..f29180ead2 --- /dev/null +++ b/test/ttmlir/Dialect/TTIR/conv_transpose2d/conv_transpose2d_tests_negative.mlir @@ -0,0 +1,363 @@ +// RUN: not ttmlir-opt --split-input-file %s 2>&1 | FileCheck %s +// Negative tests for conv_transpose2d operation + +// Verify that the parsing fails if tensors don't have four dimensions +module attributes {} { + func.func @conv_transpose2d_invalid_input_shape(%arg0: tensor<8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> { + %0 = tensor.empty() : tensor<1x10x10x256xbf16> + // CHECK: error: 'ttir.conv_transpose2d' op Input must be a 4D tensor + %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0) + <{ + stride = 1: i32, + padding = 0: i32, + output_padding = 0: i32, + dilation = 1: i32, + groups = 1: i32} + > : (tensor<8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16> + return %1 : tensor<1x10x10x256xbf16> + } +} + +// ----- +module attributes {} { + func.func @conv_transpose2d_invalid_weight_shape(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x8x8x256xbf16> { + %0 = tensor.empty() : tensor<1x8x8x256xbf16> + // CHECK: error: 'ttir.conv_transpose2d' op Weight must be a 4D tensor + %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0) + <{ + stride = 1: i32, + padding = 0: i32, + output_padding = 0: i32, + dilation = 1: i32, + groups = 1: i32} + > : (tensor<1x8x8x256xbf16>, tensor<256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x8x8x256xbf16>) -> tensor<1x8x8x256xbf16> + return %1 : tensor<1x8x8x256xbf16> + } +} + +// ----- +module attributes {} { + func.func @conv_transpose2d_invalid_bias_shape(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<256xbf16>) -> tensor<1x8x8x256xbf16> { + %0 = 
+    // CHECK: error: 'ttir.conv_transpose2d' op Bias must be a 4D tensor
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<256xbf16>, tensor<1x8x8x256xbf16>) -> tensor<1x8x8x256xbf16>
+    return %1 : tensor<1x8x8x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_output_shape(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Output must be a 4D tensor
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<10x10x256xbf16>) -> tensor<10x10x256xbf16>
+    return %1 : tensor<10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_output_shape(%arg0: tensor<4x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<2x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<2x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Batch size of input and output tensors must match
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<4x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<2x10x10x256xbf16>) -> tensor<2x10x10x256xbf16>
+    return %1 : tensor<2x10x10x256xbf16>
+  }
+}
+
+// Verify that the parsing fails if attributes are not integers or pair of integers
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_stride_shape(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Expected integer or pair of integers, got tuple of size 3 for stride
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = array,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_padding_shape(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Expected integer or pair of integers, got tuple of size 3 for padding
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = array,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_output_padding_shape(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Expected integer or pair of integers, got tuple of size 3 for output padding
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = array,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_dilation_shape(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Expected integer or pair of integers, got tuple of size 3 for dilation
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = array,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// Verify that the parsing fails if attributes have invalid values
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_stride_values(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Stride values must be greater than 0
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = array,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_padding_values(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Padding values must be greater or equal than 0
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = array,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_output_padding_values(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Output padding values must be greater or equal than 0
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = -6: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_invalid_dilation_values(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Dilation values must be greater than 0
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = array,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// Verify the parsing fails if number of channels are incorrect
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_input_channels_not_divisible_by_groups(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Number of input channels from input tensor must be divisible by the number of groups. Got 256 input channels and 3 groups
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 3: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_output_channels_not_divisible_by_groups(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x350x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x350xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x350xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Number of output channels from output tensor must be divisible by the number of groups. Got 350 output channels and 4 groups.
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 4: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x350x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x350xbf16>) -> tensor<1x10x10x350xbf16>
+    return %1 : tensor<1x10x10x350xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_input_channels_missmatch_with_weight(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<128x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Number of input channels from input tensor must match the first dimension of the weight tensor. Got 256 input channels and 128 in the weight tensor.
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<128x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_output_channels_missmatch_with_weight(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Number of output channels per group must match the second dimension of the weight tensor. Got 64 output channels per group and 256 in the weight tensor.
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 4: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_output_channels_missmatch_with_bias(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x128xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Mismatch in bias tensor dimensions. Bias tensor has 128 channels, but the output tensor has 256 channels.
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x128xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// Verify the parsing fails if calculated output size per channel is below zero or different from the output tensor
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_calculated_output_size_per_channel_below_zero(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<1x10x10x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Given input size per channel: (8 x 8). Calculated output size per channel: (-2 x -4). Output size is too small
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = array<i32: 6, 7>,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x10x10x256xbf16>) -> tensor<1x10x10x256xbf16>
+    return %1 : tensor<1x10x10x256xbf16>
+  }
+}
+
+// -----
+module attributes {} {
+  func.func @conv_transpose2d_calculated_output_size_per_channel_missmatch_with_output_tensor(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x2x2x256xbf16> {
+    %0 = tensor.empty() : tensor<1x2x2x256xbf16>
+    // CHECK: error: 'ttir.conv_transpose2d' op Mismatch between expected output size per channel and got output tensor dimensions. Expected: (10 x 10), got: (2 x 2).
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x2x2x256xbf16>) -> tensor<1x2x2x256xbf16>
+    return %1 : tensor<1x2x2x256xbf16>
+  }
+}
diff --git a/test/ttmlir/Dialect/TTIR/conv_transpose2d/conv_transpose2d_tests_positive.mlir b/test/ttmlir/Dialect/TTIR/conv_transpose2d/conv_transpose2d_tests_positive.mlir
new file mode 100644
index 0000000000..bf1d52f0d1
--- /dev/null
+++ b/test/ttmlir/Dialect/TTIR/conv_transpose2d/conv_transpose2d_tests_positive.mlir
@@ -0,0 +1,101 @@
+// RUN: ttmlir-opt %s | FileCheck %s
+
+module attributes {} {
+  func.func @conv_transpose2d_simple(%arg0: tensor<4x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<4x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<4x10x10x256xbf16>
+    // CHECK: %[[C:.*]] = "ttir.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<4x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<4x10x10x256xbf16>) -> tensor<4x10x10x256xbf16>
+    return %1 : tensor<4x10x10x256xbf16>
+  }
+
+  func.func @conv_transpose2d_stride(%arg0: tensor<1x16x32x256xbf16>, %arg1: tensor<256x256x8x8xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x38x132x256xbf16> {
+    %0 = tensor.empty() : tensor<1x38x132x256xbf16>
+    // CHECK: %[[C:.*]] = "ttir.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = array<i32: 2, 4>,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x16x32x256xbf16>, tensor<256x256x8x8xbf16>, tensor<1x1x1x256xbf16>, tensor<1x38x132x256xbf16>) -> tensor<1x38x132x256xbf16>
+    return %1 : tensor<1x38x132x256xbf16>
+  }
+
+  func.func @conv_transpose2d_padding(%arg0: tensor<1x64x64x256xbf16>, %arg1: tensor<256x256x16x16xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x73x67x256xbf16> {
+    %0 = tensor.empty() : tensor<1x73x67x256xbf16>
+    // CHECK: %[[C:.*]] = "ttir.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = array<i32: 3, 6>,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x64x64x256xbf16>, tensor<256x256x16x16xbf16>, tensor<1x1x1x256xbf16>, tensor<1x73x67x256xbf16>) -> tensor<1x73x67x256xbf16>
+    return %1 : tensor<1x73x67x256xbf16>
+  }
+
+  func.func @conv_transpose2d_output_padding(%arg0: tensor<1x32x32x128xbf16>, %arg1: tensor<128x256x8x8xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x45x47x256xbf16> {
+    %0 = tensor.empty() : tensor<1x45x47x256xbf16>
+    // CHECK: %[[C:.*]] = "ttir.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = array<i32: 6, 8>,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<1x32x32x128xbf16>, tensor<128x256x8x8xbf16>, tensor<1x1x1x256xbf16>, tensor<1x45x47x256xbf16>) -> tensor<1x45x47x256xbf16>
+    return %1 : tensor<1x45x47x256xbf16>
+  }
+
+  func.func @conv_transpose2d_dilation(%arg0: tensor<1x32x32x128xbf16>, %arg1: tensor<128x256x16x32xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x77x94x256xbf16> {
+    %0 = tensor.empty() : tensor<1x77x94x256xbf16>
+    // CHECK: %[[C:.*]] = "ttir.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = array<i32: 3, 2>,
+            groups = 1: i32}
+          > : (tensor<1x32x32x128xbf16>, tensor<128x256x16x32xbf16>, tensor<1x1x1x256xbf16>, tensor<1x77x94x256xbf16>) -> tensor<1x77x94x256xbf16>
+    return %1 : tensor<1x77x94x256xbf16>
+  }
+
+  func.func @conv_transpose2d_groups(%arg0: tensor<1x16x32x192xbf16>, %arg1: tensor<192x126x8x8xbf16>, %arg2: tensor<1x1x1x252xbf16>) -> tensor<1x23x39x252xbf16> {
+    %0 = tensor.empty() : tensor<1x23x39x252xbf16>
+    // CHECK: %[[C:.*]] = "ttir.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 2: i32}
+          > : (tensor<1x16x32x192xbf16>, tensor<192x126x8x8xbf16>, tensor<1x1x1x252xbf16>, tensor<1x23x39x252xbf16>) -> tensor<1x23x39x252xbf16>
+    return %1 : tensor<1x23x39x252xbf16>
+  }
+
+  func.func @conv_transpose2d(%arg0: tensor<1x8x8x256xbf16>, %arg1: tensor<256x64x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<1x21x38x256xbf16> {
+    %0 = tensor.empty() : tensor<1x21x38x256xbf16>
+    // CHECK: %[[C:.*]] = "ttir.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = array,
+            padding = array,
+            output_padding = array,
+            dilation = array,
+            groups = 4: i32}
+          > : (tensor<1x8x8x256xbf16>, tensor<256x64x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<1x21x38x256xbf16>) -> tensor<1x21x38x256xbf16>
+    return %1 : tensor<1x21x38x256xbf16>
+  }
+}
diff --git a/test/ttmlir/Silicon/TTNN/perf_unit/test_perf_conv_transpose2d.mlir b/test/ttmlir/Silicon/TTNN/perf_unit/test_perf_conv_transpose2d.mlir
new file mode 100644
index 0000000000..a268c7bab7
--- /dev/null
+++ b/test/ttmlir/Silicon/TTNN/perf_unit/test_perf_conv_transpose2d.mlir
@@ -0,0 +1,19 @@
+// RUN: ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=%system_desc_path%" %s > %t.mlir
+// RUN: FileCheck %s --input-file=%t.mlir
+// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %t.ttnn
+
+module attributes {} {
+  func.func @forward(%arg0: tensor<3x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<3x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<3x10x10x256xbf16>
+    // CHECK: %[[C:.*]] = "ttnn.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<3x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<3x10x10x256xbf16>) -> tensor<3x10x10x256xbf16>
+    return %1 : tensor<3x10x10x256xbf16>
+  }
+}
diff --git a/test/ttmlir/Silicon/TTNN/simple_conv_transpose2d.mlir b/test/ttmlir/Silicon/TTNN/simple_conv_transpose2d.mlir
new file mode 100644
index 0000000000..a268c7bab7
--- /dev/null
+++ b/test/ttmlir/Silicon/TTNN/simple_conv_transpose2d.mlir
@@ -0,0 +1,19 @@
+// RUN: ttmlir-opt --ttir-to-ttnn-backend-pipeline="system-desc-path=%system_desc_path%" %s > %t.mlir
+// RUN: FileCheck %s --input-file=%t.mlir
+// RUN: ttmlir-translate --ttnn-to-flatbuffer %t.mlir > %t.ttnn
+
+module attributes {} {
+  func.func @forward(%arg0: tensor<3x8x8x256xbf16>, %arg1: tensor<256x256x3x3xbf16>, %arg2: tensor<1x1x1x256xbf16>) -> tensor<3x10x10x256xbf16> {
+    %0 = tensor.empty() : tensor<3x10x10x256xbf16>
+    // CHECK: %[[C:.*]] = "ttnn.conv_transpose2d"[[C:.*]]
+    %1 = "ttir.conv_transpose2d"(%arg0, %arg1, %arg2, %0)
+          <{
+            stride = 1: i32,
+            padding = 0: i32,
+            output_padding = 0: i32,
+            dilation = 1: i32,
+            groups = 1: i32}
+          > : (tensor<3x8x8x256xbf16>, tensor<256x256x3x3xbf16>, tensor<1x1x1x256xbf16>, tensor<3x10x10x256xbf16>) -> tensor<3x10x10x256xbf16>
+    return %1 : tensor<3x10x10x256xbf16>
+  }
+}
diff --git a/third_party/CMakeLists.txt b/third_party/CMakeLists.txt
index b1bf234e6e..ae321d425c 100644
--- a/third_party/CMakeLists.txt
+++ b/third_party/CMakeLists.txt
@@ -1,6 +1,6 @@
 include(ExternalProject)
 
-set(TT_METAL_VERSION "5d4c047dacf2606dd56c7b4d51d5049bf2c6846a")
+set(TT_METAL_VERSION "eadc98f1c0f714c423fdfc97689afbc50c0dca3b")
 
 if ("$ENV{ARCH_NAME}" STREQUAL "grayskull")
   set(ARCH_NAME "grayskull")
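
The new optional `l1SmallSize` parameter is surfaced to Python through the `ttrt` bindings shown in `module.cpp` above. The following is a minimal sketch of how it might be called, assuming the pybind11 `_C` module is re-exported as `ttrt.runtime` and that a flatbuffer/system-descriptor setup has already been done; the `16384` byte value is purely illustrative, not a recommended setting:

```py
# Hedged sketch of the new l1_small_size argument on open_device.
import ttrt.runtime as rt

# Omitting l1_small_size (or passing None) keeps the backend default
# (DEFAULT_L1_SMALL_SIZE for ttmetal, kL1SmallSize for ttnn).
device = rt.open_device(device_ids=[0])
rt.close_device(device)

# Explicit override of the L1 small region size, in bytes
# (16384 is an illustrative value only).
device = rt.open_device(device_ids=[0], num_hw_cqs=1, l1_small_size=16384)
rt.close_device(device)
```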
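
The expected output shapes in the `conv_transpose2d` tests above follow the standard transposed-convolution output-size formula, which the verifier error messages also reflect ("Given input size per channel: (8 x 8). Calculated output size per channel: ..."). A small sketch of that formula for checking test shapes; this is the textbook relation, not code taken from the verifier:

```py
def conv_transpose2d_out_size(in_size, kernel, stride=1, padding=0,
                              output_padding=0, dilation=1):
    # Standard transposed-conv output size per spatial dimension.
    return ((in_size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

# Matches the tests above:
assert conv_transpose2d_out_size(8, 3) == 10              # simple: 8x8 -> 10x10
assert conv_transpose2d_out_size(16, 8, stride=2) == 38   # stride test, height
assert conv_transpose2d_out_size(32, 8, stride=4) == 132  # stride test, width
assert conv_transpose2d_out_size(8, 3, padding=6) == -2   # "output size is too small"
```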