Commit d82ce76

[Feature Enhancement] add BladeDISC for compiler backend (#242)
* [Feature Enhancement] add BladeDISC for compiler backend
* Fix test_compiler
* Fix test_compiler
* Fix test_compiler and blade_disc_backend
* update bladedisc version info
1 parent 6f7b3a7 commit d82ce76

8 files changed: +1251 -36 lines changed

docs/BladeDISC_batch_test.txt

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
1+
Expected bias to be of same shape as normalized_shape, but got bias of shape [2048] and normalized_shape = [512]
2+
Failed! Try to export it through torch.jit.script:
3+
4+
5+
Arguments for call are not valid.
6+
The following variants are available:
7+
8+
aten::device(str a) -> (Device):
9+
Argument a not provided.
10+
11+
device(str type) -> (Device):
12+
Keyword argument index unknown.
13+
14+
The original call is:
15+
File "/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M/model.py", line 409
16+
q_with_bias_v = transpose_8 = None
17+
zero_pad = torch.zeros(
18+
(1, 8, 763, 1), device=device(type="cuda", index=0), dtype=torch.float32
19+
~~~~~~ <--- HERE
20+
)
21+
x_padded = torch.cat([zero_pad, matrix_bd], dim=-1)
22+
23+
Fail to export torchscript on the top level of the model, We will iterate over the submodules and replace those that can be successfully exported by the torch.jit.script
24+
graph-net-test-compiler-log equal model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 1
25+
graph-net-test-compiler-log all_close_atol8_rtol8 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 1
26+
graph-net-test-compiler-log all_close_atol8_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 1
27+
graph-net-test-compiler-log all_close_atol5_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 1
28+
graph-net-test-compiler-log all_close_atol3_rtol2 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 1
29+
graph-net-test-compiler-log all_close_atol2_rtol1 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 1
30+
graph-net-test-compiler-log max_diff model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 0.0
31+
graph-net-test-compiler-log mean_diff model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 0.0
32+
graph-net-test-compiler-log diff_count_atol8_rtol8 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 0
33+
graph-net-test-compiler-log diff_count_atol8_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 0
34+
graph-net-test-compiler-log diff_count_atol5_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 0
35+
graph-net-test-compiler-log diff_count_atol3_rtol2 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 0
36+
graph-net-test-compiler-log diff_count_atol2_rtol1 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M 0
37+
graph-net-test-compiler-log duration model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice-300M eager:8.6400 compiled:8.4100
38+
[Profiling] Using device: cuda:0 NVIDIA A100-SXM4-40GB, warm up 3, trials 5
39+
Trial 1: 9.36 ms
40+
Trial 2: 8.55 ms
41+
Trial 3: 8.48 ms
42+
Trial 4: 8.41 ms
43+
Trial 5: 8.41 ms
44+
[Profiling] Using device: cuda:0 NVIDIA A100-SXM4-40GB, warm up 3, trials 5
45+
Trial 1: 8.42 ms
46+
Trial 2: 8.41 ms
47+
Trial 3: 8.4 ms
48+
Trial 4: 8.4 ms
49+
Trial 5: 8.41 ms
50+
Expected bias to be of same shape as normalized_shape, but got bias of shape [2048] and normalized_shape = [512]
51+
Failed! Try to export it through torch.jit.script:
52+
53+
54+
Arguments for call are not valid.
55+
The following variants are available:
56+
57+
aten::device(str a) -> (Device):
58+
Argument a not provided.
59+
60+
device(str type) -> (Device):
61+
Keyword argument index unknown.
62+
63+
The original call is:
64+
File "/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B/model.py", line 291
65+
q_with_bias_v = transpose_8 = None
66+
zero_pad = torch.zeros(
67+
(1, 8, 796, 1), device=device(type="cuda", index=0), dtype=torch.float32
68+
~~~~~~ <--- HERE
69+
)
70+
x_padded = torch.cat([zero_pad, matrix_bd], dim=-1)
71+
72+
Fail to export torchscript on the top level of the model, We will iterate over the submodules and replace those that can be successfully exported by the torch.jit.script
73+
graph-net-test-compiler-log equal model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 1
74+
graph-net-test-compiler-log all_close_atol8_rtol8 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 1
75+
graph-net-test-compiler-log all_close_atol8_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 1
76+
graph-net-test-compiler-log all_close_atol5_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 1
77+
graph-net-test-compiler-log all_close_atol3_rtol2 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 1
78+
graph-net-test-compiler-log all_close_atol2_rtol1 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 1
79+
graph-net-test-compiler-log max_diff model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 0.0
80+
graph-net-test-compiler-log mean_diff model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 0.0
81+
graph-net-test-compiler-log diff_count_atol8_rtol8 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 0
82+
graph-net-test-compiler-log diff_count_atol8_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 0
83+
graph-net-test-compiler-log diff_count_atol5_rtol5 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 0
84+
graph-net-test-compiler-log diff_count_atol3_rtol2 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 0
85+
graph-net-test-compiler-log diff_count_atol2_rtol1 model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B 0
86+
graph-net-test-compiler-log duration model_path:/daiwenhao/GraphNet/samples/cosyvoice/CosyVoice2-0.5B eager:6.0600 compiled:6.0400
87+
[Profiling] Using device: cuda:0 NVIDIA A100-SXM4-40GB, warm up 3, trials 5
88+
Trial 1: 6.11 ms
89+
Trial 2: 6.05 ms
90+
Trial 3: 6.04 ms
91+
Trial 4: 6.04 ms
92+
Trial 5: 6.05 ms
93+
[Profiling] Using device: cuda:0 NVIDIA A100-SXM4-40GB, warm up 3, trials 5
94+
Trial 1: 6.03 ms
95+
Trial 2: 6.03 ms
96+
Trial 3: 6.04 ms
97+
Trial 4: 6.04 ms
98+
Trial 5: 6.06 ms

docs/BladeDISC_tech_report.md

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
1+
# 1 - Introduction
2+
3+
BladeDISC is an end-to-end **Dynamic Shape Compiler** for machine learning workloads and one of the key components of Alibaba's [PAI-Blade](https://www.aliyun.com/activity/bigdata/blade). For more information, see [GitHub BladeDISC | TorchBlade Overview](https://github.com/alibaba/BladeDISC/blob/main/docs/developers/bladedisc_torch_overview.md).
4+
5+
This technical report demonstrates that `graph_net.torch.test_compiler` supports BladeDISC as a compiler backend: with `--compiler "bladedisc"` configured, it reads subgraphs from the `GraphNet/samples` directory, executes them successfully, and obtains correct evaluation results.
6+
7+
Taking BERT as an example (see [Optimize and Inference BERT with TorchBlade](https://github.com/alibaba/BladeDISC/blob/main/docs/tutorials/torch_bert_inference.md)), the main execution process is as follows:
8+
9+
1. Convert the PyTorch model to TorchScript using `torch.jit.trace` or `torch.jit.script`.
10+
2. Compile and optimize the model using BladeDISC's `torch_blade.optimize` to generate the compiled model `compiled_model`.
11+
3. Call the compiled model on the inputs, `compiled_model(input)`, to execute the forward pass.
12+
13+
The process of compiling and optimizing with `torch.jit.trace` or `torch.jit.script` can be abstracted as follows:
14+
15+
```python
import torch_blade

# `model` and `inputs` are assumed to be defined by the caller.
# allow_tracing=True: export via torch.jit.trace(model, inputs)
compiled_model = torch_blade.optimize(model, allow_tracing=True, model_inputs=tuple(inputs))
# allow_tracing=False: export via torch.jit.script(model)
compiled_model = torch_blade.optimize(model, allow_tracing=False)
```
21+
22+
The tests in this report used `torch.jit.trace`.
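The two call forms above can be folded into a single helper. The following is a minimal sketch, assuming `torch_blade.optimize` accepts the keyword arguments shown above. The names `compile_with_bladedisc` and its `optimize` parameter are hypothetical and are introduced here only so that the branching logic can be exercised without BladeDISC installed.

```python
from typing import Any, Callable, Optional, Sequence


def compile_with_bladedisc(
    model: Any,
    inputs: Optional[Sequence[Any]] = None,
    optimize: Optional[Callable[..., Any]] = None,
) -> Any:
    """Compile `model`: trace when example inputs are given, script otherwise."""
    if optimize is None:
        # Imported lazily: torch_blade only exists inside the BladeDISC image.
        import torch_blade

        optimize = torch_blade.optimize
    if inputs is not None:
        # Tracing path (torch.jit.trace), as used for the tests in this report.
        return optimize(model, allow_tracing=True, model_inputs=tuple(inputs))
    # Scripting path (torch.jit.script); no example inputs are needed.
    return optimize(model, allow_tracing=False)
```

Passing a stand-in function as `optimize` is only a convenience for checking which keyword arguments would be sent; inside the BladeDISC image the argument can be omitted.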
23+
24+
25+
# 2 - Installation Instructions
26+
27+
> The environment installed in this section is also the test environment used in Section 3.
28+
29+
Official quick deployment options include [Install BladeDISC With Docker](https://github.com/alibaba/BladeDISC/blob/main/docs/install_with_docker.md) or [Build BladeDISC from Source](https://github.com/alibaba/BladeDISC/blob/main/docs/build_from_source.md).
30+
31+
However, official support for BladeDISC ended in 2022, when it targeted the PyTorch 1.X series, and compiling from source requires specific modifications to adapt it to PyTorch 2.X. It is therefore recommended to use the official image `bladedisc/bladedisc:latest-runtime-torch1.12.0-cu113` to obtain compiler performance evaluation data quickly.
32+
33+
```shell
docker run -itd --gpus all --name torch_bladedisc_test \
  -v /your_path:/your_path \
  registry.cn-shanghai.aliyuncs.com/bladedisc/bladedisc:latest-runtime-torch1.12.0-cu113 \
  /bin/bash
```
36+
37+
**Note**: Because BladeDISC has not been adapted to PyTorch 2.X, the parts of GraphNet that depend on newer PyTorch versions must be commented out before execution. For example, modify `GraphNet/graph_net/torch/__init__.py` as follows:
38+
39+
```python
"""
GraphNet PyTorch Implementation
"""
# Commented out because these imports depend on newer PyTorch versions:
# from .extractor import extract
# from .samples_util import get_default_samples_directory
# __all__ = ["extract", "get_default_samples_directory"]
```
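Instead of commenting the imports out by hand, a version guard could achieve a similar effect. The following is a minimal sketch, not part of the commit. It assumes the affected modules need PyTorch 2.X or newer; that threshold is an assumption, since the report only says they depend on higher versions. `torch_major_version` and `can_import_extractor` are hypothetical helper names.

```python
def torch_major_version(version: str) -> int:
    """Return the major version from a string such as '1.12.0+cu113'."""
    return int(version.split(".", 1)[0])


def can_import_extractor(version: str) -> bool:
    """Assumed rule: the extractor imports are only safe on PyTorch 2.X or newer."""
    return torch_major_version(version) >= 2


# In graph_net/torch/__init__.py the guard could then look like this
# (kept as a comment because it needs torch and the GraphNet package):
#
#   import torch
#   if can_import_extractor(torch.__version__):
#       from .extractor import extract
#       from .samples_util import get_default_samples_directory
#       __all__ = ["extract", "get_default_samples_directory"]
```

With such a guard the same source tree could be used both inside the BladeDISC image and in a PyTorch 2.X environment, at the cost of one extra check at import time.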
47+
48+
49+
50+
# 3 - Test Report
51+
52+
- With BladeDISC for PyTorch (`import torch_blade`), no category of models in the existing `/samples` directory fails to run entirely (as of 2025.08.30).
53+
54+
- For all models under `/samples/cosyvoice`, batch performance tests on an NVIDIA A100-SXM4-40GB GPU are documented in `BladeDISC_batch_test.txt`.
55+
56+
- For each category in `/samples`, one model was tested. The validation report can be found in `BladeDISC_validation_report.txt`, with a performance overview as follows:
57+
58+
| Model | Eager (ms) | Compiled (ms) |
| --- | --- | --- |
| cosyvoice/CosyVoice-300M | 8.4000 | 8.3600 |
| mmpose/2xmspn_50 | 17.1000 | 14.1000 |
| mmseg/ANN_R50 | 21.7000 | 21.8000 |
| nemo/parakeet-ctc-0.6b | 55.3000 | 54.4000 |
| torchaudio/convtasnet_base_libri2mix | 99.4000 | 99.6000 |
| torchgeometric/LINKX | 1.0300 | 0.7280 |
| timm/darknet17 | 2.1500 | 2.1300 |
| torchvision/deeplabv3_resnet50 | 8.4300 | 7.6200 |
| transformers-auto-model/hf-tiny-model-private_tiny-random-AltCLIPModel | 6.0000 | 4.4200 |
| ultralytics/yolo11l-cls | 17.6000 | 14.8000 |
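The table lists raw latencies, and a speedup ratio makes them easier to compare across models. The sketch below is a hedged illustration that computes eager time divided by compiled time for three rows copied from the table, plus a geometric mean over those rows. The helper names `speedup` and `geometric_mean` are introduced here and are not part of GraphNet.

```python
import math

# (eager_ms, compiled_ms) copied from three rows of the table above.
timings = {
    "mmpose/2xmspn_50": (17.1, 14.1),
    "torchgeometric/LINKX": (1.03, 0.728),
    "mmseg/ANN_R50": (21.7, 21.8),
}


def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Eager latency divided by compiled latency; above 1.0 means compiled is faster."""
    if compiled_ms <= 0:
        raise ValueError("compiled_ms must be positive")
    return eager_ms / compiled_ms


def geometric_mean(values) -> float:
    """Geometric mean, the usual way to average ratios such as speedups."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


speedups = {name: speedup(e, c) for name, (e, c) in timings.items()}
for name, s in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {s:.3f}x")
print(f"geometric mean: {geometric_mean(speedups.values()):.3f}x")
```

A ratio slightly below 1.0, as for `mmseg/ANN_R50`, means the compiled model was marginally slower than eager execution in that run.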
70+
71+
72+
73+
# 4 - Execution Issue Analysis
74+
75+
### Issue 1: Unsupported Operators
76+
77+
The PyTorch version in the image (1.X) is too old: some operators and types exist only in newer versions. For example, `torch.SymInt` is not available:
78+
79+
```shell
80+
Traceback (most recent call last):
81+
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
82+
return _run_code(code, main_globals, None,
83+
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
84+
exec(code, run_globals)
85+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 490, in <module>
86+
main(args=args)
87+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 442, in main
88+
test_single_model(args)
89+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 243, in test_single_model
90+
model = get_model(args)
91+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 102, in get_model
92+
model_class = load_class_from_file(args, class_name="GraphModule")
93+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 90, in load_class_from_file
94+
exec(compiled_code, module.__dict__)
95+
File "/daiwenhao/GraphNet/samples/torchvision/alexnet/model.py", line 4, in <module>
96+
class GraphModule(torch.nn.Module):
97+
File "/daiwenhao/GraphNet/samples/torchvision/alexnet/model.py", line 9, in GraphModule
98+
s1: torch.SymInt,
99+
AttributeError: module 'torch' has no attribute 'SymInt'
100+
```
101+
102+
Another example:
103+
104+
```shell
105+
weight should have at least three dimensions
106+
Failed! Try to export it through torch.jit.script:
107+
object has no attribute scaled_dot_product_attention:
108+
File "/daiwenhao/GraphNet/samples/torchaudio/hubert_base/model.py", line 609
109+
v = view_2.transpose(2, 1)
110+
view_2 = None
111+
attn_output = torch._C._nn.scaled_dot_product_attention(
112+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
113+
q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False
114+
)
115+
Fail to export torchscript on the top level of the model, We will iterate over the submodules and replace those that can be successfully exported by the torch.jit.script
116+
Traceback (most recent call last):
117+
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
118+
return _run_code(code, main_globals, None,
119+
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
120+
exec(code, run_globals)
121+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 494, in <module>
122+
main(args=args)
123+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 446, in main
124+
test_single_model(args)
125+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 290, in test_single_model
126+
eager_stats = measure_performance(eager_model_call, args, compiler)
127+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 228, in measure_performance
128+
times = time_execution_with_cuda_event(
129+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 167, in time_execution_with_cuda_event
130+
kernel_fn(*args)
131+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 287, in <lambda>
132+
eager_model_call = lambda: model(**input_dict)
133+
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
134+
return forward_call(*input, **kwargs)
135+
File "/daiwenhao/GraphNet/samples/torchaudio/hubert_base/model.py", line 609, in forward
136+
attn_output = torch._C._nn.scaled_dot_product_attention(
137+
AttributeError: module 'torch._C._nn' has no attribute 'scaled_dot_product_attention'
138+
```
139+
140+
### Issue 2: Unsupported Dynamic Types
141+
142+
This is also caused by the outdated PyTorch version (1.X): dynamic types in models, such as the `torch.ops.aten.sym_size` call below, are not supported.
143+
144+
```shell
145+
object has no attribute sym_size:
146+
File "/daiwenhao/GraphNet/samples/torchgeometric/GAT/model.py", line 114
147+
edge_index = l_edge_index_[(slice(None, None, None), mask)]
148+
mask = None
149+
sym_size_int = torch.ops.aten.sym_size.int(edge_index, 1)
150+
~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
151+
_check_is_size = torch._check_is_size(sym_size_int)
152+
_check_is_size = None
153+
Fail to export torchscript on the top level of the model, We will iterate over the submodules and replace those that can be successfully exported by the torch.jit.script
154+
Traceback (most recent call last):
155+
File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 198, in __getattr__
156+
op, overload_names = torch._C._jit_get_operation(qualified_op_name)
157+
RuntimeError: No such operator aten::sym_size
158+
The above exception was the direct cause of the following exception:
159+
Traceback (most recent call last):
160+
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
161+
return _run_code(code, main_globals, None,
162+
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
163+
exec(code, run_globals)
164+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 492, in <module>
165+
main(args=args)
166+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 444, in main
167+
test_single_model(args)
168+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 288, in test_single_model
169+
eager_stats = measure_performance(eager_model_call, args, compiler)
170+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 226, in measure_performance
171+
times = time_execution_with_cuda_event(
172+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 165, in time_execution_with_cuda_event
173+
kernel_fn(*args)
174+
File "/daiwenhao/GraphNet/graph_net/torch/test_compiler.py", line 285, in <lambda>
175+
eager_model_call = lambda: model(**input_dict)
176+
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
177+
return forward_call(*input, **kwargs)
178+
File "/daiwenhao/GraphNet/samples/torchgeometric/GAT/model.py", line 114, in forward
179+
sym_size_int = torch.ops.aten.sym_size.int(edge_index, 1)
180+
File "/usr/local/lib/python3.8/dist-packages/torch/_ops.py", line 202, in __getattr__
181+
raise AttributeError(f"'_OpNamespace' object has no attribute '{op_name}'") from e
182+
AttributeError: '_OpNamespace' object has no attribute 'sym_size'
183+
```
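Failures like those in Issues 1 and 2 can be spotted before a run by scanning a sample's `model.py` for the constructs that appear in the tracebacks above. The following is a minimal sketch under that assumption. The pattern list covers only the three constructs shown in this report, and `find_unsupported` is a hypothetical helper, not part of GraphNet.

```python
import re

# Patterns taken from the tracebacks above; each one failed on PyTorch 1.12.
UNSUPPORTED_PATTERNS = {
    "torch.SymInt": re.compile(r"\btorch\.SymInt\b"),
    "scaled_dot_product_attention": re.compile(r"\bscaled_dot_product_attention\b"),
    "aten.sym_size": re.compile(r"\baten\.sym_size\b"),
}


def find_unsupported(source: str):
    """Return (line_number, construct_name) pairs for every match in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in UNSUPPORTED_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# A short stand-in for the text of a sample's model.py.
sample = """\
class GraphModule(torch.nn.Module):
    def forward(self, s1: torch.SymInt, edge_index):
        sym_size_int = torch.ops.aten.sym_size.int(edge_index, 1)
        return sym_size_int
"""
for lineno, name in find_unsupported(sample):
    print(f"line {lineno}: {name}")
```

A text scan of this kind is only a triage aid: it can flag samples likely to fail, but it cannot prove that an unflagged sample will run.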
184+
185+
### Issue 3: Unsupported `device(type="cuda", index=0)`
186+
187+
In `torch.jit.script` execution mode, `BladeDISCBackend` does not require input specifications. However, TorchScript does not support the keyword form `device(type="cuda", index=0)`; it accepts only the string form, such as `torch.device("cuda")`.
188+
189+
```shell
190+
The following variants are available:
191+
aten::device(str a) -> (Device):
192+
Argument a not provided.
193+
194+
device(str type) -> (Device):
195+
Keyword argument index unknown.
196+
197+
The original call is:
198+
File "/daiwenhao/GraphNet/samples/ultralytics/yolo11l/model.py", line 6511
199+
l_self_modules_model_modules_23_stride = None
200+
arange = torch.arange(
201+
end=80, device=device(type="cuda", index=0), dtype=torch.float32
202+
~~~~~~ <--- HERE
203+
)
204+
sx = arange + 0.5
205+
206+
Fail to export torchscript on the top level of the model, We will iterate over the submodules and replace those that can be successfully exported by the torch.jit.script
207+
```
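One possible workaround, which this report did not test, is to rewrite the keyword form into the string form in the model source before scripting. The sketch below assumes that TorchScript accepts `torch.device("cuda:0")`, in line with the `aten::device(str a)` variant listed in the log above. `rewrite_device_calls` is a hypothetical helper.

```python
import re

# Matches device(type="cuda", index=0), with an optional "torch." prefix
# and an optional index argument.
_DEVICE_CALL = re.compile(
    r"""(?:torch\.)?\bdevice\(\s*type\s*=\s*["'](\w+)["']\s*(?:,\s*index\s*=\s*(\d+)\s*)?\)"""
)


def _to_string_form(match: "re.Match[str]") -> str:
    kind, index = match.group(1), match.group(2)
    spec = kind if index is None else f"{kind}:{index}"
    return f'torch.device("{spec}")'


def rewrite_device_calls(source: str) -> str:
    """Replace keyword-style device(...) calls with torch.device("<type>[:<index>]")."""
    return _DEVICE_CALL.sub(_to_string_form, source)


line = 'arange = torch.arange(end=80, device=device(type="cuda", index=0), dtype=torch.float32)'
print(rewrite_device_calls(line))
```

The rewrite is purely textual, so any rewritten `model.py` should still be checked for numerical equivalence with the original before its timings are compared.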
