Problem when visualizing GPT2 #5

Open
liborui opened this issue Aug 3, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@liborui

liborui commented Aug 3, 2023

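Hello!
I am using this repository to visualize TVM, but I ran into a problem when visualizing the GPT2 model.
After generating the before-pass and after-pass files, I ran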

python ../main.py -bp relay_ir/gpt2_fo_bp.txt -ap relay_ir/gpt2_fo_ap.txt -sn gpt

to generate the graph, and got this error:

Error: visu_gpt_relay_ir: syntax error in line 899 near '.'                                                                                  
Traceback (most recent call last):
  File "/home/emnets/anaconda3/envs/tvm-build/lib/python3.7/site-packages/graphviz/backend/execute.py", line 91, in run_check
    proc.check_returncode()                                        
  File "/home/emnets/anaconda3/envs/tvm-build/lib/python3.7/subprocess.py", line 444, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '[PosixPath('dot'), '-Kdot', '-Tsvg', '-O', 'visu_gpt_relay_ir']' returned non-zero exit status 1.
                                                                      
During handling of the above exception, another exception occurred:                                                                          
                                                                      
Traceback (most recent call last):                                                                                                           
  File "../main.py", line 37, in <module>           
    visu_relay_ir(before_pass, after_pass, save_name, with_info)
  File "/home/emnets/MLFaas/VisuTVM/utils.py", line 38, in visu_relay_ir
    g.codegen()                                                                                                                                                                                                                                                                            
  File "/home/emnets/MLFaas/VisuTVM/visu_tvm.py", line 117, in codegen     
    graph.render(filename=self.save_name, format='svg', cleanup=True)
  File "/home/emnets/anaconda3/envs/tvm-build/lib/python3.7/site-packages/graphviz/_tools.py", line 171, in wrapper
    return func(*args, **kwargs)                                   
  File "/home/emnets/anaconda3/envs/tvm-build/lib/python3.7/site-packages/graphviz/rendering.py", line 122, in render
    rendered = self._render(*args, **kwargs)
  File "/home/emnets/anaconda3/envs/tvm-build/lib/python3.7/site-packages/graphviz/_tools.py", line 171, in wrapper
    return func(*args, **kwargs)
  File "/home/emnets/anaconda3/envs/tvm-build/lib/python3.7/site-packages/graphviz/backend/rendering.py", line 327, in render
    capture_output=True)       
  File "/home/emnets/anaconda3/envs/tvm-build/lib/python3.7/site-packages/graphviz/backend/execute.py", line 93, in run_check
    raise CalledProcessError(*e.args)               
graphviz.backend.execute.CalledProcessError: Command '[PosixPath('dot'), '-Kdot', '-Tsvg', '-O', 'visu_gpt_relay_ir']' returned non-zero exit status 1. [stderr: b"Error: visu_gpt_relay_ir: syntax error in line 899 near '.'\n"]

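I then looked at the generated GraphViz source file. Some nodes are defined with names like "%aten::addmm_0.bias", but in the edge definitions the name turns into "%aten":"":addmm_17.bias -> "%351" [label=""], which GraphViz cannot parse or render.
I would like to fix this small bug, so I would appreciate your guidance if convenient.
Attached are the generated GraphViz source file visu_gpt_relay_ir, the before/after-pass text files gpt2_fo_bp.txt and gpt2_fo_ap.txt, and the code I used to optimize the model, from_pytorch.py. viz.zip
Thanks!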
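
The rewritten name is consistent with an edge endpoint being read as node:port:compass, where the first colon ends the node name. The sketch below is a simplified, hypothetical re-creation of that splitting; the helper names `quote` and `render_endpoint` are made up for illustration and are not the graphviz package's actual code. It reproduces the string seen in the generated file.

```python
def quote(text):
    """Wrap a string in double quotes, escaping embedded quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def render_endpoint(identifier):
    """Split an edge endpoint on ':' as if it were node:port:compass.

    Illustrative assumption only: this mimics the observed behaviour,
    it is not copied from any library.
    """
    node, _, rest = identifier.partition(":")
    parts = [quote(node)]
    if rest:
        port, _, compass = rest.partition(":")
        parts.append(quote(port))
        if compass:
            parts.append(compass)
    return ":".join(parts)


# A name containing '::' is torn apart; a plain name is left alone.
broken = render_endpoint("%aten::addmm_17.bias")
plain = render_endpoint("%351")
print(broken, "->", plain)
```

Under this reading, the empty string between the two colons becomes an empty port, which is why `""` appears in the middle of the edge statement.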

@xiayouran
Owner

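Hello, I got your message and checked this model. Could you change the model input to
random_tokens = torch.randint(10000, (1, 5)), or something larger such as (1, 100), and then send me the before/after-optimization txt files again? 😅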

@liborui
Author

liborui commented Aug 4, 2023

new.zip
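Hello! Thanks for your reply.
I made the input larger: random_tokens = torch.randint(10000, (100,))
The IR before and after the passes and the code that generates it are attached. Thanks!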

@xiayouran
Owner

Please make the input two-dimensional 😄

@liborui
Author

liborui commented Aug 4, 2023

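Please make the input two-dimensional 😄

Does this affect the input and output? My understanding is that the input to a GPT2 model is usually a one-dimensional vector, just of varying length. 😄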

@xiayouran
Owner

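Please make the input two-dimensional 😄

Does this affect the input and output? My understanding is that the input to a GPT2 model is usually a one-dimensional vector, just of varying length. 😄

The input is usually multi-dimensional. For example, a sentence passed through the tokenizer becomes a tensor of shape [1, n], where n is the number of tokens and 1 is the batch size.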
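
The [1, n] shape can be illustrated without any framework. This is a toy sketch: `toy_tokenize` and `shape_of` are hypothetical helpers, and the whitespace tokenizer is only a stand-in for a real GPT2 tokenizer.

```python
def toy_tokenize(sentence, vocab):
    """Map each whitespace-separated word to an integer id (toy stand-in)."""
    return [vocab.setdefault(word, len(vocab)) for word in sentence.split()]


def shape_of(batch):
    """Return (batch_size, sequence_length) for a list of equal-length rows."""
    return (len(batch), len(batch[0]))


vocab = {}
token_ids = toy_tokenize("the quick brown fox", vocab)  # 1-D: n token ids
batch = [token_ids]                                     # 2-D: one row per sentence

print(shape_of(batch))
```

A single sentence still gets a leading batch dimension of 1, which is the difference between torch.randint(10000, (100,)) and torch.randint(10000, (1, 100)).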

@liborui
Author

liborui commented Aug 5, 2023

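Thank you for the kind answer! I have adjusted it; the Python file and the before/after txt files are attached. new_2.zip
On a side note:

The example you provided earlier calls run_opt_pass() in the Python file to run a single Pass on its own. I wanted to see what the output looks like under different global opt_level settings, so yesterday I used tvm.transform.Sequential() to run a series of Passes and mimic compilation at a global opt_level.
However, I could not find which Passes correspond to, say, opt_level=3, and whenever I added the FuseOps() Pass to tvm.transform.Sequential(), I got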

InternalError: Check failed: (idx < data_.size() && data_[idx].second != 0) is false: Attribute TOpPattern has not been registered for nn.layer_norm

So today I switched to

with tvm.transform.PassContext(opt_level=3):
    mod_opt = relay.build(mod, target=target, params=params)

print(mod_opt.ir_mod)
# print('Relay IR after Pass:\n', mod_opt)
relay_ir2txt(mod_opt.ir_mod["main"], file_name, is_ap=True)

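With this, the full compilation runs without problems, and the IR text shows that operator fusion succeeded. I do not know why applying FuseOps alone fails :(
I am mentioning this code change in case it affects anything on your side.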
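
A likely explanation, stated here as an assumption and not confirmed in this thread: FuseOps needs a TOpPattern for every operator it sees, and nn.layer_norm is normally decomposed into simpler operators by SimplifyInference, which relay.build runs before fusion. A minimal sketch that puts SimplifyInference ahead of FuseOps in a Sequential; `PASS_ORDER` and `run_passes` are hypothetical names, and TVM is imported lazily so the ordering can be inspected without it.

```python
# Assumed ordering: SimplifyInference must run before FuseOps.
PASS_ORDER = ["InferType", "SimplifyInference", "FuseOps"]


def run_passes(mod, opt_level=3):
    """Apply PASS_ORDER to a Relay IRModule. Requires a TVM installation."""
    import tvm
    from tvm import relay

    # Look up each pass constructor in relay.transform and instantiate it.
    passes = [getattr(relay.transform, name)() for name in PASS_ORDER]
    seq = tvm.transform.Sequential(passes)
    with tvm.transform.PassContext(opt_level=opt_level):
        return seq(mod)
```

Calling `run_passes(mod)` on the module from the script above would then return the fused module, if the assumption about the missing decomposition holds.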

@xiayouran
Owner

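For this network, I found the following problems:

  1. Input parameters of the form %aten::slice_xxx.bias are parsed incorrectly
  2. graphviz has a bug: an identifier containing :: is parsed incorrectly, that is, the identifier gets split, producing "%aten":"":addmm_33.weight -> "%633", and the visualization ultimately fails. Details of this bug: Add support for literal colons in node name (WAS: escaping of ::) xflr6/graphviz#53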
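
One possible workaround for the second problem, shown as a sketch and not necessarily what the commit does: strip colons from the identifier used as the node name and keep the original text as the label. `safe_node_id` is a hypothetical helper.

```python
def safe_node_id(name):
    """Replace ':' so the name cannot be mistaken for node:port syntax."""
    return name.replace(":", "_")


names = ["%aten::addmm_33.weight", "%633"]

# Map the colon-free id back to the original name for display.
id_to_label = {safe_node_id(n): n for n in names}

for node_id, label in id_to_label.items():
    print(node_id, "<-", label)
```

The colon-free id would be passed as the node and edge name and the original string as the label. Note that replacing ':' with '_' could make two different names collide, so a real fix may need a less ambiguous substitute.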

Both problems are fixed in this commit. You can run the following command to obtain the visualized model structure graph:
python main.py -ri fix/new_2/gpt2_fo_bp.txt -sn gpt2

image

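To visualize the optimized network structure, you may also need to add some methods to the corresponding visualization class to parse operators of this kind (%aten::slice_xxx), for example with regular expressions. The same applies if you want to include tensor information. You are welcome to try improving this. Feel free to contact me if you have any questions 😄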
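
As a starting point for such a regular expression, here is a minimal sketch. The pattern, the `find_vars` helper, and the sample line are all made up for illustration and are not taken from the repository or from the attached IR files.

```python
import re

# '%' followed by a word, then any number of '::word' or '.word' segments.
VAR_PATTERN = re.compile(r"%[A-Za-z0-9_]+(?:(?:::|\.)[A-Za-z0-9_]+)*")


def find_vars(line):
    """Return every %-prefixed identifier found in one line of IR text."""
    return VAR_PATTERN.findall(line)


# Hypothetical line in the style of Relay IR text.
sample = "%5 = nn.dense(%4, %aten::addmm_0.weight, units=768);"
print(find_vars(sample))
```

The pattern treats a trailing `.segment` as part of the name, so a tuple index such as `%3.0` would be captured together with the variable and would need separate handling.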

xiayouran added the bug label Aug 8, 2023