Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VTA] Support TLPP in function simulator. #3555

Merged
merged 2 commits into from
Sep 7, 2019
Merged

[VTA] Support TLPP in function simulator. #3555

merged 2 commits into from
Sep 7, 2019

Conversation

huajsj
Copy link
Contributor

@huajsj huajsj commented Jul 17, 2019

Issue:
currently vta function simulator just doing serialized instruction
execution, the dependency logic of runtime ISA which use for task
level pipe line parallelism can not get verified by function simulator.

Solution:
make the simulator driver to be multiple thread and support TLPP.

Benefit:
TLPP support VTA function simulator would make VTA logic testing/debug
/change more easy.

@tqchen
Copy link
Member

tqchen commented Jul 18, 2019

Thanks for the contribution, some general style nits, we use Google C style that means CamelCase for functions, 2-space indent. etc. We might also need someone who is familiar with multi-threading to review the specific logics.

Personally, I think we don't need to use multi-threading, but instead having an event driven queue that schedules the events, and insert random interleaves might be fine(and also makes behavior reproducible)

@huajsj
Copy link
Contributor Author

huajsj commented Jul 19, 2019

Thanks for the contribution, some general style nits, we use Google C style that means CamelCase for functions, 2-space indent. etc. We might also need someone who is familiar with multi-threading to review the specific logics.

Personally, I think we don't need to use multi-threading, but instead having an event driven queue that schedules the events, and insert random interleaves might be fine(and also makes behavior reproducible)

@tqchen , thanks for the kindly follow up, about the style issue yes i would address it.
about the multi-thread concern, the MT logic is very less and most related start/stop the thread, the cross thread synchronization totally depend the TLPP dependency and actually we not add much additional complexity to simulator, even we move to "event queue" solution, the most logic should be same, but we would lost the real parallel simulation.
another reason for MT is scale, we ever did some try to scale vta in house but found actually no good tool to verify these changes, TSIM is a option, but it is much complex for software engineer to do change, and function simulator looks more friendly, a MT simulator should be a good start to handle such scale requirement because by launch more thread we can easily simulate multiple gemm/load/store core.
please kindly let me know how you think.

@vegaluisjose
Copy link
Member

Hey @huajsj

Thanks for the contrib, here is my take on this.

Multi-threading support for functional simulator will only provide simulation-performance benefit or simulation time will be short.

However, if you want to use this infrastructure to have an idea how thread-level-parallelism would work then you got it wrong. The reason is that you are still assuming that Load, Compute and Store are executing on an infinite-clock-speed (functional simulator) but in reality each block/instructions will have different timing based on many factors. Therefore, having multiple threads executing at the same time but finishing early (infinite-clock-speed) won't help much in terms of analysis.

@huajsj
Copy link
Contributor Author

huajsj commented Jul 19, 2019

Hi @vegaluisjose ,

Thanks for the comments, the most advantage of MT simulator with TLPP support is to decouple VTA software developing and hardware developing to make them be parallel, before without MT TLPP support in simulator, to verify a VTA change that need both hardware and software ready, and debug also cross hardware and software and difficult for narrow down, performance is a possible benefit but not the main target.

about the parallel analysis, I agree MT simulator can not help in clock level analysis, but besides of that we also need to verify the synchronization logic between parallel load/compute/store, for this part a MT simulator would can help.

a idea scenario is first using function simulator verify vta run time change and finish software part logic independently, then(or parallel) working on Emulator(TSIM) or FPGA to finish hardware change , after software and hardware ready then do the integrate.

@huajsj
Copy link
Contributor Author

huajsj commented Jul 22, 2019

@tqchen @tmoreau89 @vegaluisjose , the code style issue already addressed, could you please help to review this patch? thanks.

@tmoreau89
Copy link
Contributor

Thank you @huajsj for the contribution!

I don't fully understand how this change helps "decoupling VTA software developing and hardware development", do you mind further elaborating?

Re: "verify the synchronization logic between parallel load/compute/store" I'm not fully convinced that this will help us catch bugs, can you provide an example where the old simulator would not catch a synchronization bug that this simulator would catch?

@tmoreau89
Copy link
Contributor

Do you mind running some simple benchmarks (say the resnet example under vta/tutorials/frontend) and time the performance of the old simulator vs. this parallel implementation?

@huajsj
Copy link
Contributor Author

huajsj commented Jul 22, 2019

Thank you @huajsj for the contribution!

I don't fully understand how this change helps "decoupling VTA software developing and hardware development", do you mind further elaborating?

Re: "verify the synchronization logic between parallel load/compute/store" I'm not fully convinced that this will help us catch bugs, can you provide an example where the old simulator would not catch a synchronization bug that this simulator would catch?

Hi @tmoreau89,
Thanks for the follow up, about catching bugs and decouple HW SW developing help, I can explain that from my experience, I ever did a vta change to let compute module do the 'weight' loading instead of let load module do that, doing so means we need to change the VTA runtime logic and hardware change, I first did the change in VTA runtime . I run it on simulator and every thing is work, then I changed hardware and i put new hardware with this new runtime together, the hardware stuck, the problem trouble shooting very consume time ,first bitstream build time is very long, secondly the hardware debug is not so straightforward, Finally I found the issue caused by a dependency setting issue and be VTA Runtime problem, but to find such root cause I need to spend lot of effort in hardware part for isolation, if that time simulator can do the synchronization(tlpp)/dependency check, I can find the run time problem without involve hardware and save time, that is what i means , software and hardware developing decouple and catch bug help.

Regards
Hua

@huajsj
Copy link
Contributor Author

huajsj commented Jul 22, 2019

Do you mind running some simple benchmarks (say the resnet example under vta/tutorials/frontend) and time the performance of the old simulator vs. this parallel implementation?

@tmoreau89 , just tried resnet example, first run it on original TVM(no any change), but get some error 'NNVMError: KeyError: 'tile_ic'', seems like this a new issue, not experience such issue weeks ago, do I need to wait this issue get fixed, or any work ground can help?

@huajsj
Copy link
Contributor Author

huajsj commented Jul 22, 2019

old nnvm reset example still work , run it with old simulator and new one , get following data, the old simulator is faster.
old: 1.48 seconds
new: 2.68 seconds

@tmoreau89
Copy link
Contributor

Thanks for the reply; it seems like the runtime degradation is quite a bit higher. Can we find a middleground of catching synchronization bugs as you did earlier on without having to increase complexity due to the multi-threaded nature of this simulator implementation?

@huajsj
Copy link
Contributor Author

huajsj commented Jul 23, 2019

@tmoreau89 , unfortunate there is no other way can help to trouble shooting such sync bug in software level,without TLPP logic in simulator that actually should be there default. what I did before is to review code and debug in hardware, but that is not a good experience and very consume time.

some time overhead should be make sense because time that coming from old simulator actually is not real( older simulator not check the dependency of instructions). in this patch I provided a configuration to disable simtlpp when user more concern time instead of VTA run time logic verification, do you think that may help?

@tqchen
Copy link
Member

tqchen commented Jul 23, 2019

I do think that the multi-threading can be de-coupled with the event driven and async flow. We could have the event queues but drive them in a single thread(or optionally de-couple to multi-thread).

Note that the even queue(with fixed random numbers) might gives reproduciblility, which is a plus over multi-threading

@huajsj
Copy link
Contributor Author

huajsj commented Jul 23, 2019

@tqchen , thanks for the follow up, seems like a single thread solution is more prefer, yes i can work for a solution with ST support, beside of that , is there any concern about "time performance"? after adding additional TLPP logic , for sure time would increase, with remove MT such time may increase more than MT solution, could I know how you think about time performance?

@tqchen
Copy link
Member

tqchen commented Jul 23, 2019

I think as long as we don't have order of magnitude time decrease it should be fine.

I also doubt if event driven code will slows things down, depends on how we implement it it might get similar perf if the execution of the logic is the bottleneck.

@tmoreau89
Copy link
Contributor

Thank you for clarifying @huajsj ; I do see utility in your contribution now; what it asserts is that if a dependence is not enforced properly (say a deadlock, or starvation), it will be caught by this new simulator rather than run correctly in simulation; then hang in hardware. What you are striving for is catching errors that would occur in hardware during simulation, correct?

@tmoreau89
Copy link
Contributor

One thing I'd argue (and I believe @tqchen and @vegaluisjose agree) is that MT is not necessary to ensure correctness of dependence mechanmisms. We can use some type of event driven loop; it might even improve performance, and reduce complexity of the code (MT code is hard to debug).

@huajsj
Copy link
Contributor Author

huajsj commented Jul 23, 2019

Thank you for clarifying @huajsj ; I do see utility in your contribution now; what it asserts is that if a dependence is not enforced properly (say a deadlock, or starvation), it will be caught by this new simulator rather than run correctly in simulation; then hang in hardware. What you are striving for is catching errors that would occur in hardware during simulation, correct?

@tmoreau89 , yes, to catch errors early in simulator instead of doing trouble shooting in hardware is what this patch try to do, thanks for the more clearly explain.

@huajsj
Copy link
Contributor Author

huajsj commented Jul 23, 2019

@tqchen , @tmoreau89, about ST mode, yes I would work for a ST mode change and update later, thanks for the follow up and comments.

@huajsj
Copy link
Contributor Author

huajsj commented Jul 26, 2019

@tqchen @vegaluisjose @tmoreau89 , the TLPP single thread change already done, please help to review once you have time, following are the performance number.
old simulator: 1.44 seconds
new simulator: 1.44 -1.46 seconds

@tmoreau89
Copy link
Contributor

Thank you for the update, @huajsj ; the performance degradation looks a lot more tolerable

vta/src/sim/sim_mt.cc Outdated Show resolved Hide resolved
vta/src/sim/sim_mt.cc Outdated Show resolved Hide resolved
@tmoreau89
Copy link
Contributor

I made a few small comments about naming the functions and using consistent scheme. For guidance please follow: https://google.github.io/styleguide/cppguide.html

@huajsj
Copy link
Contributor Author

huajsj commented Aug 23, 2019

@tmoreau89 , thanks for the help, the style and cmake commets already get addressed, please kindly let me know if you have any other comments.

@tmoreau89
Copy link
Contributor

@huajsj it appears that you might have merged recent commits into this PR rather than perform a git rebase operation. To keep PRs clean of unrelated changes, we strongly recommend performing rebase operations against TVM master.

Issue:
currently vta function simulator just doing serialized instruction
execution, the dependency logic of runtime ISA which use for task
level pipe line parallelism can not get verified by function simulator.

Solution:
make the simulator driver to be multiple thread and support TLPP.

Benefit:
TLPP support VTA function simulator would make VTA logic testing/debug
/change more easy.

replace boost lockfree queue

add configure control for simulator tlpp enable or disable.

change code tyle into google style.

Wrap queue read/write and sync logic to make function call more simple.

Add some comments.

Remove MT logic, change into Single thread mode.

address review comments.

code style change to match google code style and add comments.

add cmake macro to enable/disable simulator tlpp logic.

submodule update.

correct file name mentioned in comments.
@huajsj
Copy link
Contributor Author

huajsj commented Aug 26, 2019

@huajsj it appears that you might have merged recent commits into this PR rather than perform a git rebase operation. To keep PRs clean of unrelated changes, we strongly recommend performing rebase operations against TVM master.

@tmoreau89 , thanks for the kindly follow up, just did rebase and issue fixed, now the PR is clean.

@tmoreau89
Copy link
Contributor

@huajsj I was thinking, is there any reason why we would ever want to turn USE_VTA_FSIM_TLPP off? Perhaps we should just build the simulator with TLPP simulation on all the time, since it doesn't affect performance and it will catch more bugs than the current simulator.

@huajsj
Copy link
Contributor Author

huajsj commented Aug 29, 2019

@huajsj I was thinking, is there any reason why we would ever want to turn USE_VTA_FSIM_TLPP off? Perhaps we should just build the simulator with TLPP simulation on all the time, since it doesn't affect performance and it will catch more bugs than the current simulator.
@tmoreau89 , sure, the tlpp control macro USE_VTA_FSIM_TLPP already get removed.

@tmoreau89
Copy link
Contributor

Great thanks!

Copy link
Contributor

@tmoreau89 tmoreau89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update

@tmoreau89
Copy link
Contributor

I'll go ahead and merge the changes

@tmoreau89 tmoreau89 merged commit 50c4546 into apache:master Sep 7, 2019
MarisaKirisame pushed a commit to MarisaKirisame/tvm that referenced this pull request Sep 7, 2019
* [VTA] Support TLPP in function simulator.
Issue:
currently vta function simulator just doing serialized instruction
execution, the dependency logic of runtime ISA which use for task
level pipe line parallelism can not get verified by function simulator.

Solution:
make the simulator driver to be multiple thread and support TLPP.

Benefit:
TLPP support VTA function simulator would make VTA logic testing/debug
/change more easy.

replace boost lockfree queue

add configure control for simulator tlpp enable or disable.

change code tyle into google style.

Wrap queue read/write and sync logic to make function call more simple.

Add some comments.

Remove MT logic, change into Single thread mode.

address review comments.

code style change to match google code style and add comments.

add cmake macro to enable/disable simulator tlpp logic.

submodule update.

correct file name mentioned in comments.

* remove USE_VTA_FSIM_TLPP.
@@ -0,0 +1,162 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, given this interface is not part of public interface used by the user, let us move it to the internal header, as opposed to keep it in the include folder

wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
* [VTA] Support TLPP in function simulator.
Issue:
currently vta function simulator just doing serialized instruction
execution, the dependency logic of runtime ISA which use for task
level pipe line parallelism can not get verified by function simulator.

Solution:
make the simulator driver to be multiple thread and support TLPP.

Benefit:
TLPP support VTA function simulator would make VTA logic testing/debug
/change more easy.

replace boost lockfree queue

add configure control for simulator tlpp enable or disable.

change code tyle into google style.

Wrap queue read/write and sync logic to make function call more simple.

Add some comments.

Remove MT logic, change into Single thread mode.

address review comments.

code style change to match google code style and add comments.

add cmake macro to enable/disable simulator tlpp logic.

submodule update.

correct file name mentioned in comments.

* remove USE_VTA_FSIM_TLPP.
wweic pushed a commit to wweic/tvm that referenced this pull request Sep 16, 2019
* [VTA] Support TLPP in function simulator.
Issue:
currently vta function simulator just doing serialized instruction
execution, the dependency logic of runtime ISA which use for task
level pipe line parallelism can not get verified by function simulator.

Solution:
make the simulator driver to be multiple thread and support TLPP.

Benefit:
TLPP support VTA function simulator would make VTA logic testing/debug
/change more easy.

replace boost lockfree queue

add configure control for simulator tlpp enable or disable.

change code tyle into google style.

Wrap queue read/write and sync logic to make function call more simple.

Add some comments.

Remove MT logic, change into Single thread mode.

address review comments.

code style change to match google code style and add comments.

add cmake macro to enable/disable simulator tlpp logic.

submodule update.

correct file name mentioned in comments.

* remove USE_VTA_FSIM_TLPP.
wweic pushed a commit to neo-ai/tvm that referenced this pull request Sep 16, 2019
* [VTA] Support TLPP in function simulator.
Issue:
currently vta function simulator just doing serialized instruction
execution, the dependency logic of runtime ISA which use for task
level pipe line parallelism can not get verified by function simulator.

Solution:
make the simulator driver to be multiple thread and support TLPP.

Benefit:
TLPP support VTA function simulator would make VTA logic testing/debug
/change more easy.

replace boost lockfree queue

add configure control for simulator tlpp enable or disable.

change code tyle into google style.

Wrap queue read/write and sync logic to make function call more simple.

Add some comments.

Remove MT logic, change into Single thread mode.

address review comments.

code style change to match google code style and add comments.

add cmake macro to enable/disable simulator tlpp logic.

submodule update.

correct file name mentioned in comments.

* remove USE_VTA_FSIM_TLPP.
tqchen pushed a commit to tqchen/tvm that referenced this pull request Mar 29, 2020
* [VTA] Support TLPP in function simulator.
Issue:
currently vta function simulator just doing serialized instruction
execution, the dependency logic of runtime ISA which use for task
level pipe line parallelism can not get verified by function simulator.

Solution:
make the simulator driver to be multiple thread and support TLPP.

Benefit:
TLPP support VTA function simulator would make VTA logic testing/debug
/change more easy.

replace boost lockfree queue

add configure control for simulator tlpp enable or disable.

change code tyle into google style.

Wrap queue read/write and sync logic to make function call more simple.

Add some comments.

Remove MT logic, change into Single thread mode.

address review comments.

code style change to match google code style and add comments.

add cmake macro to enable/disable simulator tlpp logic.

submodule update.

correct file name mentioned in comments.

* remove USE_VTA_FSIM_TLPP.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants