[VTA] Support TLPP in function simulator. #3555
Thanks for the contribution. Some general style nits: we use Google C++ style, which means CamelCase for functions, 2-space indent, etc. We might also need someone who is familiar with multi-threading to review the specific logic. Personally, I think we don't need multi-threading; instead, having an event-driven queue that schedules the events and inserts random interleaves might be fine (and also makes the behavior reproducible).
@tqchen, thanks for the kind follow-up. About the style issue: yes, I will address it.
Hey @huajsj, thanks for the contribution. Here is my take on this. Multi-threading support for the functional simulator will only provide a simulation-performance benefit, i.e. shorter simulation time. However, if you want to use this infrastructure to get an idea of how thread-level parallelism would work, then you got it wrong. The reason is that you are still assuming that Load, Compute and Store execute at infinite clock speed (functional simulator), but in reality each block/instruction will have different timing based on many factors. Therefore, having multiple threads executing at the same time but finishing early (infinite clock speed) won't help much in terms of analysis.
Hi @vegaluisjose, thanks for the comments. The main advantage of an MT simulator with TLPP support is to decouple VTA software development from hardware development so that the two can proceed in parallel. Before, without MT TLPP support in the simulator, verifying a VTA change required both hardware and software to be ready, and debugging also crossed hardware and software, which made it difficult to narrow problems down. Performance is a possible benefit but not the main target. About the parallelism analysis: I agree that an MT simulator cannot help with clock-level analysis, but besides that we also need to verify the synchronization logic between parallel load/compute/store, and for this part an MT simulator can help. An ideal scenario is to first use the function simulator to verify the VTA runtime change and finish the software-side logic independently, then (or in parallel) work on the emulator (TSIM) or FPGA to finish the hardware change, and once both software and hardware are ready, do the integration.
@tqchen @tmoreau89 @vegaluisjose, the code style issues have been addressed. Could you please help review this patch? Thanks.
Thank you @huajsj for the contribution! I don't fully understand how this change helps "decoupling VTA software developing and hardware development", do you mind further elaborating? Re: "verify the synchronization logic between parallel load/compute/store" I'm not fully convinced that this will help us catch bugs, can you provide an example where the old simulator would not catch a synchronization bug that this simulator would catch? |
Do you mind running some simple benchmarks (say the resnet example under vta/tutorials/frontend) and time the performance of the old simulator vs. this parallel implementation? |
Hi @tmoreau89, Regards |
@tmoreau89, I just tried the resnet example. I first ran it on the original TVM (without any changes) but got the error 'NNVMError: KeyError: 'tile_ic''. This seems to be a new issue; I did not experience it weeks ago. Do I need to wait for this issue to be fixed, or is there a workaround that can help?
The old NNVM resnet example still works. I ran it with the old simulator and the new one and got the following data; the old simulator is faster.
Thanks for the reply; it seems like the runtime degradation is quite a bit higher. Can we find a middle ground of catching synchronization bugs as you did earlier on without having to increase complexity due to the multi-threaded nature of this simulator implementation?
@tmoreau89, unfortunately there is no other way to troubleshoot such sync bugs at the software level without TLPP logic in the simulator, which really should be there by default. What I did before was review the code and debug in hardware, but that is not a good experience and is very time-consuming. Some time overhead makes sense, because the timing from the old simulator is not actually realistic (the old simulator does not check the dependencies between instructions). In this patch I provide a configuration option to disable simtlpp for when the user cares more about time than about verifying VTA runtime logic. Do you think that may help?
I do think that the multi-threading can be decoupled from the event-driven, async flow. We could have the event queues but drive them in a single thread (or optionally decouple into multiple threads). Note that the event queue (with fixed random numbers) might give reproducibility, which is a plus over multi-threading.
@tqchen, thanks for the follow-up. It seems a single-thread solution is preferred; yes, I can work on a solution with ST support. Besides that, is there any concern about time performance? After adding the additional TLPP logic the time will certainly increase, and with MT removed it may increase more than with the MT solution. Could I know what you think about time performance?
I think as long as we don't have an order-of-magnitude slowdown it should be fine. I also doubt that event-driven code will slow things down; depending on how we implement it, it might get similar perf if the execution of the logic is the bottleneck.
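To make the suggestion above concrete, here is a minimal sketch of a single-threaded driver that interleaves several in-order queues using a seeded random number generator. It is not code from this patch: `StageQueue` and `RunInterleaved` are hypothetical names, and the instruction ids are placeholders. The point it illustrates is only that a fixed seed yields the same interleaving on every run, which thread scheduling does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for one module's in-order instruction queue.
struct StageQueue {
  std::string name;
  std::deque<int> pending;  // instruction ids, oldest first
};

// Drains all queues on a single thread. A seeded RNG decides which
// non-empty queue steps next, so a given seed always yields the same trace.
std::vector<std::string> RunInterleaved(std::vector<StageQueue> stages,
                                        std::uint32_t seed) {
  std::mt19937 rng(seed);
  std::vector<std::string> trace;
  for (;;) {
    // Collect the queues that still have work.
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < stages.size(); ++i) {
      if (!stages[i].pending.empty()) ready.push_back(i);
    }
    if (ready.empty()) break;  // every queue is drained
    // Use the raw engine output: the std::mt19937 sequence is fixed by the
    // standard, so the interleaving does not depend on the platform.
    StageQueue& stage = stages[ready[rng() % ready.size()]];
    assert(!stage.pending.empty());
    trace.push_back(stage.name + ":" + std::to_string(stage.pending.front()));
    stage.pending.pop_front();
  }
  return trace;
}
```

Calling `RunInterleaved` twice with the same queues and the same seed returns identical traces, so a failing interleaving can be replayed by recording the seed.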
Thank you for clarifying @huajsj ; I do see utility in your contribution now; what it asserts is that if a dependence is not enforced properly (say a deadlock, or starvation), it will be caught by this new simulator rather than run correctly in simulation; then hang in hardware. What you are striving for is catching errors that would occur in hardware during simulation, correct? |
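The kind of error described above can be sketched without threads. The following is a simplified model, not the patch's implementation: three in-order queues (load, compute, store) exchange dependence tokens, and the four flag names on `Insn` are illustrative assumptions loosely modeled on per-instruction dependence flags. If an instruction waits for a token that is never pushed, the loop stops making progress and the run is reported as stuck, where a purely serialized simulator would have executed everything and passed.

```cpp
#include <array>
#include <cassert>
#include <deque>

enum Stage { kLoad = 0, kCompute = 1, kStore = 2 };

// Hypothetical instruction: only the dependence flags are modeled.
struct Insn {
  bool pop_prev;   // wait for a token from the previous stage
  bool pop_next;   // wait for a token from the next stage
  bool push_prev;  // after executing, send a token to the previous stage
  bool push_next;  // after executing, send a token to the next stage
};

// Steps the three queues round-robin on one thread.
// Returns true if every instruction retires, false if the pipeline is stuck.
bool RunPipeline(std::array<std::deque<Insn>, 3> queues) {
  int tokens[3][3] = {};  // tokens[from][to]: pending dependence tokens
  bool progressed = true;
  while (progressed) {
    progressed = false;
    for (int s = 0; s < 3; ++s) {
      if (queues[s].empty()) continue;
      const Insn insn = queues[s].front();
      const int prev = s - 1;
      const int next = s + 1;
      const bool prev_ok = !insn.pop_prev || (prev >= 0 && tokens[prev][s] > 0);
      const bool next_ok = !insn.pop_next || (next <= 2 && tokens[next][s] > 0);
      if (!prev_ok || !next_ok) continue;  // blocked on a missing token
      if (insn.pop_prev) --tokens[prev][s];
      if (insn.pop_next) --tokens[next][s];
      // The functional execution of the instruction would happen here.
      if (insn.push_prev && prev >= 0) ++tokens[s][prev];
      if (insn.push_next && next <= 2) ++tokens[s][next];
      queues[s].pop_front();
      progressed = true;
    }
  }
  for (int from = 0; from < 3; ++from) {
    for (int to = 0; to < 3; ++to) assert(tokens[from][to] >= 0);
  }
  return queues[kLoad].empty() && queues[kCompute].empty() &&
         queues[kStore].empty();
}
```

For example, a load that pushes a token to compute, followed by a compute that pops it, completes; if the load omits the push, the compute instruction never becomes ready and `RunPipeline` returns false.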
One thing I'd argue (and I believe @tqchen and @vegaluisjose agree) is that MT is not necessary to ensure correctness of dependence mechanisms. We can use some type of event-driven loop; it might even improve performance and reduce the complexity of the code (MT code is hard to debug).
@tmoreau89, yes, catching errors early in the simulator instead of troubleshooting in hardware is what this patch tries to do. Thanks for the clearer explanation.
@tqchen, @tmoreau89, about ST mode: yes, I will work on an ST mode change and update later. Thanks for the follow-up and comments.
@tqchen @vegaluisjose @tmoreau89, the TLPP single-thread change is done; please help review once you have time. The following are the performance numbers.
Thank you for the update, @huajsj ; the performance degradation looks a lot more tolerable |
I made a few small comments about naming the functions and using a consistent scheme. For guidance please follow: https://google.github.io/styleguide/cppguide.html
@tmoreau89, thanks for the help. The style and cmake comments have been addressed; please kindly let me know if you have any other comments.
@huajsj it appears that you might have merged recent commits into this PR rather than perform a |
Issue: currently the VTA function simulator only does serialized instruction execution, so the dependency logic of the runtime ISA, which is used for task-level pipeline parallelism (TLPP), cannot be verified by the function simulator.
Solution: make the simulator driver multi-threaded and support TLPP.
Benefit: TLPP support in the VTA function simulator makes VTA logic testing, debugging, and changes easier.
* replace boost lockfree queue
* add configure control for simulator tlpp enable or disable.
* change code style into google style.
* Wrap queue read/write and sync logic to make function calls simpler.
* Add some comments.
* Remove MT logic, change into single-thread mode.
* address review comments.
* code style change to match google code style and add comments.
* add cmake macro to enable/disable simulator tlpp logic.
* submodule update.
* correct file name mentioned in comments.
@tmoreau89, thanks for the kind follow-up. I just rebased and the issue is fixed; the PR is clean now.
@huajsj I was thinking, is there any reason why we would ever want to turn |
Great thanks! |
Thank you for the update
I'll go ahead and merge the changes |
* [VTA] Support TLPP in function simulator.
  Issue: currently the VTA function simulator only does serialized instruction execution, so the dependency logic of the runtime ISA, which is used for task-level pipeline parallelism, cannot be verified by the function simulator.
  Solution: make the simulator driver multi-threaded and support TLPP.
  Benefit: TLPP support in the VTA function simulator makes VTA logic testing, debugging, and changes easier.
  replace boost lockfree queue
  add configure control for simulator tlpp enable or disable.
  change code style into google style.
  Wrap queue read/write and sync logic to make function calls simpler.
  Add some comments.
  Remove MT logic, change into single-thread mode.
  address review comments.
  code style change to match google code style and add comments.
  add cmake macro to enable/disable simulator tlpp logic.
  submodule update.
  correct file name mentioned in comments.
* remove USE_VTA_FSIM_TLPP.
@@ -0,0 +1,162 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
As a side note, given that this interface is not part of the public interface used by the user, let us move it to an internal header, as opposed to keeping it in the include folder.