[microTVM][ARM] Keep microtvm testing only in QEMU Image #11809
Conversation
Force-pushed from d9b08dc to 9b789b4
areusch left a comment:
Let's see what @u99127 @Mousius @manupa-arm have to say here first. I think this is mainly to make it operationally easier for us to ensure the microTVM upstream tests use the same deps as we install in the Reference VM. As we get towards supporting tvmc in a release and doing further testing with microTVM against tvmc in its own venv (with e.g. Zephyr installed separately), we could probably focus on that in place of the Reference VM. In any case, the ci_ images remain the test authority, so this is mainly an operational convenience for folks who, like us at OctoML, use the Reference VM to test on-device.
```dockerfile
COPY install/ubuntu_install_caffe.sh /install/ubuntu_install_caffe.sh
RUN bash /install/ubuntu_install_caffe.sh

# Github Arm(R) Ethos(TM)-N NPU driver
```
I think this one should stay in ci_cpu; it's unrelated to microTVM.
If my understanding is correct and this is used for running this command:

```bash
run_pytest ctypes ${TVM_INTEGRATION_TESTSUITE_NAME}-contrib-test_ethosu tests/python/contrib/test_ethosu -n auto
```

then I suggest we also move this to ci_qemu so that all microTVM tests are in one place.
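A rough sketch of what such a move might look like, assuming the same run_pytest helper and testsuite-name variable are available in tests/scripts/task_python_microtvm.sh (the actual change may use a different variable name):

```bash
# Hypothetical: invoke the Ethos(TM)-U contrib tests from the microTVM test script
# instead of the integration test script, so they run against the ci_qemu image.
run_pytest ctypes ${TVM_INTEGRATION_TESTSUITE_NAME}-contrib-test_ethosu tests/python/contrib/test_ethosu -n auto
```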
In order to reason about this, it would be great to understand why we maintain two separate images for microTVM and TVM, especially when QEMU is software that runs on the CPU. (Sorry, I might've missed the discussion that led to creating two separate images.) In an alternate world, we could consolidate all testing dependencies into ci_cpu and keep TVM and microTVM aligned in terms of testing environment, unless we need conflicting dependencies for testing TVM vs microTVM, not just additive ones.
Hmm, yeah, so here's what I know about the history here:
Adding to Andrew's point, I think we can remove
I support this approach as well: having two smaller containers, each with a specific responsibility, rather than one monolithic container should make them easier to manage in the long term as needs evolve. The only downside is the typical container hopping for development, though I think we can figure out how to make that work too. Minor note, I think
I agree that having modular (smaller) containers aids reproducibility from a developer PoV. What worries me about this approach is that it creates room for divergent dependencies to be introduced between microTVM and TVM (the deps that are not specific to micro), especially when they are not separate projects. Right now, the only thing that keeps the common deps aligned is that we use the same helper scripts (ubuntu_install*.sh) to construct the images. For example, this could open up the possibility of advancing the TensorFlow version supported by the TVM project while microTVM lags behind, because they will now be tested in two environments. Maybe we could create a base container with such common deps to make that kind of divergence harder (and extend that image into ci_cpu and ci_qemu)?
cc: @leandron
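A minimal sketch of the base-container idea suggested above, assuming hypothetical Dockerfile names and an illustrative (not exhaustive) selection of the existing ubuntu_install_*.sh helper scripts:

```dockerfile
# Hypothetical docker/Dockerfile.ci_base: dependencies common to TVM and microTVM,
# installed via the same helper scripts used by the CI images today.
FROM ubuntu:18.04
COPY install/ubuntu_install_core.sh /install/ubuntu_install_core.sh
RUN bash /install/ubuntu_install_core.sh
COPY install/ubuntu_install_tensorflow.sh /install/ubuntu_install_tensorflow.sh
RUN bash /install/ubuntu_install_tensorflow.sh

# A hypothetical docker/Dockerfile.ci_qemu would then add only micro-specific layers:
#   FROM ci_base
#   COPY install/ubuntu_install_zephyr.sh /install/ubuntu_install_zephyr.sh
#   RUN bash /install/ubuntu_install_zephyr.sh
```

With a layout like this, the shared dependencies (e.g. TensorFlow) are pinned in one place, so ci_cpu and ci_qemu can only add layers on top rather than drift apart on the common deps.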
Looking at the alternatives proposed so far, I feel that for our current type of usage it makes more sense to unify

Given our current Docker update practices, which don't require updating all images at once, I feel we had better consolidate the dependencies into one single image until we can devise better rules for Docker updating.
I think while I'm sympathetic to reducing the overall image disk size, in practice:

so I don't think we need to make that a priority in this decision. I think then it comes down to what's operationally easier, and I think that's the POV @Mousius and @manupa-arm were advocating. I tend to agree here: another way that we could get into trouble with consolidating

cc @driazati @konturn in case they have other thoughts from a CI perspective.
Provided the images have the same initial layers (a base image or copy-paste, both should work), the storage cost should be the same, as Docker will re-use the layers when pulling the containers - if a node already had

As far as I can tell we'd build

Therefore, apart from a small network up/down to propagate the image, it should take a similar time to build as we have now? Sorry if I missed something here @leandron?
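To illustrate the layer-reuse point (the registry and tag names below are made up for the example):

```bash
# If ci_cpu and ci_qemu both start FROM the same base image, their initial layers share
# digests, so a node that already holds one image only downloads the other image's
# image-specific layers.
docker pull registry.example.com/tvm/ci_cpu:v0.01    # pulls shared base layers + cpu-only layers
docker pull registry.example.com/tvm/ci_qemu:v0.01   # shared base layers already cached locally
```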
Actually, I was arguing that we should avoid the need for conflicting packages, as micro is another backend of TVM. Such a divergence is worrying for a compiler that aims to support both micro and non-micro compilation flows. NOTE: I do acknowledge that micro and non-micro have different dependency requirements, but the common ones should not diverge IMO.
I also support keeping the images focused on their actual use instead of everything under the sun. With #11768 it should be less common that we're using different tags for the images (everything should be built and updated at once if we follow that process, though manual updates are still possible), and we can avoid dependency drift by being careful when reviewing Docker image changes. Docker image size is a problem for us (more than storage) that makes CI slower and harder to reproduce, so images should contain only what they need to run their specific tests.
I'm in favor of having a base image with common dependencies to eliminate human error in the build process for each Docker image, but even with that, people might add any script to any of the Dockerfiles, so as reviewers we need to keep an eye on changes to those files. Also, I think there shouldn't be any divergence between TVM and microTVM dependencies, and I would like to second @manupa-arm's point on this.
Thanks all for the discussion!

If we are relying solely on reviewers, we would need better written guidance in an OSS project to aid both reviewers (to cite) and authors (to follow) on what may be added to ci_base (if we take this approach), ci_micro, and ci_cpu.

I suppose ci_base would help here in that the above guidance could be simplified to say that all layers added by the ci_micro Dockerfile have to be micro-specific.
@manupa-arm I agree with you. Both better documentation and having a ci_base image with common dependencies are great ideas.
I think most of the discussion here has aligned around purpose-built CI images. @leandron, are your concerns assuaged?

Regarding , I do want to note that we also mitigate drift in other ways, e.g. via common install scripts and (soon) via locked Python dependencies (this more directly addresses the TensorFlow divergence example). However, I acknowledge the drawback that this is merely convention and dependent on reviewers.

For verticals like microTVM, I think a separate image makes sense. For e.g. backends, it's less clear: what if you want to test CUDA with PaddlePaddle? There is no such PaddlePaddle dep for arm64, so it can't go in the base image. Does this mean base images need to be arch-specific, and if so, where's the line (e.g. what if a dep is available for 20.04 but not 18.04)?

As far as action items, I think we should capture the need for either a common base image or documentation in a GH issue. We should also update our docs accordingly. Would love to hear more thoughts @Mousius @manupa-arm @leandron @driazati
manupak left a comment:
I'm just suggesting a mandatory change orthogonal to the conversation.
```dockerfile
ENV CMSIS_PATH=/opt/arm/ethosu/cmsis/

# Arm(R) Ethos(TM)-U NPU driver
COPY install/ubuntu_install_ethosu_driver_stack.sh /install/ubuntu_install_ethosu_driver_stack.sh
```
```bash
run_pytest ctypes ${TVM_INTEGRATION_TESTSUITE_NAME}-contrib-test_ethosu tests/python/contrib/test_ethosu -n auto
```

We need this line moved alongside as well.
We should move that after we've updated the qemu image with the dependencies for this test.
I can update this PR by adding the Arm(R) Ethos(TM)-U NPU driver to the qemu image and merge this. Then we will update the qemu image and send another PR to remove test_ethosu and move it to qemu.
Sorry, we already have that. I meant the Github Arm(R) Ethos(TM)-N NPU driver; I believe that is required, correct me if I'm wrong.
I tested it in qemu and it looks fine, so I moved it.
@manupa-arm can you take a look again?
manupak left a comment:
Thanks! Looking good.
It would be great to capture the steps discussed and agreed in the issue (as per @areusch) and link it to this PR.
@ashutosh-arm could you take a look as well from the CMSIS tests' side - would they all run in ci_qemu now?
AFAIK CMSIS-NN tests do not run on ci_qemu at present. In this code change, I don't see the CMSIS-NN tests moved to tests/scripts/task_python_microtvm.sh. I am referring to
I see. Yes, none of the contrib tests are run there (that is how the CMSIS tests are run today, in ci_cpu).
@ashutosh-arm thanks for catching this. I moved the test_cmsisnn tests to qemu. There's also test_ethosn, which we should move, but we need to update the qemu image for that.
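For context, the CMSIS-NN move presumably adds a line like the following to tests/scripts/task_python_microtvm.sh, mirroring how the contrib tests are invoked in the integration script (the exact testsuite-name variable may differ in the actual change):

```bash
# Hypothetical: run the CMSIS-NN contrib tests as part of the microTVM (ci_qemu) test script.
run_pytest ctypes ${TVM_INTEGRATION_TESTSUITE_NAME}-contrib-test_cmsisnn tests/python/contrib/test_cmsisnn -n auto
```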
@mehrdadh test_ethosn (unlike test_ethosu) is not related to microTVM.
@manupa-arm thanks for the clarification! Then I think this PR is ready; I just need to figure out the timeout issue.
Just musing about naming: previously we called this image
@areusch I'd like to change the name, but I think ci_arm_cortexm is too specific. We could have RISC-V support with Zephyr in the future, and I assume that testing would be part of this image too; then the name would be confusing. I think ci_micro works better here. wdyt?
I'd suggest that ci_arm is renamed to ci_aarch64. Ramana
@u99127 I'm fine with that, should we then
Force-pushed from 0cf89c7 to 74de280
@manupa-arm please take another look when you have a minute
Just rebased with main since https://github.com/apache/tvm/pull/12258/files is merged; the timeout issue should be resolved.
@manupa-arm friendly reminder about this PR. Thanks!
manupak left a comment:
Thanks! LGTM!
* Move scripts
* Address comments
* Move ethosu tests
* Move cmsisnn tests to qemu
This would prevent rebuilding multiple images on each update to CMSIS, ethosu, etc.
cc @Mousius @alanmacd @areusch @gromero @leandron