CoroutineExecutorService is causing flakes in CI #1942
What I'm currently unsure of is whether this is specifically an issue with the executor service or with the environment. It's possible to deadlock CoroutineExecutorService if it has tasks that require test coroutine dispatchers to run: shutdown will always fail with a timeout. This is a known limitation, but it shouldn't happen in these cases since the failure seems to occur when setting up Glide for the first time (resetting the old instance of Glide shouldn't involve CoroutineExecutorService, since tests are supposed to run in a new process and we haven't yet replaced the executor services).
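To illustrate that known limitation, here is a minimal plain-Java sketch (not Oppia code; the latch standing in for a test dispatcher that never runs its queued work is hypothetical) of how an executor holding a task that can never complete makes shutdown time out:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownTimeoutDemo {
  /** Returns whether the executor terminated cleanly (here, it cannot). */
  static boolean demonstrateShutdownTimeout() throws InterruptedException {
    // A latch that is never counted down stands in for a test coroutine
    // dispatcher that never runs the work the task is waiting on.
    CountDownLatch neverReleased = new CountDownLatch(1);
    ExecutorService executor = Executors.newSingleThreadExecutor();

    // This task blocks forever, so the executor can never drain its queue.
    executor.submit(() -> {
      try {
        neverReleased.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    executor.shutdown();
    // Times out: the blocked task can never finish, mirroring how a
    // CoroutineExecutorService shutdown hangs when its tasks depend on a
    // test dispatcher that is never synchronized.
    boolean terminated = executor.awaitTermination(200, TimeUnit.MILLISECONDS);
    executor.shutdownNow(); // interrupt the stuck task so the JVM can exit
    return terminated;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("terminated=" + demonstrateShutdownTimeout());
  }
}
```

The same shape applies regardless of executor implementation: any pending task that waits on work the test never schedules turns an orderly shutdown into a timeout.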
Also, it seems that if one test triggers this, all of the tests in the suite will trigger it. A local run led to this being triggered for 2 of the 3 test suites that rely on MockGlideExecutor: StateFragmentTest and StateFragmentLocalTest (but not QuestionPlayerActivityTest). Since each individual test failure takes ~10 seconds, that pushes the whole module suite past 15 minutes.
Aha, I think Gradle might be doing the wrong thing here. After my previous run where most of the state fragment tests failed with a similar error to the above:
I also noticed a failure in ExplorationActivityTest:
This is particularly noteworthy because ExplorationActivityTest does NOT set up a mock Glide executor. This strongly suggests that Gradle is not actually using a separate process for each test, and that the shared Glide state, combined with the lack of test coroutine synchronization on test exit, is likely triggering a deadlock when shutting down the previous instance of Glide. While we should be making sure all tasks complete on test tear-down, we should also be using a separate process for each test for proper state separation.
The Gradle documentation is a bit contradictory here. https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:maxParallelForks indicates that a single test process runs at a time by default, but https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:forkEvery mentions that the same test process is reused for all test classes. I think what the documentation actually means is that only one process will be created (maxParallelForks = 1), and it will be reused everywhere (forkEvery = 0). However, changing forkEvery to 1 locally doesn't actually fix the underlying issue. This might be a configuration issue with Java vs. Android tests.
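Concretely, my reading of the two doc pages corresponds to defaults like the following (a hypothetical build.gradle sketch, not the project's actual configuration):

```groovy
test {
    // Gradle default: at most one test worker JVM runs at a time...
    maxParallelForks = 1
    // ...and that JVM is reused for every test class (0 = never re-fork).
    forkEvery = 0

    // The local experiment mentioned above amounts to flipping this to 1,
    // i.e. a fresh worker JVM for each test class:
    // forkEvery = 1
}
```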
Another issue I'm running into: Android Studio doesn't seem to respect Gradle settings for forking & parallelization, so users need to manually set the fork mode within the IDE to get correct results when running the full suite of app module tests. Running the suite with class-level forking does seem to fix the underlying issues, though. I'm really curious why this is suddenly becoming an issue on Actions now.
Note that turning on forking significantly slows down Gradle tests, probably because Robolectric needs to be set up again for each fork. This is still more correct, though, and it explains why Bazel tests take much longer to run. Turning on parallel runs helps a lot (on my machine with ~6 cores it brings the runtime back to roughly what it was before), but on machines with less parallelism (e.g. GitHub Actions runners) it will probably lengthen the test runtime significantly.
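For reference, combining per-class forking with parallel workers looks something like this (again a hypothetical sketch; the half-the-cores heuristic is just a common convention from the Gradle docs, not a measured optimum for this project):

```groovy
test {
    // Fresh JVM per test class, so Robolectric re-initializes each time.
    forkEvery = 1
    // Scale workers to the machine; CI runners with few cores gain little.
    maxParallelForks = Runtime.runtime.availableProcessors().intdiv(2) ?: 1
}
```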
I can't seem to get parallel runs to work in Android Studio's runner UI, so anyone using the UI to batch-run tests is going to have a painful time. The CLI does parallelize correctly and runs significantly faster. Individual test runs from within Android Studio should be unaffected.
…st processes (#1943):
- Enable full stack traces for all Robolectric tests
- Show stack traces for failing Robolectric tests
- Introduce class-level forking & enable test parallelization
See https://github.com/oppia/oppia-android/actions/runs/294030287 and others at https://github.com/oppia/oppia-android/actions. We're getting a bunch of failures due to CoroutineExecutorService failing to shut down under some circumstances. This might be more obvious on faster machines; I suspect that GitHub Actions, or the cloud infrastructure it depends on, may have had a configuration update that leads to tests running faster.
I can consistently trigger QuestionPlayerActivityTest to fail when I run it as part of the entire app module suite. The failure I see is:
I'm not yet sure why other tests sometimes fail, too.