Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CoroutineExecutorService is causing flakes in CI #1942

Closed
BenHenning opened this issue Oct 7, 2020 · 7 comments · Fixed by #1943
Closed

CoroutineExecutorService is causing flakes in CI #1942

BenHenning opened this issue Oct 7, 2020 · 7 comments · Fixed by #1943
Assignees
Labels
Hacktoberfest This is a suggested Hacktoberfest issue. Priority: Essential This work item must be completed for its milestone. Z-ibt Temporary label for Ben to keep track of issues he's triaged.

Comments

@BenHenning
Copy link
Member

See https://github.com/oppia/oppia-android/actions/runs/294030287 & others at https://github.com/oppia/oppia-android/actions. We're getting a bunch of failures due to CoroutineExecutorService failing to shutdown under some circumstances. This might be more obvious on faster machines, and I suspect that GitHub actions or the cloud infra it depends on may have had an updated configuration leading to tests running faster.

I can consistently trigger QuestionPlayerActivityTest ti fail when I run it inside the of the entire app module suite. The failure I see is:

java.lang.RuntimeException: Failed to shutdown

	at com.bumptech.glide.util.Executors.shutdownAndAwaitTermination(Executors.java:52)
	at com.bumptech.glide.load.engine.Engine$EngineJobFactory.shutdown(Engine.java:576)
	at com.bumptech.glide.load.engine.Engine.shutdown(Engine.java:408)
	at com.bumptech.glide.Glide.tearDown(Glide.java:234)
	at com.bumptech.glide.Glide.init(Glide.java:224)
	at org.oppia.android.app.topic.questionplayer.QuestionPlayerActivityTest.setUp(QuestionPlayerActivityTest.kt:131)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.robolectric.RobolectricTestRunner$HelperTestRunner$1.evaluate(RobolectricTestRunner.java:546)
	at org.robolectric.internal.SandboxTestRunner$2.lambda$evaluate$0(SandboxTestRunner.java:252)
	at org.robolectric.internal.bytecode.Sandbox.lambda$runOnMainThread$0(Sandbox.java:89)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I'm not yet sure why other tests sometimes fail, too.

@BenHenning BenHenning added Type: Bug Priority: Essential This work item must be completed for its milestone. Hacktoberfest This is a suggested Hacktoberfest issue. labels Oct 7, 2020
@BenHenning BenHenning added this to the Beta milestone Oct 7, 2020
@BenHenning BenHenning self-assigned this Oct 7, 2020
@BenHenning
Copy link
Member Author

What I'm currently unsure of is whether this is specifically an issue with the executor service or the environment. It's possible to deadlock CoroutineExecutorService if it has tasks that require test coroutine disaptchers to run--shutdown will always fail with a timeout. This is a known limitation, but it shouldn't happen in these cases since it seems to be occurring when setting up Glide for the first time (resetting the old instance of Glide shouldn't be using CoroutineExecutorService since tests are supposed to run in a new process, and we haven't yet replaced the executor services).

@BenHenning
Copy link
Member Author

BenHenning commented Oct 7, 2020

Also, it seems like if one test triggers this all of the tests in the suite will trigger it. A local run of tests led to this being triggered for 2/3 test suites that rely on MockGlideExecutor: StateFragmentTest and StateFragmentLocalTest (not QuestionPlayerActivityTest) (leading to >15 minutes for the whole module suite to run since each individual test failure is ~10 seconds).

@BenHenning
Copy link
Member Author

Aha, I think Gradle might be doing the wrong thing here. After my previous run where most of the state fragment tests failed with a similar error to the above:

org.oppia.android.app.player.state.StateFragmentTest > testStateFragment_loadExp_secondState_submitInvalidAnswer_disablesSubmitAndShowsError FAILED
    java.lang.RuntimeException: Failed to shutdown
        at com.bumptech.glide.util.Executors.shutdownAndAwaitTermination(Executors.java:52)
        at com.bumptech.glide.load.engine.Engine$EngineJobFactory.shutdown(Engine.java:576)
        at com.bumptech.glide.load.engine.Engine.shutdown(Engine.java:408)
        at com.bumptech.glide.Glide.tearDown(Glide.java:234)
        at com.bumptech.glide.Glide.init(Glide.java:224)
        at org.oppia.android.app.player.state.StateFragmentTest.setUp(StateFragmentTest.kt:182)

I also noticed a failure in ExplorationActivityTest:

org.oppia.android.app.player.exploration.ExplorationActivityTest > testAudioWithWifi_openFractionsExploration_changeLanguage_clickNext_checkLanguageIsHinglish FAILED
    java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.oppia.android.testing.TestCoroutineDispatchersRobolectricImpl$RobolectricUiTaskCoordinator.idle$testing_debug(TestCoroutineDispatchersRobolectricImpl.kt:138)
        at org.oppia.android.testing.TestCoroutineDispatchersRobolectricImpl.flushNextTasks(TestCoroutineDispatchersRobolectricImpl.kt:91)
        at org.oppia.android.testing.TestCoroutineDispatchersRobolectricImpl.runCurrent(TestCoroutineDispatchersRobolectricImpl.kt:32)
        at org.oppia.android.app.player.exploration.ExplorationActivityTest.testAudioWithWifi_openFractionsExploration_changeLanguage_clickNext_checkLanguageIsHinglish(ExplorationActivityTest.kt:572)

        Caused by:
        java.util.concurrent.RejectedExecutionException
            at org.oppia.android.testing.CoroutineExecutorService.dispatchAsync(CoroutineExecutorService.kt:271)
            at org.oppia.android.testing.CoroutineExecutorService.dispatchAsync(CoroutineExecutorService.kt:266)
            at org.oppia.android.testing.CoroutineExecutorService.dispatchAsync(CoroutineExecutorService.kt:262)
            at org.oppia.android.testing.CoroutineExecutorService.execute(CoroutineExecutorService.kt:231)
            at com.bumptech.glide.load.engine.executor.GlideExecutor.execute(GlideExecutor.java:220)
            at com.bumptech.glide.load.engine.EngineJob.start(EngineJob.java:131)
            at com.bumptech.glide.load.engine.Engine.waitForExistingOrStartNewJob(Engine.java:286)
            at com.bumptech.glide.load.engine.Engine.load(Engine.java:193)
            at com.bumptech.glide.request.SingleRequest.onSizeReady(SingleRequest.java:449)
            at com.bumptech.glide.request.target.CustomTarget.getSize(CustomTarget.java:107)
            at com.bumptech.glide.request.SingleRequest.begin(SingleRequest.java:251)
            at com.bumptech.glide.manager.RequestTracker.runRequest(RequestTracker.java:43)
            at com.bumptech.glide.RequestManager.track(RequestManager.java:674)
            at com.bumptech.glide.RequestBuilder.into(RequestBuilder.java:641)
            at com.bumptech.glide.RequestBuilder.into(RequestBuilder.java:608)
            at com.bumptech.glide.RequestBuilder.into(RequestBuilder.java:599)
            at org.oppia.android.util.parser.GlideImageLoader.intoTarget(GlideImageLoader.kt:55)
            at org.oppia.android.util.parser.GlideImageLoader.loadBitmap(GlideImageLoader.kt:32)
            at org.oppia.android.util.parser.UrlImageParser.getDrawable(UrlImageParser.kt:50)
            at android.text.HtmlToSpannedConverter.startImg(Html.java:1122)
            at android.text.HtmlToSpannedConverter.handleStartTag(Html.java:835)
            at android.text.HtmlToSpannedConverter.startElement(Html.java:1209)
            at org.oppia.android.util.parser.CustomHtmlContentHandler.startElement(CustomHtmlContentHandler.kt:55)
            at org.ccil.cowan.tagsoup.Parser.push(Parser.java:795)
            at org.ccil.cowan.tagsoup.Parser.rectify(Parser.java:1062)
            at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:1017)
            at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:624)
            at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:450)
            at android.text.HtmlToSpannedConverter.convert(Html.java:746)
            at android.text.Html.fromHtml(Html.java:242)
            at androidx.core.text.HtmlCompat.fromHtml(HtmlCompat.java:162)
            at org.oppia.android.util.parser.CustomHtmlContentHandler$Companion.fromHtml(CustomHtmlContentHandler.kt:168)
            at org.oppia.android.util.parser.HtmlParser.parseOppiaHtml(HtmlParser.kt:77)
            at org.oppia.android.app.player.state.StatePlayerRecyclerViewAssembler$Builder$addContentSupport$2.invoke(StatePlayerRecyclerViewAssembler.kt:800)
            at org.oppia.android.app.player.state.StatePlayerRecyclerViewAssembler$Builder$addContentSupport$2.invoke(StatePlayerRecyclerViewAssembler.kt:744)
            at org.oppia.android.app.recyclerview.BindableAdapter$MultiTypeBuilder$registerViewBinder$viewHolderFactory$1$1.bind$app_debug(BindableAdapter.kt:242)
            at org.oppia.android.app.recyclerview.BindableAdapter.onBindViewHolder(BindableAdapter.kt:86)
            at org.oppia.android.app.recyclerview.BindableAdapter.onBindViewHolder(BindableAdapter.kt:30)
            at androidx.recyclerview.widget.RecyclerView$Adapter.onBindViewHolder(RecyclerView.java:7065)
            at androidx.recyclerview.widget.RecyclerView$Adapter.bindViewHolder(RecyclerView.java:7107)
            at androidx.recyclerview.widget.RecyclerView$Recycler.tryBindViewHolderByDeadline(RecyclerView.java:6012)
            at androidx.recyclerview.widget.RecyclerView$Recycler.tryGetViewHolderForPositionByDeadline(RecyclerView.java:6279)
            at androidx.recyclerview.widget.RecyclerView$Recycler.getViewForPosition(RecyclerView.java:6118)
            at androidx.recyclerview.widget.RecyclerView$Recycler.getViewForPosition(RecyclerView.java:6114)
            at androidx.recyclerview.widget.LinearLayoutManager$LayoutState.next(LinearLayoutManager.java:2303)
            at androidx.recyclerview.widget.LinearLayoutManager.layoutChunk(LinearLayoutManager.java:1627)
            at androidx.recyclerview.widget.LinearLayoutManager.fill(LinearLayoutManager.java:1587)
            at androidx.recyclerview.widget.LinearLayoutManager.onLayoutChildren(LinearLayoutManager.java:665)
            at androidx.recyclerview.widget.RecyclerView.dispatchLayoutStep2(RecyclerView.java:4134)
            at androidx.recyclerview.widget.RecyclerView.dispatchLayout(RecyclerView.java:3851)
            at androidx.recyclerview.widget.RecyclerView.onLayout(RecyclerView.java:4404)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at androidx.constraintlayout.widget.ConstraintLayout.onLayout(ConstraintLayout.java:1915)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.FrameLayout.layoutChildren(FrameLayout.java:323)
            at android.widget.FrameLayout.onLayout(FrameLayout.java:261)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.FrameLayout.layoutChildren(FrameLayout.java:323)
            at android.widget.FrameLayout.onLayout(FrameLayout.java:261)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at androidx.constraintlayout.widget.ConstraintLayout.onLayout(ConstraintLayout.java:1915)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.FrameLayout.layoutChildren(FrameLayout.java:323)
            at android.widget.FrameLayout.onLayout(FrameLayout.java:261)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.FrameLayout.layoutChildren(FrameLayout.java:323)
            at android.widget.FrameLayout.onLayout(FrameLayout.java:261)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.LinearLayout.setChildFrame(LinearLayout.java:1812)
            at android.widget.LinearLayout.layoutVertical(LinearLayout.java:1656)
            at android.widget.LinearLayout.onLayout(LinearLayout.java:1565)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.FrameLayout.layoutChildren(FrameLayout.java:323)
            at android.widget.FrameLayout.onLayout(FrameLayout.java:261)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.LinearLayout.setChildFrame(LinearLayout.java:1812)
            at android.widget.LinearLayout.layoutVertical(LinearLayout.java:1656)
            at android.widget.LinearLayout.onLayout(LinearLayout.java:1565)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.FrameLayout.layoutChildren(FrameLayout.java:323)
            at android.widget.FrameLayout.onLayout(FrameLayout.java:261)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.LinearLayout.setChildFrame(LinearLayout.java:1812)
            at android.widget.LinearLayout.layoutVertical(LinearLayout.java:1656)
            at android.widget.LinearLayout.onLayout(LinearLayout.java:1565)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.widget.FrameLayout.layoutChildren(FrameLayout.java:323)
            at android.widget.FrameLayout.onLayout(FrameLayout.java:261)
            at com.android.internal.policy.DecorView.onLayout(DecorView.java:753)
            at android.view.View.layout(View.java:20672)
            at android.view.ViewGroup.layout(ViewGroup.java:6194)
            at android.view.ViewRootImpl.performLayout(ViewRootImpl.java:2792)
            at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:2319)
            at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1460)
            at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:7183)
            at android.view.Choreographer$CallbackRecord.run(Choreographer.java:949)
            at android.view.Choreographer.doCallbacks(Choreographer.java:761)
            at android.view.Choreographer.doFrame(Choreographer.java:696)
            at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:935)
            at android.os.Handler.handleCallback(Handler.java:873)
            at android.os.Handler.dispatchMessage(Handler.java:99)
            at org.robolectric.shadows.ShadowPausedLooper$IdlingRunnable.run(ShadowPausedLooper.java:308)
            at org.robolectric.shadows.ShadowPausedLooper.executeOnLooper(ShadowPausedLooper.java:273)
            at org.robolectric.shadows.ShadowPausedLooper.idle(ShadowPausedLooper.java:85)
            ... 4 more

This is particularly noteworthy because ExplorationActivityTest does NOT set up a mock Glide executor. This seems to strongly suggest that Gradle is not actually using multiple processes for each test, and the shared Glide state + lack of test coroutine synchronization on test exit is likely to trigger a deadlock when shutting down the previous version of Glide.

While we should be making sure all tasks complete on test tear-down, we should also be using multiple processes for each test for proper state separation.

@BenHenning
Copy link
Member Author

BenHenning commented Oct 7, 2020

Gradle documentation is a bit contradictory here. Per https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:maxParallelForks it indicates that a single test is run in a single process at a time by default, but https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:forkEvery mentions that the same test process is reused for all test classes.

I think what the documentation actually means is that only 1 process will be created (maxParallelForks = 1), and it will be reused everywhere (forkEvery = 0). However, when changing forkEvery to 1 locally, it doesn't actually fix the underlying issue. This might be a configuration issue with Java vs. Android tests.

@BenHenning
Copy link
Member Author

Another issue I'm running into: Android Studio doesn't seem to respect Gradle settings for fork & parallelization. Users need to manually set the fork mode within the IDE in order to get the correct results when running the full suite of app module tests. However, running the suite with class forking seems to fix the underlying issues.

Really curious why this is suddenly becoming an issue on actions now.

@BenHenning
Copy link
Member Author

BenHenning commented Oct 7, 2020

Note that turning on forking significantly slows down Gradle tests, probably because Robolectric needs to be re-setup each time. This is still more correct, though, and explains why Bazel tests take a lot longer to run.

Turning on parallel runs helps a lot (for my machine with ~6 cores it brings it to about the same runtime as before), but for machines with less parallelization (e.g. Github Actions machines) it will probably significantly lengthen the test runtime.

@BenHenning
Copy link
Member Author

So I can't seem to get parallel runs to work in Android Studio's runner UI, so anyone using the UI to batch run tests is going to have a painful time. The CLI does parallelize correctly and runs significantly faster. Individual test runs from within Android Studio should be unaffected.

BenHenning added a commit that referenced this issue Oct 8, 2020
…st processes (#1943)

* Enable full stack traces for all Robolectric tests

* Show stack traces for failing Robolectric tests.

* Introduce class-level forking & enable test parallelization.
@BenHenning BenHenning added the Z-ibt Temporary label for Ben to keep track of issues he's triaged. label Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Hacktoberfest This is a suggested Hacktoberfest issue. Priority: Essential This work item must be completed for its milestone. Z-ibt Temporary label for Ben to keep track of issues he's triaged.
1 participant