CoroutineExecutorService is causing flakes in CI #1942
What I'm currently unsure of is whether this is specifically an issue with the executor service or with the environment. It's possible to deadlock CoroutineExecutorService if it has tasks that require test coroutine dispatchers to run: shutdown will always fail with a timeout. This is a known limitation, but it shouldn't happen in these cases since the failure seems to occur when setting up Glide for the first time (resetting the old instance of Glide shouldn't involve CoroutineExecutorService, since tests are supposed to run in a new process and we haven't yet replaced the executor services).
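To illustrate that known limitation, here is a minimal plain-Java sketch (not Oppia code; the latch standing in for a test dispatcher that never runs its queued work is hypothetical) of how an executor holding a task that can never complete makes shutdown time out:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownTimeoutDemo {
  /** Returns whether the executor terminated cleanly (here, it cannot). */
  static boolean demonstrateShutdownTimeout() throws InterruptedException {
    // A latch that is never counted down stands in for a test coroutine
    // dispatcher that never runs the work the task is waiting on.
    CountDownLatch neverReleased = new CountDownLatch(1);
    ExecutorService executor = Executors.newSingleThreadExecutor();

    // This task blocks forever, so the executor can never drain its queue.
    executor.submit(() -> {
      try {
        neverReleased.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    executor.shutdown();
    // Times out: the blocked task can never finish, mirroring how a
    // CoroutineExecutorService shutdown hangs when its tasks depend on a
    // test dispatcher that is never synchronized.
    boolean terminated = executor.awaitTermination(200, TimeUnit.MILLISECONDS);
    executor.shutdownNow(); // interrupt the stuck task so the JVM can exit
    return terminated;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("terminated=" + demonstrateShutdownTimeout());
  }
}
```

The same shape applies regardless of executor implementation: any pending task that waits on work the test never schedules turns an orderly shutdown into a timeout.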
Also, it seems that if one test triggers this, all of the tests in the suite will trigger it. A local run led to this being triggered for 2 of the 3 test suites that rely on MockGlideExecutor: StateFragmentTest and StateFragmentLocalTest (but not QuestionPlayerActivityTest). Since each individual test failure takes ~10 seconds, that pushes the whole module suite past 15 minutes.
Aha, I think Gradle might be doing the wrong thing here. After my previous run where most of the state fragment tests failed with a similar error to the above:
I also noticed a failure in ExplorationActivityTest:
This is particularly noteworthy because ExplorationActivityTest does NOT set up a mock Glide executor. This strongly suggests that Gradle is not actually using a separate process for each test, and that the shared Glide state, combined with the lack of test coroutine synchronization on test exit, is likely triggering a deadlock when shutting down the previous instance of Glide. While we should be making sure all tasks complete on test tear-down, we should also be using a separate process for each test for proper state separation.
The Gradle documentation is a bit contradictory here. https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:maxParallelForks indicates that a single test process runs at a time by default, but https://docs.gradle.org/current/dsl/org.gradle.api.tasks.testing.Test.html#org.gradle.api.tasks.testing.Test:forkEvery mentions that the same test process is reused for all test classes. I think what the documentation actually means is that only one process will be created (maxParallelForks = 1), and it will be reused everywhere (forkEvery = 0). However, changing forkEvery to 1 locally doesn't actually fix the underlying issue. This might be a configuration issue with Java vs. Android tests.
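Concretely, my reading of the two doc pages corresponds to defaults like the following (a hypothetical build.gradle sketch, not the project's actual configuration):

```groovy
test {
    // Gradle default: at most one test worker JVM runs at a time...
    maxParallelForks = 1
    // ...and that JVM is reused for every test class (0 = never re-fork).
    forkEvery = 0

    // The local experiment mentioned above amounts to flipping this to 1,
    // i.e. a fresh worker JVM for each test class:
    // forkEvery = 1
}
```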
Another issue I'm running into: Android Studio doesn't seem to respect Gradle settings for forking & parallelization, so users need to manually set the fork mode within the IDE to get correct results when running the full suite of app module tests. Running the suite with class-level forking does seem to fix the underlying issues, though. I'm really curious why this is suddenly becoming an issue on Actions now.
Note that turning on forking significantly slows down Gradle tests, probably because Robolectric needs to be set up again for each fork. This is still more correct, though, and it explains why Bazel tests take much longer to run. Turning on parallel runs helps a lot (on my machine with ~6 cores it brings the runtime back to roughly what it was before), but on machines with less parallelism (e.g. GitHub Actions runners) it will probably lengthen the test runtime significantly.
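For reference, combining per-class forking with parallel workers looks something like this (again a hypothetical sketch; the half-the-cores heuristic is just a common convention from the Gradle docs, not a measured optimum for this project):

```groovy
test {
    // Fresh JVM per test class, so Robolectric re-initializes each time.
    forkEvery = 1
    // Scale workers to the machine; CI runners with few cores gain little.
    maxParallelForks = Runtime.runtime.availableProcessors().intdiv(2) ?: 1
}
```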
I can't seem to get parallel runs to work in Android Studio's runner UI, so anyone using the UI to batch-run tests is going to have a painful time. The CLI does parallelize correctly and runs significantly faster. Individual test runs from within Android Studio should be unaffected.
…st processes (#1943):
- Enable full stack traces for all Robolectric tests
- Show stack traces for failing Robolectric tests
- Introduce class-level forking & enable test parallelization
See https://github.com/oppia/oppia-android/actions/runs/294030287 and others at https://github.com/oppia/oppia-android/actions. We're getting a bunch of failures due to CoroutineExecutorService failing to shut down under some circumstances. This might be more obvious on faster machines; I suspect that GitHub Actions, or the cloud infrastructure it depends on, may have had a configuration update that leads to tests running faster.
I can consistently trigger QuestionPlayerActivityTest to fail when I run it as part of the entire app module suite. The failure I see is:
I'm not yet sure why other tests sometimes fail, too.