Disable parts of batch_span_processor test as flakes #743

jmacd · 2020-05-19T05:43:34Z

This:

enqueue() uses the a defer/recover to avoid panicking on a closed queue
processQueue() is factored apart from drainQueue() and exportSpans() (avoids named loop)
The test now does not validate the outcome, only that the code completes. TODO: fix.

Part of #741.

…routine

jmacd · 2020-05-19T05:44:41Z

@vmihailenco It turns out we didn't fix the flakes last time. I did two things here, one was to eliminate the leaked goroutine to throw-away spans (using the wait group), two was to wait for the drain to finish to ensure the test doesn't race with draining the queue.

vmihailenco · 2020-05-19T06:20:43Z

I believe it has the same problem you've indicated previously - bsp.stopWait.Wait() in Shutdown and bsp.stopWait.Add(1) in enqueue are executed concurrently and therefore can panic. E,g, in this code

		close(bsp.stopCh)
		// Here processQueue can receive on stopCh and execute bsp.stopWait.Done()
		// Then bsp.stopWait can be 0 at this point and Wait can panic.
		bsp.stopWait.Wait()

What is worse

there is no reason for WaitGroup to panic in these situations - it just tries to ensure that people write "correct" code (I believe there is nothing wrong with how WaitGroup is used here)
there is also no reason for channel to panic on send when it is closed - again Go tries to make sure people write "good" code

As a result we spend hours trying to write "good" code and still have this monster. And I believe it is still not 100% correct. TBH I would just add defer + recover and move forward. It is correct, simple and fast as long as there are no panics.

jmacd · 2020-05-19T08:10:04Z

The defer/recover addresses the safety issue, I think, but the test is forcing this "monster" as you call it. I'm ready to disable this test.

jmacd · 2020-05-19T08:36:19Z

@vmihailenco Thanks for your help. I think this is an improvement and that we can postpone a proper test.

sdk/trace/batch_span_processor.go

jmacd added 2 commits May 18, 2020 21:57

Name the BSP tests

efc5965

Add a drain wait group; use the stop wait group to avoid leaking a go…

4a81198

…routine

jmacd requested review from Aneurysm9, evantorrie, lizthegrey, MrAlias and paivagustavo as code owners May 19, 2020 05:43

jmacd added 6 commits May 18, 2020 23:24

Lint & comments

bf88e5f

Fix

b21de4e

Use defer/recover

7df46de

Restore the Add/Done...

e42a8fd

Restore the Add/Done...

d77ae11

Consolidate select stmts

70ec39e

jmacd closed this May 19, 2020

Disable the test

ce4ad21

jmacd changed the title ~~Fix flaky batch_span_processor test~~ Disable parts of batch_span_processor test as flakes May 19, 2020

jmacd reopened this May 19, 2020

Lint

2e7d793

vmihailenco reviewed May 19, 2020

View reviewed changes

sdk/trace/batch_span_processor.go Outdated Show resolved Hide resolved

Use better recover

8d5fe7c

MrAlias approved these changes May 19, 2020

View reviewed changes

Merge branch 'master' into jmacd/batch_flake

a102ac8

jmacd merged commit 055e9c5 into open-telemetry:master May 19, 2020

jmacd deleted the jmacd/batch_flake branch May 19, 2020 16:41

pellared added this to the untracked milestone Nov 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disable parts of batch_span_processor test as flakes #743

Disable parts of batch_span_processor test as flakes #743

jmacd commented May 19, 2020 •

edited

Loading

jmacd commented May 19, 2020

vmihailenco commented May 19, 2020 •

edited

Loading

jmacd commented May 19, 2020

jmacd commented May 19, 2020

Disable parts of batch_span_processor test as flakes #743

Disable parts of batch_span_processor test as flakes #743

Conversation

jmacd commented May 19, 2020 • edited Loading

jmacd commented May 19, 2020

vmihailenco commented May 19, 2020 • edited Loading

jmacd commented May 19, 2020

jmacd commented May 19, 2020

jmacd commented May 19, 2020 •

edited

Loading

vmihailenco commented May 19, 2020 •

edited

Loading