test: Fix flaky TestCloseThroughContext test #1265

AndrewSisley · 2023-03-30T18:20:02Z

Relevant issue(s)

Resolves #1253

Description

Fixes flaky TestCloseThroughContext test.

The test was failing because within the memory datastore ctx.Done is being handled by the same loop that d.commit is being read. There was no guarantee that ctx.Done will be handled before d.commit, which would result in an exit from the control loop on d.commit, before the handling of ctx.Done has had chance to close the datastore.

Things could be improved by breaking up that select-loop, but even then there is no guarentee that the context close would be handled before s.Close and so the sleep would still be required.

It would happen very infrequently, but I have also only ever heard of this test failing once, so that kind of makes sense.

codecov · 2023-03-30T18:25:18Z

Codecov Report

Merging #1265 (737f291) into develop (bce78a1) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #1265      +/-   ##
===========================================
- Coverage    70.17%   70.10%   -0.08%     
===========================================
  Files          184      184              
  Lines        17394    17392       -2     
===========================================
- Hits         12206    12192      -14     
- Misses        4251     4260       +9     
- Partials       937      940       +3

Impacted Files	Coverage Δ
datastore/memory/memory.go	`95.70% <100.00%> (-0.04%)`	⬇️

... and 4 files with indirect coverage changes

shahzadlone

TBH I am not sure how to test this, but I am assuming you probably have.

AndrewSisley · 2023-03-30T20:00:07Z

TBH I am not sure how to test this, but I am assuming you probably have.

The failure is pretty common without the (removed) chan call, and without the sleep. It is rare with the chan call, and should be really hard to hit with a 10ms sleep (quite massive compared to the chan)

fredcarle · 2023-03-30T20:02:21Z

The test was failing because within the memory datastore ctx.Done is being handled by the same loop that d.commit is being read. There was no guarantee that ctx.Done will be handled before d.commit, which would result in an exit from the control loop on d.commit, before the handling of ctx.Done has had chance to close the datastore.

For some reason I highly doubt that that is what's happening. Where you able to track the execution flow to match what you're saying?

AndrewSisley · 2023-03-30T21:01:00Z

For some reason I highly doubt that that is what's happening. Where you able to track the execution flow to match what you're saying?

With an interactive debugger? No, but you can see how it is possible in the code. And if the context close is handled after the database close there will be no error reported.

It is also very easy to reproduce if you remove the strange chan call (2nd commit) in the test without the explicit sleep.

fredcarle · 2023-03-30T21:06:24Z

It is also very easy to reproduce if you remove the strange chan call (2nd commit) in the test without the explicit sleep.

Indeed and that's why the channel receiver is there. To make sure that the context close call closes the datastore before the next close call.

fredcarle · 2023-03-30T21:13:53Z

I'm pretty sure that changing from

func (d *Datastore) Close() error {
	d.closeOnce.Do(func() {
		close(d.closing)
	})
	d.closeLk.Lock()
	defer d.closeLk.Unlock()
	if d.closed {
		return ErrClosed
	}
	d.closed = true
	close(d.commit)
	return nil
}

to

func (d *Datastore) Close() error {
	d.closeLk.Lock()
	defer d.closeLk.Unlock()
	if d.closed {
		return ErrClosed
	}
        d.closeOnce.Do(func() {
		close(d.closing)
	})
	d.closed = true
	close(d.commit)
	return nil
}

would solve the issue without using a sleep call.

AndrewSisley · 2023-03-30T21:17:08Z

It is also very easy to reproduce if you remove the strange chan call (2nd commit) in the test without the explicit sleep.

Indeed and that's why the channel receiver is there. To make sure that the context close call closes the datastore before the next close call.

That leaves a race between the if and the Do you flagged, moving the do to after the if would also solve it, but the internal chan receiver on an internal chan is an odd mechanic to use for the test (especially when undocumented).

The Sleep will work regardless of datastore internals, and doesn't add additional references to internal stuff (which hinders implementation change).

The only impact it had on the test was to slightly delay things before calling s.Close making the test a little less flaky.

Also removed the Do as it becomes pointless on this side of the lock.

AndrewSisley added bug Something isn't working area/testing Related to any test or testing suite action/no-benchmark Skips the action that runs the benchmark. labels Mar 30, 2023

AndrewSisley added this to the DefraDB v0.5 milestone Mar 30, 2023

AndrewSisley requested a review from a team March 30, 2023 18:20

AndrewSisley self-assigned this Mar 30, 2023

shahzadlone approved these changes Mar 30, 2023

View reviewed changes

AndrewSisley force-pushed the sisley/test/I1253-close-via-context branch from 4131d60 to 9dde17b Compare March 31, 2023 01:19

AndrewSisley added 2 commits March 30, 2023 21:19

Add sleep to allow cancel to close the db

55e1bb0

Remove read from chan

a5abd9d

The only impact it had on the test was to slightly delay things before calling s.Close making the test a little less flaky.

AndrewSisley force-pushed the sisley/test/I1253-close-via-context branch from 9dde17b to 114cc2f Compare March 31, 2023 01:19

PR FIXUP - Move close of closing chan to within lock

737f291

Also removed the Do as it becomes pointless on this side of the lock.

AndrewSisley force-pushed the sisley/test/I1253-close-via-context branch from 114cc2f to 737f291 Compare March 31, 2023 01:20

AndrewSisley merged commit a261c2f into develop Mar 31, 2023

AndrewSisley deleted the sisley/test/I1253-close-via-context branch March 31, 2023 01:36

shahzadlone pushed a commit that referenced this pull request Apr 13, 2023

test: Fix flaky TestCloseThroughContext test (#1265)

6fb3535

shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024

test: Fix flaky TestCloseThroughContext test (sourcenetwork#1265)

fd599d4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: Fix flaky TestCloseThroughContext test #1265

test: Fix flaky TestCloseThroughContext test #1265

AndrewSisley commented Mar 30, 2023

codecov bot commented Mar 30, 2023 •

edited

Loading

shahzadlone left a comment

AndrewSisley commented Mar 30, 2023

fredcarle commented Mar 30, 2023

AndrewSisley commented Mar 30, 2023 •

edited

Loading

fredcarle commented Mar 30, 2023

fredcarle commented Mar 30, 2023

AndrewSisley commented Mar 30, 2023

test: Fix flaky TestCloseThroughContext test #1265

test: Fix flaky TestCloseThroughContext test #1265

Conversation

AndrewSisley commented Mar 30, 2023

Relevant issue(s)

Description

codecov bot commented Mar 30, 2023 • edited Loading

Codecov Report

shahzadlone left a comment

Choose a reason for hiding this comment

AndrewSisley commented Mar 30, 2023

fredcarle commented Mar 30, 2023

AndrewSisley commented Mar 30, 2023 • edited Loading

fredcarle commented Mar 30, 2023

fredcarle commented Mar 30, 2023

AndrewSisley commented Mar 30, 2023

codecov bot commented Mar 30, 2023 •

edited

Loading

AndrewSisley commented Mar 30, 2023 •

edited

Loading