Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Fix flaky TestCloseThroughContext test #1265

Merged
merged 3 commits into from
Mar 31, 2023

Conversation

AndrewSisley
Copy link
Contributor

Relevant issue(s)

Resolves #1253

Description

Fixes flaky TestCloseThroughContext test.

The test was failing because within the memory datastore ctx.Done is being handled by the same loop that d.commit is being read. There was no guarantee that ctx.Done will be handled before d.commit, which would result in an exit from the control loop on d.commit, before the handling of ctx.Done has had chance to close the datastore.

Things could be improved by breaking up that select-loop, but even then there is no guarentee that the context close would be handled before s.Close and so the sleep would still be required.

It would happen very infrequently, but I have also only ever heard of this test failing once, so that kind of makes sense.

@AndrewSisley AndrewSisley added bug Something isn't working area/testing Related to any test or testing suite action/no-benchmark Skips the action that runs the benchmark. labels Mar 30, 2023
@AndrewSisley AndrewSisley added this to the DefraDB v0.5 milestone Mar 30, 2023
@AndrewSisley AndrewSisley requested a review from a team March 30, 2023 18:20
@AndrewSisley AndrewSisley self-assigned this Mar 30, 2023
@codecov
Copy link

codecov bot commented Mar 30, 2023

Codecov Report

Merging #1265 (737f291) into develop (bce78a1) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1265      +/-   ##
===========================================
- Coverage    70.17%   70.10%   -0.08%     
===========================================
  Files          184      184              
  Lines        17394    17392       -2     
===========================================
- Hits         12206    12192      -14     
- Misses        4251     4260       +9     
- Partials       937      940       +3     
Impacted Files Coverage Δ
datastore/memory/memory.go 95.70% <100.00%> (-0.04%) ⬇️

... and 4 files with indirect coverage changes

Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I am not sure how to test this, but I am assuming you probably have.

@AndrewSisley
Copy link
Contributor Author

TBH I am not sure how to test this, but I am assuming you probably have.

The failure is pretty common without the (removed) chan call, and without the sleep. It is rare with the chan call, and should be really hard to hit with a 10ms sleep (quite massive compared to the chan)

@fredcarle
Copy link
Collaborator

The test was failing because within the memory datastore ctx.Done is being handled by the same loop that d.commit is being read. There was no guarantee that ctx.Done will be handled before d.commit, which would result in an exit from the control loop on d.commit, before the handling of ctx.Done has had chance to close the datastore.

For some reason I highly doubt that that is what's happening. Where you able to track the execution flow to match what you're saying?

@AndrewSisley
Copy link
Contributor Author

AndrewSisley commented Mar 30, 2023

For some reason I highly doubt that that is what's happening. Where you able to track the execution flow to match what you're saying?

With an interactive debugger? No, but you can see how it is possible in the code. And if the context close is handled after the database close there will be no error reported.

It is also very easy to reproduce if you remove the strange chan call (2nd commit) in the test without the explicit sleep.

@fredcarle
Copy link
Collaborator

It is also very easy to reproduce if you remove the strange chan call (2nd commit) in the test without the explicit sleep.

Indeed and that's why the channel receiver is there. To make sure that the context close call closes the datastore before the next close call.

@fredcarle
Copy link
Collaborator

I'm pretty sure that changing from

func (d *Datastore) Close() error {
	d.closeOnce.Do(func() {
		close(d.closing)
	})
	d.closeLk.Lock()
	defer d.closeLk.Unlock()
	if d.closed {
		return ErrClosed
	}
	d.closed = true
	close(d.commit)
	return nil
}

to

func (d *Datastore) Close() error {
	d.closeLk.Lock()
	defer d.closeLk.Unlock()
	if d.closed {
		return ErrClosed
	}
        d.closeOnce.Do(func() {
		close(d.closing)
	})
	d.closed = true
	close(d.commit)
	return nil
}

would solve the issue without using a sleep call.

@AndrewSisley
Copy link
Contributor Author

It is also very easy to reproduce if you remove the strange chan call (2nd commit) in the test without the explicit sleep.

Indeed and that's why the channel receiver is there. To make sure that the context close call closes the datastore before the next close call.

That leaves a race between the if and the Do you flagged, moving the do to after the if would also solve it, but the internal chan receiver on an internal chan is an odd mechanic to use for the test (especially when undocumented).

The Sleep will work regardless of datastore internals, and doesn't add additional references to internal stuff (which hinders implementation change).

@AndrewSisley AndrewSisley force-pushed the sisley/test/I1253-close-via-context branch from 4131d60 to 9dde17b Compare March 31, 2023 01:19
The only impact it had on the test was to slightly delay things before calling s.Close making the test a little less flaky.
@AndrewSisley AndrewSisley force-pushed the sisley/test/I1253-close-via-context branch from 9dde17b to 114cc2f Compare March 31, 2023 01:19
Also removed the Do as it becomes pointless on this side of the lock.
@AndrewSisley AndrewSisley force-pushed the sisley/test/I1253-close-via-context branch from 114cc2f to 737f291 Compare March 31, 2023 01:20
@AndrewSisley AndrewSisley merged commit a261c2f into develop Mar 31, 2023
@AndrewSisley AndrewSisley deleted the sisley/test/I1253-close-via-context branch March 31, 2023 01:36
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/testing Related to any test or testing suite bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestCloseThroughContext occasionally fails
3 participants