Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error cleanup in and after OpenTable calls #102

Merged
merged 1 commit into from
Jul 18, 2017
Merged

Fix error cleanup in and after OpenTable calls #102

merged 1 commit into from
Jul 18, 2017

Conversation

srh
Copy link

@srh srh commented Jul 16, 2017

  • We now close fd inside of OpenTable upon error.

  • At many callsites (where we call OpenTable many times in a loop) we
    now cleanup resources properly.


This change is Reviewable

@srh
Copy link
Author

srh commented Jul 16, 2017

Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion.


levels.go, line 313 at r1 (raw file):

		for _, table := range newTables[:i] {
			if table != nil {
				_ = table.Close()

When cleaning up here I chose to close the tables. Instead we could DecrRef and delete the newly created files, or otherwise directly act to delete the newly created files. I don't yet have enough of a "big picture" view of Badger to know what is right here.


Comments from Reviewable

@srh
Copy link
Author

srh commented Jul 16, 2017

Review status: 0 of 4 files reviewed at latest revision, 2 unresolved discussions.


levels.go, line 269 at r2 (raw file):

			y.Check(builder.Add(it.Key(), it.Value()))
		}
		// it.Valid() at least once in the loop above, hence we called .Add, hence

This "builder.Empty()" condition can never happen. That's good news, because if it did happen, we would have needed to push a nil error onto che so that the loop below ("Wait for all table builders to finish") doesn't hang.


Comments from Reviewable

@manishrjain
Copy link
Contributor

This one is a tricky change. Would be hard to test it as well, given it's hard to reproduce FS failures. But, at least do this. We have a populate script in badger-bench. Run it with 250M keys and 128B values. It would consume 40GB. That should ensure that at least this works well for the normal case.


Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


levels.go, line 313 at r1 (raw file):

Previously, srh (Sam Hughes) wrote…

When cleaning up here I chose to close the tables. Instead we could DecrRef and delete the newly created files, or otherwise directly act to delete the newly created files. I don't yet have enough of a "big picture" view of Badger to know what is right here.

You do need to call DecrRef here because when tables are opened, they have at least one ref. This way, we'll delete them using the logic already in place (truncate fanciness and all).


levels.go, line 147 at r2 (raw file):

	for _, tableSlice := range tables {
		for _, table := range tableSlice {
			_ = table.Close()

My first thought was: If you close the table, but someone is holding a reference to it, this would cause issues.

But, I realize that this is only called on tables whose ref is not yet being held by anyone. If that's the case, can you add a comment about it. I think the "unmodified" part is largely irrelevant because all our writes are synced to disk with every write system call because of O_DSYNC flag. So, closing shouldn't make a difference (apart from the number of open file descriptors).


levels.go, line 152 at r2 (raw file):

}

func cleanupLevels(s *levelsController) error {

Make it a member of levelsController.


levels.go, line 155 at r2 (raw file):

	var firstErr error
	for _, l := range s.levels {
		if err := l.close(); err != nil && firstErr == nil {

Shouldn't matter if it's first or last. Can simplify the if statement though:

if err := l.close(); err != nil { rerr = err }


levels.go, line 269 at r2 (raw file):

Previously, srh (Sam Hughes) wrote…

This "builder.Empty()" condition can never happen. That's good news, because if it did happen, we would have needed to push a nil error onto che so that the loop below ("Wait for all table builders to finish") doesn't hang.

Good point. Can you fix up the grammar of comment statement though?


levels.go, line 271 at r2 (raw file):

		// it.Valid() at least once in the loop above, hence we called .Add, hence
		// !builder.Empty.
		y.AssertTruef(!builder.Empty(), "compactBuildTables builder cannot be empty")

No need for truef. They're costly. Just use y.AssertTrue.


levels.go, line 305 at r2 (raw file):

	var firstErr error
	for x := 0; x < i; x++ {
		if err := <-che; err != nil && firstErr == nil {

Again, no need for first error (can be the last one). Can simplify if statement.


levels.go, line 574 at r2 (raw file):

		return errors.Wrap(cleanupErr, "levelsController.Close")
	}
	return err

errors.Wrap here as well.


Comments from Reviewable

@srh
Copy link
Author

srh commented Jul 17, 2017

Currently running the populate script. 54.24M keys so far.


Review status: 3 of 4 files reviewed at latest revision, 8 unresolved discussions.


levels.go, line 313 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

You do need to call DecrRef here because when tables are opened, they have at least one ref. This way, we'll delete them using the logic already in place (truncate fanciness and all).

Okay, so the decision here is that we do want to delete those files.


levels.go, line 147 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

My first thought was: If you close the table, but someone is holding a reference to it, this would cause issues.

But, I realize that this is only called on tables whose ref is not yet being held by anyone. If that's the case, can you add a comment about it. I think the "unmodified" part is largely irrelevant because all our writes are synced to disk with every write system call because of O_DSYNC flag. So, closing shouldn't make a difference (apart from the number of open file descriptors).

Right. The only plausible error is when writing last modified time, or in the case of some really janky file system. We should still say unmodified here -- I want to warn people off from blindly using it -- because we might not use O_DSYNC in the future (we might explicitly fdatasync, or under some user option, not sync at all).


levels.go, line 152 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Make it a member of levelsController.

Done.


levels.go, line 155 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Shouldn't matter if it's first or last. Can simplify the if statement though:

if err := l.close(); err != nil { rerr = err }

I'm under the impression that the first is more likely to have useful info, while later errors might be of a more generic nature, or be the result of invalid state resulting from under-tested error handling after the first error. As a general rule. That's why I've been returning first errors.


levels.go, line 269 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Good point. Can you fix up the grammar of comment statement though?

Done.


levels.go, line 271 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

No need for truef. They're costly. Just use y.AssertTrue.

Done.


levels.go, line 574 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

errors.Wrap here as well.

Done.


Comments from Reviewable

@manishrjain
Copy link
Contributor

:lgtm:


Reviewed 1 of 1 files at r3.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

- We now close fd inside of OpenTable upon error.

- At many callsites (where we call OpenTable many times in a loop) we
now cleanup resources properly.
@srh srh force-pushed the sam/opentable branch from 78f10c7 to fc138ad Compare July 18, 2017 19:39
@srh srh merged commit fc138ad into master Jul 18, 2017
@srh srh deleted the sam/opentable branch July 18, 2017 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants