
fix: retrying jobs are not saved to db when pool is closing #31

Merged: 9 commits into master, Jan 19, 2023

Conversation

@DNK90 DNK90 (Contributor) commented Jan 16, 2023

  • Add a waitgroup so the pool waits until every retrying job running on a goroutine finishes storing its job to the db (a minimal sketch follows this list)
  • Force the controller to finish processing all pending jobs before fetching new logs
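For readers outside this repo, here is a minimal sketch of the waitgroup idea, assuming a simplified Pool; the field and method names (retryWg, closed, saveToDB, RetryJob) are illustrative, not the actual pool.go API:

```go
// Sketch only: wait for in-flight retrying jobs before the pool closes.
package pool

import (
	"sync"
	"time"
)

type Job struct {
	NextTry int64 // unix timestamp of the next retry attempt
}

type Pool struct {
	retryWg sync.WaitGroup // counts in-flight retrying jobs
	closed  chan struct{}  // closed when the pool starts shutting down
}

// RetryJob schedules a retry on its own goroutine and registers it with the
// waitgroup so Close can wait for it.
func (p *Pool) RetryJob(job *Job) {
	p.retryWg.Add(1)
	go func() {
		defer p.retryWg.Done()
		select {
		case <-time.After(time.Until(time.Unix(job.NextTry, 0))):
			// retry time reached: hand the job back to the pool here
		case <-p.closed:
			// pool is closing: persist the job instead of losing it
			p.saveToDB(job)
		}
	}()
}

func (p *Pool) saveToDB(job *Job) { /* write the job to the database */ }

// Close signals shutdown and blocks until every retrying goroutine has
// either re-enqueued or stored its job.
func (p *Pool) Close() {
	close(p.closed)
	p.retryWg.Wait()
}
```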

@DNK90 DNK90 requested review from minh-bq and linh-1 January 16, 2023 18:20
…berOfRetryingJob` in `PrepareRetryableJob` is enough
pool.go (outdated diff)
dur := time.Until(time.Unix(job.GetNextTry(), 0))
if dur <= 0 {
	return
}
atomic.AddInt32(&p.numberOfRetryingJob, 1)
Contributor:
This numberOfRetryingJob seems unused

DNK90 (Contributor, Author):

numberOfRetryingJob can be used for stats
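As a small illustration of how the counter could back a stats read (the accessor name RetryingJobCount is an assumption, not something in this PR):

```go
package pool

import "sync/atomic"

type Pool struct {
	numberOfRetryingJob int32 // incremented when a retry is scheduled, decremented when it finishes
}

// RetryingJobCount is a hypothetical stats accessor that reads the counter
// the retry path updates with atomic.AddInt32.
func (p *Pool) RetryingJobCount() int32 {
	return atomic.LoadInt32(&p.numberOfRetryingJob)
}
```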

func (p *Pool) updateRetryingJob(job JobHandler) {
	if job == nil {
		return
	}

	p.lock.Lock()
Contributor:

I think this lock is redundant. AFAIK, two writes to the same row in PostgreSQL are serialized, aren't they?

DNK90 (Contributor, Author):

We use this to lock the pool, not the db, so that the pool cannot exit before this function finishes.

Contributor:

For the retry job, I think the waitgroup is enough; I'm not sure about the failed-job flow. However, I cannot see how this lock prevents the pool from exiting.

DNK90 (Contributor, Author):

There can be goroutines still calling Enqueue or RetryJob, and a "send on closed channel" panic can happen on both Jobchan and RetryJobChan. At that point the pool may already be closed and the waitgroup for on-the-fly retryable jobs may already be done, so the pool would exit without waiting for these events. Taking the lock makes sure the pool cannot close until these events are stored to the db.
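To illustrate the argument, here is a sketch under assumptions (not the actual pool.go code; the isClosed flag and the use of an RWMutex are illustrative) of how a pool-level lock can order channel sends against the close:

```go
package pool

import "sync"

type JobHandler interface{}

type Pool struct {
	lock         sync.RWMutex
	isClosed     bool
	JobChan      chan JobHandler
	RetryJobChan chan JobHandler
}

// Enqueue holds a read lock across the closed check and the send, so Close
// cannot close JobChan in between and trigger "send on closed channel".
func (p *Pool) Enqueue(job JobHandler) {
	p.lock.RLock()
	defer p.lock.RUnlock()
	if p.isClosed {
		// pool is shutting down: store the job instead of sending it
		p.saveToDB(job)
		return
	}
	p.JobChan <- job
}

func (p *Pool) saveToDB(job JobHandler) { /* write the job to the database */ }

// Close takes the write lock, so it blocks until every in-flight Enqueue or
// RetryJob call has released its read lock (i.e. has sent or stored its job)
// before the channels are closed.
func (p *Pool) Close() {
	p.lock.Lock()
	defer p.lock.Unlock()
	p.isClosed = true
	close(p.JobChan)
	close(p.RetryJobChan)
}
```

One caveat with this pattern: a sender blocked on a full channel keeps holding the read lock, so the consumer side has to keep draining JobChan until Close returns.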

@DNK90 DNK90 requested a review from minh-bq January 19, 2023 09:17
@DNK90 DNK90 merged commit 8bcddcb into master Jan 19, 2023