fix: retrying jobs are not saved to db when pool is closing #31
Conversation
DNK90 commented Jan 16, 2023 • edited
- Add a waitgroup to wait until all retrying jobs running on goroutines finish storing the job to the db (see the sketch below)
- Force the controller to finish processing all pending jobs before fetching new logs
…berOfRetryingJob` in `PrepareRetryableJob` is enough
…ing to exit and channels are closed
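A minimal sketch of the waitgroup approach described above, using assumed names (`retryWg`, `scheduleRetry`, `saveJobToDB` are hypothetical); the actual pool implementation may differ:

```go
package pool

import "sync"

// JobHandler matches the interface used in pool.go.
type JobHandler interface {
	GetNextTry() int64
}

// Pool is reduced here to the one field relevant to the fix.
type Pool struct {
	retryWg sync.WaitGroup // hypothetical field name
}

// scheduleRetry registers with the waitgroup before the goroutine
// starts, so Close can wait for the job to be stored to the db.
func (p *Pool) scheduleRetry(job JobHandler) {
	p.retryWg.Add(1)
	go func() {
		defer p.retryWg.Done()
		// ... sleep until job.GetNextTry(), then retry or persist ...
		p.saveJobToDB(job) // hypothetical persistence helper
	}()
}

// Close waits until every retrying goroutine has finished storing
// its job before the pool exits and channels are closed.
func (p *Pool) Close() {
	p.retryWg.Wait()
}

func (p *Pool) saveJobToDB(job JobHandler) { /* hypothetical */ }
```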
pool.go
Outdated
dur := time.Until(time.Unix(job.GetNextTry(), 0))
if dur <= 0 {
	return
}
atomic.AddInt32(&p.numberOfRetryingJob, 1)
This `numberOfRetryingJob` seems unused.
`numberOfRetryingJob` can be used for stats.
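For example, a hypothetical accessor (not part of this PR) could expose the counter for stats; it pairs an `atomic.LoadInt32` with the `atomic.AddInt32` in the diff above (assuming `sync/atomic` is imported):

```go
// RetryingJobCount returns how many jobs are currently waiting to
// retry. Hypothetical accessor; since the counter is written with
// atomic.AddInt32, reads must go through atomic.LoadInt32 as well.
func (p *Pool) RetryingJobCount() int32 {
	return atomic.LoadInt32(&p.numberOfRetryingJob)
}
```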
func (p *Pool) updateRetryingJob(job JobHandler) {
	if job == nil {
		return
	}

	p.lock.Lock()
I think this lock is redundant. AFAIK, two writes to the same row in PostgreSQL are serialized, aren't they?
we use this to lock the pool, not the db, so that the pool cannot exit without finishing this function
For the retry job, I think the waitgroup is enough; I'm not sure about the failed-job flow. However, I cannot see how this lock can prevent the pool from exiting.
There can be some goroutines still calling `Enqueue` or `RetryJob`, and a "send on closed channel" panic will happen on both `JobChan` and `RetryJobChan`. At that point the pool may already be closed and the waitgroup for on-the-fly retryable jobs may be all done, so the pool exits without waiting for these events. Taking the lock makes sure the pool cannot close until these events are stored to the db.
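A minimal sketch of this argument, with assumed names (`isClosed`, `JobChan`, `RetryJobChan`, `saveJobToDB` are hypothetical); the real pool code may differ:

```go
// Enqueue takes the pool lock so it cannot race with Close: either
// the send happens before the channels are closed, or the pool is
// already marked closed and the job is persisted directly.
func (p *Pool) Enqueue(job JobHandler) {
	p.lock.Lock()
	defer p.lock.Unlock()
	if p.isClosed { // hypothetical shutdown flag
		p.saveJobToDB(job) // store instead of sending
		return
	}
	p.JobChan <- job // Close cannot run while the lock is held
}

// Close flips the flag and closes the channels under the same lock,
// so no goroutine can hit "send on closed channel" afterwards.
func (p *Pool) Close() {
	p.lock.Lock()
	p.isClosed = true
	close(p.JobChan)
	close(p.RetryJobChan)
	p.lock.Unlock()
}
```

One caveat with this pattern: holding the mutex across the channel send assumes the channel is buffered with spare capacity (or has a ready receiver); otherwise `Enqueue` could block `Close` indefinitely.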