Panic when moving to neoq_dead_jobs #98
Thanks for the report @BillBuilt. I'll look into this soon.
@BillBuilt this is fixed in
Thank you! I can confirm the panic is indeed gone and the correct handler is being run on the retry. However, after moving to the dead jobs table, the error message (neoq_dead_jobs.error) is empty when it should include the error message.
Interesting -- I tested that as well, but must have made a change that affected it before cutting a release.
I'll get this sorted out as well. Thanks for the confirmation.
Can you confirm whether you see an error-level log with the following form?
time=2023-10-16T09:54:30.678-06:00 level=ERROR msg="job failed" job_error="job failed to process: panic [/home/adriano/git/neoq/backends/postgres/postgres_backend_test.go:522]: no good"
If you see job_error="" instead of an actual error message, it means the stringification for your error is the empty string.
Hello,
I spoke too soon. The error handler is NOT being run on retries. And if I bump max_retries from 1 to 2, things get stranger.
First, results after running with max_retries=1
With max retries = 1
====================
2023/10/16 12:36:05 waiting 10 seconds for delayed job to run...
2023/10/16 12:36:05 nqTestHandler()
2023/10/16 12:36:05 got job id: 1 queue: test_now messsage: [test_now] this is an instant job
2023/10/16 12:36:10 nqTestHandler()
2023/10/16 12:36:10 got job id: 2 queue: test_delayed messsage: [test_delayed] this is a delayed job
2023/10/16 12:36:15 waiting 3 seconds for failed job to run...
2023/10/16 12:36:15 nqTestHandlerWithErr()
2023/10/16 12:36:15 got job id: 3 queue: test_error messsage: [test_error] this is a job with errors retries 0 max retries 0xc0004191a0
time=2023-10-16T12:36:15.276-04:00 level=ERROR msg="job failed" job_error="job failed to process: this is a test error"
2023/10/16 12:36:18 waiting 60 seconds for failed job to get retried...
2023/10/16 12:36:31 nqTestHandler() <<<< should be nqTestHandlerWithErr()
2023/10/16 12:36:31 got job id: 3 queue: test_error messsage: [test_error] this is a job with errors
And the resulting database tables:
<img width="903" alt="max_retries=1" src="https://github.com/acaloiaro/neoq/assets/28831382/fa9f8619-ad77-4a76-b539-fec03c3e10b0">
Things to note about this:
1. The failed job gets retried as expected, but it does not use the correct handler
2. The failed job gets moved to the neoq_dead_jobs table as expected
3. The log shows job_error="job failed to process: this is a test error"
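For context on point 1, the expected behavior is that a retried job is dispatched to the handler registered for its own queue. The following is a minimal, self-contained sketch of that dispatch-by-queue-name idea; the `Job`, `Handler`, and `dispatch` names are illustrative assumptions, not neoq's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a simplified stand-in for a queued job; fields are
// illustrative, not neoq's actual job struct.
type Job struct {
	ID    int
	Queue string
}

// Handler processes jobs for exactly one queue.
type Handler func(j *Job) error

// dispatch looks up the handler by the job's queue name. The bug
// reported above behaves as if a retried job were handed to a
// different queue's handler instead.
func dispatch(handlers map[string]Handler, j *Job) error {
	h, ok := handlers[j.Queue]
	if !ok {
		return fmt.Errorf("no handler for queue %q", j.Queue)
	}
	return h(j)
}

func main() {
	handlers := map[string]Handler{
		"test_now": func(j *Job) error {
			fmt.Println("nqTestHandler()")
			return nil
		},
		"test_error": func(j *Job) error {
			fmt.Println("nqTestHandlerWithErr()")
			return errors.New("this is a test error")
		},
	}
	j := &Job{ID: 3, Queue: "test_error"}
	// Both the first attempt and the retry should hit the same handler:
	_ = dispatch(handlers, j) // prints: nqTestHandlerWithErr()
	_ = dispatch(handlers, j) // retry, prints: nqTestHandlerWithErr()
}
```

In the logs above, the retry of job 3 on queue `test_error` instead ran `nqTestHandler()`, i.e. the handler belonging to a different queue.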
When max_retries=2:
With max retries = 2
====================
2023/10/16 12:47:59 waiting 10 seconds for delayed job to run...
2023/10/16 12:47:59 nqTestHandler()
2023/10/16 12:47:59 got job id: 1 queue: test_now messsage: [test_now] this is an instant job
2023/10/16 12:48:04 nqTestHandler()
2023/10/16 12:48:04 got job id: 2 queue: test_delayed messsage: [test_delayed] this is a delayed job
2023/10/16 12:48:09 waiting 3 seconds for failed job to run...
2023/10/16 12:48:09 nqTestHandlerWithErr()
2023/10/16 12:48:09 got job id: 3 queue: test_error messsage: [test_error] this is a job with errors retries 0 max retries 0xc0003c83d0
time=2023-10-16T12:48:09.883-04:00 level=ERROR msg="job failed" job_error="job failed to process: this is a test error"
2023/10/16 12:48:12 waiting 60 seconds for failed job to get retried...
2023/10/16 12:48:25 nqTestHandler() <<<< should be nqTestHandlerWithErr() and the retry is setting its status as "processed" so it never gets to the neoq_dead_jobs table
2023/10/16 12:48:25 got job id: 3 queue: test_error messsage: [test_error] this is a job with errors
And the resulting database tables:
<img width="1306" alt="max_retries=2" src="https://github.com/acaloiaro/neoq/assets/28831382/78611f82-5c35-47c3-9b65-6e483b2aa2ca">
Things to note about this:
1. The failed job gets retried as expected, but it does not use the correct handler, nor does it increment the `retries` field
2. The failed job DOES NOT get moved to the neoq_dead_jobs table as expected. Since the retry uses the incorrect handler, it gets a 'processed' status, so the third retry never runs. However, even with these discrepancies, I would expect `retries` to be '1' at this point.
3. The log shows job_error="job failed to process: this is a test error"
Thank you
- Bill
Hi Bill, I’m traveling but will take a look at your data when I can.
Hi @BillBuilt, I'm not sure that I can reproduce this one without any code. Can you supply example code that reproduces what you're reporting? I've tried to reproduce your result, but jobs are ending up in the dead queue as expected for me. Could you also call
With reproduction code, I should be able to get any problems fixed up quickly now that I'm in one place.
Update: I've reproduced it and will get started on a solution soon.
@BillBuilt A fix is ready for this, but first I want to get a review from @elliotcourant before merging.
Thank you - and sorry about not getting you some example code - but I see you were able to reproduce. Let me know if you need anything else from me.
No worries -- if Elliot has the time to review by tomorrow, I'll wait it out. If not, I'll go ahead and merge tomorrow.
No rush - thank you!
Fixes a bug that can allow retries to end up on the wrong queue in settings where there are multiple handlers.
@BillBuilt Release
@acaloiaro That fixed our issues! Thank you and sorry for the late response.
Good to hear, Bill. Cheers!
Thank you for the patch for #85; however, this gets me to the next problems.
First, when a failed job exceeds the max retries, it is to be moved to the `neoq_dead_jobs` table, however a panic is being generated there:
I believe this is due to the fact that `jobErr` is not being populated, causing the nil pointer dereference. However, the error message is also in the job struct, so maybe the fix would be to drop the `jobErr` param and set the `error` field to the value of `j.Error`?
In `/backends/postgres/postgres_backend.go`, `moveToDeadQueue()`:
Go from:
To:
Secondly, if I patch the above error as described to get through the panic, the next time the failed job runs, it runs on a different queue (handler) than the one assigned to it. Here is a snippet of my testing log showing that the first time it is handled by the correct handler (`nqTestHandlerWithErr()`) but the second time it is not (`nqTestHandler()`). I do not know if my changes above are causing this or not, so I did not open a separate issue:
And here are the test handlers:
Thank you!