Display job reexecutions and job id #225

Open: 3 commits to merge into base: main

dgmora commented Dec 7, 2024

Background / problem

Some things about job instances, executions and reexecutions are hard to distinguish at first glance. I think one factor is that I'm not used to the terminology and internals, since I've mostly used Sidekiq.

An example: Screenshot 2024-12-07 at 11 55 19

Here you can't see which instance these jobs belong to. Three may be related to a759850e-3b87-44be-ba63-3d98ab92771b and one to c24aa402-b669-437c-80ea-e3b603e87dc2. Knowing the instance helps you understand that the same job (instance) has been reexecuted multiple times because there was an exception.

Sometimes when you click one of these "finished jobs", you land on a job with the finished label, while other times you land on a scheduled job (because it's still being reexecuted due to an exception). It's also a bit surprising to me that these are called "jobs" when we see multiple rows for the same job instance.

Then, in scheduled jobs, you can't see whether a job is part of a reexecution or not.

Changes

Added the first part of the job id to the list. This makes reexecutions a bit easier to understand.

Change: Screenshot 2024-12-07 at 12 21 18

Added a reexecution label to the jobs in finished/scheduled:

Change: Screenshot 2024-12-07 at 12 25 40 Screenshot 2024-12-07 at 12 25 46

and in the scheduled view:

Change: Screenshot 2024-12-07 at 12 27 06
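
For readers without the screenshots, the two changes above can be sketched in plain Ruby. This is only an illustration and not the code in this PR: `JobRow`, `short_id`, `reexecution?` and `list_label` are hypothetical names, and the sketch assumes each row exposes the Active Job ID and an executions counter.

```ruby
# Hypothetical row shape: one row per execution attempt of an Active Job.
JobRow = Struct.new(:active_job_id, :executions, keyword_init: true)

# First segment of the UUID, e.g. "a759850e" for "a759850e-3b87-...".
def short_id(row)
  row.active_job_id.split("-").first
end

# Assumption: a row counts as a reexecution when an earlier attempt exists.
def reexecution?(row)
  row.executions.positive?
end

# Text shown in the list: short id plus an optional reexecution label.
def list_label(row)
  label = short_id(row)
  label += " (reexecution)" if reexecution?(row)
  label
end

rows = [
  JobRow.new(active_job_id: "a759850e-3b87-44be-ba63-3d98ab92771b", executions: 0),
  JobRow.new(active_job_id: "a759850e-3b87-44be-ba63-3d98ab92771b", executions: 1),
  JobRow.new(active_job_id: "c24aa402-b669-437c-80ea-e3b603e87dc2", executions: 0)
]
rows.each { |row| puts list_label(row) }
```

With the short id visible, rows that share an Active Job ID can be matched by eye, and the label marks which of them are retries.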

Questions/Asks

I'm not sure if it fits the terminology, but wouldn't "Finished job executions" fit better than "Finished jobs"? There are multiple rows for the same job instance.

I don't mind changes to this PR, so if it can serve as a starting point, feel free to modify it (styles or anything) or create a new one 🙏.

I think it'd be nice to also show in the Finished jobs list whether a job is 'done done' or whether it's still being reexecuted / failed / blocked. But with just the information the job has, I didn't see a way.

rosa (Member) commented Dec 7, 2024

Hey @dgmora, thank you! I'll look into this next week. To answer this question:

There are multiple rows for the same job instance.

The truth is that, because of the way this is implemented, these are different jobs: they're related to each other and happen to have the same Active Job ID, but they're different jobs. Active Job handles automated retries by re-enqueuing a job with the same arguments, updating the internal metadata on executions and exceptions. Then the queue adapter proceeds as it wishes with that. In the case of Resque (and I think Sidekiq is the same), finished jobs aren't kept; jobs are discarded once they've run, so it seems as if there was only one job, but in reality there are multiple jobs, and the ones that already ran are simply no longer in the system. Mission Control supports Resque too, and in that case, this won't be as relevant. Solid Queue can also behave that way if you set preserve_finished_jobs to false.
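
As a rough illustration of the behavior described above, here is a plain-Ruby sketch. It is not Solid Queue's or Active Job's actual code: `FakeQueue`, its `rows` list and the `preserve_finished` flag are hypothetical stand-ins (the flag mimics the effect of `preserve_finished_jobs`). Each retry enqueues a new row that carries the same Active Job ID and a higher executions count.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a queue backend's jobs table.
class FakeQueue
  Row = Struct.new(:id, :active_job_id, :executions, :finished, keyword_init: true)

  attr_reader :rows

  def initialize(preserve_finished: true)
    @preserve_finished = preserve_finished
    @rows = []
    @next_id = 0
  end

  # Every enqueue call creates a brand-new row, even for a retry.
  def enqueue(active_job_id:, executions:)
    @next_id += 1
    row = Row.new(id: @next_id, active_job_id: active_job_id,
                  executions: executions, finished: false)
    @rows << row
    row
  end

  # Runs one row. The block receives the attempt number and returns true on
  # success. On failure the job is re-enqueued with the same Active Job ID.
  # Returns the new row, or nil when no retry was enqueued.
  def perform(row, max_attempts:)
    attempt = row.executions + 1
    succeeded = yield(attempt)
    finish(row)
    return nil if succeeded || attempt >= max_attempts

    enqueue(active_job_id: row.active_job_id, executions: attempt)
  end

  private

  # The attempt that just ran is finished either way; whether its row
  # survives depends on the preserve flag.
  def finish(row)
    row.finished = true
    @rows.delete(row) unless @preserve_finished
  end
end

queue = FakeQueue.new(preserve_finished: true)
current = queue.enqueue(active_job_id: SecureRandom.uuid, executions: 0)
# Fail twice, then succeed on the third attempt.
current = queue.perform(current, max_attempts: 5) { |attempt| attempt >= 3 } while current
queue.rows.each { |r| puts "row #{r.id} #{r.active_job_id[0, 8]} executions=#{r.executions}" }
```

With `preserve_finished: true` the run leaves three finished rows sharing one Active Job ID; with `preserve_finished: false` each row disappears once it has run, which is why it then looks as if there was only ever one job.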

Now, Solid Queue could perhaps do something different here: instead of creating a completely new job when Active Job calls enqueue, it could update the existing record for the same Active Job ID. I went with creating a new job because I thought it was safer and more consistent with Active Job's calls to enqueue, and also useful for seeing the retries that actually ran.

Now I understand what you meant by scheduled: in this case, you have a delay for the retries, so you're seeing both jobs that finished and were retried, and jobs that are scheduled because they were enqueued with the delay given by the retry_on options.
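
A small plain-Ruby sketch of why the same Active Job ID can show up in both the finished and the scheduled lists when retries have a delay. The `Attempt` struct, `status` and `retry_after` are hypothetical names for illustration, and the 30-second wait is an assumed value, not one taken from this PR.

```ruby
# Hypothetical row: finished_at is set once an attempt has run; scheduled_at
# is set when a retry was enqueued with a delay.
Attempt = Struct.new(:active_job_id, :executions, :finished_at, :scheduled_at,
                     keyword_init: true)

# Which list a row would appear in at a given moment.
def status(attempt, now:)
  return :finished if attempt.finished_at
  return :scheduled if attempt.scheduled_at && attempt.scheduled_at > now

  :ready
end

# Builds the follow-up attempt, due `wait` seconds after the failed one ran.
def retry_after(failed, wait:)
  Attempt.new(active_job_id: failed.active_job_id,
              executions: failed.executions + 1,
              scheduled_at: failed.finished_at + wait)
end

now = Time.now
failed = Attempt.new(active_job_id: "a759850e-3b87-44be-ba63-3d98ab92771b",
                     executions: 0, finished_at: now)
upcoming = retry_after(failed, wait: 30)

[failed, upcoming].each do |attempt|
  puts "#{attempt.active_job_id[0, 8]} executions=#{attempt.executions} #{status(attempt, now: now)}"
end
```

The attempt that raised is listed as finished, while its retry sits in the scheduled list until the delay has passed.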

I'll look into this more closely next week, together with #204, as that's related 😊
