Fix unhelpful error messages in aggregate jobs.#825
Conversation
The vague "[json.exception.type_error.302] type must be string, but is null" is **absolutely** unhelpful in the way Hydra currently handles it on evaluation. This is handling *unexpected* errors only; the following commit will handle the specific instance of the previously mentioned error.
| auto jobName = i.key(); | ||
| auto & job = i.value(); | ||
| // For the error message | ||
| std::string lastTriedJobName = i.key(); |
There was a problem hiding this comment.
I guess the initialization here is not great...
I think when I wrote that up originally I was thinking about using auto to intrinsically get the type. In that case auto lastTriedJobName = jobName is probably better.
Otherwise, initializing with "".
What would be the better option?
There was a problem hiding this comment.
IIRC the nlohman value key() function returns reference. auto should then deduct the type T& which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.
| if (job2 == state->jobs.end()) | ||
| throw Error("aggregate job '%s' references non-existent job '%s'", jobName, jobName2); | ||
|
|
||
| if ((*job2).find("error") != (*job2).end()) { |
There was a problem hiding this comment.
Needing to dereference here tripped me up. I get that this comes from auto job2 being the result of the search. Is there a pattern we could be using so that we don't have to dereference all the time?
There was a problem hiding this comment.
auto& something = *job2 should do the trick here and then using something instead of *job2.
There was a problem hiding this comment.
Honestly, this is probably fine? I'd probably replace the (*job2). with just job2-> but other than that this seems... fine.
| if (job.find("error") == job.end()) { | ||
| job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName); | ||
| } |
There was a problem hiding this comment.
Is there a better pattern to initialize the error message on the JSON object?
| auto jobName = i.key(); | ||
| auto & job = i.value(); | ||
| // For the error message | ||
| std::string lastTriedJobName = i.key(); |
There was a problem hiding this comment.
IIRC the nlohman value key() function returns reference. auto should then deduct the type T& which is probably not what we want here. The assignments in line 454 and 466 would then override the value in the JSON document. That being said my type deduction knowledge might be a bit rusty... I would stick with the string there.
…jobs It might happen that a job from the aggregate returned an error! This is what the vague "[json.exception.type_error.302] type must be string, but is null" was all about in this instance; there was no `drvPath` to stringify! So we now actively watch for errors and copy them to the aggregate job.
0ae2efc to
b5140c1
Compare
|
I don't know if this PR is ready, but just to let you know that merging it would unblock NixOS/nixpkgs#101071 |
|
I've stated in the past on the IRC channels (a couple of times) that it needs to be reviewed by "code owners" of that section for the open questions. Other than that, in my opinion it is ready, if not the exact code as shown, the same but with the corrections and nits fixed. I know that, during my testing, it was working as expected. |
First because IFD (import-from-derivation) is not allowed on hydra.nixos.org, and second because without NixOS/hydra#825 hydra-eval-jobs crashes instead of skipping aggregated jobs which fail (here because they required an IFD).
|
This would have massively helped debugging NixOS/nixpkgs#132328. |
| if (job.find("error") == job.end()) { | ||
| job["error"] = fmt("Errors aggregating aggregate job '%1%'.\n", jobName); | ||
| } | ||
| job["error"] = fmt("While handling '%1%': %2%\n", jobName2, (std::string) (*job2)["error"]); |
There was a problem hiding this comment.
Doesn't this always overwrite the assignment on line 473?
|
Just saying, today and possibly since two days ago And we can't know why. |
|
I'm going to merge this under the idea that it is strictly better than whats there. |
|
Better in behaviour at least. |
|
This doesn't seem to do what it should. |
From the commit messages:
hydra-eval-jobs: Identify unexpected errors in handling aggregate jobs
The vague "[json.exception.type_error.302] type must be string, but is null" is absolutely unhelpful in the way Hydra currently handles it on evaluation.
This is handling unexpected errors only; the following commit will handle the specific instance of the previously mentioned error.
hydra-eval-jobs: Transmit original Nix error when handling aggregate jobs
It might happen that a job from the aggregate returned an error!
This is what the vague "[json.exception.type_error.302] type must be string, but is null" was all about in this instance; there was no
drvPathto stringify!So we now actively watch for errors and copy them to the aggregate job.
Note: This wasn't used inside Hydra. The evaluator was only ran directly against the repo.
This was verified on top of NixOS/nixpkgs@f262382
So I do not know if adding an
errorattribute on the aggregate job will work as expected.I'm also a bit exhausted from figuring out the error. I'll let someone that knows more about Hydra guide or fix the PR so the aggregate job properly reports that there was an error.
IdeallyThe aggregate job HAS to be failed, otherwise we would have a missing job!With this change, the job in the JSON file gets the original errors, if present.
Additionally, new unexpected errors would show up like this:
cc @andir