
fix(robot-server): fail proto load gracefully#18259

Merged
sfoster1 merged 1 commit into edge from exec-1450-fix-zombie-run on May 6, 2025

Conversation

@sfoster1
Member

@sfoster1 sfoster1 commented May 5, 2025

"Loading" a protocol, our generic term for everything required to get it ready to run, can fail. We didn't account for this failure mode: if it happened, we would have created some but not all of our run-tracking resources, and our API would return inconsistent results (which is the bug). The orchestrator store saved the orchestrator object in a member as soon as it was created, which meant the store reported a "current run"; but because we didn't link resource allocation (or at least binding) to initialization, an exception raised after that point meant the run never made it into the database.

The immediate fix is to not bind the orchestrator object to the `_orchestrator` data member until right before we return, after all the possibly-failing orchestration work has completed.
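A minimal sketch of the pattern, with illustrative names (not the actual robot-server API): do all fallible setup on a local variable, and bind the instance attribute only once everything has succeeded, so a failure leaves no half-created "current run" behind.

```python
class Orchestrator:
    """Stand-in for the real run orchestrator (hypothetical)."""

    def __init__(self, source: str) -> None:
        self._source = source

    def load(self) -> None:
        # Stand-in for protocol analysis; exec() mirrors the real failure
        # point, where file-scope errors in the protocol surface.
        exec(compile(self._source, "<protocol>", "exec"), {})


class OrchestratorStore:
    """Stand-in for the store that tracks the current run (hypothetical)."""

    def __init__(self) -> None:
        self._orchestrator = None  # a "current run" exists only once this is set

    def add(self, protocol_source: str) -> Orchestrator:
        # Build and initialize on a local first: if anything raises here,
        # self._orchestrator stays None and no inconsistent run is exposed.
        orchestrator = Orchestrator(protocol_source)
        orchestrator.load()  # may raise
        # Bind only after all possibly-failing work is done.
        self._orchestrator = orchestrator
        return orchestrator
```

Binding last makes the member assignment act like a commit point: observers of the store either see no run at all or a fully initialized one.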

To trigger this failure, use a Python protocol that is syntactically correct (it passes `ast.parse()`) but does something at file scope that raises an error, such as a failing import; a division by zero works too. The error surfaces when we `exec` the protocol to check for the presence of runtime parameters, which happens during the execution of this function, triggering the bug.
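A hypothetical repro of that split (the protocol text below is illustrative, not from the codebase): the file is valid syntax, so parsing succeeds, but executing it raises at file scope.

```python
import ast

# Syntactically valid Python: ast.parse() accepts it. But it raises at
# file scope the moment it is exec'd, which is when the server would
# inspect it for runtime parameters.
protocol_text = """
metadata = {"apiLevel": "2.20"}
oops = 1 / 0  # file-scope error: raises during exec, not during parse

def run(ctx):
    pass
"""

ast.parse(protocol_text)  # succeeds: parsing checks syntax only

failed_at_exec = False
try:
    exec(compile(protocol_text, "protocol.py", "exec"), {})
except ZeroDivisionError:
    failed_at_exec = True  # the error only appears at execution time
```

This is why a syntax check alone can't guard the load path: any file-scope side effect (imports included) can still fail later, during `exec`.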

Luckily this is one of those "the error is presented badly" problems, so it didn't have to be prioritized.

Closes EXEC-1450

@sfoster1 sfoster1 requested a review from a team as a code owner May 5, 2025 18:49
@sfoster1 sfoster1 requested a review from SyntaxColoring May 5, 2025 18:50
Contributor

@SyntaxColoring SyntaxColoring left a comment


Affirming my former (accidental) approval with this (intentional) approval. Thank you!

@sfoster1 sfoster1 force-pushed the exec-1450-fix-zombie-run branch from 7be77cb to 1bd8021 on May 6, 2025 14:19
@codecov

codecov Bot commented May 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 23.62%. Comparing base (bfefd85) to head (1bd8021).
Report is 5 commits behind head on edge.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             edge   #18259   +/-   ##
=======================================
  Coverage   23.62%   23.62%           
=======================================
  Files        3055     3055           
  Lines      255734   255734           
  Branches    30114    30114           
=======================================
  Hits        60419    60419           
  Misses     195301   195301           
  Partials       14       14           
Flag                Coverage Δ
protocol-designer   19.06% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


@sfoster1 sfoster1 merged commit fe8d402 into edge May 6, 2025
37 checks passed
@sfoster1 sfoster1 deleted the exec-1450-fix-zombie-run branch May 6, 2025 15:11
ddcc4 pushed a commit that referenced this pull request May 16, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 16, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 16, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 16, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 16, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 16, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 17, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 19, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 19, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 19, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 20, 2025
"Loading" a protocol, which is what we generically call the process to
do everything required to get it to run, can now fail. We didn't think
about this failure mode, and if it happened then we would have created
some but not all of our run tracking resources, and our API would return
inconsistent results (which is the bug). The orchestrator store would
save the orchestrator object in a member, which then means that it would
have a "current run", but since we didn't link resource allocation (or
at least binding) and initialization, the exception raised after that
happens would mean it never went into the database.

The immediate fix is to not bind the orchestrator object to the
_orchestrator data member until right before we return, after we do all
the possibly-failing orchestration.

The thing that would make this fail is to have a python protocol that
was syntactically correct (passes ast.parse()) but did something at the
file scope that would cause an error (like an import failure - a
division by 0 would work too). This will fail when we exec the protocol
to check for the presence of runtime parameters, which happens during
the execution of this function, triggering the bug.

Luckily this is one of those "the error is presented badly" type of
problems so it doesn't have to be prioritized.

Closes EXEC-1450

(cherry picked from commit fe8d402)
ddcc4 pushed a commit that referenced this pull request May 20, 2025
ddcc4 pushed a commit that referenced this pull request May 22, 2025
ddcc4 pushed a commit that referenced this pull request May 23, 2025
ddcc4 pushed a commit that referenced this pull request May 24, 2025
ddcc4 pushed a commit that referenced this pull request May 24, 2025
ddcc4 pushed a commit that referenced this pull request May 29, 2025
ddcc4 pushed a commit that referenced this pull request May 29, 2025
ddcc4 pushed a commit that referenced this pull request May 29, 2025