Computation failed due to loss of intermediate data #1700

Open
renjiezh opened this issue Jul 25, 2024 · 1 comment
Labels
bug Something isn't working

Comments

renjiezh commented Jul 25, 2024

Describe the bug
Computation failed due to empty input to a certain stage.
The failure reason reported by the Kingdom is "Computation Participant failed. INVALID_ARGUMENT: There is neither actual data nor effective noise in the request." It is caused by an empty input to PHASE_THREE.
After checking the cloud storage, it appears that the folders for worker1's PHASE_TWO and all following stages are empty.
According to the logs, neither worker1's mill nor the aggregator's mill reported any error message or retry.

The aggregator's PHASE_TWO completes successfully, but its output is empty even though the PHASE_TWO_INPUT from worker1 is not empty.

Steps to reproduce
Run the stress test. The failure reproduces only intermittently.

Component(s) affected
Duchy

Version
v0.5.6

Environment
QA env

Additional context
MeasurementId = 8441441368107331282

@renjiezh renjiezh added the bug Something isn't working label Jul 25, 2024
@renjiezh

Related issue:
All duchies share a single bucket in GCS. They also use the same folder name (the ComputationGlobalId) for a given Computation. Because of the interruption and retry mechanism, one duchy can overwrite another duchy's target file content.
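
To illustrate the suspected collision, here is a minimal sketch in Python. It is not the Duchy code. The in-memory bucket, the key layout (`<ComputationGlobalId>/<stage>/output`), the helper names, and the idea that the retried write is empty are all assumptions made for illustration. It only shows how two writers that derive the same blob key from the same ComputationGlobalId can silently replace each other's content, and how a per-duchy prefix would keep the keys apart.

```python
from typing import Dict


class InMemoryBucket:
    """Hypothetical stand-in for one bucket shared by all duchies."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        # An unconditional write replaces whatever was stored under the key.
        self._blobs[key] = data

    def read(self, key: str) -> bytes:
        return self._blobs[key]


def shared_key(duchy: str, global_id: str, stage: str) -> str:
    # Assumed layout: the duchy name is NOT part of the key.
    return f"{global_id}/{stage}/output"


def namespaced_key(duchy: str, global_id: str, stage: str) -> str:
    # Possible mitigation: prefix every key with the writing duchy.
    return f"{duchy}/{global_id}/{stage}/output"


def simulate(key_fn) -> Dict[str, bytes]:
    bucket = InMemoryBucket()
    global_id = "example-global-id"  # placeholder, not a real ComputationGlobalId
    stage = "PHASE_TWO"

    # worker1 stores its stage output first.
    bucket.write(key_fn("worker1", global_id, stage), b"worker1-data")
    # Assumption: after an interruption and retry, the aggregator writes
    # an empty blob for the same stage of the same computation.
    bucket.write(key_fn("aggregator", global_id, stage), b"")

    # Return what each duchy reads back from its own target key.
    return {
        duchy: bucket.read(key_fn(duchy, global_id, stage))
        for duchy in ("worker1", "aggregator")
    }


if __name__ == "__main__":
    print("shared keys:    ", simulate(shared_key))
    print("namespaced keys:", simulate(namespaced_key))
```

With the shared layout, worker1 reads back an empty blob although its own write succeeded without error, which would match the symptom of empty stage folders with no errors or retries in the logs. With the prefixed layout, worker1's data survives. Separate buckets per duchy, or conditional writes that refuse to replace an existing object, would be other options.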
