Duchy mill writing output blob error should be transient. #1644

Closed
renjiezh opened this issue Jun 5, 2024 · 3 comments
Labels
bug Something isn't working

Comments

renjiezh commented Jun 5, 2024

Describe the bug
When the mill fails to write an output blob to storage, the error is categorized as permanent and therefore fails the computation. However, the cause may be transient cloud storage instability, in which case a retry could resolve it.
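
To illustrate the distinction being asked for, here is a minimal sketch of classifying a failed write by HTTP status. `StorageErrorClassifier` and its set of codes are hypothetical and not taken from the Duchy code base; only 502 and 503 were actually observed in this issue, and the other codes are an assumption about what is usually safe to retry.

```java
import java.util.Set;

/** Hypothetical helper: decides whether a storage HTTP status is worth retrying. */
final class StorageErrorClassifier {
  // Assumption: statuses treated as transient. 502 and 503 are the ones seen in
  // this issue; 408, 429, 500 and 504 are added as commonly retried codes.
  private static final Set<Integer> TRANSIENT_STATUS_CODES =
      Set.of(408, 429, 500, 502, 503, 504);

  private StorageErrorClassifier() {}

  /** Returns true if a write that failed with this status may succeed on retry. */
  static boolean isTransient(int httpStatus) {
    return TRANSIENT_STATUS_CODES.contains(httpStatus);
  }
}
```

In the mill, the status would presumably come from the caught storage exception, and a `true` result would map to a transient failure instead of failing the computation.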

Steps to reproduce
Run a stress test; the error reproduces intermittently.

Component(s) affected
Duchy

Version
v0.5.5

Environment
halo-cmm-qa

Additional context

externalComputationId= 491229066843608164

"COMPUTATION_PARTICIPANT_FAILED","message":"Computation Participant failed. We encountered an internal error. Please try again.
T8bECdEZ2y8@aggregator-liquid-legions-v2-mill-daemon-deployment-86dfc7sv8gx: We encountered an internal error. Please try again.
com.google.cloud.storage.StorageException: We encountered an internal error. Please try again.
	at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:409)
	at com.google.cloud.storage.StorageImpl.lambda$internalCreate$2(StorageImpl.java:213)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:65)
	at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1524)
	at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:210)
	at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:142)
	at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:56)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
{
  "code" : 503,
  "errors" : [ {
    "domain" : "global",
    "message" : "We encountered an internal error. Please try again.",
    "reason" : "backendError"
  } ],
  "message" : "We encountered an internal error. Please try again."
}
	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:570)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:406)
	... 17 more
renjiezh added the bug label Jun 5, 2024

renjiezh commented Jun 5, 2024

An instance of the storage error with HTTP status code 503:

EsxcQyBbxQ8@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bcngdpf: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
com.google.cloud.storage.StorageException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
	at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1062)
	at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$0(ResumableMedia.java:40)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:65)
	at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$1(ResumableMedia.java:34)
	at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:683)
	at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:95)
	at com.google.cloud.storage.Blob.writer(Blob.java:1027)
	at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:58)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.http.HttpResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
	at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1055)
	... 18 more
{
errorGroups: [1]
insertId: "lwaxv6u25774qnrl"
labels: {3}
logName: "projects/halo-cmm-qa/logs/stderr"
receiveTimestamp: "2024-06-05T04:13:56.564922711Z"
resource: {2}
severity: "ERROR"
sourceLocation: {1}
textPayload: "EsxcQyBbxQ8@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bcngdpf: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
com.google.cloud.storage.StorageException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
	at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1062)
	at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$0(ResumableMedia.java:40)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:65)
	at com.google.cloud.storage.ResumableMedia.lambda$startUploadForBlobInfo$1(ResumableMedia.java:34)
	at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:683)
	at com.google.cloud.storage.StorageImpl.writer(StorageImpl.java:95)
	at com.google.cloud.storage.Blob.writer(Blob.java:1027)
	at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:58)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.http.HttpResponseException: 503 Service Unavailable
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?name=computations/EsxcQyBbxQ8/EXECUTION_PHASE_THREE/1&uploadType=resumable
Service Unavailable
	at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.open(HttpStorageRpc.java:1055)
	... 18 more
"
timestamp: "2024-06-05T04:13:53.368Z"


renjiezh commented Jun 5, 2024

An instance of the error with HTTP status code 502:

B7zWvx_2Ipk@worker1-liquid-legions-v2-mill-daemon-deployment-75b95f7bclgs2x: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>

com.google.cloud.storage.StorageException: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>

	at com.google.cloud.storage.StorageException.translate(StorageException.java:170)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:329)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:409)
	at com.google.cloud.storage.StorageImpl.lambda$internalCreate$2(StorageImpl.java:213)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.storage.Retrying.run(Retrying.java:65)
	at com.google.cloud.storage.StorageImpl.run(StorageImpl.java:1524)
	at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:210)
	at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:142)
	at org.wfanet.measurement.gcloud.gcs.GcsStorageClient$writeBlob$2.invokeSuspend(GcsStorageClient.kt:56)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:108)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:103)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 502 Bad Gateway
POST https://storage.googleapis.com/upload/storage/v1/b/halo-cmm-qa-bucket/o?projection=full&uploadType=multipart
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>502.</b> <ins>That’s an error.</ins>
  <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.  <ins>That’s all we know.</ins>

	at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:118)
	at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:37)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:570)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:493)
	at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:603)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:406)
	... 17 more
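
The traces above all end in a 5xx response that asks the caller to try again. As a rough sketch of what retrying the write at the caller level could look like, the snippet below retries an operation with exponential backoff while a caller-supplied predicate says the failure is transient. `RetryingWriter`, its parameters, and the backoff policy are hypothetical; this is not the change made in the linked PR, which is not shown in this thread.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical helper: retries an operation while its failures look transient. */
final class RetryingWriter {
  private RetryingWriter() {}

  static <T> T callWithRetry(
      Callable<T> operation,
      Predicate<Exception> isTransient,
      int maxAttempts,
      long initialDelayMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long delayMillis = initialDelayMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return operation.call();
      } catch (Exception e) {
        // Give up on permanent errors or once all attempts are used.
        if (attempt >= maxAttempts || !isTransient.test(e)) {
          throw e;
        }
        // Back off before the next attempt, doubling the wait each time.
        Thread.sleep(delayMillis);
        delayMillis *= 2;
      }
    }
  }
}
```

The predicate keeps the retry loop independent of how errors are classified, so permanent failures such as a missing bucket would still surface immediately.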

renjiezh commented

Fixed by PR #1731
