Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting 408 on High Load save #833

Closed
daneshka opened this issue Aug 27, 2019 · 21 comments
Closed

Getting 408 on High Load save #833

daneshka opened this issue Aug 27, 2019 · 21 comments
Assignees
Labels
api: storage Issues related to the googleapis/nodejs-storage API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@daneshka
Copy link

daneshka commented Aug 27, 2019

Environment details

  • OS: alpine
  • Node.js version: 10.16
  • npm version: 6.9.0
  • @google-cloud/storage version:

Steps to reproduce

The error only happens on high load save on the same Bucket.
We use GUID in the URI.
It sends HTML in the error message!

{
  "message":"Cannot parse response as JSON: <!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 408 (Request Timeout)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>408.</b> <ins>That’s an error.</ins>\n  <p>Your client has taken too long to issue its request.  <ins>That’s all we know.</ins>\n",
  "exception":{
    "exceptionMessage":"Cannot parse response as JSON: <!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 408 (Request Timeout)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>408.</b> <ins>That’s an error.</ins>\n  <p>Your client has taken too long to issue its request.  <ins>That’s all we know.</ins>\n",
    "stacktrace":"InternalError: Cannot parse response as JSON: <!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 408 (Request Timeout)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>408.</b> <ins>That’s an error.</ins>\n  <p>Your client has taken too long to issue its request.  <ins>That’s all we know.</ins>\n\n    at GcsClient.filterError (/opt/bss/dist/src/services/gcs-client.js:63:16)\n    at getFile.save.catch.error (/opt/bss/dist/src/services/gcs-client.js:40:42)\n    at process._tickCallback (internal/process/next_tick.js:68:7)"
  },
  "code":13,
  "name":"InternalError",
  "level":"ERROR",
  "loggerName":"bss:middleware:error",
  "traceId":"0",
  "spanId":"0",
  "requestId":"7f975f2e-3516-9d0a-bafe-eb91672d355a",
  "date":"2019-08-27T21:06:21.013Z"
}

The main message in the HTML is:

408. That’s an error. Your client has taken too long to issue its request. That’s all we know.

Thanks!

@laljikanjareeya
Copy link
Contributor

A few questions to help us work out what's happening:

  • what is the size of your data/file that you are uploading?
  • Does this happen all the time or just occasionally?
  • Please Provide code sample that you are using.

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Aug 28, 2019
@AVaksman AVaksman added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Aug 28, 2019
@AVaksman
Copy link
Contributor

@daneshka Please provide the @google-cloud/storage version:

@daneshka
Copy link
Author

daneshka commented Aug 28, 2019

@laljikanjareeya @AVaksman
The @google-cloud/storage version is: 3.2.0
All files are 10 KB
We had this issue all the time in the last couple of days while we were doing a load test before going to production. I'm not sure if the issue exists before because we did other load tests on download and not on save.
Please note that we are sharing the same Storage and Bucket object for all of the calls.

    saveFile(fileName: string, bytes: Buffer): Promise<void> {
        const options = {
            resumable: false,
        };
        return this.getFile(fileName).save(bytes, options)
            .catch(error => { throw this.filterError(error); });
    }

Here you can see the rate of this error during the last 12 hours:
image

@bcoe bcoe added needs more info This issue needs more information from the customer to proceed. priority: p2 Moderately-important priority. Fix may not be included in next release. external This issue is blocked on a bug with the actual product. and removed triage me I really want to be triaged. needs more info This issue needs more information from the customer to proceed. labels Aug 28, 2019
@crwilcox
Copy link
Contributor

@daneshka, have you tried retrying on error? If you retry after the 408 is it successful?

@daneshka
Copy link
Author

@crwilcox
No, We only use the built-in retry feature.
I can implement a retry and get back to you.

@crwilcox
Copy link
Contributor

@daneshka it would be interesting to understand if this is a transient error. I imagine it is and we can likely harden the client around this.

@daneshka
Copy link
Author

We didn't see the issue after adding our own retry on top of the library.

@stephenplusplus
Copy link
Contributor

@crwilcox how do you think we should proceed here? Is there anything we can do?

@stephenplusplus
Copy link
Contributor

Our uploads all use our streaming API underneath the surface, so file.save(blob) is akin to file.createWriteStream({resumable:false}).end(blob). The streaming APIs are not able to resend data after a response from the server-- it can only do that if there was an issue preflight. Once data goes out from a stream, we can't rewind it arbitrarily to get it back.

However, I don't think there's a reason file.save() needs to use the streaming API underneath. We can just send the request like any other, putting the blob in the request body. Then, it would benefit from our normal exponential retry logic.

Although, 408 error codes are not currently being retried. We have a small whitelist of retried error codes: 429, 500, 502, and 503.

The benefits of making the change to file.save() may be unsubstantial, however, as this would only improve the file.save() story without having any effect on file.createWriteStream() users.

@google-cloud-label-sync google-cloud-label-sync bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Jan 29, 2020
@surjikal
Copy link
Contributor

surjikal commented Mar 10, 2020

Just got this error today when writing images via createWriteStream

@simllll
Copy link

simllll commented Apr 1, 2020

Ran into the same issue today.
we use

 const fileStream = myFile.createWriteStream({
			// Support for HTTP requests made with `Accept-Encoding: gzip`
			metadata: {
				contentType: 'application/json'
				// contentEncoding: 'gzip'
			},
			gzip: true,
			resumable: false,
			validation: 'md5'
			// validation: false
		});

to create the stream, and also catch all errors of the stream, but we get an uncatchable error when the response is 408:
Uncaught exception in ApiError: Cannot parse response as JSON: <!DOCTYPE html>

Error: Cannot parse response as JSON: <!DOCTYPE html> // truncated..whole html body is here
  at Util.parseHttpRespBody (/var/app/current/node_modules/@google-cloud/common/build/src/util.js:188:42)
    at Util.handleResp (/var/app/current/node_modules/@google-cloud/common/build/src/util.js:134:117)
    at /var/app/current/node_modules/@google-cloud/common/build/src/util.js:432:22
    at onResponse (/var/app/current/node_modules/retry-request/index.js:206:7)
    at /var/app/current/node_modules/teeny-request/build/src/index.js:170:17
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:94:5) { killProcess: true, job: 'backend/job.js,backend/syncData.js' }

As far as I read correctly, adding 408 to retryable errors would solve the issue? I've added the shouldRetryRequests function on our end to test this, and it looks good so far. Wouldn't that be sufficient? @stephenplusplus
Still curious why I cannot catch this kind of stream error.

@simllll
Copy link

simllll commented Apr 1, 2020

Regarding my uncatchable error, could it be that createWriteStream does not forward errors correclty? createWriteStream is using dupleify behind the scene, but there is no on('error') method registered on the "fileWriteStream". Not sure, just an idea I thought I write down.

@stephenplusplus
Copy link
Contributor

@simllll is there any chance you could provide a runnable example and any more details about what is happening when the error occurs?

@simllll
Copy link

simllll commented Apr 1, 2020

That's the struggle with this issue, it only happens on production, and I believe it is related either to high load (like the OP) or due to too many parallel transactions. I couldn't find a way to reproduce the original timeout issue, but this one is gone since adding the 408 to retryable errors.
To reproduce just the non catchable error, I was able to reproduce this by adding

throw new Error('TEST');
in the handleResp(err, resp, body, callback) function of @google-cloud/common/build/src/util.ts

I was not able to catch this error so far, probably because it's not a stream error itself, it's a thrown error (like the one where JSON cannot be parsed I guess)

@simllll
Copy link

simllll commented Apr 1, 2020

Indeed, if I wrap the handleResp in a try / catch block and catch the error:

try {
... 
} catch (err) {
  callback(err);
}

I'm able to catch the error :)

See here for details:
googleapis/nodejs-common@master...simllll:patch-2

What I still don't understand is, why this solves the JSON parse exception though (It does indeed; I already have reproduced it successfully on production). The JSON parse error is not thrown directly, it's just passed back as "err" parameter within the parsedHttpRespBody object. I can't see what I'm missing here.

update 1

it seems it has something todo with the "retry" logic.. I just saw the uncatchable exception is (besides the on('error') is called now) still there. I just didn't catch my promise in the test, my fault. ignore.

write stream 2 has failed
Error: TEST
    at Util.handleResp (/home/simon/Dev/hokify/hokify-server/node_modules/@google-cloud/common/build/src/util.js:136:12)
    at /home/simon/Dev/hokify/hokify-server/node_modules/@google-cloud/common/build/src/util.js:441:22
    at onResponse (/home/simon/Dev/hokify/hokify-server/node_modules/retry-request/index.js:206:7)
    at /home/simon/Dev/hokify/hokify-server/node_modules/teeny-request/build/src/index.js:170:17
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
(node:221396) UnhandledPromiseRejectionWarning: Error: TEST
    at Util.handleResp (/home/simon/Dev/hokify/hokify-server/node_modules/@google-cloud/common/build/src/util.js:136:12)
    at /home/simon/Dev/hokify/hokify-server/node_modules/@google-cloud/common/build/src/util.js:441:22
    at onResponse (/home/simon/Dev/hokify/hokify-server/node_modules/retry-request/index.js:206:7)
    at /home/simon/Dev/hokify/hokify-server/node_modules/teeny-request/build/src/index.js:170:17
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

update 2

super strange, I'm not able to reproduce this issue. But here is something strange with the uncaught error. it seem it has a message and a body... but according to the APIError class, it should only have set the "message". where does code, errors, response etc get set, if the constructor cancels right after this.message in case it's a non object that is passed to it? the error object is merged, therefore it makes sense. also looked into the extend function, if this function could cause this, but couldn't find anything useful. It works if I try / catch the error though, but couldn't find a logical reason for this.

Uncaught exception in ApiError: Cannot parse response as JSON: <!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 408 (Request Timeout)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>408.</b> <ins>That’s an error.</ins>
  <p>Your client has taken too long to issue its request.  <ins>That’s all we know.</ins>

    at Util.parseHttpRespBody (/var/app/current/node_modules/@google-cloud/common/build/src/util.js:188:42)
    at Util.handleResp (/var/app/current/node_modules/@google-cloud/common/build/src/util.js:134:117)
    at /var/app/current/node_modules/@google-cloud/common/build/src/util.js:432:22
    at onResponse (/var/app/current/node_modules/retry-request/index.js:206:7)
    at /var/app/current/node_modules/teeny-request/build/src/index.js:170:17
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:94:5) {
  **message**: 'Cannot parse response as JSON: <!DOCTYPE html>\n' +
    '<html lang=en>\n' +
    '  <meta charset=utf-8>\n' +
    '  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">\n' +
    '  <title>Error 408 (Request Timeout)!!1</title>\n' +
    '  <style>\n' +
    '    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n' +
    '  </style>\n' +
    '  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n' +
    '  <p><b>408.</b> <ins>That’s an error.</ins>\n' +
    '  <p>Your client has taken too long to issue its request.  <ins>That’s all we know.</ins>\n',
  **response**: PassThrough {
    _readableState: ReadableState {
      objectMode: false,
      highWaterMark: 16384,
      buffer: BufferList { head: null, tail: null, length: 0 },
      length: 0,
      pipes: null,
      pipesCount: 0,
      flowing: true,
      ended: true,
      endEmitted: true,
      reading: false,
      sync: false,
      needReadable: false,
      emittedReadable: false,
      readableListening: false,
      resumeScheduled: false,
      paused: false,
      emitClose: true,
      autoDestroy: false,
      destroyed: false,
      defaultEncoding: 'utf8',
      awaitDrain: 0,
      readingMore: false,
      decoder: null,
      encoding: null
    },
    readable: false,
    _events: [Object: null prototype] {
      prefinish: [Function: prefinish],
      error: [Array],
      data: [Function],
      end: [Function]
    },
    _eventsCount: 4,
    _maxListeners: undefined,
    _writableState: WritableState {
      objectMode: false,
      highWaterMark: 16384,
      finalCalled: false,
      needDrain: false,
      ending: true,
      ended: true,
      finished: true,
      destroyed: false,
      decodeStrings: true,
      defaultEncoding: 'utf8',
      length: 0,
      writing: false,
      corked: 0,
      sync: false,
      bufferProcessing: false,
      onwrite: [Function: bound onwrite],
      writecb: null,
      writelen: 0,
      afterWriteTickInfo: null,
      bufferedRequest: null,
      lastBufferedRequest: null,
      pendingcb: 0,
      prefinished: true,
      errorEmitted: false,
      emitClose: true,
      autoDestroy: false,
      bufferedRequestCount: 0,
      corkedRequestsFree: [Object]
    },
    writable: false,
    allowHalfOpen: true,
    _transformState: {
      afterTransform: [Function: bound afterTransform],
      needTransform: false,
      transforming: false,
      writecb: null,
      writechunk: null,
      writeencoding: 'buffer'
    },
    statusCode: 408,
    statusMessage: 'Request Timeout',
    request: {
      agent: [Agent],
      headers: [Object],
      href: 'https://storage.googleapis.com/upload/storage/v1/b/bigquery-sync/o?uploadType=multipart&name=bigquery-upload-joblogs-1585740331077-1.json'
    },
    **body**: '<!DOCTYPE html>\n' +
      '<html lang=en>\n' +
      '  <meta charset=utf-8>\n' +
      '  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">\n' +
      '  <title>Error 408 (Request Timeout)!!1</title>\n' +
      '  <style>\n' +
      '    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n' +
      '  </style>\n' +
      '  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n' +
      '  <p><b>408.</b> <ins>That’s an error.</ins>\n' +
      '  <p>Your client has taken too long to issue its request.  <ins>That’s all we know.</ins>\n',
    headers: {
      'alt-svc': 'quic=":443"; ma=2592000; v="46,43",h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,h3-T050=":443"; ma=2592000',
      connection: 'close',
      'content-length': '1557',
      'content-type': 'text/html; charset=UTF-8',
      date: 'Wed, 01 Apr 2020 11:37:20... 

@alarmatwork
Copy link

alarmatwork commented Apr 17, 2020

Getting this "Error 408" quite a lot lately. It's making storage very unreliable, it messes up out process a lot, even our load is minimal for allmighty cloud infra (about 1K requests per month) and filesize is ~ 400KB

@pebo
Copy link

pebo commented Apr 25, 2020

Same issue here. We're seeing an increase in 408 errors lately. It would be great if the retry PR googleapis/nodejs-common#562 could be merged.

@stephenplusplus
Copy link
Contributor

@pebo that PR wasn't moving too quickly, so I've thrown together a new one: googleapis/nodejs-common#578. Hopefully, we can merge and release soon.

@pebo
Copy link

pebo commented May 12, 2020

@stephenplusplus thanks for the PR, but after checking our latest error latest logs this PR (which adds retries for 408) might make GCS behave even worse for us.

We're occasionally seeing very long upload times (>6minutes!) for tiny 90byte files before GCS throws the exception mentioned in this issue.

The code is running in Cloud Run on the node:12-slim base image and the container had successfully written to GCS a few seconds before the request that caused 408 and successfully handles writes triggered by other invokations just after the failing operation.

Using @google-cloud/storage v4.7.0 the code path that fails with 408:

    const options = {
      resumable: false,
      contentType: 'application/json'
    };
    await file.save(JSON.stringify(someObject), options);

Can we reduce the timeouts for writes to GCS (then retries makes sense)?

@stephenplusplus
Copy link
Contributor

We now retry 408 errors and allow options.timeout if that helps working around this issue. I'm going to close, but please let me know if there's more we can do.

@evil-shrike
Copy link

I'm on "@google-cloud/storage": "^7.1.0",
but it's still happening for me on high load (many parallel Cloud Run instances writing files on the same bucket)
WriteStreams are created as:

      writeStream = file.createWriteStream({
        // surprisingly setting highWaterMark is crucial,
        // w/ o it we'll get unlimited memory growth
        highWaterMark: 1024 * 1024,
        timeout: 0,
        resumable: true,
        validation: false,
      });

In logs I see some html error. Why are they HTML?! Isn't it a bug by itself (returning errors in HTML on API requests)?

Error: Retry limit exceeded - <!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 408 (Request Timeout)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>408.</b> <ins>That’s an error.</ins>
<p>Your client has taken too long to issue its request. <ins>That’s all we know.</ins>
at Upload.attemptDelayedRetry (/workspace/node_modules/@google-cloud/storage/build/src/resumable-upload.js:705:26)
at Upload.onResponse (/workspace/node_modules/@google-cloud/storage/build/src/resumable-upload.js:669:18)
at Upload.makeRequestStream (/workspace/node_modules/@google-cloud/storage/build/src/resumable-upload.js:655:40)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async Upload.startUploading (/workspace/node_modules/@google-cloud/storage/build/src/resumable-upload.js:496:26)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/nodejs-storage API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet