[Storage] retry on incomplete XML responses#13076
Conversation
When the service times out (default max 30s), it terminates the connection, but the current stable version of `node-fetch` doesn't report an error. Instead, it returns the incomplete response, which leads to an XML parse error. It's unlikely that the service would send back an incomplete response on purpose, so it doesn't hurt to treat this error as a `TIMEOUT` error and retry the request. The deserialization policy factory needs to move below the retry policy factory so that a parse error from deserialization can be retried.
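To illustrate why the ordering matters, here is a minimal sketch rather than the SDK's actual policy code. It assumes a hypothetical `sendWithRetry` wrapper, a hypothetical `parseBody` stand-in for deserialization, and a hypothetical `PARSE_ERROR` code. The point is only that parsing runs inside the retry loop, so a truncated body triggers another attempt.

```typescript
// Sketch only: every name here is hypothetical and not taken from the SDK.
interface CodedError extends Error {
  code?: string;
}

// Stand-in for the deserialization step: a body cut off by a dropped
// connection is missing its closing root tag and fails to parse.
function parseBody(body: string, rootTag: string): string {
  if (!body.trim().endsWith(`</${rootTag}>`)) {
    const err: CodedError = new Error("incomplete XML response");
    err.code = "PARSE_ERROR";
    throw err;
  }
  return body;
}

// Stand-in for the retry step. Parsing happens inside the loop, so a parse
// failure is seen by the retry logic instead of escaping past it.
async function sendWithRetry(
  send: () => Promise<string>,
  rootTag: string,
  maxTries: number
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    try {
      return parseBody(await send(), rootTag);
    } catch (err) {
      // Anything other than the parse failure is rethrown immediately.
      if ((err as CodedError).code !== "PARSE_ERROR") {
        throw err;
      }
      lastError = err;
    }
  }
  throw lastError;
}
```

If parsing ran outside `sendWithRetry` instead, the loop would return the truncated body as a success and the parse error would surface to the caller with no retry.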
/azp run js - storage-blob - tests
Azure Pipelines successfully started running 1 pipeline(s).
/azp run js - storage-blob - tests
Azure Pipelines successfully started running 1 pipeline(s).
@ljian3377 please have a look. If it looks good, I will make the same change for queue/fileshare/etc.
The ultimate fix for partial responses should be upgrading to node-fetch 3.x? Please leave the original issue open until we do that. The change looks good, but I am not sure if it's useful. Have you tested this in a real environment? Does this fix mitigate the issue?
Possibly, although I've not tested it. In v3.x we might get an error from node-fetch, which we would retry, instead of an incomplete response. v3.x is in beta now. I am not sure about its timeline.
Yes, it helps when running the repro code. There's still a possibility of getting the same error on every retry if unlucky, but it's better than before.
After changing the order of the deserialization policy and the retry policy, `error.code` is now populated properly by the deserialization policy. This surfaces an issue where an error with code `ResourceNotFound` is also retried: it contains `eNotFound`, and we use `error.code.toString().toUpperCase().includes()` to see if the error is in the list, so it passes the check for the network error code `ENOTFOUND`. This change fixes it by using an exact match when checking the error code.
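The false positive is easy to reproduce in isolation. The sketch below is not the retry policy itself: the `retriableErrors` list is an assumed subset, and both helper names are hypothetical. It only contrasts the substring check with the exact match.

```typescript
// Assumed subset of retriable network error codes, for illustration only.
const retriableErrors: string[] = ["ETIMEDOUT", "ECONNRESET", "ENOTFOUND", "TIMEOUT"];

// Old behavior: substring match on the upper-cased code.
function matchesBySubstring(code: string): boolean {
  const upper = code.toString().toUpperCase();
  return retriableErrors.some((retriable) => upper.includes(retriable));
}

// New behavior: exact match on the upper-cased code.
function matchesExactly(code: string): boolean {
  const upper = code.toString().toUpperCase();
  return retriableErrors.some((retriable) => upper === retriable);
}

// "ResourceNotFound" upper-cases to "RESOURCENOTFOUND", which contains "ENOTFOUND".
console.log(matchesBySubstring("ResourceNotFound")); // true (service error wrongly retried)
console.log(matchesExactly("ResourceNotFound")); // false
console.log(matchesExactly("ENOTFOUND")); // true (real network error still retried)
```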
/azp run js - storage-blob - tests
/azp run js - storage-file-share - tests
Azure Pipelines successfully started running 1 pipeline(s).
Azure Pipelines successfully started running 1 pipeline(s).
-            .toString()
-            .toUpperCase()
-            .includes(retriableError))
+        (err.code && err.code.toString().toUpperCase() === retriableError)
I think this is right but let's also check with @XiaoningLiu
Logged #13119
xirzec
left a comment
Nice! This ended up being a pretty elegant solution.