Skip to content

[Service Bus] Throw LockLostError when AMQP link not alive#2828

Merged
ramya-rao-a merged 3 commits intoAzure:masterfrom
ramya-rao-a:message-settle-errors
May 13, 2019
Merged

[Service Bus] Throw LockLostError when AMQP link not alive#2828
ramya-rao-a merged 3 commits intoAzure:masterfrom
ramya-rao-a:message-settle-errors

Conversation

@ramya-rao-a
Copy link
Contributor

@ramya-rao-a ramya-rao-a commented May 11, 2019

Offshoot from #2761

An AMQP receiver link may die due to errors from service or network loss or due to user closing the receiver object. Messages received from such links can no longer be settled. We throw a generic error for such cases.

In the spirit of #2658, this PR makes the below changes

  • Use SesisonLockLostError or MessageLockLostError for above cases
  • Improve the error message for above cases
  • Document all errors thrown by the library when settling messages

The reason to chose the LockLostError is because the minute the AMQP receiver link dies, the service releases the lock on the message/session

@ramya-rao-a
Copy link
Contributor Author

ramya-rao-a commented May 11, 2019

An alternative is to use a new error code say MessageSettleFailed.

  • The one benefit this adds is that the user would know the difference between lock expiry and AMQP link being killed during run time. But, the difference is made clear in the error message too, so this may not be a big benefit. Differentiating between error codes is useful if at run time different actions has to be taken for the 2 different situations. But in our case, there is nothing different the user can do at run time when facing the 2 error codes.
  • One disadvantage with this alternative is that user now has to have look for multiple error names in the catch block. Since during run time, both errors would result in same action from the user (you cannot renew the lock after it expires, so you cannot do anything different), it is not helping them much during run time.

@ramya-rao-a
Copy link
Contributor Author

ramya-rao-a commented May 12, 2019

fyi, live tests are all green

The code changes in this PR are ready for review.
The only thing pending is the call on whether we use MessageLockLostError and SessionLockLostError for this scenario or use a new Error MessageSettleFailed.

But this need not block the merging of this PR. If we do decide to go with MessageSettleFailed instead, then I'll send a new PR for that on Monday.

Copy link
Member

@ramya0820 ramya0820 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, okay to address the MessageAlreadySettled error code usage after further discussions.

* The lock held on the message by the receiver is let go, making the message available again in
* Service Bus for another receive operation.
*
* - Throws `SessionLockLostError` (for messages from a Queue/Subscription with sessions enabled)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@throws in documentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping this format for GA, as all other user facing APIs in the library are using this format. It renders well in out docs as a bulleted list.

Logged #2853 to track the discussion around moving to using @throws and the follow up work

@ramya-rao-a ramya-rao-a merged commit 2245788 into Azure:master May 13, 2019
@ramya-rao-a ramya-rao-a deleted the message-settle-errors branch May 13, 2019 04:08
@SamJarmanPP
Copy link

Hi @ramya-rao-a we're seeing this a bit now. What is the recommended best practice for working around this error - or handling it properly?

Currently we're seeing messages being processed multiple times, because they're able to be received, but not completed (by auto-complete).

Should we turn off auto complete, complete manually, catch the error and then abandon the message?

Any guidance on best practice here would be hugely appreciated! Thank you

@ramya-rao-a
Copy link
Contributor Author

@SamJarmanPP Are you seeing the LockLostError or the ServiceUnavailableError?

@SamJarmanPP
Copy link

@ramya-rao-a we're seeing [MessageLockLostError]: Failed to complete the message as the AMQP link with which the message was received is no longer alive.

@ramya-rao-a
Copy link
Contributor Author

To settle (complete/abandon/defer/deadletter) a message, we need to use the same underlying AMQP receiver link which was used to receive the message. When this link cannot be found, we throw the [MessageLockLostError]: Failed to complete the message as the AMQP link with which the message was received is no longer alive error.

There are a couple of scenarios where we fail to find the AMQP link

  • As a user, you closed the ServiceBusClient or the QueueClient or the Receiver. This would result in the AMQP link closing and cleaned up.
    • This is most likely not your case, as this is explicit user interaction and I believe you won't be closing entities and expecting things to work :)
  • There was an non retryable error on the receiver link which results in the AMQP link being closed and cleaned up
    • This is most likely not your case either, because you are continuing to receive messages on the receiver. Please let me know if this is not the case and you stop receiving of further messages
  • There was a retryable error on the receiver link which results in the AMQP link being closed, cleaned up and the library tries to establish a new AMQP link so that you can resume receiving messages
    • This is what I'd like to believe is your case. By the time you try to complete the message, a new AMQP link has been created for you behind the scenes which is not capable of completing messages received using a different AMQP link

Next step would be to confirm which of the above category are you falling in.

You can confirm right away whether or not you fall under either of the first 2 categories.
For the third, you need to enable logs which will tell you if a link died with a retryable error and if the library established a new link for you.

Set the environment variable DEBUG to azure:service-bus* to print out logs from the library. For more on logs, please see https://github.com/Azure/azure-sdk-for-js/tree/%40azure/service-bus_1.0.3/sdk/servicebus/service-bus#enable-logs

Look for lines starting with azure:service-bus:receiver and azure:service-bus:error. You will see logs for receiver that has the receiver id/name. Look for logs "re-establishing the receiver link" which is related to bringing up a new AMQP link

@SamJarmanPP
Copy link

Hi @ramya-rao-a, thanks for the detailed response! Hugely appreciated.

I think we can rule out #1 for sure. As for #2, yes we were receiving messages still (to the point they failed to complete retry times and then were abandoned)

We haven't seen this error since the day it occurred in our environment, so I can try enable logs but I doubt if we'll ever catch one :(

Another clue from our stack trace below is the retryable: false, which leads me to believe it might be #2 somehow, but #3 sounds more right. You might gather more from this trace than I?

(Sorry it was not source mapped properly)

at vt (/app/index.js:234:351949)
ft [MessageLockLostError]: Failed to complete the message as the AMQP link with which the message was received is no longer alive.
at Module.l (/app/index.js:16:1309)
at Generator.next (u003canonymousu003e)
at Pr.u003canonymousu003e (/app/index.js:234:407588)
at Ar.complete (/app/index.js:234:382098)
info: undefined,
retryable: false,
translated: true,
name: 'MessageLockLostError',
at /app/index.js:16:1532
at Ar.u003canonymousu003e (/app/index.js:234:382615)
at Generator.next (u003canonymousu003e) {
condition: 'com.microsoft:message-lock-lost'
}
at new Promise (u003canonymousu003e)
at Cr (/app/index.js:234:385661)

@ramya-rao-a
Copy link
Contributor Author

The error [MessageLockLostError]: Failed to complete the message as the AMQP link with which the message was received is no longer alive. in itself is not retryable. Since we lost the link there is nothing that can be done to retry the attempt to complete this message.

This is not the case of case number 2
Case number 2 is that a non retryable error occurred due to which the next you tried to complete a message, you got the [MessageLockLostError]: Failed to complete the message as the AMQP link with which the message was received is no longer alive. error

@SamJarmanPP
Copy link

@ramya-rao-a you may have lost me there sorry! But I will put in the logs and see if theres anything to report in a few days :)

@ramya-rao-a
Copy link
Contributor Author

:) Feel free to log a new issue and mention me with what you have

@josegrobles
Copy link

Hello @ramya-rao-a we are facing with the same issue as @SamJarmanPP we are still investigating to see which one of the three provided scenarios is the one happening to us, at the moment I think it's probably the third one. In case this is the problem what would be the best way to handle this error? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants