Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2 - bug fix] Queue MQTT C2D messages if callback not set #3336

Merged
merged 6 commits into from
Jun 13, 2023

Conversation

tmahmood-microsoft
Copy link
Contributor

@tmahmood-microsoft tmahmood-microsoft commented Jun 5, 2023

Fix for #3335

In v2, if incoming message callback has not been set, messages are still received because cleanSession = false (default) and the client subscribes to topic from last session. This causes messages to be received but the callback has not been set and the message is eventually completed, since MQTT messages cannot be rejected or abandoned.
This PR adds changes where we queue the received messages if callback has not been set and processes them later when callback has been set.

@@ -28,11 +28,13 @@ namespace Microsoft.Azure.Devices.Client.Transport
internal sealed class MqttTransportHandler : TransportHandlerBase, IDisposable
{
private const int ProtocolGatewayPort = 8883;
private const int MessageQueueSize = 10;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the default queue size to 10. The client would always keep track of latest 10 messages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect people are going to want to modify this value - why don't we expose this as a settable property?

@tmahmood-microsoft tmahmood-microsoft changed the title [v2 - bug fix] Queue MQTT messages if callback not set [v2 - bug fix] Queue MQTT C2D messages if callback not set Jun 5, 2023
@timtay-microsoft
Copy link
Member

Could we just ask users to set the callback before calling open? Or is this to handle a case other than explicit open?

private async Task HandleReceivedMessageAsync(MqttApplicationMessageReceivedEventArgs receivedEventArgs)
{
receivedEventArgs.AutoAcknowledge = false;
string topic = receivedEventArgs.ApplicationMessage.Topic;

if (topic.StartsWith(_deviceBoundMessagesTopic, StringComparison.InvariantCulture))
{
await HandleReceivedCloudToDeviceMessageAsync(receivedEventArgs).ConfigureAwait(false);
await HandleReceivedCloudToDeviceMessageAsync(ProcessC2DMessage(receivedEventArgs)).ConfigureAwait(false);
await receivedEventArgs.AcknowledgeAsync(CancellationToken.None).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some comments here explaining that messages are being added to a queue for later processing, and are being ack'ed immediately

Logging.Error(this, $"Failed to send the acknowledgement for a received cloud to device message {ex}"); ;
{
Logging.Info(this, $"Queue size of {MessageQueueSize} for C2D messages has been reached, removing oldest queued C2D message. " +
$"To avoid losing further messages, set SetIncomingMessageCallbackAsync() to process the messages.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this into doc comments. Users don't usually look at logs until something really goes wrong.

@abhipsaMisra
Copy link
Member

Could we just ask users to set the callback before calling open? Or is this to handle a case other than explicit open?

I don't think the SDK lets you do anything until the client is opened.

@timtay-microsoft
Copy link
Member

timtay-microsoft commented Jun 6, 2023

Could we just ask users to set the callback before calling open? Or is this to handle a case other than explicit open?

I don't think the SDK lets you do anything until the client is opened.

So we would have this same problem with twin/direct method messages, right? I'd prefer if we allow users to set the callbacks before opening the client, but that may be a more involved fix than we need right now. Because of that, I won't push back if this is the approach we take.

Alternatively, we could opt to make cleanSession = true the default behavior. That way users can open their clients, then set their callbacks (which subscribe to the twin/c2d/methods messages) without risk of losing any. It makes things a bit more tedious for users since they'd either have to set their callback after each open call or opt-in to cleanSession=false, but I don't have any other issues with it.
Scratch that. Reconnection scenarios would be a nuisance as well

@abhipsaMisra
Copy link
Member

This is technically a workaround for service's current behavior. Ideally, a c2d message would have been published with a QoS of 1, and the client would choose to not ack it and service would redeliver them at some frequency. Once the user sets the callback, those messages could then be ack'ed.
Since service doesn't queue and redeliver unack'ed messages currently, the client needs to do that.

Alternatively, yeah, we could ask users to set the callback before opening the clients. I think that is a reasonable ask but yeah, it will require a bigger change.

Direct methods inherently require there to be an opened client (or, soon to-be opened client), so it isn't that big of an issue there. Deliver once, if there is no ack then fail immediately. Nothing gets queued.
Twin updates aren't persisted either. Hub recommends that you call GetTwin explicitly to retrieve the last known state. So, this isn't really an issue there either.

@bastyuchenko
Copy link

bastyuchenko commented Jun 6, 2023

Hello,
As the issue #3335 creator I would like to share my thoughts:

  • this PR suggests having an offline queue, but we already have IoT Hub that works as a queue. If I need to have offline queue, I can create it by myself in my solution. I expect that SDK is a wrapper for an existing Azure service but the PR suggests an implementation of its own "message broker" logic with "offline queue" that is limited by MessageQueueSize and is not obvious for developers who will consume this SDK.
  • set callback before opening connection can be the solution but.... currently it works in the following way
  1. send a few c2d messages to IoT Hub
    Actual Result: the messages have appeared in IoT Hub
  2. open a connection from a device side
    Actual Result: the messages were removed
  3. send new c2d messages to IoT Hub - the messages removed only because the connection was opened, and no callback was specified
    Actual Result: the messages were removed

Let's imagine I want only communicate with Device Twin. I have to open connection to communicate with DeviceTwin. It means, behind-the-scenes, I always open such a 'black hole' that swallows all my telemetry messages only because I've opened connection to access to Device Twin. It's not what I expected, it is "side effect", it is not obvious, it obliges me to set a callback even I don't want to listen IoT Hub messages right now.


Ideally, a c2d message would have been published with a QoS of 1, and the client would choose to not ack it and service would redeliver them at some frequency. Once the user sets the callback, those messages could then be ack'ed. Since service doesn't queue and redeliver unack'ed messages currently, the client needs to do that.

I'm sorry in advance, maybe I didn't understand your point of view correctly but it sounds for me a little bit as an "override of cleanSession property behavior". Moreover, does it mean that an implementation of message feedback callback will become mandatory on service side?

Is it possible to open a connection but do not consume messages from IoT Hub if MQTT is used?
BTW, how it works in .NET SDK v1? It looks .NET SDK v1 doesn't have such a problem. I open connection but messages do not disappear from IoT Hub (yes, I know that it is not necessary to open connection before set callback in .NET SDK v1).

Is it possible to check how it implemented in other SDKs (i.e. Python or C SDK). It would be great to have some consistency among SDKs.

@abhipsaMisra
Copy link
Member

Thanks for sharing your inputs, @bastyuchenko , we appreciate it.

this PR suggests having an offline queue, but we already have IoT Hub that works as a queue. If I need to have offline queue, I can create it by myself in my solution. I expect that SDK is a wrapper for an existing Azure service but the PR suggests an implementation of its own "message broker" logic with "offline queue" that is limited by MessageQueueSize and is not obvious for developers who will consume this SDK.

IoT Hub is a message queue that will store messages for an offline client if the client had previously connected with CleanSession set to false. It will do so until the client reconnects again with CleanSession set to true. This happens as per contract today.
Now, let us consider the subsequent steps for this client:

  • once the client connects, all pending messages will be delivered to it immediately. IoT Hub will not wait until it subscribes to C2D messages to deliver the offline messages to the client. This is because it has connected with CleanSession set to false.
  • for the SDK, it can process these messages only if the user can a callback set. Otherwise, it doesn't have anything to do with these messages. Ideally, the client would not ack these messages, and these being sent with QoS 1 they'd eventually get redelivered. This is the bit that doesn't happen today. IoT hub will send the C2D messages immediately on reconnection, and then never resend them (irrespective of if they were ack'ed or not. This is something the hub team is aware of, and they have it on their plan to fix).
  • since the messages were delivered as soon as the client connected, and they won't be redelivered by the service again, there are two options:
    • force the client to subscribe to C2D messages if it wants to be able to process messages sent when offline, OR
    • maintain an internal queue with these offline messages that can be delivered once the client subscribes to C2D messages

I agree that both approaches can have their cons:

  • for approach (1) - you are forced to subscribe to and process C2D messages as soon as the client connects, even if that is not what you want. One point of view is that MQTT spec defines that when clients connect with CleanSession set to false they should be ready to process offline messages as soon as they come online.
  • for approach (2) - I agree that this implements a client-side queue on top of service's queue. We'd like to avoid this as well, however, with service's current behavior we don't have many more options.

maybe I didn't understand your point of view correctly but it sounds for me a little bit as an "override of cleanSession property behavior". Moreover, does it mean that an implementation of message feedback callback will become mandatory on service side?

Could you explain what you meant by these two statements? I'm not sure if the above explanation satisfies your question here, but if it doesn't, I'd like to understand the question a bit better.

Is it possible to open a connection but do not consume messages from IoT Hub if MQTT is used?

The CleanSession property will direct hub to deliver offline messages as soon as the client reconnects. You can set CleanSession to true and forgo offline message queuing, but there isn't a way to store offline C2D messages and then deliver them after you subscribe to the topic.

BTW, how it works in .NET SDK v1? It looks .NET SDK v1 doesn't have such a problem. I open connection but messages do not disappear from IoT Hub (yes, I know that it is not necessary to open connection before set callback in .NET SDK v1).

The .NET SDK v1 also has this internal message queue in place, similar to the implementation in this PR.

@bastyuchenko
Copy link

Thanks for the detailed explanation, @abhipsaMisra

Could you explain what you meant by these two statements? I'm not sure if the above explanation satisfies your question here, but if it doesn't, I'd like to understand the question a bit better.

yes, above explanation fully satisfies my question here.

@tmahmood-microsoft,
from https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-messages-c2d

When a device wants to receive a message, the IoT hub locks the message by setting the state to Invisible. This state allows other threads on the device to start receiving other messages.

All messages that are stored in "local offline queue" you implemented in this PR will be marked as Completed in IoT Hub or just stay Invisible (means returns after lock time-out)?

@tmahmood-microsoft
Copy link
Contributor Author

Hi @bastyuchenko, all the messages that are stored in "local offline queue" implemented in this PR will be marked as Completed in IoT Hub.

Copy link
Member

@abhipsaMisra abhipsaMisra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests for this:

  • c2d messages aren't lost when you reconnect with no callback set
  • c2d messages are redelivered at a later point when the client subscribes to the callback

@tmahmood-microsoft
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tmahmood-microsoft
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tmahmood-microsoft
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tmahmood-microsoft tmahmood-microsoft merged commit 10e631e into previews/v2 Jun 13, 2023
@tmahmood-microsoft tmahmood-microsoft deleted the tmahmood/Queue-mqtt-messages branch June 13, 2023 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants