
Add support for "critical" subscriptions (message must be sent or will return error) #926

Open
skliper opened this issue Sep 30, 2020 · 16 comments

Comments


skliper commented Sep 30, 2020

Is your feature request related to a problem? Please describe.
The software bus currently returns success even if a message isn't delivered to a subscriber (queue full or over the message limit), so the message is silently dropped with no notification to the sender.

This spawned from the CF use case, where notification of a failed send is required in order to eliminate the semaphore currently used for flow control.

Describe the solution you'd like
Add support for marking a subscription as "critical". On send, check that all critical destinations have room for the message; if any does not, don't send to any destination and return an error. If every critical destination has room, send to all destinations. All of this is done within the SB lock.

For the CF use case, the receiver would typically dedicate a pipe to just that subscription, so the per-message limit check is sufficient (as long as it is less than or equal to the queue depth).

It may make sense to transition QoS (currently an enum) to a bitfield that can carry the critical-subscription option; a rough sketch follows.
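
Purely as a hypothetical sketch: CFE_SB_QOS_CRITICAL does not exist in cFE today and the layout of CFE_SB_Qos_t would have to change, but the CFE_SB_SubscribeEx() call and its MsgLim argument are the existing ones.

```c
#include "cfe.h"

/* Hypothetical flag proposed in this issue -- not a real cFE definition */
#define CFE_SB_QOS_CRITICAL 0x01

int32 CF_SubscribeCritical(CFE_SB_PipeId_t PipeId, CFE_SB_MsgId_t MsgId)
{
    CFE_SB_Qos_t Quality = {0};

    /* Assumption: the critical bit rides in one of the existing QoS fields */
    Quality.Reliability |= CFE_SB_QOS_CRITICAL;

    /* MsgLim of 4: at most 4 messages of this ID pending on the pipe; with a
       critical subscription the next send past that limit would return an
       error instead of silently dropping */
    return CFE_SB_SubscribeEx(MsgId, PipeId, Quality, 4);
}
```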

Describe alternatives you've considered
See #918, #920

Additional context
We discussed that CF should cap its work per cycle (to avoid free-running if it is unsubscribed or there are no subscribers), and that it should generate each message once, retaining it to send the next cycle if there is no room.

Requester Info
Jacob Hageman - NASA/GSFC (spawned from splinter on #920)


skliper commented Sep 30, 2020

@jwilmot @klystron78 @tngo67 @jphickey

@jphickey

Not to add another option - I just realized you can also use the "Generic counter" feature of ES to achieve a similar backpressure effect. Consider this:

  1. Allocate a generic counter to track CF packets
  2. TO increments counter as it actually sends them out the interface
  3. CF keeps its own internal counter of packets it wants to send.
  4. CF reads the generic counter from (2) above and takes the difference, which is the queue depth.
  5. CF stops generating (backpressures itself) if the difference becomes too large.

If the TO app dies or is killed then it automatically backpressures, as nothing will increment the count. It also has the advantage of being lockless and not requiring any new features in ES or SB.
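
A minimal sketch of that counter scheme, assuming cFE 7-style CFE_ES_CounterId_t typedefs (older cFE uses raw uint32 counter IDs); the depth limit and the function names are made up for illustration:

```c
#include "cfe.h"

#define CF_MAX_OUTSTANDING 8 /* allowed generated-but-unsent packets before CF throttles */

/* TO side: bump the counter each time a CF packet actually goes out the interface */
void TO_CountCfPacketSent(CFE_ES_CounterId_t CounterId)
{
    CFE_ES_IncrementGenCounter(CounterId);
}

/* CF side: PacketsGenerated is CF's own running count of packets it has put on SB */
bool CF_OkToGenerate(CFE_ES_CounterId_t CounterId, uint32 PacketsGenerated)
{
    uint32 PacketsSentByTO = 0;

    if (CFE_ES_GetGenCount(CounterId, &PacketsSentByTO) != CFE_SUCCESS)
    {
        return false; /* counter unreadable -- fail safe and hold off */
    }

    /* the difference is the effective queue depth; stop generating if it grows too large */
    return (PacketsGenerated - PacketsSentByTO) < CF_MAX_OUTSTANDING;
}
```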

That being said, I do think if the "critical" subscription type/qos is easy enough to implement, it makes sense and seems generic enough to be useful for other items too, so I'm fine with it.

@CDKnightNASA

Would it be reasonable to have a mechanism so that if a critical message is sent, the task for the receiver is prioritized (or even a task switch is initiated)? (I am an RTOS newb, so apologies if this is a bad idea...)


skliper commented Oct 1, 2020

I'll probably change the name of this flag to something like "error if not sent to all", since "critical" is ambiguous. For your use case I'd think it would be better to have a dedicated pipe that the task pends on, and set the task to a high priority (and have it do just the high-priority work).


jphickey commented Oct 1, 2020

Yes, in this case "critical" is not the best word - it really just means "provide backpressure to the sender based on my ability to ingest this data" ... not the type of thing one would change priority over. Rather than speeding up the receiver, it will slow down the sender to match what the receiver can handle.

Dynamic reassignment of priority is (almost) never desirable - the paradigm should normally be for the system engineer to set priorities according to how they want the tasks to perform against their system requirements. If they want to avoid data delays, create a dedicated pipe plus a higher-priority child task to handle it (sketched below).
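
A hedged sketch of that pattern: it assumes the cFE 7 CFE_SB_ReceiveBuffer() API (older cFE uses CFE_SB_RcvMsg()), and HIPRI_CMD_MID, the pipe name, and the function name are placeholders.

```c
#include "cfe.h"

#define HIPRI_CMD_MID    0x1882 /* placeholder message ID value */
#define HIPRI_PIPE_DEPTH 8

void HIPRI_ChildTaskMain(void)
{
    CFE_SB_PipeId_t  PipeId;
    CFE_SB_Buffer_t *BufPtr;

    /* Dedicated pipe carrying only the time-sensitive subscription */
    CFE_SB_CreatePipe(&PipeId, HIPRI_PIPE_DEPTH, "HIPRI_PIPE");
    CFE_SB_Subscribe(CFE_SB_ValueToMsgId(HIPRI_CMD_MID), PipeId);

    /* This function would be started via CFE_ES_CreateChildTask() at a priority
       chosen by the system engineer -- no dynamic priority changes at runtime */
    while (CFE_SB_ReceiveBuffer(&BufPtr, PipeId, CFE_SB_PEND_FOREVER) == CFE_SUCCESS)
    {
        /* handle only the high-priority work here */
    }
}
```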


jwilmot commented Dec 3, 2020

For the use cases targeted for this change and to keep it simple:

  1. Backpressure/throttling to publisher of message to avoid overruns on serial links
  2. Indication of destination pipe buffer(s) available (just receipt, not acceptance). Publishers can assume a destination is not reading its messages if a pipe buffer is always full.

CFE_SB_PassMsgWithPipeReceipt(CFE_MSG_Message_t *MsgPtr) (or whatever you want to call it)

Returns a positive number: buffers still available after this message was allocated
Returns 0: this message took the last buffer (queue now full)
Returns -1: no subscribers (number the errors however you like; no preference)
Returns -2: queue full
Returns -x: other errors
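
A sketch of how a publisher might consume those return codes. Everything here is hypothetical: the proposed function does not exist in cFE, and CF_BuildNextPduMsg() is a stand-in for however the app generates and retains its message.

```c
#include "cfe.h"

/* Proposed API from this comment (prototype shown only for illustration) */
int32 CFE_SB_PassMsgWithPipeReceipt(CFE_MSG_Message_t *MsgPtr);

/* Placeholder for the app's message generator */
CFE_MSG_Message_t *CF_BuildNextPduMsg(void);

static CFE_MSG_Message_t *RetainedMsg = NULL; /* generated once, held across cycles */

void CF_SendPduThisCycle(void)
{
    int32 Status;

    if (RetainedMsg == NULL)
    {
        RetainedMsg = CF_BuildNextPduMsg();
    }

    Status = CFE_SB_PassMsgWithPipeReceipt(RetainedMsg);

    if (Status >= 0)
    {
        RetainedMsg = NULL; /* delivered; >0 buffers remain, 0 means the queue is now full */
    }
    else if (Status == -2)
    {
        /* queue full: keep the message and retry next cycle (backpressure) */
    }
    else if (Status == -1)
    {
        /* no subscribers: treat like loss of comms and freeze the channel */
    }
}
```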

For multiple subscribers to the same topic, no message is sent to any destination if any destination is full, and the returned buffer count is the lowest one. The typical use case is one-to-one, where the publisher generates the message once and then holds it until a pipe buffer is available at a later cycle/iteration.
When implementing, consider a fault condition:

  1. If the publisher is active and the destination(s) unsubscribe or exit, what happens to messages already in the destination pipe(s)? Pipe buffers should be cleared so that no prior state is retained.


skliper commented Dec 3, 2020

OK, we now have yet another option, different from the original ticket that came out of the last discussion (this one adds a new API). Do we really need to redesign this again?

I thought all we were going to do was set the QoS to indicate it's critical and use the same APIs.


jwilmot commented Dec 3, 2020

Sorry, I didn't read the whole thread. I'm fine with the QoS approach, with return codes added for "No Subscribers" and "Queue Full". Is that what was intended? The positive count of pipe buffers left was just a nice-to-have.


ghost commented Dec 4, 2020

Just had a thought: a "Queue Full" error code is especially useful in the CF case. If TO stops working and CF can't send any PDUs, CF can detect this and freeze the channel (same behavior as a loss of comms).


jwilmot commented Dec 4, 2020

You might see "Queue Full" for a few iterations, but it should transition to "No Subscribers" if TO exits. If TO has no downlink, does it keep reading the CF pipe? It should not; we should maybe talk about this. Can we assume that SC will freeze the CF channel(s) before the end of the contact?


skliper commented Dec 4, 2020

I'm hoping you are only asking for "No Subscribers" to be returned for a route with QoS set as critical? Otherwise that's a change to the traditional behavior of send (no subscribers isn't an error for the sending app)... I'd prefer to avoid impacting other apps with this change.


jwilmot commented Dec 4, 2020

Yes. The additional return codes only apply if QoS is set as critical. I too want to avoid impacts to existing code. Your comment brought up an issue: what if one subscriber has QoS set as critical but others don't? What should the behavior be?


skliper commented Dec 4, 2020

I feel like we are repeating discussions :) The "Queue Full" return was the original proposal: send to all or none. QoS needs to be associated with the route to return "No Subscribers"; otherwise, if the destination specifies it and then goes away, the route would become non-critical and you'd never get "No Subscribers".


jphickey commented Dec 4, 2020

Has anyone considered my previous comment about using counters to do this? We could even (potentially) add a feature to the counter such that a task can wait for an increment. I think you can get all the backpressure you need (more effectively, actually) without even changing SB and risking breaking other stuff.


skliper commented Dec 8, 2020

Has anyone considered...

Although I think it is a more elegant solution, I got the impression the request was to reduce coupling with TO, and removing the semaphore in favor of a counter doesn't seem to meet that intent. Then again, even the initial suggestion requires TO to subscribe with the appropriate QoS, so there is still a dependency...

skliper removed this from the 7.0.0 milestone Jan 12, 2021

skliper commented Jan 12, 2021

The current solution to CF flow control/throttling does not require this change. Leaving this open as a possible enhancement for now, in case requirements and resources are identified.
