Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cFE Tlm Packet 64-bit alignment issue #488

Closed
dmknutsen opened this issue Jan 22, 2020 · 31 comments · Fixed by #1077 or #1109
Closed

cFE Tlm Packet 64-bit alignment issue #488

dmknutsen opened this issue Jan 22, 2020 · 31 comments · Fixed by #1077 or #1109
Assignees
Milestone

Comments

@dmknutsen
Copy link
Contributor

Describe the bug
Compiler add padding is applied to cFE Telemetry packets if the data region does not start on a 64 bit boundary and the packet contains a 64 bit variable.

To Reproduce
Steps to reproduce the behavior:

  1. Nominal Build Process.
  2. Enable Tlm.
  3. Bit bust a telemetry packet containing a 64 bit variable.

Expected behavior
No compiler added padding should be applied.

System observed on:
Ubuntu 64-bit - 19.10

Reporter Info
Dan Knutsen
NASA/GSFC

@jphickey
Copy link
Contributor

The solution to this is EDS. There is no easy solution as long as the TLM is defined directly as a C structure and the TLM header is 12 bytes (note the header can be smaller or larger too, depending on timestamp resolution).

@skliper
Copy link
Contributor

skliper commented Jan 22, 2020

CCB 20200122 - Touched on possible short term solutions (it's broken now in that TLM is not consistent across systems). Pending Gateway discussion today on headers.

@skliper skliper added this to the 6.8.0 milestone Feb 25, 2020
@skliper
Copy link
Contributor

skliper commented Apr 8, 2020

@tngo67 Do you have strong feelings on this one for the short term approach? Basically padding depends on the contents right now (if I remember the issue correctly) since the headers don't end on a 64-bit boundary. Do we need a solution this round or can it wait for a longer term solution?

@CDKnightNASA
Copy link
Contributor

CDKnightNASA commented Apr 9, 2020

Describe the bug
Compiler add padding is applied to cFE Telemetry packets if the data region does not start on a 64 bit boundary and the packet contains a 64 bit variable.

I assume you're talking TLM payload, not header (which has very specific byte positioning, and hopefully nothing in cFS tries to shortcut that with a 64-bit type or a struct.) For payloads, is it cFS's job to enforce this? 'course, if there is TLM being generated by CFS or one of the "stock GSFC" apps, those components and apps need to be fixed...Curious where you've seen this behavior?

And I don't know how we could enforce/ensure that data is "packed" when it's sent to the SB, as the payload is opaque to CFE_SB.

SBN had to address this with its network protocol with packing/unpacking. 'course, EDS will make it all better...

@dmknutsen
Copy link
Contributor Author

I believe I first noticed the issue in the TBL Housekeeping TLM packet when I was trying to decom it via the GroundSystem.py tool. The issue should however be present in any TLM packet that contains a 64-bit type. In the case of the TBL housekeeping packet, MemPoolHandle is a uintptr_t - which causes the issue (on a 64 bit system).

We touched on this a few months ago in the CCB and I think we discussed forcing the TLM header to end on a 64-bit boundary by adding padding to CCSDS_TlmSecHdr_t (4 bytes if CCSDS_TIME_SIZE is equal to 6 and 2 bytes if it is equal to 8). If I remember correctly Joe recommended a solution within EDS that would not require redefining packet definitions.

@skliper
Copy link
Contributor

skliper commented Apr 9, 2020

@tngo67 The short term question for certification is we have a requirement somewhere to explicitly define all padding. The question is do we need to make the header explicitly align to 64, and waste space where it's not needed, or just disposition this ticket until we adopt a new toolchain/EDS/or whatever.

@CDKnightNASA
Copy link
Contributor

CDKnightNASA commented Apr 9, 2020

We touched on this a few months ago in the CCB and I think we discussed forcing the TLM header to end on a 64-bit boundary by adding padding to CCSDS_TlmSecHdr_t (4 bytes if CCSDS_TIME_SIZE is equal to 6 and 2 bytes if it is equal to 8). If I remember correctly Joe recommended a solution within EDS that would not require redefining packet definitions.

Copy that, although personally I'd rather the cFS core and "GSFC apps" all standardize on packed, big-endian cmd and tlm message formats, but it's a preference of mine and I realize there are reasons against such a standard.

Needless to say whatever way the CCB decides, we'll need to scrub all TLM-generating code to ensure that it conforms with the standard. Happy to help, time-permitting. ('course, EDS will address this and make it moot anyways.)

@skliper
Copy link
Contributor

skliper commented Apr 9, 2020

Explicitly adding the pad on a packet level basis where needed (before the user data structure) breaks the abstracted functions. For example extra bits in the CFE_TBL_HousekeepingTlm_t...

typedef struct
{
    uint8                       TlmHeader[CFE_SB_TLM_HDR_SIZE];       /**< \brief cFE Software Bus Telemetry Message Header */
    uint8 EXPLICITPADDINGHERE[X];
    CFE_TBL_HousekeepingTlm_Payload_t  Payload;
} CFE_TBL_HousekeepingTlm_t;

GetUserData would need to be aware of the different padding... So I think it needs to be global.

@skliper
Copy link
Contributor

skliper commented Apr 9, 2020

Or at least the app would need to uniquely handle the padding from the result of GetUserData (and any other user data getter/setters).

@jphickey
Copy link
Contributor

jphickey commented Apr 9, 2020

No - padding cannot exist between a header and a payload on some packets and not others, that is effectively the same issue whether it is explicit or implicit. The point is that the receiver needs to decode it, which can only be done if it (reliably/consistently) knows how big the header is.

There is no easy fix for this. The only possible short term workaround would be to increase CFE_SB_TLM_HDR_SIZE from 12 to 16 by including padding for every packet and wasting bytes.

@skliper
Copy link
Contributor

skliper commented Apr 9, 2020

No - padding cannot exist between a header and a payload on some packets and not others, that is effectively the same issue whether it is explicit or implicit.

It does now...

@skliper
Copy link
Contributor

skliper commented Apr 9, 2020

Anyways, I think we are agreeing on short term possible fix? Just a question now for @tngo67 if we should implement this for the certification or not (would change most telemetry packets on the wire, ripples into CTF ground stations and CCDD).

@dmknutsen
Copy link
Contributor Author

If we decide to move forward...can I volunteer to take this one? I've been auditing all of the unit tests for the the past few days - and it is slowly starting to drive me insane lol. Would be a welcome break/change of pace.

@skliper
Copy link
Contributor

skliper commented Apr 21, 2020

@dmknutsen - yes!

@jphickey
Copy link
Contributor

jphickey commented May 1, 2020

There is actually another possible approach here. Yet again we can take a page from the EDS changes that the GHAPS project did without actually using EDS.

In the EDS build, the TO_LAB app (and CI_LAB) maintain a separate "network" buffer from the software bus "native" buffer. When CFE_SB_RcvMsg() returns the EDS build re-encodes the entire packet for network transmission by copying it from the native buffer to the network buffer, and it packs the bits in the process of doing this.

Certainly TO_LAB couldn't pack the entire frame without EDS, but it could remove compiler-added padding between the end of the TLM header and the start of the payload, as EVERY frame will have it.

Since any long-term solution would almost certainly require differentiating between network and native encodings of data this isn't terrible. TO_LAB (or any TO) will need to separate the buffers in the long run anyway.

@skliper skliper modified the milestones: 6.8.0, 7.0.0 May 1, 2020
@skliper
Copy link
Contributor

skliper commented May 1, 2020

Switching milestone to 7.0 per certification discussion (minimize tlm packet impacts this round).

@jphickey
Copy link
Contributor

jphickey commented May 1, 2020

But I thought this was a concern for cert because TLM packets come out differently on a 64 bit vs 32 bit build?

Even if upcoming certification will only be done for a 32-bit processor, most development and CI scripts will likely need to run on 64-bit because that's whats more common now, so then we are in a situation where only one or the other is going to work?

The concern is the general practicality of testing and maintaining the software if the tools/scripts will only work against a 32-bit build but everyone doing development is on 64-bit.

@skliper
Copy link
Contributor

skliper commented May 1, 2020

Negotiated with certification stakeholders and this fix is now targeted for 7.0.

@skliper
Copy link
Contributor

skliper commented May 1, 2020

And yes, the "certifiably" will be for 32bit for now. Won't work out of the box for 64 bit builds.

@skliper
Copy link
Contributor

skliper commented May 1, 2020

Note this is not the only reason the tests would fail to work for both 32 and 64 bit. Major reorganization of the tlm packets would require the next release to be 7.0.0 per our versioning policy, and there's several other issues that have been identified.

@jphickey
Copy link
Contributor

jphickey commented May 1, 2020

My concern was only about possible unintended schedule consequences of deferment, where deferring initially seems like a time saver but might actually be a time cost when there is an easy low-cost/low-risk near-term workaround that can be done.

Many recent delays in getting baseline merges completed could be avoided by having better CI, which improves our ability to meet schedule in general for all the stuff that IS in scope for 6.8.0.

@skliper
Copy link
Contributor

skliper commented Sep 22, 2020

Recommending updating the union to align the cmd/tlm headers from 32bit to 64 bit and explicitly padding the header to 64 bit boundary (since implicit padding should be avoided per our coding rules).

EDIT: Union update doesn't seem to benefit much, just would waste space at the end (at least as far as I've thought through).

@jphickey
Copy link
Contributor

As long as it is acceptable to add padding and therefore expand packets - it should be fine to add extra bytes to the CMD/TLM headers to make them all multiples of 8 bytes. This could likely be done with the MSG customizations, for users that want that, right? Does the open source need to change for that?

@jphickey
Copy link
Contributor

And yes - the padding would need to be added to the CMD/TLM header structures themselves, not to the buffer unions.

@skliper
Copy link
Contributor

skliper commented Sep 23, 2020

See #905 for an example of the issues...

@skliper
Copy link
Contributor

skliper commented Sep 23, 2020

Changed to bug since this causes CFE_SB_GetUserData to fail for tlm with 64bit payload.

@skliper
Copy link
Contributor

skliper commented Sep 28, 2020

This could likely be done with the MSG customizations, for users that want that, right? Does the open source need to change for that?

My preference would be to fix the default since it causes bugs (see #905), and allow the "classic" definitions if a user still wants the broken implementation. Along with the other padding/alignment/layout updates (#664, #663, #689). I'll put in another issue/PR to test for explict padding (no implicit padding coding standard compliance).

skliper added a commit to skliper/cFE that referenced this issue Jan 7, 2021
@shong-astrobotic
Copy link

re: Padding the packet header to 64-bits, this will introduce interoperability issues, especially for systems (e.g. ground software) that expect the standard CCSDS Space Packet primary header format w/o any padding bytes.

Linux also has this issue, where the packets headers are not 64-bit aligned (e.g. Ethernet header is 14 bytes or 18 bytes, typically). What Linux does to work-around this issue is to have an offset (e.g. +64 bytes) to where the packet starts in a given data buffer (e.g. uint8_t pkt_start = &data_buffer + 64). When the packet header is ready to be added, the header is prepended to the data, and the start-of-packet pointer is moved back N bytes, where N is the size of the packet header being prepended (e.g. if adding the Ethernet + IPv4 header + UDP header, which are 14 bytes, 20 bytes and 8 bytes, respectively, then the packet header would be written to data_buffer[22...63], and then pkt_start = pkt_start - 42).

The start-of-packet offset is normally a metadata field in the data buffer (in Linux, this is part of the sk_buff, or socket buffer, structure--refer to http://lxr.linux.no/linux+v2.6.20/include/linux/skbuff.h#L184). The start-of-packet offset should initially be set to a large enough value such that prepending the packet header does not go beyond the start of the data buffer.

I would recommend this type of approach, as it would solve your alignment issues w/o introducing any padding between the Space Packet header and the data.

@skliper
Copy link
Contributor

skliper commented Jan 8, 2021

@shong-astrobotic - the padding is part of the "Ancillary Data Field (variable length)" within the Packet Secondary Header of the CCSDS packet. No violation against standard CCSDS header and any ground system that supports the Packet Secondary Header option for a variable length ancillary data field should handle it.

In other words, the padding is not added to the CCSDS Space Packet primary header.

@shong-astrobotic
Copy link

@skliper - Ok, the SH flag will then have to be set for all cases, then.

@skliper
Copy link
Contributor

skliper commented Jan 8, 2021

@shong-astrobotic - yeah, that's the expectation for the default header implementations (CFE_MSG_Init set's that field). Your point is a good one for someone who may want to replace the default msg module to implement CCSDS packets without a secondary header. As currently implemented they could just consider the padding part of the User Data Field, or pack/unpack using EDS (or similar scheme to eliminate padding), but they would have a lot of other work on their hands to replace the functionality lost by removing the secondary header so I'm not sure how realistic this scenario would be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment