
Key support#550

Open
eboasson wants to merge 25 commits into ros2:rolling from eboasson:key-support

Conversation

@eboasson
Collaborator

@eboasson eboasson commented Jan 6, 2026

Description

This rather massive PR:

  • Supports Cyclone DDS master (but retains compatibility with 0.10.x)
  • Adds DDS XTypes type discovery for ROS 2 types
  • Implements support for keyed topics in ROS 2 when using Cyclone
  • Contains a new deserializer in the manner of the serializer that was being used
  • Removes the old (de)serialization code and other no-longer needed type support related cruft

The first two are really mostly the old (draft) PR #501, but with a few more fixes.

The final two are because the situation was a mess, albeit one that worked. It basically required no attention and so was left to rot for a long time. Adding support for keys forced me to work on it, and I decided to try and reduce the mess. It is still far from perfect, but I kept running into not being allowed to do partial template specialization when trying to unify the serializer and the deserializer ... and, not liking C++ to begin with, I gave up on that.

Is this user-facing behavior change?

Not really, except for adding support for keys and providing type information in DDS. Cyclone moved ahead quite a bit since 0.10.x but it retains backwards compatibility for configuration.

Those changes do include things that should improve the user experience in many cases. In particular on networking and discovery options it improves things quite a bit. For example, auto-detecting whether multicast on loopback actually works, being able to do multicast discovery over loopback while restricting discovery over ethernet to unicast, tooling improvements, moving Iceoryx/Iceoryx2 support into plugins so there are fewer problems in building things.

In my understanding, the paperwork required by the Eclipse Foundation for a release is sorted. So tagging it for a new release is finally possible.

Did you use Generative AI?

No.

Additional Information

So far, really only tested by running tests on macOS (thanks https://github.com/IOES-Lab/ROS2_Jazzy_MacOS_Native_AppleSilicon !). The Cyclone code seems to work fine, but the compatibility with Fast-DDS is terrible. I've done enough digging to say that the problem is with Fast-DDS.

If you want it to work with Fast-DDS, and your Fast-DDS hasn't been fixed yet ... there are some workarounds possible:

  • Setting Internal/GenerateKeyHash to true, because Fast-DDS crashes if it doesn't receive the optional DDSI KeyHash for keyed topics. (I probably would've skipped generating key hashes in this PR if it weren't for this.)
  • Ignoring type information published by Fast-DDS by setting Compatibility/IgnoreTypeInformation to 1.15
  • Wide strings are not (no longer) compatible. Cyclone added support for wide strings by implementing the standardised wide string CDR representation. The (de)serializer here now also implements that representation.
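
As a rough illustration of the first two workarounds, the two settings could be combined into one inline configuration fragment for `CYCLONEDDS_URI`. This is a minimal sketch, not a verified configuration: the helper name `make_fastdds_workaround_config` is hypothetical, and the nesting of both elements directly under `CycloneDDS/Domain` is an assumption derived from the setting paths above. It should be checked against the configuration reference of the Cyclone DDS version in use.

```cpp
#include <string>

// Hypothetical helper: builds an inline Cyclone DDS configuration fragment
// enabling the two workarounds listed above. The element nesting is an
// assumption based on the setting paths Internal/GenerateKeyHash and
// Compatibility/IgnoreTypeInformation.
std::string make_fastdds_workaround_config()
{
  return
    "<CycloneDDS><Domain>"
    "<Internal><GenerateKeyHash>true</GenerateKeyHash></Internal>"
    "<Compatibility><IgnoreTypeInformation>1.15</IgnoreTypeInformation></Compatibility>"
    "</Domain></CycloneDDS>";
}
```

The returned string would be exported as `CYCLONEDDS_URI` before the process creates its first DDS domain.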

Contributor

@fujitatomoya fujitatomoya left a comment

@eboasson thanks for the PR. although i do not understand every single detail, i did what i could for the review. just a few minor comments, could you check them?

lgtm with green CI.

@eboasson
Collaborator Author

eboasson commented Jan 9, 2026

And a big thanks for reviewing @fujitatomoya !

Contributor

@fujitatomoya fujitatomoya left a comment

lgtm with green CI

@eboasson
Collaborator Author

@fujitatomoya I have been struggling a bit with getting good CI runs. What I would like to do is two runs:

  • This PR with Cyclone releases/0.10.x
  • This PR with Cyclone master

So far I have only tried this with

--packages-up-to rmw_cyclonedds_cpp test_rclcpp test_security test_communication test_cli test_quality_of_service test_cli_remapping

passed to colcon to reduce the amount of work for the CI system a bit.

A lot of my attempts failed because of unrelated problems (e.g., https://ci.ros2.org/job/ci_linux/26903/console where I don't know what the problem is, and https://ci.ros2.org/job/ci_linux/26876/console where it failed because I didn't realise I should pass the same --packages-up-to for building and testing).

What worked:

  • https://ci.ros2.org/job/ci_windows/26382/ (Cyclone releases/0.10.x) is greenish, with the only errors in test_action_communication__rmw_zenoh_cpp and so presumably unrelated to this PR.

  • https://ci.ros2.org/job/ci_linux/26920/ (Cyclone master) which looks good except for a lot of interoperability tests with Fast-DDS. This is what I meant by the "additional information" in the PR description. I'd like to run it with a special Cyclone configuration in effect, but I don't know how to pass a CYCLONEDDS_URI environment variable in colcon test.

Given that Linux CI with Cyclone master suddenly started working, perhaps it now magically works again for releases/0.10.x as well. So I'll give that another try.

What didn't work:

Windows with master fails to build because it sets BUILD_TESTING=1 for CMake, which then causes the Cyclone build to use the CMake trick for exporting all symbols from the library, which then fails for some reason that other people on the web have run into but that I had never seen before despite doing the same on Cyclone's own CI. I have a workaround for that (passing -DBUILD_TESTING=0 in colcon.pkg), but then it fails on issues with the Iceoryx .hpp files:

C:/ci/ws/install/include/iceoryx/v2.0.6\iceoryx_hoofs/cxx/helplets.hpp(225,9): warning C4068: unknown pragma 'GCC' (compiling source file C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\src\psmx_iox_impl.cpp) [C:\ci\ws\build\cyclonedds\src\psmx_iox\psmx_iox.vcxproj]
C:/ci/ws/install/include/iceoryx/v2.0.6\iceoryx_hoofs/cxx/helplets.hpp(226,9): warning C4068: unknown pragma 'GCC' (compiling source file C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\src\psmx_iox_impl.cpp) [C:\ci\ws\build\cyclonedds\src\psmx_iox\psmx_iox.vcxproj]
C:/ci/ws/install/include/iceoryx/v2.0.6\iceoryx_hoofs/cxx/helplets.hpp(230,9): warning C4068: unknown pragma 'GCC' (compiling source file C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\src\psmx_iox_impl.cpp) [C:\ci\ws\build\cyclonedds\src\psmx_iox\psmx_iox.vcxproj]
C:/ci/ws/install/include/iceoryx/v2.0.6\iceoryx_hoofs/platform/types.hpp(25,7): error C2371: 'ssize_t': redefinition; different basic types (compiling source file C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\src\psmx_iox_impl.cpp) [C:\ci\ws\build\cyclonedds\src\psmx_iox\psmx_iox.vcxproj]
C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\..\ddsrt\include\dds/ddsrt/types/windows.h(29): message : see declaration of 'ssize_t' (compiling source file C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\src\psmx_iox_impl.cpp) [C:\ci\ws\build\cyclonedds\src\psmx_iox\psmx_iox.vcxproj]
C:/ci/ws/install/include/iceoryx/v2.0.6\iceoryx_hoofs/cxx/method_callback.hpp(108,1): warning C4158: assuming #pragma pointers_to_members(full_generality, virtual_inheritance) (compiling source file C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\src\psmx_iox_impl.cpp) [C:\ci\ws\build\cyclonedds\src\psmx_iox\psmx_iox.vcxproj]
C:/ci/ws/install/include/iceoryx/v2.0.6\iceoryx_posh/popo/trigger.hpp(172): message : see reference to class template instantiation 'iox::cxx::ConstMethodCallback<bool>' being compiled (compiling source file C:\ci\ws\src\eclipse-cyclonedds\cyclonedds\src\psmx_iox\src\psmx_iox_impl.cpp) [C:\ci\ws\build\cyclonedds\src\psmx_iox\psmx_iox.vcxproj]

It looks to me like this is an Iceoryx version that is not really fully compatible with Windows yet. The "old" Cyclone used the Iceoryx C binding, the "new" one has a plugin written in C++. (The Cyclone CI doesn't build the Iceoryx plugin on Windows, perhaps I can reproduce it and fix it.) I googled and there doesn't seem to be a way in colcon to not have it as a dependency for Cyclone on Windows.

So it is looking quite like one expects, and the problems with Cyclone + Windows on CI can be deferred because there is no immediate need to update Cyclone, but I do suspect you'd like to see a little more green in the CI ...

@fujitatomoya
Contributor

@eboasson thanks for sharing the situation.

about colcon build and test configuration.
this PR changes the rmw_cyclonedds_cpp package, so i would use the following arguments.

  • build_args: --packages-above-and-dependencies rmw_cyclonedds_cpp
  • test_args: --packages-above rmw_cyclonedds_cpp

those should build all the dependent packages above rmw_cyclonedds_cpp, and test all those packages with colcon.

https://ci.ros2.org/job/ci_linux/26903/console where I don't know what the problem is

unfortunately i also see this unstable behavior in CI.
the network connection between the Jenkins controller and the agent was lost.
we need to retry the CI... if this happens.

I'd like to run it with a special Cyclone configuration in effect, but I don't know how to pass a CYCLONEDDS_URI environment variable in colcon test.

you mean, enable the CYCLONEDDS_URI in the CI, right?
i am not sure about that either, there does not seem to be an optional argument parameter for CI.

@claraberendsen @cottsay @mjcarroll @christophebedard any thoughts?

It looks to me like this is an Iceoryx version that is not really fully compatible with Windows yet.

both Iceoryx and CycloneDDS define ssize_t differently on Windows, causing a collision?
as you mentioned, i think that this version of Iceoryx isn't fully Windows-compatible yet.
if that is fixable, i say we could give it a shot to fix this issue before deferring?

i may be missing some context here, please correct me if i got anything wrong.

@eboasson
Collaborator Author

eboasson commented Jan 22, 2026

With this PR and Cyclone master extended with:

It looks like we get the same/expected interoperability errors as before; the changed return code from rmw_publish_loaned_message when loaning is disabled is now fixed.

The test_action_communication failure I am looking into. Locally I also see a test_action_client failure, but that one is a race condition in the test logic (it assumes get_publisher_count to be up-to-date immediately, but that needn't be the case with the RMW layer updating the graph asynchronously).
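
The race described above is usually avoided by polling until the graph catches up instead of asserting immediately. A minimal sketch of such a helper follows; `wait_for_condition` is a hypothetical name and not part of the test suites mentioned here, and the timeout and polling interval are arbitrary.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical test helper: polls `condition` until it returns true or
// `timeout` elapses. Returns true if the condition was observed to hold.
bool wait_for_condition(
  const std::function<bool()> & condition,
  std::chrono::milliseconds timeout,
  std::chrono::milliseconds poll_interval = std::chrono::milliseconds(10))
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!condition()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // gave up: the condition never became true in time
    }
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}
```

A test would then wrap its publisher-count check in a lambda and assert on the helper's return value, rather than reading the count once right after creating the publisher.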

With this PR and Cyclone releases/0.10.x

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Both CI runs are with

  • build_args: --packages-above-and-dependencies rmw_cyclonedds_cpp
  • test_args: --packages-above rmw_cyclonedds_cpp
    as suggested by @fujitatomoya

@cottsay
Member

cottsay commented Jan 22, 2026

I'd like to run it with a special Cyclone configuration in effect, but I don't know how to pass a CYCLONEDDS_URI environment variable in colcon test.

you mean, enable the CYCLONEDDS_URI in the CI, right?
i am not sure about that either, there does not seem to be an optional argument parameter for CI.

I don't think ci.ros2.org has a good way to pass arbitrary environment variables to builds.

You could absolutely do this with an environment hook from a package, though. If you know which tests you'd like to have the variable, you could also modify the tests specifically to export the additional variable.
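
For the second option (having specific tests export the variable themselves), a test could set `CYCLONEDDS_URI` in its own process before any DDS entities are created. A minimal sketch, assuming the variable only needs to be visible to the test process itself; `set_env_var` is a hypothetical helper and the value passed to it would be whatever configuration the test needs.

```cpp
#include <stdlib.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: sets an environment variable for the current process,
// overwriting any existing value. Returns true on success.
bool set_env_var(const std::string & name, const std::string & value)
{
#ifdef _WIN32
  return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
  return setenv(name.c_str(), value.c_str(), 1) == 0;
#endif
}
```

The call has to happen before the RMW layer initializes, since a variable set afterwards would presumably not be picked up by an already-created domain.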

Splinter1984 and others added 21 commits January 23, 2026 09:28
Needs cyclonedds later than 0.10.
Cyclone doesn't do WStrings, ROS2 does and so those we can't translate.

Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
Signed-off-by: Erik Boasson <eb@ilities.com>
There are currently two serializers:

- one in TypeSupport_impl.hpp
- one in Serializer.cpp

The first one is the one originally used in the Cyclone RMW layer and walks the ROS
message definition directly. The second one was built to speed up serialization and
operates on a data structure derived from the ROS message definition.

For several years, the second serializer has been used. The deserializer never got
rewritten, leading to a very messy situation. This introduces a new deserializer written
to match the serializer actually being used.

Cyclone is fine without key hashes and generating them is optional (except for
the one security configuration that mandates key hashes), and so leaving key
hash generation out is attractive as it keeps things simple. Unfortunately, it
turns out that the Fast-DDS version used by ROS2 late 2025 (commit
33bb952a0e80e2b158571cb2d64ed6b2003609db on the 3.2.x branch) requires them.

This adds the code to generate them, Cyclone's existing
Internal/GenerateKeyHash setting can be used to force the generation.

cpplint complains about a .h file being included with a path, but . is forbidden. It
doesn't mind a .hpp file, so now we have a C header file with a .hpp suffix.

This fixes

  test_rmw_implementation.
    TestPublisherUseLoan.
    borrow_loaned_message_with_bad_arguments

There isn't necessarily an exception to throw. This throws std::bad_alloc instead.
@eboasson
Collaborator Author

@fujitatomoya I force-pushed it to fix the merge conflict because it was so trivial:

 -:  ------- >  1:  b4963be removed defaults, proper warnings come up (#549)
 1:  fe12728 !  2:  d369298 Dynamic discovery of ROS types in DDS
    @@ rmw_cyclonedds_cpp/src/rmw_node.cpp: static bool check_create_domain(dds_domaini
            /* NOTE: Empty configuration fragments are ignored, so it is safe to
              unconditionally append a comma. */
            config += "</Discovery></Domain></CycloneDDS>,";
    -@@ rmw_cyclonedds_cpp/src/rmw_node.cpp: static rmw_qos_policy_kind_t dds_qos_policy_to_rmw_qos_policy(dds_qos_policy_id_
    -     case DDS_LIFESPAN_QOS_POLICY_ID:
    -       return RMW_QOS_POLICY_LIFESPAN;
    -     default:
    -+      RCUTILS_LOG_ERROR_NAMED("rmw_cyclonedds_cpp", "%d", policy_id);
    -       return RMW_QOS_POLICY_INVALID;
    -   }
    - }
     @@ rmw_cyclonedds_cpp/src/rmw_node.cpp: static CddsPublisher * create_cdds_publisher(
        std::string fqtopic_name = make_fqtopic(ROS_TOPIC_PREFIX, topic_name, "", qos_policies);
        bool is_fixed_type = is_type_self_contained(type_support);
 2:  7f4fbe4 =  3:  185253e Update for recent changes in Cyclone DDS
 3:  ed914fd =  4:  d2721b5 Ignore failure mapping ROS2 type to DDS type
 4:  81892fb =  5:  8d38ce9 uncrustify
 5:  996ed3d =  6:  f2186b4 Solve c99-compound-literal warning by moving a few functions to a C file
 6:  fb3ea59 =  7:  d702fa0 DDS TypeObject construction in a separate file
 7:  cf922f6 =  8:  70bbfe5 Update loans, serialize_into for Cyclone master
 8:  450f5f0 =  9:  3ea07a6 Initial key support
 9:  899d1cf = 10:  90f48ec New deserializer based on the used serializer
10:  9b4cc66 = 11:  439dd35 Remove old type support and clean up
11:  87140b1 = 12:  86ea32b Fixup key extraction, add printing
12:  86c0f48 = 13:  3089a1a Cleanup: nullptr, auto
13:  e1b284b = 14:  c07a796 Cyclone 0.10 compatibility
14:  46ea0ca = 15:  4e84a58 Add min/max serialized size calculation
15:  10398cb = 16:  68411f6 Implement DDSI keyhash generation
16:  e6a662c = 17:  aceb8e1 Uncrustify and cpplint
17:  b8da067 = 18:  21f937d Fix build issues on Ununtu 24.04
18:  785350a = 19:  1f4e5f1 Wrong uncrustify version, pacify cpplint
19:  48ca3f9 = 20:  42bb743 Delete unused macros.hpp
20:  3ae64c4 = 21:  dd8eded Restore header files needed for 0.10.x
21:  f4aa559 = 22:  5d432c1 Pass sequence bounds as 32-bit integers
22:  a407739 = 23:  426ebec Fix unused variable for 0.10.x without Iceoryx
23:  c0771bb = 24:  c9350ef Fix return from rmw_publish_loaned_message
24:  0381993 = 25:  b9fbf80 Do not rethrow when resize_function fails
25:  467d8c5 = 26:  edbc876 Catch any exception thrown in deserializer

My CI run for releases/0.10.x failed for unrelated reasons. I restarted it, hopefully it will complete now:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@eboasson
Collaborator Author

Thanks @cottsay ! And @fujitatomoya, of course 🙂

I have done the trick with the hooks to set CYCLONEDDS_URI with the options I mentioned in the description of the PR (this lives on a separate branch, https://github.com/eboasson/key-support-with-env-hook). A CI run in combination with Cyclone master extended with the same 3 PRs mentioned earlier and eclipse-cyclonedds/cyclonedds#2342, the latter to fix up a head-scratchingly stupid mistake of mine:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

The Linux and RHEL failures are unrelated, the Linux aarch64 one is as I would expect (the wide strings incompatibility is deliberate, see the description of this PR). This simply provides evidence for my claim that the settings I mentioned help work around bugs in Fast-DDS. As they are workarounds for other people's bugs, I don't think they should be made a (semi-)permanent feature of Cyclone or its RMW layer.

Do you agree that the results look good for merging this PR? Once we merge this, we can decide when to update the used Cyclone version as well. (Once we have put a tag on it, that is, but the paperwork for that has been done and there is no obstacle to doing so.)

@asymingt
Member

asymingt commented Feb 5, 2026

Had a lot of CI worker turbulence in the last fortnight. Just re-triggered a CI run of 18035. It might take a bit to complete because there is a large backlog of workers.

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@fujitatomoya
Contributor

Pulls: #550
Gist: https://gist.githubusercontent.com/fujitatomoya/efbeb2a8419b98b248dd2ff7a234bd1e/raw/a739dd4aa378c1f205dadb4bd02ea683a43ac708/ros2.repos
BUILD args: --packages-above-and-dependencies rmw_cyclonedds_cpp
TEST args: --packages-above rmw_cyclonedds_cpp
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/18215

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@ros-discourse

This pull request has been mentioned on Open Robotics Discourse. There might be relevant details there:

https://discourse.openrobotics.org/t/ros-pmc-minutes-for-februrary-10-2026/52550/1

@knmcguire

knmcguire commented Feb 18, 2026

For the Windows builds, I would like to add here that this PR in combination with the latest master of cyclone dds solves a lot of the build warnings we had with MSVC 2022. The ROS PMC would like to transition from MSVC 2019 to MSVC 2022 but when we built that in the test CI jobs, we saw over 1800 warnings. When we built it (with other fixes) together with cyclone-dds master and this PR, it got reduced to only 37 warnings.

Builds of this PR:

We would very much benefit from this PR being merged AND having a brand new release of cyclone dds (or have rolling point to master at least)

@trittsv

trittsv commented Feb 18, 2026

We also need the key feature introduced in this PR.
Is there anything still pending, or any reason it hasn’t been merged yet?

@jmachowinski
Contributor

@trittsv Shortage of maintainers all over the place. You are more than welcome to join us and help out with bug fixes and PR reviews.

Comment on lines +47 to +50
{
auto u = reinterpret_cast<uint16_t *>(x);
*u = static_cast<uint16_t>((*u >> 8) | (*u << 8));
}
Contributor

This looks like Undefined Behavior
The correct implementation is

Suggested change
- {
-   auto u = reinterpret_cast<uint16_t *>(x);
-   *u = static_cast<uint16_t>((*u >> 8) | (*u << 8));
- }
+ {
+   uint16_t tmp;
+   memcpy(&tmp, x, sizeof(tmp));
+   tmp = ((tmp >> 8) | (tmp << 8));
+   memcpy(x, &tmp, sizeof(tmp));
+ }

Note, the compiler will completely optimize the memcpy out, and the result will be as efficient as the pointer version. The big difference here is that you don't mess up the lifetime tracking of the compiler.

Here is a godbolt example where you can see the resulting assembler code for both versions.
https://godbolt.org/z/ra9roYE9W

C++ Weekly episode explaining this also in depth
https://youtu.be/L06nbZXD2D0
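
For reference, the memcpy-based swap can be written as a self-contained function and applied at an odd offset into a byte buffer, which is the situation a serializer typically faces. This is only a sketch: the function name `byteswap16_in_place` is hypothetical and is not the name used in the PR.

```cpp
#include <cstdint>
#include <cstring>

// Swaps the two bytes of a 16-bit value stored at `x`, which may point into
// an arbitrary (possibly unaligned) byte buffer. The value is copied into a
// real uint16_t object, swapped there, and copied back.
inline void byteswap16_in_place(void * x)
{
  std::uint16_t tmp;
  std::memcpy(&tmp, x, sizeof(tmp));
  tmp = static_cast<std::uint16_t>((tmp >> 8) | (tmp << 8));
  std::memcpy(x, &tmp, sizeof(tmp));
}
```

Because the swap happens on a local `uint16_t`, the buffer itself is only ever touched through `memcpy`, so neither alignment nor the type of the underlying storage matters.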

@jmachowinski
Contributor

@eboasson I am working on an optimized and bug-fixed version of this, can you create some performance tests for the serialization part?

@jmachowinski
Contributor

jmachowinski commented Feb 20, 2026

I did not finish the rework of the PR today, I'll try to look further into it on Monday.

As a general remark, there are a lot of reinterpret_casts all over this PR, which are undefined behavior and need to be fixed. I already fixed most of the stuff in the Serialization.cpp

@jmachowinski
Contributor

@eboasson
I opened a pull request with some fixes and performance optimizations at https://github.com/eboasson/rmw_cyclonedds/pulls

@eboasson
Collaborator Author

@jmachowinski

I opened a pull request with some fixes and performance optimizations at

Thank you! I'm looking at it and I'm trying to make some simple serialization performance test as well. It should at least be possible to verify that in some simple cases the "trivially serializable" optimizations do happen, but verifying this in an automated test seems somewhat tricky.

As a general remark, there are a lot of reinterpret_casts all over this PR, which are undefined behavior and need to be fixed. I already fixed most of the stuff in the Serialization.cpp

Thank you for that, too. I think often it ends up not being UB, based on:

https://en.cppreference.com/w/cpp/language/reinterpret_cast.html: 5) An lvalue (until C++11) or glvalue (since C++11) expression of type T1 can be converted to reference to another type T2. The result is that of *reinterpret_cast<T2*>(p), where p is a pointer of type “pointer to T1” to the object or function designated by expression. No temporary is materialized (since C++17) or created, no copy is made, no constructors or conversion functions are called. The resulting reference can only be accessed safely if it is type-accessible.

and for type-accessibility:

If a type T_ref is similar to any of the following types, an object of dynamic type T_obj is type-accessible through an lvalue (until C++11) or glvalue (since C++11) of type T_ref:

  • char, unsigned char or std::byte (since C++17): this permits examination of the object representation of any object as an array of bytes.
  • T_obj

The serialized representation is (or at least should be) an array of unsigned char (std::byte would be more elegant), and the non-serialized representation is originally of the correct type. It is notoriously hard to get right, so without a doubt there are at least some places where there is UB, and if there's but one place where there's UB, everything's UB ... (One wonders whether "examination" includes "modifying".)

Outside the serializer there are indeed also a bunch, but I think those are cases where const_cast+static_cast are also possible. Anyway, one thing at a time.
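
The direction of the access is what the quoted rule turns on, and a small sketch may make that concrete. Both function names are hypothetical and this is not code from the PR. The first function examines a typed object through `unsigned char`, which the rule permits. The second goes the other way, from a byte buffer to a typed value, and uses `memcpy` because, as far as the quoted rule goes, an `unsigned char` array is not type-accessible through a `uint32_t` glvalue.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Permitted by the quoted rule: examine the object representation of a
// uint32_t through unsigned char and copy its bytes into `out`.
inline void copy_object_bytes(const std::uint32_t & value, unsigned char * out)
{
  const unsigned char * bytes = reinterpret_cast<const unsigned char *>(&value);
  for (std::size_t i = 0; i < sizeof(value); ++i) {
    out[i] = bytes[i];
  }
}

// The reverse direction: read a typed value out of a byte buffer. memcpy
// avoids accessing the byte array through a uint32_t glvalue and also copes
// with a misaligned `buffer`.
inline std::uint32_t load_u32(const unsigned char * buffer)
{
  std::uint32_t value;
  std::memcpy(&value, buffer, sizeof(value));
  return value;
}
```

Round-tripping a value through a buffer at an odd offset returns the original value, whatever the host byte order.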


Labels

None yet

Projects

Status: In Progress


9 participants