Skip to content

ros2 bag play - start/pause play next interaction#680

Closed
MichaelOrlov wants to merge 14 commits intoros2:masterfrom
MichaelOrlov:add_pause-resume_play-next_to_player
Closed

ros2 bag play - start/pause play next interaction#680
MichaelOrlov wants to merge 14 commits intoros2:masterfrom
MichaelOrlov:add_pause-resume_play-next_to_player

Conversation

@MichaelOrlov
Copy link
Copy Markdown
Contributor

@MichaelOrlov MichaelOrlov commented Mar 16, 2021

  1. Added new API for play/pause, change playback ratio and play next in pause.
  2. Added key press handling for newly added APIs.
    Key press handling implemented for POSIX compatible OS like Linux.
    play/pause = "space bar", increase playback speed = "cursor up", decrease playback speed = "cursor down", play next message = "cursor forward".
  3. Made Player class publicly available to be able use it in other applications and directly in tests.
  4. Add small "bugfix"/workaround (add delay after playback) found on CI at Apex.AI related to the issue when publishers created in player class got destroyed before they sent last messages on the wire. The rational for adding this workaround in this PR is that it's small (1-2 lines) and all other logic depends from it in this PR.
  5. Made some cleanup and major overhaul in test_play_timing.cpp

Special consideration and pay attention was made to keep original performance and deterministic playback in runtime i.e. avoiding locking mutexes and busy loops in runtime phase of the playback. Also keeping original concept for avoiding accumulation of errors in playback time.

Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
…f playback. Without delay publishers could be destroyed before they will sent last message on wire.

Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
…lay next Bloc

Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
@MichaelOrlov MichaelOrlov force-pushed the add_pause-resume_play-next_to_player branch from 773eb49 to 4465b39 Compare March 16, 2021 07:41
@MichaelOrlov MichaelOrlov marked this pull request as ready for review March 16, 2021 08:29
@MichaelOrlov MichaelOrlov requested a review from a team as a code owner March 16, 2021 08:29
@MichaelOrlov MichaelOrlov requested review from Karsten1987 and jaisontj and removed request for a team March 16, 2021 08:29
…ming to play.

Signed-off-by: Michael Orlov <michael.orlov@apex.ai>
@MichaelOrlov MichaelOrlov force-pushed the add_pause-resume_play-next_to_player branch from f467933 to dd24609 Compare March 16, 2021 08:33
@Karsten1987
Copy link
Copy Markdown
Collaborator

thanks @MichaelOrlov for the PR. While I appreciate to have such a feature-rich PR, I believe we should break these down in individual PRs. It's great that you've found a bug for the publisher clean up, I believe this should be contributed anyhow independent from the pause/resume feature.

We've discussed on #443 (comment) that at least the keyboard handling shall be dealt with external to the player. Ideally, in the ros2bag python package to ease the cross-platform keyboard handling.
Also, we were playing with the idea of featuring a rclcpp_lifecycle Node, which implicitly provides services and further command line tools integration for handling a start/stop functionality.

Can you elaborate why the player API has to be public? You mentioned you wanted to use this in other applications. What are these applications?

@MichaelOrlov
Copy link
Copy Markdown
Contributor Author

MichaelOrlov commented Mar 16, 2021

Hi @Karsten1987.
When you mention about breaking this down on several PSs. What divisions you had in mind?

As regards to the small bugfix/workaround with delay at the end of playback. It's literally one line of code and one line of comment to it with explanation why. I can do a separate PR for it if you want. Please let me know.

As regards to the proposal to moving keyboard handling outside of the player in python script. In theory it looks reasonable and easy way. On practice we have extra layer Rosbag2Transport::play() which we are calling from python script. Keyboard handling imply interaction with 2 threads running in parallel doing this from python will require to use Player class directly without Rosbag2Transport layer and deal with multi-threading in python - it will cause major redesign in python script and Rosbag2Transport.
With the giving design of the Rosbag2 hierarchy and class layers, handling keyboard pressing inside Rosbag2Transport::play() is the only option. BTW. I've spent some effort to implement it in a good/safe way.
I would propose in future transform Rosbag2Transport class and all his member functions to the namespace and make API self contained without any constructors/destructors and init/fini methods. If you will look on the API, all methods considered as independent entities and doesn't imply to be dependent from each other, like Play and Record. Also propose to rename file rosbag2_transport.cpp to the rosbag2_infrastructure.cpp because it's better describes what it's holding and doing inside. But this is for separate PRs and activities.
So well, while we are looking for the best cross-platform design with keyboard handling, people struggling without this essential functionality. I would suggest to accept current implementation with keyboard handling and improve/extend it in future.

As regards to the idea of featuring a rclcpp_lifecycle Node, which implicitly provides services and further command line tools integration for handling a start/stop functionality.
It might be ok if you are planing to have only play/pause API. And I think this is not going to work when you will add possibility to change playback rate in pause and "play next" in pause.
Consider the case:

  1. Rosbag recorded messages publishing on some topic with inter-message interval in 100 mSec.
  2. Replaying this recorded messages with original speed and going in pause mode.
  3. Increasing playback speed 2x.
  4. Playing a few messages manually while in pause via play-next message API.
  5. Resume playing messages.

After resuming, unplayed messages shall start playing with 2x speed i.e. 50 mSec interval and time spend in pause should be taken in to account to avoid playing messages without required interval. The complication arise that you can't just simply sleep for 50 mSec in playback routine. Because it will cause accumulation of errors in time delay. Need to respect start time and time at what time particular message was recorded relative to the start of the recording and time spend in pause and also different playback speed.
I don't think that it's possible to implement this scenario with rclcpp_lifecycle Node. At least in a good way without spaghetti style in design and code.
My implementation successfully handle all these complicated cases and playing messages deterministically in time domain.

As regards to your question:

Can you elaborate why the player API has to be public?

We are at Apex.AI considering to use it in Rosbag Coordinator application to be able deterministically replay complicated scenarios with high bandwidth from multiple nodes and perhaps in future port of rosbag rqt.

@emersonknapp
Copy link
Copy Markdown
Collaborator

emersonknapp commented Mar 17, 2021

When you mention about breaking this down on several PSs. What divisions you had in mind?

You've listed 5 independent changes in your PR description - this sounds like 5 PRs.

EDIT: actually, based on point 1 - which says "play/pause, change playback ratio, and play next" - this itself could even be 3 PRs introducing the features one at a time. I'd also like to find out how this approach works with the proposal in #675 . It is an important feature set, I think we need to have a consensus on the design before just merging new code

As regards to the small bugfix/workaround with delay at the end of playback. It's literally one line of code and one line of comment to it with explanation why. I can do a separate PR for it if you want. Please let me know.

One single line of code may look small, but it needs independent verification, and one wrong line is all it takes to make a program work incorrectly. If it is in a separate PR/commit - there is a chain of review and git blame, and is individually able to be reverted.

@MichaelOrlov
Copy link
Copy Markdown
Contributor Author

Hi @emersonknapp and @Karsten1987 as per initial issue (feature request) initiated by you ros2 bag play - start/pause interaction #55

Like in ROS 1, there should be a chance to control the replaying of a bag file.
spacebar should toggle the start/pause replaying.
s should solely spin the bag once.

Generally, as for now, there is no status feedback given when replaying a bag file. As a first step, there should be a message >printed, whether the bag file is currently publishing data or not.

You listed "play/pause, play next, and keyboard handling" in this feature and we already have had interface for change playback ratio on start. I only implemented runtime API for it to be able to changed it from keyboard.

I've implemented everything what you have been asking for in this feature.
And now you are trying to consider these changes as a separate features and forcing me to split them?

Perhaps I will be able to split keyboard handling and bugfix with delay to the separate PRs. As a consequences they will have dependencies in the order of merging them.
But please don't ask me to split play/pause, play-next and changing playback ratio. They all have dependencies from each other and tightly coupled together.

When I was developing/upstreaming this feature I came across to the situation that I can't proceed without those bugfix and I have to add it on the early stage. Then I found out that I can't proceed without refactoring testing code. Otherwise it would be even more spaghetti-sh. I made it and it became backed in changes with dependencies in this PR.

As regards to the

I'd also like to find out how this approach works with the proposal in #675 .

Long story short:
It's perfectly align to each other. #675 could be implemented above my changes and I don't see any possible issues with it. I tried to keep original concept of playing messages. And implementation of "Rosbag2 time, driven by external clock source" feature above my changes will not be even harder if you would implement it without my changes. It's just different features for different use cases.

Ok. A little bit longer story and details about #675.
We have

These are the time situations to handle:

  1. Steady Time - Rosbag2 keeps time internally with reference to a monotonic system clock, and neither publishes or subscribes to /clock
    • This is the default behavior
  2. Rosbag2 acts as ROS Time Source by publishing to the /clock topic (a simple extension of case 1)
    • Chosen by --clock option to ros2 bag play
  3. Rosbag2 is driven by an external ROS Time Source - most commonly the Gazebo simulator
    • In this case, Rosbag2 cannot use "time-control" features presented above, since it is a passive consumer of time
    • Chosen by --use-sim-time argument to ros2 bag play

We mostly concern about item 3. "Rosbag2 is driven by an external ROS Time Source - most commonly the Gazebo simulator" - right?

Though we are going to change how time flowing on low level. My implementation will live above it and will not know that something got changed. If someone will change slow down time flow twice. My playback rate will be relative to it. All logic will be valid and will be working as nothing got changed. Even pause-resume will be working the same way.

And here are the reasons for this:
Following statement will be still true after my changes:

Notes on current implementation (as of writing this design)

rosbag2_transport::Player currently uses std::chrono::system_clock to query time, and std::this_thread::sleep_until to wait between publishing messages, with explicit handling of playback rate.

And proposal:

Pass a PlayerClock instance to rosbag2_transport::Player

Is till a valid good to go plan with addition that it will need to use PlayerClock::now instead of std::chrono::steady_clock::now(); in a few more places. It's literal two more replacements in one Player::pause_resume() method.

To summarize:
I think these two approaches for play/pause and changing playback speed is for two different use cases and purposes and should be both present in Rosbag2.
My implementation giving direct access to these features via exposing API in Player class and intend to be used in applications which want to use Player class and corresponding library directly and as opposed feature "Rosbag2 is driven by an external ROS Time Source" described in #675 is mostly for simulator and remote synchronization.

One more factor to consider is that my approach is ready to be used now. Approach from #675 is only planing to be implemented and have some dependencies and have high uncertainty when it will be ready (may be year(s) after).

@emersonknapp and @Karsten1987 Giving that details above I would like to ask you to take a look on my changes and give concrete proposals what need to change and how to split them on different PRs.

Hope on understanding.
I appreciate your time and hope you will do the same for me.

@emersonknapp
Copy link
Copy Markdown
Collaborator

emersonknapp commented Mar 17, 2021

We mostly concern about item 3. "Rosbag2 is driven by an external ROS Time Source - most commonly the Gazebo simulator" - right?

No. #675 considers that as just one of the cases being handled, but it's not the primary concern of the design. That design is concerned with gracefully handling all three cases. But it is largely focused on the "rosbag2 controls time" scenario of cases 1 and 2 - determining how "pause and resume time", "change rate of playback", and "seek to a specific time" work.

In case 3, I am proposing that the user cannot do time control on the rosbag2 playback. The ROS Time Source is external, so the playback is completely driven by the ROS Time Source. There may be other features that can happen in this case, such as "mute playback" - where messages are not published until rosbag2 is "un-muted", but this is not the same feature as "pause/resume".

I think these two approaches for play/pause and changing playback speed is for two different use cases and purposes and should be both present in Rosbag2.

Based on what I have said above, I don't see this to be the case. I think you're right that implementing this functionality in the Player is complex with a lot of edge cases - and I think that the PlayerClock design reduces and manages that complexity by approaching the problem in a different way.

Perhaps I am missing some use cases as to why these features would need to exist side by side, please let me know if that's the case (and that is useful input to a design document). The only thing I see from these features living side by side is this:

"Gazebo controls time, but rosbag2 wants to play back in sync, but at a different rate. For example, Gazebo is simulating at 2x speed, and we want rosbag2 to play back at double the simulation speed (effectively 4x, but looks like 2x) - pausing and resuming when the simulation does so."

And even on considering the above, the PlayerClock could handle this case (if it is wanted) by having a rate multiplier on top of using the external ROS Time Source.

@MichaelOrlov
Copy link
Copy Markdown
Contributor Author

Hi @emersonknapp Thank you for expressing your opinion and decision in timely manner.
We need to discuss it internally in Apex.AI first.
I will come back to it a little bit latter on.

@MichaelOrlov
Copy link
Copy Markdown
Contributor Author

Closing this PR as wouldn't merge, since the same functionality will be implemented in the scope of the time control feature #696

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants