Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update/FixedUpdate Stepping #8168

Closed
wants to merge 1 commit into from
Closed

Update/FixedUpdate Stepping #8168

wants to merge 1 commit into from

Conversation

cart
Copy link
Member

@cart cart commented Mar 22, 2023

Objective

Alternative to #8063

Note that this is a draft PR and only exists to illustrate an idea. It is not intended to be a final API.

It would be nice if users could "step through" their app logic to debug on a frame-by-frame basis (and in the future ... system-by-system).

This aims to illustrate an alternative to the system stepping impl in #8063 that accomplishes the following goals:

  • Illustrate that stepping Update and FixedUpdate globally is a simple and effective solution. We can step "user logic" transparently without the need to manually opt engine systems out of stepping logic or user logic into stepping logic. We can step every example "out of the box" by just adding the Stepping resource and defining a step system in PreUpdate (to determine what inputs will result in a user logic step, and how many steps to take). Update / FixedUpdate exist to be the "simple place to drop user logic". We can use (and rely on) those boundaries.
  • Support arbitrary app logic such as state transitions (because we do full frame stepping)
  • Significantly simpler / less invasive impl that exists entirely in "userspace" (no bevy_ecs scheduler changes need). The current change set is an extremely uncontroversial impl that we could merge now to unlock a subset of the value in System & frame stepping support #8063 without any negative UX implications for our users.
  • Note that this does not (yet) support per-system stepping (which would introduce more complexity and caveats), but I think we can expand this general idea with run conditions or the schedule-internal stepping introduced in System & frame stepping support #8063 (see Next Steps).

Note that this type of "conditional update/fixedupdate" logic running is how engines like Godot implement their pausing functionality.

Like #8063, this would benefit from event buffers that aren't cleared every other frame.

Next Steps

  • Explore granular per-system run-condition based stepping. A new run condition would be implicitly added to systems in Update/FixedUpdate (when granular stepping enabled). It would run / not run systems based on the current execution state (ex: which system are we currently breaking on, which systems have run for this "frame", which systems still need to run). This would be configured on each SystemConfig with override-able defaults. We would have a "stepped virtual frame" resource that each criteria would track itself relative to (which in combination with the step state would allow the system to decide if it should/should not run on a given frame).
  • We can still explore the "internal schedule executor stepping" route introduced in System & frame stepping support #8063. Very possible that the added complexity in the schedule executor is worth it relative to whatever run-condition impl we implement. Run conditions might not even be viable at all! The point is we could still use the "only step Update/FixedUpdate" approach (without any need to opt bevy internals or user systems in or out).
  • Consider just using debuggers for per system stepping. Users can drop breakpoints on whatever systems they want. This can be combined with global per-frame stepping for something close to a best of both worlds solution:
    1. Enable per-frame stepping with KeyCode::Space to advance by one frame
    2. Drop breakpoints on the systems you want
    3. Press space to begin a new frame. Execution breaks on each system you selected (and you have a full debugger at your disposal). After stepping through every system / breakpoint you have set, the stepped user logic ends and the engine internals start looping again (doing rendering PreUpdate/PostUpdate, etc).
    4. Note that we'd need to rethink Time:delta() in the context of breakpoints, otherwise things will jump around a lot (and FixedUpdate will have a ton of updates each frame). We would want to clamp it to some max value in this context (or maybe all contexts?).

@alice-i-cecile alice-i-cecile added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events labels Mar 22, 2023
#[derive(Resource)]
pub enum Stepping {
Disabled,
Enabled { frames: usize },
Copy link
Member

@alice-i-cecile alice-i-cecile Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The frames field is hard to understand at a glance. It either needs a clearer name, or we should embed the "this still needs to step" information out into an enum variant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah im not convinced either. Just a convenience at this point / provides functionality to help test things.

@alice-i-cecile
Copy link
Member

Overall, I like it. This is still quite useful, even if it's not as powerful. Boilerplate and complexity reduction is very nice.

I'm generally nervous about making Update and FixedUpdate more "magic", but I think that with the right docs this could work quite well.

Note that this encourages users to move all of their UI, camera movement code and so on out of Update (and probably into PostUpdate), so they can still operate smoothly even during operation.

If this came with a list of "schedules that are stepped" I think I'd be solidly in favor; being able to modify that list at runtime would be extremely powerful for debugging. Following up on this, I think extending this same approach to system sets would work really well for fine-grained control when you need it.

@cart
Copy link
Member Author

cart commented Mar 22, 2023

Note that this encourages users to move all of their UI, camera movement code and so on out of Update (and probably into PostUpdate), so they can still operate smoothly even during operation.

With the first two "system based stepping" approaches, you could just opt those systems out (ex: camera_update.ignore_stepping()). But yeah with schedule-level granularity you would probably want to create a new NoStepUpdate schedule and add it before or after Update.

If this came with a list of "schedules that are stepped" I think I'd be solidly in favor; being able to modify that list at runtime would be extremely powerful for debugging. Following up on this, I think extending this same approach to system sets would work really well for fine-grained control when you need it.

Makes sense to me. Very straightforward to add / my head was in a similar place.

@cart
Copy link
Member Author

cart commented Mar 22, 2023

I just confirmed that this works really nicely:

Note that we'd need to rethink Time:delta() in the context of breakpoints, otherwise things will jump around a lot (and FixedUpdate will have a ton of updates each frame). We would want to clamp it to some max value in this context (or maybe all contexts?).

@dmlary
Copy link
Contributor

dmlary commented Mar 22, 2023

I'll start off by saying I'm not committed to the approach used in #8063, but I am committed to pushing for that level of convenience and functionality for bevy users (myself 😀 ).

This is undoubtably a more elegant solution for per-frame stepping, but I'm a little worried that the lack of system-stepping in this PR creates an (not intentionally) misleading impression that we won't need some mechanism to say "always run this system" for system-stepping. This is acknowledged in the PR comments, but I want to be clear that as complexity was the main hurtle to #8063 we can't compare the two approaches right now, as they do very different things.

Frame stepping in this PR differs from #8063 in that #8063 also supports remaining-frame stepping. As in, I've stepping past the systems I care about, move on to the start of the next frame so I can see it again.

Although frame-stepping is helpful, per-system stepping is an order of magnitude more powerful as it allows us to examine intra-frame system interactions. Those interactions simply cannot be seen with per-frame stepping.

The point is we could still use the "only step Update/FixedUpdate" approach (without any need to opt bevy internals or user systems in or out).

This is the part that I struggle with. It looks like the trade-off you've made to remove the cost of ignore_stepping() is to limit the scope of stepping to two specific schedules. This creates a pitfall for users where they must avoid a bevy feature (custom schedules) to be able to use stepping. They likely won't remember the footnote added to the custom schedule docs that say custom schedules won't be stepped, and think it is broken the first time they try to step.

If I'm reading this right, I think this may be the key place we differ. From the phrasing "without any need to opt bevy internals or user systems in or out", it sounds like you'd rather limit stepping to a fixed selection of schedules than add any additional complexity at all (understandable; bevy is a lot of code). I would like stepping to have the widest reach possible, but find ways to reduce the complexity for opt-out.

Perhaps there's a compromise where Update & FixedUpdate are steppable by default, but users can also opt-in their custom schedules. As things are phrased right now in the PR, it doesn't look like this is being considered.

Consider just using debuggers for per system stepping. Users can drop breakpoints on whatever systems they want.

I may be mis-reading this, but are you talking about a combination of running the app in rust-lldb, and using frame stepping? That could be a possibility, but I'd prefer to see it all possible in-app. The key difference is that entity inspector tools are much easier to use navigating entities & components than a debugger (this is said as someone who swears by their debugger; ECS makes traditional debuggers struggle).

One of the challenges to that right now is it's very hard to get a list of systems due to schedule -> system -> schedule -> system relationships that aren't maintained as data, but as code.

Overall, I am curious to see what this PR looks like with system-stepping implemented.

@cart
Copy link
Member Author

cart commented Mar 22, 2023

This is undoubtably a more elegant solution for per-frame stepping, but I'm a little worried that the lack of system-stepping in this PR creates an (not intentionally) misleading impression that we won't need some mechanism to say "always run this system" for system-stepping. This is acknowledged in the PR comments, but I want to be clear that as complexity was the main hurtle to #8063 we can't compare the two approaches right now, as they do very different things.

Yup I fully acknowledge that the "hard part" is the system stepping / thats where the nuance is (and where a lot of the value is). I tried to call that out in the description. This PR is intended to be an easy way to start the conversation about high level apis / what this should look like on a system-to-system basis: I believe that defaulting to Update/FixedUpdate being stepped and everything else not being stepped is the correct path forward, as it ensures reasonable UX and features without foisting complexity on Bevy developers. Taking the easy path first enabled me to illustrate that those boundaries work very well (and that this can be implemented very simply at the schedule level). It also opens the door to a controversy-free / easily mergeable middle ground if we're interested in that (while we solve the harder / more controversial system-level stepping problems).

Perhaps there's a compromise where Update & FixedUpdate are steppable by default, but users can also opt-in their custom schedules. As things are phrased right now in the PR, it doesn't look like this is being considered.

Allowing other schedules to be stepped (in an opt-in basis) is definitely where my head (and @alice-i-cecile's head) is at. See our comments above. I see no reason to hard code this ... this PR is just an initial / quick illustrative impl.

I may be mis-reading this, but are you talking about a combination of running the app in rust-lldb, and using frame stepping? That could be a possibility, but I'd prefer to see it all possible in-app. The key difference is that entity inspector tools are much easier to use navigating entities & components than a debugger (this is said as someone who swears by their debugger; ECS makes traditional debuggers struggle).

Yeah when actually "in" a breakpoint that would prevent inspector tools from working effectively, as theres no way to send new information to the inspector (unless the inspector logic is running in a different thread / isn't blocked by the breakpoint ... this would be unsafe but possible). You could inspect after leaving the breakpoint (when we go back to the normal "schedule stepping" logic). You could only inspect the end result of a full frame, not the intermediate results. Definitely a downside.

Overall, I am curious to see what this PR looks like with system-stepping implemented.

To help illustrate this: I think using the executor changes from your PR are an extremely valid path forward. The only significant difference would be high level defaults (FixedUpdate/Update stepped by default, everything else not stepped by default). I would like us to investigate a run-condition impl, but I think its reasonably likely that will be nonviable.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

Ok, I think I understand now. This approach switches to opt-in for stepping, at the schedule level; with Update & FixedUpdate enabled by default.

Thank you for taking the time to clarify for me.

Are you envisioning system-stepping opt-in only at the schedule level, or at both system & schedule? Or phrased another way, must all systems in a stepping schedule permit stepping?

@cart
Copy link
Member Author

cart commented Mar 23, 2023

This approach switches to opt-in for stepping, at the schedule level; with Update & FixedUpdate enabled by default.

Yup!

Are you envisioning system-stepping opt-in only at the schedule level, or at both system & schedule? Or phrased another way, must all systems in a stepping schedule permit stepping?

I think system-level stepping implies every system being "stepped" in a way (at least for normal workflows within Update/FixedUpdate). Probably 3 states for each system:

  1. Step::Continue: only runs when a "step" is run. the "step" will not end on this system.
  2. Step::Break: for a given step, execution will stop on this system and continue from here on the next step
  3. Step::Ignore (or maybe Step::Always): will run on every step no matter what.

Where step here is one "run" of the schedule, which will include zero-to-many stepped systems running and all unstepped systems running.

@cart
Copy link
Member Author

cart commented Mar 23, 2023

By default, systems in Update/FixedUpdate would be Step::Continue and all other systems would be Step::Always.

@cart
Copy link
Member Author

cart commented Mar 23, 2023

We might want Step::Break to be a separate piece of configuration defined outside of the schedule, as that will be frame-specific / dynamically determined by users at runtime.

@cart
Copy link
Member Author

cart commented Mar 23, 2023

@dmlary: if you're interested in porting your PR to this general approach (while still doing the executor-based stepping), let me know. From my first pass, I don't think it would be particularly difficult. It would also make the change set much smaller / easier to review as it would remove all of the annotations from the engine and example code.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

That meets all of my core requirements; it should benefit most bevy users without having to make any changes.

There are two pieces I'm not able to envision from where we sit in run_main(). This moves stepping to be a feature of the Main schedule, which I don't think is too much of a limitation.

How do we communicate two pieces down to the ScheduleExecutors:

  • Step the schedules from where we left off until we hit a Step::Break system
    • It's a state in the Stepping resource, but is it pulled by the Schedule, or is it an argument to Schedule::run?
  • Step the remainder of the frame from wherever we last were.

One of the places I really struggled with the complexity of stepping was tracking which schedule to system-step next. My ugly implementation in #8063 "works", but it's awful to expose to a user:

let label = &state.schedule_labels[index];
let schedule = schedules.get(&**label).unwrap();
let node_id = match schedule.next_step_system_node_id() {
Some(id) => id,
None => return,
};
debug!(
"step {:?}/{}",
label,
schedule.next_step_system_name().unwrap().to_string()
);
schedule_events.send(StepSystem(label.clone()));
// if we're running the last system in this schedule, update status to
// point at the next system
if node_id == state.last_system_ids[index] {
let index = index + 1;
if index >= state.schedule_labels.len() {
state.status = Status::Step(0);
} else {
state.status = Status::Step(index);
}
}

If there's any way we could move that state management into the Stepping resource; I think it would be a huge win. I think it's possible.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

@dmlary: if you're interested in porting your PR to this general approach (while still doing the executor-based stepping), let me know. From my first pass, I don't think it would be particularly difficult. It would also make the change set much smaller / easier to review as it would remove all of the annotations from the engine and example code.

Yea, I'm absolutely willing to keep working on the PR. This is functionality I'm invested in.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

@cart Do you see a way in the code to make this also work for schedules run under Main? For example the dynamically created schedules for state-transition. This is "possible", but difficult under #8063, but if we move the stepping control into run_main() we lose the capability.

@cart
Copy link
Member Author

cart commented Mar 23, 2023

Do you see a way in the code to make this also work for schedules run under Main? For example the dynamically created schedules for state-transition. This is "possible", but difficult under #8063, but if we move the stepping control into run_main() we lose the capability.

In a world where the executor is what supports stepping, I don't see much of a reason to limit this to Main (or use run_main as the way it is implemented). I think each Schedule would have a "default stepping behavior" or something, which could be configured per schedule (either on insertion or via Schedule::edit_schedule). Then each SystemConfig would define StepBehavior::Auto (use the default), StepBehavior::Continue, and StepBehavior::Always (and mayyyyybe StepBehavior::Break ... even though this would be static/unchangeable, its hard to deny the nice UX of app.add_systems(Update, foo.step_behavior(StepBehavior::Break)) to add break points at compile time ... but it also doesn't feel like a full solve as we'll want this to be runtime configurable I think).

I think most step state could be stored per schedule (ex: where did we stop last frame). But for stuff that needs to be shared / user-configurable, you could just pass that in via some shared ECS Resource (the executor has mutable access to World, so state can be written to / read from some Stepping resource).

I think the biggest open question is "how to define what systems to break on at runtime". I think some combination of these is the move:

A variant of add_systems that returns a list of NodeIds in the order they were defined and a way to add breakpoints:

let ids = app.add_systems_and_get_ids(Update, (foo, bar));

// later in the app (maybe in a system somewhere)
stepping.add_breakpoint(Update, ids[1]); // break on bar

// later in app (maybe in a system somewhere)
stepping.remove_breakpoint(Update, ids[1]); // stop breaking on bar

A way to query node ids based on system name / set (note we don't store this info currently ... it needs to be computed from the graph ... this is also useful for other debugging scenarios). This is actually involved enough that we might want to do it separately:

// returns a list of all node ids with the given SystemSet
let ids = update_schedule.get_node_ids(bar);
stepping.add_breakpoint(Update, ids[0]);

One of the places I really struggled with the complexity of stepping was tracking which schedule to system-step next. My ugly implementation in #8063 "works", but it's awful to expose to a user:

I bet Stepping could own this:

// default to no stepping at all
app.insert_resource(Stepping::new().step_schedule(Update).step_schedule(FixedUpdate))

Internally each schedule executor would check the Stepping resource: it would break if (1) stepping is enabled for the schedule (2) we didn't break on a different schedule this frame (3) a system within this schedule has a break point defined.

Ideally we use this to "short circuit" expensive stepping work if stepping is disabled for this schedule or there are no break points defined.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

I'd like to suggest an alteration to StepBehavior::Continue & StepBehavior::Break to support run-time control.

There are three distinct stepping states I've used with increasing granularity when debugging a problem:

  • frame step: I know it goes wrong at full speed, let me see which frame it goes wrong at
  • system step: Ok, I know it's gonna go wrong in this frame, which systems does it go sideways during
  • selective system step: I know these three systems involved, I just want to step them over and over again

I think we should split the existingStepBehavior into two pieces. The compile-time SteppingCapable that is equivalent to ignore/observe stepping stored in SystemConfig; this would have SteppingCapable::Auto to inherit from the schedule.

Then a run-time piece StepBehavior in the Schedule that only applies to systems that are observing stepping:

  • StepBehavior::Stop: When it is time to run this system when stepping, run this system and run no other SteppingCapable systems
    • This stop behavior will only be observed when we're system-stepping, and ignored when frame-stepping
    • This is the default; system stepping stops at every system observing stepping
  • StepBehavior::Continue: When it's time to run this system when stepping, run this system, and continue until a Stop system is run
  • StepBehavior::Skip: Do not run this system while stepping, but continue to the next one when system-stepping (new feature)

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

For modifying which systems to stop on:

You're bringing up a new interface for me to consider here, and that is the programmatic enabling of stepping on a system.

UI Approach

The primary way I've been thinking about this is a UI that displays all systems in schedules with checkboxes. Example of something rough I've done in the past, the second column of checkmarks is effectively ignore_stepping().

Screen.Recording.2023-02-14.at.9.53.59.PM.mov

This approach is possible with the helpers added in #8063 on Schedule to get System and NodeId.

Programmatic approach

For a programmatic approach, I love this interface:

fn bar(...) {}

let ids = update_schedule.get_node_ids(bar);

I didn't realize this was possible in rust. If there's an example somewhere doing the comparison of Fn with a function pointer, I'd love to see it. My background is a lot of C, so I don't know what's actually possible in rust.

@cart
Copy link
Member Author

cart commented Mar 23, 2023

Then a run-time piece StepBehavior in the Schedule that only applies to systems that are observing stepping:
StepBehavior::Stop: When it is time to run this system when stepping, run this system and run no other SteppingCapable systems

I don't see how Stop is different from "break" conceptually. I don't think creating new concepts for runtime vs compiletime is helpful here. In general I think the only difference between the two concepts should be: is it "defined/fixed at compile time" or "added/removed at run time". In fact to start, I think skipping fixed compile-time configuration is a reasonable corner to cut while we sort out the runtime apis. Defining configuration directly on systems should just be a convenience (if we decide that is actually useful relative to the runtime apis).

It sounds like the only missing piece here in my proposal to enable all of the scenarios you mentioned above (other than the new suggested Skip state, which seems useful but feels orthogonal) is a stepping.break_on_next_system(Update) operation (which inserts a runtime break on the next system configured with Continue or Break ... either at compile time or run time).

We might want the optional concept of "removing breakpoints on hit". To remove the need to manually remove them.

Also one reason to keep this runtime configuration out of a Schedule: during the execution of a schedule systems don't have access to that schedule (it is removed from the world). If we store this in the Stepping resource, we can queue up Update system breaks from within Update.

Internally, Stepping would contain:

  • Global per-Schedule config (defaulting to no stepping if nothing is defined for a schedule). The state could really just be a ScheduleLabel -> stepping_enabled_bool map. If stepping is enabled, assume all systems are Continue unless otherwise specified.
  • a ScheduleLabel -> [NodeId -> StepBehavior] map. Where StepBehavior consists of Continue, Break, Skip, Always

Some examples (using runtime-only apis, pretending static system config doesn't exist). Note that Stepping could be configured in a UI:

// Break on a specific system in update
app
  .insert_resource(
    Stepping::new()
      .enable_stepping(Update)
      // this will continue to break on foo for each update, until the breakpoint is removed
      .with_breakpoint(Update, foo)
  )
  .add_system(Update, foo)

// Break on the next steppable system in Update
fn break_next(input: Res<Input<KeyCode>>, mut stepping: ResMut<Stepping>) {
  if input.just_pressed(KeyCode::Space) {
    stepping.remove_breakpoints_on_hit = true;
    stepping.break_on_next_system(Update)
  }
}

@cart
Copy link
Member Author

cart commented Mar 23, 2023

I didn't realize this was possible in rust. If there's an example somewhere doing the comparison of Fn with a function pointer, I'd love to see it. My background is a lot of C, so I don't know what's actually possible in rust.

Yup this is possible! add_systems already kind of does this. The chain() implementation collects NodeIds in this way and we could easily return them.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

I don't see how Stop is different from "break" conceptually.

I think I'm approaching this from a different direction. It's the concept of next & step in a debugger, vs break & continue.

I'm trying to implement stepping using step semantics. From what I'm seeing, you want to implement more break & continue. The only difference is the amount of work required to perform the other behavior.

If we start with step semantics, it's more work to switch to break semantics; you have to mark most systems as Continue.

If we start with break semantics, to get a step behavior, you have to mark everything as Stop or Break.

I would prefer we default to step behavior because it can function to debug a problem without any extra work. If we start with the break semantics, you can't do system stepping without marking things as Stop.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

Oh, or does Break only stop on step-frame? During step-system we always stop after running any system that isn't ignoring stepping.

I think what I'm trying to propose is:

System Stepping Behavior Step Frame Step System Details
Ignore Stepping Run & Continue Run & Continue Always run these systems
Break Run & Stop Run & Stop Breakpoints make sense in frame-step
Continue Run & Continue Run & Stop System-step will stop after running any system that observes stepping
AutoNext Run & Continue Run & Continue This allows system-step to automatically step over some systems
Skip Skip & Continue Skip & Continue Effectively disable system while stepping

I added AutoNext as a mechanism to bridge between system-stepping and frame-stepping with breakpoints. It's not a great implementation.

We probably should end up saying that "if you don't want to step through all systems, you need to switch to frame-step mode with breakpoints.

Ignore Stepping is an important state because it distinguishes from Continue to determine if system-stepping should stop after running the system.

@Nilirad
Copy link
Contributor

Nilirad commented Mar 23, 2023

Does the in-app system stepping approach have some advantage over what debuggers already have, apart from ease of use?

I was thinking about the possibility of having a launcher process that also acts as an abstraction over a debugger. It would work as this:

  • The launcher launches the app through a debugger.
  • The app communicates to the launcher how its schedule is structured.
  • Given that information, the launcher exposes a user friendly UI to control system execution.

Ultimately, the launcher role can be implemented on the editor itself.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

Does the in-app system stepping approach have some advantage over what debuggers already have, apart from ease of use?

No, nothing more than ease of use.

You can do both system stepping and frame stepping using a debugger. However, it takes a lot of knowledge to be able to do it. To do system-step, you need to put breakpoints after every system. Well, we can do that at the return from Schedule::run(), no problem. But, quickly you discover it's not every system you want to step, but a majority of them (input & render are necessary for control & debugging), so that breakpoint now becomes a conditional breakpoint with a mess of conditions. And as you go along you want to change which systems you stop after.

This is possible, but not accessible by any means. This difficulty barrier means nobody will do this except in the most complex situations with a lot of effort.

@cart
Copy link
Member Author

cart commented Mar 23, 2023

Oh, or does Break only stop on step-frame? During step-system we always stop after running any system that isn't ignoring stepping.

Ahhh I was thinking break actions would be the only way to stop execution mid-frame. "System stepping" would be implemented via break points (via the break_on_next_system() function) rather than being a separate action.

That being said, I do see the appeal of a separate "system step" action / that does feel reasonable to me. I'm down for the behaviors in your table.

@dmlary
Copy link
Contributor

dmlary commented Mar 23, 2023

@cart ok, I’ll start implementing according to that table, with the schedule opt-in we discussed earlier. I’ll open it as a separate PR when it’s functional

@dmlary
Copy link
Contributor

dmlary commented Mar 24, 2023

@cart I'm hitting an architectural fork and I'd like your input.

Stepping uses ScheduleLabel (really BoxedSteppingLabel) as the key for per-schedule stepping.

I was planning to update Schedule::run() to do something like:

// grab the stepping config
let mut stepping = world.resource_mut::<Stepping>();

// build a skip list for this system
let skip = stepping.get_systems_skip_list(&self);

// call the executor, passing it the skip list
executor.run(executable, skip);

The problem is that Schedules don't know their own labels. These are held up at the App & World level under the Schedules resource.
To make things worse, when a Schedule is being run, it is removed from Schedules, so I can't look it up in Schedule::run().

I see two paths:

  • Update Schedule to include it's own label
    • We can set it during add_schedule(), but that changes schedule to mut schedule
    • Or make label an argument to schedule creation, then we can drop it from add_schedule()
  • Move the generation of skip list up to World::try_run_schedule_ref()
    • This will require either we add a new argument to Schedule::run() for the skip list, or some pre-run command to set the skip-list
    • This also means that we need to update the directions for running custom schedules to use World::run_schedule() instead of Schedule::run()

I would prefer to keep all of this stepping logic in Schedule, but right now we can't tell which schedule we're in.

Maybe pass the label in as an argument to Schedule::run()?

@dmlary
Copy link
Contributor

dmlary commented Mar 25, 2023

I’m going to move forward with the adding label to Schedule path. I’ll make add_ schedule call Schedule::set_label(). This adds the needed info without introducing a breaking change. It also allows custom schedules not added to still use stepping.

@dmlary
Copy link
Contributor

dmlary commented Mar 26, 2023

I didn't realize this was possible in rust. If there's an example somewhere doing the comparison of Fn with a function pointer, I'd love to see it. My background is a lot of C, so I don't know what's actually possible in rust.

Yup this is possible! add_systems already kind of does this. The chain() implementation collects NodeIds in this way and we could easily return them.

I'm still struggling with System equality comparisons. I'm trying to support:

stepping.always_run(Update, my_system);

And I implemented the following on Stepping to set the stepping behavior for a system:

    pub fn always_run<Marker>(
        &mut self,
        schedule: impl ScheduleLabel,
        system: impl IntoSystem<(), (), Marker>,
    ) -> &mut Self
    {
        self.system_behavior_updates.push(SystemBehaviorUpdate {
            schedule: Box::new(schedule),
            system: Box::new(IntoSystem::into_system(system)),
            behavior: StepBehavior::AlwaysRun,
        });

        self
    }

These updates are queued because we don't want them executing mid-frame; it would cause things like double-stepping.

The Problem

I have a BoxedSystem in the behavior updates, and there's a BoxedSystem in the Schedule, but I can't compare them for equality:

error[E0369]: binary operation `==` cannot be applied to type `dyn system::system::System<Out = (), In = ()>`
   --> crates/bevy_ecs/src/schedule/stepping.rs:261:28
    |
261 |                 if *system == **inner {
    |                    ------- ^^ ------- dyn system::system::System<Out = (), In = ()>
    |                    |
    |                    dyn system::system::System<Out = (), In = ()>

For more information about this error, try `rustc --explain E0369`.

I looked into chain() like you suggested, but none of the implementations did any comparison between BoxedSystems. Even digging down into the ScheduleGraph, I wasn't able to find any methods for checking to see if a System already existed in the graph.

Workaround

For right now, I'm comparing using System::name() to find matches, but that feels dangerous/error prone.

@cart do you know of a better approach here?

@dmlary
Copy link
Contributor

dmlary commented Mar 26, 2023

Lol, ok, I just figured out System::type_id() is unique per function turned into a System.

@cart
Copy link
Member Author

cart commented Mar 28, 2023

Sorry for the late reply!

I’m going to move forward with the adding label to Schedule path. I’ll make add_ schedule call Schedule::set_label(). This adds the needed info without introducing a breaking change. It also allows custom schedules not added to still use stepping.

This seems reasonable for now. We might find a better way to handle the dataflow later, but I'm reasonably happy with this.

@cart do you know of a better approach here?

I think we should generally only use two things for system identity: SystemLabel (a one to many relationship) and NodeId (a resolved reference to a specific system in a schedule).

I think stepping.always_skip(Update, some_system) should treat some_system as a SystemLabel, and then the schedule should be able to resolve that to a list of systems/NodeIds containing that label. I guess if there is more than one system we can resolve that in some way (ex: pick the first one silently , pick the first one and warn, hard-fail and require an explicit NodeId, etc)

@dmlary
Copy link
Contributor

dmlary commented Mar 28, 2023

No worries, I'm still making progress; it's just a bit slower, trying to avoid architectural mis-steps.

I think we should generally only use two things for system identity: SystemLabel (a one to many relationship) and NodeId (a resolved reference to a specific system in a schedule).

Agreed.

I think stepping.always_skip(Update, some_system) should treat some_system as a SystemLabel [...] I guess if there is more than one system we can resolve that in some way (ex: pick the first one silently , pick the first one and warn, hard-fail and require an explicit NodeId, etc)

For the SystemLabel interface, any change to system behavior by ScheduleLabel applies to all systems in the schedule with that label. This felt like the correct "explicit" behavior, and matches what debuggers do. If you add a breakpoint for a symbol that exists in multiple dlls, it is added for all instances of that symbol.

For more control, there will be a NodeId-based interface too. We need this anyway to make it easier to implement a UI for controlling stepping.

[...] and then the schedule should be able to resolve that to a list of systems/NodeIds containing that label.

I'm doing this a little differently. The Stepping resource is maintaining the map of NodeIds to StepBehavior per Schedule, and it's doing the resolution of SystemLabel to NodeId when the Schedule is available to it. I'm doing it this way because the StepBehavior isn't necessarily Schedule data, and I'm trying to keep all the feature data in the same place.

Work on this is on-going here: https://github.com/dmlary/bevy/tree/stepping-resource

@cart
Copy link
Member Author

cart commented Mar 29, 2023

For the SystemLabel interface, any change to system behavior by ScheduleLabel applies to all systems in the schedule with that label. This felt like the correct "explicit" behavior, and matches what debuggers do. If you add a breakpoint for a symbol that exists in multiple dlls, it is added for all instances of that symbol.

Makes sense. I dig it!

I'm doing this a little differently. The Stepping resource is maintaining the map of NodeIds to StepBehavior per Schedule, and it's doing the resolution of SystemLabel to NodeId when the Schedule is available to it. I'm doing it this way because the StepBehavior isn't necessarily Schedule data, and I'm trying to keep all the feature data in the same place.

Sounds reasonable to me :)

@dmlary
Copy link
Contributor

dmlary commented Apr 12, 2023

Still working on this, but minor update:

FixedUpdate is gonna be funky for stepping because it can be called multiple times in a single frame.

To handle this from the stepping side, we'll only run systems the first time the schedule is called within a render frame. This is likely gonna have all manner of weird side-effects. Also, the longer you wait between steps, the more times FixedUpdate tries to run. This may be something we account for in run_fixed_update_schedule() at a later point.

@alice-i-cecile
Copy link
Member

Closing in favor of #8453.

github-merge-queue bot pushed a commit that referenced this pull request Feb 3, 2024
# Objective

Add interactive system debugging capabilities to bevy, providing
step/break/continue style capabilities to running system schedules.

* Original implementation: #8063
    - `ignore_stepping()` everywhere was too much complexity
* Schedule-config & Resource discussion: #8168
    - Decided on selective adding of Schedules & Resource-based control

## Solution
Created `Stepping` Resource. This resource can be used to enable
stepping on a per-schedule basis. Systems within schedules can be
individually configured to:
* AlwaysRun: Ignore any stepping state and run every frame
* NeverRun: Never run while stepping is enabled
    - this allows for disabling of systems while debugging
* Break: If we're running the full frame, stop before this system is run

Stepping provides two modes of execution that reflect traditional
debuggers:
* Step-based: Only execute one system at a time
* Continue/Break: Run all systems, but stop before running a system
marked as Break

### Demo

https://user-images.githubusercontent.com/857742/233630981-99f3bbda-9ca6-4cc4-a00f-171c4946dc47.mov

Breakout has been modified to use Stepping. The game runs normally for a
couple of seconds, then stepping is enabled and the game appears to
pause. A list of Schedules & Systems appears with a cursor at the first
System in the list. The demo then steps forward full frames using the
spacebar until the ball is about to hit a brick. Then we step system by
system as the ball impacts a brick, showing the cursor moving through
the individual systems. Finally the demo switches back to frame stepping
as the ball changes course.


### Limitations
Due to architectural constraints in bevy, there are some cases systems
stepping will not function as a user would expect.

#### Event-driven systems
Stepping does not support systems that are driven by `Event`s as events
are flushed after 1-2 frames. Although game systems are not running
while stepping, ignored systems are still running every frame, so events
will be flushed.

This presents to the user as stepping the event-driven system never
executes the system. It does execute, but the events have already been
flushed.

This can be resolved by changing event handling to use a buffer for
events, and only dropping an event once all readers have read it.

The work-around to allow these systems to properly execute during
stepping is to have them ignore stepping:
`app.add_systems(event_driven_system.ignore_stepping())`. This was done
in the breakout example to ensure sound played even while stepping.

#### Conditional Systems
When a system is stepped, it is given an opportunity to run. If the
conditions of the system say it should not run, it will not.

Similar to Event-driven systems, if a system is conditional, and that
condition is only true for a very small time window, then stepping the
system may not execute the system. This includes depending on any sort
of external clock.

This exhibits to the user as the system not always running when it is
stepped.

A solution to this limitation is to ensure any conditions are consistent
while stepping is enabled. For example, all systems that modify any
state the condition uses should also enable stepping.

#### State-transition Systems
Stepping is configured on the per-`Schedule` level, requiring the user
to have a `ScheduleLabel`.

To support state-transition systems, bevy generates needed schedules
dynamically. Currently it’s very difficult (if not impossible, I haven’t
verified) for the user to get the labels for these schedules.

Without ready access to the dynamically generated schedules, and a
resolution for the `Event` lifetime, **stepping of the state-transition
systems is not supported**

---

## Changelog
- `Schedule::run()` updated to consult `Stepping` Resource to determine
which Systems to run each frame
- Added `Schedule.label` as a `BoxedSystemLabel`, along with supporting
`Schedule::set_label()` and `Schedule::label()` methods
- `Stepping` needed to know which `Schedule` was running, and prior to
this PR, `Schedule` didn't track its own label
- Would have preferred to add `Schedule::with_label()` and remove
`Schedule::new()`, but this PR touches enough already
- Added calls to `Schedule.set_label()` to `App` and `World` as needed
- Added `Stepping` resource
- Added `Stepping::begin_frame()` system to `MainSchedulePlugin`
    - Run before `Main::run_main()`
    - Notifies any `Stepping` Resource a new render frame is starting
    
## Migration Guide
- Add a call to `Schedule::set_label()` for any custom `Schedule`
    - This is only required if the `Schedule` will be stepped

---------

Co-authored-by: Carter Anderson <[email protected]>
tjamaan pushed a commit to tjamaan/bevy that referenced this pull request Feb 6, 2024
# Objective

Add interactive system debugging capabilities to bevy, providing
step/break/continue style capabilities to running system schedules.

* Original implementation: bevyengine#8063
    - `ignore_stepping()` everywhere was too much complexity
* Schedule-config & Resource discussion: bevyengine#8168
    - Decided on selective adding of Schedules & Resource-based control

## Solution
Created `Stepping` Resource. This resource can be used to enable
stepping on a per-schedule basis. Systems within schedules can be
individually configured to:
* AlwaysRun: Ignore any stepping state and run every frame
* NeverRun: Never run while stepping is enabled
    - this allows for disabling of systems while debugging
* Break: If we're running the full frame, stop before this system is run

Stepping provides two modes of execution that reflect traditional
debuggers:
* Step-based: Only execute one system at a time
* Continue/Break: Run all systems, but stop before running a system
marked as Break

### Demo

https://user-images.githubusercontent.com/857742/233630981-99f3bbda-9ca6-4cc4-a00f-171c4946dc47.mov

Breakout has been modified to use Stepping. The game runs normally for a
couple of seconds, then stepping is enabled and the game appears to
pause. A list of Schedules & Systems appears with a cursor at the first
System in the list. The demo then steps forward full frames using the
spacebar until the ball is about to hit a brick. Then we step system by
system as the ball impacts a brick, showing the cursor moving through
the individual systems. Finally the demo switches back to frame stepping
as the ball changes course.


### Limitations
Due to architectural constraints in bevy, there are some cases systems
stepping will not function as a user would expect.

#### Event-driven systems
Stepping does not support systems that are driven by `Event`s as events
are flushed after 1-2 frames. Although game systems are not running
while stepping, ignored systems are still running every frame, so events
will be flushed.

This presents to the user as stepping the event-driven system never
executes the system. It does execute, but the events have already been
flushed.

This can be resolved by changing event handling to use a buffer for
events, and only dropping an event once all readers have read it.

The work-around to allow these systems to properly execute during
stepping is to have them ignore stepping:
`app.add_systems(event_driven_system.ignore_stepping())`. This was done
in the breakout example to ensure sound played even while stepping.

#### Conditional Systems
When a system is stepped, it is given an opportunity to run. If the
conditions of the system say it should not run, it will not.

Similar to Event-driven systems, if a system is conditional, and that
condition is only true for a very small time window, then stepping the
system may not execute the system. This includes depending on any sort
of external clock.

This exhibits to the user as the system not always running when it is
stepped.

A solution to this limitation is to ensure any conditions are consistent
while stepping is enabled. For example, all systems that modify any
state the condition uses should also enable stepping.

#### State-transition Systems
Stepping is configured on the per-`Schedule` level, requiring the user
to have a `ScheduleLabel`.

To support state-transition systems, bevy generates needed schedules
dynamically. Currently it’s very difficult (if not impossible, I haven’t
verified) for the user to get the labels for these schedules.

Without ready access to the dynamically generated schedules, and a
resolution for the `Event` lifetime, **stepping of the state-transition
systems is not supported**

---

## Changelog
- `Schedule::run()` updated to consult `Stepping` Resource to determine
which Systems to run each frame
- Added `Schedule.label` as a `BoxedSystemLabel`, along with supporting
`Schedule::set_label()` and `Schedule::label()` methods
- `Stepping` needed to know which `Schedule` was running, and prior to
this PR, `Schedule` didn't track its own label
- Would have preferred to add `Schedule::with_label()` and remove
`Schedule::new()`, but this PR touches enough already
- Added calls to `Schedule.set_label()` to `App` and `World` as needed
- Added `Stepping` resource
- Added `Stepping::begin_frame()` system to `MainSchedulePlugin`
    - Run before `Main::run_main()`
    - Notifies any `Stepping` Resource a new render frame is starting
    
## Migration Guide
- Add a call to `Schedule::set_label()` for any custom `Schedule`
    - This is only required if the `Schedule` will be stepped

---------

Co-authored-by: Carter Anderson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants