
Greedy Stageless #34

Closed
wants to merge 1 commit into from

Conversation


@hymm hymm commented Oct 6, 2021

This is a proposal for an API and algorithm for a stageless scheduler.

Rendered


hymm commented Oct 6, 2021

I need to read more code to resolve some of the remaining questions in the PR, but it feels like it's in a decent enough state to get feedback on.

@alice-i-cecile alice-i-cecile self-assigned this Oct 6, 2021

maniwani commented Oct 7, 2021

Stages tend to be a very heavy abstraction, which requires reasoning about the global behavior of your game and grouping systems not necessarily in the most logical way, but in a way that tries to maximize parallelism. This is because systems that are not in the same stage cannot be run in parallel. The possible parallelism of the program is hard for the user to reason about. Instead, the user should only be thinking about the data flow through the program.

I sort of disagree with your logic here. It's wondering "What systems can run in parallel?" that's confusing and why we have a fancy executor. Stages are simple to reason about. All there is to know is that non-exclusive systems in a stage can run in parallel but systems in different stages can't. The move to dismantle stages and expose sync points as their own thing is so users can be more deliberate about it.

AFAIK the current scheduling algorithm works fine. "Stageless" is mostly about having a single executor that owns all systems, because being able to evaluate all system dependencies opens up more chances to run them in parallel. Stage has to be dismantled for this to happen, but we can still have stages.

Stages will always exist whether or not we choose to represent them. At minimum, a stage is just a label for a pair of sync points. The problem right now is that Stage is a lot more than just a label.

IMO not having them in some form is what will make things more confusing.

Currently bevy evaluates all run criteria once before any systems are run and again after all parallel systems run. The proposed loop does not have an equivalent to this as parallel systems are split up between hard sync points.

A problem with looping criteria that will need to be solved is that they will need to work across hard sync points.

A possible alternative for adding hard sync points would be to explicitly create named sync points, which systems can then be placed before or after. This would be much closer to how stages currently work, but would make adding sync points much easier than stages do. However, I don't think this is a good idea. While more flexible than stages, it still carries the same cognitive load of deciding which logical division of your program unrelated systems should be grouped into to maximize parallelism.

These are all big reasons why I think we still need stages. Transitions (especially loops) only make sense in the context of stages. Run criteria are the same IMO. The best time to evaluate them is at the beginning of a stage.

Stages just being labels for pairs of sync points (that can be non-adjacent) is a solution worth exploring here.

Run Criteria and Looping Criteria

I agree that run criteria have too many responsibilities and that transitions need to be their own thing. I just wanted to say that loops aren't special. A loop is a specific kind of transition, but all transitions are the same. You're queuing up the next stage to run after the current one finishes. Loops just so happen to queue the same stage.

The problem right now is that we're pretending things like looping, state transitions, etc. are different. AFAIK we just need one transition mechanism for everything and we're good. Then the remaining problem is "What do you do when more than one transition was queued?"
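
A rough sketch of what a single transition mechanism could look like; nothing here is an actual Bevy type, and the names are made up purely for illustration:

```rs
struct StageLabel(&'static str);

// One mechanism covers state changes, one-shot jumps, and loops: a transition
// just names the stage to queue after the current one finishes, and a loop
// happens to queue the current stage again.
enum Transition {
    Queue(StageLabel),
    Loop,
    None,
}
```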

Scheduling when to apply commands

If we can somehow make the automatic command processing order deterministic by default, that'd be best. I think everyone would prefer if they didn't have to specify that. You normally wouldn't care which permutation it is, just that it's the same on all computers.


- If _parallel system A_ depends on _parallel system B_ it can be queued to run in a step because parallel system B will be able to run in the current parallel step.
- If _parallel system A_ depends on _exclusive system B_ that is a direct blocking dependency and cannot run in the current parallel step. It has to wait until system B runs in some exclusive step.
- It gets a little trickier because if a system has a dependency on a system that has a blocking dependency, it will also be blocked. i.e. If _parallel system C_ depends on _parallel system A_ which has a dependency on _exclusive system B_, it also cannot be queued to run, because A is blocked by B which blocks C from running. In this scenario a system A will move from a blocking dependency to a non-blocking dependency once C is run.
@Weibye Weibye Oct 7, 2021

In this scenario a system A will move from a blocking dependency to a non-blocking dependency once C is run.

Are B and C switched in this sentence?

In this scenario a system A will move from a blocking dependency to a non-blocking dependency once B is run.

@hymm (Author)

Yeah, it should probably read

non-blocking dependency for C once B is run.
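
For reference, here is a rough sketch of the blocking rule described in the excerpt above, with the correction applied. The types and layout are made up purely for illustration; this is not the RFC's actual implementation.

```rs
// A parallel system cannot be queued in the current parallel step if it
// depends, directly or transitively, on an exclusive system that has not yet
// run. Once that exclusive system runs, the dependency becomes non-blocking.
#[derive(Clone, Copy)]
enum Kind {
    Parallel,
    Exclusive,
}

struct SystemNode {
    kind: Kind,
    has_run: bool,
    dependencies: Vec<usize>, // indices of systems this one must run after
}

/// True if the system at `id` is blocked from the current parallel step.
fn is_blocked(id: usize, systems: &[SystemNode]) -> bool {
    systems[id].dependencies.iter().any(|&dep| {
        let node = &systems[dep];
        match node.kind {
            Kind::Exclusive => !node.has_run,           // direct blocking dependency (A after B)
            Kind::Parallel => is_blocked(dep, systems), // blocking propagates (C after A after B)
        }
    })
}
```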

@Nilirad Nilirad left a comment

I like it 👍🏻
Added some comments that may improve the RFC.


Systems with commands have two parts: a parallel part that the user writes, which can run in parallel with other systems, and a command queue that needs to be applied to the world during a hard sync point. Since these happen at different times, they should be scheduled separately. This is done through three new system descriptor builder functions.

- `.commands_label(Label)` gives the point where the command queue is applied a `Label` which other systems can be scheduled around.

Idea: instead of a manually created commands_label, I wonder if it's possible to generate it automatically and access it as a method of SystemLabel.

e.g.

```rs
App::new()
    .add_system(systemA.label("System A"))
    .add_system(systemB.after("System A".into().commands()));

// Alternatively, using #[derive(SystemLabel)]:

#[derive(SystemLabel, ...)]
enum Systems {
    A,
    ...
}

App::new()
    .add_system(systemA.label(Systems::A))
    .add_system(systemB.after(Systems::A::commands()));
```

@hymm hymm (Author) Oct 7, 2021

edit: thought about this some more and disliked what my original comment said.

It would be nice to have one less thing to come up with a label for, but I'm not feeling great about needing the .into to get to the commands. Not sure which I prefer.


not feeling great about needing .into to get to the commands. Not sure which I prefer.

That was just for illustrative purposes. It surely needs some bikeshedding. The idea was to use .into to make it a SystemLabel and then access its associated command label from there. Obviously there is a better approach to this.

@Nilirad Nilirad Oct 8, 2021

After some reasoning, I think manual labeling through commands_label is still needed in some cases. The most obvious example is where you give two or more command queues the same label in order to flush them at the same sync point. I still think a form of default labeling could be made, though.

@hymm (Author)

That makes sense. I'll add default labeling to the future work section.


### Run Criteria

Run criteria determine whether or not a system will run. They are now separated from LoopingCriteria. A RunCriteria system is a system that returns a bool: true means the dependent system runs, false means it does not. Systems without run criteria will be run once per tick.
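
A minimal illustration of that shape, assuming the usual bevy prelude import; the `Paused` resource and the system name are made up and not part of the RFC:

```rs
use bevy::prelude::*;

// Hypothetical resource used only for this example.
struct Paused(bool);

// A run criteria under this proposal is just a system that returns bool:
// true means the dependent system runs this tick, false means it is skipped.
fn not_paused(paused: Res<Paused>) -> bool {
    !paused.0
}
```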

Compatible with #33 🙂

A looping criteria is a system that returns a bool. Returning true makes the system reevaluate its run criteria, and false means the system is done running for this tick. Adding looping criteria introduces an implicit requirement for a hard sync. The hard sync is required to reevaluate looping criteria and run criteria. Systems scheduled after a looping system will run after the looping criteria returns false.

```rs
fn loop_twice(loop_count: Local<int32>) {
```

i32, not int32. Maybe usize is preferable in this case.

You also missed -> bool
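
A possible corrected version with those fixes applied; the body is only a guess, since the original snippet is truncated here:

```rs
fn loop_twice(mut loop_count: Local<usize>) -> bool {
    *loop_count += 1;
    if *loop_count < 2 {
        true // loop again: run criteria are reevaluated and the system can run once more
    } else {
        *loop_count = 0; // reset for the next tick
        false
    }
}
```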

Comment on lines +194 to +195
- Mapping the current stages onto the new method may get confusing. If a system is in PREUPDATE and another system is in POSTUPDATE, what is the relationship between the two? Are they related at all? Are they implicitly depending on something happening in UPDATE? It will take some careful thought to correctly map the current stages and systems to this new API.

A no-brainer solution would be:

PREUPDATE -> PREUPDATE(commands) -> UPDATE -> UPDATE(commands) -> POSTUPDATE

Maybe not the best solution, but the behavior of the three systems should not change.

@hymm (Author)

I might need more words here. This is about taking the existing systems added by the default plugins and mapping them to the new ordering API without stages, where PREUPDATE, UPDATE, and POSTUPDATE no longer exist. These systems currently have an explicit ordering defined by the ordering of the stages, but that ordering has no direct equivalent in the new API. Ideally we'd reason about the relationships between the data used by the current systems, but because the ordering defined by stages doesn't really encode this, it will be a very manual process. We could just throw them into system sets that mirror the current stages, but that won't maximize the benefits of the new API.


hymm commented Oct 7, 2021

@maniwani Thanks for the response. This is really good feedback. You've given me a lot to think about and revise.

For the motivation section, I'm not sure how much we're disagreeing about what stages should be replaced with versus how much I failed at writing it. I probably need an example of what the new scheduler APIs would look like for a toy game.

So to me, stages are an overloaded abstraction that mix a bunch of different concerns. They cover ordering systems across hard sync points, grouping parallel systems, and transitions between states. By removing explicit stages we give the responsibility of grouping parallel systems completely to the executor. This RFC is currently really only a solution to ordering systems across hard sync points and a proposal for how the executor needs to change to account for that. For transitions between states I sort of hand-waved it away as being a problem for the states RFC, but your comments are making me reconsider. A full stageless proposal does need to give a solution to that.

My argument at the end of the motivation section was trying to argue against the explicit grouping of systems between sync points. Not sure if this example makes things more clear, but I'll try. Say you're adding some new feature to audio and realize you need a hard sync point between some of the audio systems, which run in POSTUPDATE. So you make a POSTPOSTUPDATE stage and move the required audio systems to that stage. Now you have your parent update system and your transform propagation system still in POSTUPDATE, but if you move the transform propagation system to POSTPOSTUPDATE will you get better parallelism? By forcing all sync points to be declared explicitly, the user always has to reason about which parallelism grouping to put their systems into. Removing the explicitness lets the executor reason about it.

Transitions (especially loops) only make sense in the context of stages. Run criteria are the same IMO. The best time to evaluate them is at the beginning of a stage.

I did struggle with when to evaluate criteria, which is why it still exists as a question in the RFC. It's very possible that my solution isn't the best. Might need to give it more thought in relation to states.

Stages just being labels for pairs of sync points (that can be non-adjacent) is a solution worth exploring here.

It's possible you do need explicit labels for transition points, but I think you might be able to get away with using a redesign of states and system sets for this. I don't have a fully thought-out solution, though, so it might not work. That said, this proposal is compatible with adding explicitly labeled hard sync (transition?) points if necessary.

A loop is a specific kind of transition, but all transitions are the same.

The point that looping criteria and states are related is really interesting. Feels like they should be combined somehow...

make the automatic command processing order deterministic by default

I'm not sure how you can get deterministic application of commands without being explicit. i.e. If system A and B can be run in parallel, but system A's commands need to run before system B's.

@alice-i-cecile (Member)

I'm not sure how you can get deterministic application of commands without being explicit. i.e. If system A and B can be run in parallel, but system A's commands need to run before system B's.

The before_if_needed abstraction may be able to help us out here IMO. We could comfortably overload this to order any commands produced, even if the systems themselves are parallelizable.


maniwani commented Oct 7, 2021

I'm not sure how much we're disagreeing about what stages should be replaced with.

To me, stages are an overloaded abstraction that mix a bunch of different concerns. They cover ordering systems across hard sync points, grouping parallel systems, and transitions between states. By removing explicit stages we give the responsibility of grouping parallel systems completely to the executor.

My point there was that having stages and having a single executor are not mutually exclusive. I agree that Stage is an overloaded type, but the concept of a "stage" is still the most intuitive way of organizing systems IMO. Like, the specific issue with the current implementation is that each Stage holds its own systems and its own executor.

I think we still need stages, by which I mean an explicit thing for grouping systems between two sync points. I don't see how you can implicitly define behavior that spans across sync points.

My argument at the end of the motivation section was trying to argue against the explicit grouping of systems between sync points. Not sure if this example makes things more clear, but I'll try. Say you're adding some new feature to audio and realize you need a hard sync point between some of the audio systems, which run in POSTUPDATE. So you make a POSTPOSTUPDATE stage and move the required audio systems to that stage. Now you have your parent update system and your transform propagation system still in POSTUPDATE, but if you move the transform propagation system to POSTPOSTUPDATE will you get better parallelism? By forcing all sync points to be declared explicitly, the user always has to reason about which parallelism grouping to put their systems into. Removing the explicitness lets the executor reason about it.

So if I understand correctly, you're saying, "When I need to split a stage, which systems should go where?"

In your example, only the dependencies between the audio systems changed, so it doesn't matter if the transform systems go in PostUpdate or PostPostUpdate. They can still run in parallel with either set of audio systems (assuming they could before).

"How can I do more in parallel?" is a design question. No matter how you implement it, the scheduler can't add concurrency that doesn't exist. It can only make the best of the schedule it's given. If you decide that commands from system A need to be applied before some other system B runs, then that's that.

I agree that the ergonomics in that situation are pretty bad, though. I think that could be improved a bit with stage hierarchies. (In general, I think most issues with stages can be solved by making stages hierarchical.)

E.g. (everything here would become a stage, Frame would just loop)

Main
- Startup
- Frame
  - First
  - PreUpdate
  - Update
  - PostUpdate
    - Audio
    - Anonymous child stage (holds your other audio systems and your transform systems)
  - Render
  - Last
- Shutdown

It's possible you do need explicit labels for transition points, but I think you might be able to get away with using a redesign of states and system sets for this. I don't have a fully thought-out solution, though, so it might not work. That said, this proposal is compatible with adding explicitly labeled hard sync (transition?) points if necessary.

With transitions, the point I wanted to make is that it's not generally safe to make any sort of transition while there are still pending commands. So you can only safely "jump" from sync point to sync point, or rather from the end of one stage to the beginning of another (since all sync points belong to at least one stage implicitly).

States are basically just syntactic sugar. They let you "insert" stages in an ad-hoc way, but you can mentally flatten the "state stack" back into your main schedule.


hymm commented Oct 8, 2021

E.g. (everything here would become a stage, Frame would just loop)

So this is how I would model your stages with the API I'm thinking about, with the caveat that I don't really have a design yet for states or nested system sets.

```rs
enum AppState {
    Idle,
    Running,
    Done,
}

fn main() {
    App::new()
        .add_state(AppState::Idle)
        .add_system(transition_to_running)
        .add_system_set(
            SystemSet::new()
                .label("Startup")
                .with_run_criteria(AppState::Running::on_enter)
        )
        .add_system_set(
            SystemSet::new()
                .label("Frame")
                .with_run_criteria(AppState::Running::on_update)
                .add_system_set(
                    SystemSet::new().commands_label("First")
                )
                .add_system_set(
                    SystemSet::new().commands_label("PreUpdate").after("First")
                )
                // Update, Render, and Last will be similar to above
                .add_system_set(
                    SystemSet::new()
                        .label("PostUpdate")
                        .after("Update")
                        .add_system_set(
                            SystemSet::new()
                                .label("Audio")
                                .add_system(
                                    audio_system_a.commands_label("audio a commands")
                                )
                                .add_system(
                                    audio_system_b.after("audio a commands")
                                )
                        )
                        .add_system_set(
                            SystemSet::new()
                                .label("Transform")
                                .add_system(
                                    parent_update.label("parent update")
                                )
                                .add_system(
                                    transform_propagation.after("parent update")
                                )
                        )
                )
        )
        .add_system_set(
            SystemSet::new()
                .label("Shutdown")
                .with_run_criteria(AppState::Running::on_exit)
        );
}
```

So adding a hard sync point here between audio_system_a and audio_system_b would just be changing from .label to .commands_label.

As an aside, I wouldn't organize things like this. I would just have Audio and Transform inside of Frame and schedule these system sets against the commands labels for things like Simulation, Physics, and Input. Also, there are probably cleaner possible APIs with things like the SystemGraph PR.

so it doesn't matter if the transform systems go in PostUpdate or PostPostUpdate.

"How can I do more in parallel?" is a design question.

You can also split them up so half of them are in PostUpdate and the other half are in PostPostUpdate. In this example it probably doesn't matter, but in practice this could lead to better parallelism. I really wish I had a better example, but my point was that the user shouldn't think about this at all, whether it's in the design phase or while coding. Changing things about the audio system should have no effect on how you think about transform systems.

With transitions, the point I wanted to make is that it's not generally safe to make any sort of transition while there are still pending commands.

That makes a lot of sense. It means that updating state, run criteria, and looping criteria should be done between the run of exclusive systems and parallel systems in my 2-step loop.


maniwani commented Oct 8, 2021

You can also split them up so half of them are in PostUpdate and the other half are in PostPostUpdate. In this example it probably doesn't matter, but in practice this could lead to better parallelism.

Whether systems can execute in parallel or not is only based on dependencies. If you have the option to put systems in one stage or the other, then both choices must be equally parallelizable. There won't be a difference.

I really wish I had a better example, but my point was that the user shouldn't think about this at all, whether it's in the design phase or while coding. Changing things about the audio system should have no effect on how you think about transform systems.

Since they have to label everything, users should have at least a rough idea of which systems of theirs can run concurrently. The scheduler can't pull concurrency out of thin air. I think your example does highlight poor ergonomics, since the API presents a choice that doesn't actually change anything, but it isn't fair to say that you have to think about the unrelated systems differently.


hymm commented Oct 8, 2021

Had a conversation with Joy here on Discord: https://discord.com/channels/691052431525675048/895438993670410260/896074416436412456

We were able to agree that we're mostly just disagreeing on the public API for ordering systems. Where this RFC implicitly schedules sync points, Joy might prefer a more explicit API where you insert systems into hierarchical stages. Expect a competing RFC for that in the future.

@alice-i-cecile (Member)

We were able to agree that we're mostly just disagreeing on the public API for ordering systems. Where this RFC implicitly schedules sync points, Joy might prefer a more explicit API where you insert systems into hierarchical stages. Expect a competing RFC for that in the future.

I'm largely in agreement with the things said in that conversation :) My suspicion is that we're likely going to need to end up synthesizing the two perspectives, and offering an API to handle both styles.


tigregalis commented Oct 23, 2021

How might a scenario like this be addressed by this RFC? Each of these "system sets" should be able to run in parallel.

  1. Physics: fixed timestep, runs multiple times per frame; the complete game simulation. Things can get spawned and despawned, and the entire simulation loops within the same frame. Entities are mutually exclusive to those of UI and possibly AI.
  2. UI: runs once per render frame. Entities are mutually exclusive to those of Physics and AI.
  3. AI: long-running tasks or expensive computations such as pathfinding; may be initiated in a past frame and resolved in a future frame (e.g. on a secondary thread). Entities could be mutually exclusive to those of UI and possibly Physics.


maniwani commented Oct 24, 2021

Physics, UI, and AI should be able to run in parallel.

It isn't that simple. While they might not have immediate data dependencies, physics and UI systems usually have to maintain a causal relationship. E.g. you want UI to read the most up-to-date component data or you have UI systems that queue events that cause something to happen in physics.

Long running tasks or expensive computations [such as pathfinding and AI] could be initiated in one frame and blocked in a later one.

Async tasks are pretty much the only solution for long-running computations. Spawn a task, copy/clone the data you need into it (to avoid race conditions), and then have some system poll it for completion later.

(Edit: There's also dividing the amount of work into batches that can be completed each frame).
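
A hedged sketch of that spawn-then-poll pattern. Task-pool APIs differ between Bevy versions, and `PathRequest`, `PathResult`, and `expensive_pathfinding` are made-up names for illustration:

```rs
use bevy::prelude::*;
use bevy::tasks::{AsyncComputeTaskPool, Task};
use futures_lite::future;

struct PathRequest { start: Vec2, goal: Vec2 } // hypothetical input data
struct PathResult(Vec<Vec2>);                  // hypothetical result component

fn start_pathfinding(
    mut commands: Commands,
    pool: Res<AsyncComputeTaskPool>,
    requests: Query<(Entity, &PathRequest)>,
) {
    for (entity, request) in requests.iter() {
        // Copy the inputs so the task owns its data and never touches the World.
        let (start, goal) = (request.start, request.goal);
        let task: Task<Vec<Vec2>> = pool.spawn(async move { expensive_pathfinding(start, goal) });
        commands.entity(entity).insert(task).remove::<PathRequest>();
    }
}

fn poll_pathfinding(
    mut commands: Commands,
    mut in_flight: Query<(Entity, &mut Task<Vec<Vec2>>)>,
) {
    for (entity, mut task) in in_flight.iter_mut() {
        // Non-blocking check; the result may only arrive several frames later.
        if let Some(path) = future::block_on(future::poll_once(&mut *task)) {
            commands
                .entity(entity)
                .insert(PathResult(path))
                .remove::<Task<Vec<Vec2>>>();
        }
    }
}

fn expensive_pathfinding(_start: Vec2, _goal: Vec2) -> Vec<Vec2> {
    Vec::new() // placeholder for the long-running computation
}
```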

Threads should be reserved for things that can always be doing useful work or need rapid polling like rendering, inputs, and audio.

entities that could be mutually exclusive to that of UI and possibly physics

Data dependencies are defined in terms of component access. You'd need multiple worlds to isolate entities.

@alice-i-cecile (Member)

Closing out in favor of #45 for clarity.
