[Merged by Bors] - Split Component Ticks #6547

james7132 · 2022-11-11T09:57:20Z

Objective

Fixes #4884. ComponentTicks stores both added and changed ticks contiguously in the same 8 bytes. This is convenient when passing around both together, but causes half the bytes fetched from memory for the purposes of change detection to effectively go unused. This is inefficient when most queries (no filter, mutating something) only write out to the changed ticks.

Solution

Split the storage for change detection ticks into two separate Vecs inside Column. Fetch only what is needed during iteration.

This also potentially removes one blocker to autovectorization of dense queries.

EDIT: This is confirmed to enable autovectorization of dense queries in for_each and par_for_each where possible. Unfortunately iter has other blockers that prevent it.
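
To make the layout change concrete, here is a minimal sketch of the two storage shapes. `PairedTicks` and `SplitColumn` are hypothetical stand-ins written for this illustration, not Bevy's actual `ComponentTicks` or `Column`; the only thing taken from the PR is the idea of one 8-byte pair per row versus two parallel `Vec`s.

```rust
/// Interleaved layout (before): one 8-byte pair per row.
/// Hypothetical stand-in for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PairedTicks {
    pub added: u32,
    pub changed: u32,
}

/// Split layout (after): two parallel vectors indexed by row.
/// Hypothetical stand-in, not Bevy's real `Column`.
#[derive(Default)]
pub struct SplitColumn {
    pub added_ticks: Vec<u32>,
    pub changed_ticks: Vec<u32>,
}

impl SplitColumn {
    /// Appends a row whose added and changed ticks both start at `tick`.
    pub fn push(&mut self, tick: u32) {
        self.added_ticks.push(tick);
        self.changed_ticks.push(tick);
    }

    /// Marks every row as changed, touching only the `changed_ticks` buffer.
    pub fn mark_all_changed(&mut self, tick: u32) {
        for changed in &mut self.changed_ticks {
            *changed = tick;
        }
    }
}

/// The same operation on the interleaved layout: every store lands in the
/// middle of an 8-byte pair, so the unused `added` half shares each cache line.
pub fn mark_all_changed_paired(ticks: &mut [PairedTicks], tick: u32) {
    for t in ticks.iter_mut() {
        t.changed = tick;
    }
}
```

In the split version the write loop walks a plain contiguous `[u32]`, which is the shape a mutating query with no filter needs.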

TODO

Microbenchmark
Check if this allows query iteration to autovectorize simple loops.
Clean up all of the spurious tuples now littered throughout the API

Open Questions

~~Is Mut::is_added absolutely necessary? Can we not just use Added or ChangeTrackers?~~ It's optimized out if unused.
~~Does the fetch of the added ticks get optimized out if not used?~~ Yes it is.

Changelog

Added: Tick, a wrapper around a single change detection tick.
Added: Column::get_added_ticks
Added: Column::get_column_ticks
Added: SparseSet::get_added_ticks
Added: SparseSet::get_column_ticks
Changed: Column now stores added and changed ticks separately internally.
Changed: Most APIs returning &UnsafeCell<ComponentTicks> now return TickCells instead, which contains two separate &UnsafeCell<Tick>, one for each component tick.
Changed: Query::for_each(_mut), Query::par_for_each(_mut) will now leverage autovectorization to speed up query iteration where possible.

Migration Guide

Various low-level APIs interacting with the change detection ticks no longer return &UnsafeCell<ComponentTicks>. They now return TickCells, which contains two separate &UnsafeCell<Tick>s.

// 0.9
column.get_ticks(row).deref().changed

// 0.10
column.get_ticks(row).changed.deref()
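
For readers who want to see the shape of the new return type, here is a rough sketch. All types below are local stand-ins that only borrow the names `Tick` and `TickCells` from the changelog; the field names, the `tick_cells` accessor, and the helper functions are assumptions made for this example and should not be read as Bevy's real signatures.

```rust
use std::cell::UnsafeCell;

/// Stand-in for a single change detection tick.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tick {
    pub tick: u32,
}

/// Stand-in for the pair of cell references handed out per row.
#[derive(Clone, Copy)]
pub struct TickCells<'a> {
    pub added: &'a UnsafeCell<Tick>,
    pub changed: &'a UnsafeCell<Tick>,
}

/// Stand-in column that stores the two kinds of tick separately.
#[derive(Default)]
pub struct Column {
    added_ticks: Vec<UnsafeCell<Tick>>,
    changed_ticks: Vec<UnsafeCell<Tick>>,
}

impl Column {
    pub fn push(&mut self, tick: u32) {
        self.added_ticks.push(UnsafeCell::new(Tick { tick }));
        self.changed_ticks.push(UnsafeCell::new(Tick { tick }));
    }

    /// Hypothetical accessor: returns `None` when `row` is out of bounds.
    pub fn tick_cells(&self, row: usize) -> Option<TickCells<'_>> {
        Some(TickCells {
            added: self.added_ticks.get(row)?,
            changed: self.changed_ticks.get(row)?,
        })
    }
}

/// Reads the changed tick by value.
pub fn read_changed(cells: TickCells<'_>) -> u32 {
    // SAFETY: `UnsafeCell` is `!Sync`, so access is single-threaded, and the
    // copy finishes before returning (the same pattern `Cell::get` uses).
    unsafe { (*cells.changed.get()).tick }
}

/// Overwrites only the changed tick, leaving the added tick untouched.
pub fn set_changed(cells: TickCells<'_>, tick: u32) {
    // SAFETY: same reasoning as `read_changed`; no reference outlives the write.
    unsafe { (*cells.changed.get()).tick = tick }
}
```

The point of the sketch is that callers now pick the cell first and dereference second, which is the order swap shown in the 0.9 and 0.10 lines above.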

james7132 · 2022-11-11T10:13:07Z

Quick answers:

Bad news: No, this does not allow auto-vectorization.
Good news: the added ticks are not included in the final output if unused.

Confirmed by checking the same test code's output used in #6461. It's identical save for the offset calculation on the movl instruction:

This PR

.LBB1_5:
	movss	(%rdi,%rbx,4), %xmm0
	movl	%r8d, (%rsi,%rbx,4)
	addss	(%r11,%rbx,4), %xmm0
	movss	%xmm0, (%r11,%rbx,4)
	incq	%rbx
.LBB1_1:
	cmpq	%rax, %rbx
	jne	.LBB1_5
	.p2align	4, 0x90

The output from #6461

.LBB5_5:
	movss	(%rbx,%rbp,4), %xmm0
	movl	%r14d, 4(%r8,%rbp,8)
	addss	(%rdi,%rbp,4), %xmm0
	movss	%xmm0, (%rdi,%rbp,4)
	incq	%rbp
.LBB5_1:
	cmpq	%rdx, %rbp
	jne	.LBB5_5
	.p2align	4, 0x90
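
The two `movl` operands differ only in how the address of the changed tick is formed: `4(%r8,%rbp,8)` is base + index * 8 + 4, while `(%rsi,%rbx,4)` is base + index * 4. A small sketch of that arithmetic follows. The `#[repr(C)]` struct is an assumption made so the field order is fixed for the example; it is not a claim about how Bevy declares `ComponentTicks`.

```rust
use std::mem::size_of;

/// Stand-in for the interleaved pair. `repr(C)` pins `changed` at byte 4,
/// matching the displacement seen in the #6461 listing.
#[repr(C)]
pub struct ComponentTicks {
    pub added: u32,
    pub changed: u32,
}

/// Byte offset of row `row`'s changed tick in an interleaved buffer:
/// stride 8, plus 4 to skip the `added` half.
pub fn interleaved_changed_offset(row: usize) -> usize {
    row * size_of::<ComponentTicks>() + size_of::<u32>()
}

/// Byte offset of row `row`'s changed tick in a split buffer: stride 4.
pub fn split_changed_offset(row: usize) -> usize {
    row * size_of::<u32>()
}
```

For row 3, the interleaved offset is 28 bytes and the split offset is 12 bytes, so the split buffer packs twice as many changed ticks into each cache line.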

james7132 · 2022-11-11T13:56:13Z

Finalized a set of microbenchmarks. Looks like there's a massive improvement in iteration times where there are mutable queries, slight improvements in overall change detection, and small regressions in commands.

group                                                             main                                     split-ticks
-----                                                             ----                                     -----------
add_remove/sparse_set                                             1.00   995.6±74.34µs        ? ?/sec      1.00   998.1±62.41µs        ? ?/sec
add_remove/table                                                  1.00  1260.4±10.96µs        ? ?/sec      1.08  1367.5±26.72µs        ? ?/sec
add_remove_big/sparse_set                                         1.02  1181.0±319.71µs        ? ?/sec     1.00  1158.4±281.59µs        ? ?/sec
add_remove_big/table                                              1.00      2.4±0.03ms        ? ?/sec      1.12      2.7±0.03ms        ? ?/sec
all_added_detection/50000_entities_change_detection::Sparse       1.00   129.8±15.35µs        ? ?/sec      1.00   129.4±22.14µs        ? ?/sec
all_added_detection/50000_entities_change_detection::Table        1.00   101.4±18.29µs        ? ?/sec      1.13   114.1±18.04µs        ? ?/sec
all_added_detection/60000_entities_change_detection::Sparse       1.23   237.5±29.61µs        ? ?/sec      1.00   193.6±46.10µs        ? ?/sec
all_added_detection/60000_entities_change_detection::Table        1.00    144.1±8.31µs        ? ?/sec      1.06    152.8±4.73µs        ? ?/sec
all_changed_detection/50000_entities_change_detection::Sparse     1.01   135.0±31.82µs        ? ?/sec      1.00   133.8±30.77µs        ? ?/sec
all_changed_detection/50000_entities_change_detection::Table      1.03     97.1±4.01µs        ? ?/sec      1.00     93.9±4.29µs        ? ?/sec
all_changed_detection/60000_entities_change_detection::Sparse     1.06   187.0±35.59µs        ? ?/sec      1.00   177.1±31.78µs        ? ?/sec
all_changed_detection/60000_entities_change_detection::Table      1.00    138.4±3.56µs        ? ?/sec      1.00    138.5±3.66µs        ? ?/sec
busy_systems/01x_entities_03_systems                              1.32     31.6±0.88µs        ? ?/sec      1.00     24.0±0.91µs        ? ?/sec
busy_systems/01x_entities_06_systems                              1.25     70.6±3.67µs        ? ?/sec      1.00     56.6±4.68µs        ? ?/sec
busy_systems/01x_entities_09_systems                              1.56    127.0±7.21µs        ? ?/sec      1.00     81.5±2.37µs        ? ?/sec
busy_systems/01x_entities_12_systems                              1.57    153.1±6.28µs        ? ?/sec      1.00     97.7±2.90µs        ? ?/sec
busy_systems/01x_entities_15_systems                              1.60   186.4±12.49µs        ? ?/sec      1.00    116.8±3.90µs        ? ?/sec
busy_systems/02x_entities_03_systems                              1.53     73.2±4.99µs        ? ?/sec      1.00     47.8±2.15µs        ? ?/sec
busy_systems/02x_entities_06_systems                              1.95   163.1±14.54µs        ? ?/sec      1.00     83.6±4.61µs        ? ?/sec
busy_systems/02x_entities_09_systems                              1.53   207.6±16.86µs        ? ?/sec      1.00    135.9±6.00µs        ? ?/sec
busy_systems/02x_entities_12_systems                              1.74   287.4±16.27µs        ? ?/sec      1.00    165.1±4.98µs        ? ?/sec
busy_systems/02x_entities_15_systems                              1.69   329.7±14.64µs        ? ?/sec      1.00    195.3±5.06µs        ? ?/sec
busy_systems/03x_entities_03_systems                              1.48    101.5±6.28µs        ? ?/sec      1.00     68.7±2.51µs        ? ?/sec
busy_systems/03x_entities_06_systems                              1.59   205.8±15.66µs        ? ?/sec      1.00    129.4±8.53µs        ? ?/sec
busy_systems/03x_entities_09_systems                              1.49   308.8±15.80µs        ? ?/sec      1.00   207.7±11.45µs        ? ?/sec
busy_systems/03x_entities_12_systems                              1.68   405.0±16.62µs        ? ?/sec      1.00   240.8±10.02µs        ? ?/sec
busy_systems/03x_entities_15_systems                              1.75   514.0±15.58µs        ? ?/sec      1.00    294.0±9.37µs        ? ?/sec
busy_systems/04x_entities_03_systems                              1.69    122.3±6.27µs        ? ?/sec      1.00     72.2±3.99µs        ? ?/sec
busy_systems/04x_entities_06_systems                              1.49   305.5±18.09µs        ? ?/sec      1.00   205.1±13.88µs        ? ?/sec
busy_systems/04x_entities_09_systems                              2.06   423.4±20.03µs        ? ?/sec      1.00    205.1±7.81µs        ? ?/sec
busy_systems/04x_entities_12_systems                              1.67   537.6±23.43µs        ? ?/sec      1.00   321.1±16.07µs        ? ?/sec
busy_systems/04x_entities_15_systems                              1.85   669.0±21.67µs        ? ?/sec      1.00   360.7±13.11µs        ? ?/sec
busy_systems/05x_entities_03_systems                              1.36   147.0±10.16µs        ? ?/sec      1.00    107.9±6.56µs        ? ?/sec
busy_systems/05x_entities_06_systems                              1.57   374.1±23.16µs        ? ?/sec      1.00   238.2±14.32µs        ? ?/sec
busy_systems/05x_entities_09_systems                              1.63   507.2±32.37µs        ? ?/sec      1.00   312.0±11.55µs        ? ?/sec
busy_systems/05x_entities_12_systems                              2.07   709.1±32.12µs        ? ?/sec      1.00   342.8±12.35µs        ? ?/sec
busy_systems/05x_entities_15_systems                              1.97   810.0±41.95µs        ? ?/sec      1.00   410.4±10.81µs        ? ?/sec
contrived/01x_entities_03_systems                                 1.37     27.4±0.55µs        ? ?/sec      1.00     19.9±0.55µs        ? ?/sec
contrived/01x_entities_06_systems                                 1.37     47.3±1.35µs        ? ?/sec      1.00     34.6±1.24µs        ? ?/sec
contrived/01x_entities_09_systems                                 1.43     71.5±1.43µs        ? ?/sec      1.00     50.1±1.07µs        ? ?/sec
contrived/01x_entities_12_systems                                 1.40     93.1±1.60µs        ? ?/sec      1.00     66.6±1.62µs        ? ?/sec
contrived/01x_entities_15_systems                                 1.44    117.1±1.94µs        ? ?/sec      1.00     81.1±2.39µs        ? ?/sec
contrived/02x_entities_03_systems                                 1.47     40.4±1.50µs        ? ?/sec      1.00     27.5±0.73µs        ? ?/sec
contrived/02x_entities_06_systems                                 1.31     79.2±2.25µs        ? ?/sec      1.00     60.6±1.45µs        ? ?/sec
contrived/02x_entities_09_systems                                 1.29    109.7±2.42µs        ? ?/sec      1.00     85.2±2.15µs        ? ?/sec
contrived/02x_entities_12_systems                                 1.28    145.2±3.18µs        ? ?/sec      1.00    113.2±1.40µs        ? ?/sec
contrived/02x_entities_15_systems                                 1.29    180.4±3.33µs        ? ?/sec      1.00    139.7±2.35µs        ? ?/sec
contrived/03x_entities_03_systems                                 1.19     56.7±2.67µs        ? ?/sec      1.00     47.7±0.88µs        ? ?/sec
contrived/03x_entities_06_systems                                 1.12    108.3±6.38µs        ? ?/sec      1.00     97.0±1.81µs        ? ?/sec
contrived/03x_entities_09_systems                                 1.12    157.2±3.58µs        ? ?/sec      1.00    139.7±3.22µs        ? ?/sec
contrived/03x_entities_12_systems                                 1.08    203.1±7.54µs        ? ?/sec      1.00    188.1±3.33µs        ? ?/sec
contrived/03x_entities_15_systems                                 1.12    261.2±4.55µs        ? ?/sec      1.00    233.2±4.01µs        ? ?/sec
contrived/04x_entities_03_systems                                 1.08     67.3±1.03µs        ? ?/sec      1.00     62.6±0.89µs        ? ?/sec
contrived/04x_entities_06_systems                                 1.11    140.5±2.89µs        ? ?/sec      1.00    126.2±2.04µs        ? ?/sec
contrived/04x_entities_09_systems                                 1.13    196.7±5.23µs        ? ?/sec      1.00    174.0±2.10µs        ? ?/sec
contrived/04x_entities_12_systems                                 1.08    249.2±4.50µs        ? ?/sec      1.00    231.5±4.47µs        ? ?/sec
contrived/04x_entities_15_systems                                 1.07    306.2±6.26µs        ? ?/sec      1.00    284.8±4.49µs        ? ?/sec
contrived/05x_entities_03_systems                                 1.43     80.3±2.71µs        ? ?/sec      1.00     56.3±2.39µs        ? ?/sec
contrived/05x_entities_06_systems                                 1.53    162.0±5.56µs        ? ?/sec      1.00    105.6±3.98µs        ? ?/sec
contrived/05x_entities_09_systems                                 1.43    220.8±5.49µs        ? ?/sec      1.00    154.8±4.08µs        ? ?/sec
contrived/05x_entities_12_systems                                 1.40    269.8±9.32µs        ? ?/sec      1.00    192.7±4.09µs        ? ?/sec
contrived/05x_entities_15_systems                                 1.43   337.3±14.15µs        ? ?/sec      1.00    236.4±7.95µs        ? ?/sec
empty_commands/0_entities                                         1.01      5.3±0.05ns        ? ?/sec      1.00      5.2±0.07ns        ? ?/sec
fake_commands/2000_commands                                       1.02      7.3±0.06µs        ? ?/sec      1.00      7.2±0.08µs        ? ?/sec
fake_commands/4000_commands                                       1.01     14.5±0.09µs        ? ?/sec      1.00     14.3±0.11µs        ? ?/sec
fake_commands/6000_commands                                       1.01     21.7±0.12µs        ? ?/sec      1.00     21.5±0.08µs        ? ?/sec
fake_commands/8000_commands                                       1.00     28.7±0.15µs        ? ?/sec      1.01     29.0±0.18µs        ? ?/sec
few_changed_detection/50000_entities_change_detection::Sparse     1.49   261.7±35.23µs        ? ?/sec      1.00   175.9±36.43µs        ? ?/sec
few_changed_detection/50000_entities_change_detection::Table      1.00    129.0±4.86µs        ? ?/sec      1.03   132.7±31.12µs        ? ?/sec
few_changed_detection/60000_entities_change_detection::Sparse     1.56   301.5±44.18µs        ? ?/sec      1.00   193.1±15.92µs        ? ?/sec
few_changed_detection/60000_entities_change_detection::Table      1.27   198.2±11.89µs        ? ?/sec      1.00    155.8±3.68µs        ? ?/sec
get_or_spawn/batched                                              1.01   413.4±13.50µs        ? ?/sec      1.00   411.2±18.85µs        ? ?/sec
get_or_spawn/individual                                           1.02   737.8±63.64µs        ? ?/sec      1.00   721.7±41.04µs        ? ?/sec
heavy_compute/base                                                1.01    295.4±2.49µs        ? ?/sec      1.00    292.2±1.85µs        ? ?/sec
insert_commands/insert                                            1.00   616.8±30.12µs        ? ?/sec      1.03   635.5±35.98µs        ? ?/sec
insert_commands/insert_batch                                      1.01   418.4±24.28µs        ? ?/sec      1.00   412.5±25.14µs        ? ?/sec
insert_simple/base                                                1.00    359.5±2.71µs        ? ?/sec      1.18    425.3±5.51µs        ? ?/sec
insert_simple/unbatched                                           1.00   902.9±13.76µs        ? ?/sec      1.08   973.8±21.91µs        ? ?/sec
iter_fragmented/base                                              1.00    349.6±8.91ns        ? ?/sec      1.02    357.4±5.34ns        ? ?/sec
iter_fragmented/foreach                                           1.49   240.9±23.68ns        ? ?/sec      1.00   161.7±19.36ns        ? ?/sec
iter_fragmented/foreach_wide                                      1.00      4.0±0.12µs        ? ?/sec      1.02      4.0±0.49µs        ? ?/sec
iter_fragmented/wide                                              1.02      4.0±0.15µs        ? ?/sec      1.00      3.9±0.13µs        ? ?/sec
iter_fragmented_sparse/base                                       1.03      9.0±0.91ns        ? ?/sec      1.00      8.7±0.49ns        ? ?/sec
iter_fragmented_sparse/foreach                                    1.00      7.7±0.13ns        ? ?/sec      1.01      7.8±0.29ns        ? ?/sec
iter_fragmented_sparse/foreach_wide                               1.00     41.2±3.56ns        ? ?/sec      1.08     44.5±0.49ns        ? ?/sec
iter_fragmented_sparse/wide                                       1.04    45.8±11.87ns        ? ?/sec      1.00     44.1±1.01ns        ? ?/sec
iter_simple/base                                                  1.00      8.4±0.24µs        ? ?/sec      1.00      8.4±0.11µs        ? ?/sec
iter_simple/foreach                                               1.00      8.3±0.06µs        ? ?/sec      1.03      8.5±0.11µs        ? ?/sec
iter_simple/foreach_sparse_set                                    1.01     26.1±0.17µs        ? ?/sec      1.00     25.8±0.25µs        ? ?/sec
iter_simple/foreach_wide                                          1.03     40.0±0.28µs        ? ?/sec      1.00     38.7±1.11µs        ? ?/sec
iter_simple/foreach_wide_sparse_set                               1.02    117.0±1.69µs        ? ?/sec      1.00    115.0±0.72µs        ? ?/sec
iter_simple/sparse_set                                            1.01     28.7±0.22µs        ? ?/sec      1.00     28.5±0.18µs        ? ?/sec
iter_simple/system                                                1.00      8.3±0.13µs        ? ?/sec      1.01      8.4±0.07µs        ? ?/sec
iter_simple/wide                                                  1.06     41.6±0.74µs        ? ?/sec      1.00     39.3±0.98µs        ? ?/sec
iter_simple/wide_sparse_set                                       1.00    125.1±1.18µs        ? ?/sec      1.01    126.4±1.67µs        ? ?/sec
none_changed_detection/50000_entities_change_detection::Sparse    1.05   100.1±22.18µs        ? ?/sec      1.00     95.7±9.15µs        ? ?/sec
none_changed_detection/50000_entities_change_detection::Table     1.02     78.7±4.28µs        ? ?/sec      1.00     77.2±3.74µs        ? ?/sec
none_changed_detection/60000_entities_change_detection::Sparse    1.06   154.4±34.35µs        ? ?/sec      1.00   145.0±23.25µs        ? ?/sec
none_changed_detection/60000_entities_change_detection::Table     1.02   119.2±12.97µs        ? ?/sec      1.00    116.7±3.00µs        ? ?/sec
query_get/50000_entities_sparse                                   1.00    318.2±4.09µs        ? ?/sec      1.03   327.5±20.06µs        ? ?/sec
query_get/50000_entities_table                                    1.00    306.3±3.95µs        ? ?/sec      1.02    311.6±5.34µs        ? ?/sec
query_get_component/50000_entities_sparse                         1.00   975.1±40.73µs        ? ?/sec      1.03  1000.2±40.68µs        ? ?/sec
query_get_component/50000_entities_table                          1.05   1079.7±7.35µs        ? ?/sec      1.00  1029.7±13.87µs        ? ?/sec
query_get_component_simple/system                                 1.00    747.2±5.61µs        ? ?/sec      1.02   762.1±13.26µs        ? ?/sec
query_get_component_simple/unchecked                              1.00   862.2±15.25µs        ? ?/sec      1.13   977.4±16.50µs        ? ?/sec
query_get_many_10/50000_calls_sparse                              1.00      4.1±0.33ms        ? ?/sec      1.08      4.5±0.44ms        ? ?/sec
query_get_many_10/50000_calls_table                               1.02      4.2±0.15ms        ? ?/sec      1.00      4.1±0.16ms        ? ?/sec
query_get_many_2/50000_calls_sparse                               1.00   649.1±51.71µs        ? ?/sec      1.01   655.5±71.27µs        ? ?/sec
query_get_many_2/50000_calls_table                                1.00   707.5±40.72µs        ? ?/sec      1.00   704.4±51.65µs        ? ?/sec
query_get_many_5/50000_calls_sparse                               1.01  1964.7±99.43µs        ? ?/sec      1.00  1953.2±110.73µs        ? ?/sec
query_get_many_5/50000_calls_table                                1.01  1940.1±94.22µs        ? ?/sec      1.00  1915.9±88.00µs        ? ?/sec
run_criteria/yes_using_query/001_systems                          1.05      3.9±0.14µs        ? ?/sec      1.00      3.7±0.17µs        ? ?/sec
run_criteria/yes_using_query/006_systems                          1.00      8.5±0.31µs        ? ?/sec      1.04      8.9±0.29µs        ? ?/sec
run_criteria/yes_using_query/011_systems                          1.00     13.1±0.61µs        ? ?/sec      1.04     13.7±0.44µs        ? ?/sec
run_criteria/yes_using_query/016_systems                          1.00     18.4±0.75µs        ? ?/sec      1.03     19.0±0.75µs        ? ?/sec
run_criteria/yes_using_query/021_systems                          1.00     23.4±1.01µs        ? ?/sec      1.04     24.2±0.47µs        ? ?/sec
run_criteria/yes_using_query/026_systems                          1.00     28.5±0.80µs        ? ?/sec      1.02     29.1±0.65µs        ? ?/sec
run_criteria/yes_using_query/031_systems                          1.00     32.6±1.00µs        ? ?/sec      1.04     33.9±0.85µs        ? ?/sec
run_criteria/yes_using_query/036_systems                          1.00     37.2±1.41µs        ? ?/sec      1.05     39.2±1.12µs        ? ?/sec
run_criteria/yes_using_query/041_systems                          1.00     42.4±1.04µs        ? ?/sec      1.04     44.0±1.04µs        ? ?/sec
run_criteria/yes_using_query/046_systems                          1.00     46.1±1.34µs        ? ?/sec      1.06     48.8±1.47µs        ? ?/sec
run_criteria/yes_using_query/051_systems                          1.00     49.9±1.84µs        ? ?/sec      1.07     53.5±1.48µs        ? ?/sec
run_criteria/yes_using_query/056_systems                          1.00     54.4±2.28µs        ? ?/sec      1.07     58.4±1.72µs        ? ?/sec
run_criteria/yes_using_query/061_systems                          1.00     61.0±4.03µs        ? ?/sec      1.04     63.3±3.09µs        ? ?/sec
run_criteria/yes_using_query/066_systems                          1.00     66.7±2.69µs        ? ?/sec      1.06     71.0±2.63µs        ? ?/sec
run_criteria/yes_using_query/071_systems                          1.00     70.4±3.09µs        ? ?/sec      1.09     76.8±1.88µs        ? ?/sec
run_criteria/yes_using_query/076_systems                          1.00     76.3±2.99µs        ? ?/sec      1.07     81.7±3.53µs        ? ?/sec
run_criteria/yes_using_query/081_systems                          1.00     82.6±5.01µs        ? ?/sec      1.07     88.1±3.29µs        ? ?/sec
run_criteria/yes_using_query/086_systems                          1.00     88.1±4.59µs        ? ?/sec      1.07     94.3±4.95µs        ? ?/sec
run_criteria/yes_using_query/091_systems                          1.00     92.2±3.34µs        ? ?/sec      1.12    103.2±3.86µs        ? ?/sec
run_criteria/yes_using_query/096_systems                          1.00     96.3±5.89µs        ? ?/sec      1.13    108.8±5.45µs        ? ?/sec
run_criteria/yes_using_query/101_systems                          1.00    107.5±5.16µs        ? ?/sec      1.08    116.3±4.08µs        ? ?/sec
run_criteria/yes_using_resource/001_systems                       1.00      3.4±0.16µs        ? ?/sec      1.13      3.8±0.20µs        ? ?/sec
run_criteria/yes_using_resource/006_systems                       1.00      8.3±0.33µs        ? ?/sec      1.07      8.9±0.30µs        ? ?/sec
run_criteria/yes_using_resource/011_systems                       1.00     13.7±0.55µs        ? ?/sec      1.01     13.8±0.61µs        ? ?/sec
run_criteria/yes_using_resource/016_systems                       1.00     18.5±0.71µs        ? ?/sec      1.05     19.4±0.73µs        ? ?/sec
run_criteria/yes_using_resource/021_systems                       1.00     23.2±0.91µs        ? ?/sec      1.04     24.2±0.94µs        ? ?/sec
run_criteria/yes_using_resource/026_systems                       1.00     28.2±0.97µs        ? ?/sec      1.03     28.9±0.88µs        ? ?/sec
run_criteria/yes_using_resource/031_systems                       1.00     33.5±0.62µs        ? ?/sec      1.02     34.1±0.98µs        ? ?/sec
run_criteria/yes_using_resource/036_systems                       1.00     37.9±0.89µs        ? ?/sec      1.02     38.8±1.27µs        ? ?/sec
run_criteria/yes_using_resource/041_systems                       1.00     42.3±0.94µs        ? ?/sec      1.03     43.7±1.38µs        ? ?/sec
run_criteria/yes_using_resource/046_systems                       1.00     47.3±0.92µs        ? ?/sec      1.00     47.4±3.94µs        ? ?/sec
run_criteria/yes_using_resource/051_systems                       1.00     51.8±1.61µs        ? ?/sec      1.03     53.6±2.16µs        ? ?/sec
run_criteria/yes_using_resource/056_systems                       1.00     56.5±1.75µs        ? ?/sec      1.04     58.8±2.68µs        ? ?/sec
run_criteria/yes_using_resource/061_systems                       1.00     61.5±1.63µs        ? ?/sec      1.07     65.7±1.96µs        ? ?/sec
run_criteria/yes_using_resource/066_systems                       1.00     68.2±1.95µs        ? ?/sec      1.04     70.9±2.98µs        ? ?/sec
run_criteria/yes_using_resource/071_systems                       1.00     72.9±2.42µs        ? ?/sec      1.05     76.3±2.99µs        ? ?/sec
run_criteria/yes_using_resource/076_systems                       1.00     76.9±2.67µs        ? ?/sec      1.06     81.2±4.19µs        ? ?/sec
run_criteria/yes_using_resource/081_systems                       1.00     80.8±4.26µs        ? ?/sec      1.09     88.3±3.82µs        ? ?/sec
run_criteria/yes_using_resource/086_systems                       1.00     88.4±4.22µs        ? ?/sec      1.08     95.5±3.37µs        ? ?/sec
run_criteria/yes_using_resource/091_systems                       1.00     95.9±3.41µs        ? ?/sec      1.05    100.6±4.76µs        ? ?/sec
run_criteria/yes_using_resource/096_systems                       1.00    102.5±4.60µs        ? ?/sec      1.05    108.0±3.88µs        ? ?/sec
run_criteria/yes_using_resource/101_systems                       1.00    106.8±4.89µs        ? ?/sec      1.11    118.0±4.52µs        ? ?/sec
sized_commands_0_bytes/2000_commands                              1.00      5.1±0.03µs        ? ?/sec      1.09      5.6±0.07µs        ? ?/sec
sized_commands_0_bytes/4000_commands                              1.00     10.2±0.10µs        ? ?/sec      1.10     11.3±0.10µs        ? ?/sec
sized_commands_0_bytes/6000_commands                              1.00     15.2±0.11µs        ? ?/sec      1.10     16.8±0.10µs        ? ?/sec
sized_commands_0_bytes/8000_commands                              1.00     20.3±0.21µs        ? ?/sec      1.12     22.7±0.29µs        ? ?/sec
sized_commands_12_bytes/2000_commands                             1.02      7.4±0.09µs        ? ?/sec      1.00      7.2±0.08µs        ? ?/sec
sized_commands_12_bytes/4000_commands                             1.00     14.7±0.08µs        ? ?/sec      1.00     14.6±0.12µs        ? ?/sec
sized_commands_12_bytes/6000_commands                             1.00     22.1±0.10µs        ? ?/sec      1.00     22.0±0.18µs        ? ?/sec
sized_commands_12_bytes/8000_commands                             1.00     29.5±0.14µs        ? ?/sec      1.00     29.4±0.20µs        ? ?/sec
sized_commands_512_bytes/2000_commands                            1.00     51.7±1.68µs        ? ?/sec      1.05     54.1±2.70µs        ? ?/sec
sized_commands_512_bytes/4000_commands                            1.00    105.9±9.28µs        ? ?/sec      1.05    110.8±7.73µs        ? ?/sec
sized_commands_512_bytes/6000_commands                            1.00   162.1±20.46µs        ? ?/sec      1.04   169.2±22.51µs        ? ?/sec
sized_commands_512_bytes/8000_commands                            1.00   219.2±36.11µs        ? ?/sec      1.04   228.5±33.18µs        ? ?/sec
spawn_commands/2000_entities                                      1.00    184.7±5.82µs        ? ?/sec      1.05    193.2±5.30µs        ? ?/sec
spawn_commands/4000_entities                                      1.00   368.7±12.10µs        ? ?/sec      1.05   385.7±13.76µs        ? ?/sec
spawn_commands/8000_entities                                      1.00   754.7±26.15µs        ? ?/sec      1.03   777.9±21.58µs        ? ?/sec
spawn_world/10000_entities                                        1.00  1023.4±80.40µs        ? ?/sec      1.02  1040.3±84.50µs        ? ?/sec
spawn_world/1000_entities                                         1.00    101.8±8.01µs        ? ?/sec      1.04    106.4±9.10µs        ? ?/sec
spawn_world/100_entities                                          1.00     10.3±0.92µs        ? ?/sec      1.02     10.5±0.92µs        ? ?/sec
spawn_world/10_entities                                           1.00  1020.8±85.81ns        ? ?/sec      1.02  1043.8±99.74ns        ? ?/sec
world_entity/50000_entities                                       1.00     94.8±0.87µs        ? ?/sec      1.00     95.1±0.86µs        ? ?/sec
world_get/50000_entities_sparse                                   1.00    353.7±1.78µs        ? ?/sec      1.03    363.7±7.35µs        ? ?/sec
world_get/50000_entities_table                                    1.03    386.8±7.23µs        ? ?/sec      1.00    376.8±4.97µs        ? ?/sec
world_query_for_each/50000_entities_sparse                        1.01     47.9±0.63µs        ? ?/sec      1.00     47.6±0.27µs        ? ?/sec
world_query_for_each/50000_entities_table                         1.00     27.2±0.30µs        ? ?/sec      1.00     27.3±0.18µs        ? ?/sec
world_query_get/50000_entities_sparse_wide                        1.00    192.7±1.03µs        ? ?/sec      1.01    195.5±1.24µs        ? ?/sec
world_query_get/50000_entities_table                              1.00    137.2±3.03µs        ? ?/sec      1.00    137.6±0.85µs        ? ?/sec
world_query_get/50000_entities_table_wide                         1.00    242.7±1.90µs        ? ?/sec      1.01    245.0±4.77µs        ? ?/sec
world_query_iter/50000_entities_sparse                            1.00     54.1±0.34µs        ? ?/sec      1.01     54.5±0.33µs        ? ?/sec
world_query_iter/50000_entities_table                             1.00     27.3±0.19µs        ? ?/sec      1.00     27.3±0.79µs        ? ?/sec
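
As a reading aid for the table above: the leading number in each column appears to be that branch's mean time divided by the faster of the two means, so 1.00 marks the faster side. The helper below is a sketch of that arithmetic under this assumption; the function names are made up for the example, and because the table prints rounded means, recomputed ratios may differ from the printed ones in the last digit.

```rust
/// Returns (main_ratio, branch_ratio), each mean divided by the faster mean.
/// Both inputs must use the same time unit.
pub fn relative(main: f64, branch: f64) -> (f64, f64) {
    let fastest = main.min(branch);
    (main / fastest, branch / fastest)
}

/// Rounds a ratio to integer hundredths so it can be compared exactly.
pub fn hundredths(x: f64) -> i64 {
    (x * 100.0).round() as i64
}
```

For example, `busy_systems/01x_entities_03_systems` gives 31.6 / 24.0 ≈ 1.32 for main, and `insert_simple/base` gives 425.3 / 359.5 ≈ 1.18 for split-ticks.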

james7132 · 2022-11-11T14:20:47Z

Sans the API surface, this is ready for review.

crates/bevy_ecs/src/change_detection.rs

crates/bevy_ecs/src/query/fetch.rs

crates/bevy_ecs/src/storage/sparse_set.rs

alice-i-cecile · 2022-11-12T04:11:55Z

crates/bevy_ecs/src/storage/table.rs

+    }
+
+    #[inline]
+    pub fn get_ticks(&self, row: usize) -> Option<ComponentTicks> {


Comments on these methods about how the row values work and why this might return None would be nice. Won't block on it though; this is no worse than before.

I'll go ahead and make a separate PR for this.

crates/bevy_ecs/src/world/mod.rs

james7132 · 2022-11-12T12:50:15Z

Did some further investigation on why the perf difference grew bigger between for_each and iter. for_each will autovectorize if possible, while something else in iter is blocking the same optimization.

The following is the same hot section of code as in the original post, but in for_each instead of iter. Note its use of %xmm* (SSE) registers and addps, a 4xf32 SIMD instruction.

.LBB2_13:
	movups	(%rdi,%rax,4), %xmm1
	movups	16(%rdi,%rax,4), %xmm2
	movdqu	%xmm0, (%rsi,%rax,4)
	movdqu	%xmm0, 16(%rsi,%rax,4)
	movups	(%rdx,%rax,4), %xmm3
	addps	%xmm1, %xmm3
	movups	16(%rdx,%rax,4), %xmm1
	addps	%xmm2, %xmm1
	movups	%xmm3, (%rdx,%rax,4)
	movups	%xmm1, 16(%rdx,%rax,4)
	movups	32(%rdi,%rax,4), %xmm1
	movups	48(%rdi,%rax,4), %xmm2
	movdqu	%xmm0, 32(%rsi,%rax,4)
	movdqu	%xmm0, 48(%rsi,%rax,4)
	movups	32(%rdx,%rax,4), %xmm3
	addps	%xmm1, %xmm3
	movups	48(%rdx,%rax,4), %xmm1
	addps	%xmm2, %xmm1
	movups	%xmm3, 32(%rdx,%rax,4)
	movups	%xmm1, 48(%rdx,%rax,4)
	addq	$16, %rax
	addq	$-2, %rcx
	jne	.LBB2_13
	testb	$1, %r15b
	je	.LBB2_16
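
Judging from the listings, the loop body loads one `f32`, adds it to another in place, and stores a tick value into a `u32` buffer. A plain Rust loop with that shape might look like the sketch below. This is a guess at the structure of the test code, which is not shown in this thread; the function and parameter names are made up, and whether the compiler actually vectorizes it depends on the compiler version, optimization level, and target features.

```rust
/// Adds each velocity to its position and stamps the row's changed tick.
/// All three slices are walked in lockstep; iteration stops at the shortest.
pub fn integrate(
    positions: &mut [f32],
    velocities: &[f32],
    changed_ticks: &mut [u32],
    tick: u32,
) {
    for ((position, velocity), changed) in positions
        .iter_mut()
        .zip(velocities)
        .zip(changed_ticks.iter_mut())
    {
        *position += *velocity;
        // With split storage this is a contiguous `u32` store, the kind of
        // write that can be widened into the `movdqu` stores seen above.
        *changed = tick;
    }
}
```

With the interleaved layout, the tick store would instead skip every other `u32`, which is one of the things that makes a wide store harder for the compiler to emit.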

Diddykonga

LGTM. The split makes sense.

Diddykonga · 2022-11-14T02:00:08Z

crates/bevy_ecs/src/component.rs

+    /// component_ticks.set_changed(world.read_change_tick());
+    /// ```
+    #[inline]
+    pub fn set_changed(&mut self, change_tick: u32) {


Is this needed? What if it is an added tick?
Also, the doc comments indirectly use the method they are documenting.

I would prefer direct access of the field.

Co-authored-by: PROMETHIA-27

maniwani

LGTM. Very nice.

james7132 · 2022-11-14T08:01:32Z

bors try

bors · 2022-11-14T08:17:02Z

try

Build succeeded:

alice-i-cecile · 2022-11-21T12:55:48Z

bors r+

# Objective

Fixes #4884. `ComponentTicks` stores both added and changed ticks contiguously in the same 8 bytes. This is convenient when passing around both together, but causes half the bytes fetched from memory for the purposes of change detection to effectively go unused. This is inefficient when most queries (no filter, mutating *something*) only write out to the changed ticks.

## Solution

Split the storage for change detection ticks into two separate `Vec`s inside `Column`. Fetch only what is needed during iteration.

This also potentially removes one blocker to autovectorization of dense queries.

EDIT: This is confirmed to enable autovectorization of dense queries in `for_each` and `par_for_each` where possible. Unfortunately `iter` has other blockers that prevent it.

### TODO

- [x] Microbenchmark
- [x] Check if this allows query iteration to autovectorize simple loops.
- [x] Clean up all of the spurious tuples now littered throughout the API

### Open Questions

- ~~Is `Mut::is_added` absolutely necessary? Can we not just use `Added` or `ChangeTrackers`?~~ It's optimized out if unused.
- ~~Does the fetch of the added ticks get optimized out if not used?~~ Yes it is.

---

## Changelog

Added: `Tick`, a wrapper around a single change detection tick.
Added: `Column::get_added_ticks`
Added: `Column::get_column_ticks`
Added: `SparseSet::get_added_ticks`
Added: `SparseSet::get_column_ticks`
Changed: `Column` now stores added and changed ticks separately internally.
Changed: Most APIs returning `&UnsafeCell<ComponentTicks>` now return `TickCells` instead, which contains two separate `&UnsafeCell<Tick>`, one for each component tick.
Changed: `Query::for_each(_mut)`, `Query::par_for_each(_mut)` will now leverage autovectorization to speed up query iteration where possible.

## Migration Guide

TODO

bors · 2022-11-21T13:19:25Z

Pull request successfully merged into main.

Build succeeded:

# Objective

#6547 accidentally broke change detection for SparseSet components by using `Ticks::from_tick_cells` with the wrong argument order.

## Solution

Use the right argument order. Add a regression test.
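As an illustration of how this class of bug slips past the compiler, the sketch below uses a hypothetical, simplified `Ticks` type with two adjacent `u32` parameters. It is not the real `Ticks::from_tick_cells` signature, and which arguments were swapped in the actual fix is not stated here; the comparison logic is likewise an assumption made for the example.

```rust
/// Hypothetical, simplified change detection state; not Bevy's actual type.
struct Ticks {
    changed: u32,
    last_change_tick: u32,
    change_tick: u32,
}

impl Ticks {
    /// Two adjacent `u32` parameters: swapping them still type-checks.
    fn new(changed: u32, last_change_tick: u32, change_tick: u32) -> Self {
        Ticks { changed, last_change_tick, change_tick }
    }

    /// A component counts as changed if it was written more recently
    /// than the observing system last ran.
    fn is_changed(&self) -> bool {
        let since_change = self.change_tick.wrapping_sub(self.changed);
        let since_last_run = self.change_tick.wrapping_sub(self.last_change_tick);
        since_change < since_last_run
    }
}

fn main() {
    // Component written at tick 5, system last ran at tick 3, now at tick 8.
    let correct = Ticks::new(5, 3, 8);
    assert!(correct.is_changed());

    // Same values with the last two arguments swapped: compiles fine,
    // but the change is silently missed.
    let swapped = Ticks::new(5, 8, 3);
    assert!(!swapped.is_changed());
}
```

A regression test along the lines of the two assertions in `main` is what catches the swap, since the type system cannot.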

@cart

## How This Works For the Bevy 0.10 release blog post (and for the first time ever), I'm publicly opening the doors to other people writing blog post sections. Specifically, if you worked on a feature in a substantial way and are interested in presenting it, you can now ask to claim a section by leaving a comment in this PR. If you claim a section, submit a pull request to the `release-0.10.0` branch in this repo. For the next week, we will be filling in sections (the release target is Saturday March 4th). Please don't claim a section if you don't plan on completing it within that timeline. Also don't claim a section if you weren't an active participant in the design and implementation of the change (unless you are a Maintainer or SME). I will claim any unclaimed sections. Try to match the style of previous release blog posts as much as possible. 1. Show, don't tell. Don't bombard people with information. Avoid large walls of text _and_ large walls of code. Prefer the pattern "byte sized description of one thing" -> "example code/picture/video contextualizing that one thing" -> repeat. Take readers on a journey step by simple step. 2. Don't use up reader's "mental bandwidth" without good reason. We can't afford page-long descriptions of minor bug fixes. If it isn't a "headliner change", keep the description short and sweet. If a change is self describing, let it do that (ex: We now support this new mesh shape primitive ... this is what it looks like). If it is a "headliner change", still try to keep it reasonable. We always have a lot to cover. 3. In slight competition with point (2), don't omit interesting technical information when it is truly fun and engaging. A good chunk of our users are highly technical and enjoy learning how the sausage is made. Try to strike a balance between "terse and simple" and "nerdy details". 4. When relevant, briefly describe the problem being solved first, then describe the solution we chose. 
This contextualizes the change and gives the feature value and purpose. 5. When possible, provide visuals. They create interest / keep people hooked / break up the monotony. 6. Record images and videos at the default bevy resolution (1280x720) 7. Provide an accurate listing of authors that meaningfully contributed to the feature. Try to sort in order of "contribution scale". This is hard to define, but try to be fair. When in doubt, ask other contributors, SMEs, and/or maintainers. 8. Provide numbers and graphs where possible. If something is faster, use numbers to back it up. We don't (yet) have automated graph generation in blog post style, so send data / info to me (@cart) if you want a graph made. ## Headliners Headliners are our "big ticket high importance / high profile" changes. They are listed briefly at the beginning of the blog post, their entries are roughly sorted "to the top", and they are given priority when it comes to "space in the blog post". If you think we missed something (or didn't prioritize something appropriately), let us know. * ECS Schedule v3 (previously known as "stageless") * Partial Android Support * Depth and Normal Prepass * Environment Map Lighting * Cascaded Shadow Maps * Distance and Atmospheric Fog * Smooth Skeletal Animation Transitions * Enable Parallel Pipelined Rendering * Windows as Entities * Renderer Optimizations * ECS Optimizations ## Sections These are the sections we will cover in the blog post. If a section has been claimed, it will have `(claimed by X)` in the title. If it is unclaimed it will have `(unclaimed)` in the title. Let us know if we missed a section. We don't cover every feature, but we should cover pretty much everything that would be interesting to users. Note that what is interesting or challenging to implement is not necessarily something that is relevant to our blog post readers. And sometimes the reverse is true! 
If you believe a section should be split up or reorganized, just bring it up here and we can discuss it. ### ~~Schedule V3 (claimed by @alice-i-cecile)~~ * [Migrate engine to Schedule v3][7267] * [Add `bevy_ecs::schedule_v3` module][6587] * [Stageless: fix unapplied systems][7446] * [Stageless: move final apply outside of spawned executor][7445] * Sets * Base Sets * [Base Sets][7466] * Reporting * [Report sets][7756] * [beter cycle reporting][7463] * Run Conditions * [Add condition negation][7559] * [And/Or][7605] * [Add more common run conditions][7579] * States * [States derive macro][7535] * System Piping Flexibility * [Support piping exclusive systems][7023] * [Allow piping run conditions][7547] ### ~~Depth and Normal Prepass (claimed by @IceSentry)~~ * [Add depth and normal prepass][6284] * [Move prepass functions to prepass_utils][7354] ### ~~Distance and Atmospheric Fog (claimed by @coreh)~~ * [Add Distance and Atmospheric Fog support][6412] ### ~~Cascaded Shadow Maps (claimed by @cart)~~ * [Cascaded shadow maps.][7064] * [Better cascades config defaults + builder, tweak example configs][7456] ### ~~Environment Map Lighting (claimed by @cart)~~ * [EnvironmentMapLight, BRDF Improvements][7051] * [Webgl2 support][7737] ### ~~Tonemapping options (claimed by @cart)~~ * [Initial tonemapping options][7594] ### ~~Android support + unification (claimed by @mockersf)~~ * [IOS, Android... 
same thing][7493] ### ~~Windows as Entities (claimed by @Aceeri)~~ * [Windows as Entities][5589] * [break feedback loop when moving cursor][7298] * [Fix `Window` feedback loop between the OS and Bevy][7517] ### ~~Enable Parallel Pipelined Rendering (claimed by @james7132)~~ * [Pipelined Rendering][6503] * [Stageless: add a method to scope to always run a task on the scope thread][7415] * [Separate Extract from Sub App Schedule][7046] ### ~~Smooth Skeletal Animation Transitions (claimed by @james7132)~~ * [Smooth Transition between Animations][6922] ### ~~Spatial Audio (claimed by @harudagondi)~~ * [Spatial Audio][6028] ### ~~Shader Processor Features (claimed by @cart)~~ * [Shader defs can now have a value][5900] * [Shaders can now have #else ifdef chains][7431] * [Define shader defs in shader][7518] ### ~~Shader Flexibility Improvements (claimed by @cart)~~ * [add ambient lighting hook][5428] * [Refactor Globals and View structs into separate shaders][7512] ### ~~Renderer Optimizations (claimed by @james7132)~~ * [bevy_pbr: Avoid copying structs and using registers in shaders][7069] * [Flatten render commands][6885] * [Replace UUID based IDs with a atomic-counted ones][6988] * [improve compile time by type-erasing wgpu structs][5950] * [Shrink DrawFunctionId][6944] * [Shrink ComputedVisibility][6305] * [Reduce branching in TrackedRenderPass][7053] * [Make PipelineCache internally mutable.][7205] * [Improve `Color::hex` performance][6940] * [Support recording multiple CommandBuffers in RenderContext][7248] * [Parallelized transform propagation][4775] * [Introduce detailed_trace macro, use in TrackedRenderPass][7639] * [Optimize color computation in prepare_uinodes][7311] * [Directly extract joints into SkinnedMeshJoints][6833] * [Parallelize forward kinematics animation systems][6785] * [Move system_commands spans into apply_buffers][6900] * [Reduce the use of atomics in the render phase][7084] ### ~~ECS Optimizations (claimed by @james7132 )~~ * [Remove redundant 
table and sparse set component IDs from Archetype][4927] * [Immutable sparse sets for metadata storage][4928] * [Replace BlobVec's swap_scratch with a swap_nonoverlapping][4853] * [Use T::Storage::STORAGE_TYPE to optimize out unused branches][6800] * [Remove unnecessary branching from bundle insertion][6902] * [Split Component Ticks][6547] * [use bevy_utils::HashMap for better performance. TypeId is predefined …][7642] * [Extend EntityLocation with TableId and TableRow][6681] * [Basic adaptive batching for parallel query iteration][4777] * [Speed up `CommandQueue` by storing commands more densely][6391] ### ~~Reflect Improvements (claimed by @cart)~~ * [bevy_reflect: Add `ReflectFromReflect` (v2)][6245] * [Add reflection support for VecDeque][6831] * [reflect: add `insert` and `remove` methods to `List`][7063] * [Add `remove` method to `Map` reflection trait.][6564] * [bevy_reflect: Fix binary deserialization not working for unit structs][6722] * [Add `TypeRegistrationDeserializer` and remove `BorrowedStr`][7094] * [bevy_reflect: Add simple enum support to reflection paths][6560] * [Enable deriving Reflect on structs with generic types][7364] * [bevy_reflect: Support tuple reflection paths][7324] * [bevy_reflect: Pre-parsed paths][7321] * [bevy_ecs: ReflectComponentFns without World][7206] ### ~~AsBindGroup Improvements (claimed by @cart)~~ * [Support storage buffers in derive `AsBindGroup`][6129] * [Support raw buffers in AsBindGroup][7701] ### ~~Cylinder Shape (claimed by @cart)~~ * [Add cylinder shape][6809] ### ~~Subdividable Plane Shape (claimed by @cart)~~ * [added subdivisions to shape::Plane][7546] ### ~~StandardMaterial Blend Modes (claimed by @coreh)~~ * [Standard Material Blend Modes][6644] ### ~~Configurable Visibility Component (claimed by @cart)~~ * [enum `Visibility` component][6320] ### Task Improvements (claimed by @cart) * [Fix panicking on another scope][6524] * [Add thread create/destroy callbacks to TaskPool][6561] * [Thread executor for running 
tasks on specific threads.][7087] * [await tasks to cancel][6696] * [Stageless: move MainThreadExecutor to schedule_v3][7444] * [Stageless: close the finish channel so executor doesn't deadlock][7448] ### ~~Upgrade to wgpu 0.15 (claimed by @cart)~~ * [Wgpu 0.15][7356] ### ~~Expose Bindless / Non-uniform Indexing Support (claimed by @cart)~~ * [Request WGPU Capabilities for Non-uniform Indexing][6995] ### ~~Cubic Spline (claimed by @aevyrie)~~ * [Bezier][7653] ### ~~Revamp Bloom (claimed by @JMS55)~~ * [Revamp bloom](bevyengine/bevy#6677) ### ~~Use Prepass Shaders for Shadows (claimed by @superdump)~~ * [use prepass shaders for shadows](bevyengine/bevy#7784) ### ~~AccessKit (claimed by @alice-i-cecile)~~ * [accesskit](bevyengine/bevy#6874) ### ~~Camera Output Modes (claimed by @cart)~~ * [camera output modes](bevyengine/bevy#7671) ### ~~SystemParam Improvements (claimed by @JoJoJet)~~ * [Make the `SystemParam` derive macro more flexible][6694] * [Add a `SystemParam` primitive for deferred mutations; allow `#[derive]`ing more types of SystemParam][6817] ### ~~Gamepad Improvements (claimed by @cart)~~ * [Gamepad events refactor][6965] * [add `Axis::devices` to get all the input devices][5400] ### ~~Input Methods (claimed by @cart)~~ * [add Input Method Editor support][7325] ### ~~Color Improvements (claimed by @cart)~~ * [Add LCH(ab) color space to `bevy_render::color::Color`][7483] * [Add a more familiar hex color entry][7060] ### ~~Split Up CorePlugin (claimed by @cart)~~ * [Break `CorePlugin` into `TaskPoolPlugin`, `TypeRegistrationPlugin`, `FrameCountPlugin`.][7083] ### ~~ExtractComponent Derive (claimed by @cart)~~ * [Extract component derive][7399] ### ~~Added OpenGL and DX11 Backends By Default (claimed by @cart)~~ * [add OpenGL and DX11 backends][7481] ### ~~UnsafeWorldCell (claimed by @BoxyUwU)~~ * [Move all logic to `UnsafeWorldCell`][7381] * [Rename `UnsafeWorldCellEntityRef` to `UnsafeEntityCell`][7568] ### ~~Entity Commands (claimed by @cart)~~ * [Add a 
trait for commands that run for a given `Entity`][7015] * [Add an extension trait to `EntityCommands` to update hierarchy while preserving `GlobalTransform`][7024] * [Add ReplaceChildren and ClearChildren EntityCommands][6035] ### ~~Iterate EntityRef (claimed by @james7132)~~ * [Allow iterating over with EntityRef over the entire World][6843] ### ~~Ref Queries (@JoJoJet)~~ * [Added Ref to allow immutable access with change detection][7097] ### ~~Taffy Upgrade (claimed by @cart)~~ * [Upgrade to Taffy 0.2][6743] ### ~~Relative Cursor Position (claimed by @cart)~~ * [Relative cursor position][7199] ### ~~Const UI Config (claimed by @cart)~~ * [Add const to methods and const defaults to bevy_ui][5542] ### ~~Examples (claimed by @cart)~~ * [Add pixelated Bevy to assets and an example][6408] * [Organized scene_viewer into plugins for reuse and organization][6936] ### ~~CI Improvements (claimed by @cart)~~ * [add rust-version for MSRV and CI job to check][6852] * [msrv: only send a message on failure during the actual msrv part][7532] * [Make CI friendlier][7398] * [Fix CI welcome message][7428] * [add an action to ask for a migration guide when one is missing][7507] ### ~~SMEs (@cart)~~ This was already covered in another blog post. Just briefly call out what they are and that this is the first release that used them. Link to the other blog post. 
* [Subject Matter Experts and new Bevy Org docs][7185] [4775]: bevyengine/bevy#4775 [4777]: bevyengine/bevy#4777 [4853]: bevyengine/bevy#4853 [4927]: bevyengine/bevy#4927 [4928]: bevyengine/bevy#4928 [5400]: bevyengine/bevy#5400 [5428]: bevyengine/bevy#5428 [5542]: bevyengine/bevy#5542 [5589]: bevyengine/bevy#5589 [5900]: bevyengine/bevy#5900 [5950]: bevyengine/bevy#5950 [6028]: bevyengine/bevy#6028 [6035]: bevyengine/bevy#6035 [6129]: bevyengine/bevy#6129 [6179]: bevyengine/bevy#6179 [6245]: bevyengine/bevy#6245 [6284]: bevyengine/bevy#6284 [6305]: bevyengine/bevy#6305 [6320]: bevyengine/bevy#6320 [6391]: bevyengine/bevy#6391 [6408]: bevyengine/bevy#6408 [6412]: bevyengine/bevy#6412 [6503]: bevyengine/bevy#6503 [6524]: bevyengine/bevy#6524 [6547]: bevyengine/bevy#6547 [6557]: bevyengine/bevy#6557 [6560]: bevyengine/bevy#6560 [6561]: bevyengine/bevy#6561 [6564]: bevyengine/bevy#6564 [6587]: bevyengine/bevy#6587 [6644]: bevyengine/bevy#6644 [6649]: bevyengine/bevy#6649 [6681]: bevyengine/bevy#6681 [6694]: bevyengine/bevy#6694 [6696]: bevyengine/bevy#6696 [6722]: bevyengine/bevy#6722 [6743]: bevyengine/bevy#6743 [6785]: bevyengine/bevy#6785 [6800]: bevyengine/bevy#6800 [6802]: bevyengine/bevy#6802 [6809]: bevyengine/bevy#6809 [6817]: bevyengine/bevy#6817 [6831]: bevyengine/bevy#6831 [6833]: bevyengine/bevy#6833 [6843]: bevyengine/bevy#6843 [6852]: bevyengine/bevy#6852 [6885]: bevyengine/bevy#6885 [6900]: bevyengine/bevy#6900 [6902]: bevyengine/bevy#6902 [6922]: bevyengine/bevy#6922 [6926]: bevyengine/bevy#6926 [6936]: bevyengine/bevy#6936 [6940]: bevyengine/bevy#6940 [6944]: bevyengine/bevy#6944 [6965]: bevyengine/bevy#6965 [6988]: bevyengine/bevy#6988 [6995]: bevyengine/bevy#6995 [7015]: bevyengine/bevy#7015 [7023]: bevyengine/bevy#7023 [7024]: bevyengine/bevy#7024 [7046]: bevyengine/bevy#7046 [7051]: bevyengine/bevy#7051 [7053]: bevyengine/bevy#7053 [7060]: bevyengine/bevy#7060 [7063]: bevyengine/bevy#7063 [7064]: bevyengine/bevy#7064 [7069]: bevyengine/bevy#7069 
[7083]: bevyengine/bevy#7083 [7084]: bevyengine/bevy#7084 [7087]: bevyengine/bevy#7087 [7094]: bevyengine/bevy#7094 [7097]: bevyengine/bevy#7097 [7185]: bevyengine/bevy#7185 [7199]: bevyengine/bevy#7199 [7205]: bevyengine/bevy#7205 [7206]: bevyengine/bevy#7206 [7248]: bevyengine/bevy#7248 [7267]: bevyengine/bevy#7267 [7298]: bevyengine/bevy#7298 [7311]: bevyengine/bevy#7311 [7321]: bevyengine/bevy#7321 [7324]: bevyengine/bevy#7324 [7325]: bevyengine/bevy#7325 [7354]: bevyengine/bevy#7354 [7356]: bevyengine/bevy#7356 [7364]: bevyengine/bevy#7364 [7381]: bevyengine/bevy#7381 [7398]: bevyengine/bevy#7398 [7399]: bevyengine/bevy#7399 [7415]: bevyengine/bevy#7415 [7428]: bevyengine/bevy#7428 [7431]: bevyengine/bevy#7431 [7444]: bevyengine/bevy#7444 [7445]: bevyengine/bevy#7445 [7446]: bevyengine/bevy#7446 [7448]: bevyengine/bevy#7448 [7456]: bevyengine/bevy#7456 [7463]: bevyengine/bevy#7463 [7466]: bevyengine/bevy#7466 [7481]: bevyengine/bevy#7481 [7483]: bevyengine/bevy#7483 [7493]: bevyengine/bevy#7493 [7507]: bevyengine/bevy#7507 [7510]: bevyengine/bevy#7510 [7512]: bevyengine/bevy#7512 [7517]: bevyengine/bevy#7517 [7518]: bevyengine/bevy#7518 [7532]: bevyengine/bevy#7532 [7535]: bevyengine/bevy#7535 [7546]: bevyengine/bevy#7546 [7547]: bevyengine/bevy#7547 [7559]: bevyengine/bevy#7559 [7568]: bevyengine/bevy#7568 [7579]: bevyengine/bevy#7579 [7594]: bevyengine/bevy#7594 [7605]: bevyengine/bevy#7605 [7639]: bevyengine/bevy#7639 [7642]: bevyengine/bevy#7642 [7653]: bevyengine/bevy#7653 [7701]: bevyengine/bevy#7701 [7737]: bevyengine/bevy#7737 [7756]: bevyengine/bevy#7756 Co-authored-by: François <[email protected]> Co-authored-by: Alice Cecile <[email protected]> Co-authored-by: Mike <[email protected]> Co-authored-by: Boxy <[email protected]> Co-authored-by: IceSentry <[email protected]> Co-authored-by: JoJoJet <[email protected]> Co-authored-by: Aevyrie <[email protected]> Co-authored-by: James Liu <[email protected]> Co-authored-by: Marco Buono <[email protected]> 
Co-authored-by: Aceeri <[email protected]>

… Iterator combinators (#6773)

# Objective

After #6547, `Query::for_each` has been capable of automatic vectorization on certain queries, which yields a notable (>50%) CPU time improvement for iteration. However, `Query::for_each` isn't idiomatic Rust, and lacks the flexibility of iterator combinators.

Ideally, `Query::iter` and friends should be able to achieve the same results. However, this does seem to be blocked upstream (rust-lang/rust#104914) by Rust's loop optimizations.

## Solution

This is an intermediate solution and refactor. This moves the `Query::for_each` implementation onto the `Iterator::fold` implementation for `QueryIter` instead. This should result in the same automatic vectorization optimization on all `Iterator` functions that internally use fold, including `Iterator::for_each`, `Iterator::count`, etc.

With this, it should close the gap between the two completely. Internally, this PR changes `Query::for_each` to use `query.iter().for_each(..)` instead of the duplicated implementation. Separately, the duplicate implementations of internal iteration (i.e. `Query::par_for_each`) now use portions of the current `Query::for_each` implementation factored out into their own functions.

This also massively cleans up our internal fragmentation of internal iteration options, deduplicating the iteration code used in `for_each` and `par_iter().for_each()`.

---

## Changelog

Changed: `Query::for_each` and `Query::for_each_mut` have been moved to `QueryIter`'s `Iterator::for_each` implementation, and still retain their performance improvements over normal iteration. These APIs are deprecated in 0.13 and will be removed in 0.14.

---------

Co-authored-by: JoJoJet <[email protected]>
Co-authored-by: Alice Cecile <[email protected]>
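The approach described in that commit message can be sketched with a toy iterator that overrides `Iterator::fold`. `DenseIter` is a hypothetical stand-in for a dense query iterator over one column and is not Bevy's `QueryIter`. The standard library's default `Iterator::for_each` and `Iterator::count` are built on `fold`, so they pick up the override.

```rust
/// Hypothetical stand-in for a dense iterator over one table column.
struct DenseIter<'a> {
    data: &'a [f32],
    index: usize,
}

impl<'a> Iterator for DenseIter<'a> {
    type Item = &'a f32;

    /// External iteration: one bounds-checked step per call.
    fn next(&mut self) -> Option<Self::Item> {
        let item = self.data.get(self.index)?;
        self.index += 1;
        Some(item)
    }

    /// Internal iteration: a single tight loop over the remaining slice,
    /// which gives the optimizer a simpler loop to work with.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        let data: &'a [f32] = self.data;
        let mut acc = init;
        for item in &data[self.index..] {
            acc = f(acc, item);
        }
        acc
    }
}

fn main() {
    let data = [1.0_f32, 2.0, 3.0, 4.0];

    // `for_each` is implemented in terms of `fold`, so it runs the loop above.
    let mut total = 0.0;
    DenseIter { data: &data, index: 0 }.for_each(|x| total += *x);
    assert_eq!(total, 10.0);

    // `count` also goes through `fold`.
    assert_eq!(DenseIter { data: &data, index: 0 }.count(), 4);
}
```

Whether a given loop actually vectorizes depends on the compiler and the closure body; the sketch only shows the routing of combinators through `fold`.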

james7132 added 7 commits November 10, 2022 11:53

Split ComponentTicks internally

e2eb78a

Introduce split ticks into Column

3560c20

Split ticks in SparseSet

e21916a

Bubble up splitting

4f1e425

Remove Column::ticks

b356548

Fix CI

1c8d242

Only fetch relevant ticks for filters

62baac2

james7132 requested a review from maniwani November 11, 2022 09:58

james7132 added the A-ECS (Entities, components, systems, and events), C-Performance (A change motivated by improving speed, memory usage or compile times), and M-Needs-Migration-Guide (A breaking change to Bevy's public API that needs to be noted in a migration guide) labels Nov 11, 2022

james7132 added 2 commits November 11, 2022 03:13

Slim down initialize

e380e0d

Formatting

51d10f0

Fix CI

956f226

james7132 marked this pull request as ready for review November 11, 2022 14:20

james7132 requested a review from alice-i-cecile November 11, 2022 14:21

james7132 mentioned this pull request Nov 11, 2022

Use u64 for change ticks #6327

Closed

alice-i-cecile reviewed Nov 12, 2022

crates/bevy_ecs/src/change_detection.rs Outdated

alice-i-cecile reviewed Nov 12, 2022

crates/bevy_ecs/src/query/fetch.rs Outdated

alice-i-cecile reviewed Nov 12, 2022

crates/bevy_ecs/src/storage/sparse_set.rs Outdated

alice-i-cecile reviewed Nov 12, 2022

crates/bevy_ecs/src/world/mod.rs Outdated

james7132 added 4 commits November 12, 2022 05:05

is_changed -> is_older_than

ea6065d

Split ChangeTrackers table ticks

0484924

Tuples to TickCells

79056da

Add and use Ticks::from_tick_cells

e82ba14

Diddykonga approved these changes Nov 14, 2022

Document Ticks

8ec9a1a

Co-authored-by: PROMETHIA-27 <[email protected]>

james7132 added the S-Ready-For-Final-Review (This PR has been approved by the community. It's ready for a maintainer to consider merging it) label Nov 14, 2022

maniwani approved these changes Nov 14, 2022

Cleanup filters

e63ca26

bors bot added a commit that referenced this pull request Nov 14, 2022

Try #6547:

e956e81

james7132 added this to the 0.10 milestone Nov 15, 2022

This was referenced Nov 15, 2022

Yeet for_each #4060

Closed

Opt out change detection #6659

Closed

bors bot changed the title ~~Split Component Ticks~~ [Merged by Bors] - Split Component Ticks Nov 21, 2022

bors bot closed this Nov 21, 2022

This was referenced Nov 23, 2022

Use 64 bit for world and system tick index #6651

Closed

Override QueryIter::fold to port Query::for_each perf gains to select Iterator combinators #6773

Merged

james7132 mentioned this pull request Dec 9, 2022

[Merged by Bors] - Fix Sparse Change Detection #6896

Closed

cart mentioned this pull request Feb 24, 2023

[Merged by Bors] - News: Release 0.10.0 bevyengine/bevy-website#546

Closed

Shatur mentioned this pull request May 7, 2023

Added/Changed detection JoJoJet/bevy-trait-query#30

Closed

2 tasks

SkiFire13 mentioned this pull request Nov 30, 2023

Reduced TableRow as Casting #10811

Merged
