packing selftests/fix kbuild output #2

jacob-keller

This series squashes in a fix for the packing-checks.h generation, moving it
from lib/Makefile to Kbuild. I am not 100% sure why this fixes things, but I
can now build with both KBUILD_OUTPUT and O=..

I also fixed ice to use packed_field_8, saving a few more bytes, and I'm
working on testing the module sizes for sja1105.

  • packing: generate CHECK_PACKED_FIELDS_N definitions via Kbuild
  • ice: use pack_fields API
  • net: dsa: sja1105: convert sja1105_clocking.c to pack_fields()
  • net: dsa: sja1105: replace ptp_cmd_packing() with packed_field array
  • net: dsa: sja1105: convert sja1105_spi_message_pack() to packed_fields array
  • net: dsa: sja1105: adapt to struct packed_field_8
  • ice: convert to pack_fields_m()

@jacob-keller
Author

Here's a summary of the changes in size for the sja1105 module.

From the base up to fe932ef ("net: dsa: sja1105: replace sja1105_xfer_buf()"):

add/remove: 22/23 grow/shrink: 48/39 up/down: 11709/-2450 (9259)
Function                                     old     new   delta
sja1110_general_params_entry_packing         933    1678    +745
sja1105pqrs_general_params_entry_packing     831    1492    +661
sja1110_mac_config_entry_packing             856    1501    +645
sja1105pqrs_mac_config_entry_packing         856    1501    +645
sja1105et_mac_config_entry_packing          1332    1977    +645
sja1105et_general_params_entry_packing       702    1223    +521
sja1110_l2_lookup_entry_packing              586    1088    +502
sja1105pqrs_l2_lookup_entry_packing          547     976    +429
sja1105pqrs_l2_lookup_params_entry_packing     437     826    +389
sja1110_l2_lookup_params_entry_packing       452     809    +357
sja1110_vl_lookup_entry_packing              340     635    +295
sja1105_vl_lookup_entry_packing              340     635    +295
sja1110_schedule_entry_packing               339     598    +259
sja1105_schedule_entry_packing               339     598    +259
sja1110_vlan_lookup_entry_packing            246     430    +184
sja1110_retagging_entry_packing              246     430    +184
sja1105_retagging_entry_packing              246     430    +184
sja1105_l2_forwarding_entry_packing          223     406    +183
sja1110_xmii_params_entry_packing            268     447    +179
sja1105et_l2_lookup_params_entry_packing     281     459    +178
sja1105_xmii_params_entry_packing            197     369    +172
sja1110_l2_forwarding_entry_packing          256     415    +159
sja1105_vlan_lookup_entry_packing            215     374    +159
sja1105_ptp_cmd_read                           -     157    +157
sja1105_read_u32                               -     150    +150
sja1105_write_u32                              -     142    +142
sja1110_vl_policing_entry_packing            203     342    +139
sja1105_vl_policing_entry_packing            203     342    +139
sja1105_write_u64                              -     138    +138
sja1110_l2_policing_entry_packing            184     318    +134
sja1105et_l2_lookup_entry_packing            184     318    +134
sja1105_l2_policing_entry_packing            184     318    +134
sja1110_vl_forwarding_params_entry_packing     180     313    +133
sja1105_vl_forwarding_params_entry_packing     180     313    +133
sja1105_read_u64                               -     130    +130
sja1105_ptp_cmd_write                          -     128    +128
sja1110_l2_forwarding_params_entry_packing     161     287    +126
sja1105_l2_forwarding_params_entry_packing     161     287    +126
sja1105_table_header_pack                      -     109    +109
sja1105_table_header_unpack                    -     106    +106
sja1110_vl_forwarding_entry_packing          153     258    +105
sja1105_vl_forwarding_entry_packing          153     258    +105
sja1110_schedule_params_entry_packing        130     216     +86
sja1110_pcp_remapping_entry_packing          133     219     +86
sja1105_schedule_params_entry_packing        133     219     +86
sja1110_schedule_entry_points_entry_packing     122     202     +80
sja1105pqrs_avb_params_entry_packing         122     202     +80
sja1105_schedule_entry_points_entry_packing     122     202     +80
sja1105_static_config_pack                   513     585     +72
sja1105_schedule_entry_points_params_entry_packing     102     146     +44
sja1105et_avb_params_entry_packing           103     145     +42
sja1105_ptp_gettimex                         156     195     +39
sja1105_static_config_get_length             125     157     +32
sja1105_write_buf                              -      31     +31
sja1105_read_buf                               -      28     +28
sja1105_ptp_gettimex.cold                      -      26     +26
sja1105_ptp_enable.cold                        -      22     +22
__pfx_sja1105_write_u64                        -      16     +16
__pfx_sja1105_write_u32                        -      16     +16
__pfx_sja1105_write_buf                        -      16     +16
__pfx_sja1105_table_header_unpack              -      16     +16
__pfx_sja1105_table_header_pack                -      16     +16
__pfx_sja1105_read_u64                         -      16     +16
__pfx_sja1105_read_u32                         -      16     +16
__pfx_sja1105_read_buf                         -      16     +16
__pfx_sja1105_ptp_cmd_write                    -      16     +16
__pfx_sja1105_ptp_cmd_read                     -      16     +16
sja1105_pack                                  48      56      +8
sja1105_unpack                                44      49      +5
sja1105_ptp_enable                           865     870      +5
sja1105_tas_check_running                    137     135      -2
sja1105_dynamic_config_poll_valid            320     318      -2
__sja1105_ptp_gettimex                       133     131      -2
sja1105_pcs_mdio_read_c45                    148     145      -3
sja1105_pack.cold                             87      84      -3
sja1105_crc32                                181     178      -3
sja1105_tas_state_machine.cold               114     110      -4
sja1105_tas_stop.isra                        128     123      -5
sja1105_ptp_txtstamp_skb                     515     510      -5
sja1110_setup_rgmii_delay                    742     736      -6
sja1110_reset_cmd                            157     151      -6
sja1105pqrs_reset_cmd                        157     151      -6
sja1105et_reset_cmd                          157     151      -6
sja1105_table_header_pack_with_crc            85      79      -6
sja1105_rxtstamp_work                        452     446      -6
sja1105_probe                                731     725      -6
sja1105_cgu_idiv_config                      407     401      -6
static_config_buf_prepare_for_upload         197     190      -7
sja1105_dynamic_config_read                  462     454      -8
sja1105_ptp_clock_register                   665     656      -9
sja1110_disable_microcontroller              212     200     -12
sja1105pqrs_setup_rgmii_delay                552     540     -12
sja1105_dynamic_config_write                 457     444     -13
sja1110_pcs_mdio_read_c45                    383     368     -15
__sja1105_ptp_settime                        266     251     -15
sja1105_init_scheduling                     2235    2219     -16
sja1105_get_ethtool_stats                    545     529     -16
__sja1105_ptp_adjtime                        259     243     -16
__pfx_sja1105_xfer_u64                        16       -     -16
__pfx_sja1105_xfer_u32                        16       -     -16
__pfx_sja1105_xfer_buf                        16       -     -16
__pfx_sja1105_table_header_packing            16       -     -16
__pfx_sja1105_ptp_commit                      16       -     -16
sja1105_xfer_buf                              17       -     -17
sja1105_packing                               45      27     -18
sja1105_mdiobus_register                     742     722     -20
sja1105_inhibit_tx                           274     246     -28
sja1105_tas_state_machine                    788     746     -42
sja1105pqrs_mgmt_route_cmd_packing           157     110     -47
sja1105et_mgmt_route_cmd_packing             113      64     -49
sja1105_ptp_adjfine                          219     169     -50
sja1110_vl_forwarding_params_entry_packing.cold      57       -     -57
sja1110_schedule_params_entry_packing.cold      57       -     -57
sja1110_pcp_remapping_entry_packing.cold      57       -     -57
sja1110_l2_forwarding_params_entry_packing.cold      57       -     -57
sja1110_l2_forwarding_entry_packing.cold      57       -     -57
sja1105_vl_forwarding_params_entry_packing.cold      57       -     -57
sja1105_schedule_params_entry_packing.cold      57       -     -57
sja1105_l2_forwarding_params_entry_packing.cold      57       -     -57
sja1105_l2_forwarding_entry_packing.cold      57       -     -57
sja1110_l2_lookup_params_entry_packing.cold      59       -     -59
sja1105_clocking_setup_port                 3146    3086     -60
sja1110_pcs_mdio_write_c45                   362     299     -63
sja1105_pcs_mdio_write_c45                   118      55     -63
sja1105_schedule_entry_points_params_entry_packing.cold      69       -     -69
sja1105et_avb_params_entry_packing.cold       70       -     -70
sja1105_packing.cold                          87       -     -87
sja1105_xfer                                 908     819     -89
sja1105_table_header_packing                 122       -    -122
sja1105_xfer_u64                             215       -    -215
sja1105_ptp_commit                           223       -    -223
sja1105_xfer_u32                             250       -    -250
Total: Before=90050, After=99309, chg +10.28%
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-24 (-24)
Data                                         old     new   delta
_entry_ptr                                   680     656     -24
Total: Before=3296, After=3272, chg -0.73%
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-148 (-148)
RO Data                                      old     new   delta
__func__                                    1067    1051     -16
_entry                                      3740    3608    -132
Total: Before=31740, After=31592, chg -0.47%

These changes seem to add significant bloat to the function sizes (+9259 bytes of code, +10.28%) for extremely minor savings in the RO Data (-148 bytes) and Data (-24 bytes) sections.

From fe932ef ("net: dsa: sja1105: replace sja1105_xfer_buf()") to 25e2d5f ("net: dsa: sja1105: convert sja1105_spi_message_pack() to packed_fields array")

add/remove: 0/16 grow/shrink: 8/11 up/down: 270/-2961 (-2691)
Function                                     old     new   delta
sja1105_tas_state_machine                    746     822     +76
sja1105_ptp_cmd_write                        128     196     +68
sja1105_ptp_cmd_read                         157     222     +65
sja1110_disable_microcontroller              200     227     +27
sja1105_init_scheduling                     2219    2235     +16
sja1105_clocking_setup_port.cold             193     201      +8
sja1105pqrs_setup_rgmii_delay                540     546      +6
sja1105_tas_state_machine.cold               110     114      +4
sja1105_tas_stop.isra                        123     120      -3
sja1105_pack                                  56      49      -7
sja1105_xfer                                 819     809     -10
__pfx_sja1110_cgu_outclk_packing.constprop      16       -     -16
__pfx_sja1105pqrs_ptp_cmd_packing             16       -     -16
__pfx_sja1105et_ptp_cmd_packing               16       -     -16
__pfx_sja1105_tas_check_running               16       -     -16
__pfx_sja1105_cgu_pll_control_packing.constprop      16       -     -16
__pfx_sja1105_cgu_mii_control_packing.constprop      16       -     -16
__pfx_sja1105_cfg_pad_mii_packing.constprop      16       -     -16
__pfx_sja1105_cfg_pad_mii_id_packing.constprop      16       -     -16
sja1105_clocking_setup_port                 3086    3065     -21
sja1105_pack.cold                             84      51     -33
sja1110_cgu_outclk_packing.constprop          95       -     -95
sja1105_cgu_mii_control_packing.constprop      95       -     -95
__sja1105_ptp_settime                        251     145    -106
__sja1105_ptp_adjtime                        243     137    -106
sja1105_ptp_enable                           870     742    -128
sja1105_cgu_idiv_config                      401     268    -133
sja1105_tas_check_running                    135       -    -135
sja1105_ptp_clock_register                   656     514    -142
sja1105_cgu_pll_control_packing.constprop     250       -    -250
sja1105_cfg_pad_mii_id_packing.constprop     250       -    -250
sja1110_setup_rgmii_delay                    736     432    -304
sja1105pqrs_ptp_cmd_packing                  319       -    -319
sja1105et_ptp_cmd_packing                    319       -    -319
sja1105_cfg_pad_mii_packing.constprop        377       -    -377
Total: Before=99309, After=96618, chg -2.71%
add/remove: 1/0 grow/shrink: 0/2 up/down: 56/-64 (-8)
Data                                         old     new   delta
__UNIQUE_ID_ddebug1340                         -      56     +56
_entry_ptr                                   656     648      -8
__UNIQUE_ID_ddebug882                        112      56     -56
Total: Before=3272, After=3264, chg -0.24%
add/remove: 10/0 grow/shrink: 0/1 up/down: 2144/-44 (2100)
RO Data                                      old     new   delta
sja1105_cfg_pad_mii_fields                     -     384    +384
sja1110_cfg_pad_mii_id_fields                  -     320    +320
sja1105pqrs_ptp_cmd_fields                     -     256    +256
sja1105et_ptp_cmd_fields                       -     256    +256
sja1105_cgu_pll_ctrl_fields                    -     256    +256
sja1105_cfg_pad_mii_id_fields                  -     256    +256
sja1105_cgu_idiv_fields                        -     128    +128
sja1110_cgu_outclk_fields                      -      96     +96
sja1105_spi_message_fields                     -      96     +96
sja1105_cgu_mii_ctrl_fields                    -      96     +96
_entry                                      3608    3564     -44
Total: Before=31592, After=33692, chg +6.65%

Here, we lose about 2700 bytes of code (-2691) at a cost of 2100 bytes of RO data. This is nice, but the code is still about 6500 bytes larger than the base (96618 vs. 90050), because the switch to pack/unpack alone cost about 9300 bytes.

From 25e2d5f ("net: dsa: sja1105: convert sja1105_spi_message_pack() to packed_fields array") to dcab922 ("net: dsa: sja1105: adapt to struct packed_field_8")

add/remove: 0/0 grow/shrink: 9/0 up/down: 2998/0 (2998)
Function                                     old     new   delta
sja1105_clocking_setup_port                 3065    4771   +1706
sja1105_ptp_cmd_read                         222     500    +278
sja1110_disable_microcontroller              227     476    +249
sja1105pqrs_setup_rgmii_delay                546     781    +235
sja1105_ptp_cmd_write                        196     425    +229
sja1105_cgu_idiv_config                      268     393    +125
sja1110_setup_rgmii_delay                    432     533    +101
sja1105_xfer                                 809     881     +72
sja1105_clocking_setup_port.cold             201     204      +3
Total: Before=96618, After=99616, chg +3.10%
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Data                                         old     new   delta
Total: Before=3264, After=3264, chg +0.00%
add/remove: 0/0 grow/shrink: 0/10 up/down: 0/-1876 (-1876)
RO Data                                      old     new   delta
sja1110_cgu_outclk_fields                     96      12     -84
sja1105_spi_message_fields                    96      12     -84
sja1105_cgu_mii_ctrl_fields                   96      12     -84
sja1105_cgu_idiv_fields                      128      16    -112
sja1105pqrs_ptp_cmd_fields                   256      32    -224
sja1105et_ptp_cmd_fields                     256      32    -224
sja1105_cgu_pll_ctrl_fields                  256      32    -224
sja1105_cfg_pad_mii_id_fields                256      32    -224
sja1110_cfg_pad_mii_id_fields                320      40    -280
sja1105_cfg_pad_mii_fields                   384      48    -336
Total: Before=33692, After=31816, chg -5.57%

Here, we gain about 3k of code (+2998) for a savings of 1876 bytes in RO data. You're correct that the macro is significantly worse for multiple uses like in sja1105. But in reality this was a net cost of about 1.1KB, while the other changes just to switch to pack/unpack appear to have cost ~9k bytes.

Summary of all changes from the base to the tip:

add/remove: 22/39 grow/shrink: 52/33 up/down: 14500/-4934 (9566)
Function                                     old     new   delta
sja1105_clocking_setup_port                 3146    4771   +1625
sja1110_general_params_entry_packing         933    1678    +745
sja1105pqrs_general_params_entry_packing     831    1492    +661
sja1110_mac_config_entry_packing             856    1501    +645
sja1105pqrs_mac_config_entry_packing         856    1501    +645
sja1105et_mac_config_entry_packing          1332    1977    +645
sja1105et_general_params_entry_packing       702    1223    +521
sja1110_l2_lookup_entry_packing              586    1088    +502
sja1105_ptp_cmd_read                           -     500    +500
sja1105pqrs_l2_lookup_entry_packing          547     976    +429
sja1105_ptp_cmd_write                          -     425    +425
sja1105pqrs_l2_lookup_params_entry_packing     437     826    +389
sja1110_l2_lookup_params_entry_packing       452     809    +357
sja1110_vl_lookup_entry_packing              340     635    +295
sja1105_vl_lookup_entry_packing              340     635    +295
sja1110_disable_microcontroller              212     476    +264
sja1110_schedule_entry_packing               339     598    +259
sja1105_schedule_entry_packing               339     598    +259
sja1105pqrs_setup_rgmii_delay                552     781    +229
sja1110_vlan_lookup_entry_packing            246     430    +184
sja1110_retagging_entry_packing              246     430    +184
sja1105_retagging_entry_packing              246     430    +184
sja1105_l2_forwarding_entry_packing          223     406    +183
sja1110_xmii_params_entry_packing            268     447    +179
sja1105et_l2_lookup_params_entry_packing     281     459    +178
sja1105_xmii_params_entry_packing            197     369    +172
sja1110_l2_forwarding_entry_packing          256     415    +159
sja1105_vlan_lookup_entry_packing            215     374    +159
sja1105_read_u32                               -     150    +150
sja1105_write_u32                              -     142    +142
sja1110_vl_policing_entry_packing            203     342    +139
sja1105_vl_policing_entry_packing            203     342    +139
sja1105_write_u64                              -     138    +138
sja1110_l2_policing_entry_packing            184     318    +134
sja1105et_l2_lookup_entry_packing            184     318    +134
sja1105_l2_policing_entry_packing            184     318    +134
sja1110_vl_forwarding_params_entry_packing     180     313    +133
sja1105_vl_forwarding_params_entry_packing     180     313    +133
sja1105_read_u64                               -     130    +130
sja1110_l2_forwarding_params_entry_packing     161     287    +126
sja1105_l2_forwarding_params_entry_packing     161     287    +126
sja1105_table_header_pack                      -     109    +109
sja1105_table_header_unpack                    -     106    +106
sja1110_vl_forwarding_entry_packing          153     258    +105
sja1105_vl_forwarding_entry_packing          153     258    +105
sja1110_schedule_params_entry_packing        130     216     +86
sja1110_pcp_remapping_entry_packing          133     219     +86
sja1105_schedule_params_entry_packing        133     219     +86
sja1110_schedule_entry_points_entry_packing     122     202     +80
sja1105pqrs_avb_params_entry_packing         122     202     +80
sja1105_schedule_entry_points_entry_packing     122     202     +80
sja1105_static_config_pack                   513     585     +72
sja1105_schedule_entry_points_params_entry_packing     102     146     +44
sja1105et_avb_params_entry_packing           103     145     +42
sja1105_ptp_gettimex                         156     195     +39
sja1105_tas_state_machine                    788     822     +34
sja1105_static_config_get_length             125     157     +32
sja1105_write_buf                              -      31     +31
sja1105_read_buf                               -      28     +28
sja1105_ptp_gettimex.cold                      -      26     +26
sja1105_ptp_enable.cold                        -      22     +22
__pfx_sja1105_write_u64                        -      16     +16
__pfx_sja1105_write_u32                        -      16     +16
__pfx_sja1105_write_buf                        -      16     +16
__pfx_sja1105_table_header_unpack              -      16     +16
__pfx_sja1105_table_header_pack                -      16     +16
__pfx_sja1105_read_u64                         -      16     +16
__pfx_sja1105_read_u32                         -      16     +16
__pfx_sja1105_read_buf                         -      16     +16
__pfx_sja1105_ptp_cmd_write                    -      16     +16
__pfx_sja1105_ptp_cmd_read                     -      16     +16
sja1105_clocking_setup_port.cold             193     204     +11
sja1105_unpack                                44      49      +5
sja1105_pack                                  48      49      +1
sja1105_dynamic_config_poll_valid            320     318      -2
__sja1105_ptp_gettimex                       133     131      -2
sja1105_pcs_mdio_read_c45                    148     145      -3
sja1105_crc32                                181     178      -3
sja1105_ptp_txtstamp_skb                     515     510      -5
sja1110_reset_cmd                            157     151      -6
sja1105pqrs_reset_cmd                        157     151      -6
sja1105et_reset_cmd                          157     151      -6
sja1105_table_header_pack_with_crc            85      79      -6
sja1105_rxtstamp_work                        452     446      -6
sja1105_probe                                731     725      -6
static_config_buf_prepare_for_upload         197     190      -7
sja1105_tas_stop.isra                        128     120      -8
sja1105_dynamic_config_read                  462     454      -8
sja1105_dynamic_config_write                 457     444     -13
sja1105_cgu_idiv_config                      407     393     -14
sja1110_pcs_mdio_read_c45                    383     368     -15
sja1105_get_ethtool_stats                    545     529     -16
__pfx_sja1110_cgu_outclk_packing.constprop      16       -     -16
__pfx_sja1105pqrs_ptp_cmd_packing             16       -     -16
__pfx_sja1105et_ptp_cmd_packing               16       -     -16
__pfx_sja1105_xfer_u64                        16       -     -16
__pfx_sja1105_xfer_u32                        16       -     -16
__pfx_sja1105_xfer_buf                        16       -     -16
__pfx_sja1105_tas_check_running               16       -     -16
__pfx_sja1105_table_header_packing            16       -     -16
__pfx_sja1105_ptp_commit                      16       -     -16
__pfx_sja1105_cgu_pll_control_packing.constprop      16       -     -16
__pfx_sja1105_cgu_mii_control_packing.constprop      16       -     -16
__pfx_sja1105_cfg_pad_mii_packing.constprop      16       -     -16
__pfx_sja1105_cfg_pad_mii_id_packing.constprop      16       -     -16
sja1105_xfer_buf                              17       -     -17
sja1105_packing                               45      27     -18
sja1105_mdiobus_register                     742     722     -20
sja1105_xfer                                 908     881     -27
sja1105_inhibit_tx                           274     246     -28
sja1105_pack.cold                             87      51     -36
sja1105pqrs_mgmt_route_cmd_packing           157     110     -47
sja1105et_mgmt_route_cmd_packing             113      64     -49
sja1105_ptp_adjfine                          219     169     -50
sja1110_vl_forwarding_params_entry_packing.cold      57       -     -57
sja1110_schedule_params_entry_packing.cold      57       -     -57
sja1110_pcp_remapping_entry_packing.cold      57       -     -57
sja1110_l2_forwarding_params_entry_packing.cold      57       -     -57
sja1110_l2_forwarding_entry_packing.cold      57       -     -57
sja1105_vl_forwarding_params_entry_packing.cold      57       -     -57
sja1105_schedule_params_entry_packing.cold      57       -     -57
sja1105_l2_forwarding_params_entry_packing.cold      57       -     -57
sja1105_l2_forwarding_entry_packing.cold      57       -     -57
sja1110_l2_lookup_params_entry_packing.cold      59       -     -59
sja1110_pcs_mdio_write_c45                   362     299     -63
sja1105_pcs_mdio_write_c45                   118      55     -63
sja1105_schedule_entry_points_params_entry_packing.cold      69       -     -69
sja1105et_avb_params_entry_packing.cold       70       -     -70
sja1105_packing.cold                          87       -     -87
sja1110_cgu_outclk_packing.constprop          95       -     -95
sja1105_cgu_mii_control_packing.constprop      95       -     -95
__sja1105_ptp_settime                        266     145    -121
sja1105_table_header_packing                 122       -    -122
__sja1105_ptp_adjtime                        259     137    -122
sja1105_ptp_enable                           865     742    -123
sja1105_tas_check_running                    137       -    -137
sja1105_ptp_clock_register                   665     514    -151
sja1110_setup_rgmii_delay                    742     533    -209
sja1105_xfer_u64                             215       -    -215
sja1105_ptp_commit                           223       -    -223
sja1105_xfer_u32                             250       -    -250
sja1105_cgu_pll_control_packing.constprop     250       -    -250
sja1105_cfg_pad_mii_id_packing.constprop     250       -    -250
sja1105pqrs_ptp_cmd_packing                  319       -    -319
sja1105et_ptp_cmd_packing                    319       -    -319
sja1105_cfg_pad_mii_packing.constprop        377       -    -377
Total: Before=90050, After=99616, chg +10.62%
add/remove: 1/0 grow/shrink: 0/2 up/down: 56/-88 (-32)
Data                                         old     new   delta
__UNIQUE_ID_ddebug1340                         -      56     +56
_entry_ptr                                   680     648     -32
__UNIQUE_ID_ddebug882                        112      56     -56
Total: Before=3296, After=3264, chg -0.97%
add/remove: 10/0 grow/shrink: 0/2 up/down: 268/-192 (76)
RO Data                                      old     new   delta
sja1105_cfg_pad_mii_fields                     -      48     +48
sja1110_cfg_pad_mii_id_fields                  -      40     +40
sja1105pqrs_ptp_cmd_fields                     -      32     +32
sja1105et_ptp_cmd_fields                       -      32     +32
sja1105_cgu_pll_ctrl_fields                    -      32     +32
sja1105_cfg_pad_mii_id_fields                  -      32     +32
sja1105_cgu_idiv_fields                        -      16     +16
sja1110_cgu_outclk_fields                      -      12     +12
sja1105_spi_message_fields                     -      12     +12
sja1105_cgu_mii_ctrl_fields                    -      12     +12
__func__                                    1067    1051     -16
_entry                                      3740    3564    -176
Total: Before=31740, After=31816, chg +0.24%

We gain ~9500 bytes of code (+9566) and 76 bytes of RO data, and we save 32 bytes of regular data. The majority of the added bytes seem to come from refactoring to pack/unpack, if I am reading this correctly.

@jacob-keller
Author

For ice, changing from pack_fields() to pack_fields_m() with packed_field_8:

add/remove: 0/0 grow/shrink: 3/1 up/down: 280/-19 (261)
Function                                     old     new   delta
__ice_pack_txq_ctx                            55     174    +119
__ice_pack_rxq_ctx                            55     168    +113
ice_ena_vsi_rdma_qset                        688     736     +48
ice_write_rxq_ctx                            253     234     -19
Total: Before=634610, After=634871, chg +0.04%
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Data                                         old     new   delta
Total: Before=63213, After=63213, chg +0.00%
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-1316 (-1316)
RO Data                                      old     new   delta
ice_rlan_ctx_fields                          640      80    -560
ice_tlan_ctx_fields                          864     108    -756
Total: Before=99610, After=98294, chg -1.32%

I suspect that providing pack_fields_8 and pack_fields_16 instead of pack_fields_m is worth the cost? I guess it depends on how many call sites actually use the packing library.

It costs 261 bytes of code but saves 1316 bytes of RO data, because each field description shrinks to 4 bytes. For ice that is a definite win.
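
To make the per-field arithmetic concrete, here is a minimal sketch of two descriptor layouts that would be consistent with the RO data numbers above (32 bytes per field before, 4 bytes after, on a 64-bit build). The struct and function names are hypothetical, and the member types of the wide layout are an assumption; the series itself defines the real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wide layout: four size_t members, 32 bytes on a 64-bit build. */
struct demo_packed_field {
	size_t startbit;
	size_t endbit;
	size_t offset;
	size_t size;
};

/* Assumed narrow layout: four 8-bit members, 4 bytes per descriptor. */
struct demo_packed_field_8 {
	uint8_t startbit;
	uint8_t endbit;
	uint8_t offset;
	uint8_t size;
};

static_assert(sizeof(struct demo_packed_field_8) == 4,
	      "narrow descriptor must be 4 bytes");

/* Read-only bytes consumed by a table of n descriptors of desc_size bytes. */
static size_t demo_table_bytes(size_t n, size_t desc_size)
{
	return n * desc_size;
}
```

With 20 descriptors, the narrow layout needs 20 * 4 = 80 bytes, which matches the new size of ice_rlan_ctx_fields; 27 descriptors need 108 bytes, matching ice_tlan_ctx_fields.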

vladimiroltean added a commit that referenced this pull request Sep 10, 2024
…s_m()

Jacob Keller came with data in #2
that proves that defining pack_fields_m() and unpack_fields_m() as
macros directly callable by consumer drivers is not a great idea.

We can hide those macros inside the lib/packing.c translation unit,
and just provide pack_fields_8(), pack_fields_16(), unpack_fields_8()
and unpack_fields_16() as entry points into the library.

We can even go one step further and expose just pack_fields() and
unpack_fields(), and use the new C11 _Generic() selection feature,
which can call one function or the other, depending on the type of
the "fields" array - a caveman form of polymorphism. Which function
actually gets called is decided at compile time.

Signed-off-by: Vladimir Oltean <[email protected]>
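
As a rough illustration of the _Generic() dispatch described in this commit message, a minimal sketch follows. All demo_* names are hypothetical stand-ins, not the library's real API, and the stand-in functions only report which variant was selected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor types with 8-bit and 16-bit members. */
struct demo_field_8  { uint8_t  startbit, endbit, offset, size; };
struct demo_field_16 { uint16_t startbit, endbit, offset, size; };

static_assert(sizeof(struct demo_field_8) != sizeof(struct demo_field_16),
	      "the two descriptor types must differ");

/* Stand-ins for out-of-line entry points; each returns its member width. */
static int demo_pack_fields_8(const struct demo_field_8 *fields, size_t n)
{
	(void)fields;
	(void)n;
	return 8;
}

static int demo_pack_fields_16(const struct demo_field_16 *fields, size_t n)
{
	(void)fields;
	(void)n;
	return 16;
}

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * The array argument decays to a pointer inside _Generic(), so the
 * element type of "fields" selects the function at compile time.
 */
#define demo_pack_fields(fields)					\
	_Generic((fields),						\
		 const struct demo_field_8 *: demo_pack_fields_8,	\
		 const struct demo_field_16 *: demo_pack_fields_16	\
		)((fields), DEMO_ARRAY_SIZE(fields))

/* Example tables; the bit positions are made up for illustration. */
static const struct demo_field_8 demo_small[] = { { 7, 0, 0, 1 } };
static const struct demo_field_16 demo_large[] = { { 300, 290, 0, 2 } };
```

Calling demo_pack_fields(demo_small) resolves to demo_pack_fields_8(), and demo_pack_fields(demo_large) resolves to demo_pack_fields_16(). In this sketch only const-qualified arrays match; any other argument type fails to compile because there is no default association.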
@vladimiroltean vladimiroltean merged commit 04edbf8 into vladimiroltean:packing-selftests Sep 10, 2024
@vladimiroltean
Owner

I did some more work myself on top of this:

  • More or less finished the sja1105 conversion, and vaguely regression-tested it. A lot more testing needs to be done.
  • Moved the pack_fields_m() and unpack_fields_m() macros to lib/packing.c, to avoid loop unrolling in drivers.
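
To illustrate why keeping the loop out of line helps, here is a simplified sketch of a single shared function that walks a descriptor table, so drivers only carry the table and a call rather than their own expanded copy of the loop. It is not the lib/packing.c implementation: the demo_* names are hypothetical, and the bit numbering (bit i is bit i % 8 of byte i / 8) is an assumption chosen for brevity that ignores the library's quirks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte descriptor: bit range in the buffer, member location. */
struct demo_field_8 { uint8_t startbit, endbit, offset, size; };

static_assert(sizeof(struct demo_field_8) == 4, "descriptor must be 4 bytes");

/* Load a 1, 2, 4 or 8 byte member of the unpacked struct as a u64. */
static uint64_t demo_load(const void *ustruct, const struct demo_field_8 *f)
{
	const unsigned char *p = (const unsigned char *)ustruct + f->offset;
	uint64_t v = 0;
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;

	switch (f->size) {
	case 1: memcpy(&v8, p, 1); v = v8; break;
	case 2: memcpy(&v16, p, 2); v = v16; break;
	case 4: memcpy(&v32, p, 4); v = v32; break;
	case 8: memcpy(&v, p, 8); break;
	}
	return v;
}

/* One shared loop: copy every described member into bits endbit..startbit. */
static void demo_pack_fields(uint8_t *buf, const void *ustruct,
			     const struct demo_field_8 *fields, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct demo_field_8 *f = &fields[i];
		uint64_t v = demo_load(ustruct, f);

		for (unsigned int bit = f->endbit; bit <= f->startbit; bit++) {
			uint8_t mask = (uint8_t)(1u << (bit % 8));

			if ((v >> (bit - f->endbit)) & 1)
				buf[bit / 8] |= mask;
			else
				buf[bit / 8] &= (uint8_t)~mask;
		}
	}
}

/* Example unpacked struct and its descriptor table. */
struct demo_cmd { uint8_t a; uint16_t b; };

static const struct demo_field_8 demo_cmd_fields[] = {
	{ 3, 0, offsetof(struct demo_cmd, a), sizeof(uint8_t) },
	{ 15, 4, offsetof(struct demo_cmd, b), sizeof(uint16_t) },
};
```

Packing a = 0x5 and b = 0xABC into a zeroed two-byte buffer gives the bytes 0xC5 and 0xAB under the bit numbering assumed here.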

jacob-keller pushed a commit to jacob-keller/linux that referenced this pull request Sep 10, 2024
…rnel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

Patch #1 adds ctnetlink support for kernel side filtering for
	 deletions, from Changliang Wu.

Patch #2 updates nft_counter support to use u64_stats_t,
	 from Sebastian Andrzej Siewior.

Patch #3 uses kmemdup_array() in all xtables frontends,
	 from Yan Zhen.

Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead of
	 open-coded casting, from Shen Lichuan.

Patch #5 removes unused argument in nftables .validate interface,
	 from Florian Westphal.

Patch #6 is a oneliner to correct a typo in nftables kdoc,
	 from Simon Horman.

Patch #7 fixes missing kdoc in nftables, also from Simon.

Patch #8 updates nftables to handle timeout less than CONFIG_HZ.

Patch #9 rejects element expiration if timeout is zero,
	 otherwise it is silently ignored.

Patch #10 disallows element expiration larger than timeout.

Patch #11 removes unnecessary READ_ONCE annotation while mutex is held.

Patch #12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset.

Patch #13 annotates data-races around element expiration.

Patch torvalds#14 allocates timeout and expiration in one single set element
	  extension, they are tighly couple, no reason to keep them
	  separated anymore.

Patch torvalds#15 updates nftables to interpret zero timeout element as never
	  times out. Note that it is already possible to declare sets
	  with elements that never time out but this generalizes to all
	  kind of set with timeouts.

Patch torvalds#16 supports for element timeout and expiration updates.

* tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: set element timeout update support
  netfilter: nf_tables: zero timeout means element never times out
  netfilter: nf_tables: consolidate timeout extension for elements
  netfilter: nf_tables: annotate data-races around element expiration
  netfilter: nft_dynset: annotate data-races around set timeout
  netfilter: nf_tables: remove annotation to access set timeout while holding lock
  netfilter: nf_tables: reject expiration higher than timeout
  netfilter: nf_tables: reject element expiration with no timeout
  netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
  netfilter: nf_tables: Add missing Kernel doc
  netfilter: nf_tables: Correct spelling in nf_tables.h
  netfilter: nf_tables: drop unused 3rd argument from validate callback ops
  netfilter: conntrack: Convert to use ERR_CAST()
  netfilter: Use kmemdup_array instead of kmemdup for multiple allocation
  netfilter: nft_counter: Use u64_stats_t for statistic.
  netfilter: ctnetlink: support CTA_FILTER for flush
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
jacob-keller pushed a commit to jacob-keller/linux that referenced this pull request Sep 10, 2024
Daniel Machon says:

====================
net: lan966x: use the newly introduced FDMA library

This patch series is the second of a 2-part series [1], that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a common
library with a common implementation.  This also has the benefit of
removing a lot of open-coded bookkeeping and duplicate code for the two
drivers.

In this second series, the FDMA library will be taken into use by the
lan966x switch driver.

 ###################
 # Example of use: #
 ###################

- Initialize the rx and tx fdma structs with values for: number of
  DCB's, number of DB's, channel ID, DB size (data buffer size), and
  total size of the requested memory. Also provide two callbacks:
  nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.

- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().

- Initialize the DCB's with fdma_dcb_init().

- Add new DCB's with fdma_dcb_add().

- Free memory with fdma_free_phys() or fdma_free_coherent().
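
A minimal userspace sketch of the lifecycle listed above, for illustration only. It does not use the kernel FDMA library: every demo_* name, the descriptor layout, and the assumption that the descriptors are linked into a ring are hypothetical, and the step of adding new DCBs at runtime is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_fdma;

/* Callback type: compute a pointer value for descriptor index 'dcb'. */
typedef uint64_t (*demo_ptr_cb)(const struct demo_fdma *fdma, int dcb);

/* Hypothetical descriptor: a link to the next one plus one data buffer. */
struct demo_dcb {
	uint64_t nextptr;
	uint64_t dataptr;
};

/* Hypothetical per-channel state, filled in by the driver before use. */
struct demo_fdma {
	int channel_id;
	int n_dcbs;
	size_t db_size;
	struct demo_dcb *dcbs;
	uint8_t *bufs;
	demo_ptr_cb nextptr_cb;
	demo_ptr_cb dataptr_cb;
};

/* Assumption: the last descriptor links back to the first (a ring). */
static uint64_t demo_nextptr(const struct demo_fdma *fdma, int dcb)
{
	return (uint64_t)(uintptr_t)&fdma->dcbs[(dcb + 1) % fdma->n_dcbs];
}

/* Each descriptor owns one db_size slice of the buffer area. */
static uint64_t demo_dataptr(const struct demo_fdma *fdma, int dcb)
{
	return (uint64_t)(uintptr_t)(fdma->bufs + (size_t)dcb * fdma->db_size);
}

/* Allocate the descriptor array and the data buffers. */
static int demo_alloc(struct demo_fdma *fdma)
{
	fdma->dcbs = calloc((size_t)fdma->n_dcbs, sizeof(*fdma->dcbs));
	fdma->bufs = calloc((size_t)fdma->n_dcbs, fdma->db_size);
	if (!fdma->dcbs || !fdma->bufs) {
		free(fdma->dcbs);
		free(fdma->bufs);
		fdma->dcbs = NULL;
		fdma->bufs = NULL;
		return -1;
	}
	return 0;
}

/* Fill every descriptor using the two driver-provided callbacks. */
static void demo_dcb_init(struct demo_fdma *fdma)
{
	for (int i = 0; i < fdma->n_dcbs; i++) {
		fdma->dcbs[i].nextptr = fdma->nextptr_cb(fdma, i);
		fdma->dcbs[i].dataptr = fdma->dataptr_cb(fdma, i);
	}
}

static void demo_free(struct demo_fdma *fdma)
{
	free(fdma->dcbs);
	free(fdma->bufs);
	fdma->dcbs = NULL;
	fdma->bufs = NULL;
}

/* Whole lifecycle: returns 1 if the last descriptor links to the first. */
static int demo_lifecycle(void)
{
	struct demo_fdma rx = {
		.channel_id = 0,
		.n_dcbs = 4,
		.db_size = 2048,
		.nextptr_cb = demo_nextptr,
		.dataptr_cb = demo_dataptr,
	};
	int closed;

	if (demo_alloc(&rx))
		return -1;
	demo_dcb_init(&rx);
	closed = rx.dcbs[3].nextptr == (uint64_t)(uintptr_t)&rx.dcbs[0];
	demo_free(&rx);
	return closed;
}
```

Keeping the pointer computation in callbacks is what would let two drivers with different descriptor layouts share the same allocation and initialization code.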

 #####################
 # Patch  breakdown: #
 #####################

Patch #1:  selects the FDMA library for lan966x.

Patch #2:  includes the fdma_api.h header and removes old symbols.

Patch #3:  replaces old rx and tx variables with equivalent ones from the
           fdma struct. Only the variables that can be changed without
           breaking traffic are changed in this patch.

Patch #4:  uses the library for allocation of rx buffers. This requires
           quite a bit of refactoring in this single patch.

Patch #5:  uses the library for adding DCB's in the rx path.

Patch #6:  uses the library for freeing rx buffers.

Patch #7:  uses the library for allocation of tx buffers. This requires
           quite a bit of refactoring in this single patch.

Patch #8:  uses the library for adding DCB's in the tx path.

Patch #9:  uses the library helpers in the tx path.

Patch #10: ditches the last_in_use variable and uses the library instead.

Patch #11: uses library helpers throughout.

Patch #12: refactors the lan966x_fdma_reload() function.

[1] https://lore.kernel.org/netdev/[email protected]/

Signed-off-by: Daniel Machon <[email protected]>
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
@vladimiroltean
Owner

I've done even more cleanup on the patch set and grouped things a little bit (also rebased onto latest net-next).
The first 19 patches should be good for you to take over.

e09756b5ba37 ice: use pack_fields API
383db66e606a ice: cleanup Rx queue context programming functions
1b3a28997a0f ice: move prefetch enable to ice_setup_rx_ctx
1157c941a479 ice: reduce size of queue context fields
5e3c4e74a5a0 ice: use <linux/packing.h> for Tx and Rx queue context data
1d86cc1090ec ice: remove int_q_state from ice_tlan_ctx
0012e22f12c7 lib: packing: add pack_fields() and unpack_fields()
55d8a121e5c4 lib: packing: demote truncation error in pack() to a warning in __pack()
3f0ae70e0073 lib: packing: use GENMASK() for box_mask
e471fc0bbf68 lib: packing: use BITS_PER_BYTE instead of 8
72576fc44579 lib: packing: create __pack() and __unpack() variants without error checking
819c1f0ceaa1 lib: packing: fix QUIRK_MSB_ON_THE_RIGHT behavior
94fac42c93a5 lib: packing: add additional KUnit tests
30933af750c1 lib: packing: add KUnit tests adapted from selftests
01a34a40e4c0 lib: packing: duplicate pack() and unpack() implementations
cb4f2f912801 lib: packing: add pack() and unpack() wrappers over packing()
a7599690f998 lib: packing: remove kernel-doc from header file
7ee79e245fe1 lib: packing: adjust definitions and implementation for arbitrary buffer lengths
3bc89c75beec lib: packing: refuse operating on bit indices which exceed size of buffer

Note that the ice patch for pack_fields() should be squashed; I didn't do that. Preferably, the patches would be sent grouped in 2 sets, as mentioned.
The other 19 patches are for the ocelot and sja1105 drivers, and I can take care of them. I still have some more ideas for improvement, but they're really incremental work that does not depend on changes in the first set of 19.

@jacob-keller
Author

Sounds good. I'll see about that today hopefully, after I finish double checking ice functionality.

@jacob-keller
Author

Current plan:

  • send the bug fixes, kunit tests, and cleanup for pack/unpack now

  • send the remaining work for pack_fields and the ice driver to IWL after that merges through IWL.

I can't find a reasonable split to get below 15 patches otherwise, and I think getting the kunit tests and bug fixes in is a good idea. I don't really want to refactor the kunit tests back to using packing, so I think that is a reasonable approach.

@jacob-keller
Author

> Current plan:
>
>   • send the bug fixes, kunit tests, and cleanup for pack/unpack now
>   • send the remaining work for pack_fields and the ice driver to IWL after that merges through IWL.
>
> I can't find a reasonable split to get below 15 patches otherwise, and I think getting the kunit tests and bug fixes in is a good idea. I don't really want to refactor the kunit tests back to using packing, so I think that is a reasonable approach.

It looks like netdev has an overload of patches from this week, and the merge window may close this weekend. I'm going to wait to send anything for now. I may send next week if the window doesn't close this weekend.

@vladimiroltean
Owner

> Current plan:
>
>   • send the bug fixes, kunit tests, and cleanup for pack/unpack now
>   • send the remaining work for pack_fields and the ice driver to IWL after that merges through IWL.
>
> I can't find a reasonable split to get below 15 patches otherwise, and I think getting the kunit tests and bug fixes in is a good idea. I don't really want to refactor the kunit tests back to using packing, so I think that is a reasonable approach.

Yes please. Sounds good for the KUnit tests to cover pack() and unpack() rather than packing().

@vladimiroltean
Owner

> It looks like netdev has an overload of patches from this week, and the merge window may close this weekend. I'm going to wait to send anything for now. I may send next week if the window doesn't close this weekend.

I don't know, it's your choice. The overload of patches is not going to stop me from sending some of my own :)
