Assign crypto aliases to different AES implementation modules #4198
The kernel modules aes-neon-blk and aes-neon-bs perform poorly, at least on Cortex-A72 without crypto extensions. In fact, aes-arm64 outperforms them on benchmarks, despite it being a simpler implementation (only accelerating the single-block AES cipher).

For modes of operation where multiple cipher blocks can be processed in parallel, aes-neon-bs outperforms aes-neon-blk by around 60-70% and aes-arm64 is another 10-20% faster still. But the difference is even more marked with modes of operation with dependencies between neighbouring blocks, such as CBC encryption, which defeat parallelism: in these cases, aes-arm64 is typically around 250% faster than either aes-neon-blk or aes-neon-bs.

The key trade-off with aes-arm64 is that the look-up tables are situated in RAM. This leaves them potentially open to cache timing attacks. The two other modules, by contrast, load the look-up tables into NEON registers and so are able to perform in constant time.

This patch aims to load aes-arm64 more often.

If none of the currently-loaded crypto modules implement a given algorithm, a new one is typically selected for loading using a platform-neutral alias describing the required algorithm. To enable users to still load aes-neon-blk or aes-neon-bs if they really want them, while still ensuring that aes-arm64 is usually selected, remove the aliases from aes-neonbs-glue.c and aes-glue.c and apply them to aes-cipher-glue.c, but still build the two NEON modules.

Since aes-glue.c can also be used to build aes-ce-blk, leave them enabled if USE_V8_CRYPTO_EXTENSIONS is defined, to ensure they are selected if we in future use a CPU which has the crypto extensions enabled.

Note that the algorithm priority specifiers are unchanged, so if aes-neon-bs is loaded at the same time as aes-arm64, the former will be used in preference. However, aes-neon-blk and aes-arm64 have tied priority, so whichever module was loaded first will be used (assuming aes-neon-bs is not loaded).
Signed-off-by: Ben Avison <[email protected]>
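The two selection steps described in the commit message can be illustrated with a small user-space toy model. This is only a sketch of the behaviour as described above, not the kernel's code: the `struct module` type, the function names, and the priority numbers are hypothetical and chosen solely to reproduce the stated ordering (aes-neon-bs above the other two, which are tied). All AES algorithms are collapsed into a single alias, `crypto-cmac(aes)`, for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Toy description of one AES module. All values are illustrative. */
struct module {
    const char *name;   /* module name, e.g. "aes-arm64" */
    const char *alias;  /* alias it answers to, or NULL if it has none */
    int priority;       /* higher priority is preferred */
    int load_order;     /* 0 = not loaded; otherwise 1, 2, 3, ... */
};

enum { N_MODS = 3 };

/*
 * Fill `m` with the three non-CE modules as the patch leaves them:
 * only aes-arm64 still carries an alias. The priorities are assumed
 * numbers that merely preserve the relationships described above.
 */
void patched_modules(struct module m[N_MODS])
{
    m[0] = (struct module){ "aes-neon-blk", NULL,               200, 0 };
    m[1] = (struct module){ "aes-neon-bs",  NULL,               250, 0 };
    m[2] = (struct module){ "aes-arm64",    "crypto-cmac(aes)", 200, 0 };
}

/* Step 1: no loaded module implements the algorithm, so pick one by alias. */
const struct module *resolve_alias(const struct module *mods, size_t n,
                                   const char *alias)
{
    for (size_t i = 0; i < n; i++)
        if (mods[i].alias && strcmp(mods[i].alias, alias) == 0)
            return &mods[i];
    return NULL;
}

/*
 * Step 2: among loaded modules, the highest priority wins and a tie goes
 * to whichever module was loaded first. Returns NULL if nothing is loaded.
 */
const struct module *select_loaded(const struct module *mods, size_t n)
{
    const struct module *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct module *m = &mods[i];

        if (m->load_order == 0)
            continue;
        if (!best || m->priority > best->priority ||
            (m->priority == best->priority &&
             m->load_order < best->load_order))
            best = m;
    }
    return best;
}
```

In this model, resolving the alias on a freshly booted system yields aes-arm64. Manually loading aes-neon-blk afterwards does not displace it because of the tie-break, whereas loading aes-neon-bs does.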
Do you have a simple test to run with and without this patch that demonstrates the performance improvement?
A simple one is
Without this patch,
Note also that (I believe) the Bluetooth stack pulls in whichever module has alias "crypto-cmac(aes)", and you can't remove a module if it's in use, so if you want to test without the module, you'd need to stick it in a blacklist somewhere in /etc/modprobe.d.
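When comparing runs with and without the patch, it can help to confirm which implementation is actually registered. One possible approach, sketched here, is to scan `/proc/crypto`, which lists each registered algorithm with `name`, `driver` and `priority` fields. The helper name `best_driver` is hypothetical. It reports only the highest-priority entry currently listed (the first one listed on a tie), which is not necessarily the one an existing user already holds.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/crypto-style text from `in` and copy into `driver` the driver
 * name of the highest-priority entry whose "name" field equals `alg`.
 * Returns that priority, or -1 if no entry matches (driver is then untouched).
 */
int best_driver(FILE *in, const char *alg, char *driver, size_t len)
{
    char line[256], key[64], val[128];
    char cur_name[128] = "", cur_driver[128] = "";
    int best = -1;

    while (fgets(line, sizeof line, in)) {
        /* Fields look like "key   : value"; other lines are skipped. */
        if (sscanf(line, " %63[^: ] : %127[^\n]", key, val) != 2)
            continue;

        if (strcmp(key, "name") == 0) {
            snprintf(cur_name, sizeof cur_name, "%s", val);
        } else if (strcmp(key, "driver") == 0) {
            snprintf(cur_driver, sizeof cur_driver, "%s", val);
        } else if (strcmp(key, "priority") == 0) {
            int prio;

            /* "priority" follows "name" and "driver" within an entry. */
            if (sscanf(val, "%d", &prio) == 1 && prio > best &&
                strcmp(cur_name, alg) == 0) {
                best = prio;
                snprintf(driver, len, "%s", cur_driver);
            }
        }
    }
    return best;
}
```

A caller would open `/proc/crypto` with `fopen`, pass the stream together with an algorithm name such as `cmac(aes)`, and print the returned driver name and priority.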
Thanks - that looks like a useful improvement.
kernel: drm/vc4: crtc: Reduce PV fifo threshold on hvs4
See: raspberrypi/linux#4207
kernel: vc4/drm: Adjustments to hdmi audio dma to reduce glitches
See: raspberrypi/linux#4208
kernel: overlays: gpio-led: new overlay
See: raspberrypi/linux#4206
kernel: bcm2835-codec tweaks
See: raspberrypi/linux#4113
kernel: Assign crypto aliases to different AES implementation modules
See: raspberrypi/linux#4198
kernel: media: bcm2835-unicam: Fix bug in buffer swapping logic
See: raspberrypi/linux#4189
kernel: configs: Add CONFIG_RTS_HCTOSYS=y
See: raspberrypi/linux#4205
kernel: overlays: Improve the i2c-rtc,i2c_csi_dsi option
firmware: video_decode: For VC1/WMV with no signalled header bytes, use start of 1st buffer
See: raspberrypi/linux#4113
@ebenupton has been following the investigations that spawned this PR.