Assign crypto aliases to different AES implementation modules #4198

Merged: 1 commit, Mar 11, 2021

bavison (Contributor) commented Mar 9, 2021

@ebenupton has been following the investigations that spawned this PR.

The kernel modules aes-neon-blk and aes-neon-bs perform poorly, at least on
Cortex-A72 without crypto extensions. In fact, aes-arm64 outperforms them
on benchmarks, despite it being a simpler implementation (only accelerating
the single-block AES cipher).

For modes of operation where multiple cipher blocks can be processed in
parallel, aes-neon-bs outperforms aes-neon-blk by around 60-70% and aes-arm64
is another 10-20% faster still. But the difference is even more marked with
modes of operation with dependencies between neighbouring blocks, such as
CBC encryption, which defeat parallelism: in these cases, aes-arm64 is
typically around 250% faster than either aes-neon-blk or aes-neon-bs.

The key trade-off with aes-arm64 is that the look-up tables are situated in
RAM. This leaves them potentially open to cache timing attacks. The two other
modules, by contrast, load the look-up tables into NEON registers and so are
able to perform in constant time.

This patch aims to load aes-arm64 more often.

If none of the currently-loaded crypto modules implement a given algorithm,
a new one is typically selected for loading using a platform-neutral alias
describing the required algorithm. To enable users to still
load aes-neon-blk or aes-neon-bs if they really want them, while still
ensuring that aes-arm64 is usually selected, remove the aliases from
aes-neonbs-glue.c and aes-glue.c and apply them to aes-cipher-glue.c, but
still build the two NEON modules.
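
To see which module claims a given alias, one option is to scan the alias table that depmod generates. The following is a minimal sketch and not part of the patch: the helper name `modules_for_alias` and the sample table are made up for illustration (the sample is a hypothetical post-patch state, not output captured from a real system). It assumes the usual `alias <pattern> <module>` line layout of `modules.alias` and matches the alias string exactly, whereas modprobe also honours wildcard patterns.

```shell
#!/bin/sh
# modules_for_alias ALIAS
# Read modules.alias-style lines on stdin and print the name of every
# module that declares exactly ALIAS (no wildcard expansion).
modules_for_alias() {
    awk -v want="$1" '$1 == "alias" && $2 == want { print $3 }'
}

# Illustrative input only: a hypothetical alias table, not real output.
sample_aliases() {
    cat <<'EOF'
alias crypto-aes aes_arm64
alias crypto-cbc(aes) aes_arm64
alias crypto-xts(aes) aes_arm64
EOF
}

sample_aliases | modules_for_alias 'crypto-cbc(aes)'
```

On a real system the same helper could be fed from `/lib/modules/$(uname -r)/modules.alias`; `modprobe --resolve-alias "crypto-cbc(aes)"` should give an equivalent answer that also accounts for wildcards and local configuration.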

Since aes-glue.c is also used to build aes-ce-blk, keep its aliases when
USE_V8_CRYPTO_EXTENSIONS is defined, so that aes-ce-blk is still selected
if a future CPU has the crypto extensions enabled.

Note that the algorithm priority specifiers are unchanged, so if
aes-neon-bs is loaded at the same time as aes-arm64, the former will be
used in preference. However, aes-neon-blk and aes-arm64 have tied priority,
so whichever module was loaded first will be used (assuming aes-neon-bs is
not loaded).
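
The priority ordering described above can be inspected through `/proc/crypto`, which lists each registered implementation with its name, driver, module and priority. The following is a rough sketch under stated assumptions: the helper name `rank_impls` is invented here, and the driver names and priority values in the sample are placeholders chosen only to mirror the ordering described above, not measurements from a real system. The helper makes no attempt to resolve ties, so the relative order of equal-priority rows carries no meaning.

```shell
#!/bin/sh
# rank_impls NAME
# Read /proc/crypto-style text on stdin and list every implementation of
# algorithm NAME as "priority driver module", highest priority first.
rank_impls() {
    awk -v want="$1" '
        $1 == "name"     { name = $3 }
        $1 == "driver"   { driver = $3 }
        $1 == "module"   { module = $3 }
        $1 == "priority" && name == want { print $3, driver, module }
    ' | sort -k1,1nr
}

# Illustrative input only: placeholder drivers and priorities.
sample_crypto() {
    cat <<'EOF'
name         : cbc(aes)
driver       : placeholder-cbc-arm64
module       : aes_arm64
priority     : 200

name         : ecb(aes)
driver       : placeholder-ecb-neonbs
module       : aes_neon_bs
priority     : 250

name         : cbc(aes)
driver       : placeholder-cbc-neonbs
module       : aes_neon_bs
priority     : 250

name         : cbc(aes)
driver       : placeholder-cbc-neon
module       : aes_neon_blk
priority     : 200
EOF
}

sample_crypto | rank_impls 'cbc(aes)'
```

On a live system, `rank_impls 'cbc(aes)' < /proc/crypto` shows only the implementations registered at that moment, so the result depends on which modules have been loaded.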

Signed-off-by: Ben Avison [email protected]

pelwell (Contributor) commented Mar 9, 2021

Do you have a simple test to run with and without this patch that demonstrates the performance improvement?

bavison (Contributor, Author) commented Mar 9, 2021

A simple one is

cryptsetup benchmark aes

Without this patch, `lsmod | grep aes` shows aes-neon-blk as the only AES module loaded. With it, aes-arm64 will show up in its place. You can use a combination of `sudo modprobe` and `sudo rmmod` to test various combinations of modules if you wish; `sudo modprobe "crypto-cbc(aes)"` or similar can also be used to load the module that provides the alias for a given algorithm.

Note also that (I believe) the Bluetooth stack pulls in whichever module has the alias "crypto-cmac(aes)", and you can't remove a module while it's in use, so if you want to test without that module, you'd need to add it to a blacklist somewhere in /etc/modprobe.d.
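
A blacklist entry of the kind mentioned above might be generated as follows. This is a hedged sketch: the helper name `write_blacklist`, the file name it produces, and the choice of `aes_neon_blk` as the module are all assumptions made for illustration. The demonstration writes to a scratch directory so that nothing under /etc is touched.

```shell
#!/bin/sh
# write_blacklist DIR MODULE
# Create DIR/blacklist-MODULE.conf containing a single "blacklist MODULE"
# line, then print the path that was written.
write_blacklist() {
    conf="$1/blacklist-$2.conf"
    printf 'blacklist %s\n' "$2" > "$conf" && printf '%s\n' "$conf"
}

# Dry run against a scratch directory (hypothetical module choice).
scratch=$(mktemp -d)
cat "$(write_blacklist "$scratch" aes_neon_blk)"
rm -r "$scratch"
```

To apply it for real, the same call would be run as root with /etc/modprobe.d as the directory. A `blacklist` line only stops the module being loaded through its aliases, so an explicit `sudo modprobe aes_neon_blk` still works, which suits this kind of testing; a module that is already loaded stays loaded until it is removed or the system is rebooted.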

pelwell merged commit 59f05f0 into raspberrypi:rpi-5.10.y on Mar 11, 2021

pelwell (Contributor) commented Mar 11, 2021

Thanks - that looks like a useful improvement.

popcornmix added a commit to raspberrypi/firmware that referenced this pull request Mar 15, 2021
kernel: drm/vc4: crtc: Reduce PV fifo threshold on hvs4
See: raspberrypi/linux#4207

kernel: vc4/drm: Adjustments to hdmi audio dma to reduce glitches
See: raspberrypi/linux#4208

kernel: overlays: gpio-led: new overlay
See: raspberrypi/linux#4206

kernel: bcm2835-codec tweaks
See: raspberrypi/linux#4113

kernel: Assign crypto aliases to different AES implementation modules
See: raspberrypi/linux#4198

kernel: media: bcm2835-unicam: Fix bug in buffer swapping logic
See: raspberrypi/linux#4189

kernel: configs: Add CONFIG_RTC_HCTOSYS=y
See: raspberrypi/linux#4205

kernel: overlays: Improve the i2c-rtc,i2c_csi_dsi option

firmware: video_decode: For VC1/WMV with no signalled header bytes, use start of 1st buffer
See: raspberrypi/linux#4113
popcornmix added a commit to Hexxeh/rpi-firmware that referenced this pull request Mar 15, 2021