Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database unlock is slow in 2.7.0-beta1 #7487

Closed
ghost opened this issue Mar 3, 2022 · 15 comments
Closed

Database unlock is slow in 2.7.0-beta1 #7487

ghost opened this issue Mar 3, 2022 · 15 comments
Assignees

Comments

@ghost
Copy link

ghost commented Mar 3, 2022

Overview

Unlocking a database in KeePassXC 2.7.0-beta1 is much slower than unlocking the same database in KeePassXC 2.6.6 or KeePass 2.50.

Steps to Reproduce

  1. Create a database with Argon2d Iterations 128, Memory 64MB, Parallelism 8.
  2. Unlock the database.

Expected Behavior

Unlocking the database takes about 2 seconds.

Actual Behavior

Unlocking the database takes about 9 seconds.

Context

With KeePassXC 2.6.6 or KeePass 2.50, unlocking takes about 2 seconds.
I created a database with KeePassXC 2.6.6, KeePassXC 2.7.0-beta1 and KeePass 2.50, but it happened with any database.
I suspected antivirus (Kaspersky) interference and paused protection before unlocking it, but it didn't improve.
It was also reproduced in Windows 11 Insider Preview Dev on VMware Player.

CPU: Core i7-7700K
RAM: DDR4-2400 16GB

KeePassXC - 2.7.0-beta1
Revision: 046e508

Operating System: Windows 10 21H2

@ghost ghost added the PRE-RELEASE BUG label Mar 3, 2022
@ghost ghost assigned droidmonkey Mar 3, 2022
@phoerious
Copy link
Member

phoerious commented Mar 3, 2022

I suppose Botan is less optimized than libargon2. @droidmonkey Should we do some benchmarks?

@droidmonkey
Copy link
Member

droidmonkey commented Mar 3, 2022

From 2 to 9 seconds is extreme and interesting. I definitely notice a slower speed with botan, but not >4x slower.

https://github.com/microsoft/vcpkg/tree/master/ports%2Fbotan

https://github.com/msys2/MINGW-packages/tree/master/mingw-w64-libbotan

#6209 (comment)

@randombit I am building Botan using vcpkg, might we be missing an optimization flag in there?

@randombit
Copy link

Are you sure you are not using their dbg mode? That sets --debug-mode to the library configure script, which disables optimizations completely.

Also I have no idea of the performance of Argon2 on VC. In general I use GCC/Clang on Linux, and IME the VC optimizer is significantly less good.

The iteration count for Argon2 seems really high given that RFC 9106 recommends no more than t=3. I doubt I ever checked performance with much higher iterations. I'll look into this. I remember a long time ago checking the libargon2 library vs Botan and they were about the same but I'm sure that was with a small t.

@ghost
Copy link
Author

ghost commented Mar 3, 2022

When I tried it with Iterations 3 Memory 2047MB Parallelism 8, it took about 1-2 seconds with KeePassXC 2.6.6, 1.3 seconds with KeePass 2.50, and about 6-7 seconds with KeePassXC 2.7.0-beta1. When I tried it again with Iterations 128 Memory 64MB Parallelism 8, it took about 8 seconds, so I don't think the impact of Iterations is significant.

(Is it better to reduce the number of iterations? The default setting of the KeePass 2.50 is Memory 64MB, so I used it.)

@phoerious
Copy link
Member

It should definitely not be slower than KeePass2. In fact, it should be much faster than that.

@droidmonkey
Copy link
Member

droidmonkey commented Mar 3, 2022

You reduced iterations and significantly increased memory. Do one or the other. I certainly don't see the level of degradation that you do under a 1 second benchmark. My database decrypts in roughly the same time between my mobile (KeePassDX) and laptop (KPXC 2.7.0). My laptop should be much faster than mobile, but it's not.

I'll have to check on the dbg mode.

@randombit
Copy link

Comparing libargon2 (https://github.com/P-H-C/phc-winner-argon2 as packaged on Arch Linux) with Botan. This is a i7-6700K at 4.2 GHz. Both using Argon2id, 64 Mb, t=128, p=8

Botan: 4.8 s (via botan speed argon2)
libargon2: 10.4 s (using time reported by argon2 cli util)

[Why this discrepancy is so large in favor of Botan I have no idea]

... oh now I see why ...

libargon2 does use threads for p > 1 (Botan does not) but the time it reports is - I guess - the sum total across all of the threads, rather than the wall clock time which is what anyone normal cares about. So for above example the cli util reports 10.4 s but actually only takes 2.7 seconds wall time.

OK so for p > 1 you are going to see a fairly large perf hit. I can look into using threads (there is even a todo in the source about this) but that won't help when building against existing releases of the library.

@droidmonkey
Copy link
Member

droidmonkey commented Mar 3, 2022

Oof that is a big deal if we aren't doing parallelism in threads. This may make me want to go back to the reference implementation.

@phoerious
Copy link
Member

phoerious commented Mar 3, 2022

I am in favour of that until this is fixed. I can confirm that 2.6.6 with p > 1 decrypts much faster. When I benchmark to 128MiB at 16 threads, I get 6 rounds in 2.7.0-beta1 and 44 rounds in 2.6.6.

But that's not all. If I set p = 1, I still get 6 rounds in 2.7.0-beta1 (not surprisingly), but 10 rounds in 2.6.6. So while the missing support for threads is the biggest issue, the whole thing seems less efficient than the reference implementation.

@droidmonkey
Copy link
Member

droidmonkey commented Mar 3, 2022

Luckily it's not a big deal to go back to libargon2. Will try to get that in a PR tonight for review.

@randombit why not just embed and wrap the reference implementation by Botan API? Their license is CC0 / Apache.

@randombit
Copy link

@droidmonkey The way their code is written, integrating it/adapting it would be harder than just adding threading and SIMD support directly. PR adding threads is up now, SSSE3 version is written but needs some debugging still. I'll backport these in the next 2.x release. Probably too late to get into upcoming Ubuntu 22.04 though. Wish this issue had been raised sooner. :/

@phoerious
Copy link
Member

Great, thanks for resolving this quickly. @droidmonkey Could we perhaps compile conditionally, depending on the installed Botan version?

@randombit
Copy link

Just as an FYI, I wrote a quick tool to properly benchmark the performance of libargon2 (https://gist.github.com/randombit/adaaaecfc23906af0fe8f3e7ea0257cd) since their cli util is clearly not so helpful for this. For params M=65536, t=128, p=8, on an 4 core CPU with AVX2 and HT, I am seeing 1386 ms/per hash with Botan+threads, and 1375 ms/per hash with libargon2, basically the same. I think libargon2 has the edge in that it has SIMD support, but Botan is making up for most of that by using a thread pool instead of creating+killing many threads.

@phoerious I'm curious about the results you are seeing with p = 1. Windows or Linux? Does your CPU have AVX-512?

@phoerious
Copy link
Member

phoerious commented Mar 4, 2022

@phoerious I'm curious about the results you are seeing with p = 1. Windows or Linux? Does your CPU have AVX-512?

Windows, Ryzen 3950X, so I guess not. Also tested it on the Threadripper 2920X machine I'm working on right now (Ubuntu Linux, AVX2 only) and I get roughly the same performance there (8 vs. 7, sometimes 8).

Here's botan speed (2.17.3) vs your libargon2 benchmark tool with t=1, m=65536, p=1) on the Threadripper:

Argon2id M=65536 t=1 p=1 15 /sec; 65.44 ms/op 229022108 cycles/op (46 ops in 3010 ms)

42 runs in 3069.218427 ms -> 73.076629 ms/run

I can test the Ryzen in the evening.

@phoerious
Copy link
Member

phoerious commented Mar 4, 2022

@randombit Here's the results on my Ryzen (Windows, compiled via Msys2):

Argon2id M=65536 t=1 p=1 16 /sec; 60.08 ms/op 600744 cycles/op (50 ops in 3004 ms)

59 runs in 3045.006700 ms -> 51.610283 ms/run

droidmonkey added a commit that referenced this issue Mar 5, 2022
* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1.
* Include port file for vcpkg backed builds
droidmonkey added a commit that referenced this issue Mar 5, 2022
* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1.
* Include port file for vcpkg backed builds
droidmonkey added a commit that referenced this issue Mar 5, 2022
* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1.
* Include port file for vcpkg backed builds
droidmonkey added a commit that referenced this issue Mar 5, 2022
* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1.
* Include port file for vcpkg backed builds
t-h-e pushed a commit to t-h-e/keepassxc that referenced this issue Sep 8, 2022
* Fix keepassxreboot#7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1.
* Include port file for vcpkg backed builds
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants