Database unlock is slow in 2.7.0-beta1 #7487

ghost · 2022-03-03T11:35:00Z

Overview

Unlocking a database in KeePassXC 2.7.0-beta1 is much slower than unlocking the same database in KeePassXC 2.6.6 or KeePass 2.50.

Steps to Reproduce

Create a database with Argon2d Iterations 128, Memory 64MB, Parallelism 8.
Unlock the database.

Expected Behavior

Unlocking the database takes about 2 seconds.

Actual Behavior

Unlocking the database takes about 9 seconds.

Context

With KeePassXC 2.6.6 or KeePass 2.50, unlocking takes about 2 seconds.
I created a database with KeePassXC 2.6.6, KeePassXC 2.7.0-beta1 and KeePass 2.50, but it happened with any database.
I suspected antivirus (Kaspersky) interference and paused protection before unlocking it, but it didn't improve.
It was also reproduced in Windows 11 Insider Preview Dev on VMware Player.

CPU: Core i7-7700K
RAM: DDR4-2400 16GB

KeePassXC - 2.7.0-beta1
Revision: 046e508

Operating System: Windows 10 21H2

The text was updated successfully, but these errors were encountered:

phoerious · 2022-03-03T12:08:58Z

I suppose Botan is less optimized than libargon2. @droidmonkey Should we do some benchmarks?

droidmonkey · 2022-03-03T12:17:31Z

From 2 to 9 seconds is extreme and interesting. I definitely notice a slower speed with botan, but not >4x slower.

https://github.com/microsoft/vcpkg/tree/master/ports%2Fbotan

https://github.com/msys2/MINGW-packages/tree/master/mingw-w64-libbotan

#6209 (comment)

@randombit I am building Botan using vcpkg, might we be missing an optimization flag in there?

randombit · 2022-03-03T14:59:24Z

Are you sure you are not using their dbg mode? That sets --debug-mode to the library configure script, which disables optimizations completely.

Also I have no idea of the performance of Argon2 on VC. In general I use GCC/Clang on Linux, and IME the VC optimizer is significantly less good.

The iteration count for Argon2 seems really high given that RFC 9106 recommends no more than t=3. I doubt I ever checked performance with much higher iterations. I'll look into this. I remember a long time ago checking the libargon2 library vs Botan and they were about the same but I'm sure that was with a small t.

ghost · 2022-03-03T15:36:03Z

When I tried it with Iterations 3 Memory 2047MB Parallelism 8, it took about 1-2 seconds with KeePassXC 2.6.6, 1.3 seconds with KeePass 2.50, and about 6-7 seconds with KeePassXC 2.7.0-beta1. When I tried it again with Iterations 128 Memory 64MB Parallelism 8, it took about 8 seconds, so I don't think the impact of Iterations is significant.

(Is it better to reduce the number of iterations? The default setting of the KeePass 2.50 is Memory 64MB, so I used it.)

phoerious · 2022-03-03T15:39:24Z

It should definitely not be slower than KeePass2. In fact, it should be much faster than that.

droidmonkey · 2022-03-03T16:18:40Z

You reduced iterations and significantly increased memory. Do one or the other. I certainly don't see the level of degradation that you do under a 1 second benchmark. My database decrypts in roughly the same time between my mobile (KeePassDX) and laptop (KPXC 2.7.0). My laptop should be much faster than mobile, but it's not.

I'll have to check on the dbg mode.

randombit · 2022-03-03T17:32:52Z

Comparing libargon2 (https://github.com/P-H-C/phc-winner-argon2 as packaged on Arch Linux) with Botan. This is a i7-6700K at 4.2 GHz. Both using Argon2id, 64 Mb, t=128, p=8

Botan: 4.8 s (via botan speed argon2)
libargon2: 10.4 s (using time reported by argon2 cli util)

[Why this discrepancy is so large in favor of Botan I have no idea]

... oh now I see why ...

libargon2 does use threads for p > 1 (Botan does not) but the time it reports is - I guess - the sum total across all of the threads, rather than the wall clock time which is what anyone normal cares about. So for above example the cli util reports 10.4 s but actually only takes 2.7 seconds wall time.

OK so for p > 1 you are going to see a fairly large perf hit. I can look into using threads (there is even a todo in the source about this) but that won't help when building against existing releases of the library.

droidmonkey · 2022-03-03T18:09:42Z

Oof that is a big deal if we aren't doing parallelism in threads. This may make me want to go back to the reference implementation.

phoerious · 2022-03-03T18:19:30Z

I am in favour of that until this is fixed. I can confirm that 2.6.6 with p > 1 decrypts much faster. When I benchmark to 128MiB at 16 threads, I get 6 rounds in 2.7.0-beta1 and 44 rounds in 2.6.6.

But that's not all. If I set p = 1, I still get 6 rounds in 2.7.0-beta1 (not surprisingly), but 10 rounds in 2.6.6. So while the missing support for threads is the biggest issue, the whole thing seems less efficient than the reference implementation.

droidmonkey · 2022-03-03T20:14:14Z

Luckily it's not a big deal to go back to libargon2. Will try to get that in a PR tonight for review.

@randombit why not just embed and wrap the reference implementation by Botan API? Their license is CC0 / Apache.

randombit · 2022-03-04T12:41:40Z

@droidmonkey The way their code is written, integrating it/adapting it would be harder than just adding threading and SIMD support directly. PR adding threads is up now, SSSE3 version is written but needs some debugging still. I'll backport these in the next 2.x release. Probably too late to get into upcoming Ubuntu 22.04 though. Wish this issue had been raised sooner. :/

phoerious · 2022-03-04T12:43:28Z

Great, thanks for resolving this quickly. @droidmonkey Could we perhaps compile conditionally, depending on the installed Botan version?

randombit · 2022-03-04T13:08:25Z

Just as an FYI, I wrote a quick tool to properly benchmark the performance of libargon2 (https://gist.github.com/randombit/adaaaecfc23906af0fe8f3e7ea0257cd) since their cli util is clearly not so helpful for this. For params M=65536, t=128, p=8, on an 4 core CPU with AVX2 and HT, I am seeing 1386 ms/per hash with Botan+threads, and 1375 ms/per hash with libargon2, basically the same. I think libargon2 has the edge in that it has SIMD support, but Botan is making up for most of that by using a thread pool instead of creating+killing many threads.

@phoerious I'm curious about the results you are seeing with p = 1. Windows or Linux? Does your CPU have AVX-512?

phoerious · 2022-03-04T13:27:33Z

@phoerious I'm curious about the results you are seeing with p = 1. Windows or Linux? Does your CPU have AVX-512?

Windows, Ryzen 3950X, so I guess not. Also tested it on the Threadripper 2920X machine I'm working on right now (Ubuntu Linux, AVX2 only) and I get roughly the same performance there (8 vs. 7, sometimes 8).

Here's botan speed (2.17.3) vs your libargon2 benchmark tool with t=1, m=65536, p=1) on the Threadripper:

Argon2id M=65536 t=1 p=1 15 /sec; 65.44 ms/op 229022108 cycles/op (46 ops in 3010 ms)

42 runs in 3069.218427 ms -> 73.076629 ms/run

I can test the Ryzen in the evening.

phoerious · 2022-03-04T18:11:04Z

@randombit Here's the results on my Ryzen (Windows, compiled via Msys2):

Argon2id M=65536 t=1 p=1 16 /sec; 60.08 ms/op 600744 cycles/op (50 ops in 3004 ms)

59 runs in 3045.006700 ms -> 51.610283 ms/run

* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1. * Include port file for vcpkg backed builds

* Fix keepassxreboot#7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1. * Include port file for vcpkg backed builds

ghost added the PRE-RELEASE BUG label Mar 3, 2022

ghost assigned droidmonkey Mar 3, 2022

randombit mentioned this issue Mar 4, 2022

Add threading to Argon2 randombit/botan#2926

Merged

randombit mentioned this issue Mar 4, 2022

Add Argon2 SSSE3 implementation randombit/botan#2927

Merged

droidmonkey added a commit that referenced this issue Mar 5, 2022

Revert back to Argon2 reference library

8ad3dd2

* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1. * Include port file for vcpkg backed builds

droidmonkey mentioned this issue Mar 5, 2022

Revert back to reference Argon2 library #7497

Merged

droidmonkey added a commit that referenced this issue Mar 5, 2022

Revert back to Argon2 reference library

b0bcf29

* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1. * Include port file for vcpkg backed builds

droidmonkey added a commit that referenced this issue Mar 5, 2022

Revert back to Argon2 reference library

4258f40

* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1. * Include port file for vcpkg backed builds

droidmonkey added a commit that referenced this issue Mar 5, 2022

Revert back to Argon2 reference library

a64da71

* Fix #7487 - Botan does not use threads when calculating Argon2 KDF leading to very poor performance for a parallelism value > 1. * Include port file for vcpkg backed builds

droidmonkey closed this as completed in e5a2958 Mar 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database unlock is slow in 2.7.0-beta1 #7487

Database unlock is slow in 2.7.0-beta1 #7487

ghost commented Mar 3, 2022

phoerious commented Mar 3, 2022 •

edited

Loading

droidmonkey commented Mar 3, 2022 •

edited

Loading

randombit commented Mar 3, 2022

ghost commented Mar 3, 2022

phoerious commented Mar 3, 2022

droidmonkey commented Mar 3, 2022 •

edited

Loading

randombit commented Mar 3, 2022

droidmonkey commented Mar 3, 2022 •

edited

Loading

phoerious commented Mar 3, 2022 •

edited

Loading

droidmonkey commented Mar 3, 2022 •

edited

Loading

randombit commented Mar 4, 2022

phoerious commented Mar 4, 2022

randombit commented Mar 4, 2022

phoerious commented Mar 4, 2022 •

edited

Loading

phoerious commented Mar 4, 2022 •

edited

Loading

Database unlock is slow in 2.7.0-beta1 #7487

Database unlock is slow in 2.7.0-beta1 #7487

Comments

ghost commented Mar 3, 2022

Overview

Steps to Reproduce

Expected Behavior

Actual Behavior

Context

phoerious commented Mar 3, 2022 • edited Loading

droidmonkey commented Mar 3, 2022 • edited Loading

randombit commented Mar 3, 2022

ghost commented Mar 3, 2022

phoerious commented Mar 3, 2022

droidmonkey commented Mar 3, 2022 • edited Loading

randombit commented Mar 3, 2022

droidmonkey commented Mar 3, 2022 • edited Loading

phoerious commented Mar 3, 2022 • edited Loading

droidmonkey commented Mar 3, 2022 • edited Loading

randombit commented Mar 4, 2022

phoerious commented Mar 4, 2022

randombit commented Mar 4, 2022

phoerious commented Mar 4, 2022 • edited Loading

phoerious commented Mar 4, 2022 • edited Loading

phoerious commented Mar 3, 2022 •

edited

Loading

droidmonkey commented Mar 3, 2022 •

edited

Loading

droidmonkey commented Mar 3, 2022 •

edited

Loading

droidmonkey commented Mar 3, 2022 •

edited

Loading

phoerious commented Mar 3, 2022 •

edited

Loading

droidmonkey commented Mar 3, 2022 •

edited

Loading

phoerious commented Mar 4, 2022 •

edited

Loading

phoerious commented Mar 4, 2022 •

edited

Loading