NT-long-opencl #5245

magnumripper · 2023-03-09T19:17:31Z

@solardiz, I have a trivial patch to NT-opencl that bumps its max length to 59 (or more). Due to the extra overhead and perhaps more because we can no longer reverse MD4 steps, it makes the format about 9% slower even at single-block lengths. I'm pondering two alternatives:

a) The changes are #if PLAINTEXT_LENGTH > 27. I could commit the changes except the actual change of PLAINTEXT_LENGTH. This is a no-op but makes it fairly easy to enable the support for longer passwords and rebuild.

b) I could create a separate NT-long-opencl (using same files but two format structs).

I'm not quite sure which way to go - what do you think?

BTW I also played with the idea to actually store two sets of binaries, with and without reversed steps, and really try to minimize overhead. But I think it's overkill - passwords this long are definitely used but they are not very common so they can be attacked separately anyway.

The text was updated successfully, but these errors were encountered:

magnumripper · 2023-03-09T19:24:29Z

BTW I also played with the idea to actually store two sets of binaries, with and without reversed steps, and really try to minimize overhead. But I think it's overkill - passwords this long are definitely used but they are not very common so they can be attacked separately anyway.

I just now realized we can use 64 bits of non-reversed binary and 64 bits of reversed, for the hot code... so it could actually be done without storing double binaries. Maybe I'll try it just for kicks.

solardiz · 2023-03-09T19:27:00Z

@magnumripper This is great news.

I'm not quite sure which way to go - what do you think?

I think it's better to have the separate NT-long-opencl format than to have the user edit files back and forth.

passwords this long are definitely used but they are not very common so they can be attacked separately anyway.

Actually, for ease of use and lower risk/rate of user errors, it'd be great if passwords of all lengths could be attacked at once without much efficiency loss when testing short candidate passwords. So I think your work on this wasn't overkill. However, this gets more complicated under the hood, which means more effort to finish this work and potentially more bugs left?

so it could actually be done without storing double binaries. Maybe I'll try it just for kicks.

That would be great. I was just going to say that a drawback would be memory usage when loading a huge number of NT hashes, which is a realistic concern given HIBP. Also, higher memory usage means lower cache hit rate, so could hurt performance.

solardiz · 2023-03-09T19:28:43Z

I think we can get this into the tree in two steps/PRs: first add the new format, then arrive at a unified format replacing the two. I think we want to have the two-format approach preserved in our git history e.g. for later troubleshooting or recommending to people in case the more complicated unified format ever fails.

magnumripper · 2023-03-09T19:30:17Z

On a similar note, md5crypt (both CPU and OpenCL) is really in need for longer support. I looked at it several times but can't understand the optimized code! >.<

solardiz · 2023-03-09T19:37:43Z

On a similar note, md5crypt (both CPU and OpenCL) is really in need for longer support.

Yes. We do have md5crypt-long on CPU, but it lacks SIMD support. It is definitely possible to have something with medium performance by using SIMD yet supporting longer than 15.

Curiously, md5crypt-long is now above 1M c/s on AWS c6a.48xlarge... but the non-long one is above 10M there.

magnumripper · 2023-03-09T19:50:22Z

I should probably benchmark support for even longer passwords, the change in code would be absolutely minimal. But it would obviously increase register pressure. Perhaps that could be left as supported by the code but not enabled by default. So we'd ship nt-long-opencl with length 59 but bumpable all the way to 125 (or 123 which is at even block length with no extra empty block).

1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. Closes openwall#5245

1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

magnumripper · 2023-03-10T00:07:05Z

Two separate formats done now. ~~Less than 5% speed penalty.~~

$ ../run/john -test -form:nt-*opencl 
Device 2: NVIDIA GeForce RTX 2080 Ti
Benchmarking: NT-opencl [MD4 OpenCL/mask accel]... LWS=256 GWS=69632 (272 blocks) x22308 DONE
Raw:    32458M c/s real, 32297M c/s virtual, Dev#2 util: 98%

Benchmarking: NT-long-opencl [MD4 OpenCL/mask accel]... LWS=256 GWS=139264 (544 blocks) x22308 DONE
Raw:    30309M c/s real, 30162M c/s virtual, Dev#2 util: 98%

2 formats benchmarked.

$ bc -l <<< 3030900/32458
93.37913611436317702877

The kernel duration is 7 ms longer at a KPC of about 3 billion (accelerated):

key xfer: 177.280 us, idx xfer: 89.728 us, crypt: 90.052 ms, res xfer: 1.984 us
gws:    139264   34395M c/s 34395995110 rounds/s   90.321 ms per crypt_all()

vs.

key xfer: 177.120 us, idx xfer: 89.728 us, crypt: 97.105 ms, res xfer: 1.984 us
gws:    139264   31904M c/s 31904807850 rounds/s   97.374 ms per crypt_all()+

magnumripper · 2023-03-10T11:47:31Z

Going back to a single format with full speed at single-block is tricky. A quick and dirty way to do it is to sacrifice speed for multi-block crypts in order to keep it for single-block ones, by adding a "bogus" md4_reverse() after the multi-block crypt (same stuff as is done in binary()). This way we don't need to touch anything else. Then the only penalty for single-block crypts is the extra branch needed on password length.

I think if 32 bits of result would suffice we could be fine, but with mask mode on a good GPU we'd dwelve too far in the compare function and produce possible (false) hits on every crypt, that had to be sorted in cmp_exact - we'll lose a lot of speed. So we need 64 bits. Actually the current code seem to use all 128 bits, I can't see why. I think we could speed it up further by just doing 64 bits, dropping two more steps in the end of the MD4! We'd also get half the memory footprint for millions of hashes... Perhaps I should start with focusing on that, ignoring nt-long-opencl in the meantime.

magnumripper · 2023-03-10T11:54:19Z

Then the only penalty for single-block crypts is the extra branch needed on password length.

We could have two separate kernels, and move that decision to host side. This way there'd be NO penalty at all.

solardiz · 2023-03-10T14:24:30Z

$ bc -l <<< 3030900/31824

I don't see where 31824 came from.

I think we could speed it up further by just doing 64 bits, dropping two more steps in the end of the MD4! We'd also get half the memory footprint for millions of hashes... Perhaps I should start with focusing on that, ignoring nt-long-opencl in the meantime.

Makes sense, but I think you had simple enough changes implementing nt-opencl-long already that we should keep that code version at some commit in our tree as well.

magnumripper · 2023-03-10T15:09:00Z

$ bc -l <<< 3030900/31824

I don't see where 31824 came from.

Oh, a copy-paste error. I ran a lot of tests and if I compare fastest against fastest, the difference is about 5% I think. But it doesn't matter, I'll improve it anyway.

I think we could speed it up further by just doing 64 bits, dropping two more steps in the end of the MD4! We'd also get half the memory footprint for millions of hashes... Perhaps I should start with focusing on that, ignoring nt-long-opencl in the meantime.

Doing the above is a major change, changing all the bitmap and 'perfect hash table' stuff. We'll see how insane it gets.

Makes sense, but I think you had simple enough changes implementing nt-opencl-long already that we should keep that code version at some commit in our tree as well.

Yes I think the first two commits will stay more or less as is, just adding commits.

solardiz · 2023-03-10T15:21:20Z

Then the only penalty for single-block crypts is the extra branch needed on password length.

We could have two separate kernels, and move that decision to host side. This way there'd be NO penalty at all.

But the decision is per candidate password length, whereas GWS for a single kernel invocation is huge. By having the decision in the kernel, we can have it vary between groups of 64 candidates tested (post-mask).

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

magnumripper · 2023-03-13T00:04:42Z

Current bleeding:

ptxas info    : 0 bytes gmem, 104 bytes cmem[3]
ptxas info    : Compiling entry function 'nt' for 'sm_75'
ptxas info    : Function properties for nt
ptxas         .     56 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 16388 bytes smem, 432 bytes cmem[0], 8 bytes cmem[2]
binary size 22012

Current #5246:

ptxas info    : 0 bytes gmem, 104 bytes cmem[3]
ptxas info    : Compiling entry function 'nt' for 'sm_75'
ptxas info    : Function properties for nt
ptxas         .     312 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 50 registers, 16388 bytes smem, 432 bytes cmem[0], 8 bytes cmem[2]
binary size 44381

Current #5250:

ptxas info    : 0 bytes gmem, 104 bytes cmem[3]
ptxas info    : Compiling entry function 'nt' for 'sm_75'
ptxas info    : Function properties for nt
ptxas         .     312 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 44 registers, 16388 bytes smem, 432 bytes cmem[0], 8 bytes cmem[2]
binary size 52217

Apparently "stack frame" is simply the plaintext buffer (after on-device UTF-16 conversion), it follows its exact size. And I seem to have somehow shaved 2 registers in #5250 compared to bleeding.

magnumripper · 2023-03-13T00:47:19Z

(Edited due to git mistake)

Yet another branch singlebranch #5251 where I get away with a single branch in the very end of the function:

ptxas info    : 0 bytes gmem, 104 bytes cmem[3]
ptxas info    : Compiling entry function 'nt' for 'sm_75'
ptxas info    : Function properties for nt
ptxas         .     320 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 44 registers, 16388 bytes smem, 432 bytes cmem[0], 8 bytes cmem[2]
binary size 43711

magnumripper · 2023-03-13T01:07:01Z

That last alternative seems to be the winner. The current version lose 6.5% performance on AMD and less than 3% on nvidia compared to bleeding-jumbo, with 100% single-block crypts. I'm surprised the price is so high for some code never processed but anyway I'll continue from there.

solardiz · 2023-03-13T02:44:48Z

The current version lose 6.5% performance on AMD and less than 3% on nvidia compared to bleeding-jumbo, with 100% single-block crypts. I'm surprised the price is so high for some code never processed but anyway I'll continue from there.

I guess this price is for the reads at index 14 and 15, which were previously a register and a constant 0, but are now part of an array spilled to global memory. Maybe it'll be cheaper to pre-check the length and avoid those reads for <=27 (store the right values in two registers, and then use those).

Maybe I was wrong in suggesting you try to reduce the divergence. This was a nice exercise, but those percentages do feel high as a price for rarely-used functionality. Maybe the 3x+ (vs. 2x+) hit with divergence was more acceptable. However, as I recall you also had unacceptable hit for performance on AMD there? Maybe simply adding an unlikely() to there would do the trick?

magnumripper · 2023-03-13T15:43:33Z

The current version lose 6.5% performance on AMD and less than 3% on nvidia compared to bleeding-jumbo, with 100% single-block crypts. I'm surprised the price is so high for some code never processed but anyway I'll continue from there.

I guess this price is for the reads at index 14 and 15, which were previously a register and a constant 0, but are now part of an array spilled to global memory. Maybe it'll be cheaper to pre-check the length and avoid those reads for <=27 (store the right values in two registers, and then use those).

I'll keep on experimenting. Meanwhile I tried using local memory for the plaintext buffer which got rid of the register spill, but then LWS was limited to 64 and speed dropped on 2080ti with 75%, to below 7500M instead of over 30G. We can go higher than LWS=64 with a max length of 59, but speed didn't increase at all. So I'm abandoning that idea.

ptxas info    : 0 bytes gmem, 104 bytes cmem[3]
ptxas info    : Compiling entry function 'nt' for 'sm_75'
ptxas info    : Function properties for nt
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 36868 bytes smem, 432 bytes cmem[0], 16 bytes cmem[2]
binary size 36130

Using local we can no longer use an initializer so need to zero the buffer. I would have thought it to be faster anyway, at least with mask acceleration where we amortize the zeroing in the x22308 loop (BTW I only zero as much as actually needed) but the zeroing isn't even the problem! Not sure what is - it just isn't fast.

Maybe I was wrong in suggesting you try to reduce the divergence. This was a nice exercise, but those percentages do feel high as a price for rarely-used functionality. Maybe the 3x+ (vs. 2x+) hit with divergence was more acceptable. However, as I recall you also had unacceptable hit for performance on AMD there? Maybe simply adding an unlikely() to there would do the trick?

The nvidia speed is very similar among the various versions (although some would diverge more with varying lengths). The main problem is the AMD, it only runs well with the last version, and not perfect at that either. I used unlikely() in all versions.

Also, I'm not quite happy we don't know the AMD speed with recent drivers. It would be sad to ditch good stuff because we only test it on a driver from 2019.

magnumripper · 2023-03-13T16:11:47Z

On a side note I noticed a strange thing: If I replace the (added in while loop) single-line macro MD4(uint, nt_buffer, hash) (from opencl_md4.h) with Sayantan's many-lines version, it "should" end up exactly the same: Both use the same MD4_x() basic functions and the single-line MD4 macro should expand to identical code as each line in the other version. Yet, without the single-line macro: binary size 53963. With it: binary size 54406. Difference 443 bytes. From what!?

I can spot one single difference: the one-line macro protects (x) with parens here:

#define MD4STEP(f, a, b, c, d, x, s)  \
	(a) += f((b), (c), (d)) + (x); \
	    (a) = rotate((a), (uint)(s))

Theoretically that could be the reason for ending up slightly different so I tried removing it but it made no difference. For the life of me I can't see why there would be a difference in binary size of 443 bytes. I looked a little at the corresponding PTX files but didn't get wiser (except I could confirm the difference is not debug comments or some such).

solardiz · 2023-03-13T16:49:48Z

With all versions of your changes, nt_buffer becomes much larger, right? Maybe that's what causes the slowdown on AMD, e.g. through affecting whether it's allocated in registers or on stack, or/and through the placement of its hot (first/only limb) portions in caches (which become non-contiguous). Maybe you can split this array in two - for first/only limb and the rest? I don't know if that's a sufficient hint to the compiler or not. Is there a way to give an explicit hint that one of the buffers is hot and the other is cold? Maybe by using the latter only inside if (unlikely()) blocks. I also don't know if those decisions by the compiler are sometimes per-variable or maybe effectively only per-element anyway (if so, my suggested changes wouldn't make a difference). In fact, since compilers typically use SSA, the decisions are probably even finer-grained than per-element - they may vary between element written values' lifetimes - but maybe there's some higher-level layer at play as well.

solardiz · 2023-03-13T16:55:12Z

BTW, we also still have this pending MD4 G() optimization: #4727 (comment)

magnumripper · 2023-03-13T17:36:28Z

With all versions of your changes, nt_buffer becomes much larger, right? Maybe that's what causes the slowdown on AMD

Yes, for length 59 it grows from 56 bytes to 128, and for length 125 it becomes 320 bytes. And you're probably right: Indeed the AMD reacts well on limiting the new code to length 59 (~2.7% performance drop instead of ~6.4%) whereas on nvidia it doesn't matter the slightest.

Maybe you can split this array in two - for first/only limb and the rest?

I can't see any obvious way to do so that wouldn't introduce complexity. I'll ponder it though.

BTW, we also still have this pending MD4 G() optimization

I should get that in while at it, but it shouldn't matter for nvidia (using lut3) nor AMD (using bitselect). OTOH I've had the idea to revert to basic macros (neither lut3 nor bitselect) and see if the current nvidia optimizer can do better nowadays. I'm pretty sure someone (you?) said that inline assembly can ruin things for the optimizers and I can picture why.

magnumripper · 2023-03-13T17:58:39Z

I've had the idea to revert to basic macros (neither lut3 nor bitselect) and see if the current nvidia optimizer can do better nowadays.

Just tried this (with current bleeding-jumbo), no change in speed. Then changed to that optimization in #4727 and still no change in speed, or perhaps a sub percent drop.

magnumripper · 2023-03-13T20:58:52Z

I guess this price is for the reads at index 14 and 15, which were previously a register and a constant 0, but are now part of an array spilled to global memory. Maybe it'll be cheaper to pre-check the length and avoid those reads for <=27 (store the right values in two registers, and then use those).

I tried your suggestion and it seemed to help a little. AMD regression now is "only" 1.8% (best of five runs) or 1.5% (average of same five runs) as long as I also limit it to max. length of 59. EDIT I dropped that idea later on, after first trying to do it with all 16 elements of (one MD block of) nt_buffer, which had no effect at all on nvidia (I hoped it would), and also confirmed with more tests that neither change had any effect on AMD.

I really thought the "original code last" version would solve all problems. It didn't need tricks like the above - the original code works fine for the last block. Sure, we had two branches instead of one, but the impact on AMD was silly. I think I'll revisit that branch and stare at it a little.

magnumripper · 2023-03-15T14:54:53Z

I found the solution to any performance regression. Like I said, I noticed Sayantan used the full 128-bit binary - which not only wastes memory and hash table/bitmap space (and hot compare code!), but also requires two more complete MD4 steps. Even with the birthday paradox against us, we ought to do fine with 64 bits.

Fixing that wasn't trivial, but here's single-hash performance on 2080ti now (normal length 27 version):

Device 2: NVIDIA GeForce RTX 2080 Ti
Loaded 1 password hash (NT-opencl [MD4 OpenCL])
LWS=128 GWS=139264 (1088 blocks) x22308 
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
MAgN11!8         (test-MAgN11!8)     
1g 0:00:00:02 DONE (2023-03-15 15:32) 0.3355g/s 51260Mp/s 51260Mc/s 9223372TC/s Dev#2:50°C util:98% MAgN11!8
No remaining hashes

Current bleeding-jumbo consistently does 33655Mp/s so the above is a 50% boost. EDIT While the above is repeatable, it seems to be inflated because of the very short run - probably using max. GPU clocking.

Now hashcat is 50% faster than that at the exact same task, at 75866.9 MH/s, but it has optimizations for single-hash that we never bothered with (although I'm tempted to implement it in nt-opencl). And already at attacking two hashes (of which one is faked and uncrackable), we beat hashcat with 40265Mp/s against 35746.9 MH/s.

On a side note, hashcat benchmark claims 93781.9 MH/s. Not sure how you'd get that figure in real cracking when their real speed attacking a single hash with mask is just 75866.9 MH/s?

My current changes can easily be adopted to raw MD4/MD5 formats, they use the same shared code. And there are seven other format that can be fixed as well, but they don't (yet) share any code.

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

This is well tested code in other formats. About 10% boost on 2080ti, against 5300 hashes and pure wordlist, no mask. Also adds an entry in doc/NEWS. Closes openwall#5245.

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

This is well tested code in other formats. About 10% boost on 2080ti, against 5300 hashes and pure wordlist, no mask. Also adds an entry in doc/NEWS. Closes openwall#5245.

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

This is well tested code in other formats. About 10% boost on 2080ti, against 5300 hashes and pure wordlist, no mask. Also adds an entry in doc/NEWS. Closes openwall#5245.

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

This is well tested code in other formats. About 10% boost on 2080ti, against 5300 hashes and pure wordlist, no mask. Also adds an entry in doc/NEWS. Closes openwall#5245.

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See openwall#5245

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

This way we still got optimal speed for the original nt-opencl format that support lengths up to 27. 1 block of MD4 is up to 27 characters. 2 blocks is 59, 3 is 91, 4 is 123 and 5 is 125 (due to core max). As long as we bump it over 27 we don't seem to gain any speed by limiting it to less than 125 characters. See #5245

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes #5245

magnumripper added the enhancement label Mar 9, 2023

magnumripper self-assigned this Mar 9, 2023

magnumripper mentioned this issue Mar 9, 2023

NT-long OpenCL: Support lengths up to 125 #5246

Merged

magnumripper mentioned this issue Mar 12, 2023

Alternative solution nt kernel for long plaintexts #5250

Closed

solardiz added this to the Definitely 2.0.0 milestone Mar 14, 2023

magnumripper added a commit to magnumripper/john that referenced this issue Mar 21, 2023

NT-opencl: 64-bit binary size

7cbf377

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 21, 2023

NT-opencl: 64-bit binary size

a81bea8

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 23, 2023

NT-opencl: 64-bit binary size

b1c986d

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 23, 2023

NT-opencl: 64-bit binary size

b222c35

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 25, 2023

NT-opencl: 64-bit binary size

2558ebf

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 28, 2023

NT-opencl: 64-bit binary size

0a6956c

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 28, 2023

NT-opencl: 64-bit binary size

2044748

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 28, 2023

NT-opencl: 64-bit binary size

9267abb

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 28, 2023

NT-opencl: 64-bit binary size

3e57fb0

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 28, 2023

NT-opencl: 64-bit binary size

45d6a5c

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper added a commit to magnumripper/john that referenced this issue Mar 31, 2023

NT-opencl: 64-bit binary size

0c253e7

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes openwall#5245

magnumripper closed this as completed in #5246 Mar 31, 2023

magnumripper added a commit that referenced this issue Mar 31, 2023

NT-opencl: 64-bit binary size

6ed33a7

Not only do we save memory, we can reverse much more as well, and reject early. We check the remaining bits in cold host code, for good measure. Closes #5245

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NT-long-opencl #5245

NT-long-opencl #5245

magnumripper commented Mar 9, 2023 •

edited

Loading

magnumripper commented Mar 9, 2023

solardiz commented Mar 9, 2023

solardiz commented Mar 9, 2023

magnumripper commented Mar 9, 2023

solardiz commented Mar 9, 2023

magnumripper commented Mar 9, 2023

magnumripper commented Mar 10, 2023 •

edited

Loading

magnumripper commented Mar 10, 2023

magnumripper commented Mar 10, 2023

solardiz commented Mar 10, 2023

magnumripper commented Mar 10, 2023

solardiz commented Mar 10, 2023

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023

solardiz commented Mar 13, 2023

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023

solardiz commented Mar 13, 2023

solardiz commented Mar 13, 2023

magnumripper commented Mar 13, 2023

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 15, 2023 •

edited

Loading

NT-long-opencl #5245

NT-long-opencl #5245

Comments

magnumripper commented Mar 9, 2023 • edited Loading

magnumripper commented Mar 9, 2023

solardiz commented Mar 9, 2023

solardiz commented Mar 9, 2023

magnumripper commented Mar 9, 2023

solardiz commented Mar 9, 2023

magnumripper commented Mar 9, 2023

magnumripper commented Mar 10, 2023 • edited Loading

magnumripper commented Mar 10, 2023

magnumripper commented Mar 10, 2023

solardiz commented Mar 10, 2023

magnumripper commented Mar 10, 2023

solardiz commented Mar 10, 2023

magnumripper commented Mar 13, 2023 • edited Loading

magnumripper commented Mar 13, 2023 • edited Loading

magnumripper commented Mar 13, 2023

solardiz commented Mar 13, 2023

magnumripper commented Mar 13, 2023 • edited Loading

magnumripper commented Mar 13, 2023

solardiz commented Mar 13, 2023

solardiz commented Mar 13, 2023

magnumripper commented Mar 13, 2023

magnumripper commented Mar 13, 2023 • edited Loading

magnumripper commented Mar 13, 2023 • edited Loading

magnumripper commented Mar 15, 2023 • edited Loading

magnumripper commented Mar 9, 2023 •

edited

Loading

magnumripper commented Mar 10, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 13, 2023 •

edited

Loading

magnumripper commented Mar 15, 2023 •

edited

Loading