Removing --force flag when using pocl? #2398

Closed
Kazhuu opened this issue May 12, 2020 · 8 comments

Comments

@Kazhuu

Kazhuu commented May 12, 2020

While working on the following pocl issue: #282, I noticed that hashcat needs to be run with the --force flag when using the pocl runtime. This does not seem to have been the case in earlier versions, when pocl runtime support was first added. Since pocl seems to provide almost the same speed as the Intel OpenCL runtime, I was wondering: is this flag necessary?

Also, as a side note, I wanted to ask something related to the earlier issue: how does hashcat determine when to use a cached kernel, and what format is the cached kernel in? I'm assuming it is in some compiled format before it's fed to the OpenCL runtime for device execution.

@philsmd
Member

philsmd commented May 13, 2020

Hey, thanks for your interest.

I hope you do not mind me asking, but are you a core pocl developer (since you say you are working on pocl issues)? This question is of course just out of curiosity, and maybe helps us understand how to "classify" this GitHub issue and the overall pocl situation.

To answer your question: we recently pushed this commit: 008072e, which made it mandatory to use --force with some non-proprietary drivers, because we got a huge number of problem reports on GitHub and on the forum in which users described all their cracking problems with some drivers.

Fortunately, we have a self-testing mechanism that keeps users from running long cracking jobs only to figure out later that something is not working (e.g. not even a simple example hash with a known password is cracked correctly, without false negatives or false positives).
This is called our self-test, and when it fails you get this warning:

* Device #1: ATTENTION! OpenCL kernel self-test failed.

Your device driver installation is probably broken.
See also: https://hashcat.net/faq/wrongdriver

Aborting session due to kernel self-test failure.

pocl/mesa users unfortunately got this warning with a lot of hash types, so we decided to make it easier for users to understand that, in general, the vendor (proprietary) driver should be installed to avoid some problems.

You could test some algorithms very quickly by just running this:

ATTENTION (for general users): never use --force in general. It's dangerous!

hashcat -b --benchmark-all --force

Attention: the use of --force is discouraged for general users. Only use --force if you are an experienced user or developer.

Some drivers, unfortunately, do not only produce false positives/negatives, but literally crash while compiling the kernels, or report very strange (code) problems that make absolutely no sense. We don't think these are hashcat (or, in general, OpenCL) problems, and we are unable to fix or work around them. This is a little sad, but I think there is hope that things improve with newer open-source driver versions.

I think @jsteube could probably explain the caching part better, but basically we compile the kernel files once (with the native driver API/mechanism) and keep the result of the compilation in the kernels/ folder (if hashcat was built directly from GitHub rather than installed; otherwise it will be in ~/.hashcat/kernels/ or similar). This feature is only there to avoid recompiling the kernel on every run for the same device; we just reuse the output of the OpenCL compiler/driver. If you see any speed problems with those caches, I assume (and I am pretty confident) it's not a hashcat problem, because it's the driver's output that is stored and reused. That said, for devs it sometimes makes sense to get rid of the kernels/ folder, especially if you are messing around with the kernel code and changed something there: hashcat won't always automatically regenerate the caches if kernel files are modified (again, this is a task only for experts and devs; general users shouldn't change the OpenCL kernel code).
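For devs who edit kernel code, clearing the cache can be scripted. This is only a minimal sketch: it assumes the cache is a kernels subdirectory of whichever directory your hashcat uses (the source checkout, or ~/.hashcat "or similar" as said above), and the function name clear_kernel_cache is made up for illustration.

```shell
# Remove cached kernel binaries so they are recompiled on the next run.
# $1: directory that contains the "kernels" folder (assumption: a source
#     checkout or ~/.hashcat; the real location depends on the install).
clear_kernel_cache() {
  base=$1
  if [ -z "$base" ]; then
    echo "usage: clear_kernel_cache <hashcat-dir>" >&2
    return 2
  fi
  if [ -d "$base/kernels" ]; then
    rm -rf -- "$base/kernels"
    echo "removed $base/kernels"
  else
    echo "no kernel cache found in $base"
  fi
}
```

For example, clear_kernel_cache "$HOME/.hashcat" for an installed build, or clear_kernel_cache . from inside a source checkout.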

I didn't intend this post to be too negative about pocl; it actually worked quite well for some users and simple hash types in the past. It's just annoying to get such a huge number of negative user reports (because pocl/mesa is sometimes installed by default), and hashcat users sometimes do not understand that it would at least make sense to consider installing, or playing around with, the proprietary driver.

Again, it would be great to get some good feedback here and maybe start a collaboration and/or try to improve things together with the pocl devs. We are totally up for this task. But as you can see in the issue you linked above, pocl/pocl#292, the open-source driver developers unfortunately have probably more important problems to fix and can't really focus (which is of course understandable, do not get me wrong) on complex hash cracking/OpenCL code that crashes, fails to compute correct results, or just fails to compile simple code that follows the OpenCL specs. By the way, that code works with almost all other drivers, so we sometimes take it for granted that the code can't really be the problem if it works on every other OpenCL driver/compiler, like NVIDIA OpenCL / ROCm / AMDGPU, and with few changes even with CUDA compilers.

It would be great if a pocl dev could spend some time trying to figure out if there is room for improvement by testing the hash types (and therefore kernel code) that fail with --benchmark-all etc.
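One way to collect the failing hash types is to save the benchmark output to a file and pull out the modes whose self-test failed. A rough sketch, assuming the log contains a "Hashmode: <number> - <name>" header per hash type and the "kernel self-test failed" message in the form quoted in this thread; failed_modes and bench.log are illustrative names, not part of hashcat.

```shell
# Print each hash mode whose section of a saved benchmark log contains a
# self-test failure. $1: path to the log file.
failed_modes() {
  awk '
    # Remember the mode number from the most recent "Hashmode:" header.
    /^Hashmode:/ { mode = $2 }
    # Report that mode once if a self-test failure follows it.
    /self-test failed/ && mode != "" && !seen[mode]++ { print mode }
  ' "$1"
}
```

A log could be produced with something like hashcat -b --benchmark-all --force > bench.log 2>&1 (the same --force caveats apply), followed by failed_modes bench.log.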

We would really appreciate a collaboration in which we help each other improve things. That would be fantastic.

Thank you so much

CC: @jsteube , @pjaaskel

P.S.: I know some parts of this post sound a little bit negative. I didn't intend to blame anybody or to exaggerate. pocl actually works great for normal desktop usage, and I really like the project. It's just still kind of unusable for hashcat users because of some problems (and the hashcat user experience is of course affected if users see self-test errors and compilers crashing or reporting kernel code errors).

@pjaaskel

I can answer on Mauri's behalf. He is working in my group to improve the kernel compiler performance and I asked him to start with these old issues as interesting performance related test cases.

POCL has reached such a level of maturity that probably around 95% of user-reported cases have actually been problems in the application source code. There are a few known unfixed bugs, but not too many to my knowledge. Users tend to blame those issues on POCL since a vendor implementation or two just silently ignores the issue, or just works as intended "by luck" (these things happen with parallel programs).

Unfortunately, POCL is still mainly developed on the side of our group's research work (though I'm happy to see more actual community activity from outside our group appearing recently!), which is why we often do not have the resources to spend days or weeks on issues that lately have turned out to be application-side issues. This work doesn't help our publication outcomes (we are measured by the academic papers we produce, and their quality).

Therefore, to get your possible POCL kernel compilation issues solved (crashes are great and usually easy to debug, miscompilations not so much), the first step is to report them as issues in POCL with as small a reproducer as possible.

Thanks!

@philsmd
Member

philsmd commented May 13, 2020

Thank you very much for your answer. It's actually very close to what I expected, a very positive answer indeed.

I think we could start with some simple examples, but I don't think we have them ready to report yet... there are some older issues here: https://github.com/hashcat/hashcat/issues?q=is%3Aissue+pocl+is%3Aclosed and e.g. something like this #2344

In some cases the problem could also be that users run an old pocl version shipped by their distro (for which some known problems may already be fixed in a newer version). So before reporting new issues on the pocl GitHub repo, we would probably need to make sure to test with the latest code. Any instructions here? I hope @jsteube can help us come up with some easy examples of hash types/algorithms and OpenCL code that we are sure is correct and just doesn't work with pocl.

I think this could be a good start for a collaboration, because the goal for both of us is to improve things. That's great. Thanks.

@pjaaskel

Yeah, using an old version of pocl shipped by some old Debian distro or such is a common problem that we, and pocl's reputation, suffer from :) My first response to an issue report is usually: did you try with the latest pocl master and the latest LLVM? Both of them have stabilized over time.

@philsmd
Member

philsmd commented May 14, 2020

I have a little update for you. Overnight I ran a full -b --benchmark-all and a full test.sh run with the current hashcat master branch and the latest pocl, and the results actually look pretty promising:
only a few "not found" (false negatives) and a few "not matching" (false positives).

I will probably need to investigate these issues a little further, because it could easily be the case that some of them have to do with other hashcat-related things (although that is pretty unlikely, because we regularly run intensive tests, but mainly with NVIDIA/AMD drivers :( ).

I tested like this:

sudo apt install -y build-essential ocl-icd-libopencl1 cmake git pkg-config libclang-dev clang llvm make ninja-build ocl-icd-libopencl1 ocl-icd-dev ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils

git clone https://github.com/pocl/pocl

cd pocl/
mkdir build/
cd build/
cmake ..
make
sudo make install


sudo cp pocl.icd /etc/OpenCL/vendors/

Maybe you can give feedback on this (maybe it's not the recommended way, but I really hope it is; it at least worked fine for me), or other users can use this to do similar tests with pocl.
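After copying the ICD file, it is probably worth confirming that the loader actually sees pocl before running hashcat. A small sketch, assuming clinfo reports the platform name as "Portable Computing Language" (as it does in the clinfo output further down this thread); has_pocl_platform is a made-up helper name.

```shell
# Succeeds if clinfo-style output on stdin lists a pocl platform.
has_pocl_platform() {
  grep -q 'Platform Name.*Portable Computing Language'
}

# Typical use on a machine with clinfo installed:
#   clinfo | has_pocl_platform && echo "pocl platform found"
```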

Again, I'm actually quite positively surprised to see only a few problems (which we still need to debug/troubleshoot further over the next days), and this is why I got an idea.

Would it make sense, in your opinion, to get rid of --force for pocl, but require some minimum version number? E.g. the user must have at least "pocl 1.6" installed, so the reported version of the OpenCL platform must include "pocl 1.6" or similar (if it is lower, the user must still use --force). Would this make sense?
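To make the idea concrete, here is a rough shell sketch of such a check. It is not hashcat's implementation (hashcat is written in C); it only assumes that the platform version string contains "pocl <major>.<minor>". The name pocl_at_least and the thresholds passed to it are placeholders.

```shell
# Succeeds if the OpenCL platform version string reports pocl >= MAJOR.MINOR.
# $1: platform version string, $2: minimum major, $3: minimum minor.
pocl_at_least() {
  # Extract "major minor" from the "pocl X.Y" part of the string.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*pocl \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p')
  [ -n "$ver" ] || return 1          # not a pocl platform, or unexpected format
  set -- $ver "$2" "$3"              # now: $1=major $2=minor $3=min major $4=min minor
  [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -ge "$4" ]; }
}
```

With this, pocl_at_least "OpenCL 1.2 pocl 1.5, LLVM 9.0.0" 1 6 fails, so such a platform would still need --force under a "pocl 1.6" minimum.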

I also want to discuss this idea with @jsteube <- maybe you, @jsteube , can jump in here as well.

Thank you very much in the meantime

@pjaaskel

That is a good way to install the latest pocl, except that I am not sure whether your apt repo gives you the latest LLVM supported by pocl, which you would need to get its fixes too.

Yes, a minimum version number check would possibly be better, and a bit less hostile towards our dear pocl. Perhaps add an LLVM version check too, if you suspect a problem with an older version.
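An LLVM check could look at the same platform version string. A minimal sketch, assuming pocl embeds the LLVM version after the word "LLVM" (strings of the form "OpenCL 1.2 pocl 1.5, ..., LLVM 9.0.0, ..."); the helper names and the threshold are illustrative only.

```shell
# Print the LLVM major version found in a platform version string, if any.
llvm_major() {
  printf '%s\n' "$1" | sed -n 's/.*LLVM \([0-9][0-9]*\).*/\1/p'
}

# Succeeds if the string reports an LLVM major version >= $2.
llvm_at_least() {
  major=$(llvm_major "$1")
  [ -n "$major" ] && [ "$major" -ge "$2" ]
}
```

Only the major version is compared here, which is likely enough to tell an old distro LLVM from a recent one.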

@jsteube
Member

jsteube commented Jun 3, 2020

After some testing with more recent versions of POCL (1.5+, LLVM 9) we can see much more stable handling, so I have re-enabled POCL with this commit: 34f71aa

Also note that the performance warning (compared with the Intel driver) was removed, as it seems that with the latest version the speeds are equal. Hashcat still prefers the native OpenCL runtime and selects it automatically if both drivers are installed, but if only POCL is installed, it will use POCL. If both drivers are installed, the user can also configure hashcat to use POCL with the -d parameter.

After some more testing, it seems that for the most relevant algorithms (like the new -m 22000, which is what most Kali users are looking for) POCL is actually faster than the Intel OpenCL runtime. It's possible that the Intel OpenCL runtime is using an older LLVM, but this needs more testing. For good benchmark testing there's a script, tools/benchmark_deep.pl, which also includes some system commands to disable turbo modes to make benchmark numbers more accurate. You can run it either as root or as a regular user, but if you run it as a user, copy/paste the system commands and run them as root.

@jsteube jsteube closed this as completed Jun 3, 2020
@dizcza

dizcza commented Jun 12, 2020

I've manually built POCL v1.5 (LLVM 6.0.0) on a CPU-only Ubuntu 18.04 VPS instance. I get

$ clinfo
Number of platforms                               1
  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 1.2 pocl 1.5, Debug+Asserts, LLVM 6.0.0, RELOC, SPIR, SLEEF, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-Common KVM processor
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x6c636f70
  Device Version                                  OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-k6-3
  Driver Version                                  1.5
  Device OpenCL C Version                         OpenCL C 1.2 pocl
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               2
  Max clock frequency                             1795MHz
  Device Partition                                (core)
    Max number of sub-devices                     2
    Supported partition types                     equally, by counts
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
=== CL_PROGRAM_BUILD_LOG ===
error: unknown target CPU 'k6-3'
  Preferred work group size multiple              <getWGsizes:675: build program : error -11>
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              3102065664 (2.889GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16777216 (16MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            67108864 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Global
  Local memory size                               8388608 (8MiB)
  Max number of constant args                     8
  Max constant buffer size                        8388608 (8MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_spir cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Portable Computing Language
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [POCL]
  clCreateContext(NULL, ...) [default]            Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   pthread-Common KVM processor
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   pthread-Common KVM processor
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Portable Computing Language
    Device Name                                   pthread-Common KVM processor

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

I still receive the "the driver is outdated" message in hashcat, so I add the --force flag. But then I get this error from hashcat:

$ hashcat -m2500 -b --force
hashcat (v5.1.0-1861-g323e7463) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

You have enabled --force to bypass dangerous warnings and errors!
This can hide serious problems and should only be done when debugging.
Do not report hashcat issues encountered when using --force.
Kernel /usr/local/share/hashcat/OpenCL/m02500-optimized.cl:
Optimized kernel requested but not needed - falling back to pure kernel

OpenCL API (OpenCL 1.2 pocl 1.5, Debug+Asserts, LLVM 6.0.0, RELOC, SPIR, SLEEF, POCL_DEBUG) - Platform #1 [The pocl project]
============================================================================================================================
* Device #1: pthread-Common KVM processor, 2894/2958 MB (1024 MB allocatable), 2MCU

Benchmark relevant options:
===========================
* --force
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

clBuildProgram(): CL_BUILD_PROGRAM_FAILURE

error: unknown target CPU 'k6-3'

* Device #1: Kernel /usr/local/share/hashcat/OpenCL/shared.cl build failed.

Started: Fri Jun 12 09:13:50 2020
Stopped: Fri Jun 12 09:13:54 2020

Note: installing pocl-opencl-icd (POCL v1.4) via apt works with hashcat.


Update

But it works with POCL v1.5 and LLVM 9!

$ hashcat -m2500 -b
hashcat (v5.1.0-1861-g323e7463) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

Kernel /usr/local/share/hashcat/OpenCL/m02500-optimized.cl:
Optimized kernel requested but not needed - falling back to pure kernel

OpenCL API (OpenCL 1.2 pocl 1.5, Debug+Asserts, LLVM 9.0.0, RELOC, SPIR, SLEEF, POCL_DEBUG) - Platform #1 [The pocl project]
============================================================================================================================
* Device #1: pthread-Common KVM processor, 2894/2958 MB (1024 MB allocatable), 2MCU

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 2500 - WPA-EAPOL-PBKDF2 (Iterations: 4095)

Speed.#1.........:      343 H/s (74.64ms) @ Accel:128 Loops:1024 Thr:1 Vec:4

Started: Fri Jun 12 09:38:30 2020
Stopped: Fri Jun 12 09:40:27 2020

Note that Ubuntu 18.04 selects llvm-6 when LLVM is installed with apt install llvm, so the user needs to pick the v9 llvm- and clang-related packages explicitly. Here is the final command I used to install the required packages in order to build POCL v1.5 with LLVM 9:

# apt install clang-9 libclang-9-dev build-essential ocl-icd-libopencl1 cmake git pkg-config make ninja-build ocl-icd-libopencl1 ocl-icd-dev ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils -y
