Multiple calls to clCompileProgram/clLinkProgram cause crash #560

Closed
kurtzmarc opened this issue May 21, 2020 · 7 comments

@kurtzmarc

Environment

  • LWJGL version: 3.2.1
  • LWJGL build #: release
  • Java version: 1.8
  • Platform: Windows x64
  • Module: opencl

Description

We are using the OpenCL bindings to build and run OpenCL kernels. Instead of the one-step clBuildProgram() call, we use the two-step process: clCompileProgram() followed by clLinkProgram(). If we do this twice in a row, the JVM crashes on the second clLinkProgram() call. I am attaching an isolated test case and a crash log.

One very odd feature of this bug is that it seems to be affected by the Java stack: adding or removing unused local int variables changes whether the crash triggers, so you may need to add or remove such variables on your system. I have tried four different OpenCL devices across two different Windows systems, and the bug occurs on all of them.

package org.test;

import org.junit.Test;
import org.lwjgl.PointerBuffer;
import org.lwjgl.system.MemoryStack;

import java.nio.IntBuffer;

import static org.lwjgl.opencl.CL10.*;
import static org.lwjgl.opencl.CL12.clCompileProgram;
import static org.lwjgl.opencl.CL12.clLinkProgram;
import static org.lwjgl.system.MemoryStack.stackPush;

public class CompileLinkBugTest
{
    @Test
    public void compileLinkBug()
    {
        try (MemoryStack stack = stackPush()) {
            PointerBuffer platforms = stack.mallocPointer(1);
            clGetPlatformIDs(platforms, (int[]) null);
            long platformPtr = platforms.get(0);

            PointerBuffer devices = stack.mallocPointer(1);
            clGetDeviceIDs(platformPtr, CL_DEVICE_TYPE_ALL, devices, (int[]) null);
            long devicePtr = devices.get(0);

            IntBuffer errcode_ret = stack.callocInt(1);

            PointerBuffer properties = null;
            long contextPtr = clCreateContext(properties, devicePtr, null, 0, errcode_ret);

            createProgramPtr(contextPtr, devicePtr, "__kernel void a() { return; }");
            createProgramPtr(contextPtr, devicePtr, "__kernel void b() { return; }");
        }
    }

    private long createProgramPtr(long contextPtr, long devicePtr, String src)
    {
        try (MemoryStack stack = stackPush()) {
            //int a = 0;  // NOTE - you may need to uncomment this to trigger crash
            if (contextPtr == 0) {
                throw new RuntimeException("Device is not loaded.");
            }
            long ptr = clCreateProgramWithSource(contextPtr, src, null);

            PointerBuffer deviceList = stack.pointers(devicePtr);

            clCompileProgram(ptr, deviceList, "-cl-std=CL2.0", null, null, null, 0);
            ptr = clLinkProgram(contextPtr, deviceList, "", ptr, null, 0);
            if (ptr == 0) {
                throw new RuntimeException("CL Linking failed\n");
            }
            return ptr;
        }
    }
}
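
The test case ignores the int status that clCompileProgram() returns, so a compile failure would go unnoticed before the link step. Below is a minimal sketch of a status check that could wrap that call. The numeric values are the standard status codes from the Khronos cl.h header; the class and method names (ClStatus, name, check) are hypothetical and not part of LWJGL.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of LWJGL): turns OpenCL status codes into readable errors. */
final class ClStatus {
    // A subset of the status codes defined in the Khronos cl.h header.
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        NAMES.put(0, "CL_SUCCESS");
        NAMES.put(-11, "CL_BUILD_PROGRAM_FAILURE");
        NAMES.put(-15, "CL_COMPILE_PROGRAM_FAILURE");
        NAMES.put(-16, "CL_LINKER_NOT_AVAILABLE");
        NAMES.put(-17, "CL_LINK_PROGRAM_FAILURE");
        NAMES.put(-30, "CL_INVALID_VALUE");
        NAMES.put(-33, "CL_INVALID_DEVICE");
        NAMES.put(-34, "CL_INVALID_CONTEXT");
        NAMES.put(-44, "CL_INVALID_PROGRAM");
        NAMES.put(-59, "CL_INVALID_OPERATION");
        NAMES.put(-66, "CL_INVALID_COMPILER_OPTIONS");
        NAMES.put(-67, "CL_INVALID_LINKER_OPTIONS");
    }

    private ClStatus() {
    }

    /** Returns the symbolic name for a status code, or a fallback string for codes not listed above. */
    static String name(int status) {
        return NAMES.getOrDefault(status, "unknown OpenCL status " + status);
    }

    /** Throws if the status is anything other than CL_SUCCESS (0). */
    static void check(int status, String call) {
        if (status != 0) {
            throw new IllegalStateException(call + " failed: " + name(status));
        }
    }
}
```

In the test case this would be used as ClStatus.check(clCompileProgram(ptr, deviceList, "-cl-std=CL2.0", null, null, null, 0), "clCompileProgram"). It only surfaces compile errors earlier; it does not work around the crash itself.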

hs_err_pid17432.log

Tested on:

Name: AMD Accelerated Parallel Processing 
Vendor: Advanced Micro Devices, Inc. 
Version: OpenCL 2.1 AMD-APP (2906.10) 
     Name: Ellesmere 
     Vendor: Advanced Micro Devices, Inc. 
     Version: OpenCL 2.0 AMD-APP (2906.10) 
     Device type: GPU

Name: NVIDIA CUDA 
Vendor: NVIDIA Corporation 
Version: OpenCL 1.2 CUDA 10.2.108 
     Name: Quadro M2000M 
     Vendor: NVIDIA Corporation 
     Version: OpenCL 1.2 CUDA 
     Device type: GPU

Name: Intel(R) OpenCL 
Vendor: Intel(R) Corporation 
Version: OpenCL 2.0  
     Name: Intel(R) HD Graphics P530 
     Vendor: Intel(R) Corporation 
     Version: OpenCL 2.0  
     Device type: GPU

     Name: Intel(R) Xeon(R) CPU E3-1535M v5 @ 2.90GHz 
     Vendor: Intel(R) Corporation 
     Version: OpenCL 2.0 (Build 10) 
     Device type: CPU

@Spasi
Member

Spasi commented May 22, 2020

Thank you for reporting this!

@kurtzmarc
Author

Thanks for the fix! Is there a date set for the next release?
Also, does the same problem exist with clCreateSubDevices?

@httpdigest
Member

When I look at the Khronos specification for clCreateSubDevices and then at the LWJGL3 method, I don't see any differences. Both signatures match.

@kurtzmarc
Author

kurtzmarc commented Jun 1, 2020

You are correct. This appears to be a discrepancy within the Khronos spec itself, which states:

Otherwise, it returns a NULL value with the following error values returned in errcode_ret

However, there is no errcode_ret parameter or other way to get an error code from clCreateSubDevices, as far as I can see. It followed a similar pattern to what I saw with clBuildProgram, so I wrongly assumed it had the same issue.

@httpdigest
Member

Oh indeed! I didn't notice that, thanks for pointing it out. That actually is a bug in their documentation.

mikailgedik added a commit to Illuminai/RayX that referenced this issue Dec 16, 2020
[*] Changed naming convention
[!] clLinkProgram crashes JVM on Windows: test further (see LWJGL/lwjgl3#560)

@mikailgedik

Has the issue been resolved? This one is critical for me, since this bug is a death knell for my program.

@Spasi
Member

Spasi commented Jan 14, 2021

Yes, the fix is available in the 3.2.4 snapshot.
