Metal library: fail to serialize on new Macos Sonoma 14.1 #580

Open
ductm104 opened this issue Nov 30, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@ductm104

Describe the bug
I'm using a MacBook Air M1 (base model, 8 GB) with macOS Sonoma 14.1.1 and Python 3.11 (conda).
I have been trying to compile a Metal library from source and serialize it for later use, but I get this error:
MTLLibrary is not formatted as a MetalLib file.

To Reproduce

import Metal, Cocoa, libdispatch
prg = """ 
#include <metal_stdlib>
using namespace metal;
kernel void E_(device float* data,
               uint3 gid [[threadgroup_position_in_grid]],
               uint3 lid [[thread_position_in_threadgroup]])
{
    *(data) = 10;
}
"""
device = Metal.MTLCreateSystemDefaultDevice()
options = Metal.MTLCompileOptions.new()
lib, err = device.newLibraryWithSource_options_error_(prg, options, None)
print(lib)
print(lib.functionNames())
ret = lib.serializeToURL_error_('./mylib.metallib', None)
print(ret)

Expected behavior

<_MTLLibrary: 0x600003b9af80>
    label = <none> 
    device = <AGXG13GDevice: 0x11e819e00>
        name = Apple M1 
    functionNames: E_
(
    "E_"
)
(False, Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file." UserInfo={NSLocalizedDescription=MTLLibrary is not formatted as a MetalLib file.})

Additional context
With the same program, compilation succeeds when I use the Xcode command-line tools:

xcrun -sdk macosx metal -x metal -c src.c -o lib.air
xcrun -sdk macosx metallib -o lib.metallib lib.air

It seems that the new macOS version on the MacBook Air causes the library produced through PyObjC to have a different binary format from the one Xcode produces: I can load an Xcode-compiled metallib, but I cannot serialize one through the PyObjC Metal bindings.

The code below ran successfully with the `lib.metallib` built above:

fname = 'lib.metallib'
xlib, _ = device.newLibraryWithURL_error_(fname, None)
content = xlib.libraryDataContents().bytes().tobytes()
data = libdispatch.dispatch_data_create(content, len(content), None, None)
newlib, _ = device.newLibraryWithData_error_(data, None)
fxn = newlib.newFunctionWithName_('E_')
print(fxn)

The expected output should be:

<_MTLFunctionInternal: 0x14463f1b0>
    name = E_ 
    device = <AGXG13GDevice: 0x124808a00>
        name = Apple M1 
    functionType = MTLFunctionTypeKernel 
    attributes: <none>
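
For anyone who needs the Xcode route in the meantime, the two xcrun commands above can be driven from Python. This is only a minimal sketch and not part of PyObjC: `metallib_build_commands` and `build_metallib` are hypothetical helper names, and it assumes a full Xcode install so that `xcrun` can find the `metal` and `metallib` tools.

```python
import subprocess
from pathlib import Path


def metallib_build_commands(src, out):
    """Return the two xcrun command lines used in this issue.

    `src` is the Metal source file and `out` the .metallib to produce;
    the intermediate .air file is placed next to `out`.
    """
    src, out = Path(src), Path(out)
    air = out.with_suffix(".air")
    return [
        ["xcrun", "-sdk", "macosx", "metal", "-x", "metal",
         "-c", str(src), "-o", str(air)],
        ["xcrun", "-sdk", "macosx", "metallib", "-o", str(out), str(air)],
    ]


def build_metallib(src, out):
    """Run both steps; raises CalledProcessError if either one fails."""
    for cmd in metallib_build_commands(src, out):
        subprocess.run(cmd, check=True)
    return Path(out)
```

On macOS, `build_metallib("src.c", "lib.metallib")` would run the same two steps as the commands above. Keeping the command construction separate from execution makes it possible to inspect the command lines on a machine without Xcode.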
@ductm104 ductm104 added the bug Something isn't working label Nov 30, 2023
@tomtom-95

I have the same problem. @ductm104 Did you find a solution?

@tyoc213

tyoc213 commented Jan 13, 2024

@ductm104 how did you fix it? Or did you end up using

xcrun -sdk macosx metal -x metal -c src.c -o lib.air
xcrun -sdk macosx metallib -o lib.metallib lib.air

???

@ductm104
Author

@tyoc213 I didn't. The issue hasn't been fixed.

@ronaldoussoren
Owner

I don't get these errors with the code below, which combines the snippets above, adds more printing, and uses URLs where necessary.

import Metal, Cocoa, libdispatch
prg = """
#include <metal_stdlib>
using namespace metal;
kernel void E_(device float* data,
               uint3 gid [[threadgroup_position_in_grid]],
               uint3 lid [[thread_position_in_threadgroup]])
{
    *(data) = 10;
}
"""
device = Metal.MTLCreateSystemDefaultDevice()
options = Metal.MTLCompileOptions.new()
lib, err = device.newLibraryWithSource_options_error_(prg, options, None)
print(f"newLibrary {lib=} {err=}")
print(lib.functionNames())
ret, error = lib.serializeToURL_error_(Cocoa.NSURL.fileURLWithPath_('./mylib.metallib'), None)
print(f"serialize {ret=} {error=}")

fname = Cocoa.NSURL.fileURLWithPath_('mylib.metallib')
xlib, error = device.newLibraryWithURL_error_(fname, None)
print(f"newLibrary {xlib=}, {error=}")
content = xlib.libraryDataContents().bytes().tobytes()
data = libdispatch.dispatch_data_create(content, len(content), None, None)
newlib, error = device.newLibraryWithData_error_(data, None)
print(f"newWithData {newlib=}, {error=}")
fxn = newlib.newFunctionWithName_('E_')
print(f"{fxn=}")

Running this prints:

newLibrary lib=<_MTLLibrary: 0x600001eaa1c0>
    label = <none> 
    device = <AGXG13GDevice: 0x14b040000>
        name = Apple M1 
    functionNames: E_ err=None
(
    "E_"
)
serialize ret=True error=None
newLibrary xlib=<_MTLLibrary: 0x600001eaa540>
    label = <none> 
    device = <AGXG13GDevice: 0x14b040000>
        name = Apple M1 
    functionNames: E_, error=None
newWithData newlib=<_MTLLibrary: 0x600001eaa600>
    label = <none> 
    device = <AGXG13GDevice: 0x14b040000>
        name = Apple M1 
    functionNames: E_, error=None
fxn=<_MTLFunctionInternal: 0x14b80b9a0>
    name = E_ 
    device = <AGXG13GDevice: 0x14b040000>
        name = Apple M1 
    functionType = MTLFunctionTypeKernel 
    attributes: <none>

I don't use Metal myself. If this isn't the expected behaviour, I need some more information on what the expected behaviour is.

Output is on an M1 system running macOS 14.3.1 (23D60), using Python 3.12 and PyObjC from the repo (the latter shouldn't matter, there have been some updates in the repo but none that are relevant for this issue).
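
On "uses URLs where necessary": the code above passes an NSURL, not a Python string, to `serializeToURL_error_` and `newLibraryWithURL_error_`. A small helper along these lines could keep that conversion in one place. `to_file_url` is a hypothetical name, the `url_factory` parameter exists only so the function can be exercised without PyObjC, and making the path absolute is a choice made for this sketch so that the result does not depend on the current working directory.

```python
import os


def to_file_url(path, url_factory=None):
    """Convert a filesystem path to a file URL object.

    By default this uses NSURL.fileURLWithPath_ (requires PyObjC on
    macOS). `url_factory` is a hypothetical hook for testing elsewhere.
    """
    if url_factory is None:
        from Foundation import NSURL  # imported lazily; macOS only
        url_factory = NSURL.fileURLWithPath_
    # Resolve relative paths up front so the result does not depend on
    # the working directory at the time the URL is eventually used.
    return url_factory(os.path.abspath(os.fspath(path)))
```

With this, the serialize call would read `lib.serializeToURL_error_(to_file_url("./mylib.metallib"), None)`.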

@pushpendre

Weirdly enough, when I run this same code I get the error "MTLLibrary is not formatted as a MetalLib file":

(tg) [20:12:22]Mac:~/w/(02-22_06:16)
$ python -c 'import objc; print(objc.__version__)'
10.1
(tg) [20:13:26]Mac:~/w/(02-22_06:16)
$ sw_vers
ProductName:            macOS
ProductVersion:         14.3.1
BuildVersion:           23D60
(tg) [20:10:29]Mac:~/w/(02-22_06:16)
$ python ../tg/try_tensor.py 
newLibrary lib=<_MTLLibrary: 0x12f646b10>
    label = <none> 
    device = <AGXG13XDevice: 0x12f821a00>
        name = Apple M1 Pro 
    functionNames: E_ err=None
(
    "E_"
)
serialize ret=False error=Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file." UserInfo={NSLocalizedDescription=MTLLibrary is not formatted as a MetalLib file.}
newLibrary xlib=None, error=Error Domain=MTLLibraryErrorDomain Code=6 "library not found" UserInfo={NSLocalizedDescription=library not found}
Traceback (most recent call last):
  File "/Users/pushpen/w/tinygrad/../tg/try_tensor.py", line 23, in <module>
    content = xlib.libraryDataContents().bytes().tobytes()
              ^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'libraryDataContents'
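
The traceback above is a follow-on failure: serialization failed, so no file was written, `newLibraryWithURL_error_` returned `None`, and the script carried on anyway. Since these `..._error_` methods return a `(result, error)` pair in PyObjC, as the output shows, a small guard can stop at the first real failure. `MetalCallError` and `unwrap` are hypothetical names used only for this sketch.

```python
class MetalCallError(RuntimeError):
    """Raised when a PyObjC call reports failure through its result."""


def unwrap(result_and_error, what):
    """Return the result of a `(result, error)` pair, or raise.

    A result of None or False is treated as failure, which matches the
    outputs shown in this thread (ret=False, xlib=None). The error
    object alone is not used as the signal, because a call may succeed
    and still hand back diagnostics.
    """
    result, error = result_and_error
    if result is None or result is False:
        raise MetalCallError(f"{what} failed: {error}")
    return result
```

Used as `unwrap(lib.serializeToURL_error_(url, None), "serialize")`, the script would stop at the serialize step with the MTLLibraryErrorDomain message instead of reaching the `AttributeError`.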

@Leikoe

Leikoe commented Mar 16, 2024

this bug is related to conda, it only happens when you use conda's python. though I have no idea why. @ronaldoussoren if you want to reproduce it.
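
When comparing a working and a failing setup, it may help if each reporter runs the same environment summary. The sketch below uses only the standard library. `interpreter_report` is a hypothetical name, and treating a `conda-meta` directory under `sys.prefix` or a set `CONDA_PREFIX` variable as a sign of conda is a heuristic, not a guarantee.

```python
import os
import platform
import sys


def interpreter_report():
    """Collect facts that help tell a conda Python from other builds."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "python": platform.python_version(),
        "machine": platform.machine(),
        # Empty string on non-macOS systems.
        "macos": platform.mac_ver()[0],
        # Heuristic: conda environments contain a conda-meta directory.
        "conda_meta": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
        "conda_prefix_env": os.environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

Running it once with conda's python and once with `/usr/bin/python3` gives two reports that can be pasted side by side.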

@pushpendre

Can confirm. I had to make some more changes to get this to work, but the bug disappeared with the system python3. Here are the steps:

  1. Install Xcode. If only the command line tools are installed, xcrun may fail with an error.
  2. Confirm that xcrun can compile the source file:
cat > src.c <<EOF
#include <metal_stdlib>
using namespace metal;
kernel void E_(device float* data,
               uint3 gid [[threadgroup_position_in_grid]],
               uint3 lid [[thread_position_in_threadgroup]])
{
    *(data) = 10;
}
EOF
xcrun -sdk macosx metal -x metal -c src.c -o lib.air && xcrun -sdk macosx metallib -o lib.metallib lib.air
  3. If the above succeeds, make sure pyobjc is installed and the following commands run:
/usr/bin/python3  -m pip install pyobjc
/usr/bin/python3  -m pip install pyobjc-framework-Metal pyobjc-framework-libdispatch
/usr/bin/python3  -c 'from Foundation import NSURL'
  4. Passing a raw string where an NSURL is expected raises ValueError: NSInvalidArgumentException - -[OC_BuiltinPythonUnicode isFileURL]: unrecognized selector sent to instance 0x600002181e80. So instead of the string, pass an NSURL built with fileURLWithPath. I changed the code as follows:
cat > try_tensor.py <<EOF
import Metal, Cocoa, libdispatch
from Foundation import NSURL

prg = """ 
#include <metal_stdlib>
using namespace metal;
kernel void E_(device float* data,
               uint3 gid [[threadgroup_position_in_grid]],
               uint3 lid [[thread_position_in_threadgroup]])
{
    *(data) = 10;
}
"""
device = Metal.MTLCreateSystemDefaultDevice()
options = Metal.MTLCompileOptions.new()
lib, err = device.newLibraryWithSource_options_error_(prg, options, None)
print(lib)
print(lib.functionNames())
file_url = NSURL.fileURLWithPath_('./mylib.metallib')
ret = lib.serializeToURL_error_(file_url, None)
print(ret)
EOF
/usr/bin/python3 try_tensor.py 
head -c 4 mylib.metallib 

The above shows a compiled file with the right MTLB header.
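
The same check can be done from Python, which is handy inside a test. This sketch assumes, as the `head -c 4` check above indicates, that a serialized metallib starts with the four ASCII bytes `MTLB`. `has_metallib_magic` and `file_has_metallib_magic` are hypothetical helper names.

```python
METALLIB_MAGIC = b"MTLB"  # first four bytes, per the head -c 4 check above


def has_metallib_magic(data):
    """Return True if `data` (bytes-like) starts with the MTLB magic."""
    return bytes(data[:4]) == METALLIB_MAGIC


def file_has_metallib_magic(path):
    """Read only the first four bytes of `path` and check them."""
    with open(path, "rb") as fh:
        return has_metallib_magic(fh.read(4))
```

`has_metallib_magic` can also be applied to the bytes obtained from `libraryDataContents()`, which is where the conda and non-conda results could be compared directly.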

@ronaldoussoren
Owner

this bug is related to conda, it only happens when you use conda's python. though I have no idea why. @ronaldoussoren if you want to reproduce it.

Do you also use the version of PyObjC packaged by Conda?

@Leikoe

Leikoe commented Mar 18, 2024

this bug is related to conda, it only happens when you use conda's python. though I have no idea why. @ronaldoussoren if you want to reproduce it.

Do you also use the version of PyObjC packaged by Conda?

No, I used the pip one. I even tried manually replacing it in site-packages with the pip build from a regular Python, and it still gives the same problem.

@Leikoe

Leikoe commented Mar 18, 2024

You can track my debugging work at tinygrad/tinygrad#2226

@nullhook

nullhook commented Mar 22, 2024

The Metal driver and PyObjC don't seem to be the issue here, as the output is correct in non-virtual environments.

The main question is: why does the binary data returned in a conda environment have the .metallib header and footer swallowed?
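
One way to test the "swallowed header and footer" idea is to check whether the bytes obtained under conda appear verbatim inside a known-good `.metallib` built by `xcrun` from the same source. The sketch below works on plain `bytes`. `locate_payload` is a hypothetical name, and there is no guarantee that the two toolchains emit byte-identical payloads, so a miss does not disprove the hypothesis.

```python
def locate_payload(reference, candidate):
    """Report where `candidate` sits inside `reference`, if anywhere.

    Returns None when `candidate` is empty or is not a contiguous slice
    of `reference`; otherwise returns the number of reference bytes
    before and after the match (the "missing" header and footer sizes).
    """
    if not candidate:
        return None
    start = reference.find(candidate)
    if start < 0:
        return None
    return {
        "missing_header_bytes": start,
        "missing_footer_bytes": len(reference) - start - len(candidate),
    }
```

A result with non-zero header and footer counts would support the idea that the payload is intact and only the container framing is lost. A `None` result leaves the question open.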
