
build failed: aortion #178

Closed
colertyui opened this issue Nov 12, 2024 · 7 comments · Fixed by #182

Comments


colertyui commented Nov 12, 2024

The operation runs normally up to a certain point, where it freezes and either crashes the terminal or reboots the entire system. This happens just by running ./babs.sh -b on my machine.

[  0%] Generating venv/lib/python3.11/site-packages/triton.egg-link
cd /media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python && /usr/bin/cmake -E env VIRTUAL_ENV=/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv TRITON_USE_ROCM=ON ROCM_DEFAULT_DIR=/opt/rocm_sdk_612 MLIR_ENABLE_DUMP=1 LLVM_IR_ENABLE_DUMP=1 AMDGCN_ENABLE_DUMP=1 TRITON_BUILD_DIR=/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/triton_build /media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/bin/python setup.py develop
running develop
/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running egg_info
writing triton.egg-info/PKG-INFO
writing dependency_links to triton.egg-info/dependency_links.txt
writing entry points to triton.egg-info/entry_points.txt
writing requirements to triton.egg-info/requires.txt
writing top-level names to triton.egg-info/top_level.txt
reading manifest file 'triton.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'triton.egg-info/SOURCES.txt'
running build_ext
Re-run cmake no build system arguments
CMake Deprecation Warning at CMakeLists.txt:6 (cmake_policy):
  The OLD behavior for policy CMP0116 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.


-- TRITON_USE_ROCM: ON
-- ROCM_DEFAULT_DIR: /opt/rocm_sdk_612
-- MLIR_ENABLE_DUMP: 1
-- LLVM_IR_ENABLE_DUMP: 1
-- AMDGCN_ENABLE_DUMP: 1
-- Adding Python module
-- Triton backends tuple: amd
-- Configuring done (0.1s)
-- Generating done (0.1s)
-- Build files have been written to: /media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/build/cmake.linux-x86_64-cpython-3.11
Change Dir: '/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/build/cmake.linux-x86_64-cpython-3.11'

Run Build Command(s): /usr/bin/ninja -v -j 16
ninja: error: stat(/root/.triton/llvm/llvm-657ec732-ubuntu-x64/include/mlir/IR/AttrTypeBase.td): Permission denied

Traceback (most recent call last):
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 612, in <module>
    setup(
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 555, in run
    develop.run(self)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/command/develop.py", line 114, in install_for_development
    self.run_command('build_ext')
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 341, in run
    self.build_extension(ext)
  File "/media/phlq/HDD/rocm_sdk_builder/src_projects/aotriton/third_party/triton/python/setup.py", line 456, in build_extension
    subprocess.check_call(["cmake", "--build", "."] + build_args, cwd=cmake_dir)
  File "/opt/rocm_sdk_612/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'TritonRelBuildWithAsserts', '-j16']' returned non-zero exit status 1.
make[2]: *** [CMakeFiles/aotriton_venv_triton.dir/build.make:73: venv/lib/python3.11/site-packages/triton.egg-link] Error 1
make[2]: Leaving directory '/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton'
make[1]: *** [CMakeFiles/Makefile2:139: CMakeFiles/aotriton_venv_triton.dir/all] Error 2
make[1]: Leaving directory '/media/phlq/HDD/rocm_sdk_builder/builddir/038_aotriton'
make: *** [Makefile:136: all] Error 2
build failed: aotriton

I'm on Ubuntu 24.04.1 LTS, and my hardware is:
ASUS TUF GAMING B550M-PLUS;
AMD Ryzen™ 7 5800X3D × 16;
AMD Radeon™ RX 5700;
I apologize in advance if I'm being an idiot about something; I'm really new to coding in general, so I'm trying my best to learn from my mistakes. If the information provided is insufficient, please just tell me what else I need to provide and I will do so to the best of my ability. I did try apt install libzstd-dev, but it didn't change the outcome.

colertyui changed the title from 'crash when using "./babs.sh -b"' to 'build failed: aortion' on Nov 12, 2024
lamikr (Owner) commented Nov 18, 2024

Hi, sorry for the delay, and thank you for your report. I am wondering whether your system is running out of memory.
Do you have only gfx1010 selected in your build_cfg.user file? Can you also tell me how much RAM you have on your system?

colertyui (Author) commented Nov 18, 2024

Hello, I only have gfx1010 selected in my file, and I have a single 16 GB DDR4-3000 CL16 stick of RAM. I hoped that would be enough, but if absolutely necessary I can buy another one and run 32 GB; I'm just working on a relatively tight budget. Also, in case it is relevant, I'm storing all the code on a 1 TB HDD. Thank you very much for the assistance.
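A rough way to reason about the out-of-memory question: every parallel compile job needs some RAM, so a safe job count is bounded both by the number of CPUs and by total RAM divided by per-job memory. The sketch below assumes about 2 GB per job; that figure is an illustrative assumption, not a measured number for aotriton or Triton, and `suggest_max_jobs` is a hypothetical helper.

```python
import os


def suggest_max_jobs(total_ram_gb: float, cpu_count: int, gb_per_job: float = 2.0) -> int:
    """Return a conservative parallel job count.

    ASSUMPTION: gb_per_job is an illustrative guess, not a measured figure
    for aotriton/Triton; raise it if the build still runs out of memory.
    """
    by_memory = int(total_ram_gb // gb_per_job)
    # Never go below one job and never exceed the number of CPUs.
    return max(1, min(cpu_count, by_memory))


def total_ram_gb() -> float:
    """Total physical memory in GB (Linux: page size * number of physical pages)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


if __name__ == "__main__":
    print(suggest_max_jobs(total_ram_gb(), os.cpu_count() or 1))
```

For the machine described above (16 GB of RAM, 16 hardware threads) this gives `suggest_max_jobs(16, 16) == 8`, which matches the MAX_JOBS=8 value suggested later in this thread.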

lamikr (Owner) commented Nov 22, 2024

Can you try whether setting the MAX_JOBS variable to 8 helps? Aotriton/Triton checks whether it is set and, if so, should limit the build to 8 of your 16 CPUs.
So try running these commands:

rm -rf builddir/038_aotriton
export MAX_JOBS=8
./babs.sh -b

I am not fully sure whether this helps, but at least in theory
src_projects/aotriton/third_party/triton/python/setup.py checks whether it is defined and, if so, limits the CPU usage to that number.
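The mechanism described here, an environment variable that caps the `-j` value passed to the build tool, can be sketched as follows. This is an illustration, not code copied from Triton's setup.py; the helper name `build_parallelism_args` and its fallback to `os.cpu_count()` are assumptions. In the log above the build ran with `-j16`, which is what such a fallback would produce on a 16-thread CPU when MAX_JOBS is unset.

```python
import os


def build_parallelism_args(default_jobs=None):
    """Return extra args such as ['-j8'] for `cmake --build`.

    Hypothetical helper: honours MAX_JOBS if it is a positive integer,
    otherwise falls back to the CPU count (an assumption, not necessarily
    what Triton's setup.py does).
    """
    fallback = default_jobs or os.cpu_count() or 1
    raw = os.environ.get("MAX_JOBS", "").strip()
    try:
        jobs = int(raw) if raw else fallback
    except ValueError:
        jobs = fallback  # ignore non-numeric values such as "eight"
    if jobs < 1:
        jobs = fallback  # ignore zero or negative values
    return [f"-j{jobs}"]
```

With `MAX_JOBS=8` exported, `["cmake", "--build", "."] + build_parallelism_args()` would end in `-j8`.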

colertyui (Author) commented Nov 25, 2024

I'm sorry if this sounds really dumb; as I said, I'm a complete newbie. When I ran the command with sudo it just returned "export: command not found", and I also got "error: permission denied". I tried running the commands separately, all together on separate lines, and once all on the same line, which gave no output. No matter how I write them I get the same errors, and when I copy and paste them exactly as you wrote them, I get the exact same error as before, "build failed: aortion". I'll send the full output from the beginning in case it helps; this one is from my third or fourth attempt today.
Attachment: Untitled 1.odt

colertyui (Author)

So yeah, I don't really know what I should do?
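A likely explanation for the "export: command not found" message: `export` is a shell builtin, not a program on disk, so `sudo export MAX_JOBS=8` fails because sudo can only run executables (and with Ubuntu's default sudo configuration most environment variables are reset anyway). A plain `export MAX_JOBS=8`, typed without sudo in the same terminal, marks the variable so that every process started afterwards from that shell, including ./babs.sh, inherits it. The Python sketch below only demonstrates that inheritance by starting a child process with a modified environment; `child_sees_max_jobs` is a hypothetical illustration, not part of the build scripts.

```python
import os
import subprocess
import sys


def child_sees_max_jobs(value: str) -> str:
    """Start a child Python process and return the MAX_JOBS value it observes.

    Passing env= plays the role of `export MAX_JOBS=...` in a shell:
    the variable is copied into the environment of the new process.
    """
    env = dict(os.environ, MAX_JOBS=value)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('MAX_JOBS', 'unset'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_sees_max_jobs("8"))  # the child reports 8
```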

lamikr added a commit that referenced this issue Dec 19, 2024
patch aotriton to check MAX_JOBS environment variable
in aotriton v2src/CMakeLists.txt and use it to limit the
number of python processes allowed to build and compress hsaco files.

This fixes the out-of-memory problem in cases where the computer has many
CPUs compared to the amount of memory.
Note that this fix only works when using Ninja
(cmake's limitation for add_custom_jobs command).

The MAX_JOBS environment variable and forcing Ninja for building
aotriton are set in the binfo file.

fixes: #178

Signed-off-by: Mika Laitio <[email protected]>
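The commit limits how many Python processes may build and compress hsaco files at the same time. According to the message, aotriton does this in CMake and only under Ninja (presumably via CMake job pools, a feature honoured only by the Ninja generator). The Python sketch below shows the same throttling idea in a different setting: a thread pool sized from MAX_JOBS caps how many child processes are alive at any moment. `run_limited` and the placeholder commands are hypothetical and not taken from aotriton.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_limited(commands, max_jobs=None):
    """Run each command (an argv list) with at most max_jobs running at once.

    Each worker thread blocks on subprocess.run, so the pool size is also the
    maximum number of live child processes. Returns exit codes in input order.
    """
    if max_jobs is None:
        max_jobs = int(os.environ.get("MAX_JOBS", os.cpu_count() or 1))
    max_jobs = max(1, max_jobs)
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        results = pool.map(lambda cmd: subprocess.run(cmd).returncode, commands)
        return list(results)


if __name__ == "__main__":
    # Placeholder workload standing in for hsaco compile/compress steps.
    cmds = [[sys.executable, "-c", f"print('job {i}')"] for i in range(6)]
    print(run_limited(cmds, max_jobs=2))
```

Capping concurrency this way trades build time for peak memory: with 16 GB of RAM, fewer simultaneous workers is slower but avoids the freezes and reboots reported at the top of this issue.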
lamikr closed this as completed in a6bf4f2 on Dec 19, 2024
lamikr (Owner) commented Dec 19, 2024

I noticed that Aotriton actually had one build step where the "MAX_JOBS" parameter was not honoured. I have now fixed it, so it should be possible to do the build with only 16 GB of memory.

About your problems:

  1. I am not sure why the "export MAX_JOBS=8" line gives you an error when you run it in a Linux terminal, because it should just set that variable to 8.

  2. I also noticed one odd line in your log:
    "/root/.triton/llvm/llvm-657ec732-ubuntu-x64/include/mlir/IR/AttrTypeBase.td"

This indicates that the build is trying to read files from the root user's .triton directory. Could you have accidentally run the build as the root user instead of as a regular user?

  3. It may be best to do a clean build one more time, just in case, with these steps:
1) Clone the latest rocm_sdk_builder into a new directory:

git clone https://github.com/lamikr/rocm_sdk_builder.git

2) First remove the old build:

sudo rm -rf /opt/rocm_sdk_612

3) Configure the ROCm SDK to use the RX 5700 as the GPU:
cd rocm_sdk_builder
./babs.sh -c 
(select the RX 5700)

4) Run the build using only 8 CPUs to save memory:
export MAX_JOBS=8
./babs.sh -b
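Because the failing ninja step tried to stat a file under /root/.triton, a quick pre-build check for root involvement can help rule this out. The sketch below is a hypothetical helper, not part of rocm_sdk_builder. It assumes Triton keeps its downloaded LLVM under `~/.triton` (consistent with the path in the log), and it also looks for root-owned leftovers in the build directory, since files created by an earlier root build may also explain the "Permission denied" error. It only reports problems; it does not fix them.

```python
import os
from pathlib import Path


def build_user_warnings(build_dir=".", home=None, limit=5):
    """Return human-readable warnings about root-related build problems.

    Hypothetical pre-build check. Assumes Triton caches its LLVM download in
    ~/.triton, which matches the /root/.triton path seen in the log.
    """
    warnings = []
    running_as_root = os.geteuid() == 0
    if running_as_root:
        warnings.append("running as root: Triton will use /root/.triton; build as a regular user")
    triton_dir = (Path(home) if home else Path.home()) / ".triton"
    if triton_dir.exists() and not os.access(triton_dir, os.R_OK | os.X_OK):
        warnings.append(f"{triton_dir} is not readable by the current user")
    if not running_as_root:
        # Files left behind by an earlier root build cannot be rewritten by a normal user.
        found = 0
        for root, dirs, files in os.walk(build_dir):
            for name in dirs + files:
                path = os.path.join(root, name)
                try:
                    if os.lstat(path).st_uid == 0:
                        warnings.append(f"owned by root: {path}")
                        found += 1
                except OSError:
                    continue
                if found >= limit:
                    return warnings
    return warnings


if __name__ == "__main__":
    for warning in build_user_warnings("builddir"):
        print(warning)
```

Run it from the rocm_sdk_builder checkout as the same user who will run ./babs.sh -b; empty output means none of these checks found a problem.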

lamikr reopened this on Dec 19, 2024
lamikr (Owner) commented Jan 11, 2025

I am closing this now as fixed.

lamikr closed this as completed on Jan 11, 2025