Skip to content

Improve the rockbuilder build logic and config options#795

Merged
lamikr merged 18 commits into
mainfrom
users/lamikr/wip/rockbuilder_update
Jun 17, 2025
Merged

Improve the rockbuilder build logic and config options#795
lamikr merged 18 commits into
mainfrom
users/lamikr/wip/rockbuilder_update

Conversation

@lamikr
Copy link
Copy Markdown
Contributor

@lamikr lamikr commented Jun 9, 2025

Updated rockbuilder logic and config options:

  • update the default rocm_home default seach path after the code was moved to another directory: ../../dist/build/rocm
  • ROCM_HOME env variable can be used to specify a different location for rocm-sdk to be used
  • use patches from externa-builds/pytorch/patches folder instead of forking it
  • support building pytorch, pytorch audio and vision also on windows without requiring a bash prompt
  • add option to specify environment variables
    • environment variables are set at the time when project phases are executed and then reset back to previous values or removed after the project has been build
      (env_common, env_linux, env_windows)
  • git am fix for hipify run on windows
    • Hipified patches could fail to apply with command "git am" if the hipify command was executed on dos-prompt instead of using a bash-prompt.
    • Reason for the failure was that the original hipify patches expected that they would be applied to files having Linux line ending. When hipify is executed on dos-prompt they wil however have a dos line ending and that caused the patched to file.
    • This is fixed with '--ignore-whitespace' flag for git am.
  • Update messages printed out during the build steps
  • Improve code checkout and hipify phases
  • Add separate/optional hipify_cmd to config files instead of using the config-command to execute the hipify because we need to do in the background do additional things during the hipify phase
    a. If hipify command is specified in config file
    - run hipify command to scan and convert files to hip/rocm format
    - commit code changes made by hipify command to own commit
    b. If patches/hipified directory exist
    - apply patches from that directory even if the hipify_cmd is not specified
  • Add optional pre_config and post_config command phases
  • Perform "git reset --hard" automatically before trying to execute the git checkout command.
    • This fixes git checkout failures in cases where the project directory contains modifications to files that have not been committed. This can happen for the files for example if the previously executed build command has modified them
  • Add error checks, error message printouts and process termination if the command execution failed
    • added support for commands having multiple lines
      • all commands executed can now contain from more than
        one line (earlier restricted for the build command)
  • Added support for specifying cmake config commands
    • If cmake config options are specified, we will execue cmake-config, build and install steps automatically in addition of executing other config/build/install commands that can also be present in the config-file. For example the amd-smi project build and install will rquire the usage of both the cmake specific configure, build and install steps and then the execution of 'pip install'-command
  • Added more projects to test the building

@ScottTodd
Copy link
Copy Markdown
Member

I'll get to reviewing this tomorrow. In the meantime, please:

  1. Use a descriptive pull request title, like "Update rockbuilder portability and project support"
  2. Format the pull request description using line breaks, complete sentences, bullet points, etc. as appropriate (https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax)
  3. Fix pre-commit formatting errors (please configure this to run locally using https://pre-commit.com/ before you push, to save time on reviews)

@lamikr lamikr changed the title Users/lamikr/wip/rockbuilder update Imprpove the rockbuilder logic and config options Jun 10, 2025
@lamikr lamikr changed the title Imprpove the rockbuilder logic and config options Imprpove the rockbuilder build logic and config options Jun 10, 2025
@lamikr lamikr force-pushed the users/lamikr/wip/rockbuilder_update branch 2 times, most recently from e474982 to 17b5fdc Compare June 10, 2025 18:15
@ScottTodd ScottTodd changed the title Imprpove the rockbuilder build logic and config options Improve the rockbuilder build logic and config options Jun 10, 2025
@lamikr lamikr force-pushed the users/lamikr/wip/rockbuilder_update branch 2 times, most recently from 0879905 to c310a68 Compare June 11, 2025 01:36
lamikr and others added 5 commits June 10, 2025 21:03
- Update the default rocm_home default seach path after the code
   was moved to another directory: ../../dist/build/rocm
 - ROCM_HOME env variable can be used to specify a different location
   for the rocm-sdk that will be used by the rockbuilder
- Use patches from externa-builds/pytorch/patches folder instead of forking it
- Support building pytorch, pytorch audio and vision also on windows
  without requiring  a bash prompt
- Add option to specify environment variables
  - Environment variables are set at the time when project phases are executed
    and then reset back to previous values or removed after the project has been build
    (env_common, env_linux, env_windows)
- git am fix for hipify run on windows
    - Hipified patches could fail to apply with command "git am"
      if the hipify command was executed on dos-prompt instead of using a bash-prompt.
    - Reason for the failure was that the original hipify patches expected
      that they would be applied to files having Linux line ending.
      When hipify is executed on dos-prompt they wil however have a dos line ending
      and that caused the patched to file.
    - This is fixed with '--ignore-whitespace' flag for git am.
- Update messages printed out during the build steps
- Improve code checkout and hipify phases
- Add separate/optional hipify_cmd to config files instead of using
  the config-command to execute the hipify because we need to do in the
  background do additional things during the hipify phase
      a. If hipify command is specified in config file
        - run hipify command to scan and convert files to hip/rocm format
        - commit code changes made by hipify command to own commit
      b. If patches/hipified directory exist
        - apply patches from that directory even if the hipify_cmd is not specified
- Add optional pre_config and post_config command phases
- Perform "git reset --hard" automatically before trying to execute
  the git checkout command.
    - This fixes git checkout failures in cases where the project directory
      contains modifications to files that have not been committed.
      This can happen for the files for example if the previously executed build
      command has modified them
- Add error checks, error message printouts and process termination
  if the command execution failed:
    - added support for commands having multiple lines
    - all commands executed can now contain from more than
      one line (earlier restricted for the build command)
- Added support for specifying cmake config commands:
    - If cmake config options are specified, we will execue cmake-config,
      build and install steps automatically in addition of executing other
      config/build/install commands that can also be present in the config-file.
      For example the amd-smi project build and install will rquire the
      usage of both the cmake specific configure, build and install steps and
      then the execution of 'pip install'-command
- Added more projects to test the project config and build options

Signed-off-by: Mika Laitio <lamikr@gmail.com>
- we will instead use patches directory under
  external-builds/pytorch/patches for now

Signed-off-by: Mika Laitio <lamikr@gmail.com>
Signed-off-by: Mika Laitio <lamikr@gmail.com>
- add for now a very simple examples to tests to
  verify that pytorch, pytorch vision and audio libraries
  can be loaded and they can printout version information

Signed-off-by: Mika Laitio <lamikr@gmail.com>
pytorch audio needs to be build by using the clang even on windows and
this requires some patches both to pytorch and pytorch audio
- pytorch patches
  - do not apply certain msvc specific compiler options on windows build
    if clang is used instead on windows
- pytorch audio
  - Allow using gnu-gcc or clang in windows pytorch audio builds
    by specifying CC or CXX env variables. Default behavior will
    default for using the MSVC compiler.
  - fix pytorch audio linking error with unknown symbols referring
    to methods available in utils.cpp

Signed-off-by: Mika Laitio <mika.laitio@amd.com>
@lamikr lamikr force-pushed the users/lamikr/wip/rockbuilder_update branch from c310a68 to 977e68b Compare June 11, 2025 04:04
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be hard to review in a single pull request. The patch file changes could be split off and reviewed separately, then the rockbuilder changes will need continued iteration. We also need a plan for aligning with https://github.com/ROCm/TheRock/blob/main/external-builds/pytorch/build_prod_wheels.py and other new scripts for packaging processes. I'd like to see a quick turnaround on small code reviews (multiple small patches a day, not one big pull request after over a week), so this work doesn't drift and go stale.

Comment thread experimental/rockbuilder/lib_python/repo_management.py
Comment thread experimental/rockbuilder/lib_python/repo_management.py Outdated
Comment thread experimental/rockbuilder/lib_python/repo_management.py Outdated
Comment thread experimental/rockbuilder/lib_python/repo_management.py Outdated
Comment thread experimental/rockbuilder/lib_python/project_builder.py Outdated
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
@lamikr lamikr force-pushed the users/lamikr/wip/rockbuilder_update branch from 977e68b to 00a10b5 Compare June 11, 2025 23:55
added the copy of wheel that is build
to separate packages/wheels directory under rockbuilder

Signed-off-by: Mika Laitio <mika.laitio@amd.com>
@lamikr lamikr force-pushed the users/lamikr/wip/rockbuilder_update branch from 00a10b5 to 0f9ded8 Compare June 11, 2025 23:56
@lamikr
Copy link
Copy Markdown
Contributor Author

lamikr commented Jun 12, 2025

@ScottTodd I pushed updated patch in, changes:

  1. I dropped the patches that will add new projects as discussed to be added later as a separate patches (sglang, comfyui)
  2. made code changes to error handling, etc. you suggested. (those changes are now in new git commit on top of the others to make review easier). I added also more comments to
    some locations in code you reviewed
  3. added the copy of wheel to packages/wheel directory under rockbuilder. (I tested that the errors are handled in case that there is for example no permission to copy the wheel to that location)

@ScottTodd
Copy link
Copy Markdown
Member

Code review notes:

  • Force pushing makes viewing diffs harder. Prefer to push new (possibly fixup) commits instead:
    image

    (Note that all commits are included there, instead of only the new changes)

    The "Compare" buttons sort of work, but those views aren't as integrated into the code review:
    image

  • I see you responded to some comments and resolve them, but many are still unresolved. You might need to expand them:
    image

    If using an editor like VSCode for the review, you can see comments inline. If using email for github notifications, all the comments are there too.

  • Please click "re-request review" when you want more feedback:
    image

    I have regularly have 10+ PR reviews in my queue at https://github.com/pulls/review-requested and any PRs that don't use that built in feature are likely to be missed.

lamikr added 2 commits June 12, 2025 16:01
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
@lamikr lamikr requested a review from ScottTodd June 12, 2025 23:10
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I was able to build PyTorch locally on Windows using this, with a few local changes: https://github.com/ROCm/TheRock/compare/users/lamikr/wip/rockbuilder_update...ScottTodd:rockbuilder-feedback-1?expand=1

These remaining comments are blocking and then I think we can proceed with a merge. Lots of cleanup still needed but one step at a time and this code is still in experimental.

Comment on lines +54 to +63
+ # Do not set cl-compiler option on windows if Clang or GNU compiler is used
+ if(NOT ${CMAKE_CXX_COMPILER_ID} MATCHES "Clang|GNU")
+ if (MSVC)
+ option(USE_MSVC_OPTIONS "Use msvc compiler options" ON)
+ endif()
+ endif()
# until they can be unified, keep these lists synced with setup.py
- if(MSVC)
-
+ if (USE_MSVC_OPTIONS)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop this patch to pytorch for now and revisit separately from the rockbuilder changes. This patch interferes with compilation on my Windows machine since it goes down the non-MSVC path with -Wall when what we're currently using is clang-cl (clang's "sorta compatible with MSVC" frontend). As a result, I get 5GB+ of warning logs like this:

      |                      ^
D:/projects/TheRock/experimental/rockbuilder/src_projects/pytorch/build/aten/src\ATen/ops/ormqr_ops.h(33,10): warning: 'constexpr' specifier is incompatible with C++98 [-Wc++98-compat]
   33 |   static constexpr const char* name = "aten::ormqr";
      |          ^
D:/projects/TheRock/experimental/rockbuilder/src_projects/pytorch/build/aten/src\ATen/ops/ormqr_ops.h(34,10): warning: 'constexpr' specifier is incompatible with C++98 [-Wc++98-compat]
   34 |   static constexpr const char* overload_name = "";
      |          ^

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - let's leave pytorch_audio on Windows support to a separate PR. I see other issues building pytorch_audio with this patch that we can resolve later.

First in the list is this, from running import torch without ROCM_HOME/bin on the PATH:

Traceback (most recent call last):
  File "D:\projects\TheRock\experimental\rockbuilder\src_projects\pytorch_vision\setup.py", line 12, in <module>
    import torch
  File "D:\projects\TheRock\experimental\rockbuilder\.venv\Lib\site-packages\torch\__init__.py", line 270, in <module>
    _load_dll_libraries()
  File "D:\projects\TheRock\experimental\rockbuilder\.venv\Lib\site-packages\torch\__init__.py", line 266, in _load_dll_libraries
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "D:\projects\TheRock\experimental\rockbuilder\.venv\Lib\site-packages\torch\lib\caffe2_nvrtc.dll" or one of its dependencies.

    Patch_dir:   D:/projects/TheRock/external-builds/pytorch/patches/pytorch
    Build_dir:   D:/projects/TheRock/experimental/rockbuilder/builddir/pytorch

Comment thread experimental/rockbuilder/projects/pytorch.cfg Outdated
Comment thread experimental/rockbuilder/lib_python/project_builder.py Outdated
Comment thread experimental/rockbuilder/lib_python/repo_management.py Outdated
Comment thread experimental/rockbuilder/README.md Outdated
lamikr added 6 commits June 13, 2025 17:22
_get_project_info_config_value method
  - change double underscore in method name to
    single underscrore because it's often used
    python convention to mark that method is
    intent to be private in class.
  - add self parameter as method is inside class
- fix scoping error in temp_env_list

Signed-off-by: Mika Laitio <mika.laitio@amd.com>
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
- replace double underscore with single underscrore
  as a convention to mark "private method" in class.

Signed-off-by: Mika Laitio <mika.laitio@amd.com>
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
pytorch patch works otherwise but causes log files
to increase due to "wall" flag set on windows builds.

Will study this more and do a separate improved PR related to this one.

This reverts commit 86395f7.
- we disable this now until pytorch vision
  and audio patches are worked on a little bit more
  and added later for windows build on separate commit

Signed-off-by: Mika Laitio <mika.laitio@amd.com>
@lamikr lamikr requested a review from ScottTodd June 14, 2025 00:32
Comment on lines +108 to +111
# 1) search the latest wheel file from certain directory
# 2) copy wheel to packages/wheel directory
# 3) install wheel to current python environment
def _handle_FIND_AND_HANDLE_LATEST_PYTHON_WHEEL_CMD(self, install_cmd):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical of this function for a few reasons, but I also want to land this PR soon so we can continue development without backing things up too much.

Function naming and purpose

The all caps sentinel command name feels like a hack that we shouldn't need. Should reevaluate the architecture there. Also, more of a finer detail, but functions should aim to do one thing only, and this does three.

Error handling and control flow

This function is deeply nested when it could return early. It tracks ret as the status when returning early or using exceptions could remove the need for that tracking, making the logic easier to follow and safer. Again this has a bare except clause (except:). Never do that, unless you also want to catch KeyboardInterrupt (highly unlikely in this code).

There may be a bug in there where ret is ignored and is overwritten between lines 130 and 143, and then ret is set back to False when it is already falsey on line 147. Changing the code patterns used will make that much safer.

Package installing

I'm not sure if we want to "install wheel to the current python environment" - we may want to separate the venv running the script from the venv used to build packages if there are complicated deps. If we do install, setting PIP_BREAK_SYSTEM_PACKAGES is dangerous. Why is that needed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: control flow, see https://testing.googleblog.com/2017/06/code-health-reduce-nesting-reduce.html (I literally had this printed out and on display above my desk for several years, it is very helpful in writing safe and easy to read code)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree on nesting: it doesn't matter what the author thinks, what matters is the readers/maintainers - and no reader/maintainer considers precondition checks that are not early-exit a good thing.

Comment on lines +167 to +187
# Handle first special API command or commands:
# - Special commands are keywords that will trigger the execution
# of internal python function.
# - Special commands needs to be the first commands executed
while (
(ret == True)
and (exec_cmd is not None)
and (
exec_cmd.startswith(
"ROCK_CONFIG_CMD__FIND_AND_INSTALL_LATEST_PYTHON_WHEEL"
)
)
):
line_arr = exec_cmd.splitlines(True)
special_cmd = line_arr[0]
# then concat rest of the lines for next command to be executed
if len(line_arr) > 1:
exec_cmd = "".join(line_arr[1:])
else:
exec_cmd = None
ret = self._handle_FIND_AND_HANDLE_LATEST_PYTHON_WHEEL_CMD(special_cmd)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More complicated, possibly error-prone logic down here and I'm not convinced that we need any "special API commands". At that point we might want to hand-code in Python instead of using the one-side-fits-all config file approach.

Comment on lines +190 to +200
is_multiline = self.is_multiline_text(exec_cmd)
if is_multiline:
is_windows = any(platform.win32_ver())
if is_windows:
# subprocess.run does not handle the execution of multiple
# commands together in same in dos-prompt based command shell
# and instead they would each need to be separated with &&
# which would cause each of them to be run on own shell.
# Therefore in case of multiple commands, we need to write them
# first to bat-file and then execute that bat-file.
os.makedirs(self.project_build_dir, exist_ok=True)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the use case for multiple commands and why this approach solves that well? I'd like to remove all this complexity if possible.

Comment on lines +419 to +423
print(
"Error, Invalid environment variable key-value pair in project: "
+ self.project_name
)
print("Key: " + key_value_str)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: logging can use python f-strings

Comment on lines +460 to +465
def do_clean(self, clean_cmd):
ret = True
# we want to return true for clean command even if the project has not been checked out yet
if self.project_src_dir.is_dir() == True:
ret = self._handle_command_exec("clean", clean_cmd, self.project_exec_dir)
return ret
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ret handling code is introducing complexity for no benefit. In this case we could just return True without tracking a variable that is ignored. I'm repeating myself many times here, but we should rework how control flow and error handling is done.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's just my style always to use same flow in each function no matter whether the function is simple or complex. I am simple man and like to reuse the same flow if possible :-)

  1. set default ret value
  2. do business logic and change ret value if needed
  3. return the ret

Another thing is that in some cases some functions would perhaps not need to return true/false at all. But I prefer always to return success indicator if possible. (sometimes put it in parameter at style (int *err) in c, if ret is reserved for some other purposes.

Reason for that is that sometimes the function may be simple in the beginning but grow to more complex during the time and at that point I do not be apple to check the caller side whether error code checking is added there. (In good luck the library has become so popular that I do not even know it's users)

Comment on lines +613 to 619
def do_cmake_build(self, cmake_config):
ret = True
if cmake_config:
cpu_count = os.cpu_count()
build_cmd = "make -j" + str(cpu_count)
ret = self._handle_command_exec("make", build_cmd, self.project_build_dir)
return ret
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use Ninja instead of make?

We also have other places where parallel building using the full CPU count is causing oversaturation on high core count machines (e.g. 96), so we should watch for that here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. I think we can actually change the configure command to be "-GNinja" and then use "cmake --build ." instead in future.

- added the CONFIG.md to document the
  config files and options available there
  to specify how the application is build
- renamed the rockbuilder build in command
  ROCK_CONFIG_CMD__FIND_AND_INSTALL_LATEST_PYTHON_WHEEL
  to
  RCB_CMD__FIND_AND_INSTALL_LATEST_PYTHON_WHEEL

Signed-off-by: Mika Laitio <mika.laitio@amd.com>
Copy link
Copy Markdown
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run pre-commit then we can land this.

Comment on lines +11 to +33
```mermaid
sequenceDiagram
participant RockBuilder
participant Core_App_List
RockBuilder->>Core_App_List: Read Configuration
Core_App_List-->>RockBuilder: Application List

loop For_each_Application
RockBuilder->>App_Specific_Config: Read Application Configuration
App_Specific_Config-->>RockBuilder: Application Specific Build Settings
RockBuilder->>RockBuilder: Check Whether to Continue or Skip building [in Linux/Windows]
RockBuilder->>RockBuilder: Set Env Variables
RockBuilder->>RockBuilder: Checkout Source Code
RockBuilder->>RockBuilder: Apply Base Patches [optional]
RockBuilder->>RockBuilder: Hipify_cmd [optional]
RockBuilder->>RockBuilder: Apply hipify patches [optional]
RockBuilder->>RockBuilder: Init_cmd [optional]
RockBuilder->>RockBuilder: Configure_cmd [optional]
RockBuilder->>RockBuilder: Build_cmd [optional]
RockBuilder->>RockBuilder: Install_cmd [optional]
RockBuilder->>RockBuilder: Unset Env Variables
end
```
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice diagram, thanks!

Comment on lines +21 to +24
- pytorch (projects/pytorch.cfg)
- pytorch vision (projects/pytorch_vision.cfg)
- pytorch audio (projects/pytorch_audio.cfg)
- torch migraphx (projects/torch_migraphx.cfg)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: These could use hyperlinks or some other markdown formatting to make this part of the docs a bit easier to navigate (particularly on mobile where switching to looking through files takes more taps)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

lamikr added 2 commits June 17, 2025 11:12
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
@lamikr lamikr merged commit 2449227 into main Jun 17, 2025
4 checks passed
@lamikr lamikr deleted the users/lamikr/wip/rockbuilder_update branch June 17, 2025 18:39
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Jun 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants