Create nvfuser_common package in build system #4323

Merged
xwang233 merged 8 commits into main from next_pt0
May 6, 2025
rdspring1 commented Apr 26, 2025

This PR moves common utilities from the nvfuser package into a new nvfuser_common package so that nvfuser_next can use them.

CI updates for jit_examples_sinh_libtorch_20

Change from

import nvfuser.utils;
import torch.utils;
print(nvfuser.utils.cmake_prefix_path, torch.utils.cmake_prefix_path, sep=";")

to

import nvfuser_common.utils;
import torch.utils;
print(nvfuser_common.utils.cmake_prefix_path, torch.utils.cmake_prefix_path, sep=";")
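
The one-liner above prints the two `cmake_prefix_path` values joined by `;`, which is CMake's list separator. A minimal sketch of the same idea as a small script is shown below. The helper name `joined_cmake_prefix_path` is hypothetical, and skipping modules that are not installed is an assumption made here for local diagnosis (the CI one-liner fails loudly instead).

```python
import importlib


def joined_cmake_prefix_path(module_names):
    """Join the `cmake_prefix_path` attribute of each importable module with ';'.

    Hypothetical helper: modules that cannot be imported are skipped rather
    than raising, which differs from the CI one-liner.
    """
    paths = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Module is not installed in this environment; skip it.
            continue
        path = getattr(module, "cmake_prefix_path", None)
        if path:
            paths.append(str(path))
    return ";".join(paths)


if __name__ == "__main__":
    # Module names taken from the updated CI command in this PR.
    print(joined_cmake_prefix_path(["nvfuser_common.utils", "torch.utils"]))
```

An empty result means neither package was importable, which is a quick hint that the environment still exposes only the old `nvfuser.utils` location.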

rdspring1 added the build and Direct Bindings (Python extension with direct mapping to NvFuser CPP objects) labels Apr 26, 2025
rdspring1 changed the title from "Create nvfuser_common package" to "Create nvfuser_common package in build system" Apr 26, 2025
github-actions bot commented Apr 26, 2025

Review updated until commit a59964c

Description

  • Created nvfuser_common package for shared utilities.

  • Updated import paths in various files.

  • Modified CMakeLists.txt to reflect new directory structure.


Changes walkthrough 📝

Relevant files

Enhancement

examples/sinh_extension/setup.py (+1/-1): Update import path in setup.py
  • Updated import path for nvfuser to nvfuser_common.

python/nvfuser_common/utils.py (+1/-1): Update cmake_prefix_path in utils.py
  • Updated cmake_prefix_path to reflect new package name.

python/utils.py (+8/-4): Update paths and package data in utils.py
  • Updated libnvfuser_path and install_prefix to use nvfuser_common.
  • Renamed nvfuser_package_data to nvfuser_common_package_data.
  • Updated package_data dictionary key to nvfuser_common.

CMakeLists.txt (+12/-11): Update CMakeLists.txt for new directory structure
  • Added NVFUSER_PYTHON_BINDINGS variable.
  • Updated paths for Python frontend source files.

examples/sinh_libtorch/README.md (+1/-1): Update import path in README.md
  • Updated import path for nvfuser to nvfuser_common.

Documentation

python/nvfuser_common/__init__.py (+3/-0): Add license header to __init__.py
  • Added SPDX license header.

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review

Import Path

Ensure that the new import path nvfuser_common is correctly set up and accessible in all environments where this setup script is used.

    nvfuser_spec = importlib.util.find_spec("nvfuser_common")
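
One way to act on the Import Path observation is to probe for the package before relying on it. The sketch below uses the standard-library `importlib.util.find_spec`, as the quoted line does. The helper `locate_package` and the printed messages are hypothetical and are not taken from the PR's setup script.

```python
import importlib.util


def locate_package(name):
    """Return the first install directory of a top-level package, or None.

    `find_spec` returns None when a top-level package cannot be found, so this
    check does not execute the package's own import-time code.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    location = locate_package("nvfuser_common")
    if location is None:
        print("nvfuser_common is not importable in this environment")
    else:
        print(f"nvfuser_common found at {location}")
```

Running this in each target environment (for example a CI container and a local virtualenv) gives a quick yes/no answer before a longer build is started.
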
Package Data

Verify that the package data paths are correctly updated to reflect the new nvfuser_common directory structure.

    if config.build_setup:
        cmake(config, relative_path)
    if not config.cmake_only:
        # NOTE: package include files for cmake
        # TODO(crcrpar): Better avoid hardcoding `libnvfuser_codegen.so`
        # might can be treated by using `exclude_package_data`.
        nvfuser_common_package_data = [
File Paths

Confirm that the updated file paths in CMakeLists.txt correctly point to the new nvfuser_common directory and that no files are missing.

    set(NVFUSER_PYTHON_BINDINGS "${NVFUSER_ROOT}/python/python_frontend")
    set(NVFUSER_THIRD_PARTY_DIR "${NVFUSER_ROOT}/third_party")
    
    option(NVFUSER_STANDALONE_BUILD_WITH_UCC "" OFF)
    option(NVFUSER_EXPLICIT_ERROR_CHECK "" OFF)
    
    if(NVFUSER_EXPLICIT_ERROR_CHECK)
      add_compile_definitions(NVFUSER_EXPLICIT_ERROR_CHECK)
    endif()
    
    option(NVFUSER_BUILD_WITH_ASAN "Build nvFuser with asan" OFF)
    
    include(CMakeDependentOption)
    cmake_dependent_option(NVFUSER_DISTRIBUTED "" ON "USE_DISTRIBUTED" OFF)
    
    if(NVFUSER_DISTRIBUTED)
      add_compile_definitions(NVFUSER_DISTRIBUTED)
    endif()
    
    message(STATUS "Setting NVFUSER_DISTRIBUTED=${NVFUSER_DISTRIBUTED}")
    
    # We try to update which C++ standard we use together in lockstep across all
    # built libraries, and these variables control which that is. Generally we are
    # on C++20, but we still support a version of CUDA (11) that does not recognize
    # C++20 and so we drop back to 17 there. Also, we allow all of these to be
    # overridden by the user.
    # Note we do not use a global set_property on e.g. CXX_STANDARD. CMake globals
    # are footguns that should generally be avoided, because they are difficult to
    # target where and *only* where they are needed. See e.g.:
    # https://cliutils.gitlab.io/modern-cmake/chapters/intro/dodonot.html
    set(NVFUSER_C_STANDARD 20 CACHE STRING "C standard to use for C code")
    set(NVFUSER_CPP_STANDARD 20 CACHE STRING "C++ standard to use for C++ code")
    set(NVFUSER_CUDA_STANDARD 17 CACHE STRING "C++ standard to use for CUDA code")
    
    if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
      # TODO: gcc 11.4 has been end of life according to https://gcc.gnu.org/
      # I believe we should bump up the version below to 12.x.
      # However, because gcc 11.4 is well tested and stable, let's defer this
      # rejection until the day that we find a bug in gcc 11.4.
      if(CMAKE_CXX_COMPILER_VERSION VERSION_LESS 11.4)
        message(FATAL_ERROR "GCC < 11.4 has compiler bugs and can not compile nvFuser.")
      endif()
    endif()
    
    string(APPEND CMAKE_CXX_FLAGS " -Wno-psabi")
    
    find_package(Torch REQUIRED)
    find_package(Python REQUIRED Development.Module Interpreter)
    find_package(pybind11 REQUIRED)
    
    # need this since the pytorch execution uses a different name
    set(PYTHON_EXECUTABLE ${Python_EXECUTABLE})
    
    # CXX flags is necessary since https://github.com/pytorch/pytorch/issues/98093
    string(APPEND CMAKE_CXX_FLAGS " ${TORCH_CXX_FLAGS}")
    include(cmake/FlatBuffers.cmake)
    include(cmake/Dependencies.cmake)
    
    # set CUDA_ARCH for cu tests.
    if(TORCH_CUDA_ARCH_LIST)
      set(ARCH_FLAGS)
      cuda_select_nvcc_arch_flags(ARCH_FLAGS ${TORCH_CUDA_ARCH_LIST})
      list(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
    endif()
    
    add_subdirectory(${CMAKE_CURRENT_LIST_DIR}/lib/dynamic_type)
    
    # TODO: fix MSVC
    if(NOT MSVC)
      find_library(LIBCUPTI libcupti.so PATHS ${CUDA_TOOLKIT_ROOT_DIR}/extras/CUPTI/lib64/ ${CUDA_TOOLKIT_ROOT_DIR}/lib64/)
    endif()
    
    # ------------------------------
    # build nvfuser_codegen library
    # ------------------------------
    
    # nvfuser codegen sources
    set(NVFUSER_SRCS)
    list(APPEND NVFUSER_SRCS
      ${NVFUSER_SRCS_DIR}/alias_analysis.cpp
      ${NVFUSER_SRCS_DIR}/codegen.cpp
      ${NVFUSER_SRCS_DIR}/compute_at.cpp
      ${NVFUSER_SRCS_DIR}/compute_at_map.cpp
      ${NVFUSER_SRCS_DIR}/contiguity.cpp
      ${NVFUSER_SRCS_DIR}/debug.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/bank_conflict.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/circular_buffer.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/device_version.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/divisible_split.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/fused_reduction.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/index_compute.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/predicate_elimination.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/sync_information.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/tensor_memory.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/tensor_producer_aliases.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/thread_predicate.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/tma.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/analysis/trivial_broadcast.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/lower2device.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/alias_memory.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/allocation.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/circular_buffer.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/expr_sort.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/fusion_simplifier.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/grid_serialization.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/index.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/inline_ptx.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/insert_syncs.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/instrument.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/loop_rotation.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/loops.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/magic_zero.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/predicate.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/replace_size.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/rng.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/scalar_hoist.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/unroll.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/vectorize_welford.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/pass/warp_reduce.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/utils.cpp
      ${NVFUSER_SRCS_DIR}/device_lower/validation.cpp
      ${NVFUSER_SRCS_DIR}/dispatch.cpp
      ${NVFUSER_SRCS_DIR}/driver_api.cpp
      ${NVFUSER_SRCS_DIR}/dynamic_transform.cpp
      ${NVFUSER_SRCS_DIR}/evaluator_common.cpp
      ${NVFUSER_SRCS_DIR}/exceptions.cpp
      ${NVFUSER_SRCS_DIR}/expr_evaluator.cpp
      ${NVFUSER_SRCS_DIR}/expr_simplifier.cpp
      ${NVFUSER_SRCS_DIR}/fusion.cpp
      ${NVFUSER_SRCS_DIR}/fusion_guard.cpp
      ${NVFUSER_SRCS_DIR}/fusion_segmenter.cpp
      ${NVFUSER_SRCS_DIR}/global_allocator.cpp
      ${NVFUSER_SRCS_DIR}/grouped_reduction.cpp
      ${NVFUSER_SRCS_DIR}/host_ir/container.cpp
      ${NVFUSER_SRCS_DIR}/host_ir/executor.cpp
      ${NVFUSER_SRCS_DIR}/host_ir/host_ir.cpp
      ${NVFUSER_SRCS_DIR}/host_ir/lower.cpp
      ${NVFUSER_SRCS_DIR}/id_model/circular_buffer_indexing.cpp
      ${NVFUSER_SRCS_DIR}/id_model/contiguity.cpp
      ${NVFUSER_SRCS_DIR}/id_model/id_model.cpp
      ${NVFUSER_SRCS_DIR}/id_model/id_model_index_compute.cpp
      ${NVFUSER_SRCS_DIR}/id_model/indexing.cpp
      ${NVFUSER_SRCS_DIR}/id_model/indexing_traversal.cpp
      ${NVFUSER_SRCS_DIR}/id_model/loop_promotion.cpp
      ${NVFUSER_SRCS_DIR}/id_model/predicate_indexing.cpp
      ${NVFUSER_SRCS_DIR}/id_model/schedule.cpp
      ${NVFUSER_SRCS_DIR}/id_model/to_string.cpp
      ${NVFUSER_SRCS_DIR}/id_model/transform_replay.cpp
      ${NVFUSER_SRCS_DIR}/id_model/validation_utils.cpp
      ${NVFUSER_SRCS_DIR}/index_compute.cpp
      ${NVFUSER_SRCS_DIR}/instrumentation.cpp
      ${NVFUSER_SRCS_DIR}/interval_analysis.cpp
      ${NVFUSER_SRCS_DIR}/ir/base_nodes.cpp
      ${NVFUSER_SRCS_DIR}/ir/builder.cpp
      ${NVFUSER_SRCS_DIR}/ir/cloner.cpp
      ${NVFUSER_SRCS_DIR}/ir/container.cpp
      ${NVFUSER_SRCS_DIR}/ir/graphviz.cpp
      ${NVFUSER_SRCS_DIR}/ir/iostream.cpp
      ${NVFUSER_SRCS_DIR}/ir/nodes.cpp
      ${NVFUSER_SRCS_DIR}/ir/utils.cpp
      ${NVFUSER_SRCS_DIR}/iter_visitor.cpp
      ${NVFUSER_SRCS_DIR}/kernel.cpp
      ${NVFUSER_SRCS_DIR}/kernel_db/kernel_db.cpp
      ${NVFUSER_SRCS_DIR}/kernel_db/utils.cpp
      ${NVFUSER_SRCS_DIR}/kernel_ir.cpp
      ${NVFUSER_SRCS_DIR}/kernel_ir_dispatch.cpp
      ${NVFUSER_SRCS_DIR}/logical_domain_map.cpp
      ${NVFUSER_SRCS_DIR}/mma_type.cpp
      ${NVFUSER_SRCS_DIR}/multidevice/communication.cpp
      ${NVFUSER_SRCS_DIR}/multidevice/communicator.cpp
      ${NVFUSER_SRCS_DIR}/multidevice/cuda_p2p.cpp
      ${NVFUSER_SRCS_DIR}/multidevice/ipc_handle.cpp
      ${NVFUSER_SRCS_DIR}/multidevice/device_mesh.cpp
      ${NVFUSER_SRCS_DIR}/multidevice/executor.cpp
      ${NVFUSER_SRCS_DIR}/multidevice/utils.cpp
      ${NVFUSER_SRCS_DIR}/mutator.cpp
      ${NVFUSER_SRCS_DIR}/non_divisible_split.cpp
      ${NVFUSER_SRCS_DIR}/ops/alias.cpp
      ${NVFUSER_SRCS_DIR}/ops/arith.cpp
      ${NVFUSER_SRCS_DIR}/ops/composite.cpp
      ${NVFUSER_SRCS_DIR}/ops/indexing.cpp
      ${NVFUSER_SRCS_DIR}/ops/normalization.cpp
      ${NVFUSER_SRCS_DIR}/ops/utils.cpp
      ${NVFUSER_SRCS_DIR}/options.cpp
      ${NVFUSER_SRCS_DIR}/parallel_dimension_map.cpp
      ${NVFUSER_SRCS_DIR}/parallel_type_bitmap.cpp
      ${NVFUSER_SRCS_DIR}/polymorphic_value.cpp
      ${NVFUSER_SRCS_DIR}/predicate_compute.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/add_axioms.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/allocation_order_inference.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/consecutive_cast.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/exact_mapped_extent_substitution.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/insert_reshardings.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/make_resharding_contiguous.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/mark_aliases_prepare.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/move_pad.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/move_repeat_forward.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/move_split_cat.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/pre_segmenter.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/propagate_shardings.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/remove_bcast_squeeze.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/remove_empty.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/reorder_sharded_axis.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/segment_inplace_update.cpp
      ${NVFUSER_SRCS_DIR}/host_ir/pass/stream_parallel_type.cpp
      ${NVFUSER_SRCS_DIR}/host_ir/pass/insert_deallocations.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/translate_no_reduction_matmul_to_mul_squeeze.cpp
      ${NVFUSER_SRCS_DIR}/preseg_passes/translate_repeat_to_expand.cpp
      ${NVFUSER_SRCS_DIR}/rng.cpp
      ${NVFUSER_SRCS_DIR}/runtime/allocations.cpp
      ${NVFUSER_SRCS_DIR}/runtime/compiled_kernel.cpp
      ${NVFUSER_SRCS_DIR}/runtime/executor.cpp
      ${NVFUSER_SRCS_DIR}/runtime/executor_dispatch.cpp
      ${NVFUSER_SRCS_DIR}/runtime/executor_kernel_arg.cpp
      ${NVFUSER_SRCS_DIR}/runtime/executor_params.cpp
      ${NVFUSER_SRCS_DIR}/runtime/executor_utils.cpp
      ${NVFUSER_SRCS_DIR}/runtime/fusion_cache_utils.cpp
      ${NVFUSER_SRCS_DIR}/runtime/fusion_executor_cache.cpp
      ${NVFUSER_SRCS_DIR}/runtime/fusion_kernel_runtime.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/cache_policy_refiner.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/heuristic.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/mark_aliases.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/matmul.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/multi_matmul.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/ampere_multi_matmul.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/hopper_multi_matmul.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/matmul_heuristic_plugin.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/matmul_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/mma_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/no_op.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/communication.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/normalization_inner.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/normalization_inner_outer.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/normalization_inner_outer_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/normalization_inner_outer_tma_ws.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/normalization_inner_outer_multi_wave.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/normalization_outer.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/normalization_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/pointwise.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/pointwise_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/reduction.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/reduction_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/registry.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/registry_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/resize.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/runtime_info.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/scheduler_types.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/tools/domain_map.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/tools/inlining.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/tools/loop_domain_scheduler.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/tools/maxinfo_propagator.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/tools/resize_utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/tools/static_repeat.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/transpose.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/utils.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/vectorize_helper.cpp
      ${NVFUSER_SRCS_DIR}/scheduler/expr_eval_sched.cpp
      ${NVFUSER_SRCS_DIR}/serde/polymorphic_value.cpp
      ${NVFUSER_SRCS_DIR}/serde/utils.cpp
      ${NVFUSER_SRCS_DIR}/statement_guard.cpp
      ${NVFUSER_SRCS_DIR}/swizzle.cpp
      ${NVFUSER_SRCS_DIR}/sys_utils.cpp
      ${NVFUSER_SRCS_DIR}/tensor_metadata.cpp
      ${NVFUSER_SRCS_DIR}/tensor_view.cpp
      ${NVFUSER_SRCS_DIR}/tma.cpp
      ${NVFUSER_SRCS_DIR}/transform_iter.cpp
      ${NVFUSER_SRCS_DIR}/transform_replay.cpp
      ${NVFUSER_SRCS_DIR}/transform_rfactor.cpp
      ${NVFUSER_SRCS_DIR}/transform_view.cpp
      ${NVFUSER_SRCS_DIR}/type.cpp
      ${NVFUSER_SRCS_DIR}/type_promotion.cpp
      ${NVFUSER_SRCS_DIR}/utils.cpp
      ${NVFUSER_SRCS_DIR}/val_graph.cpp
      ${NVFUSER_SRCS_DIR}/val_graph_visitor.cpp
      ${NVFUSER_SRCS_DIR}/validator_utils.cpp
    )
    
    # We don't link CUPTI for MSVC
    if(NOT MSVC)
      list(APPEND NVFUSER_SRCS
        ${NVFUSER_SRCS_DIR}/fusion_profiler.cpp
      )
    endif()
    
    if(BUILD_PYTHON)
      list(APPEND NVFUSER_SRCS
        ${NVFUSER_PYTHON_BINDINGS}/distributed_tensor.cpp
        ${NVFUSER_PYTHON_BINDINGS}/fusion_cache.cpp
        ${NVFUSER_PYTHON_BINDINGS}/fusion_definition.cpp
        ${NVFUSER_PYTHON_BINDINGS}/fusion_state.cpp
        ${NVFUSER_PYTHON_BINDINGS}/segmentation.cpp
        ${NVFUSER_PYTHON_BINDINGS}/translation.cpp
        ${NVFUSER_PYTHON_BINDINGS}/translation_utils.cpp
        ${NVFUSER_SRCS_DIR}/serde/fusion_record.cpp
      )
    endif()
    
    # We create both static and shared libraries.
    #
    # Shared libraries are what ships, but a large advantage of static libraries is
    # that symbols are all visible. This allows us to test internal components
    # inside our test or benchmark binaries, even if we do not want said components
    # to be visible to the outside. If we used only shared libraries, then any API
    # we invoked from test binaries would need to be marked as public, even if we
    # did not want to expose it to users.
    #
    # Note technically we create an "OBJECT" library instead of a "STATIC" library.
    # This is just a CMake quirk; an OBJECT library is a better way to implement a
    # "private" (not installed) static library.
    add_library(codegen_internal OBJECT ${NVFUSER_SRCS})
    
    if(NOT MSVC)
      if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU")
        target_compile_options(codegen_internal PRIVATE
          -Wall -Wno-unused-function -Werror
    
          # These warnings are not treated as errors because of gcc 12.2 used in
          # manylinux image. consider enable this when we upgrade.
          # linking comment:
          # https://github.com/NVIDIA/Fuser/pull/3001#discussion_r1772551266
          -Wno-error=restrict -Wno-error=stringop-overflow -Wno-error=maybe-uninitialized)
      else()
        target_compile_options(codegen_internal PRIVATE
          -Wall -Wno-unused-function -Werror)
      endif()
    endif()
    
    target_compile_definitions(codegen_internal PRIVATE "-DTORCH_CUDA_BUILD_MAIN_LIB")
    target_include_directories(codegen_internal PUBLIC ${NVFUSER_PYTHON_DIR})
    target_include_directories(codegen_internal SYSTEM PUBLIC
      ${CMAKE_SOURCE_DIR}/third_party/flatbuffers/include
      PRIVATE
      ${CUDA_TOOLKIT_ROOT_DIR}/extras/CUPTI/include
      ${CUDA_INCLUDE_DIRS}
    )
    target_include_directories(codegen_internal PUBLIC
      "$<BUILD_INTERFACE:${NVFUSER_SRCS_DIR}>"
      "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}/nvfuser>"
    )
    set_target_properties(codegen_internal PROPERTIES
      C_STANDARD ${NVFUSER_C_STANDARD}
      CUDA_STANDARD ${NVFUSER_CUDA_STANDARD}
      CXX_STANDARD ${NVFUSER_CPP_STANDARD}
      CXX_STANDARD_REQUIRED ON
      CXX_VISIBILITY_PRESET hidden
    
      # this is to find pip installed nvrtc.so
      INSTALL_RPATH
      "$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../torch/lib"
      POSITION_INDEPENDENT_CODE Yes
      VISIBILITY_INLINES_HIDDEN Yes
    )
    
    # Ensure we don't link against libcuda; we'll dlopen it ourselves.
    list(FILTER TORCH_LIBRARIES EXCLUDE REGEX "libcuda\.so")
    target_link_libraries(codegen_internal PUBLIC
      dynamic_type
      ${LIBCUPTI}
      ${TORCH_LIBRARIES}
      dl
    )
    
    add_library(nvfuser_codegen SHARED $<TARGET_OBJECTS:codegen_internal>)
    
    if(NVFUSER_BUILD_WITH_ASAN)
      target_compile_options(codegen_internal PRIVATE -fsanitize=address)
      target_link_options(codegen_internal PUBLIC -fsanitize=address)
      target_link_options(nvfuser_codegen PUBLIC -fsanitize=address)
    endif()
    
    target_include_directories(nvfuser_codegen PUBLIC
      "$<BUILD_INTERFACE:${NVFUSER_SRCS_DIR}>"
      "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}/nvfuser>"
    )
    target_link_libraries(nvfuser_codegen PRIVATE
      flatbuffers
      ${CUDA_NVRTC_LIB}
      ${LIBCUPTI}
      ${TORCH_LIBRARIES}
      dl
    )
    set_target_properties(nvfuser_codegen PROPERTIES
      C_STANDARD ${NVFUSER_C_STANDARD}
      CUDA_STANDARD ${NVFUSER_CUDA_STANDARD}
      CXX_STANDARD ${NVFUSER_CPP_STANDARD}
      CXX_STANDARD_REQUIRED ON
      CXX_VISIBILITY_PRESET hidden
      INSTALL_RPATH
      "$ORIGIN/../../nvidia/cuda_runtime/lib:$ORIGIN/../../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../../torch/lib"
      POSITION_INDEPENDENT_CODE Yes
      VISIBILITY_INLINES_HIDDEN Yes
    )
    install(TARGETS nvfuser_codegen EXPORT NvfuserTargets DESTINATION lib)
    
    # We are keeping fusion_cache_generated.h for the submodule build because flatc is unavailable.
    add_custom_command(
      OUTPUT
      ${NVFUSER_ROOT}/csrc/serde/fusion_cache_generated.h
      DEPENDS
      ${NVFUSER_ROOT}/csrc/serde/fusion_cache.fbs
      DEPENDS flatc
      COMMAND ${CMAKE_CURRENT_BINARY_DIR}/third_party/flatbuffers/flatc --scoped-enums -o ${NVFUSER_ROOT}/csrc/serde/ -c -b ${NVFUSER_ROOT}/csrc/serde/fusion_cache.fbs
      COMMENT "Generating fusion_cache_generated header from fusion_cache.fbs"
      VERBATIM
    )
    add_custom_target(build_flatbuffer_config ALL
      DEPENDS ${NVFUSER_ROOT}/csrc/serde/fusion_cache_generated.h)
    
    if(NVFUSER_STANDALONE_BUILD_WITH_UCC)
      # User may need to set env vars UCC_DIR, UCX_DIR, UCC_HOME, UCX_HOME for CMake's Find_UCC to work.
      find_package(UCC REQUIRED)
      find_package(UCX REQUIRED)
    
      add_library(__nvfuser_ucc INTERFACE)
      set_target_properties(__nvfuser_ucc PROPERTIES
        C_STANDARD ${NVFUSER_C_STANDARD}
        CUDA_STANDARD ${NVFUSER_CUDA_STANDARD}
        CXX_STANDARD ${NVFUSER_CPP_STANDARD}
        CXX_STANDARD_REQUIRED ON
        CXX_VISIBILITY_PRESET hidden
        POSITION_INDEPENDENT_CODE Yes
        VISIBILITY_INLINES_HIDDEN Yes
      )
      target_link_libraries(__nvfuser_ucc INTERFACE ucx::ucs ucx::ucp ucc::ucc)
      target_include_directories(__nvfuser_ucc INTERFACE ${UCC_INCLUDE_DIRS})
      target_link_libraries(codegen_internal PRIVATE __nvfuser_ucc)
      target_compile_definitions(codegen_internal PRIVATE NVFUSER_BUILD_WITH_UCC)
    endif()
    
    add_dependencies(codegen_internal flatc build_flatbuffer_config)
    
    # installing nvfuser headers
    install(DIRECTORY "${NVFUSER_SRCS_DIR}/"
      DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/nvfuser"
      FILES_MATCHING
      PATTERN "*.h"
      PATTERN "csrc/C++20/type_traits"
      PATTERN "csrc/struct.inl")
    
    # TODO guard including flatbuffers headers
    # installing flatbuffers headers
    install(DIRECTORY "${NVFUSER_THIRD_PARTY_DIR}/flatbuffers/include/flatbuffers/"
      DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/nvfuser/flatbuffers")
    
    # installing dynamic_type headers
    install(DIRECTORY "${NVFUSER_ROOT}/lib/dynamic_type/src/dynamic_type"
      DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/nvfuser")
    
    # -----------------------------
    # build nvfuser python library
    # -----------------------------
    if(BUILD_PYTHON)
      # nvfuser python API sources
      set(NVFUSER_PYTHON_SRCS)
      list(APPEND NVFUSER_PYTHON_SRCS
        ${NVFUSER_PYTHON_BINDINGS}/multidevice_bindings.cpp
        ${NVFUSER_PYTHON_BINDINGS}/python_bindings.cpp
        ${NVFUSER_PYTHON_BINDINGS}/python_bindings_extension.cpp
        ${NVFUSER_PYTHON_BINDINGS}/schedule_bindings.cpp
      )
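
The File Paths observation (that every listed source still exists after the directory move) can be checked mechanically. The following is a minimal sketch, assuming the CMake text is available as a string and that the caller supplies the directory each CMake variable resolves to. `missing_sources` and its regular expression are hypothetical helpers, not part of nvFuser's build scripts, and only `.cpp` entries of the form `${VAR}/path.cpp` are considered.

```python
import re
from pathlib import Path

# Matches entries such as `${NVFUSER_PYTHON_BINDINGS}/fusion_cache.cpp`.
_CPP_ENTRY = re.compile(r"\$\{(\w+)\}/([\w./-]+\.cpp)")


def missing_sources(cmake_text, variables):
    """Return the referenced .cpp files that do not exist on disk.

    `variables` maps a CMake variable name to the directory it stands for.
    Entries whose variable is not in the mapping are ignored.
    """
    missing = []
    for var, rel_path in _CPP_ENTRY.findall(cmake_text):
        root = variables.get(var)
        if root is None:
            continue  # caller did not say where this variable points
        candidate = Path(root) / rel_path
        if not candidate.is_file():
            missing.append(str(candidate))
    return missing
```

A caller would read `CMakeLists.txt`, pass a mapping for `NVFUSER_PYTHON_BINDINGS` and `NVFUSER_SRCS_DIR` based on a local checkout, and treat a non-empty result as a sign that a path was not updated.
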
    

rdspring1 commented Apr 26, 2025

@xwang233 For jit_examples_sinh_libtorch_20, the test should be changed:

Current:

import nvfuser.utils; import torch.utils; print(nvfuser.utils.cmake_prefix_path, torch.utils.cmake_prefix_path, sep=";")

New:

import nvfuser_common.utils; import torch.utils; print(nvfuser_common.utils.cmake_prefix_path, torch.utils.cmake_prefix_path, sep=";")

xwang233 commented

I updated the test script for jit_examples_sinh_libtorch on the --dev CI branch, re: #4323 (comment)

rdspring1 force-pushed the next_pt0 branch 2 times, most recently from 4aaade9 to 23b5223 on May 1, 2025 23:51
rdspring1 commented

!test --dev

jjsjann123 left a comment

Sorry, this fell off my radar earlier. LGTM

rdspring1 commented

!test --dev

xwang233 left a comment

Thanks for working on this!

xwang233 merged commit 29d89f0 into main May 6, 2025
53 checks passed
xwang233 deleted the next_pt0 branch May 6, 2025 18:12