[Host irs] alias and preallocated output support #4144

Open: wants to merge 2 commits into base: main

@samnordmann (Collaborator) commented Mar 26, 2025

This PR belongs to a series of stacked PRs:

  1. => You are here: [Host irs] alias and preallocated output support #4144
  2. [Host Ir] refactor and cleanup lowering and segmentation #4145
  3. [Host ir] support for set reduce and binary op #4146
  4. [Host irs] Stream lowering of single device fusions #4147

What

  • Support for aliases in HostIrContainer. When a tensor tv1 is marked as an alias of tv0, tv0's concrete data/buffer is used for the op at runtime. This lets buffers allocated elsewhere be reused within the TensorView SSA paradigm. Chained aliasing (tv2-->tv1-->tv0) is supported.
  • Fix preallocated outputs in HostIrEvaluator.
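
To make the chained-alias behavior concrete, here is a minimal standalone sketch. It is not the PR's code: `ToyAliasContainer`, `resolve`, and the use of strings in place of TensorViews are assumptions for illustration. Only the `markAlias` name is borrowed from the PR's file summary, and its signature here is a guess.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a container that records aliases between values.
// Strings play the role of TensorViews; this is not nvFuser's actual API.
class ToyAliasContainer {
 public:
  // Record that `alias` should reuse the buffer of `original`.
  void markAlias(const std::string& original, const std::string& alias) {
    // Assumption: a value may be marked as an alias only once.
    if (alias_to_original_.count(alias) > 0) {
      throw std::invalid_argument(alias + " is already an alias");
    }
    // Reject self-aliases and chains that would loop back onto `alias`.
    if (resolve(original) == alias) {
      throw std::invalid_argument("alias cycle through " + alias);
    }
    alias_to_original_[alias] = original;
  }

  // Follow the chain until reaching a value that is not itself an alias.
  std::string resolve(const std::string& val) const {
    std::string current = val;
    auto it = alias_to_original_.find(current);
    while (it != alias_to_original_.end()) {
      current = it->second;
      it = alias_to_original_.find(current);
    }
    return current;
  }

 private:
  std::unordered_map<std::string, std::string> alias_to_original_;
};
```

With `markAlias("tv0", "tv1")` followed by `markAlias("tv1", "tv2")`, resolving tv2 walks tv2-->tv1-->tv0 and lands on tv0, the value whose buffer would actually be used.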

Why

This is necessary for stream parallelization, where we typically allocate the full output buffer and each stream writes to a slice of it.
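
The pattern can be sketched without any GPU code. The example below is only an illustration under assumptions: a `std::vector<float>` stands in for the preallocated output tensor, a loop index stands in for a stream, and `Slice`, `sliceForStream`, and `fillByStream` are hypothetical names that do not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A non-owning view into part of a larger buffer (hypothetical helper).
struct Slice {
  float* data;
  std::size_t size;
};

// Return the slice of `full` assigned to `stream_idx` out of `num_streams`.
// Assumption: the buffer length divides evenly among the streams.
Slice sliceForStream(
    std::vector<float>& full,
    std::size_t stream_idx,
    std::size_t num_streams) {
  assert(num_streams > 0 && full.size() % num_streams == 0);
  assert(stream_idx < num_streams);
  const std::size_t chunk = full.size() / num_streams;
  return Slice{full.data() + stream_idx * chunk, chunk};
}

// Fill the preallocated output by letting each "stream" write only its slice.
// The loop is sequential here; real streams would run concurrently.
void fillByStream(std::vector<float>& out, std::size_t num_streams) {
  for (std::size_t s = 0; s < num_streams; ++s) {
    Slice slice = sliceForStream(out, s, num_streams);
    for (std::size_t i = 0; i < slice.size; ++i) {
      slice.data[i] = static_cast<float>(s);
    }
  }
}
```

The point of the sketch is that the output is allocated once by the caller and never reallocated: every writer receives a view into the same storage, which is the role aliasing plays for the per-stream tensors.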

How

Aliases are stored in a map in the HostIrContainer.

At the HostIrEvaluator level, instead of operating directly on the ExprEvaluator to read/write concrete data, we first apply the alias indirection.
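
A rough sketch of that indirection follows. The method names `getAlias`, `isKnown`, `getKnownConcreteData`, and `bind` are taken from the PR's file summary, but their signatures and bodies here are assumptions; `ToyEvaluator`, the string keys, and the `Buffer` type are hypothetical stand-ins for the real evaluator, Vals, and tensors.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;

// Toy evaluator: every read and write goes through the alias map first.
class ToyEvaluator {
 public:
  // `aliases` maps each alias to the value it aliases (assumed acyclic).
  explicit ToyEvaluator(std::unordered_map<std::string, std::string> aliases)
      : aliases_(std::move(aliases)) {}

  // Store concrete data under the resolved (non-alias) value.
  void bind(const std::string& val, Buffer data) {
    known_[getAlias(val)] = std::move(data);
  }

  bool isKnown(const std::string& val) const {
    return known_.count(getAlias(val)) > 0;
  }

  // Look up concrete data via the resolved value; throw if it was never bound.
  const Buffer& getKnownConcreteData(const std::string& val) const {
    auto it = known_.find(getAlias(val));
    if (it == known_.end()) {
      throw std::runtime_error("no concrete data bound for " + val);
    }
    return it->second;
  }

 private:
  // Follow the alias chain; a value that is not an alias resolves to itself.
  std::string getAlias(const std::string& val) const {
    std::string current = val;
    auto it = aliases_.find(current);
    while (it != aliases_.end()) {
      current = it->second;
      it = aliases_.find(current);
    }
    return current;
  }

  std::unordered_map<std::string, std::string> aliases_;
  std::unordered_map<std::string, Buffer> known_;
};
```

Because both `bind` and the lookups resolve the alias before touching the table, data bound through tv2 is visible through tv1 and tv0 without any copy.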


Description

  • Added support for aliases in HostIrContainer

  • Enhanced HostIrEvaluator to handle preallocated outputs

  • Simplified and hardened allocation in for loop test


Changes walkthrough 📝

Relevant files

Enhancement

container.cpp: Added alias printing
csrc/host_ir/container.cpp (+5/-0)

  • Added printing of aliases in print method

executor.cpp: Enhanced output handling and allocation
csrc/host_ir/executor.cpp (+109/-107)

  • Removed unused helper functions getKnownTensorOrUndefined
  • Added checks for input aliases in HostIrEvaluator constructor
  • Refactored runWithInput to handle preallocated outputs
  • Updated handle methods to use getKnownConcreteData and getAlias
  • Added support for preallocated outputs in handle(PostOnStream*)
  • Simplified and hardened allocation in handle(ForLoop*)

container.h: Added alias support
csrc/host_ir/container.h (+12/-0)

  • Added markAlias and alias methods

executor.h: Added alias handling methods
csrc/host_ir/executor.h (+32/-1)

  • Added methods for alias handling (getAlias, isKnown, getKnownConcreteData, getKnownTensorOrUndefined, bind, invalidate)

Tests

test_host_irs.cpp: Added tests for alias and preallocated outputs
tests/cpp/test_host_irs.cpp (+110/-4)

  • Added test for preallocated outputs
  • Added test for alias setting and getting
  • Added test to throw on input alias

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Possible Issue

The getKnownConcreteData method does not handle the case where getAlias(val) returns a nullptr. This could lead to a segmentation fault if val is not found in the alias map.

      in_tensor,
      out_tensor);
  if (work != nullptr) {
    work->wait();
  }
}

// Evaluate outputs that are marked as Evaluate

Code Duplication

The getKnownTensorOrUndefined method is very similar to getKnownConcreteData. Consider refactoring to avoid code duplication.

auto out = host_ir_container_->outputs()[out_idx];
auto alias_info = host_ir_container_->getOutputAlias(out);
if (alias_info.type == AllocationType::Evaluate) {
  NVF_ERROR(

Error Handling

The getKnownTensorOrUndefined method returns an empty tensor if the value is not known. This might not be the best way to handle this case, as it could lead to silent errors. Consider throwing an exception or handling it differently.

auto out = host_ir_container_->outputs()[out_idx];
auto alias_info = host_ir_container_->getOutputAlias(out);
if (alias_info.type == AllocationType::Evaluate) {
  NVF_ERROR(
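
The trade-off raised in the last observation can be illustrated in isolation. The sketch below is generic C++ and not the PR's code: `Table`, `findOrNull`, and `findOrThrow` are hypothetical names, and an `int` stands in for a tensor.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

using Table = std::unordered_map<std::string, int>;

// Returns nullptr when `key` is absent; callers must check before use.
const int* findOrNull(const Table& table, const std::string& key) {
  auto it = table.find(key);
  return it == table.end() ? nullptr : &it->second;
}

// Fails loudly instead of handing back an empty or null value.
const int& findOrThrow(const Table& table, const std::string& key) {
  const int* found = findOrNull(table, key);
  if (found == nullptr) {
    throw std::out_of_range("unknown value: " + key);
  }
  return *found;
}
```

A null-returning lookup is convenient when "not known yet" is an expected state, while the throwing variant surfaces a missing binding at the call site rather than later as a silent empty result.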

@samnordmann (Collaborator, Author) commented

!test
