
#16353: skip no volume tensors #16619


Merged: 2 commits merged on Jan 13, 2025

6 changes: 6 additions & 0 deletions tests/ttnn/unit_tests/operations/eltwise/test_unary.py
@@ -194,6 +194,12 @@ def test_sin(device, h, w):
    run_unary_test(device, h, w, ttnn.sin)


@pytest.mark.parametrize("h", [0])
@pytest.mark.parametrize("w", [1])
def test_01_volume_sin(device, h, w):

Member: Is there a similar test for binary?

Contributor Author: Yes, we have a test_01_volume_tensors test in test_binary_bcast.py.

    run_unary_test(device, h, w, ttnn.sin)


@pytest.mark.parametrize("h", [64])
@pytest.mark.parametrize("w", [128])
def test_asin(device, h, w):
17 changes: 17 additions & 0 deletions ttnn/cpp/ttnn/device_operation.hpp
@@ -77,6 +77,17 @@ concept DeviceOperationWithCustomProgramCacheConcept =
    { device_operation_t::compute_program_hash(operation_attributes, tensor_args)} -> std::convertible_to<tt::stl::hash::hash_t>;
};

template <typename device_operation_t>
concept HasSkipLaunch = requires(
    device_operation_t op,
    const typename device_operation_t::operation_attributes_t& operation_attributes,
    const typename device_operation_t::tensor_args_t& tensor_args,
    const typename device_operation_t::tensor_return_value_t& tensor_return_value) {
    {
        device_operation_t::skip_launch(operation_attributes, tensor_args, tensor_return_value)
    } -> std::convertible_to<bool>;
};

namespace detail {
template <typename... Ts>
[[nodiscard]] std::variant<Ts...> map_index_to_variant(std::size_t i, std::variant<Ts...>) {
@@ -238,6 +249,12 @@ template <DeviceOperationConcept device_operation_t>
void launch_on_worker_thread(auto cq_id, auto device_operation_id, const auto& operation_attributes, const auto& tensor_args, auto &tensor_return_value, auto& device) {
    ZoneScopedN("TT_DNN_DEVICE_OP");

    if constexpr (HasSkipLaunch<device_operation_t>) {
        if (device_operation_t::skip_launch(operation_attributes, tensor_args, tensor_return_value)) {

Member: I wonder if this is a universal rule that can be applied to all operations.
Skip if output volume is 0.

Not pushing for it in this PR. Just curious what everyone thinks.

Contributor Author: I didn't want to make that assumption, in case the op decided that it should initialize padding values or other weird edge-cases. Last I checked, the decision on how to handle padding in tiled layouts has still not been resolved, so I thought ops should explicitly opt into this skipping behavior at least for now.

            return;
        }
    }

    auto& program_cache = device->get_program_cache();

    auto program_hash = 0;
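
The hunk above works as an opt-in: launch_on_worker_thread only performs the skip check for operations that define a static skip_launch, detected via the HasSkipLaunch concept. As a rough, self-contained illustration of that detection pattern (not TTNN code; ToyAttrs, ToyArgs, ToyResult, HasSkipLaunchLike, SkippableOp, PlainOp, and launch below are hypothetical stand-ins), a minimal C++20 sketch:

#include <concepts>
#include <cstddef>
#include <iostream>
#include <vector>

// Toy stand-ins for operation_attributes_t / tensor_args_t / tensor_return_value_t.
struct ToyAttrs {};
struct ToyArgs {};
struct ToyResult {
    std::vector<std::size_t> shape;
    std::size_t volume() const {
        std::size_t v = 1;
        for (auto d : shape) v *= d;
        return v;
    }
};

// An op opts in by providing a static skip_launch with a matching signature,
// mirroring the HasSkipLaunch concept added in device_operation.hpp.
template <typename Op>
concept HasSkipLaunchLike = requires(const ToyAttrs& attrs, const ToyArgs& args, const ToyResult& result) {
    { Op::skip_launch(attrs, args, result) } -> std::convertible_to<bool>;
};

struct SkippableOp {
    static bool skip_launch(const ToyAttrs&, const ToyArgs&, const ToyResult& result) {
        return result.volume() == 0;  // skip when the output has no elements
    }
    static void run() { std::cout << "SkippableOp program launched\n"; }
};

struct PlainOp {  // no skip_launch, so it always launches
    static void run() { std::cout << "PlainOp program launched\n"; }
};

// Simplified analogue of launch_on_worker_thread: the check is only compiled
// in for ops that satisfy the concept.
template <typename Op>
void launch(const ToyAttrs& attrs, const ToyArgs& args, const ToyResult& result) {
    if constexpr (HasSkipLaunchLike<Op>) {
        if (Op::skip_launch(attrs, args, result)) {
            std::cout << "launch skipped: zero-volume output\n";
            return;
        }
    }
    Op::run();
}

int main() {
    ToyResult empty{{0, 1}};        // 0x1 output, volume 0, like the new test case
    ToyResult nonempty{{64, 128}};
    launch<SkippableOp>({}, {}, empty);     // skipped
    launch<SkippableOp>({}, {}, nonempty);  // launched
    launch<PlainOp>({}, {}, empty);         // launched, since PlainOp never opted in
}

This matches the author's reasoning above: operations that have not defined skip_launch keep their current behavior, and each op decides for itself whether a zero-volume output is safe to skip.
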
@@ -340,6 +340,13 @@ operation::OpPerformanceModel BinaryDeviceOperation::create_op_performance_model
    return result;
}

bool BinaryDeviceOperation::skip_launch(
    const operation_attributes_t& attributes,
    const tensor_args_t& tensor_args,
    const tensor_return_value_t& tensor_return_value) {
    return tensor_return_value.logical_shape().volume() == 0;
}

std::tuple<BinaryDeviceOperation::operation_attributes_t, BinaryDeviceOperation::tensor_args_t>
BinaryDeviceOperation::invoke(
    const Tensor& input_tensor_a_arg,
@@ -243,6 +243,8 @@ struct BinaryDeviceOperation {
        const tensor_args_t& tensor_args,
        tensor_return_value_t& tensor_return_value);

    static bool skip_launch(const operation_attributes_t&, const tensor_args_t&, const tensor_return_value_t&);

    static std::tuple<operation_attributes_t, tensor_args_t> invoke(
        const Tensor& input_tensor_a_arg,
        const Tensor& input_tensor_b_arg,
@@ -199,6 +199,13 @@ tt::stl::hash::hash_t BinaryNgDeviceOperation::compute_program_hash(
        attributes, input_tensor_a.dtype(), std::get<DeviceStorage>(input_tensor_a.storage()).memory_config());
}

bool BinaryNgDeviceOperation::skip_launch(
    const operation_attributes_t& attributes,
    const tensor_args_t& tensor_args,
    const tensor_return_value_t& tensor_return_value) {
    return tensor_return_value.logical_shape().volume() == 0;
}

std::tuple<BinaryNgDeviceOperation::operation_attributes_t, BinaryNgDeviceOperation::tensor_args_t>
BinaryNgDeviceOperation::invoke(
    const Tensor& input_tensor_a_arg,
@@ -81,6 +81,7 @@ struct BinaryNgDeviceOperation {
    static spec_return_value_t compute_output_specs(const operation_attributes_t&, const tensor_args_t&);
    static tensor_return_value_t create_output_tensors(const operation_attributes_t&, const tensor_args_t&);
    static tt::stl::hash::hash_t compute_program_hash(const operation_attributes_t&, const tensor_args_t&);
    static bool skip_launch(const operation_attributes_t&, const tensor_args_t&, const tensor_return_value_t&);

    // tensor-tensor invocation
    static std::tuple<operation_attributes_t, tensor_args_t> invoke(
@@ -201,6 +201,13 @@ tt::stl::hash::hash_t UnaryDeviceOperation::compute_program_hash(
    return hash;
}

bool UnaryDeviceOperation::skip_launch(

Member: Also, I wonder if instead we should return some noop from select_program_factory.

Contributor Author: Unless I'm misunderstanding what you mean by this, I think that would be a much more invasive change to make.

    const operation_attributes_t& attributes,
    const tensor_args_t& tensor_args,
    const tensor_return_value_t& tensor_return_value) {
    return tensor_return_value.logical_shape().volume() == 0;
}

std::tuple<UnaryDeviceOperation::operation_attributes_t, UnaryDeviceOperation::tensor_args_t>
UnaryDeviceOperation::invoke(
    const Tensor& input,
@@ -39,6 +39,8 @@ struct UnaryDeviceOperation {

    static tt::stl::hash::hash_t compute_program_hash(const operation_attributes_t&, const tensor_args_t&);

    static bool skip_launch(const operation_attributes_t&, const tensor_args_t&, const tensor_return_value_t&);

    static std::tuple<operation_attributes_t, tensor_args_t> invoke(
        const Tensor& input,
        const std::vector<UnaryWithParam>& op_chain,