gpu_blas_lt_gemm_runner #2766

Open
wants to merge 4 commits into
base: r2.13-rocm-enhanced-upate-llvm


ScXfjiang

No description provided.

@ScXfjiang ScXfjiang requested a review from pemeliya November 20, 2024 23:36
@ScXfjiang ScXfjiang marked this pull request as ready for review November 20, 2024 23:37
@ScXfjiang ScXfjiang requested a review from pemeliya November 21, 2024 16:15
@ScXfjiang
Author

We also have TF_USE_CUBLASLT for enabling hipBLASLt: https://github.com/ROCm/tensorflow-upstream/blob/r2.13-rocm-enhanced-hipblaslt/tensorflow/core/kernels/gpu_utils.cc#L98-L107

It seems to be CUDA-only here: https://github.com/ROCm/tensorflow-upstream/blob/r2.13-rocm-enhanced-hipblaslt_gpu_blas_lt_runner/tensorflow/core/kernels/matmul_op_impl.h#L411-L413

I think we should unify these into a single flag for the non-XLA path. @ScXfjiang @pemeliya

In general I prefer flags in debug_options_flags.cc to env vars, since env vars are hard to manage. What do you think, @pemeliya?

@i-chaochen

debug_options_flags.cc is for XLA-specific settings.

If this is TF and non-XLA, the flags are messier.

@ScXfjiang ScXfjiang changed the base branch from r2.13-rocm-enhanced-hipblaslt to r2.13-rocm-enhanced-upate-llvm November 25, 2024 13:40

3 participants