Currently the attention operator is inferred from a combination of other configurations, such as device and dtype. It would be more flexible for downstream users if it could be selected explicitly.
What is the advantage of doing it this way? The current process takes advantage of the fact that the model builder knows which attention operator to use for a specific device and dtype.
Is this for experimentation purposes? If so, maybe we can expose an extra_options flag to override the default attention operator.
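A minimal sketch of what such an override could look like. This is not the actual builder code; the option name `attention_op` and the defaulting logic below are assumptions for illustration only.

```python
# Sketch (illustrative, not the real model builder): an extra_options override
# takes precedence over the device/dtype-based default.

def select_attention_op(device: str, dtype: str, extra_options: dict) -> str:
    # Hypothetical "attention_op" key: explicit user choice wins if provided.
    override = extra_options.get("attention_op")
    if override is not None:
        return override  # e.g. "MultiHeadAttention" or "GroupQueryAttention"

    # Otherwise fall back to the current behavior: infer from device/dtype.
    # (Mapping shown here is illustrative, not the builder's actual rules.)
    if device == "cuda" and dtype in ("fp16", "bf16"):
        return "GroupQueryAttention"
    return "MultiHeadAttention"
```

With something like this, custom execution providers could pass their preferred attention op through extra_options without changing the default behavior for everyone else.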
Hi @baijumeswani, the idea is to decouple the choice of attention op from device/dtype. Consider a custom EP that implements an attention op for a dtype not supported by the ORT CPU/CUDA EPs.