Add AttentionOp #708
Conversation
dnf install -y \
    which wget gcc zlib-devel bzip2 bzip2-devel readline-devel sqlite \
    sqlite-devel xz xz-devel libffi-devel curl git ncurses-devel \
    openssh-clients libcudnn8-devel zip jq \
cudnn8 conflicts with cudnn9 in the base container.
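A minimal sketch of the suggested fix, assuming the base container already ships cuDNN 9: drop `libcudnn8-devel` from the install list rather than pulling in a second, conflicting cuDNN major version.

```shell
# Sketch of the corrected Dockerfile fragment: the base image is assumed
# to already provide cuDNN 9, so libcudnn8-devel is omitted.
dnf install -y \
    which wget gcc zlib-devel bzip2 bzip2-devel readline-devel sqlite \
    sqlite-devel xz xz-devel libffi-devel curl git ncurses-devel \
    openssh-clients zip jq
```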
//===----------------------------------------------------------------------===//

bool tensorrt::AttentionOp::isValidForTensorRTVersion(
    int64_t trtMajorVersion) {
We should also check the minor version here.
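A hedged sketch of the suggested change: take the minor version as a second parameter and compare both components. The threshold values below are placeholders, not the PR's actual minimum TensorRT version.

```cpp
#include <cstdint>

// Hypothetical version gate illustrating the review suggestion: require
// TensorRT >= kMinMajor.kMinMinor instead of checking the major version
// alone. The real threshold and signature are defined in the PR.
static bool isValidForTensorRTVersion(int64_t trtMajorVersion,
                                      int64_t trtMinorVersion) {
  constexpr int64_t kMinMajor = 10; // assumed threshold
  constexpr int64_t kMinMinor = 8;  // assumed threshold
  return trtMajorVersion > kMinMajor ||
         (trtMajorVersion == kMinMajor && trtMinorVersion >= kMinMinor);
}
```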
def TensorRT_AttentionNormalizationOpAttr : TensorRT_EnumAttr<TensorRT_AttentionNormalizationOp, "attention_normalization_op">{
}

def TensorRT_DataType : TensorRT_I32EnumAttr<
Didn't we already have a data type enum?
We didn't have an op that explicitly requires a data type as an input; for example, the cast op uses MLIR data types.
Can we do the same here? I'm assuming we could use the same helpers that cast uses?
cast uses the output tensor type to indicate the data type, e.g.
tensorrt.cast %arg0 : tensor<3xf16> to tensor<3xf32>
but in the attention op, the normalization_quantize_to_type parameter is (i) optional and (ii) represents an intermediate quantization data type.
No description provided.