@yizhuoz004
Collaborator

No description provided.

dnf install -y \
    which wget gcc zlib-devel bzip2 bzip2-devel readline-devel sqlite \
    sqlite-devel xz xz-devel libffi-devel curl git ncurses-devel \
    openssh-clients libcudnn8-devel zip jq \
Collaborator Author

cuDNN 8 (libcudnn8-devel) conflicts with the cuDNN 9 already installed in the base container.

//===----------------------------------------------------------------------===//

bool tensorrt::AttentionOp::isValidForTensorRTVersion(
    int64_t trtMajorVersion) {
Collaborator Author

We should also check the minor version here.
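One way to act on this, sketched under assumptions: compare the (major, minor) pair against a minimum pair instead of checking the major version alone. The existing hook only receives trtMajorVersion, so the two-argument function below is a hypothetical variant, and kMinTrtMajor / kMinTrtMinor are placeholders, not the attention op's actual version requirement.

```cpp
#include <cstdint>

// Hypothetical minimum TensorRT version for the attention op.
// Placeholder values: substitute the release that actually introduced
// the attention layer.
constexpr int64_t kMinTrtMajor = 10;
constexpr int64_t kMinTrtMinor = 0;

// Returns true if (major, minor) >= (minMajor, minMinor), compared
// lexicographically: a newer major version passes regardless of its minor
// number, while an equal major version also needs a sufficient minor.
constexpr bool isVersionAtLeast(int64_t major, int64_t minor,
                                int64_t minMajor, int64_t minMinor) {
  return major > minMajor || (major == minMajor && minor >= minMinor);
}

// Hypothetical variant of isValidForTensorRTVersion that also takes the
// minor version.
constexpr bool isAttentionValidForTensorRTVersion(int64_t trtMajorVersion,
                                                 int64_t trtMinorVersion) {
  return isVersionAtLeast(trtMajorVersion, trtMinorVersion, kMinTrtMajor,
                          kMinTrtMinor);
}

// Compile-time examples of the comparison.
static_assert(isVersionAtLeast(10, 3, 10, 2), "newer minor passes");
static_assert(!isVersionAtLeast(10, 1, 10, 2), "older minor fails");
static_assert(isVersionAtLeast(11, 0, 10, 2), "newer major passes");
```

Checking only the major version would accept every 10.x release; the pair comparison rejects the 10.x releases that predate the feature.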

def TensorRT_AttentionNormalizationOpAttr : TensorRT_EnumAttr<TensorRT_AttentionNormalizationOp, "attention_normalization_op">{
}

def TensorRT_DataType : TensorRT_I32EnumAttr<
Collaborator

Didn't we already have a data type attribute?

Collaborator Author

We didn't have an op that explicitly requires a data type as an input; for example, the cast op uses MLIR data types.

Collaborator

Can we do the same here? I'm assuming we could use the same helpers that cast uses?

Collaborator Author

cast uses the output tensor type to indicate the data type, e.g.:

tensorrt.cast %arg0 : tensor<3xf16> to tensor<3xf32>

But in the attention op, the normalization_quantize_to_type parameter is (i) optional and (ii) represents an intermediate quantization data type.
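To illustrate the distinction, here is a plain C++ model (not MLIR or TensorRT API code). A cast's target type can be read off its result tensor type. The attention op's quantization type applies to an intermediate value that never appears as an operand or result type, so it has to be stored explicitly, and std::optional stands in for "attribute omitted". The DataType enumerators, the struct names, and the rule that only 8-bit types are accepted are all hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical mirror of a data-type enum attribute; the enumerator set is
// illustrative only.
enum class DataType : int32_t { kFloat, kHalf, kBF16, kInt8, kFP8 };

// Cast: the target type is the result element type (e.g. f16 -> f32), so no
// separate data-type attribute is needed.
struct CastOp {
  DataType inputElementType;
  DataType resultElementType;
  constexpr DataType targetType() const { return resultElementType; }
};

// Attention: the quantization type describes the normalization output, an
// intermediate value, so it cannot be inferred from operand/result types.
struct AttentionOp {
  DataType queryElementType;
  DataType resultElementType;
  std::optional<DataType> normalizationQuantizeToType;  // may be absent
};

// Hypothetical verifier rule: when present, only 8-bit types are allowed.
constexpr bool verifyAttention(const AttentionOp &op) {
  if (!op.normalizationQuantizeToType)
    return true;  // optional attribute omitted: nothing to check
  const DataType t = *op.normalizationQuantizeToType;
  return t == DataType::kInt8 || t == DataType::kFP8;
}
```

Because the attribute is optional and intermediate, an explicit enum attribute is the natural carrier here, whereas cast can keep relying on its result type.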

3 participants