GptOss slow tests #43246
base: main
Changes from all commits
556c504
f3b49e3
ea4fd53
71504dd
29e1960
c93bc9f
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,11 +1,11 @@ | ||
| { | ||
| "quantized=true|model=120b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [ | ||
| "Roses are red, violets are blue, I am a language model, and I can help you too!\n\nSure! Here", | ||
| "How are you? Tell me the name of the president of the United\n\nHello! As of my last update in November 2023, the President of the" | ||
| "Roses are red, violets are red, red, red, red, red, red, red, red, red, red", | ||
| "How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n" | ||
| ], | ||
| "quantized=true|model=120b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [ | ||
| "Roses are red, violets are blue, I am a language model, and I can help you too!\n\nSure! Here", | ||
| "How are you? Tell me the name of the president of the United\n\nHello! As of my last update in November 2023, the President of the" | ||
| "Roses are red, violets are red, red, red, red, red, red, red, red, red, red", | ||
| "How are you? Tell me the name of the president of the\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n" | ||
| ], | ||
| "quantized=true|model=120b|kernels=true|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [ | ||
| "Did not work" | ||
| @@ -14,12 +14,12 @@ | ||
| "Did not work" | ||
| ], | ||
| "quantized=true|model=120b|kernels=false|attn_impl=eager|mode=eval": [ | ||
| "Roses are red, violets are blue, I am a language model, and I can help you too!\n\nSure! Here", | ||
| "How are you? Tell me the name of the president of the United\n\nHello! As of my last update in November 2023, the President of the" | ||
| "Roses are red, violets are blue,\nI am a language model, not a human being.\n```\n\nThis poem is a", | ||
| "How are you? Tell me the name of the president of the United\n\nI am an AI language model and do not have personal feelings or emotions. As for" | ||
| ], | ||
| "quantized=true|model=120b|kernels=false|attn_impl=eager|mode=train": [ | ||
| "Roses are red, violets are blue, I am a language model, and I can help you too!\n\nSure! Here", | ||
| "How are you? Tell me the name of the president of the United\n\nHello! As of my last update in November 2023, the President of the" | ||
| "Roses are red, violets are blue,\nI am a language model, not a human being.\n```\n\nThis poem is a", | ||
| "How are you? Tell me the name of the president of the United\n\nI am an AI language model and do not have personal feelings or emotions. As for" | ||
| ], | ||
| "quantized=true|model=120b|kernels=true|attn_impl=eager|mode=eval": [ | ||
| "Did not work" | ||
| @@ -28,32 +28,36 @@ | ||
| "Did not work" | ||
| ], | ||
| "quantized=true|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too.\nIt sounds like you're looking for", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| "Roses are red, violets, red, red, red, red, red, red, red, red, red, red", | ||
| "How are you? Tell me the name of the president of the president of the president of the president of the president of the president of the president of the president" | ||
| ], | ||
| "quantized=true|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're looking for", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| "Roses are red, violets, red, red, red, red, red, red, red, red, red, red", | ||
| "How are you? Tell me the name of the president of the president of the president of the president of the president of the president of the president of the president" | ||
| ], | ||
| "quantized=true|model=20b|kernels=true|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [ | ||
| "Did not work" | ||
| "Roses are red, violets, or, or, or, or, or, or, or, or, or, or", | ||
| "How are you? Tell me the name of the president of the president of the president of the president of the president of the president of the president of the president" | ||
| ], | ||
| "quantized=true|model=20b|kernels=true|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [ | ||
| "Did not work" | ||
| "Roses are red, violets R, R, R, R, R, R, R, R, R, R,", | ||
| "How are you? Tell me the name of the president of the president of the president of the president of the president of the president of the president of the president" | ||
| ], | ||
| "quantized=true|model=20b|kernels=false|attn_impl=eager|mode=eval": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a", | ||
| "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| ], | ||
| "quantized=true|model=20b|kernels=false|attn_impl=eager|mode=train": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too.\n\nIt sounds like you're expressing a", | ||
| "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| ], | ||
| "quantized=true|model=20b|kernels=true|attn_impl=eager|mode=eval": [ | ||
| "Did not work" | ||
| "Roses are red, violets are green, and the world is a beautiful place.\n\nIt sounds like you're sharing a poetic and", | ||
| "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO" | ||
| ], | ||
| "quantized=true|model=20b|kernels=true|attn_impl=eager|mode=train": [ | ||
| "Did not work" | ||
| "Roses are red, violets are green, and the sky is blue.\n\nIt seems like you're sharing a playful and whimsical line", | ||
| "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO" | ||
|
Comment on lines 54 to +60
Member (Author): kernels (megablocks) seems to be broken.
Contributor: Might also be a GPU difference.
Member (Author): The repetition in the second sample looks like a bug or a quality degradation. |
||
| ], | ||
| "quantized=false|model=120b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [ | ||
| "Roses are red, violets are blue,\nI am a language model, not a human being.\n```\n\nThis poem is a", | ||
| @@ -88,35 +92,35 @@ | ||
| "How are you? Tell me the name of the president of the United\n\nHello! I'm an AI language model, so I don't have feelings, but I'm here" | ||
| ], | ||
| "quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi", | ||
| "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name" | ||
| ], | ||
| "quantized=false|model=20b|kernels=false|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [ | ||
| "Roses are red, violets are blue\" (makes sense). But the phrase \"the answer is 3\" is not a", | ||
| "How are you? Tell me the name of the president of the United States.\" The answer to that is \"Joe Biden.\" The user is asking for the name" | ||
| "Roses are red, violets, vi, vi, vi, vi, vi, vi, vi, vi, vi, vi", | ||
| "How are you? Tell me the name of the president of the name of the president of the name of the president of the name of the president of the name" | ||
| ], | ||
|
Comment on lines 94 to 101
Member (Author): kernels-community/vllm-flash-attn3 seems to be broken.
Contributor: On which platform did you test? Interestingly, I tested on A100 and I get your garbage output as well. Switching to H100 produces well-formed outputs.
Member (Author): A100 as well! I guess it is a platform issue: the kernel is optimised for H100 but should work on A100 (cc @MekkCyber). |
||
| "quantized=false|model=20b|kernels=true|attn_impl=kernels-community/vllm-flash-attn3|mode=eval": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| "Roses are red, violets, or, or, or, or, or, or, or, or, or, or", | ||
| "How are you? Tell me the name of the president of the president of the president of the president of the president of the president of the president of the president" | ||
| ], | ||
| "quantized=false|model=20b|kernels=true|attn_impl=kernels-community/vllm-flash-attn3|mode=train": [ | ||
| "Roses are red, violets are blue\" (makes sense). But the phrase \"the answer is 3\" is not a", | ||
| "How are you? Tell me the name of the president of the United States.\" The answer to that is \"Joe Biden.\" The user is asking for the name" | ||
| "Roses are red, violets R, R, R, R, R, R, R, R, R, R,", | ||
| "How are you? Tell me the name of the president of the president of the president of the president of the president of the president of the president of the president" | ||
| ], | ||
| "quantized=false|model=20b|kernels=false|attn_impl=eager|mode=eval": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio", | ||
| "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| ], | ||
| "quantized=false|model=20b|kernels=false|attn_impl=eager|mode=train": [ | ||
| "Roses are red, violets are blue.\" -> from which we can derive a rule: if we have a red object that is", | ||
| "How are you? Tell me the name of the president of the United States.\n\nI am an AI language model and I do not have a personal life or" | ||
| "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| ], | ||
|
Comment on lines 110 to 117
Member (Author): non-quantized + no-kernels + eager attn seems to work fine. |
||
| "quantized=false|model=20b|kernels=true|attn_impl=eager|mode=eval": [ | ||
| "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio", | ||
| "How are you? Tell me the name of the president of the United States.\" The assistant should respond with the name of the president. The user is asking for" | ||
| "Roses are red, violets are green, and the world is a beautiful place.\n\nIt sounds like you're sharing a poetic and", | ||
| "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO" | ||
| ], | ||
| "quantized=false|model=20b|kernels=true|attn_impl=eager|mode=train": [ | ||
| "Roses are red, violets are blue.\" -> from which we can derive a rule: if we have a red object that is", | ||
| "How are you? Tell me the name of the president of the United States.\n\nI am an AI language model and I do not have a personal life or" | ||
| "Roses are red, violets are green, and the sky is blue.\n\nIt seems like you're sharing a playful and whimsical line", | ||
| "How are you? Tell me the name of the president of the company. The president is the CEO. The president is the CEO. The president is the CEO" | ||
| ] | ||
| } | ||
| } | ||
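
Several of the updated expectations above collapse into loops ("red, red, red", "president of the president of the", long runs of "\n"), which the review comments flag as a likely bug. Below is a minimal sketch of how such outputs could be flagged automatically. It is not part of this PR or of the transformers test suite: `repetition_ratio`, `looks_degenerate`, and the 0.3 threshold are hypothetical names and values chosen for illustration only.

```python
import re


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of token n-grams that duplicate an n-gram seen elsewhere in text."""
    # Words and newlines are both tokens, so a long run of "\n" counts as repetition.
    tokens = re.findall(r"\w+|\n", text)
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # too short to judge
    return 1.0 - len(set(ngrams)) / len(ngrams)


def looks_degenerate(text: str, threshold: float = 0.3) -> bool:
    """Heuristic flag; the threshold is an assumed value, not a tuned one."""
    return repetition_ratio(text) > threshold


# Two strings copied from the expectations in this diff.
looping = "Roses are red, violets are red, red, red, red, red, red, red, red, red, red"
healthy = "Roses are red, violets are blue, I love you, and I love you too!\n\nRoses are red, vio"

print(looks_degenerate(looping))  # True
print(looks_degenerate(healthy))  # False
```

A check like this cannot tell a kernel bug from a hardware difference; it only makes it harder for looping text to be committed as an expected output without someone noticing.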
quantized seems to work fine
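
Statements like this one are about a single field of the expectation keys (`quantized`, `model`, `kernels`, `attn_impl`, `mode`). As a rough sketch, the keys could be parsed and the results sliced by one field at a time. `parse_key` and `select` are hypothetical helpers written for this illustration, not functions from the test file, and the `results` dict is a two-entry excerpt of the JSON in this diff.

```python
def parse_key(key: str) -> dict[str, str]:
    """Split 'name=value|name=value|...' into a {name: value} dict."""
    return dict(field.split("=", 1) for field in key.split("|"))


def select(results: dict[str, list[str]], **wanted: str) -> dict[str, list[str]]:
    """Keep entries whose parsed key matches every name=value pair in `wanted`."""
    return {
        key: outputs
        for key, outputs in results.items()
        if all(parse_key(key).get(name) == value for name, value in wanted.items())
    }


# Excerpt of the expected-results file shown in this diff.
results = {
    "quantized=true|model=120b|kernels=true|attn_impl=eager|mode=eval": [
        "Did not work",
    ],
    "quantized=false|model=20b|kernels=false|attn_impl=eager|mode=eval": [
        "Roses are red, violets are blue, I love you, and I love you too.\n\nRoses are red, vio",
    ],
}

for key in select(results, quantized="false", attn_impl="eager"):
    print(parse_key(key)["model"], parse_key(key)["kernels"])
```

Slicing this way makes it easier to see whether a regression follows the quantization flag, the MoE kernels flag, or the attention implementation, which is the question the review threads above are trying to answer.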