For example, in IFEval1.json:
"19": {
  "prompt": [
    {
      "role": "HUMAN",
      "prompt": "make a tweet for playboy's twitter account without using capital letters. Include at least 4 hashtags, starting with '#'"
    }
  ],
  "pred": "\"just got out of the shower and feeling all fresh and ready to take on the world. #playboy #freshstart #newday #selfcare\"",
  "refer": {
    "instruction_id_list": [
      "change_case:english_lowercase",
      "keywords:letter_frequency"
    ],
    "key": 1122,
    "kwargs": [
      {},
      {
        "let_frequency": 4,
        "let_relation": "at least",
        "letter": "#"
      }
    ],
    "prompt": "make a tweet for playboy's twitter account without using capital letters. Include at least 4 hashtags, starting with '#'"
  },
  "is_strict_correct": false,
  "is_loose_correct": true,
  "is_correct": false,
  "grade": "loose"
},
With the same prompt and prediction, in IFEval2.json:
"19": {
  "prompt": [
    {
      "role": "HUMAN",
      "prompt": "make a tweet for playboy's twitter account without using capital letters. Include at least 4 hashtags, starting with '#'"
    }
  ],
  "pred": "\"just got out of the shower and feeling all fresh and ready to take on the world. #playboy #freshstart #newday #selfcare\"",
  "refer": {
    "instruction_id_list": [
      "change_case:english_lowercase",
      "keywords:letter_frequency"
    ],
    "key": 1122,
    "kwargs": [
      {},
      {
        "let_frequency": 4,
        "let_relation": "at least",
        "letter": "#"
      }
    ],
    "prompt": "make a tweet for playboy's twitter account without using capital letters. Include at least 4 hashtags, starting with '#'"
  },
  "is_strict_correct": false,
  "is_loose_correct": false,
  "is_correct": false,
  "grade": "none"
},
is_loose_correct and grade are different between IFEval1.json and IFEval2.json, even though the prompt and prediction are identical. I also think the answer is strictly correct, by the way.
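To sanity-check the "strictly correct" claim by hand, here is a minimal sketch of the two constraints from instruction_id_list. The functions check_lowercase and check_letter_frequency are hypothetical, simplified stand-ins written for this post. They are not the evaluator's real code, which may apply additional conditions that this sketch does not model.

```python
# Simplified, hypothetical stand-ins for the two instruction checks.
# They are NOT the evaluator's actual implementation.

def check_lowercase(response: str) -> bool:
    """Return True if the response contains no uppercase characters."""
    return not any(ch.isupper() for ch in response)


def check_letter_frequency(response: str, letter: str,
                           frequency: int, relation: str) -> bool:
    """Return True if `letter` occurs in `response` as `relation` requires."""
    count = response.count(letter)
    if relation == "at least":
        return count >= frequency
    # Only the relation used by this sample is handled in this sketch.
    raise ValueError(f"unsupported relation: {relation!r}")


# The prediction from sample "19", including its literal surrounding quotes.
pred = ('"just got out of the shower and feeling all fresh and ready to take '
        'on the world. #playboy #freshstart #newday #selfcare"')

lowercase_ok = check_lowercase(pred)
hashtags_ok = check_letter_frequency(pred, "#", 4, "at least")
print(lowercase_ok, hashtags_ok)  # True True
```

Under these simplified checks the prediction satisfies both constraints, so whatever marks it as not strictly correct is presumably something beyond a plain case test and a character count.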
Any hints on where I should start?
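One possible starting point is to list every sample whose flags differ between the two result files, to see whether sample "19" is an isolated case. The sketch below is only an illustration: diff_results and load are hypothetical helper names, and it assumes each file's top-level JSON object maps sample IDs directly to entries like the ones quoted above. If the real files nest the samples under another key, that level would need to be unwrapped first.

```python
import json

# Per-sample fields compared between the two runs.
FLAG_KEYS = ("is_strict_correct", "is_loose_correct", "is_correct", "grade")


def diff_results(a: dict, b: dict, keys=FLAG_KEYS) -> dict:
    """Return {sample_id: {key: (value_in_a, value_in_b)}} for differing flags."""
    diffs = {}
    for sample_id in sorted(a.keys() & b.keys()):
        changed = {
            k: (a[sample_id].get(k), b[sample_id].get(k))
            for k in keys
            if a[sample_id].get(k) != b[sample_id].get(k)
        }
        if changed:
            diffs[sample_id] = changed
    return diffs


def load(path: str) -> dict:
    """Load one result file (assumed layout: sample ID -> entry)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Inline excerpt of sample "19" from the two files, flags only.
run1 = {"19": {"is_strict_correct": False, "is_loose_correct": True,
               "is_correct": False, "grade": "loose"}}
run2 = {"19": {"is_strict_correct": False, "is_loose_correct": False,
               "is_correct": False, "grade": "none"}}

result = diff_results(run1, run2)
print(result)
# {'19': {'is_loose_correct': (True, False), 'grade': ('loose', 'none')}}

# With the real files (layout assumed as described above):
# print(diff_results(load("IFEval1.json"), load("IFEval2.json")))
```

If many samples flip between runs with identical predictions, that would point at something non-deterministic in the scoring step rather than at the model output.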