@NathanHB NathanHB commented Apr 15, 2024

What this PR does:

If you want to log something coming from the metrics, simply return it in the metric dict.
For example, to log the judge response when using llm_as_judge, return the response in the dict.

{
   "score": score,
   "judgement": judge_response
}

The `judgement` field is a string and will not be aggregated. However, it will be logged in the details for each sample.
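
To illustrate the behaviour described above, here is a minimal sketch and not the actual code from this PR. The names `judge_metric` and `split_for_logging` are hypothetical, and the choice of `mean` as the aggregation is an assumption made only for the example. It shows numeric fields being aggregated while string fields such as `judgement` are kept per sample.

```python
from statistics import mean


def judge_metric(score, judge_response):
    """Hypothetical sample-level metric: a numeric score plus the raw judge text."""
    return {"score": score, "judgement": judge_response}


def split_for_logging(sample_outputs):
    """Aggregate numeric fields (assumed: mean); keep every field in per-sample details."""
    numeric = {}
    details = []
    for output in sample_outputs:
        for key, value in output.items():
            # Only real numbers are aggregated; strings (and bools) are skipped.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                numeric.setdefault(key, []).append(value)
        details.append(output)
    aggregated = {key: mean(values) for key, values in numeric.items()}
    return aggregated, details


outputs = [
    judge_metric(9, "Rating: [[9]]"),
    judge_metric(10, "Rating: [[10]]"),
]
aggregated, details = split_for_logging(outputs)
print(aggregated)
print(details[0]["judgement"])
```

In this sketch the aggregated result only contains `score`, while each entry of `details` still carries its own `judgement` string.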

@NathanHB NathanHB requested a review from clefourrier April 16, 2024 07:10

@clefourrier clefourrier left a comment

LGTM
However, do you have an example of a parquet file saved with this new method? Under which key is this added info stored?

@NathanHB NathanHB commented Apr 16, 2024

> However, do you have an example of a parquet file saved with this new method? Under which key is this added info stored?

Yeah, I do. In the details file, they are added under the `metrics` field.
For example:

{
 'metrics': {'judgement': ["The assistant's answer provides a correct and "
                           "helpful solution to the user's question. The "
                           'assistant correctly implements a function that '
                           'finds the common elements in two arrays without '
                           'using any extra data structures. The function '
                           'compares each element of one array with each '
                           'element of the other array and adds the common '
                           'elements to a temporary array. The explanation '
                           'provided is clear and easy to understand, and an '
                           'example of how to use the function is also '
                           'included.\n'
                           '\n'
                           "The assistant's solution is simpler and more "
                           'concise compared to the reference answer, but it '
                           'achieves the same result efficiently. The '
                           "assistant's solution has a time complexity of "
                           'O(n^2), which is acceptable for this problem.\n'
                           '\n'
                           "Overall, the assistant's answer is correct, clear, "
                           "and helpful in addressing the user's question.\n"
                           '\n'
                           'Rating: [[9]]',
                           "The assistant's response to the user's second "
                           'question is correct. The assistant provided an '
                           'optimized implementation of the '
                           '`find_common_elements` function that uses list '
                           'comprehension to achieve a time complexity of '
                           'O(n), where n is the length of the shorter array. '
                           'This solution is efficient and meets the '
                           'requirement of not using any extra data structures '
                           'besides those used to store and return the common '
                           'elements.\n'
                           '\n'
                           "The assistant's explanation is clear and concise, "
                           'breaking down how the implementation works and '
                           'providing an example of how to use the function. '
                           'Overall, the response is helpful and addresses the '
                           "user's query effectively.\n"
                           '\n'
                           'Rating: [[10]]'],
             'multi_turn': 10,
             'single_turn': 9,
             'user_prompt': [[{'content': 'You are a helpful assistant.',
                               'role': 'system'},
                              {'content': '[Instruction]\n'
                                          'Please act as an impartial judge '
                                          'and evaluate the quality of the '
                                          'response provided by an AI '
                                          'assistant to the user question '
                                          'displayed below. Your evaluation '
                                          'should consider correctness and '
                                          'helpfulness. You will be given a '
                                          'reference answer and the '
                                          "assistant's answer. Begin your "
                                          'evaluation by comparing the '
                                          "assistant's answer with the "
                                          'reference answer. Identify and '
                                          'correct any mistakes. Be as '
                                          'objective as possible. After '
                                          'providing your explanation, you '
                                          'must rate the response on a scale '
                                          'of 1 to 10 by strictly following '
                                          'this format: "[[rating]]", for '
                                          'example: "Rating: [[5]]".\n'
                                          '\n'
                                          '[Question]\n'
                                          'Implement a program to find the '
                                          'common elements in two arrays '
                                          'without using any extra data '
                                          'structures, besides that used to '
                                          'store and return the common '
                                          'elements.\n'
                                          '\n'
                                          '[The Start of Reference Answer]\n'
                                          'O(n^2) or O(nlogn) is expected. The '
                                          'following is a O(n^2) solution. you '
                                          'can also sort them first and use '
                                          'two pointers.\n'
                                          '\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common_elements = []\n'
                                          '    for i in range(len(arr1)):\n'
                                          '        for j in range(len(arr2)):\n'
                                          '            if arr1[i] == arr2[j]:\n'
                                          '                # Check if the '
                                          'element is already in the '
                                          'common_elements list\n'
                                          '                if arr1[i] not in '
                                          'common_elements:\n'
                                          '                    '
                                          'common_elements.append(arr1[i])\n'
                                          '    return common_elements\n'
                                          '```\n'
                                          '[The End of Reference Answer]\n'
                                          '\n'
                                          "[The Start of Assistant's Answer]\n"
                                          ' To find the common elements in two '
                                          'arrays without using any extra data '
                                          'structures, we can use a simple '
                                          'loop that compares each element of '
                                          'one array with each element of the '
                                          'other array. If the elements are '
                                          'equal, they are added to a '
                                          'temporary array. Here is an example '
                                          'of how this can be implemented in '
                                          'Python:\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common = []\n'
                                          '    for x in arr1:\n'
                                          '        if x in arr2:\n'
                                          '            common.append(x)\n'
                                          '    return common\n'
                                          '```\n'
                                          'This function takes two arrays '
                                          '`arr1` and `arr2` as input, and '
                                          'returns a list of the common '
                                          'elements between the two arrays.\n'
                                          '\n'
                                          'Here is an explanation of how the '
                                          'function works:\n'
                                          '\n'
                                          '1. The function starts by '
                                          'initializing an empty list `common` '
                                          'to store the common elements.\n'
                                          '2. It then uses a for loop to '
                                          'iterate through the elements of '
                                          '`arr1`.\n'
                                          '3. For each element `x` in `arr1`, '
                                          'the function checks if `x` is also '
                                          'an element of `arr2` using the `in` '
                                          'operator.\n'
                                          '4. If `x` is found in `arr2`, it is '
                                          'added to the `common` list using '
                                          'the `append` method.\n'
                                          '5. The function returns the '
                                          '`common` list.\n'
                                          '\n'
                                          'Here is an example of how you can '
                                          'use this function:\n'
                                          '```\n'
                                          'arr1 = [1, 2, 3, 4, 5]\n'
                                          'arr2 = [2, 3, 4, 5, 6]\n'
                                          'common = find_common_elements(arr1, '
                                          'arr2)\n'
                                          'print(common) # [2, 3, 4]\n'
                                          '```\n'
                                          'This will output the common '
                                          'elements between `arr1` and `arr2`, '
                                          'which are `2`, `3`, and `4`.\n'
                                          "[The End of Assistant's Answer]",
                               'role': 'user'}],
                             [{'content': 'Please act as an impartial judge '
                                          'and evaluate the quality of the '
                                          'response provided by an AI '
                                          'assistant to the user question. '
                                          'Your evaluation should consider '
                                          'correctness and helpfulness. You '
                                          'will be given a reference answer '
                                          "and the assistant's answer. You "
                                          'evaluation should focus on the '
                                          "assistant's answer to the second "
                                          'question. Begin your evaluation by '
                                          "comparing the assistant's answer "
                                          'with the reference answer. Identify '
                                          'and correct any mistakes. Be as '
                                          'objective as possible. After '
                                          'providing your explanation, you '
                                          'must rate the response on a scale '
                                          'of 1 to 10 by strictly following '
                                          'this format: "[[rating]]", for '
                                          'example: "Rating: [[5]]".\n'
                                          '\n',
                               'role': 'system'},
                              {'content': '<|The Start of Reference Answer|>\n'
                                          '\n'
                                          '### User:\n'
                                          'Implement a program to find the '
                                          'common elements in two arrays '
                                          'without using any extra data '
                                          'structures, besides that used to '
                                          'store and return the common '
                                          'elements.\n'
                                          '\n'
                                          '### Reference answer:\n'
                                          'O(n^2) or O(nlogn) is expected. The '
                                          'following is a O(n^2) solution. you '
                                          'can also sort them first and use '
                                          'two pointers.\n'
                                          '\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common_elements = []\n'
                                          '    for i in range(len(arr1)):\n'
                                          '        for j in range(len(arr2)):\n'
                                          '            if arr1[i] == arr2[j]:\n'
                                          '                # Check if the '
                                          'element is already in the '
                                          'common_elements list\n'
                                          '                if arr1[i] not in '
                                          'common_elements:\n'
                                          '                    '
                                          'common_elements.append(arr1[i])\n'
                                          '    return common_elements\n'
                                          '```\n'
                                          '\n'
                                          '### User:\n'
                                          'Now the constraint of not using '
                                          'extra data structure is removed, '
                                          'implement one with the best time '
                                          'complexity.\n'
                                          '\n'
                                          '### Reference answer:\n'
                                          'Simply use hash table (set or dict) '
                                          'to achieve O(n) time complexity.\n'
                                          '\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    set1 = set(arr1)\n'
                                          '    set2 = set(arr2)\n'
                                          '    common_elements = '
                                          'set1.intersection(set2)\n'
                                          '    return list(common_elements)\n'
                                          '```\n'
                                          '\n'
                                          '<|The End of Reference Answer|>\n'
                                          '\n'
                                          '\n'
                                          "<|The Start of Assistant A's "
                                          'Conversation with User|>\n'
                                          '\n'
                                          '### User:\n'
                                          'Implement a program to find the '
                                          'common elements in two arrays '
                                          'without using any extra data '
                                          'structures, besides that used to '
                                          'store and return the common '
                                          'elements.\n'
                                          '\n'
                                          '### Assistant A:\n'
                                          ' To find the common elements in two '
                                          'arrays without using any extra data '
                                          'structures, we can use a simple '
                                          'loop that compares each element of '
                                          'one array with each element of the '
                                          'other array. If the elements are '
                                          'equal, they are added to a '
                                          'temporary array. Here is an example '
                                          'of how this can be implemented in '
                                          'Python:\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common = []\n'
                                          '    for x in arr1:\n'
                                          '        if x in arr2:\n'
                                          '            common.append(x)\n'
                                          '    return common\n'
                                          '```\n'
                                          'This function takes two arrays '
                                          '`arr1` and `arr2` as input, and '
                                          'returns a list of the common '
                                          'elements between the two arrays.\n'
                                          '\n'
                                          'Here is an explanation of how the '
                                          'function works:\n'
                                          '\n'
                                          '1. The function starts by '
                                          'initializing an empty list `common` '
                                          'to store the common elements.\n'
                                          '2. It then uses a for loop to '
                                          'iterate through the elements of '
                                          '`arr1`.\n'
                                          '3. For each element `x` in `arr1`, '
                                          'the function checks if `x` is also '
                                          'an element of `arr2` using the `in` '
                                          'operator.\n'
                                          '4. If `x` is found in `arr2`, it is '
                                          'added to the `common` list using '
                                          'the `append` method.\n'
                                          '5. The function returns the '
                                          '`common` list.\n'
                                          '\n'
                                          'Here is an example of how you can '
                                          'use this function:\n'
                                          '```\n'
                                          'arr1 = [1, 2, 3, 4, 5]\n'
                                          'arr2 = [2, 3, 4, 5, 6]\n'
                                          'common = find_common_elements(arr1, '
                                          'arr2)\n'
                                          'print(common) # [2, 3, 4]\n'
                                          '```\n'
                                          'This will output the common '
                                          'elements between `arr1` and `arr2`, '
                                          'which are `2`, `3`, and `4`.\n'
                                          '\n'
                                          '### User:\n'
                                          'Now the constraint of not using '
                                          'extra data structure is removed, '
                                          'implement one with the best time '
                                          'complexity.\n'
                                          '\n'
                                          '### Assistant A:\n'
                                          " Sure! Here's an optimized "
                                          'implementation of the '
                                          '`find_common_elements` function '
                                          'that has a time complexity of O(n), '
                                          'where n is the length of the '
                                          'shorter array:\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    return [x for x in arr1 if x in '
                                          'arr2]\n'
                                          '```\n'
                                          "Here's how this implementation "
                                          'works:\n'
                                          '\n'
                                          '1. The function uses a list '
                                          'comprehension to generate the '
                                          'common elements.\n'
                                          '2. The list comprehension iterates '
                                          'over the elements of `arr1`, and '
                                          'for each element `x`, it checks if '
                                          '`x` is also an element of `arr2` '
                                          'using the `in` operator.\n'
                                          '3. If `x` is found in `arr2`, it is '
                                          'added to the list of common '
                                          'elements using the `[]` syntax.\n'
                                          '4. The list comprehension returns '
                                          'the list of common elements.\n'
                                          '\n'
                                          'The time complexity of this '
                                          'implementation is O(n), where n is '
                                          'the length of the shorter array, '
                                          'because the list comprehension '
                                          'iterates over the elements of '
                                          '`arr1` once, regardless of the '
                                          'length of `arr2`.\n'
                                          '\n'
                                          "Here's an example of how you can "
                                          'use this function:\n'
                                          '```\n'
                                          'arr1 = [1, 2, 3, 4, 5]\n'
                                          'arr2 = [2, 3, 4, 5, 6]\n'
                                          'common = find_common_elements(arr1, '
                                          'arr2)\n'
                                          'print(common) # [2, 3, 4]\n'
                                          '```\n'
                                          'I hope this helps! Let me know if '
                                          'you have any questions.\n'
                                          '\n'
                                          "<|The End of Assistant A's "
                                          'Conversation with User|>',
                               'role': 'user'}]]},
}
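
As a rough sketch of how such a details record could be consumed (this is not part of the PR): the judge prompt above asks for the `[[rating]]` format, so the rating can be recovered from each logged judgement with a regular expression. The helper `extract_rating` and the shortened `detail` record are assumptions for illustration; the real record contains the full judgement text shown above.

```python
import re

# Matches the "[[rating]]" format requested in the judge prompt, e.g. "Rating: [[9]]".
RATING_PATTERN = re.compile(r"\[\[(\d+(?:\.\d+)?)\]\]")


def extract_rating(judgement):
    """Return the rating found in a judge response as a float, or None if absent."""
    match = RATING_PATTERN.search(judgement)
    return float(match.group(1)) if match else None


# Shortened stand-in for the record above (judgement text truncated for brevity).
detail = {
    "metrics": {
        "judgement": [
            "Overall, the assistant's answer is correct.\n\nRating: [[9]]",
            "Overall, the response is helpful.\n\nRating: [[10]]",
        ],
        "single_turn": 9,
        "multi_turn": 10,
    }
}

ratings = [extract_rating(text) for text in detail["metrics"]["judgement"]]
print(ratings)
```

Here the two extracted ratings line up with the `single_turn` and `multi_turn` values stored in the same record.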

@clefourrier

Hm would there be a way to put them elsewhere? Like in an "additional info" field?

@NathanHB

> Hm would there be a way to put them elsewhere? Like in an "additional info" field?

I don't really see why; that's where I wanted to put them in the first place. The judgement prompt is tightly related to the metric.

@clefourrier

What else does the metric field contain in your above file?

@NathanHB

Nothing, the only metric we are using for mt_bench is llm_as_judge.

@clefourrier

Ok, that works for now! If we use this system more in the future for extra logging, we might need to move some values to another field of details
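
If that move ever becomes necessary, one possible shape is sketched below. This is only an assumption about how it could look: the function `separate_extra_info` and the `additional_info` key are hypothetical names, not something that exists in the codebase.

```python
def separate_extra_info(metrics):
    """Split a metrics dict into numeric scores and everything else.

    Hypothetical helper: numeric values stay under "metrics", while strings,
    lists, and other loggable values move to an "additional_info" field.
    """
    scores, extra = {}, {}
    for key, value in metrics.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number:
            scores[key] = value
        else:
            extra[key] = value
    return {"metrics": scores, "additional_info": extra}


record = separate_extra_info(
    {
        "single_turn": 9,
        "multi_turn": 10,
        "judgement": ["Rating: [[9]]", "Rating: [[10]]"],
    }
)
print(record)
```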

@NathanHB NathanHB merged commit 6a48e4e into main Apr 16, 2024
hynky1999 pushed a commit that referenced this pull request May 22, 2025
NathanHB added a commit that referenced this pull request Sep 19, 2025