Nathan add logging to metrics #157

NathanHB · 2024-04-15T16:03:49Z

What this PR does:

If you want to log something coming from the metrics, simply return it in the metric dict.
For example, if you want to log the judge response when using llm_as_judge, simply return the response in the dict.

{
   "score": score,
   "judgement": judge_response
}

The `judgement` field is a string and will not be aggregated. However, it will be logged in the details for each sample.
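A minimal sketch of the behaviour described above, not the actual lighteval code: numeric values returned by a metric are aggregated, while everything else (such as the judge response) is only kept in the per-sample details. The names `llm_as_judge_metric` and `split_metric_outputs` are hypothetical and exist only for this illustration.

```python
from statistics import mean
from typing import Any


def llm_as_judge_metric(score: float, judge_response: str) -> dict[str, Any]:
    # Hypothetical metric: returns the score plus the raw judge response.
    return {"score": score, "judgement": judge_response}


def split_metric_outputs(
    outputs: dict[str, Any],
) -> tuple[dict[str, float], dict[str, Any]]:
    """Separate values that can be aggregated from values that are only logged."""
    numeric: dict[str, float] = {}
    logged: dict[str, Any] = {}
    for key, value in outputs.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numeric[key] = float(value)
        else:
            logged[key] = value
    return numeric, logged


samples = [
    llm_as_judge_metric(9, "Rating: [[9]]"),
    llm_as_judge_metric(10, "Rating: [[10]]"),
]

details = []  # per-sample log: keeps the full metric dict, strings included
scores = []   # only the numeric part is aggregated
for sample in samples:
    numeric, _logged = split_metric_outputs(sample)
    scores.append(numeric["score"])
    details.append(sample)

aggregated = {"score": mean(scores)}
```

Here `aggregated` only contains the averaged `score`, while each entry of `details` still carries its `judgement` string.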

src/lighteval/evaluator.py

src/lighteval/logging/info_loggers.py

clefourrier

LGTM
However, do you have an example of a parquet file saved with this new method? Under which key is this added info stored?

NathanHB · 2024-04-16T07:24:54Z

> However, do you have an example of a parquet file saved with this new method? Under which key is this added info stored?

Yeah, I do. In the details file, they are added under the `metrics` field.
For example:

{
 'metrics': {'judgement': ["The assistant's answer provides a correct and "
                           "helpful solution to the user's question. The "
                           'assistant correctly implements a function that '
                           'finds the common elements in two arrays without '
                           'using any extra data structures. The function '
                           'compares each element of one array with each '
                           'element of the other array and adds the common '
                           'elements to a temporary array. The explanation '
                           'provided is clear and easy to understand, and an '
                           'example of how to use the function is also '
                           'included.\n'
                           '\n'
                           "The assistant's solution is simpler and more "
                           'concise compared to the reference answer, but it '
                           'achieves the same result efficiently. The '
                           "assistant's solution has a time complexity of "
                           'O(n^2), which is acceptable for this problem.\n'
                           '\n'
                           "Overall, the assistant's answer is correct, clear, "
                           "and helpful in addressing the user's question.\n"
                           '\n'
                           'Rating: [[9]]',
                           "The assistant's response to the user's second "
                           'question is correct. The assistant provided an '
                           'optimized implementation of the '
                           '`find_common_elements` function that uses list '
                           'comprehension to achieve a time complexity of '
                           'O(n), where n is the length of the shorter array. '
                           'This solution is efficient and meets the '
                           'requirement of not using any extra data structures '
                           'besides those used to store and return the common '
                           'elements.\n'
                           '\n'
                           "The assistant's explanation is clear and concise, "
                           'breaking down how the implementation works and '
                           'providing an example of how to use the function. '
                           'Overall, the response is helpful and addresses the '
                           "user's query effectively.\n"
                           '\n'
                           'Rating: [[10]]'],
             'multi_turn': 10,
             'single_turn': 9,
             'user_prompt': [[{'content': 'You are a helpful assistant.',
                               'role': 'system'},
                              {'content': '[Instruction]\n'
                                          'Please act as an impartial judge '
                                          'and evaluate the quality of the '
                                          'response provided by an AI '
                                          'assistant to the user question '
                                          'displayed below. Your evaluation '
                                          'should consider correctness and '
                                          'helpfulness. You will be given a '
                                          'reference answer and the '
                                          "assistant's answer. Begin your "
                                          'evaluation by comparing the '
                                          "assistant's answer with the "
                                          'reference answer. Identify and '
                                          'correct any mistakes. Be as '
                                          'objective as possible. After '
                                          'providing your explanation, you '
                                          'must rate the response on a scale '
                                          'of 1 to 10 by strictly following '
                                          'this format: "[[rating]]", for '
                                          'example: "Rating: [[5]]".\n'
                                          '\n'
                                          '[Question]\n'
                                          'Implement a program to find the '
                                          'common elements in two arrays '
                                          'without using any extra data '
                                          'structures, besides that used to '
                                          'store and return the common '
                                          'elements.\n'
                                          '\n'
                                          '[The Start of Reference Answer]\n'
                                          'O(n^2) or O(nlogn) is expected. The '
                                          'following is a O(n^2) solution. you '
                                          'can also sort them first and use '
                                          'two pointers.\n'
                                          '\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common_elements = []\n'
                                          '    for i in range(len(arr1)):\n'
                                          '        for j in range(len(arr2)):\n'
                                          '            if arr1[i] == arr2[j]:\n'
                                          '                # Check if the '
                                          'element is already in the '
                                          'common_elements list\n'
                                          '                if arr1[i] not in '
                                          'common_elements:\n'
                                          '                    '
                                          'common_elements.append(arr1[i])\n'
                                          '    return common_elements\n'
                                          '```\n'
                                          '[The End of Reference Answer]\n'
                                          '\n'
                                          "[The Start of Assistant's Answer]\n"
                                          ' To find the common elements in two '
                                          'arrays without using any extra data '
                                          'structures, we can use a simple '
                                          'loop that compares each element of '
                                          'one array with each element of the '
                                          'other array. If the elements are '
                                          'equal, they are added to a '
                                          'temporary array. Here is an example '
                                          'of how this can be implemented in '
                                          'Python:\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common = []\n'
                                          '    for x in arr1:\n'
                                          '        if x in arr2:\n'
                                          '            common.append(x)\n'
                                          '    return common\n'
                                          '```\n'
                                          'This function takes two arrays '
                                          '`arr1` and `arr2` as input, and '
                                          'returns a list of the common '
                                          'elements between the two arrays.\n'
                                          '\n'
                                          'Here is an explanation of how the '
                                          'function works:\n'
                                          '\n'
                                          '1. The function starts by '
                                          'initializing an empty list `common` '
                                          'to store the common elements.\n'
                                          '2. It then uses a for loop to '
                                          'iterate through the elements of '
                                          '`arr1`.\n'
                                          '3. For each element `x` in `arr1`, '
                                          'the function checks if `x` is also '
                                          'an element of `arr2` using the `in` '
                                          'operator.\n'
                                          '4. If `x` is found in `arr2`, it is '
                                          'added to the `common` list using '
                                          'the `append` method.\n'
                                          '5. The function returns the '
                                          '`common` list.\n'
                                          '\n'
                                          'Here is an example of how you can '
                                          'use this function:\n'
                                          '```\n'
                                          'arr1 = [1, 2, 3, 4, 5]\n'
                                          'arr2 = [2, 3, 4, 5, 6]\n'
                                          'common = find_common_elements(arr1, '
                                          'arr2)\n'
                                          'print(common) # [2, 3, 4]\n'
                                          '```\n'
                                          'This will output the common '
                                          'elements between `arr1` and `arr2`, '
                                          'which are `2`, `3`, and `4`.\n'
                                          "[The End of Assistant's Answer]",
                               'role': 'user'}],
                             [{'content': 'Please act as an impartial judge '
                                          'and evaluate the quality of the '
                                          'response provided by an AI '
                                          'assistant to the user question. '
                                          'Your evaluation should consider '
                                          'correctness and helpfulness. You '
                                          'will be given a reference answer '
                                          "and the assistant's answer. You "
                                          'evaluation should focus on the '
                                          "assistant's answer to the second "
                                          'question. Begin your evaluation by '
                                          "comparing the assistant's answer "
                                          'with the reference answer. Identify '
                                          'and correct any mistakes. Be as '
                                          'objective as possible. After '
                                          'providing your explanation, you '
                                          'must rate the response on a scale '
                                          'of 1 to 10 by strictly following '
                                          'this format: "[[rating]]", for '
                                          'example: "Rating: [[5]]".\n'
                                          '\n',
                               'role': 'system'},
                              {'content': '<|The Start of Reference Answer|>\n'
                                          '\n'
                                          '### User:\n'
                                          'Implement a program to find the '
                                          'common elements in two arrays '
                                          'without using any extra data '
                                          'structures, besides that used to '
                                          'store and return the common '
                                          'elements.\n'
                                          '\n'
                                          '### Reference answer:\n'
                                          'O(n^2) or O(nlogn) is expected. The '
                                          'following is a O(n^2) solution. you '
                                          'can also sort them first and use '
                                          'two pointers.\n'
                                          '\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common_elements = []\n'
                                          '    for i in range(len(arr1)):\n'
                                          '        for j in range(len(arr2)):\n'
                                          '            if arr1[i] == arr2[j]:\n'
                                          '                # Check if the '
                                          'element is already in the '
                                          'common_elements list\n'
                                          '                if arr1[i] not in '
                                          'common_elements:\n'
                                          '                    '
                                          'common_elements.append(arr1[i])\n'
                                          '    return common_elements\n'
                                          '```\n'
                                          '\n'
                                          '### User:\n'
                                          'Now the constraint of not using '
                                          'extra data structure is removed, '
                                          'implement one with the best time '
                                          'complexity.\n'
                                          '\n'
                                          '### Reference answer:\n'
                                          'Simply use hash table (set or dict) '
                                          'to achieve O(n) time complexity.\n'
                                          '\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    set1 = set(arr1)\n'
                                          '    set2 = set(arr2)\n'
                                          '    common_elements = '
                                          'set1.intersection(set2)\n'
                                          '    return list(common_elements)\n'
                                          '```\n'
                                          '\n'
                                          '<|The End of Reference Answer|>\n'
                                          '\n'
                                          '\n'
                                          "<|The Start of Assistant A's "
                                          'Conversation with User|>\n'
                                          '\n'
                                          '### User:\n'
                                          'Implement a program to find the '
                                          'common elements in two arrays '
                                          'without using any extra data '
                                          'structures, besides that used to '
                                          'store and return the common '
                                          'elements.\n'
                                          '\n'
                                          '### Assistant A:\n'
                                          ' To find the common elements in two '
                                          'arrays without using any extra data '
                                          'structures, we can use a simple '
                                          'loop that compares each element of '
                                          'one array with each element of the '
                                          'other array. If the elements are '
                                          'equal, they are added to a '
                                          'temporary array. Here is an example '
                                          'of how this can be implemented in '
                                          'Python:\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    common = []\n'
                                          '    for x in arr1:\n'
                                          '        if x in arr2:\n'
                                          '            common.append(x)\n'
                                          '    return common\n'
                                          '```\n'
                                          'This function takes two arrays '
                                          '`arr1` and `arr2` as input, and '
                                          'returns a list of the common '
                                          'elements between the two arrays.\n'
                                          '\n'
                                          'Here is an explanation of how the '
                                          'function works:\n'
                                          '\n'
                                          '1. The function starts by '
                                          'initializing an empty list `common` '
                                          'to store the common elements.\n'
                                          '2. It then uses a for loop to '
                                          'iterate through the elements of '
                                          '`arr1`.\n'
                                          '3. For each element `x` in `arr1`, '
                                          'the function checks if `x` is also '
                                          'an element of `arr2` using the `in` '
                                          'operator.\n'
                                          '4. If `x` is found in `arr2`, it is '
                                          'added to the `common` list using '
                                          'the `append` method.\n'
                                          '5. The function returns the '
                                          '`common` list.\n'
                                          '\n'
                                          'Here is an example of how you can '
                                          'use this function:\n'
                                          '```\n'
                                          'arr1 = [1, 2, 3, 4, 5]\n'
                                          'arr2 = [2, 3, 4, 5, 6]\n'
                                          'common = find_common_elements(arr1, '
                                          'arr2)\n'
                                          'print(common) # [2, 3, 4]\n'
                                          '```\n'
                                          'This will output the common '
                                          'elements between `arr1` and `arr2`, '
                                          'which are `2`, `3`, and `4`.\n'
                                          '\n'
                                          '### User:\n'
                                          'Now the constraint of not using '
                                          'extra data structure is removed, '
                                          'implement one with the best time '
                                          'complexity.\n'
                                          '\n'
                                          '### Assistant A:\n'
                                          " Sure! Here's an optimized "
                                          'implementation of the '
                                          '`find_common_elements` function '
                                          'that has a time complexity of O(n), '
                                          'where n is the length of the '
                                          'shorter array:\n'
                                          '```\n'
                                          'def find_common_elements(arr1, '
                                          'arr2):\n'
                                          '    return [x for x in arr1 if x in '
                                          'arr2]\n'
                                          '```\n'
                                          "Here's how this implementation "
                                          'works:\n'
                                          '\n'
                                          '1. The function uses a list '
                                          'comprehension to generate the '
                                          'common elements.\n'
                                          '2. The list comprehension iterates '
                                          'over the elements of `arr1`, and '
                                          'for each element `x`, it checks if '
                                          '`x` is also an element of `arr2` '
                                          'using the `in` operator.\n'
                                          '3. If `x` is found in `arr2`, it is '
                                          'added to the list of common '
                                          'elements using the `[]` syntax.\n'
                                          '4. The list comprehension returns '
                                          'the list of common elements.\n'
                                          '\n'
                                          'The time complexity of this '
                                          'implementation is O(n), where n is '
                                          'the length of the shorter array, '
                                          'because the list comprehension '
                                          'iterates over the elements of '
                                          '`arr1` once, regardless of the '
                                          'length of `arr2`.\n'
                                          '\n'
                                          "Here's an example of how you can "
                                          'use this function:\n'
                                          '```\n'
                                          'arr1 = [1, 2, 3, 4, 5]\n'
                                          'arr2 = [2, 3, 4, 5, 6]\n'
                                          'common = find_common_elements(arr1, '
                                          'arr2)\n'
                                          'print(common) # [2, 3, 4]\n'
                                          '```\n'
                                          'I hope this helps! Let me know if '
                                          'you have any questions.\n'
                                          '\n'
                                          "<|The End of Assistant A's "
                                          'Conversation with User|>',
                               'role': 'user'}]]},
}
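In the example above, each judgement ends with a line such as `Rating: [[9]]`, and the `single_turn` / `multi_turn` values match those ratings. As a rough sketch, assuming the score is read from the `[[rating]]` marker the judge prompt asks for (this thread does not show how lighteval actually parses it), the rating could be extracted from a logged judgement like this. `extract_rating` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the "[[rating]]" format requested in the judge prompt, e.g. "[[9]]".
RATING_PATTERN = re.compile(r"\[\[(\d+(?:\.\d+)?)\]\]")


def extract_rating(judgement: str) -> Optional[float]:
    """Return the first [[rating]] found in a judgement, or None if absent."""
    match = RATING_PATTERN.search(judgement)
    return float(match.group(1)) if match else None


judgements = [
    "Overall, the assistant's answer is correct.\n\nRating: [[9]]",
    "Overall, the response is helpful.\n\nRating: [[10]]",
]
ratings = [extract_rating(j) for j in judgements]
```

Returning `None` when no marker is found leaves it to the caller to decide how to treat a judge response that did not follow the requested format.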

clefourrier · 2024-04-16T07:25:47Z

Hm would there be a way to put them elsewhere? Like in an "additional info" field?

NathanHB · 2024-04-16T07:27:07Z

> Hm would there be a way to put them elsewhere? Like in an "additional info" field?

I don't really see why; that's where I wanted to put them in the first place. The judgement prompt is tightly related to the metric.

clefourrier · 2024-04-16T07:33:38Z

What else does the metric field contain in your above file?

NathanHB · 2024-04-16T08:01:51Z

Nothing, the only metric we are using for mt_bench is llm_as_judge.

clefourrier · 2024-04-16T08:04:25Z

Ok, that works for now! If we use this system more in the future for extra logging, we might need to move some values to another field of the details.


Nathan Habib and others added 2 commits April 15, 2024 14:27

homogenize logging for metrics

8c0fb23

commit

7cec7e6

clefourrier reviewed Apr 15, 2024

src/lighteval/evaluator.py (resolved)

clefourrier reviewed Apr 15, 2024

src/lighteval/logging/info_loggers.py (resolved)

NathanHB added 2 commits April 16, 2024 06:59

commit

b68222d

commit

f4e4c90

NathanHB requested a review from clefourrier April 16, 2024 07:10

clefourrier reviewed Apr 16, 2024

src/lighteval/logging/info_loggers.py (resolved)

clefourrier approved these changes Apr 16, 2024


Merge branch 'main' into nathan-add-logging-to-metrics

c529f03

Merge branch 'main' into nathan-add-logging-to-metrics

6f89ce1

NathanHB merged commit 6a48e4e into main Apr 16, 2024
