
The number of LLM evaluated examples #11

Open

kkk-an opened this issue Sep 23, 2024 · 1 comment

kkk-an commented Sep 23, 2024

I just ran the code below and found that the number of examples that need to be evaluated by the LLM does not match the numbers in your paper.
```python
import json

rule_based_source = ["E2E", "WIKIEVENTS", "CONLL2003",
                     "text_editing", "cnn_dailymail", "xsum", "samsum", "gigaword", "arxiv",
                     "BBH_logical", "BBH_time", "self_made_space", "gsm_8k"]

for type in ["content", "situation", "format", "example", "mixed"]:
    data = json.load(open(f"./data/{type}_constraints.json"))
    rule, llm = 0, 0
    for d in data:
        level = d["level"]
        if level == 0:
            continue
        source = d["source"]
        if source in rule_based_source:
            rule += 1
        else:
            llm += 1
    print(f"type: {type}, rule: {rule}, llm: {llm}")
```

[screenshot of the script's output]

Is there any misunderstanding in your paper or code?

Thanks for your reply.


kkk-an commented Sep 23, 2024

I have checked my gpt4_discriminative_eval_input files and found that the numbers of examples that need to be evaluated by LLMs are:
content: 65 | mixed: 45 | format: 140 | situation: 70
but your paper reports only:
content: 50 | mixed: 10 | format: 120 | situation: 55
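For reference, the per-type gap between the two sets of counts quoted above can be tabulated with a short script (the numbers are taken directly from this comment, nothing else is assumed):

```python
# Counts of LLM-evaluated examples: locally observed vs. reported in the paper.
observed = {"content": 65, "mixed": 45, "format": 140, "situation": 70}
reported = {"content": 50, "mixed": 10, "format": 120, "situation": 55}

for t in observed:
    diff = observed[t] - reported[t]
    print(f"{t}: observed {observed[t]}, reported {reported[t]}, diff {diff}")
```

The mixed split shows the largest discrepancy (35 examples), which may help narrow down where the counting differs.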

I am very confused and kindly request your help, thank you so much.
