Evaluation issue with downstream evaluation codes #17

Closed
wbaek opened this issue Mar 2, 2022 · 1 comment
wbaek commented Mar 2, 2022

A problem with our downstream evaluation has been reported to us, as described in the following issue:
haven-jeon/KoGPT2-subtasks#1

We investigated the scope of the problem and confirmed that, among the results in the following evaluation table, only the NSMC finetuning accuracy is affected.

| Models | #params | Method | NSMC (Acc.) | KorSTS (Spearman) |
| --- | --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0[2] | 125M | finetuning | 93.3 | 78.4 |
| SKT-AI/KoGPT-2 Trinity[3] | 1.2B | finetuning | 93.2 | 83.4 |
| HyperCLOVA[1] | 1.3B | p-tuning | 91.7 | - |
| HyperCLOVA[1] | 39.0B | p-tuning | 93.0 | - |
| Ours | 6.0B | finetuning | 95.7 | 85.3 |

We plan to share the corrected evaluation results as soon as possible.

wbaek added the bug and evaluation labels on Mar 2, 2022
wbaek self-assigned this on Mar 2, 2022
wbaek commented Mar 2, 2022

| Models | #params | Method | NSMC (Acc.) |
| --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0[2] | 125M | finetuning | 89.0* |
| SKT-AI/KoGPT-2 Trinity[3] | 1.2B | finetuning | 91.1* |
| HyperCLOVA[1] | 1.3B | p-tuning | 91.7 |
| HyperCLOVA[1] | 39.0B | p-tuning | 93.0 |
| Ours | 6.0B | finetuning | 91.7 |
  • We ran these experiments using [4], with the same hyper-parameters as the modified code.
  • An asterisk (*) marks results that we re-ran with the modified code on a publicly available pre-trained GPT model.
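
For reference, the change in NSMC accuracy between the originally reported table and the corrected one can be computed directly from the numbers in this issue. The following is a minimal sketch, not part of the evaluation code: the dictionary and function names are illustrative, and it assumes the rows of the two tables correspond one to one.

```python
# NSMC accuracy (%) copied from the two tables in this issue.
# All names below are illustrative and not taken from any evaluation code.
original = {
    "SKT-AI/KoGPT-2 2.0 (125M)": 93.3,
    "SKT-AI/KoGPT-2 Trinity (1.2B)": 93.2,
    "HyperCLOVA (1.3B)": 91.7,
    "HyperCLOVA (39.0B)": 93.0,
    "Ours (6.0B)": 95.7,
}
corrected = {
    "SKT-AI/KoGPT-2 2.0 (125M)": 89.0,
    "SKT-AI/KoGPT-2 Trinity (1.2B)": 91.1,
    "HyperCLOVA (1.3B)": 91.7,
    "HyperCLOVA (39.0B)": 93.0,
    "Ours (6.0B)": 91.7,
}


def accuracy_deltas(before, after):
    """Return corrected minus original accuracy, in percentage points."""
    return {name: round(after[name] - before[name], 1) for name in before}


deltas = accuracy_deltas(original, corrected)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

The HyperCLOVA rows are unchanged between the two tables, so their difference is zero; only the finetuning rows move.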

[1] HyperCLOVA: Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).
[2] SKT-AI/KoGPT-2 2.0: "SKT-AI/KoGPT2: Korean GPT-2 pretrained cased (KoGPT2)." https://github.com/SKT-AI/KoGPT2 (2021).
[3] SKT-AI/KoGPT-2 Trinity: "Ko-GPT-Trinity 1.2B." https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5 (2021).
[4] KoGPT2-subtasks: "KoGPT2 v2.0 한국어 평가 모듈" [KoGPT2 v2.0 Korean evaluation module]. https://github.com/haven-jeon/KoGPT2-subtasks (2021).

wbaek closed this as completed on Mar 2, 2022