Evaluation issue with downstream evaluation codes #17

wbaek · 2022-03-02T06:00:00Z

We received a report of a problem in our downstream evaluation, related to the issue linked below.
haven-jeon/KoGPT2-subtasks#1

We investigated the scope of the problem and confirmed that, in the evaluation table below, only the NSMC finetuning accuracy is affected.

| Models | #params | method | NSMC (Acc.) | KorSTS (spearman) |
| --- | --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0[2] | 125M | `finetuning` | 93.3 | 78.4 |
| SKT-AI/KoGPT-2 Trinity[3] | 1.2B | `finetuning` | 93.2 | 83.4 |
| HyperCLOVA[1] | 1.3B | `p-tuning` | 91.7 | - |
| HyperCLOVA[1] | 39.0B | `p-tuning` | 93.0 | - |
| Ours | 6.0B | `finetuning` | 95.7 | 85.3 |

We plan to share the corrected evaluation results as soon as possible.

wbaek · 2022-03-02T08:38:08Z

| Models | #params | method | NSMC (Acc.) |
| --- | --- | --- | --- |
| SKT-AI/KoGPT-2 2.0[2] | 125M | `finetuning` | 89.0* |
| SKT-AI/KoGPT-2 Trinity[3] | 1.2B | `finetuning` | 91.1* |
| HyperCLOVA[1] | 1.3B | `p-tuning` | 91.7 |
| HyperCLOVA[1] | 39.0B | `p-tuning` | 93.0 |
| Ours | 6.0B | `finetuning` | 91.7 |

We conducted these experiments using [4], with the same hyperparameters as the modified code.
\* indicates a result obtained by re-running the modified code on a publicly available pre-trained GPT model.
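
For reference, the size of the correction can be read off by comparing the originally reported NSMC accuracies with the corrected ones. The snippet below is only a minimal sketch of that arithmetic: the numbers are copied from the two tables in this issue, and the variable and function names are illustrative, not part of any evaluation code.

```python
# NSMC accuracy (%) copied from the two tables in this issue.
# Names below are illustrative only; they do not come from the evaluation code.
originally_reported = {
    "SKT-AI/KoGPT-2 2.0": 93.3,
    "SKT-AI/KoGPT-2 Trinity": 93.2,
    "Ours": 95.7,
}
corrected = {
    "SKT-AI/KoGPT-2 2.0": 89.0,
    "SKT-AI/KoGPT-2 Trinity": 91.1,
    "Ours": 91.7,
}


def accuracy_drop(before, after):
    """Return the per-model drop in accuracy points, rounded to one decimal."""
    return {name: round(before[name] - after[name], 1) for name in before}


drops = accuracy_drop(originally_reported, corrected)
for name, drop in drops.items():
    print(f"{name}: -{drop} points")
```

The HyperCLOVA rows are left out because their `p-tuning` figures are the same in both tables.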

[1] HyperCLOVA: Kim, Boseop, et al. "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers." arXiv preprint arXiv:2109.04650 (2021).
[2] SKT-AI/KoGPT-2 2.0: "SKT-AI/KoGPT2: Korean GPT-2 pretrained cased (KoGPT2)." https://github.com/SKT-AI/KoGPT2 (2021).
[3] SKT-AI/KoGPT-2 Trinity: "Ko-GPT-Trinity 1.2B." https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5 (2021).
[4] KoGPT2-subtasks: "KoGPT2 v2.0 한국어 평가 모듈" (KoGPT2 v2.0 Korean evaluation module). https://github.com/haven-jeon/KoGPT2-subtasks (2021).

wbaek added the bug and evaluation labels Mar 2, 2022

wbaek self-assigned this Mar 2, 2022

wbaek closed this as completed Mar 2, 2022
