From a06fe9f15c237f5b725d25e0017d0bde6b5261ed Mon Sep 17 00:00:00 2001
From: Jonathan Tow <41410219+jon-tow@users.noreply.github.com>
Date: Tue, 24 Jan 2023 23:30:26 -0500
Subject: [PATCH] Update stale comment from results table (#222)

* Remove stale comment from results table

* Add details
---
 examples/summarize_rlhf/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/summarize_rlhf/README.md b/examples/summarize_rlhf/README.md
index 7f0dfb8f1..49369e0f2 100644
--- a/examples/summarize_rlhf/README.md
+++ b/examples/summarize_rlhf/README.md
@@ -40,7 +40,7 @@ For an in-depth description of the example, please refer to our [blog post](http
 
 ### Results
 
-On 1,000 samples from CNN/DailyMail test dataset:
+The following tables display ROUGE and reward scores on the test set of the TL;DR dataset between SFT and PPO models.
 
 1. SFT vs PPO