feat: add input chunking to summarize task #157
Conversation
👍
The admin setting input could be moved into the "Text generation" section. Wdyt?
Maybe the chunk size could also be considered a usage limit, but since we only use it in the summarize task, I think it would make sense to move it there too.
The code looks really good. The algorithm deviates a little from what llm2 does, though: in llm2, the summaries of the chunks are concatenated and fed through the same algorithm again (i.e. chunked and summarized) until only one summary is left.
Oops, clicked the wrong button.
Now that you mention it, I do see a similar loop in llm2 now; fixed.
I'm about to change the algorithm in llm2 slightly, so we should change it here one last time as well: when the input is shorter than the chunk size, it should still go through the LLM summarizer once.
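The loop discussed above (concatenate the chunk summaries and feed them through the same algorithm again until a single summary remains, while still passing short inputs through the summarizer once) can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: `summarize` stands in for whatever LLM call the app makes, and simple character-based chunking is an assumption (the PR splits on natural-language boundaries).

```python
def chunked_summarize(text: str, summarize, chunk_size: int) -> str:
    """Iteratively summarize `text` until a single summary remains.

    Even an input shorter than `chunk_size` is passed through the
    summarizer exactly once, per the change described above.
    """
    while True:
        # Split the input into chunks of at most `chunk_size` characters.
        # (The `or [""]` guard ensures even empty input is summarized once.)
        chunks = [text[i:i + chunk_size]
                  for i in range(0, len(text), chunk_size)] or [""]
        # Summarize each chunk and concatenate the partial summaries.
        text = "\n".join(summarize(chunk) for chunk in chunks)
        # One chunk means we just produced the final summary.
        if len(chunks) == 1:
            return text
```

Note the loop only terminates if the summaries are eventually shorter than the input, which holds for any reasonable summarizer.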
👍 A few adjustments and good to go!
Hello there,

We hope that the review process is going smoothly and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process. Your feedback is valuable to us as we continuously strive to improve our community developer experience.

Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6

Thank you for contributing to Nextcloud and we hope to hear from you soon!

(If you believe you should not receive this message, you can add yourself to the blocklist.)
Force-pushed from d9f8184 to 344c92c
Looks good and works in my tests. It seems we lost multilingual support, though. When summarizing a German text that is longer than the chunk size, the resulting summary was in English in my test.
In my testing, summarizing English text produced output in either Spanish or Italian, hence the change in prompt. If you want me to revert that change for now, we can do that.
Oh, that's not intended, of course :/ Ideally, we should find a prompt that works for all languages and produces the summary in the same language as the input.
Force-pushed from 344c92c to 2a45a02
Changed the prompt again after some basic prompt engineering and experimenting. Hopefully this will produce more reliable results.
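For reference, one way to steer the model toward answering in the input's language is to instruct it explicitly in the prompt. The wording below is hypothetical, shown only to illustrate the approach; the prompt actually merged in the PR may differ.

```python
# Hypothetical prompt template; not the exact wording used in the PR.
SUMMARIZE_PROMPT = (
    "Summarize the text below. Detect the language of the text and "
    "write the summary in that same language.\n\n"
    "Text:\n{text}\n\nSummary:"
)

def build_prompt(chunk: str) -> str:
    # Fill the template with one chunk of the input text.
    return SUMMARIZE_PROMPT.format(text=chunk)
```

In practice, such instructions still need testing across models and languages, as the discussion above shows.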
Force-pushed from 31940fb to 79189c6
Thanks for these adjustments, works well for me now!
Force-pushed from cc30634 to fffe764
Signed-off-by: Edward Ly <[email protected]>
…ings service Signed-off-by: Edward Ly <[email protected]>
…nguage boundaries Signed-off-by: Edward Ly <[email protected]>
…PI, update prompts Addresses an issue where the input and output text are not always in the same language Signed-off-by: Edward Ly <[email protected]>
Signed-off-by: Edward Ly <[email protected]>
Force-pushed from fffe764 to 5c78917
Rebased and adjusted after #167.
Woop woop 🎉
Closes #150. Lightly tested with gpt-4o, gpt-4o-mini, and gpt-3.5-turbo.