Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix incorrect impl of flow (streaming gen.) for local LLMs #381

Merged
merged 7 commits into from
Sep 6, 2023

Conversation

Intex32
Copy link
Member

@Intex32 Intex32 commented Sep 4, 2023

When playing with local models, I noticed that the generation (returning a flow) is not printed in real time to the console. After investigation, I figured out this was because of the coroutineScope in the Gpt4All file that waits until all child jobs are done.

I am not very fluent with the Flow API, still I believe that I improved the code rather than worsen. Feedback is appreciated. Also consider my comments.

val message = assistant(buffer.toString())
.map { it.choices.mapNotNull { it.delta?.content }.reduce(String::plus) }
.onEach { emit(it) }
.fold("", String::plus)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to achieve the same as the onEach and fold? Currently, each item is emitted to another outer flow.
This is required because we have to fold the flow and get the full generated string in order to add it to the memory. On the other hand, we want to pass each item forward to the caller of the function so that they can do whatever the want to do with it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It kind-of depends on the desired behavior of the resulting Flow.
onEach will back-pressure the fold, so the resulting Flow can still back-pressure the fold which is desired.

onEach is the same of map { f(it); it }.

My concern is that fold will take everything in memory, but I guess that's inherit to MemoryManagement.addMemoriesAfterStream. Just going by the name it sounds like it puts everything in memory anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the goal is indeed to get the full string of generated text and store in the conversation history. At the same time I want to forward every individual item to the outer flow.
I came to the conclusion there is no better way.

If you have time, can you explain what you mean by the terminology of back-pressure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Intex32 I explain back-pressure a little bit here. https://youtu.be/n724AdSAkOI
Let me know if this clears it up for you, if not we can discuss it further ☺️

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nomisRev I have more questions than before 😂. I especially liked the part where you talked about back pressure.
The pen has some classy dance moves tbh.

@Intex32 Intex32 marked this pull request as ready for review September 4, 2023 09:40
@Intex32 Intex32 changed the title fix incorrect impl of flow (streaming gen.) for local embeddings fix incorrect impl of flow (streaming gen.) for local LLMs Sep 4, 2023
Montagon
Montagon previously approved these changes Sep 4, 2023
@Intex32
Copy link
Member Author

Intex32 commented Sep 4, 2023

@nomisRev Could you perhaps have a look? in particular in respect to my comments

Copy link
Contributor

@nomisRev nomisRev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @Intex32! Great work 🙌

@Intex32 Intex32 merged commit d171ed0 into main Sep 6, 2023
@Intex32 Intex32 deleted the local-llm-improvements branch September 6, 2023 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants