Can we stream responses? #149

Open
mneedham opened this issue Jul 23, 2024 · 11 comments

@mneedham

Describe the bug

Not sure if this is a bug or if it's not supposed to work this way, but I can't figure out how to stream the response from the LLM.

To Reproduce

from lightrag.core.generator import Generator
from lightrag.components.model_client import OllamaClient

model_client = OllamaClient()
model_kwargs = {"model": "phi3", "stream": True}
generator = Generator(model_client=model_client, model_kwargs=model_kwargs)
generator({"input_str": "What is the capital of France?"})

Returns:

GeneratorOutput(
    data=None,
    error='Error parsing the completion: <generator object Client._stream at 0x11e388480>',
    usage=None,
    raw_response='<generator object Client._stream at 0x11e388480>',
    metadata=None
)

Expected behavior

I want to be able to iterate over the response and render it as it's produced.
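For illustration, the consumption pattern being asked for could look roughly like the sketch below. It does not use lightrag: `fake_stream` and `render_stream` are hypothetical names, and the `{"response": ...}` chunk shape is an assumption made for the example. The point is that a generator has to be iterated; converting it to a string only produces the `<generator object ...>` text seen in `raw_response` above.

```python
from typing import Dict, Iterator, List


def fake_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a streaming response (not a lightrag API)."""
    for piece in ["The capital ", "of France ", "is Paris."]:
        yield {"response": piece}  # assumed chunk shape


def render_stream(chunks: Iterator[Dict[str, str]]) -> str:
    """Print each piece as it arrives and return the assembled text."""
    parts: List[str] = []
    for chunk in chunks:
        text = chunk.get("response", "")
        print(text, end="", flush=True)  # show partial output immediately
        parts.append(text)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_text = render_stream(fake_stream())
```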

Desktop

macOS Sonoma 14.5

@liyin2015
Member

@mneedham I need to add streaming support to the model client. Let me try to add it.

liyin2015 added a commit that referenced this issue Jul 23, 2024
liyin2015 added this to the V0.1.0 milestone Jul 23, 2024
liyin2015 added the "[adalflow] suggest core feature" label (New core features in the base classes and optimization) Jul 23, 2024
@liyin2015
Member

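@mneedham It's updated. If you upgrade to 0.1.0.b5 with pip, you should be able to stream the output:

# generator is the Generator from your example, created with "stream": True
output = generator({"input_str": "What is the capital of France?"})
for chunk in output.data:
    print(chunk)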

@mneedham
Author

Awesome - it works :D Thanks!

@mneedham
Author

mneedham commented Jul 24, 2024

I am testing it out with my usual ridiculous prompt!

model_client = OllamaClient(host="http://localhost:11434")
model_kwargs = {"model": "llama3.1", "stream": True}
generator = Generator(model_client=model_client, model_kwargs=model_kwargs)
output = generator({"input_str": "What would happen if a lion and an elephant met three dogs and four hyenas?"})
for chunk in output.data:
  print(chunk, end='', flush=True)

What an interesting scenario!

If a lion and an elephant met three dogs and four hyenas, I think it's likely that the outcome would be quite dramatic.

Firstly, the lion would probably take charge of the situation, being the apex predator in the savannah. The
elephants, being gentle giants, might try to stay calm and avoid any confrontation.

However, the presence of the three dogs could potentially cause a commotion. They might bark excitedly at the sight of the big cats, which could distract the lion and give the elephant an opportunity to intervene.

The four hyenas, on the other hand, would likely be more interested in scavenging for food than engaging in
a full-blown battle. They might try to sneak up behind the dogs and steal their scraps, or even attempt to
steal some of the elephant's food.

But if all else fails, I imagine the lion would assert its dominance by chasing after one of the smaller animals (perhaps the dogs?) to show who's boss. The elephant, being a gentle giant, might try to calm everyone down by using its size and presence to intimidate the hyenas into backing off.

Of course, this is all just hypothetical – in reality, each animal would behave according to their natural instincts and survival strategies! What do you think?

@mneedham
Author

It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)
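One caveat with leaning on that: iterating a string yields one character at a time rather than model-sized chunks. A minimal sketch of normalizing both cases, assuming `.data` is either a `str` or an iterator of strings; `iter_chunks` is a hypothetical helper, not part of lightrag:

```python
from typing import Iterable, Iterator, Union


def iter_chunks(data: Union[str, Iterable[str]]) -> Iterator[str]:
    """Yield text pieces whether data is a full string or a stream of strings."""
    if isinstance(data, str):
        # Non-streaming case: treat the whole string as one chunk
        # instead of iterating over its characters.
        yield data
    else:
        # Streaming case: pass the pieces through unchanged.
        yield from data


for piece in iter_chunks("Paris"):
    print(piece, end="", flush=True)
print()

for piece in iter_chunks(iter(["Par", "is"])):
    print(piece, end="", flush=True)
print()
```

With this wrapper the same rendering loop handles both settings of `stream` without special cases at the call site.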

@liyin2015
Member

liyin2015 commented Jul 24, 2024

> It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

It's a bug I introduced in 0.1.0.b5. Please upgrade to 0.1.0.b6 and it should work fine! (It wasn't supposed to change the normal non-stream behavior) 😆

@mneedham
Author

mneedham commented Jul 24, 2024

Do we need some sort of await in parse_stream_response to handle the acall function?

output = await generator.acall({"input_str": "What would happen if a lion and an elephant met three dogs and four hyenas?"})

Error parsing the completion <async_generator object AsyncClient._stream.<locals>.inner at 0x10c9cac20>: argument of type 'async_generator' is not iterable

def parse_stream_response(completion: GeneratorType) -> Any:
    """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
    for chunk in completion:
        log.debug(f"Raw chunk: {chunk}")
        yield chunk["response"] if "response" in chunk else None
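An async counterpart would need `async for` rather than a plain `for`, since an async generator cannot be consumed by a regular loop. A minimal sketch, assuming the async client yields the same dict-shaped chunks; `parse_stream_response_async`, `fake_async_stream`, and `collect` are hypothetical names, not lightrag functions:

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Optional


async def parse_stream_response_async(
    completion: AsyncIterator[Dict[str, Any]],
) -> AsyncIterator[Optional[str]]:
    """Async variant of the parser: yield each chunk's text as it arrives."""
    async for chunk in completion:
        yield chunk["response"] if "response" in chunk else None


async def fake_async_stream() -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in for an async streaming response."""
    for piece in ["Hel", "lo"]:
        yield {"response": piece}


async def collect() -> List[Optional[str]]:
    """Drain the parsed stream into a list (for demonstration only)."""
    return [text async for text in parse_stream_response_async(fake_async_stream())]


print(asyncio.run(collect()))
```

The caller would then also iterate with `async for chunk in output.data`, assuming `.data` carries the async generator through unchanged.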

@mneedham
Author

> > It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)
>
> It's a bug I introduced in 0.1.0.b5. Please upgrade to 0.1.0.b6 and it should work fine! (It wasn't supposed to change the normal non-stream behavior) 😆

I am using 0.1.0.b6!

@liyin2015
Member

> > > It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)
> >
> > It's a bug I introduced in 0.1.0.b5. Please upgrade to 0.1.0.b6 and it should work fine! (It wasn't supposed to change the normal non-stream behavior) 😆
>
> I am using 0.1.0.b6!

Good, then there is no bug.

@mneedham
Author

@liyin2015 does this function also need to check for AsyncGenerator to have it work with the acall function?

    def parse_chat_completion(
        self, completion: Union[GenerateResponse, GeneratorType]
    ) -> Any:
        """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
        log.debug(f"completion: {completion}, {isinstance(completion, GeneratorType)}")
        if isinstance(completion, GeneratorType):  # streaming
            return parse_stream_response(completion)
        else:
            return parse_generate_response(completion)

At the moment I get this error when using acall with "stream": True:

Error parsing the completion <async_generator object AsyncClient._stream.<locals>.inner at 0x124d731c0>: argument of type 'async_generator' is not iterable
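One way such a check could look, as a sketch only: the standard library's `inspect.isasyncgen` detects async generator objects, which `isinstance(..., GeneratorType)` does not match. `classify_completion`, `membership_error`, and the two stream functions are hypothetical names for illustration. The sketch also shows that a membership test (`in`) on an async generator raises a `TypeError`, which would be consistent with the error above if the non-streaming parser performs such a test; that part is an assumption about the library's internals.

```python
import inspect
from types import GeneratorType
from typing import Any


def classify_completion(completion: Any) -> str:
    """Decide which parser a completion would need (illustrative only)."""
    if isinstance(completion, GeneratorType):
        return "sync-stream"
    if inspect.isasyncgen(completion):
        return "async-stream"
    return "complete"


def sync_stream():
    yield {"response": "hi"}


async def async_stream():
    yield {"response": "hi"}


def membership_error(completion: Any) -> str:
    """Return the TypeError message raised by an `in` test, or '' if none."""
    try:
        "response" in completion
    except TypeError as exc:
        return str(exc)
    return ""


print(classify_completion(sync_stream()))
print(classify_completion(async_stream()))
print(classify_completion({"response": "hi"}))
print(membership_error(async_stream()))
```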

@mneedham
Author

@liyin2015 I tried a fix here, but so far I have only done it for OllamaClient:

#158
