Can we stream responses? #149

mneedham · 2024-07-23T08:25:15Z

Describe the bug

Not sure if this is a bug or if it's not supposed to work this way, but I can't figure out how to stream the response from the LLM.

To Reproduce

from lightrag.core.generator import Generator
from lightrag.components.model_client import OllamaClient

model_client = OllamaClient()
model_kwargs = {"model": "phi3", "stream": True}
generator = Generator(model_client=model_client, model_kwargs=model_kwargs)
generator({"input_str": "What is the capital of France?"})

Returns:

GeneratorOutput(
    data=None,
    error='Error parsing the completion: <generator object Client._stream at 0x11e388480>',
    usage=None,
    raw_response='<generator object Client._stream at 0x11e388480>',
    metadata=None
)

Expected behavior

I want to be able to iterate over the response and render it as it's produced.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

Mac OS Sonoma 14.5

The text was updated successfully, but these errors were encountered:

liyin2015 · 2024-07-23T17:20:24Z

@mneedham I need to add the stream on the model client, let me try to add it

liyin2015 · 2024-07-23T23:56:13Z

@mneedham its updated, if you update pip to 0.1.0.b5 you should be able to stream the output

output = Generator()
for chunk in output.data:
    print(chunk)

mneedham · 2024-07-24T17:50:02Z

Awesome - it works :D Thanks!

mneedham · 2024-07-24T17:51:13Z

I am testing it out with my usual ridiculous prompt!

model_client = OllamaClient(host="http://localhost:11434")
model_kwargs = {"model": "llama3.1", "stream": True}
generator = Generator(model_client=model_client, model_kwargs=model_kwargs)
output = generator({"input_str": "What would happen if a lion and an elephant met three dogs and four hyenas?"})
for chunk in output.data:
  print(chunk, end='', flush=True)

What an interesting scenario!

If a lion and an elephant met three dogs and four hyenas, I think it's likely that the outcome would be quite dramatic.

Firstly, the lion would probably take charge of the situation, being the apex predator in the savannah. The
elephants, being gentle giants, might try to stay calm and avoid any confrontation.

However, the presence of the three dogs could potentially cause a commotion. They might bark excitedly at the sight of the big cats, which could distract the lion and give the elephant an opportunity to intervene.

The four hyenas, on the other hand, would likely be more interested in scavenging for food than engaging in
a full-blown battle. They might try to sneak up behind the dogs and steal their scraps, or even attempt to
steal some of the elephant's food.

But if all else fails, I imagine the lion would assert its dominance by chasing after one of the smaller animals (perhaps the dogs?) to show who's boss. The elephant, being a gentle giant, might try to calm everyone down by using its size and presence to intimidate the hyenas into backing off.

Of course, this is all just hypothetical – in reality, each animal would behave according to their natural instincts and survival strategies! What do you think?

mneedham · 2024-07-24T18:19:50Z

It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

liyin2015 · 2024-07-24T18:21:40Z

It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

Its a bug I introduced in the 0.1.0.b5. Please upgrade to .6 and it should work fine! (It werent supposed to change the normal non-stream behavior) 😆

mneedham · 2024-07-24T18:50:15Z

Do we need some sort of await in parse_stream_response to handle the acall function?

output = await generator.acall({"input_str": "What would happen if a lion and an elephant met three dogs and four hyenas?"})
Error parsing the completion <async_generator object AsyncClient._stream.<locals>.inner at 0x10c9cac20>: argument of type 'async_generator' is not iterable

def parse_stream_response(completion: GeneratorType) -> Any:
    """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
    for chunk in completion:
        log.debug(f"Raw chunk: {chunk}")
        yield chunk["response"] if "response" in chunk else None

mneedham · 2024-07-24T18:52:25Z

It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

Its a bug I introduced in the 0.1.0.b5. Please upgrade to .6 and it should work fine! (It werent supposed to change the normal non-stream behavior) 😆

I am using 0.1.0.b6!

liyin2015 · 2024-07-24T19:29:26Z

It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

Its a bug I introduced in the 0.1.0.b5. Please upgrade to .6 and it should work fine! (It werent supposed to change the normal non-stream behavior) 😆

I am using 0.1.0.b6!

Good, then there is no bug.

mneedham · 2024-07-24T20:01:51Z

@liyin2015 does this function also need to check for AsyncGenerator to have it work with the acall function?

    def parse_chat_completion(
        self, completion: Union[GenerateResponse, GeneratorType]
    ) -> Any:
        """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
        log.debug(f"completion: {completion}, {isinstance(completion, GeneratorType)}")
        if isinstance(completion, GeneratorType):  # streaming
            return parse_stream_response(completion)
        else:
            return parse_generate_response(completion)

At the moment I get this error when using acall with stream:True:

Error parsing the completion <async_generator object AsyncClient._stream.<locals>.inner at 0x124d731c0>: argument of type 'async_generator' is not iterable

mneedham · 2024-07-25T06:18:47Z

@liyin2015 I tried a fix here, but I only did it for Ollama Client so far

#158

liyin2015 · 2025-02-10T19:55:17Z

@mneedham is the issue addressed?

liyin2015 added a commit that referenced this issue Jul 23, 2024

implement issue: #149

7c33beb

liyin2015 added this to the V0.1.0 milestone Jul 23, 2024

liyin2015 added the [adalflow] suggest core feature New core features in the base classes and optimization label Jul 23, 2024

liyin2015 assigned liyin2015 and Alleria1809 Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can we stream responses? #149

Can we stream responses? #149

mneedham commented Jul 23, 2024

liyin2015 commented Jul 23, 2024

liyin2015 commented Jul 23, 2024

mneedham commented Jul 24, 2024

mneedham commented Jul 24, 2024 •

edited

Loading

mneedham commented Jul 24, 2024

liyin2015 commented Jul 24, 2024 •

edited

Loading

mneedham commented Jul 24, 2024 •

edited

Loading

mneedham commented Jul 24, 2024

liyin2015 commented Jul 24, 2024

mneedham commented Jul 24, 2024

mneedham commented Jul 25, 2024

liyin2015 commented Feb 10, 2025

Can we stream responses? #149

Can we stream responses? #149

Comments

mneedham commented Jul 23, 2024

liyin2015 commented Jul 23, 2024

liyin2015 commented Jul 23, 2024

mneedham commented Jul 24, 2024

mneedham commented Jul 24, 2024 • edited Loading

mneedham commented Jul 24, 2024

liyin2015 commented Jul 24, 2024 • edited Loading

mneedham commented Jul 24, 2024 • edited Loading

mneedham commented Jul 24, 2024

liyin2015 commented Jul 24, 2024

mneedham commented Jul 24, 2024

mneedham commented Jul 25, 2024

liyin2015 commented Feb 10, 2025

mneedham commented Jul 24, 2024 •

edited

Loading

liyin2015 commented Jul 24, 2024 •

edited

Loading

mneedham commented Jul 24, 2024 •

edited

Loading