
Fixes token logging in callbacks when streaming=True is used. #241

Merged: 1 commit into langchain-ai:main, Oct 17, 2024

Conversation

@3coins (Collaborator) commented Oct 15, 2024

Fixes #240
Fixes #217

Code to verify

from langchain_aws import ChatBedrock
from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

streaming = True

# Custom handler that prints every token passed to on_llm_new_token.
class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal} in a few words."])

model = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    streaming=streaming,
    callbacks=[MyCustomHandler()],
)

chain = prompt | model

response = chain.invoke({"animal": "bears"})

Output

My custom handler, token: 
My custom handler, token: Bear
My custom handler, token: -
My custom handler, token: ly funny
My custom handler, token: .
My custom handler, token: 
My custom handler, token: 

@3coins marked this pull request as ready for review on October 15, 2024
@3coins (Collaborator, Author) commented Oct 15, 2024

@wuodar @supreetkt
Can you folks verify? Thanks.

@supreetkt commented:

Btw, this check also seems to be missing from the SagemakerEndpoint class, so there might be empty tokens there as well.

@3coins (Collaborator, Author) commented Oct 16, 2024

@supreetkt
Thanks for those suggestions. Checking some other partner implementations, there doesn't seem to be any filtering of messages based on content in their code. There are valid cases in which other values in the callbacks might be useful, for example the response_metadata.

I will look separately into the responses to see if we are yielding extra chunks that shouldn't be exposed. As far as this PR is concerned, I believe this fixes the raised issue. Let me know if you have any other suggestions.

@supreetkt commented Oct 16, 2024

> @supreetkt Thanks for those suggestions. Checking some other partner implementations, there doesn't seem to be any filtering of messages based on content in their code. There are valid cases in which other values in the callbacks might be useful, for example the response_metadata.
>
> I will look separately into the responses to see if we are yielding extra chunks that shouldn't be exposed. As far as this PR is concerned, I believe this fixes the raised issue. Let me know if you have any other suggestions.

@ihmaws raised a similar concern this morning. When I took a look at the generated chunks, they only contained a stop reason; for our use case, an extra callback invocation for the websocket means additional calls, which add to costs. I checked one partner implementation, and it seemed to check only that the output is a string. Here is a sample empty generated chunk:

message=AIMessageChunk(content='', additional_kwargs={}, response_metadata={'stop_reason': 'end_turn', 'stop_sequence': None})

All this information is already packaged as a part of the final response:

content='Grizzly business.' additional_kwargs={} response_metadata={'stop_reason': 'end_turn', 'stop_sequence': None} id='run-xxx' usage_metadata={'input_tokens': 18, 'output_tokens': 9, 'total_tokens': 27}

In the end, it's your call based on your experience with more callback options. I can just handle the same logic on my end within our custom implementation of on_llm_new_token too :)
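
For reference, a minimal sketch of how that filtering could look inside a custom handler (the class name and print format are illustrative, not part of this PR):

from langchain_core.callbacks import BaseCallbackHandler

class NonEmptyTokenHandler(BaseCallbackHandler):
    # Illustrative handler that ignores empty streamed tokens, e.g. the
    # final metadata-only chunk carrying just stop_reason/stop_sequence.
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        if not token:
            return
        print(f"token: {token}")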

I had one more concern: I tried using ChatBedrock with beta_use_converse_api set to True, and it doesn't stream responses. I think it's missing the on_llm_new_token call as well.

@3coins (Collaborator, Author) commented Oct 16, 2024

@supreetkt

> I checked one partner implementation, and it seemed to check only that the output is a string. Here is a sample empty generated chunk:

Can you provide a link to that implementation?

> I can just handle the same logic on my end within our custom implementation of on_llm_new_token too :)

Yes, I think this can be handled easily by the callbacks based on their use case.

> I had one more concern: I tried using ChatBedrock with beta_use_converse_api set to True, and it doesn't stream responses. I think it's missing the on_llm_new_token call as well.

I will tackle this in a separate PR, but in general the streaming attribute is legacy and not required at all; it exists only for backwards compatibility, since streaming is natively supported by LCEL and Runnables, and the callbacks approach should be discouraged IMO.
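
As a point of comparison, a minimal sketch of native LCEL streaming that needs neither streaming=True nor a callback handler (reusing the model id and prompt from the verification snippet above):

from langchain_aws import ChatBedrock
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal} in a few words."])
model = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")
chain = prompt | model

# Runnable.stream yields AIMessageChunk objects as they are produced.
for chunk in chain.stream({"animal": "bears"}):
    print(chunk.content, end="", flush=True)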

@supreetkt commented Oct 16, 2024

> @supreetkt
>
> > I checked one partner implementation, and it seemed to check only that the output is a string. Here is a sample empty generated chunk:
>
> Can you provide a link to that implementation?
>
> > I can just handle the same logic on my end within our custom implementation of on_llm_new_token too :)
>
> Yes, I think this can be handled easily by the callbacks based on their use case.
>
> > I had one more concern: I tried using ChatBedrock with beta_use_converse_api set to True, and it doesn't stream responses. I think it's missing the on_llm_new_token call as well.
>
> I will tackle this in a separate PR, but in general the streaming attribute is legacy and not required at all; it exists only for backwards compatibility, since streaming is natively supported by LCEL and Runnables, and the callbacks approach should be discouraged IMO.

Here is the link. This logic allows for empty strings.

Also, if you're tackling the converse API issue in a separate PR, this LGTM.

@3coins merged commit 902252b into langchain-ai:main on Oct 17, 2024
12 checks passed
@3coins deleted the fix-callbacks-with-streaming branch on December 4, 2024