core: fix CommaSeparatedListOutputParser to handle columns that may contain commas #26365
… commas in it, used built-in 'csv' lib
Hey there! This is a breaking change in certain cases. Would it be possible to use the new csv-based parsing logic only when some flag is set on the output parser? For example, CommaSeparatedListOutputParser(parse_quoted_cells=True).
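The opt-in approach suggested above can be sketched as follows. ListParserSketch is a hypothetical stand-in, not LangChain's CommaSeparatedListOutputParser, and parse_quoted_cells is used here only because it is the name proposed in this comment. The sketch also assumes the old behavior is a plain split on every comma.

```python
import csv
from dataclasses import dataclass
from io import StringIO


@dataclass
class ListParserSketch:
    # Hypothetical opt-in flag; False keeps the plain comma split.
    parse_quoted_cells: bool = False

    def parse(self, text: str) -> list[str]:
        if not self.parse_quoted_cells:
            # Assumed existing behavior: every comma is a delimiter.
            return [part.strip() for part in text.split(",")]
        # Opt-in behavior: commas inside double quotes stay in the cell.
        # skipinitialspace=True lets the reader see quotes after ", ".
        reader = csv.reader(StringIO(text), skipinitialspace=True)
        return [cell.strip() for row in reader for cell in row]


text = '"foo, foo2", "bar", "baz"'
print(len(ListParserSketch().parse(text)))  # 4
print(ListParserSketch(parse_quoted_cells=True).parse(text))
# ['foo, foo2', 'bar', 'baz']
```

Defaulting the flag to False is what keeps such a change non-breaking: existing callers keep the plain split unless they opt in.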
Yes, I can definitely just make it a flag. I didn't realize it was breaking other use cases. I did run the tests on this, and I added 1 additional test, so my understanding is that it covered the previous cases. Before adding a new flag, I would like to see whether it's possible to cover all the identified use cases. Can you give me an example where it breaks? For your convenience, here are the current tests:
Can you tell me in which cases it fails?
test_multiple_items_with_comma_existing_behavior: note that the behavior will be different when using the new flag (then it will match your test).
Sorry, I only just looked at this. After adding the flag at the class level, I tried to see whether I could add the flag at the method level instead.
This works for .parse, but fails for .transform. I am not sure what the purpose of .transform is; I guess that is how it's called down the chain? I'd appreciate some guidance on this. My initial thought was that this would be a simple change, even while keeping the original behavior, but I might be missing something.
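One plausible explanation, offered as an assumption rather than a description of LangChain's internals, is that the chained/streaming path calls parse() itself and never forwards extra keyword arguments. The toy model below (ToyListParser is hypothetical) shows why a flag stored on the instance reaches both paths, while a flag passed to a single parse() call does not.

```python
import csv
from io import StringIO
from typing import Iterable, Iterator, List, Optional


class ToyListParser:
    """Toy model, not LangChain code: transform() calls parse() internally."""

    def __init__(self, parse_quoted_cells: bool = False) -> None:
        # Instance-level flag: every method, including transform(), can read it.
        self.parse_quoted_cells = parse_quoted_cells

    def parse(self, text: str, parse_quoted_cells: Optional[bool] = None) -> List[str]:
        # A method-level flag applies only when the caller passes it directly.
        if parse_quoted_cells is None:
            parse_quoted_cells = self.parse_quoted_cells
        if parse_quoted_cells:
            reader = csv.reader(StringIO(text), skipinitialspace=True)
            return [cell.strip() for row in reader for cell in row]
        return [part.strip() for part in text.split(",")]

    def transform(self, inputs: Iterable[str]) -> Iterator[List[str]]:
        # The chained path calls parse() with no extra arguments, so a flag
        # the caller gave to parse() elsewhere never arrives here.
        for text in inputs:
            yield self.parse(text)


text = '"foo, foo2", "bar", "baz"'
print(ToyListParser().parse(text, parse_quoted_cells=True))
# ['foo, foo2', 'bar', 'baz']
print(next(ToyListParser().transform([text])))
# ['"foo', 'foo2"', '"bar"', '"baz"']
print(next(ToyListParser(parse_quoted_cells=True).transform([text])))
# ['foo, foo2', 'bar', 'baz']
```

Under this assumption, a class-level flag is the simpler design, because every internal call sees the same setting without any argument plumbing.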
Let's count the comma values in double quotes as a bugfix, but leave the format instructions unchanged.
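Treating quoted commas as a bugfix means the quote-aware parsing is always on, with no flag. The sketch below assumes the old behavior was a plain comma split. It checks that a few unquoted inputs parse the same way as before, while quoted commas stay inside one item. old_split and fixed_split are hypothetical names, and the csv.Error fallback is a defensive assumption, not necessarily part of the merged change.

```python
import csv
from io import StringIO


def old_split(text: str) -> list[str]:
    # Assumed pre-fix behavior: every comma is a delimiter.
    return [part.strip() for part in text.split(",")]


def fixed_split(text: str) -> list[str]:
    # Post-fix behavior, always on: commas inside double quotes stay in the cell.
    try:
        reader = csv.reader(StringIO(text), skipinitialspace=True)
        return [cell.strip() for row in reader for cell in row]
    except csv.Error:
        # If the csv module rejects the input, fall back to the plain split.
        return old_split(text)


# For these simple unquoted samples, the result matches the old split...
for sample in ["foo, bar, baz", "a,b,c", "single"]:
    assert fixed_split(sample) == old_split(sample)

# ...while quoted commas are now kept inside one item.
print(fixed_split('"foo, foo2", "bar", "baz"'))  # ['foo, foo2', 'bar', 'baz']
```

The equivalence only holds for simple single-line input like these samples. For example, an empty string yields [] from csv.reader but [''] from str.split.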
…ontain commas in it (langchain-ai#26365)
Add tests and docs: added simple unit test test_multiple_items_with_comma
Co-authored-by: Erick Friis
Co-authored-by: Bagatur
Description: Currently, CommaSeparatedListOutputParser can't handle strings that contain commas within a column, because it treats every comma as a delimiter.
For example, given the input:
"foo, foo2", "bar", "baz"
it creates 4 columns: "foo", "foo2", "bar", "baz"
when it should create 3 columns:
"foo, foo2", "bar", "baz"
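The failure is easy to reproduce with a plain split on every comma. This minimal sketch assumes the old parser split on "," and stripped whitespace (naive_split is a hypothetical helper). Note that the double-quote characters also stay attached to the pieces.

```python
def naive_split(text: str) -> list[str]:
    # Treat every comma as a delimiter, ignoring double quotes entirely.
    return [part.strip() for part in text.split(",")]


pieces = naive_split('"foo, foo2", "bar", "baz"')
print(len(pieces))  # 4
print(pieces)       # ['"foo', 'foo2"', '"bar"', '"baz"']
```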
Dependencies: Added 2 additional imports, but both are built-in Python modules:
import csv
from io import StringIO
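The two imports fit together roughly like this: StringIO wraps the text in a file-like object, and csv.reader splits it while honoring double-quoted fields. This is a sketch, not the merged code. quote_aware_split is a hypothetical name, and skipinitialspace=True is an assumption needed because the example puts a space after each comma; without it, the reader would not treat "bar" as a quoted field.

```python
import csv
from io import StringIO


def quote_aware_split(text: str) -> list[str]:
    # csv.reader expects an iterable of lines, so wrap the string in StringIO.
    # skipinitialspace=True drops the space after each delimiter, letting the
    # reader see the opening double quote of fields like ' "bar"'.
    reader = csv.reader(
        StringIO(text), delimiter=",", quotechar='"', skipinitialspace=True
    )
    # The reader yields one list per line; flatten them into a single list.
    return [cell.strip() for row in reader for cell in row]


print(quote_aware_split('"foo, foo2", "bar", "baz"'))
# ['foo, foo2', 'bar', 'baz']
```

Unquoted input such as foo, bar, baz still splits into three items, so the quote handling only changes the result when double quotes are present.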
Twitter handle: @jkyamog
Add tests and docs: added simple unit test test_multiple_items_with_comma