Unable to understand the context #35

Open
rpalsaxena opened this issue Feb 29, 2020 · 6 comments

Comments

@rpalsaxena
Contributor

rpalsaxena commented Feb 29, 2020

I tried to test the library on this:

from price_parser import parse_price
parse_price("The price of 2000 plates is $13,004")

Output:

Price(amount=Decimal('2000'), currency='$')

The library is unable to understand the context. I have gone through the code in parser.py; it uses regular expressions to extract both the price and the currency symbol.
Could you please share the vision behind this library?

  1. Was it to process a database column that contains prices like this:
    image

     That way it could resolve minor human errors and help data scientists during preprocessing.

  2. Or was the plan to create a library that can extract prices from normal text like this:
     "The price of 2000 plates is $13,004"?

@Gallaecio
Member

I believe your use case is within the scope of the library.

@lopuhin
Member

lopuhin commented Mar 12, 2020

I think supporting inputs like "The price of 2000 plates is $13,004" could require a very different approach from what is currently implemented, although it would be nice to have.

To provide more context, this library was primarily developed to parse prices coming from website elements, where an element usually contains only the price with little extra text. This free-form text example is more like an NER (named entity recognition) task followed by price extraction.
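
To illustrate the difference, here is a minimal sketch of one heuristic for free-form text: prefer a number that is directly attached to a currency symbol over any other number in the sentence. This is not how price_parser works and it is not real NER; `find_price_in_text` is a hypothetical helper, and the assumptions (a small fixed set of symbols, the symbol placed before the amount, commas as thousands separators) are chosen only for this example.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Hypothetical pattern, not taken from price_parser: a currency symbol
# followed by an amount with optional comma-separated thousands groups.
_CURRENCY_AMOUNT = re.compile(
    r"(?P<currency>[$€£¥])\s*"
    r"(?P<amount>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"
)


def find_price_in_text(text: str) -> Optional[Tuple[Decimal, str]]:
    """Return (amount, currency) for the first symbol-prefixed number, or None."""
    match = _CURRENCY_AMOUNT.search(text)
    if match is None:
        return None
    # Assumption: commas are thousands separators, so they can be dropped.
    amount = Decimal(match.group("amount").replace(",", ""))
    return amount, match.group("currency")


print(find_price_in_text("The price of 2000 plates is $13,004"))
```

With this heuristic the quantity 2000 is skipped because no currency symbol precedes it. Text without any currency symbol, or with the symbol after the amount, returns None, which is where a proper entity-recognition step would be needed.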

@rpalsaxena
Contributor Author

That makes sense, @lopuhin!
Using the library to extract the price from an HTML element with some noise is a reasonable choice.

@lopuhin
Member

lopuhin commented Mar 12, 2020

Let's keep the issue open in case we can support this or someone has another opinion.

@lopuhin lopuhin reopened this Mar 12, 2020
@Gallaecio
Member

Oh, I thought we had merged it already: #5 may fix this.

@Ayush-iitkgp

Hey @Gallaecio, could you please review and merge the pull request (#5)? I am running into similar problems, so it might help me as well :)
TIA!
