incomplete extracted result #717

Open
haluwong opened this issue Sep 20, 2024 · 4 comments

Comments

@haluwong

Hi all, we have a 7-page PDF delivery note and would like to extract the item information from it.
It lists 13 items, but Unstract extracts only 6 of them.
I can use a prompt to get the total number of items, so all the pages are being extracted.

The detailed extraction, however, does not return all the data.
Here is my prompt:

Extract the following details from the text and format them into JSON:

Part Number: The value that appears immediately before "UPC:". Ensure it is not the value after "CPU:". (e.g., 960-001312, PC-LABEL, UCSC-C220-M6S)
Ship Qty
Order Qty
SKU
Description
Serial Numbers
Return the result in JSON format as an array of objects, each containing:

"part_number"
"order_qty"
"ship_qty"
"sku"
"description"
"serial_numbers"

[Screenshots: screen_20240920_03, screen_20240920_05]

Items after "007" are not extracted.
Is there a limit on the output size?

Here is the json output for the above prompt
result.json

@VikashPratheepan
Contributor

Yes @haluwong, the gpt-4 model has an output token limit of 4096.
You need to choose a model with a higher output token limit.
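
To confirm that a short result really is caused by the output limit, a check along the following lines may help. This is only a sketch, not part of Unstract: `is_truncated` is a hypothetical helper, and it assumes the model response exposes a finish reason in the style of the OpenAI chat completions API, where `"length"` means generation stopped at the token cap.

```python
import json


def is_truncated(finish_reason: str, text: str) -> bool:
    """Return True if the model output looks cut off.

    finish_reason: value reported by the model API (assumed to follow the
                   OpenAI convention, where "length" means the cap was hit).
    text:          raw model output that is expected to be JSON.
    """
    # The API itself reports that the output token limit was reached.
    if finish_reason == "length":
        return True
    # Fallback: output that stops mid-array will not parse as JSON.
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False
```

If this returns True for the delivery note but the item count prompt still reports 13 items, the missing items are most likely lost on the output side rather than during text extraction.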

@ashwanthkumar

@VikashPratheepan, I'm curious: how do we handle data extraction where the result is larger than the LLM's output token limit? Most LLMs are growing their input size, but not so much their output size.

@shuveb
Contributor

shuveb commented Oct 21, 2024

@ashwanthkumar we handle this by internally splitting the context, making multiple requests and responding with a concatenated result. However, this feature is only available in the enterprise version.
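
For readers on the open-source version, the general idea of splitting, requesting per chunk, and concatenating can be sketched roughly as below. This is not the enterprise implementation, which is not shown in this thread. `chunk_pages`, `extract_all`, and the `extract_chunk` callable are hypothetical names, and the sketch assumes each call returns a JSON array of item objects like the one requested in the prompt above.

```python
import json
from typing import Callable, Dict, List


def chunk_pages(pages: List[str], pages_per_chunk: int) -> List[str]:
    """Group page texts into chunks of at most pages_per_chunk pages."""
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        "\n".join(pages[i:i + pages_per_chunk])
        for i in range(0, len(pages), pages_per_chunk)
    ]


def extract_all(
    pages: List[str],
    extract_chunk: Callable[[str], str],
    pages_per_chunk: int = 2,
) -> List[Dict]:
    """Run one extraction request per chunk and concatenate the results.

    extract_chunk is a hypothetical callable that sends the prompt plus the
    chunk text to the LLM and returns its raw JSON array as a string.
    """
    items: List[Dict] = []
    for chunk in chunk_pages(pages, pages_per_chunk):
        # Each response only has to fit the items found in one chunk,
        # so it stays well under the model's output token limit.
        items.extend(json.loads(extract_chunk(chunk)))
    return items
```

One caveat with this naive split: an item whose details straddle a chunk boundary may come back incomplete or twice, so a real implementation would probably need overlapping chunks and deduplication, for example on `part_number` and `sku`.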

@cwikio

cwikio commented Nov 5, 2024

I have the same problem. This should never happen, since ChatGPT simply asks whether you want to continue. Why is this feature not built into the software? Without it, the tool is nearly useless for documents of any substantial size, apart from perhaps receipts and short bank statements.

6 participants
@ashwanthkumar @shuveb @haluwong @cwikio @VikashPratheepan and others