Erroneous Text output for IE task #21

riteshKumarUMass · 2022-08-12T15:32:17Z

Hi,
I tried fine-tuning the model on a custom receipt dataset for the IE task and noticed issues with the text extracted for the given set of keys. The output either drops or adds 1-2 characters relative to the actual text in the document, and this happens very frequently. I am using the default input_size: [1280, 960]. The images are very clear, and other off-the-shelf OCR models extract the text from them with no errors. I fine-tuned the model on 400 images with 15 keys and tested it on 100 samples. Has anyone encountered this issue?
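
To put a number on the "1-2 characters off" pattern, one option is to compute a per-key character edit distance between predictions and ground truth. The following is only a minimal sketch: `edit_distance` and `field_errors` are hypothetical helper names, not part of this repository, and the predictions and ground truth are assumed to be flat dictionaries mapping each key to a string.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def field_errors(pred: dict, truth: dict) -> dict:
    """Per-key edit distance; a key missing from pred counts as empty text."""
    return {key: edit_distance(pred.get(key, ""), value)
            for key, value in truth.items()}


if __name__ == "__main__":
    # Made-up values for illustration only.
    truth = {"total": "12.50", "date": "2022-08-12"}
    pred = {"total": "12.500", "date": "2022-08-12"}
    print(field_errors(pred, truth))
```

Aggregating these distances per key over the 100 test samples would show whether the errors concentrate in particular fields or character types.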

gwkrsrch · 2022-08-17T01:43:02Z

Hi,
It depends on the data and the task. If you can share an input image that produces errors (and any additional helpful information), we may be able to find the cause or a solution faster. Given the limited information, I can give you a basic checklist:

Did your model training converge? You may need to adjust some hyperparameters, including the number of epochs, the input resolution of the encoder, or the max length of the decoder.
What is the target language? Is it covered well by the current token vocabulary? See also "Tips for training base model from scratch on smaller amount of datasets" (#11).
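
The vocabulary question in the checklist above can be roughly checked by looking for characters in the ground-truth text that appear in no vocabulary token. This is a sketch under assumptions: `uncovered_chars` is a hypothetical helper, the vocabulary is assumed to be available as a plain list of token strings, and the check is only a rough signal, since tokenizers with byte-level fallback can still encode characters that this test reports as missing.

```python
def uncovered_chars(texts, vocab_tokens):
    """Count non-space characters in texts that occur in no vocabulary token."""
    covered = set()
    for token in vocab_tokens:
        covered.update(token)  # every character appearing inside any token

    missing = {}
    for text in texts:
        for ch in text:
            if not ch.isspace() and ch not in covered:
                missing[ch] = missing.get(ch, 0) + 1
    return missing


if __name__ == "__main__":
    # Made-up vocabulary and label text for illustration only.
    vocab = ["▁", "0", "1", "2", ".", "Rs"]
    labels = ["₹120.00", "Rs 12.10"]
    print(uncovered_chars(labels, vocab))
```

Characters that show up here with high counts (currency symbols, accented letters, and so on) are candidates for the dropped or substituted characters seen in the output.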

Plus, there is a hands-on tutorial at this link that might be helpful to you. In addition, checking other reported/resolved issues in this repository will also be useful. Hope this helps :)

gwkrsrch · 2022-08-24T02:41:45Z

Closing this issue since there has been no update for a long time. Feel free to reopen it if you have anything new to share or debug :)

gwkrsrch closed this as completed Aug 24, 2022
