Mismatch between image and PDF #33

Open
gagaein opened this issue Dec 26, 2020 · 1 comment
Comments

@gagaein

gagaein commented Dec 26, 2020

Firstly, thank you for your useful dataset.
I have downloaded PubLayNet in both image and PDF form, but I noticed that the image and the PDF of the same page are NOT the same size. For example, the PDF page is 600.05 × 792, but the JPG image is 602 × 792. So the annotations should be slightly different for these two file types.
How can I solve this problem? Thank you again!

@ajjimeno
Member

ajjimeno commented Jan 6, 2021

Hi gagaein, we prepared the dataset so that the layout can be identified directly from the images. We do not have the data needed to resize the images to match the original PDF pages. I am wondering if a scaling factor between the image and the PDF page could be estimated, and the annotations then scaled back to the PDF page.
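
The scaling idea could be sketched roughly as follows. This is only a minimal illustration, not code from the dataset authors. It assumes the annotation boxes are stored as `[x, y, width, height]` in image pixels with the origin at the top-left, and that the PDF page size is known in PDF units. The function name `scale_bbox_to_pdf` and the `flip_y` option are hypothetical. `flip_y` is there because PDF coordinates usually have their origin at the bottom-left.

```python
from typing import Sequence, Tuple


def scale_bbox_to_pdf(
    bbox: Sequence[float],
    image_size: Tuple[float, float],
    pdf_size: Tuple[float, float],
    flip_y: bool = False,
) -> Tuple[float, float, float, float]:
    """Map an [x, y, width, height] box from image pixels to PDF page units.

    Assumption: the image is a plain rescaling of the full PDF page, so one
    scale factor per axis is enough (no cropping, rotation, or offset).
    """
    img_w, img_h = image_size
    pdf_w, pdf_h = pdf_size

    # Per-axis scale factors estimated from the two page sizes.
    sx = pdf_w / img_w
    sy = pdf_h / img_h

    x, y, w, h = bbox
    new_x, new_y, new_w, new_h = x * sx, y * sy, w * sx, h * sy

    if flip_y:
        # Convert from a top-left origin (image) to a bottom-left origin (PDF):
        # the new y is the distance from the page bottom to the box's lower edge.
        new_y = pdf_h - (new_y + new_h)

    return (new_x, new_y, new_w, new_h)


# Example with the sizes from this issue: a 602 x 792 image, a 600.05 x 792 page.
box_in_pdf = scale_bbox_to_pdf([301, 396, 60.2, 79.2], (602, 792), (600.05, 792))
print(box_in_pdf)
```

With these sizes only the horizontal coordinates change, by a factor of 600.05 / 602, while the vertical ones stay as they are. If the image was cropped or padded relative to the PDF page, a single scale factor per axis would not be enough and an offset would also have to be estimated.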
