Created PDF from folder doesn't follow image order #6

Open
DigitLib opened this issue Aug 19, 2021 · 1 comment

@DigitLib
There are six XML files and six JPG files, named 01 to 06.
The created PDF doesn't follow this order; the pages come out in random order, e.g.:
Page 1 -> 04.jpg
Page 2 -> 03.jpg
Page 3 -> 01.jpg
Page 4 -> 06.jpg
Page 5 -> 02.jpg
Page 6 -> 05.jpg

Is there a convention for naming the files?

Thank you!

@stweil
stweil commented Apr 12, 2024

The tool processes the PAGE XML files in a directory without sorting the directory entries. Run ls -U to see the unsorted list of directory entries. So the page order in the PDF is not random, but it can be unexpected. This happens especially if the PAGE XML files were not created in sorted order, for example when running parallel OCR processes.
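To see the difference between directory order and sorted order, here is a minimal demo. It assumes bash and GNU coreutils ls, where -U means "do not sort" (on BSD/macOS, ls -f is the unsorted variant). The order printed by ls -U depends on the filesystem, so it may or may not reproduce the scrambled creation order.

```shell
#!/usr/bin/env bash
# Compare directory order (ls -U) with sorted order (ls).
# Assumes GNU coreutils ls, where -U lists entries without sorting.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT   # clean up the temporary directory on exit

# Create the files out of order, as parallel OCR jobs might.
for n in 04 03 01 06 02 05; do
  touch "$demo/$n.xml"
done

echo "directory order (ls -U):"
ls -U "$demo"
echo "sorted order (ls):"
ls "$demo"
```

Only the second listing is guaranteed to be 01.xml to 06.xml; the first is whatever the filesystem returns.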

I used a trick to get sorted entries. Run these commands in the directory with the (unsorted) PAGE XML files:

mkdir sorted
cd sorted
ln -s ../*.xml .

Then use the newly created directory "sorted" instead of the original, unsorted directory.
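The trick relies on two things: bash expands ../*.xml in sorted order, so ln creates the links one after another in that order, and the filesystem then returns the entries of the new, small directory in creation order. The second point may not hold on every filesystem, so the sketch below wraps the same steps and warns when ls -U still differs from ls. The function name make_sorted_dir is hypothetical, not part of the tool; the sketch assumes bash, GNU ls, and file names that sort correctly as plain strings, such as the zero-padded 01.xml to 06.xml from the issue.

```shell
#!/usr/bin/env bash
# Sketch of the symlink trick as a reusable function.
# make_sorted_dir is a hypothetical helper, not part of the tool.
# Assumes bash, GNU ls (-U = directory order) and names that sort
# correctly as plain strings (e.g. zero-padded 01.xml ... 06.xml).
set -euo pipefail

make_sorted_dir() {
  local src dst=$2 f
  local -a files
  src=$(cd "$1" && pwd)           # absolute path, so links work from anywhere
  shopt -s nullglob
  files=("$src"/*.xml)            # bash sorts glob results (locale collation)
  shopt -u nullglob
  if (( ${#files[@]} == 0 )); then
    echo "no *.xml files in $src" >&2
    return 1
  fi
  mkdir "$dst"                    # must be new and empty, like "mkdir sorted"
  for f in "${files[@]}"; do
    ln -s "$f" "$dst/"            # one link per file, created in sorted order
  done
  # The trick only helps if the filesystem lists entries in creation order.
  if [[ "$(ls -U "$dst")" != "$(ls "$dst")" ]]; then
    echo "warning: $dst is not in sorted directory order" >&2
  fi
}

# Usage: make_sorted_dir /path/to/pagexml /path/to/pagexml/sorted
```

If the warning appears, the directory entries are still unsorted and the PDF page order may again be unexpected.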

Of course it would be much better to fix the code and sort the directory entries there.
