Transkribus Document Uploader

This Python script automates uploading multiple documents, each consisting of image files and their XML files, to a Transkribus collection through the Transkribus REST API. It is tailored to mass-uploading the output of the Loghi pipeline.

Requirements

Python 3.11
requests library

You can install requests using pip:

pip install requests

Setup

Clone this repository:

git clone https://github.com/cconzen/TranskribusBatchUpload.git

Navigate to the project directory:

cd TranskribusBatchUpload

Edit main.py to include your Transkribus account details and the ID of the destination collection:

collection_id = "YOUR_COLLECTION_ID"
username = "YOUR_EMAIL"
password = "YOUR_PASSWORD"

Set the base directory for processing documents:

base_dir = 'PATH/TO/DIRECTORY'

Usage

Run the script by executing the following command:

python main.py

The script will:

Log in to your Transkribus account.
Process the specified directory and its subdirectories to find image files and their corresponding XML files.
Create a new job for each directory, uploading it as a document in your Transkribus collection.
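As a rough illustration of the last step, the sketch below builds a JSON description of one document before any files are sent. The function name build_upload_payload and the field names (md, pageList, fileName, pageXmlName, pageNr) are assumptions about what the Transkribus upload endpoint expects; they are not taken from main.py and should be checked against the Transkribus REST API documentation.

```python
import json


def build_upload_payload(title, pages):
    """Describe one document for upload.

    `pages` is a list of (image_name, xml_name) tuples in page order.
    The field names are an assumption about the Transkribus upload
    endpoint, not taken from main.py.
    """
    return {
        "md": {"title": title},
        "pageList": {
            "pages": [
                {"fileName": image, "pageXmlName": xml, "pageNr": number}
                for number, (image, xml) in enumerate(pages, start=1)
            ]
        },
    }


payload = build_upload_payload(
    "document_name",
    [("page001.jpg", "page001.xml"), ("page002.jpg", "page002.xml")],
)
print(json.dumps(payload, indent=2))
```

Page numbers start at 1 and follow the order of the list, so the list should already be sorted before it is passed in.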

Directory Structure

The directory you process should follow this structure:

base_dir/
│
└───document_name/
    │
    ├───page001.jpg
    ├───page002.jpg
    └───page/
        ├───page001.xml
        └───page002.xml

document_name: The directory name is used as the document name in Transkribus.
page001.jpg, page002.jpg: Image files, one per page of the document.
page/page001.xml, page/page002.xml: PageXML files for the corresponding images.
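A minimal sketch of how this layout could be scanned is shown here. The helper find_documents is hypothetical and is not the function used in main.py. It assumes that each XML file shares the base name of its image, it looks only one level below base_dir, and it skips images that have no matching XML file; the actual script may behave differently on each of these points.

```python
from pathlib import Path


def find_documents(base_dir):
    """Map each document directory name to its (image, xml) path pairs."""
    documents = {}
    for doc_dir in sorted(Path(base_dir).iterdir()):
        if not doc_dir.is_dir():
            continue
        pairs = []
        for image in sorted(doc_dir.glob("*.jpg")):
            # Assumption: page001.jpg pairs with page/page001.xml.
            xml = doc_dir / "page" / (image.stem + ".xml")
            if xml.is_file():
                pairs.append((image, xml))
        if pairs:
            documents[doc_dir.name] = pairs
    return documents
```

Calling find_documents(base_dir) on the structure above would return a dictionary with the single key document_name, holding two image and XML pairs in file-name order.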

Notes

Ensure that your images are in .jpg format and that your XML files are named to match their images, as in the structure above.
The XML files must be located in a subdirectory named page inside the directory that holds the images.
Pages are sorted by file name. Use a naming convention that sorts correctly (for example, zero-padded page numbers), or adjust the code to handle the naming convention of your files.
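If your file names are not zero-padded, one possible adjustment is a natural sort key that compares digit runs as numbers. The function natural_key below is a hypothetical helper, not part of main.py, and is only a sketch of the idea.

```python
import re


def natural_key(name):
    """Split a file name into text and integer chunks for sorting."""
    # re.split with a capturing group alternates text and digit runs,
    # so every odd index holds a run of digits.
    parts = re.split(r"(\d+)", name)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


names = ["page10.jpg", "page2.jpg", "page1.jpg"]
print(sorted(names))                   # plain string order
print(sorted(names, key=natural_key))  # numeric order
```

With plain string sorting, page10.jpg lands before page2.jpg; with the key above, the pages come out as 1, 2, 10.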

License

This project is licensed under the MIT License.
