Sourcerer is a Python-based tool for merging PDF files with automatic table of contents generation. It is designed to help researchers and students organize their research papers and documents.
- Merge multiple PDF files into a single document
- Automatically generate a table of contents
- Create a title page with customizable metadata
- Handle special characters in filenames
- Preserve original PDF content and formatting
- Support long filenames with proper wrapping
- Number pages automatically
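To make the filename-related features above concrete, here is a minimal sketch of how a filename with special characters could be turned into wrapped table-of-contents lines. The helper name `toc_title`, the default width, and the cleanup rules are assumptions for illustration only and are not taken from Sourcerer's code.

```python
import textwrap
from pathlib import Path


def toc_title(filename: str, width: int = 60) -> list[str]:
    """Turn a PDF filename into wrapped lines for a table-of-contents entry.

    Hypothetical helper: the name, default width, and cleanup rules are
    assumptions, not Sourcerer's actual behaviour.
    """
    # Drop the extension and treat underscores as word separators.
    title = Path(filename).stem.replace("_", " ").strip()
    # Wrap on word boundaries; fall back to the raw title if it is empty.
    return textwrap.wrap(title, width=width) or [title]


long_name = "a_very_long_filename_about_deep_learning_for_protein_structure_prediction.pdf"
for line in toc_title(long_name, width=40):
    print(line)
```

Special characters such as accents, ampersands, and parentheses pass through unchanged in this sketch; only the extension and underscores are touched.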
- Clone the repository:
  git clone https://github.com/wahidyankf/sourcerer.git
  cd sourcerer
- Install dependencies using Poetry:
  poetry install
You can use Sourcerer through Poetry to merge PDF files:
# Merge PDFs in a directory with default output name (merged_pdfs.pdf)
poetry run python cli.py --merge-pdf -dir /path/to/pdfs
# Merge PDFs with a custom output filename
poetry run python cli.py --merge-pdf -dir /path/to/pdfs -n custom_output.pdf
- --merge-pdf: Activate PDF merging functionality
- -dir, --directory: Path to directory containing PDF files to merge
- -n, --name: Optional output filename (default: merged_pdfs.pdf)
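For readers curious how these options might be wired up, the following is a rough `argparse` sketch that accepts the same flags. It is an assumption about how such a CLI could be built, not a copy of Sourcerer's `cli.py`; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Merge PDF files.")
    parser.add_argument("--merge-pdf", action="store_true",
                        help="Activate PDF merging functionality")
    parser.add_argument("-dir", "--directory",
                        help="Path to directory containing PDF files to merge")
    # The default mirrors the documented output name.
    parser.add_argument("-n", "--name", default="merged_pdfs.pdf",
                        help="Optional output filename")
    return parser


args = build_parser().parse_args(["--merge-pdf", "-dir", "./research_papers"])
print(args.merge_pdf, args.directory, args.name)
```

Because `--name` carries a default, omitting `-n` yields `merged_pdfs.pdf`, matching the first usage example.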
- Place your PDF files in a directory:
  research_papers/
  ├── paper1.pdf
  ├── paper2.pdf
  └── paper3.pdf
- Run Sourcerer:
  poetry run python cli.py --merge-pdf -dir ./research_papers -n research_collection.pdf
- The output will be a single PDF file (research_collection.pdf) containing:
  - A title page
  - A table of contents with page numbers
  - All your PDFs merged in sequence with preserved formatting
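The page numbers in the table of contents follow from the page counts of the merged files. The sketch below shows one way that arithmetic could work, assuming the title page and the table of contents take one page each; that assumption, the function name `toc_entries`, and the page counts in the example are all hypothetical.

```python
def toc_entries(page_counts: dict[str, int], front_matter_pages: int = 2) -> list[tuple[str, int]]:
    """Map each file to the page on which it starts in the merged PDF.

    Assumes `front_matter_pages` pages (title page + TOC) precede the content.
    """
    entries = []
    next_page = front_matter_pages + 1  # first content page, 1-indexed
    for name, count in page_counts.items():
        entries.append((name, next_page))
        next_page += count  # the next file starts right after this one
    return entries


# Made-up page counts for the three example papers.
for name, page in toc_entries({"paper1.pdf": 10, "paper2.pdf": 4, "paper3.pdf": 7}):
    print(f"{name} .... {page}")
```

A longer table of contents that spills onto a second page would shift every entry, so a real implementation would need to size the front matter before numbering.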
Requirements:
- Python 3.12 or higher
- Poetry (Python package manager)
- Node.js 20 or higher (for development tools)
- Clone the repository:
  git clone https://github.com/wahidyankf/sourcerer.git
  cd sourcerer
- Install dependencies:
  npm install      # Install development tools
  npm run install  # Install Python dependencies via Poetry
The following npm scripts are available:
- npm run format: Format code using Black and isort
- npm test: Run tests and type checking
- npm run watch: Watch for changes and run tests automatically
The project uses pytest for testing. Run the test suite with:
npm test
Tests cover various aspects including:
- PDF merging functionality
- Table of contents generation
- Title page creation
- Special character handling
- File naming and path management
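As an illustration of what a test in the file naming and path management area could look like, here is a small pytest-style sketch. The helper `collect_pdfs` and the test itself are hypothetical and are not taken from the project's test suite.

```python
import tempfile
from pathlib import Path


def collect_pdfs(directory: Path) -> list[Path]:
    """Return PDF files directly inside `directory`, sorted by path (hypothetical helper)."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def test_collect_pdfs_ignores_other_files() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Mix of special characters, an upper-case extension, and a non-PDF file.
        for name in ("b paper (draft).pdf", "a_paper.PDF", "notes.txt"):
            (root / name).touch()
        found = [p.name for p in collect_pdfs(root)]
        assert found == ["a_paper.PDF", "b paper (draft).pdf"]
```

pytest would discover the `test_` function automatically; the temporary directory keeps the test independent of any files on disk.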
The project uses GitHub Actions for continuous integration, which:
- Runs on Ubuntu latest
- Tests with Python 3.12
- Checks code formatting
- Runs the test suite
- Provides test results as artifacts
Wahidyan Kresna Fridayoka