Python-CSPDF

Python-CSPDF checks the similarity of the PDF files in the current working directory and stores the results in a CSV file. The project is inspired by diff-pdf.

Installation

pip install -r requirements.py

Before Use

  1. Install all required dependencies.
  2. Copy cspdf.py into the directory that contains the PDF files to be compared.
  3. Run the cspdf.py script.
  4. Note: This script works on PDF files only. If you have Word documents, convert them to PDF first.

Usage

  1. Check the similarity of all PDF files in the current directory
    python cspdf.py -a -o comparison.csv
  2. Check the similarity of one PDF file against all PDF files in the current directory
    python cspdf.py -t a.pdf -o comparison.csv
  3. Include image comparison in the similarity check (slow processing)
    # Just add the -i or --image argument
    python cspdf.py -i -t a.pdf -o comparison.csv
  4. Get help
    python cspdf.py -h
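
The first two modes above differ only in which file pairs get compared. The following is a minimal sketch of that pairing and of the CSV output, using only the standard library. The function names `build_pairs` and `write_rows`, the column names, and the placeholder scores are hypothetical and are not taken from cspdf.py, whose actual output layout may differ.

```python
import csv
import io
from itertools import combinations


def build_pairs(pdf_names, target=None):
    """Return the file pairs to compare.

    With no target (similar to -a), every unordered pair is compared.
    With a target (similar to -t a.pdf), the target is paired with every other file.
    """
    names = sorted(pdf_names)
    if target is None:
        return list(combinations(names, 2))
    return [(target, name) for name in names if name != target]


def write_rows(rows, handle):
    """Write (file_a, file_b, score) rows as CSV to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["file_a", "file_b", "similarity"])  # hypothetical header
    writer.writerows(rows)


if __name__ == "__main__":
    pairs = build_pairs(["a.pdf", "b.pdf", "c.pdf"], target="a.pdf")
    buffer = io.StringIO()
    write_rows([(a, b, 0.0) for a, b in pairs], buffer)  # placeholder scores
    print(buffer.getvalue())
```

Comparing every pair grows quadratically with the number of files, which is one reason the single-target mode can be much faster on large directories.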

Similarity Check Methods

  1. Text similarity with Sequence Matcher
  2. Image similarity with Structural Similarity Index (SSIM)
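
As an illustration of the text method, here is a minimal sketch that assumes "Sequence Matcher" refers to `difflib.SequenceMatcher` from the Python standard library. The helper name `text_similarity` is hypothetical, and the strings stand in for text already extracted from two PDF files; how cspdf.py extracts and scores text may differ.

```python
from difflib import SequenceMatcher


def text_similarity(text_a, text_b):
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    # None as the first argument means no characters are treated as junk.
    return SequenceMatcher(None, text_a, text_b).ratio()


if __name__ == "__main__":
    print(text_similarity("The quick brown fox", "The quick brown fox"))
    print(text_similarity("The quick brown fox", "The quick red fox"))
```

For the image method, scikit-image exposes SSIM as `skimage.metrics.structural_similarity`, which compares two image arrays of the same shape.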

Libraries

  1. PDFMiner
  2. PyMuPDF
  3. OpenCV Python
  4. Scikit Image
  5. TQDM Progress Bar

Credits

Made by Zavier, enjoyy ✨