Linearize PDFs for better metadata removal #111

Open
bhadaway opened this issue Dec 23, 2020 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@bhadaway

I read that you wish to keep this app minimalistic, and as someone who shares the same philosophy, I can appreciate that.

I'm wondering whether adding linearization of PDF files (so that metadata is actually removed) would be within that scope, or overkill?

Here, someone is using QPDF to complement ExifTool to accomplish that:

https://blog.joshlemon.com.au/protecting-your-pdf-files-and-metadata/
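As I understand it, the general shape of that recipe is two steps: blank the tags with ExifTool, then have qpdf rewrite the file in linearized form. ExifTool's documentation notes that its PDF edits are reversible, because the old data is still in the file until something rewrites it. That is why the qpdf pass comes second. Below is a minimal Python sketch of that pipeline. The function names `build_commands` and `scrub_pdf` are made up for illustration, and it assumes `exiftool` and `qpdf` are on the PATH:

```python
import subprocess
from pathlib import Path


def build_commands(src: Path, dst: Path) -> list[list[str]]:
    """Return the two commands in order: blank tags, then rewrite linearized."""
    return [
        # Blank every writable tag. NOTE: -overwrite_original edits src in
        # place and skips the "_original" backup copy, so work on a copy.
        ["exiftool", "-all:all=", "-overwrite_original", str(src)],
        # Rewrite the whole file in linearized form to a new path.
        ["qpdf", "--linearize", str(src), str(dst)],
    ]


def scrub_pdf(src: Path, dst: Path) -> None:
    """Run both steps and stop at the first failure."""
    for cmd in build_commands(src, dst):
        subprocess.run(cmd, check=True)
```

Usage would be something like `scrub_pdf(Path("copy-of-report.pdf"), Path("report-clean.pdf"))`. Metadata embedded elsewhere in the document (for example inside images) may still survive, so this is a starting point rather than a guarantee.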

@szTheory
Owner

Thanks for bringing this to my attention! I have to look into this more but it sounds promising. I suppose we'd just have to include the latest 64-bit qpdf binary for each platform with the distribution, then for PDFs run qpdf before exiftool during the processing phase.

While I do want to keep the number of settings and buttons to a minimum, I also want the main feature of the app, removing metadata, to be comprehensive. For this reason I'm also exploring removing extended filesystem attributes. So better PDF handling is something I'd like to add if it can be done well.

@szTheory szTheory added enhancement New feature or request help wanted Extra attention is needed labels Dec 23, 2020
@szTheory szTheory pinned this issue Dec 23, 2020
@bhadaway
Author

It would be amazing because currently, the only other options for secure PDF cleanup are:

  1. If you have a copy of Adobe Acrobat (expensive and bloated), then you can sanitize documents.
  2. Uploading your documents to an online tool that scrubs them (I can't think of a more dangerous and counterintuitive option for privacy and security, which is the entire point).
  3. Combining multiple command-line recipes to get it done right (which, even if you're comfortable using the command line, is still a pain).

There actually is one other option that's super easy and straightforward, and that most people's operating systems support natively: simply print to PDF, which apparently flattens the document and removes all the metadata. But I'm not confident it's 100% foolproof. It would be nicer to actually see the before and after (what your app does) to verify it's been cleaned.
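To make the "before and after" idea concrete, here is a rough Python sketch that compares two tag dumps. It assumes the dumps come from `exiftool -j file.pdf`, which prints a JSON array with one object per file. The helper names and the sample values are hypothetical:

```python
import json


def parse_exiftool_json(text: str) -> dict:
    """Take the first (only) file object from `exiftool -j` style output."""
    return json.loads(text)[0]


def removed_tags(before: dict, after: dict) -> list[str]:
    """Tag names present before cleaning but gone afterwards."""
    return sorted(set(before) - set(after))


# Hypothetical sample dumps, not real output from any particular file.
before = parse_exiftool_json(
    '[{"SourceFile": "a.pdf", "Author": "Jane", "Producer": "X", "PageCount": 2}]'
)
after = parse_exiftool_json('[{"SourceFile": "a.pdf", "PageCount": 2}]')

print(removed_tags(before, after))  # ['Author', 'Producer']
```

This only shows what ExifTool can see, so a clean diff is evidence the visible tags are gone, not proof that nothing recoverable remains in the file.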

@szTheory szTheory changed the title Would linearizing PDFs be a worthwhile feature to consider? Linearize PDFs for better metadata removal May 2, 2021
@szTheory szTheory unpinned this issue Dec 8, 2021
@WeAreLegion999

Why does QPDF produce different files every time? I ran the same source file through QPDF on two separate occasions, and a binary comparison shows differences between the two PDFs produced, despite the input file being the same.
The difference is located at the top and bottom of the file, in a UUID.
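A likely explanation, offered as an assumption rather than a confirmed diagnosis: the value that changes is the PDF `/ID` entry in the trailer, which qpdf regenerates on each run by default, and a linearized file carries trailer data near both the start and the end. qpdf has a `--deterministic-id` option that derives the ID from the output contents instead. It cannot be combined with encryption, and I have not verified how it behaves across qpdf versions. Here is a small Python sketch for testing this. The helper names are made up, and `qpdf` is assumed to be on the PATH:

```python
import hashlib
from pathlib import Path


def qpdf_command(src: Path, dst: Path, deterministic: bool = True) -> list[str]:
    """Build a linearizing qpdf command, optionally with a reproducible /ID."""
    cmd = ["qpdf", "--linearize"]
    if deterministic:
        # Derive /ID from the output contents rather than run-specific data.
        cmd.append("--deterministic-id")
    return cmd + [str(src), str(dst)]


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_bytes(a: Path, b: Path) -> bool:
    """True when the two files are byte-for-byte identical."""
    return sha256_of(a) == sha256_of(b)
```

Running the command from `qpdf_command` twice on the same input (via `subprocess.run`) and then calling `same_bytes` on the two outputs should show whether the remaining differences go away.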

3 participants