Remove extended filesystem attributes on Mac/Linux (xattr/mdls) such as Source/Quelle #86

Open
tayfuuun opened this issue Sep 29, 2020 · 12 comments

@tayfuuun
Contributor

Hello,

a screenshot from the PDF file before I use your tool:
image

a screenshot after I use your tool:
image

Everything is removed; only the Source is left. Can you check whether it's possible to remove the source from the PDF file?

tayfuuun added a commit to tayfuuun/exifcleaner that referenced this issue Sep 29, 2020
german translation added ;) 
have fun and please check my issue szTheory#86
It will be very nice when you add the option to remove the source.
@szTheory szTheory added the enhancement label Sep 29, 2020
@szTheory
Owner

Thanks, good catch! I'll check it out. In the meantime, if you could find the exiftool command line options that remove it, it will be even easier for me to modify ExifCleaner to use those options automatically. The easiest way to do this would be to run the exiftool command and verify that it removed the Source/Quelle field in your sample PDF. That will help me get this feature ready faster. If not, it's OK, I can figure it out. It just might take me a bit longer.

@szTheory szTheory changed the title from "Do not remove the Source / Quelle" to "Remove the Source / Quelle metadata from PDFs" Sep 29, 2020
@szTheory szTheory changed the title from "Remove the Source / Quelle metadata from PDFs" to "Remove the Source/Quelle metadata from PDFs" Sep 29, 2020
@tayfuuun
Contributor Author

@szTheory sorry no time for this one.
Good luck and thank you.

@tayfuuun
Contributor Author

tayfuuun commented Oct 1, 2020

@szTheory can you please check the source for the other file formats too? JPG, PNG.

@tayfuuun
Contributor Author

tayfuuun commented Nov 3, 2020

@szTheory any update?

@szTheory
Owner

szTheory commented Nov 4, 2020

Sorry no, can you provide some generic example documents and images with a Source/Quelle that is not being erased? I also recently released a new version of ExifCleaner. It's a long shot but maybe you can download that and see if it helps, since I did fix a few bugs.

@tayfuuun
Contributor Author

tayfuuun commented Nov 5, 2020

@szTheory files and tests with the latest version 3.4.0

[Edit: files removed]

Results

Even with the newest version, the source is not removed from PDF and PNG files.

@szTheory
Owner

szTheory commented Nov 7, 2020

Interesting. Even when I run exiftool directly on those files with the -v verbose flag, it does not pick up the Source/Quelle metadata, but when I tested on a Mac the field shows up in the file info window for both the PDF and the PNG. I'll have to look into this more.

@szTheory szTheory changed the title from "Remove the Source/Quelle metadata from PDFs" to "Source/Quelle metadata on Mac is not removed from PDF or PNG" Nov 7, 2020
@szTheory szTheory changed the title from "Source/Quelle metadata on Mac is not removed from PDF or PNG" to "Source/Quelle metadata is not removed from PDF or PNG" Nov 7, 2020
@szTheory szTheory added the bug and high priority labels and removed the enhancement label Nov 7, 2020
@szTheory szTheory pinned this issue Nov 7, 2020
@szTheory
Owner

szTheory commented Nov 7, 2020

If you run mdls myfile.png it shows what looks like some Mac-specific metadata, such as kMDItemProfileName and kMDItemWhereFroms, that exiftool is not picking up on. I will have to see whether support is already built into exiftool and just needs different command line flags, or whether ExifCleaner has to bolt on extra functionality. In the meantime you can remove the metadata with xattr -c myfilehere.pdf (the -c flag means clear) and confirm afterwards by running mdls again on the file. See this link for more info.
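The manual workaround with xattr -c could be scripted roughly as follows. This is only a minimal sketch, not how ExifCleaner does or will do it: the helper names (`run_command`, `list_xattr_names`, `clear_xattrs`) are made up for illustration, and it assumes the macOS `xattr` command is on the PATH and that `xattr <file>` prints one attribute name per line when given a single file.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout as text.
# Keeping it injectable makes the helpers usable without a real `xattr` binary.
Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout (raises on a non-zero exit code)."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def list_xattr_names(path: str, run: Runner = run_command) -> List[str]:
    """Return the extended attribute names reported by `xattr <file>`."""
    # Assumption: one attribute name per line for a single file argument.
    output = run(["xattr", path])
    return [line.strip() for line in output.splitlines() if line.strip()]


def clear_xattrs(path: str, run: Runner = run_command) -> List[str]:
    """Run `xattr -c <file>` and return whatever attribute names remain."""
    run(["xattr", "-c", path])
    return list_xattr_names(path, run)
```

On a Mac, `clear_xattrs("myfilehere.pdf")` should return an empty list if everything was cleared; a non-empty list would point at attributes that `xattr -c` left behind.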

@szTheory szTheory added the enhancement, mac, and high priority labels and removed the bug and high priority labels Nov 7, 2020
@szTheory szTheory changed the title from "Source/Quelle metadata is not removed from PDF or PNG" to "Add ability to remove Mac HFS+ extended attributes filesystem metadata (xattr/mdls) such as Source/Quelle" Nov 7, 2020
@szTheory szTheory changed the title from "Add ability to remove Mac HFS+ extended attributes filesystem metadata (xattr/mdls) such as Source/Quelle" to "Add ability to remove extended filesystem attributes on Mac/Linux (xattr/mdls) such as Source/Quelle" Nov 7, 2020
@tayfuuun
Contributor Author

tayfuuun commented Nov 9, 2020

@szTheory xattr -c myfilehere.pdf is working! Nice. If you implement this in your tool, you would make me very happy.

@szTheory
Owner

szTheory commented Dec 29, 2020

Current plan for Mac

  • Spin up an extra mdls process to read extended filesystem attributes for the "# exif before" column, then an xattr process with the -c flag to clear them, then another mdls process to populate the "# exif after" column.
  • If possible, figure out a way to keep these processes alive in a process pool so they can handle multiple files and minimize process overhead, as is done with exiftool.
  • Or pass multiple files at once to a single process per CPU core.
  • Investigate whether there are any extended filesystem attributes that xattr -c still leaves behind and how to deal with them.
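The Mac plan could be sketched along these lines, using mdls for the before/after counts and xattr -c (the command from the workaround earlier in the thread) for the clearing step. Everything here is an assumption for illustration: the function names are hypothetical, the mdls text output is assumed to have one `kMDItemName = value` line per attribute with array values continuing on indented lines, and `xattr -c` is given several files in one call to cover the "multiple files per process" idea.

```python
import os
import re
import subprocess
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its stdout (raises on a non-zero exit code)."""
    return subprocess.run(list(argv), check=True, capture_output=True, text=True).stdout


# Assumed mdls line shape: attribute name at the start of the line, then "=".
_MDLS_KEY = re.compile(r"^(\w+)\s*=")


def count_mdls_attributes(mdls_output: str) -> int:
    """Count top-level attributes in mdls text output (for the "# exif" columns)."""
    return sum(1 for line in mdls_output.splitlines() if _MDLS_KEY.match(line))


def split_per_worker(paths: Sequence[str], workers: Optional[int] = None) -> List[List[str]]:
    """Split files round-robin into at most one non-empty batch per worker."""
    workers = max(1, workers or os.cpu_count() or 1)
    chunks = [list(paths[i::workers]) for i in range(workers)]
    return [chunk for chunk in chunks if chunk]


def clean_batch(paths: Sequence[str], run: Runner = run_command) -> Dict[str, Tuple[int, int]]:
    """Return {path: (count_before, count_after)} for one batch of files."""
    before = {p: count_mdls_attributes(run(["mdls", p])) for p in paths}
    if paths:
        # One xattr process clears the whole batch.
        run(["xattr", "-c", *paths])
    after = {p: count_mdls_attributes(run(["mdls", p])) for p in paths}
    return {p: (before[p], after[p])for p in paths}
```

Each batch from `split_per_worker(files)` could then be handed to its own worker that calls `clean_batch`. This still starts one mdls process per file for the counts, so it only reduces the overhead of the clearing step.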

Current plan for Linux

  • Research the extended filesystem attributes more. There is probably variation between the Linux filesystems.
  • If possible, find a single tool that deals with all the Linux filesystems uniformly.
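As a possible starting point for the Linux research, Python's standard library exposes the kernel's extended attribute calls through `os.listxattr` and `os.removexattr` (both are available on Linux only). The sketch below is an assumption about how a filesystem-independent cleanup might look, not a vetted solution: `strip_linux_xattrs` is a hypothetical name, and attributes outside the `user.` namespace (for example `security.` or `trusted.`) may refuse removal without extra privileges, so failures are recorded instead of raised.

```python
import os
from typing import Callable, Dict, List, Optional


def strip_linux_xattrs(
    path: str,
    list_fn: Optional[Callable[[str], List[str]]] = None,
    remove_fn: Optional[Callable[[str, str], None]] = None,
) -> Dict[str, Optional[str]]:
    """Try to remove every extended attribute on `path`.

    Returns {attribute_name: None} for removed attributes and
    {attribute_name: error_text} for those that could not be removed.
    """
    # os.listxattr / os.removexattr only exist on Linux builds of Python;
    # they are looked up lazily so injected replacements work elsewhere.
    list_fn = list_fn or os.listxattr
    remove_fn = remove_fn or os.removexattr

    outcome: Dict[str, Optional[str]] = {}
    for name in list_fn(path):
        try:
            remove_fn(path, name)
            outcome[name] = None
        except OSError as exc:
            # e.g. permission denied, or the filesystem rejects the operation
            outcome[name] = exc.strerror or str(exc)
    return outcome
```

Filesystems without extended attribute support are expected to raise `OSError` from the listing call itself, which this sketch lets propagate so the caller can tell "nothing to remove" apart from "not supported".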

Current plan for Windows

  • Find an existing command-line tool, perhaps C/C++ or PowerShell, that cleans Windows extended filesystem attributes.
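One guess at what the Windows side might involve: on NTFS, download origin information is commonly stored in an alternate data stream named Zone.Identifier, which can be addressed as `filename:Zone.Identifier`. The sketch below only covers that one stream and is an assumption, not a surveyed solution; the function names are hypothetical, and other streams or attributes may exist that it does not touch.

```python
import os

# Name of the NTFS alternate data stream that usually holds download origin info.
ZONE_STREAM = "Zone.Identifier"


def stream_path(path: str, stream: str = ZONE_STREAM) -> str:
    """Build the `file:stream` path used to address an alternate data stream."""
    return f"{path}:{stream}"


def remove_zone_identifier(path: str) -> bool:
    """Delete the Zone.Identifier stream of `path` if present.

    Returns True when a stream was deleted, False when there was none.
    The file's main content is not touched.
    """
    try:
        os.remove(stream_path(path))
        return True
    except FileNotFoundError:
        return False
```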

Current plan for all OS targets

  • Extract the extended filesystem attribute cleanup into a single NPM package, or C/C++ tool with Node CAPI extension.

Without help this is likely going to take more than a year of low-time-commitment work. If someone provides a drop-in solution like the one exiftool offers, it will go faster.

@szTheory szTheory changed the title from "Add ability to remove extended filesystem attributes on Mac/Linux (xattr/mdls) such as Source/Quelle" to "Remove extended filesystem attributes on Mac/Linux (xattr/mdls) such as Source/Quelle" May 2, 2021
@tayfuuun
Contributor Author

tayfuuun commented Jun 8, 2021

@szTheory one small update
the command xattr -c myfilehere.pdf works fine for images (.png, .jpg), but when I use it on PDF files, not all of the information is deleted on macOS.

image

@szTheory
Owner

szTheory commented Jun 8, 2021

Thanks, yeah, I noticed that too. I'm not sure what to do about it. Maybe it's a bug in xattr. I couldn't find any guide that mentions this failing. Everything just recommended using xattr, which clearly is not doing enough, even after I played around with all of its command line options.
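To narrow down which fields survive on PDFs, one option is to capture the mdls output before and after running xattr -c and compare the attribute names. The sketch below is a hypothetical diagnostic helper, not part of ExifCleaner: it assumes mdls text output with one `kMDItemName = value` line per attribute, and it says nothing about why a given key remains.

```python
import re
from typing import List, Set

# Assumed mdls line shape: attribute name at the start of the line, then "=".
_MDLS_KEY = re.compile(r"^(\w+)\s*=")


def mdls_keys(mdls_output: str) -> Set[str]:
    """Collect the top-level attribute names from mdls text output."""
    keys = set()
    for line in mdls_output.splitlines():
        match = _MDLS_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def surviving_keys(before: str, after: str) -> List[str]:
    """Attribute names present both before and after the cleanup."""
    return sorted(mdls_keys(before) & mdls_keys(after))


def removed_keys(before: str, after: str) -> List[str]:
    """Attribute names that were present before but are gone afterwards."""
    return sorted(mdls_keys(before) - mdls_keys(after))
```

Feeding it the two captured outputs for a PDF and for a PNG would give a concrete list of the fields that behave differently between the two formats.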

I don't know enough about these filesystems to find a comprehensive solution, so hopefully someone can recommend a starting point. Ideally there would be one tool that gets rid of all the extended filesystem attributes. Better yet, one that works for all filesystem types, across all the major operating systems. Then that tool could be vetted and integrated with ExifCleaner for a single drag and drop that gets rid of everything, instead of being a patchwork process that depends on your environment.
