Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpret UserComment character code #277

Merged

Conversation

patricksadowski
Copy link
Contributor

Reads first 8 bytes for character encoding recognition to comply with
EXIF standard.

It's not necessary to read 10 bytes at the beginning. Reading these two
extra bytes could lead to missing bytes and the end. Switching the byte
order for UNICODE to big-endian solves the issue with 8 and 10 bytes
reads.

Only for ASCII NULL termination is done, "since the type is not ASCII,
NULL termination is not necessary".

Closes #173

Reads first 8 bytes for character encoding recognition to comply with
EXIF standard.

It's not necessary to read 10 bytes at the beginning. Reading these two
extra bytes could lead to missing bytes and the end. Switching the byte
order for UNICODE to big-endian solves the issue with 8 and 10 bytes
reads.

Only for ASCII NULL termination is done, "since the type is not ASCII,
NULL termination is not necessary".
@patricksadowski
Copy link
Contributor Author

If a run MetadataExtractor.Tools.FileProcessor on metadata-extractor-images I get following changes:

jpg/metadata/dotnet/Issue 308.jpg.txt
@@ -34 +34 @@ TYPE: JPEG
-[Exif SubIFD - 0x9286] User Comment = ASCII
+[Exif SubIFD - 0x9286] User Comment =
b/jpg/metadata/dotnet/Samsung SM-C101 (Galaxy S4 Zoom).jpg.txt
@@ -71 +71 @@ TYPE: JPEG
-[GPS - 0x001b] GPS Processing Method = ASCII
+[GPS - 0x001b] GPS Processing Method =

and the sample image of #173 works too.

@drewnoakes
Copy link
Owner

Thanks, this looks great. I'll add the image from 173 to the library too, as it's a more interesting example than the ones in the diff above.

Seems tests are failing. I don't think they're all related to this PR, so I'll fix those up after merging this.

@drewnoakes drewnoakes merged commit 760e73e into drewnoakes:master Nov 18, 2020
drewnoakes added a commit that referenced this pull request Nov 18, 2020
@drewnoakes
Copy link
Owner

I patched up the unit tests.

With respect the UNICODE byte order (i.e. change to big-endian in this PR), it appears that the byte order of this UNICODE data is determined by the JPEG file's own byte order (i.e. II/MM). So this change isn't technically correct, but neither was the previous code.

@patricksadowski
Copy link
Contributor Author

Seems tests are failing.

Sorry, I forgot the tests. Next time I start with the tests. Thanks for fixing.

So this change isn't technically correct

I'm aware of this. It's a lot closer to the specification than before (8 bytes vs. 10 bytes). The specification is not clear about the endianess. I can't find any hint. Maybe I find a solution.

@patricksadowski patricksadowski deleted the exif-usercomment-encoding branch January 18, 2021 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Jpeg EXIF comment missing last character
2 participants