Interpret UserComment character code #277

patricksadowski · 2020-11-16T16:29:22Z

Reads the first 8 bytes for character encoding recognition, to comply
with the EXIF standard.

It's not necessary to read 10 bytes at the beginning. Reading these two
extra bytes could lead to missing bytes at the end. Switching the byte
order for UNICODE to big-endian solves the issue with 8-byte versus
10-byte reads.

NULL termination is only handled for ASCII: "since the type is not
ASCII, NULL termination is not necessary".

Closes #173
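
For readers unfamiliar with the layout, here is a minimal sketch of the decoding described above. It is written in Python purely for illustration and is not the C# code changed in this PR; `decode_user_comment` is a hypothetical helper, and the fallback for JIS, undefined, or unknown character codes is an assumption rather than anything stated here.

```python
# Illustrative Python sketch; not the C# code changed in this PR.
ASCII_CODE = b"ASCII\x00\x00\x00"  # 8-byte character code for ASCII
UNICODE_CODE = b"UNICODE\x00"      # 8-byte character code for Unicode


def decode_user_comment(data: bytes) -> str:
    """Decode an EXIF UserComment value (hypothetical helper)."""
    if len(data) < 8:
        # Assumption: too short to hold a character code, so treat as empty.
        return ""
    # Exactly 8 bytes identify the encoding; everything after is the comment.
    code, payload = data[:8], data[8:]
    if code == ASCII_CODE:
        # ASCII is the only case where a NUL terminator is honoured.
        payload = payload.split(b"\x00", 1)[0]
        return payload.decode("ascii", errors="replace")
    if code == UNICODE_CODE:
        # Big-endian, as chosen in this PR.
        return payload.decode("utf-16-be", errors="replace")
    # JIS, undefined, or unknown code: this fallback is an assumption.
    return payload.decode("latin-1").strip("\x00 ")
```

With this layout, a value that holds only the `ASCII` character code and no payload decodes to an empty string, which matches the kind of change shown in the regression output.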


patricksadowski · 2020-11-16T16:33:01Z

If I run MetadataExtractor.Tools.FileProcessor on metadata-extractor-images I get the following changes:

jpg/metadata/dotnet/Issue 308.jpg.txt
@@ -34 +34 @@ TYPE: JPEG
-[Exif SubIFD - 0x9286] User Comment = ASCII
+[Exif SubIFD - 0x9286] User Comment =
b/jpg/metadata/dotnet/Samsung SM-C101 (Galaxy S4 Zoom).jpg.txt
@@ -71 +71 @@ TYPE: JPEG
-[GPS - 0x001b] GPS Processing Method = ASCII
+[GPS - 0x001b] GPS Processing Method =

and the sample image of #173 works too.

drewnoakes · 2020-11-18T00:37:32Z

Thanks, this looks great. I'll add the image from 173 to the library too, as it's a more interesting example than the ones in the diff above.

Seems tests are failing. I don't think they're all related to this PR, so I'll fix those up after merging this.

drewnoakes · 2020-11-18T02:29:58Z

I patched up the unit tests.

With respect to the UNICODE byte order (i.e. the change to big-endian in this PR), it appears that the byte order of this UNICODE data is determined by the JPEG file's own byte order (i.e. II/MM). So this change isn't technically correct, but neither was the previous code.
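
If that reading is right, the codec would be picked from the TIFF header's byte-order mark instead of being fixed. A rough Python sketch under that assumption follows (illustrative only, not the library's C# code; both function names are hypothetical):

```python
def utf16_codec_for_tiff_header(tiff_header: bytes) -> str:
    """Map the TIFF byte-order mark to a UTF-16 codec name (hypothetical helper)."""
    mark = tiff_header[:2]
    if mark == b"II":  # Intel order: little-endian
        return "utf-16-le"
    if mark == b"MM":  # Motorola order: big-endian
        return "utf-16-be"
    raise ValueError(f"unrecognised TIFF byte-order mark: {mark!r}")


def decode_unicode_comment(tiff_header: bytes, payload: bytes) -> str:
    """Decode the bytes that follow the 8-byte UNICODE character code.

    Assumes, per the comment above, that the text uses the file's byte order.
    """
    codec = utf16_codec_for_tiff_header(tiff_header)
    return payload.decode(codec, errors="replace")
```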

patricksadowski · 2020-11-19T16:32:31Z

> Seems tests are failing.

Sorry, I forgot the tests. Next time I'll start with the tests. Thanks for fixing.

> So this change isn't technically correct

I'm aware of this. It's a lot closer to the specification than before (8 bytes vs. 10 bytes). The specification is not clear about the endianness; I can't find any hint. Maybe I'll find a solution.

drewnoakes merged commit 760e73e into drewnoakes:master Nov 18, 2020

drewnoakes added a commit that referenced this pull request Nov 18, 2020

Update unit tests after changes in #277

46ccdd4

patricksadowski deleted the exif-usercomment-encoding branch January 18, 2021 14:05
