Support for transparent images #38

fbarrella · 2019-05-20T19:20:32Z

I'm having an unusual problem in my project: I'm getting the same hash for two different images. These are the images in question:

The project simply uses the hash(BufferedImage image) method from the API. I'm not sure whether there are other approaches, but is there any possible solution to this problem? Thanks in advance!

KilianB · 2019-05-20T21:17:11Z

Hash collisions happen by design, because you map arbitrarily many images to a fixed-length hash.

For example, Java's default hashCode implementation converts the string Hash to the numeric value 2241838. Looking only at 4-character strings, all of the following words have exactly the same hash code:

Hc5h
ID5h
IBsh
HbTh
ICTh
Hc6I
ID6I
HatI
IBtI
HbUI
ICUI
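The collision above is easy to reproduce. The sketch below uses only the JDK; the class name HashCodeCollisionDemo and the helper polynomialHash are made up for illustration. The helper re-implements the formula documented for String.hashCode() so the arithmetic is visible.

```java
final class HashCodeCollisionDemo {

    // "Hash" plus the 4-character strings listed above.
    static final String[] WORDS = {
        "Hash", "Hc5h", "ID5h", "IBsh", "HbTh", "ICTh",
        "Hc6I", "ID6I", "HatI", "IBtI", "HbUI", "ICUI"
    };

    // Documented String.hashCode() formula:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with int arithmetic.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Every line prints the same numeric value, 2241838.
        for (String w : WORDS) {
            System.out.println(w + " -> " + w.hashCode());
        }
    }
}
```

Twelve distinct inputs share one 32-bit value here, which is the pigeonhole effect described above: far more possible inputs than possible outputs.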

This is also true for images. A solution would be to use a secondary hash function that looks at different features of the image (e.g. average hash and perceptive hash) to confirm your classification. If they agree, the image is most likely a duplicate.

The chain algorithm example shows how you might approach this:

https://github.com/KilianB/JImageHash/blob/master/src/main/java/com/github/kilianB/examples/ChainAlgorithms.java
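The idea of confirming a match with a second, independent hash can also be sketched without the library. The following is not JImageHash's implementation. It is a minimal stand-alone sketch using only java.awt, with hypothetical names (TwoHashCheck, likelyDuplicate) and simplified 64-bit average and difference hashes. Two images count as likely duplicates only if both hashes agree within the threshold.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class TwoHashCheck {

    // Scale the image to w x h and return one luma value (0..255) per pixel.
    static int[][] scaledLuma(BufferedImage src, int w, int h) {
        // Drawing onto an opaque RGB canvas composites any transparency over black.
        BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = scaled.createGraphics();
        graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        graphics.drawImage(src, 0, 0, w, h, null);
        graphics.dispose();

        int[][] luma = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = scaled.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                luma[y][x] = (299 * r + 587 * g + 114 * b) / 1000;
            }
        }
        return luma;
    }

    // Average hash: one bit per pixel of an 8x8 thumbnail, set if brighter than the mean.
    static long averageHash(BufferedImage img) {
        int[][] luma = scaledLuma(img, 8, 8);
        long sum = 0;
        for (int[] row : luma) for (int v : row) sum += v;
        double mean = sum / 64.0;
        long hash = 0;
        int bit = 0;
        for (int[] row : luma) {
            for (int v : row) {
                if (v > mean) hash |= 1L << bit;
                bit++;
            }
        }
        return hash;
    }

    // Difference hash: one bit per horizontal neighbor pair of a 9x8 thumbnail.
    static long differenceHash(BufferedImage img) {
        int[][] luma = scaledLuma(img, 9, 8);
        long hash = 0;
        int bit = 0;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                if (luma[y][x] < luma[y][x + 1]) hash |= 1L << bit;
                bit++;
            }
        }
        return hash;
    }

    // Fraction of differing bits, from 0.0 (identical) to 1.0 (all bits differ).
    static double normalizedDistance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    // Both hashes must agree before the pair is treated as a likely duplicate.
    static boolean likelyDuplicate(BufferedImage a, BufferedImage b, double threshold) {
        return normalizedDistance(averageHash(a), averageHash(b)) <= threshold
                && normalizedDistance(differenceHash(a), differenceHash(b)) <= threshold;
    }
}
```

A second hash only helps if it looks at different features. If both hashes are fed the same preprocessed pixel values, they can still agree on images that a human would call different.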

fbarrella · 2019-05-23T19:27:02Z

Thanks!!! Sadly, I'm having a hard time finding a solution to this. Oddly enough, I've tried several of the available hashing methods, but I always end up with the same digits for both images. Even changing the "bit resolution" didn't work. What intrigues me most is that, as you can see from the images uploaded in my prior comment, they are indeed different images. I would love to find a way to show that to the hash methods as well, hahaha. Are there any other possibilities? Anyway, thank you very much for the help!

fbarrella · 2019-05-23T19:32:12Z

I'm also going to leave here the actual piece of code I'm using to test the similarity between the images! Maybe I'm doing something wrong that I can't see right now! I would appreciate some insights!

@PostMapping(value = "/v1.0/hashTest")
public ResponseEntity<Map<String, String>> getImageHashTest(@RequestParam(name = "file") MultipartFile file,
                                                            @RequestParam(name = "file2") MultipartFile file2) {
    SingleImageMatcher matcher = new SingleImageMatcher();
    Map<String, String> hashMap = new HashMap<>();

    try {
        // Note: ImageIO.read returns null when no registered reader can decode the stream
        BufferedImage image = ImageIO.read(file.getInputStream());
        BufferedImage image2 = ImageIO.read(file2.getInputStream());

        // Register several algorithms at different bit resolutions, each with a threshold of 0.4
        matcher.addHashingAlgorithm(new AverageHash(8), 0.4);
        matcher.addHashingAlgorithm(new AverageHash(32), 0.4);
        matcher.addHashingAlgorithm(new AverageHash(64), 0.4);

        matcher.addHashingAlgorithm(new PerceptiveHash(32), 0.4);
        matcher.addHashingAlgorithm(new PerceptiveHash(64), 0.4);

        matcher.addHashingAlgorithm(new MedianHash(32), 0.4);
        matcher.addHashingAlgorithm(new MedianHash(64), 0.4);

        matcher.addHashingAlgorithm(new DifferenceHash(64, DifferenceHash.Precision.Simple), 0.4);
        matcher.addHashingAlgorithm(new DifferenceHash(32, DifferenceHash.Precision.Triple), 0.4);

        // Report whether the matcher considers the two uploads similar
        if (matcher.checkSimilarity(image, image2)) {
            hashMap.put("similarity", "yes");
        } else {
            hashMap.put("similarity", "no");
        }

        return ResponseEntity.ok(hashMap);
    } catch (IOException e) {
        e.printStackTrace();
    }

    // Reached only if reading one of the uploads failed
    return ResponseEntity.noContent().build();
}

KilianB · 2019-05-23T22:03:32Z

I did some testing, and indeed those images result in the same hash no matter what you try. Upon further investigation, the issue arises from the alpha channel: the black parts of the image are solid black, while the white parts simply have an opacity of 0.
The luminosity computation only takes the RGB values into account, and those are the same for every pixel.
Are there any guidelines on how transparency should be treated when calculating Y in the YCbCr color model? I assume that for this trivial case an alpha of 0 can be treated as white, but this isn't correct for every use case.
For now, an ugly workaround would be to replace the transparent pixels with white until I can figure out how to correctly compute luminosity. (Is there a formula for handling alpha? Always assume white?)
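To make the problem concrete: an opaque black pixel (ARGB 0xFF000000) and a fully transparent black pixel (ARGB 0x00000000) have identical RGB channels, so any luma formula that ignores alpha returns the same value for both. One common convention, not something this thread settles on, is standard "over" alpha compositing against an assumed background: result = alpha * foreground + (1 - alpha) * background. The sketch below applies that to Rec. 601 luma. The class and method names are hypothetical, and it assumes non-premultiplied ARGB values such as those returned by BufferedImage.getRGB.

```java
final class AlphaAwareLuma {

    // Rec. 601 luma from 8-bit RGB channels; alpha is ignored on purpose.
    static double lumaIgnoringAlpha(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    // Luma of the pixel composited over an assumed opaque background color.
    // Luma is linear in RGB, so blending the lumas equals the luma of the blended color.
    static double lumaOverBackground(int argb, int backgroundRgb) {
        double alpha = ((argb >>> 24) & 0xFF) / 255.0;
        double foreground = lumaIgnoringAlpha(argb);
        double background = lumaIgnoringAlpha(backgroundRgb);
        return alpha * foreground + (1.0 - alpha) * background;
    }

    public static void main(String[] args) {
        int opaqueBlack = 0xFF000000;
        int transparentBlack = 0x00000000;
        // Identical when alpha is ignored...
        System.out.println(lumaIgnoringAlpha(opaqueBlack) == lumaIgnoringAlpha(transparentBlack));
        // ...but different once a white background is assumed.
        System.out.println(lumaOverBackground(opaqueBlack, 0xFFFFFF));
        System.out.println(lumaOverBackground(transparentBlack, 0xFFFFFF));
    }
}
```

The formula itself is uncontroversial. The open question remains which background color to assume, since the image file does not say what it will be displayed on.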

KilianB · 2019-05-23T22:22:55Z

Yes, choosing a different hash method will not make a difference, since the issue resides in the hash precalculation step.
I see where you are coming from, and this is indeed an issue. Semantically there isn't a valid solution, I'm afraid: we never know what color a missing (invisible) pixel will be.
More often than not those pixels are displayed as white, as seen in the images above. I will add an option to let people choose how to handle transparency. This will fix your current issue.

fbarrella · 2019-05-23T22:42:34Z

That's awesome! I was actually going to come back with this exact answer! After a little searching, I noticed how much the transparency affected the API's hash calculation and ended up with the idea of simply modifying the original BufferedImage to have a white background and then generating a hash over it. The only problem I see is when the actual image is a white PNG icon. Maybe we could iterate on this solution to get to a good place.
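A minimal sketch of that workaround, using only java.awt: paint the chosen background onto an opaque canvas, then draw the original image on top so transparent areas show the background. The class name BackgroundFlattener and the method flatten are hypothetical and not part of JImageHash.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class BackgroundFlattener {

    // Returns an opaque copy of src with all transparency composited over the background color.
    static BufferedImage flatten(BufferedImage src, Color background) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics = out.createGraphics();
        try {
            // Fill the canvas first, then draw the image with the default source-over rule.
            graphics.setColor(background);
            graphics.fillRect(0, 0, out.getWidth(), out.getHeight());
            graphics.drawImage(src, 0, 0, null);
        } finally {
            graphics.dispose();
        }
        return out;
    }
}
```

Usage would look like `BufferedImage opaque = BackgroundFlattener.flatten(image, Color.WHITE);` before passing the result to a hasher. As noted above, a white icon flattened onto white becomes a blank image, so the background color has to be chosen with the content in mind.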

KilianB · 2019-05-24T11:25:03Z

Everything is pretty much implemented; I just need to write some unit tests to make sure I didn't break anything else.
The heavy lifting is done in the utility code repository.

From now on you can define:

    HashingAlgorithm aHasher = new AverageHash(64);
    // Define how to handle transparent pixels
    double alphaThreshold = 0;
    aHasher.setOpaqueHandling(Color.white, alphaThreshold);

    // Proceed as normal

Will this suit you, or do you have any other ideas? By default I will retain the old behavior so as not to break backwards compatibility.
For strictly black-and-white images with a transparent background, we can simply use an arbitrary color to handle both use cases.
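What a color-plus-threshold option could do can be sketched as a per-pixel replacement. This is a hypothetical stand-in written with plain java.awt, not the library's code, and the actual semantics of setOpaqueHandling may differ. It assumes the threshold is a fraction in [0, 1] and that pixels whose alpha is at or below it are replaced.

```java
import java.awt.image.BufferedImage;

final class TransparentPixelReplacer {

    // Copies src into an opaque image, substituting replacementRgb for every pixel
    // whose alpha (scaled to 0..1) is at or below alphaThreshold.
    static BufferedImage replaceTransparent(BufferedImage src, int replacementRgb,
                                            double alphaThreshold) {
        int width = src.getWidth();
        int height = src.getHeight();
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int argb = src.getRGB(x, y);
                double alpha = ((argb >>> 24) & 0xFF) / 255.0;
                // Pixels above the threshold keep their RGB channels unchanged.
                out.setRGB(x, y, alpha <= alphaThreshold ? replacementRgb : argb);
            }
        }
        return out;
    }
}
```

Unlike blending, a hard threshold leaves semi-transparent pixels untouched, so soft edges keep their raw RGB values. Which behavior is preferable depends on the images being compared.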

fbarrella · 2019-05-24T14:03:45Z

OK, if I got it right, the hasher will by default treat the image as having a white background, while also letting me choose another color/threshold if needed, right? If so, that's amazing! It solves the problem, and we can get even more precise hashes! About black-and-white images with no background: what if you calculated the background color from the luminance of the image's predominant color? That way we could avoid, as much as possible, making the user set the color manually!
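One way this suggestion could be sketched: measure how bright the visible part of the image is and pick the opposite extreme as the background. The sketch below simplifies "predominant color" to the alpha-weighted mean luma of all pixels, which is an assumption rather than anything the library does, and the class name ContrastingBackground is hypothetical.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

final class ContrastingBackground {

    // Picks white for mostly dark content and black for mostly light content.
    static Color choose(BufferedImage src) {
        double weightedLuma = 0.0;
        double totalAlpha = 0.0;
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int argb = src.getRGB(x, y);
                double alpha = ((argb >>> 24) & 0xFF) / 255.0;
                int r = (argb >> 16) & 0xFF;
                int g = (argb >> 8) & 0xFF;
                int b = argb & 0xFF;
                // Weight each pixel by its alpha so invisible pixels do not vote.
                weightedLuma += alpha * (0.299 * r + 0.587 * g + 0.114 * b);
                totalAlpha += alpha;
            }
        }
        if (totalAlpha == 0.0) {
            return Color.WHITE; // nothing visible; fall back to white
        }
        double meanLuma = weightedLuma / totalAlpha;
        return meanLuma < 128.0 ? Color.WHITE : Color.BLACK;
    }
}
```

A caveat of any automatic choice: two files showing the same picture could end up with different backgrounds if their visible content differs slightly around the midpoint, so a fixed user-supplied color is more predictable for duplicate detection.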

KilianB · 2019-05-26T23:20:10Z

While refactoring the utility code I changed a few design decisions, which is taking longer than expected. I really wanted to get the new version released tonight, but sadly it will take a little longer.

fbarrella · 2019-05-27T18:22:44Z

Cool! Man, I would like to report a new unusual case after applying the fix of adding a white background to transparent images... For some reason my code generated equal hashes (once again) when hashing the following two images with new PerceptiveHash(32) (one is a transparent PNG and the other is a regular JPEG, respectively):

Would you please try to hash 'em so we can test if the anomaly isn't only at my side?

KilianB added the bug and help wanted labels · May 23, 2019

KilianB added a commit that referenced this issue · May 24, 2019: "Add the ability to define a default color for opaque pixels. #38" (201b24b)

KilianB added a commit that referenced this issue · May 24, 2019: "remove dependency not part of this fork #38" (e60eefb)

KilianB added a commit that referenced this issue · May 24, 2019: "prevent object modification after hashes have been created #38" (27b47ef)

KilianB mentioned this issue · Jan 20, 2020: Logging #43 (open)

KilianB added a commit to KilianB/UtilityCode that referenced this issue · Jun 15, 2021: "allow replacing opaque pixels with specified color KilianB/JImageHash#38" (ef26ce7)

KilianB self-assigned this · Jun 17, 2021

KilianB added the enhancement label and removed the help wanted label · Jun 18, 2021

KilianB changed the title from "Different images resulting into same hash" to "Support for transparent images" · Jun 18, 2021

KilianB added a commit that referenced this issue · Jun 20, 2021: "Move files to package matching new group id #38" (f0f7832)

KilianB closed this as completed in a9f0d28 · Jun 20, 2021
