Support for transparent images #38
Comments
Hash collisions happen by design, because you map arbitrarily many images to a fixed-length hash. For example, Java's default hashCode implementation converts the string
This is also true for images. A solution would be to use a secondary hash function that looks at different features of the image (e.g. average hash and perceptive hash) to confirm your classification. If they agree, the image is most likely a duplicate. The chain algorithm example shows how you might approach this.
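To make the collision argument concrete, here is a small illustration of my own (it is not the chain example from the project). It uses two distinct strings that happen to share the same `String.hashCode()` value, and a deliberately simple second hash over a different feature to tell them apart. The class and method names are hypothetical.

```java
class CollisionDemo {
    /** Primary check: the 32-bit String.hashCode() values match. */
    static boolean samePrimaryHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    /** Secondary hash over a different feature: the plain sum of character codes. */
    static int secondaryHash(String s) {
        int sum = 0;
        for (char c : s.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    /** Report a probable duplicate only when both hashes agree. */
    static boolean likelyDuplicate(String a, String b) {
        return samePrimaryHash(a, b) && secondaryHash(a) == secondaryHash(b);
    }
}
```

With the inputs `"Aa"` and `"BB"` the primary hashes collide, but the secondary hash differs, so `likelyDuplicate` rejects the pair. The same idea carries over to images when the two hash algorithms look at different features.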
Thanks!!! Sadly, I'm having a hard time finding a solution to this. Oddly enough, I've tried several of the available hashing methods, but I always end up with the same digits for both images. Even changing the "bit resolution" didn't work. What intrigues me the most is that, as you can see from the images uploaded in my prior comment, they are indeed different images. I would love to find a way to make the hash methods show that as well, hahaha. Are there any other possibilities? Anyway, thank you very much for the help!
I'm also going to leave here the actual piece of code I'm using to test the similarity between the images! Maybe I'm doing something wrong that I can't see right now! I would appreciate some insights!

    @PostMapping(value = "/v1.0/hashTest")
    public ResponseEntity getImageHashTest(@RequestParam(name = "file") MultipartFile file,
                                           @RequestParam(name = "file2") MultipartFile file2) {
        SingleImageMatcher matcher = new SingleImageMatcher();
        Map<String, String> hashMap = new HashMap<>();
        try {
            BufferedImage image = ImageIO.read(file.getInputStream());
            BufferedImage image2 = ImageIO.read(file2.getInputStream());
            // Register several algorithms at different bit resolutions, each with a 0.4 threshold
            matcher.addHashingAlgorithm(new AverageHash(8), 0.4);
            matcher.addHashingAlgorithm(new AverageHash(32), 0.4);
            matcher.addHashingAlgorithm(new AverageHash(64), 0.4);
            matcher.addHashingAlgorithm(new PerceptiveHash(32), 0.4);
            matcher.addHashingAlgorithm(new PerceptiveHash(64), 0.4);
            matcher.addHashingAlgorithm(new MedianHash(32), 0.4);
            matcher.addHashingAlgorithm(new MedianHash(64), 0.4);
            matcher.addHashingAlgorithm(new DifferenceHash(64, DifferenceHash.Precision.Simple), 0.4);
            matcher.addHashingAlgorithm(new DifferenceHash(32, DifferenceHash.Precision.Triple), 0.4);
            if (matcher.checkSimilarity(image, image2)) {
                hashMap.put("similarity", "yes");
            } else {
                hashMap.put("similarity", "no");
            }
            return ResponseEntity.ok(hashMap);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return ResponseEntity.noContent().build();
    }
I did some testing, and indeed those images will result in the same hash no matter what you try. Upon further investigation, the issue arises from the alpha channel: the black parts of the image are solid black, while the white parts simply have an opacity of 0.
Yes, choosing a different hash method will not make a difference, since the issue lies in the hash precalculation step.
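As a rough illustration of why every algorithm ends up with the same result, the sketch below compares two tiny ARGB images while ignoring the alpha byte. It assumes the transparent pixels store black in their colour channels, which I have not verified for the images in this thread, and the helper names are hypothetical.

```java
import java.awt.image.BufferedImage;

class AlphaBlindDemo {
    /** Drops the alpha byte, as a step that only reads R, G and B would. */
    static int rgbOnly(int argb) {
        return argb & 0x00FFFFFF;
    }

    /** Builds a 2x1 ARGB image from two packed ARGB pixel values. */
    static BufferedImage twoPixels(int left, int right) {
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(0, 0, left);
        img.setRGB(1, 0, right);
        return img;
    }

    /** True when both images have identical R, G and B values at every pixel. */
    static boolean sameIgnoringAlpha(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (rgbOnly(a.getRGB(x, y)) != rgbOnly(b.getRGB(x, y))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

An image made of an opaque black pixel next to a fully transparent one (`0xFF000000`, `0x00000000`) compares equal to an all-black image under `sameIgnoringAlpha`, so any hash computed from those colour values alone cannot separate them.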
That's awesome! I was actually going to come back with this exact answer! After a little searching, I noticed how much the transparency affected the hash calculation in the API, and I ended up with the idea of simply redrawing the original BufferedImage on a white background and then generating a hash from that. The only problem I see is when the actual image is a white PNG icon. Maybe we could iterate on this solution to get to a good place.
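A minimal sketch of that workaround, using only standard `java.awt` calls: the source image is composited over a solid background into an opaque RGB image, which can then be passed to the hasher. The class and method names are my own, not part of the library.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class WhiteBackground {
    /** Returns an opaque RGB copy of src composited over the given background colour. */
    static BufferedImage flatten(BufferedImage src, Color background) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Paint the background first, then draw the source on top of it.
            g.setColor(background);
            g.fillRect(0, 0, out.getWidth(), out.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

Calling `WhiteBackground.flatten(image, Color.WHITE)` before hashing turns fully transparent pixels white. As noted above, a white icon on a transparent background would then flatten to a plain white image, so the background colour has to be chosen with the image content in mind.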
Everything is pretty much implemented; I just need to write some unit tests to make sure I didn't break anything else. From now on you can define:

    HashingAlgorithm aHasher = new AverageHash(64);
    // Define how to handle opaque pixel
    double alphaThreshold = 0;
    aHasher.setOpaqueHandling(Color.white, alphaThreshold);
    // Proceed as normal

Will this suit you, or do you have any other ideas? By default I will retain the old behavior so as not to break backwards compatibility.
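For readers wondering what a colour plus an alpha threshold could mean in practice, here is a standalone sketch of one possible interpretation. It is an assumption on my part, not the library's implementation: pixels whose alpha (taken here as an integer from 0 to 255) is at or below the threshold are replaced by the opaque replacement colour, and all other pixels are copied unchanged. `TransparencyReplace` and `replaceTransparent` are hypothetical names.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

class TransparencyReplace {
    /**
     * Hypothetical helper: copies src, replacing every pixel whose alpha
     * (0..255) is at or below alphaThreshold with the opaque replacement colour.
     */
    static BufferedImage replaceTransparent(BufferedImage src, Color replacement,
                                            int alphaThreshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_ARGB);
        // Force full opacity on the replacement colour.
        int fill = 0xFF000000 | (replacement.getRGB() & 0x00FFFFFF);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int argb = src.getRGB(x, y);
                int alpha = argb >>> 24;
                out.setRGB(x, y, alpha <= alphaThreshold ? fill : argb);
            }
        }
        return out;
    }
}
```

With a threshold of 0 only fully transparent pixels are replaced. A higher threshold would also treat nearly transparent pixels, such as anti-aliased edges, as background.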
Ok, if I got it right, the hasher will by default treat the image as having a white background, while also letting me choose another color/threshold if needed, right? If so, that is amazing! It solves the problem, and we can get even more precise hashes! About the black-and-white images with no background: what if you calculate the background color from the luminance of the predominant image color? That way we could avoid, as much as possible, making the user set the color manually!
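One way this suggestion could be sketched is shown below. It approximates the "predominant color" by the alpha-weighted mean luminance of the visible pixels (Rec. 601 luma weights) and picks a white background for dark content and a black one for light content. The cutoff of 128 and all names are assumptions for illustration only.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

class BackgroundPicker {
    /** Rec. 601 luma of a packed ARGB pixel, in the range 0..255. */
    static double luminance(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    /** Picks a background that contrasts with the visible content of src. */
    static Color pickBackground(BufferedImage src) {
        double weightedSum = 0;
        long totalAlpha = 0;
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int argb = src.getRGB(x, y);
                int alpha = argb >>> 24;
                // Fully transparent pixels contribute nothing.
                weightedSum += alpha * luminance(argb);
                totalAlpha += alpha;
            }
        }
        if (totalAlpha == 0) {
            return Color.WHITE; // nothing visible, fall back to white
        }
        double meanLuminance = weightedSum / totalAlpha;
        return meanLuminance < 128 ? Color.WHITE : Color.BLACK;
    }
}
```

A white icon on a transparent canvas would get a black background here, and a black icon a white one, so neither flattens into a uniform image.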
While refactoring the utility code I changed a few design decisions, which is taking longer than expected. I really wanted to get the new version released tonight, but sadly it will take a little longer.
Cool! Man, I would like to report a new unusual case after the solution of adding a white background to transparent images... For some reason my code once again generated equal hashes when hashing the following two images using new PerceptiveHash(32) (respectively, one being the transparent PNG and the other just a regular JPEG): Would you please try to hash them, so we can test whether the anomaly is only on my side?
I'm having an unusual problem in my project where I'm getting the same hash for two different images. These are the images in question:


The project simply uses the hash(BufferedImage image) method from the API. I'm not sure if there are different approaches to doing that, but is there any possible solution to this problem? Thanks in advance!