
Alpha-transparency support #8

Closed
addyosmani opened this issue Jan 11, 2021 · 2 comments
addyosmani commented Jan 11, 2021

Great work on Upscaler, @thekevinscott! I've recently been using it for a small side project. I noticed that when PNG images with an alpha channel (i.e. transparent images) are processed through the library, they come out with a solid black background. This appears to happen with all models.

I wanted to ask whether a small fix might be possible for this upstream in Upscaler. My alternative workaround would likely involve pre- or post-processing (e.g. allowing the user to customize the solid background color), but preserving the input would of course be ideal if possible :)

Below is an example of a transparent PNG that demonstrates this behavior with the demo:

Demo images:

Input: grass (transparent PNG)

Upscaler output: the same image upscaled, with a solid black background

@thekevinscott (Owner)

Really interesting question! My gut says your pre/post-processing workaround is going to be the easiest solution here, since fixing this properly would mean, as you say, tackling it upstream: specifically, re-training a model from scratch to support 4 channels instead of 3 (similar logic to why you need different models for different scale factors).

If you do want to go down that path: all of the models are trained using this Python implementation of ESRGAN (this repo's JavaScript code is model-agnostic and built to support other implementations and algorithms, but I haven't converted any other implementations yet).

This particular Python implementation seems to explicitly support only three channels (here's a related issue I found). However, there's no theoretical reason (I think!) that transparency couldn't be supported, as it's just another channel.

One option would be to look for an alternative Python implementation that supports alpha transparency and convert it to TFJS. I left some notes on how to go about picking an implementation here if you're interested in that route, but the TL;DR is: look for TensorFlow implementations (ideally without custom layers).

Alternatively, you could modify image-super-resolution to explicitly support 4 channels. I think it may be as easy as modifying the c_dim parameter in the rdn.py or rrdn.py models, though I'm not totally sure.
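As a rough illustration of the kind of change being suggested (this is a hypothetical sketch, not verified against the image-super-resolution repo; the exact signatures are assumptions):

```python
# Hypothetical sketch of the c_dim change in rdn.py / rrdn.py.
# c_dim is the channel count the network reads and writes (3 = RGB).
# In principle, bumping it to 4 would make the model consume and
# emit RGBA, provided the training data is also 4-channel.

# rdn.py, before (assumed signature):
# def __init__(self, ..., c_dim=3, ...):

# rdn.py, after, to train on RGBA data:
# def __init__(self, ..., c_dim=4, ...):
```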

You'd also need a dataset that has images with alpha transparency. A Google search led me to some datasets that seem more aimed at training a matting model, but you may be able to repurpose them for this use case. Alternatively, I believe you could take a bunch of images (for instance, starting with the Flickr faces dataset), matte out the backgrounds, and make them transparent. You might even have good luck randomly turning parts of images transparent, or setting certain colors to transparent; that would be a really interesting research avenue to explore!
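The "randomly turn parts of images transparent" idea could be prototyped along these lines (a minimal sketch in pure Python over nested pixel lists; the function name and parameters are illustrative, not from any existing tool):

```python
import random

def add_random_transparency(rgb_pixels, hole_fraction=0.25, seed=0):
    """Turn an RGB image (a list of rows of (r, g, b) tuples) into RGBA,
    making a random subset of pixels fully transparent."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    rgba = []
    for row in rgb_pixels:
        out_row = []
        for (r, g, b) in row:
            # Each pixel becomes fully transparent with probability hole_fraction
            alpha = 0 if rng.random() < hole_fraction else 255
            out_row.append((r, g, b, alpha))
        rgba.append(out_row)
    return rgba

# A 4x4 all-white RGB image gains an alpha channel with random holes:
image = [[(255, 255, 255)] * 4 for _ in range(4)]
rgba = add_random_transparency(image, hole_fraction=0.5)
```

A real pipeline would operate on image files (e.g. via an imaging library) rather than nested lists, but the per-pixel logic is the same.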

The original Python implementation (as well as the models in this repo) was trained on the DIV2K dataset, which has 800 images (plus, I think, 200 for validation/test), so I'd shoot for something in that ballpark.

The Python repo has good information on how to train.

Once you've got a trained model, you'd need to make some changes to this repo, specifically to the input and output tensors. As of now, it assumes three channels. It'd be neat to have a per-model config that describes the channel count (I could also see single-channel black-and-white being useful), for which I would welcome a PR, or could help you with one! I'm also not 100% sure whether the 4-channel tensor -> canvas image conversion would need some massaging as well, but if so, that'd be a straightforward fix.

In general, the JavaScript changes should be fairly easy to make; the model training from scratch (and the dataset collection) I'd imagine to be significantly more work.

Hope that helps. It's a very interesting problem!

@addyosmani (Author)

Thank you for the helpful detailed reply, @thekevinscott!

> Really interesting question! My gut says your pre/post-processing workaround is going to be the easiest solution here, since fixing this properly would mean, as you say, tackling it upstream: specifically, re-training a model from scratch to support 4 channels instead of 3 (similar logic to why you need different models for different scale factors).

I appreciate the validation of the approach! As my use case is upscaling a class of images that often have 4 channels (Memoji, stickers), I've gone ahead and implemented a pre-processing step to address this issue for now. Images are drawn to a <canvas> over a solid background color (defaulting to white, with support for user customization) in order to meet the 3-channel constraint. This appears to work as an interim solution with DIV2K :) I'll be curious to hear how much users need transparency preserved for these types of images.
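The <canvas> flattening described here amounts to standard alpha compositing over an opaque background. A pure-Python equivalent of the math (a sketch, not the actual implementation):

```python
def flatten_alpha(rgba_pixels, background=(255, 255, 255)):
    """Composite RGBA pixels over a solid background, returning RGB.
    Per channel: out = fg * a + bg * (1 - a), with a scaled to [0, 1]."""
    flat = []
    for row in rgba_pixels:
        new_row = []
        for (r, g, b, a) in row:
            t = a / 255.0
            new_row.append(tuple(
                round(fg * t + bg * (1 - t))
                for fg, bg in zip((r, g, b), background)
            ))
        flat.append(new_row)
    return flat

# A fully transparent pixel becomes the background color;
# a fully opaque pixel is unchanged.
img = [[(0, 0, 0, 0), (10, 20, 30, 255)]]
flatten_alpha(img)  # [[(255, 255, 255), (10, 20, 30)]]
```

In the browser, drawing the image onto a canvas that has been filled with the background color performs the same compositing in one step.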

> One option would be to look for an alternative Python implementation that supports alpha transparency and convert it to TFJS. I left some notes on how to go about picking an implementation here if you're interested in that route, but the TL;DR is: look for TensorFlow implementations (ideally without custom layers).

This is very helpful, thank you! https://thekevinscott.com/super-resolution-with-js/#hearing-it-through-the-grapevine is a great starting point for me to dig into.

> Alternatively, you could modify image-super-resolution to explicitly support 4 channels. I think it may be as easy as modifying the c_dim parameter in the rdn.py or rrdn.py models, though I'm not totally sure.

I greatly appreciate the pointer to c_dim. From a cursory look through the image-super-resolution source, there appear to be no hard-coded limitations that would prevent modifying it to support more than 3 channels. I'll give this a try and see how it goes :)

> You'd also need a dataset that has images with alpha transparency. A Google search led me to some datasets that seem more aimed at training a matting model, but you may be able to repurpose them for this use case.

That's a great tip. If the matting datasets turn out not to be a close enough fit, it shouldn't be a huge effort to create a new dataset using Google Image Search with the transparency filter.

> Once you've got a trained model, you'd need to make some changes to this repo, specifically to the input and output tensors. As of now, it assumes three channels.

I'll be sure to report back here if I get as far as a trained model that appears to work sufficiently. Once again, thank you for the pointers!
