How can I reproduce the quantitative experiment results in the paper? #7
First of all, congratulations on the results of the research, and thank you for the concise and understandable code implementation.
However, I ran into some problems when trying to reproduce the quantitative experiment results in the paper. I did the following:
Realism:
Disentanglement:
Then I got these FID results:
L: 25.05, R: 25.21, G: 0.16 in the "Realism" experiment
L: 85.75, R: 84.45, G: 1.30 in the "Disentanglement" experiment
while the paper reports:
L: 21.37, R: 21.49, G: 0.12 in the "Realism" experiment
L: 71.85, R: 71.48, G: 0.37 in the "Disentanglement" experiment
There are random factors in many places in the experiment, so some fluctuation in the FID is normal, but these results are too far off.
I think there must be something wrong with my data processing or my training procedure.
So, could you please explain in detail the data used in the quantitative experiments and how they were processed? If possible, could you also release the model trained with the paper's experiment config?

Comments
There are indeed some differences between the cleaned code and the original code. But I do think they make the results better rather than worse.
Thanks for your attention and your quick response; I look forward to your reply!
@HyZhu39 Hello, how did you get the FID between the images generated with 5 style codes and the real images?
Actually, I did put them all in one folder and calculated the FID between the two folders as the result. For the disentanglement experiments, I just selected test images with bangs as reference images. Thanks for pointing that out; I'll give it a try as you said and tell you the results.
You're welcome. Since the same identities appear multiple times in one folder, the FID (which uses the variance of the image features) would definitely become larger.
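For context, this is the standard FID definition (not something specific to this repo): the score compares Gaussians fitted to the Inception features of the two folders, so a covariance mismatch between them directly increases the trace term:

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances of the real and generated folders.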
Sorry for bothering you again. I tried putting the generated images into separate folders according to which of the 5 style codes was used, and tested again, but the results got worse... which is weird, I think. Experiment 1: disentanglement: Experiment 2: disentanglement, same setting as experiment 1. I resized and saved the images used in the FID calculation the same way "easy_use.py" does:
Actually, you need to randomly sample the reference images for each source image. If you sample only one reference image to translate all the source images into 'with_bangs', the bangs in the translated folder will all be the same, right?
So the problem may be that you used the same reference image for all the source images.
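A minimal sketch of that per-source sampling (the folder names and the translate() call are placeholders, not this repo's actual API):

```python
import os
import random
from PIL import Image

SOURCE_DIR = 'test_bangs_without'   # source images without bangs (placeholder path)
REF_DIR = 'test_bangs_with'         # reference images with bangs (placeholder path)
OUT_DIR = 'translated_with_bangs'

os.makedirs(OUT_DIR, exist_ok=True)
refs = os.listdir(REF_DIR)
for name in os.listdir(SOURCE_DIR):
    src = Image.open(os.path.join(SOURCE_DIR, name))
    # Draw a fresh random reference for *each* source image so the
    # translated folder covers many different bang styles.
    ref = Image.open(os.path.join(REF_DIR, random.choice(refs)))
    fake = translate(src, ref)  # placeholder for the model's reference-guided translation
    fake.save(os.path.join(OUT_DIR, name))
```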
Thank you very much for your patience and help. I will try again as soon as possible and give you feedback.
Sorry for the mistakes I made and for misunderstanding your experiment settings; I think I understand them now. I randomly sampled the reference images for each source image following the logic you proposed. Realism: Disentanglement: However, the results are still much worse than the paper's. I suspect something is wrong with my training stage, so maybe I should re-train my model and try again. In fact, I don't know much about image translation; I'm just a beginner, and I hope you don't get bored with my ignorance.
Asking questions is always encouraged in research.
Many thanks for your help. I have packed some qualitative experiment results and the images from my quantitative experiment (in case you need them) into the following Baidu Yun link. Thank you for your willingness to help.
The qualitative results seem promising.
So there may be a preprocessing step (a simple normalization) that you need to add to your code. Let me know the results; I think we are close to getting it right.
Sorry for the delayed reply. Indeed, it was the missing preprocessing that caused the much worse FID results. With StarGANv2's script, the FID results of my latest run improved to:
I do think this is acceptable.
I've changed the README to clarify the corrected FID script used in the quantitative results. Thank you for your enthusiastic reproduction!
Thank you for your help again. It's thanks to your selfless help that I could successfully reproduce your experiment results.
You too~
Could you share the images from your quantitative experiment again? The Baidu Yun link is invalid. I am also reproducing the quantitative experiment results in the paper, following this issue, but I cannot get results close to the paper's.
@oldrive What's the detailed setting of your reproduction?
Config: celeba-hq.yaml. Realism FID of G:
The results of the realism_fid are as follows: Group 2:
What's the command you used when you calculated the FID?
Just like this:
The "args.img_size" is set to 128, right?
Right.
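If you're using StarGAN v2's FID script, the call typically looks like the sketch below; this is hedged, the directory names are placeholders, and the exact function and argument names should be checked against metrics/fid.py in that repo.

```python
# Hedged sketch, assuming StarGAN v2's metrics/fid.py is importable;
# verify the exact function and argument names against that script.
from metrics.fid import calculate_fid_given_paths

fid = calculate_fid_given_paths(
    ['real_with_bangs_resized', 'translated_with_bangs'],  # real dir, fake dir (placeholders)
    img_size=128,   # the resolution being discussed here
    batch_size=64)
print('FID:', fid)
```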
What about the qualitative results.
The results are in my reply above.
I mean the visual results.
Every image in a folder has a different style of bangs.
The visual results seem normal. Please change the image size used in the FID to 256 or 224; I don't quite remember the exact setting here, but the Inception network is trained at a specific resolution.
I'll give it a try as you said and tell you the results. Thanks for your reply!
Sorry for bothering you again. I tried to compute the realism_fid with two groups. Group 1 used --img_size = 256 and computed the FID between the fake images (256×256, generated with the 256 config and 256 checkpoint) and the real images. Group 2 used the same --img_size = 256 and computed the FID between the fake images (128×128, generated with the 128 config and 128 checkpoint) and the real images. But it seems the results got worse... That is so weird. realism_fid (256×256 fake images vs. real images, img_size = 256): realism_fid (128×128 fake images vs. real images, img_size = 256):
Could you share some real images in test_bangs_with.txt?
@oldrive I don't know if this is the reason, but in my experiments the real images are also resized to the specific resolution first and saved in a folder, just like @HyZhu39 did:
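A minimal sketch of that resize-and-save preprocessing (the folder paths and the 128 resolution are assumptions for this setup, not taken from the repo):

```python
import os
from PIL import Image

SRC_DIR = 'celeba_hq_test_with_bangs'   # raw real images (placeholder path)
DST_DIR = 'real_with_bangs_resized'     # resized copies used for FID
IMG_SIZE = 128                          # the training resolution in this thread

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    img = Image.open(os.path.join(SRC_DIR, name)).convert('RGB')
    # PIL's resize filters are antialiased, matching torchvision's
    # behavior for PIL inputs (see the antialias discussion below).
    img = img.resize((IMG_SIZE, IMG_SIZE), Image.BILINEAR)
    img.save(os.path.join(DST_DIR, name))
```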
Before computing the FID with the real images, I did not resize them or save them to a folder. I'll give it a try.
Oh, the reason was exactly that, and now I get realism FID and disentanglement FID results closer to the paper's. disentangle_fid:
Thank you again for your patient help and quick replies. It is with your help that I could reproduce the quantitative experiment results in the paper.
Ideally these two resizing steps should give the same result. I think the reason may be the transforms.Resize module: as this link says, when given a PIL image, the resize function uses antialias mode by default.
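A hedged illustration of that antialias difference (torchvision behavior; the exact defaults depend on your torchvision version, and the image path is a placeholder):

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms import functional as F

img = Image.open('example.jpg').convert('RGB')  # placeholder path

# PIL input: the resize is antialiased (PIL filters always antialias).
small_pil = transforms.Resize(128)(img)

# Tensor input: in older torchvision versions antialias was off by
# default for tensors, which yields slightly different pixels and can
# shift the FID; pass antialias=True to match the PIL path.
t = F.to_tensor(img)
small_tensor = F.resize(t, 128, antialias=True)
```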
Sorry to disturb you; I am reproducing the experimental results of this paper. There are 568 real images with bangs and 2432 images without bangs. After translation, I get 2432 images with bangs. Should I directly calculate the FID between these two image sets, or select 568 images from the 2432 for the calculation? Looking forward to your reply. Thank you!
Yes. The FID evaluation calculates the feature mean and variance of each of the two folders separately, so you don't need to worry about the different numbers of images. @zhushuqi2333
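A minimal sketch of why the folder sizes don't need to match: the Fréchet distance only consumes the fitted statistics of each folder (this is the standard formulation, not necessarily this repo's exact code):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Standard FID between two Gaussians fitted to Inception features."""
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)
    if np.iscomplexobj(covmean):  # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return diff.dot(diff) + np.trace(sigma1 + sigma2 - 2 * covmean)

# Each folder is summarized independently, e.g. with 568 real and 2432
# translated images:
#   mu = feats.mean(axis=0); sigma = np.cov(feats, rowvar=False)
```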
Thank you for your reply~ I have carefully read all the answers in this issue and run the corresponding experiments. All my experiments were carried out on 128×128 images, because I trained the model with celeba-hq.yaml, whose resolution is 128×128. The one difference from the discussion above is in the FID calculation: --img_size=128. All the real images are 128×128, and all the translated images are also 128×128. Realism: Disentanglement: Compared to the paper's results, L is a little bigger than the paper's, and G is much too big. Can you give me some suggestions? Looking forward to your reply~
I think the difference between these two results is acceptable if you only calculate once. You can try evaluating with several different random seeds and averaging the results, and also evaluating a few different checkpoints.
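A sketch of that averaging loop (the FID helper is the hypothetical one from the StarGAN v2 sketch above, the paths are placeholders, and the translation step is omitted):

```python
import random
import numpy as np
import torch

from metrics.fid import calculate_fid_given_paths  # hypothetical, see sketch above

fids = []
for seed in range(5):
    # Seed everything so the random reference sampling differs per run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # ... re-run the reference-guided translation here with this seed ...
    fids.append(calculate_fid_given_paths(
        ['real_with_bangs_resized', 'translated_with_bangs'],
        img_size=128, batch_size=64))

print(f'FID: {np.mean(fids):.2f} +/- {np.std(fids):.2f} over {len(fids)} seeds')
```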
Thank you for your reply~ The results above are already averages; I will try more different seeds and use different checkpoints.