
Single-Task Results on CelebA Dataset #2

Closed
bhsimon0810 opened this issue Dec 15, 2023 · 13 comments
@bhsimon0810

Hi, could you provide the single-task results on the 40-task CelebA dataset? I am running experiments with an MTL method and want to compare $\Delta_m$ against FAMO and the other baselines. Unlike Cityscapes, NYUv2, and QM9, the single-task results for CelebA do not seem to be included in the code in this repo. Although I can run the single-task experiments myself, my results may differ slightly from those in your paper, which would make the $\Delta_m$ comparison unfair. I would greatly appreciate any details you could share about the single-task results on CelebA. Thanks in advance!

@Cranial-XIX
Owner

Hi, here are the single-task results for the 40 tasks:

[0.6736886  0.68121034 0.81524944 0.5760289  0.7205613  0.8555076
 0.38203922 0.58225113 0.787647   0.8321292  0.5029583  0.68694085
 0.6781237  0.5240381  0.5161666  0.95694304 0.6968786  0.67976356
 0.8808315  0.8582131  0.97034    0.93267566 0.5057539  0.40307626
 0.9703734  0.48644206 0.60786104 0.5261031  0.56907415 0.59815097
 0.6858371  0.924108   0.5424991  0.7406311  0.71019936 0.87365365
 0.9305602  0.33704284 0.7647628  0.91907   ]

Please let me know if you have any further questions :)

@bhsimon0810
Author

Thanks! That helps a lot!

@bhsimon0810
Author

Sorry to bother you again. Which epoch do you choose to compute the final $\Delta_m$? Did you use the results from the last epoch or from the best epoch (selected by the best validation accuracy averaged over the 40 tasks)?

@Cranial-XIX
Owner

I used the best epoch.
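For concreteness, selecting the evaluation epoch this way can be sketched as follows (a minimal example, not from the repo; the validation numbers are made up, and the array is assumed to be epochs × tasks):

```python
import numpy as np

# Hypothetical validation accuracies: rows = epochs, columns = tasks.
val_acc = np.array([
    [0.60, 0.70],
    [0.65, 0.72],
    [0.64, 0.71],
])

# Pick the epoch with the best validation accuracy averaged over tasks.
best_epoch = int(np.argmax(val_acc.mean(axis=1)))
print(best_epoch)  # epoch index 1 in this toy example
```

The test-set results from that epoch are then the ones compared against the single-task numbers.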

@bhsimon0810
Author

Thanks again!

@bhsimon0810
Author

Hi, sorry to bother you. Could you provide the per-task results on CelebA for the other 11 baselines and for FAMO, as listed in Table 3 of your paper? I am developing an MTL method and want to compare the Mean Rank (MR), but computing MR requires the per-task results, so I am reaching out to request these data. Thanks for your understanding.

@bhsimon0810 bhsimon0810 reopened this Jan 13, 2024
@zzzx1224

> Hi, here are the single-task results for the 40 tasks: [...]

Hi, sorry to bother you. I'm wondering whether these results are from single-task learning or from FAMO, because I reran the FAMO code on CelebA and got a very low delta compared with these numbers, around 0.15%. Thanks!

@Cranial-XIX
Owner

> I'm wondering whether these results are from single-task learning or from FAMO, because I reran the FAMO code on CelebA and got a very low delta compared with these numbers, around 0.15%.

Yes, they are the single-task learning results I got on my side. FAMO may get a better result on your side :) I am averaging over 3 seeds, so maybe you got lucky this time? Can you run another baseline like CAGrad or NashMTL to confirm?

@Cranial-XIX
Owner

Cranial-XIX commented Jan 17, 2024

> Could you provide the per-task results on CelebA of the other 11 baselines and your FAMO, as listed in Table 3 in your paper?

Here is the link to all results; you can load it with torch.load, which gives a dictionary with self-explanatory keys/values.
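As a side note, once per-task results for several methods are available, MR can be computed by ranking the methods within each task and averaging each method's ranks. A sketch (not from the repo; the numbers are made up, and ties are not handled):

```python
import numpy as np

# Toy per-task accuracies: rows = methods, columns = tasks (made-up numbers).
results = np.array([
    [0.90, 0.80, 0.70],  # method A
    [0.85, 0.82, 0.75],  # method B
    [0.88, 0.79, 0.72],  # method C
])

# Rank methods within each task (rank 1 = best accuracy); the double
# argsort turns a descending sort order into per-task ranks.
ranks = np.argsort(np.argsort(-results, axis=0), axis=0) + 1

# Mean Rank: average each method's rank across tasks.
mean_rank = ranks.mean(axis=1)
print(mean_rank)  # approximately [2.0, 1.67, 2.33] for these toy numbers
```

If ties are possible, an average-rank scheme (e.g. `scipy.stats.rankdata`) would be more appropriate than the plain double-argsort above.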

@zzzx1224

> Yes, they are the single-task learning results I got on my side. [...] Can you run another baseline like CAGrad or NashMTL to confirm?

Thanks a lot for the reply! I ran my evaluation again with the single-task learning results and (famo, 20000) from the results you shared, but still got a very low delta of around 1.75. Here is my evaluation function, based on the equations in the paper.

def calculate_delta(famo, stl):
    # Delta_m: per-task relative drop vs. single-task learning, averaged (%).
    total = 0.0
    for i in range(famo.shape[0]):
        total += -1 * (famo[i] - stl[i]) / stl[i] * 100

    return total / famo.shape[0]

I think I might have made a mistake in the evaluation function. Could you share yours? I would much appreciate it.
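For reference, the loop above can also be written in vectorized NumPy form (a sketch under the assumption that every task's metric is an accuracy, where higher is better; on CelebA all 40 tasks are binary classification, so this assumption holds):

```python
import numpy as np

def calculate_delta_m(method_acc, stl_acc):
    """Average relative performance drop vs. single-task learning, in percent.

    Positive values mean the multi-task method is worse than STL on average;
    assumes higher is better for every task.
    """
    method_acc = np.asarray(method_acc, dtype=float)
    stl_acc = np.asarray(stl_acc, dtype=float)
    return float(np.mean(-(method_acc - stl_acc) / stl_acc) * 100)

print(calculate_delta_m([0.90, 0.80], [1.00, 0.80]))  # 5.0: a 10% drop on one task, 0% on the other
```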

@bhsimon0810
Author

> Here is the link to all results, you can use torch.load to load it [...]

Thanks very much! Appreciate your help!

@zzzx1224

> I ran my evaluation again with the single-task learning results and (famo, 20000) from the results you shared, but still got a very low delta, around 1.75. [...] Could you share your evaluation function?

Thanks for your time; I think I have solved the problem. There is a typo in the paper: the numbers in the "MR" and "$\Delta_m$" columns are swapped in the CelebA table. The calculated value of 1.75% is correct.

@Cranial-XIX
Owner

Closing the issue :)
