
Evaluation #11 · Open
Ha0Tang opened this issue Nov 25, 2020 · 13 comments


Ha0Tang commented Nov 25, 2020

Hi, can you give instructions on how to evaluate the model with the three metrics you used? Thanks.

ennauata (Owner) commented Nov 25, 2020

See more details in Section 3 [here] and Sections 1-2 [here].

The realism score comes from a user study, compatibility from the graph edit distance (see evaluation_parallel.py), and diversity from FID (see compute_FID.py; I recommend using this [here]).
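For illustration, here is a minimal sketch of a compatibility-style check using networkx's graph edit distance. This is not the implementation in evaluation_parallel.py; the toy graphs and the 'type' attribute are placeholders.

import networkx as nx

# Toy room graphs: nodes are rooms, the 'type' attribute is the room type,
# and edges mark adjacent rooms.
G_true = nx.Graph()
G_true.add_nodes_from([(0, {'type': 'living'}), (1, {'type': 'bedroom'})])
G_true.add_edge(0, 1)

G_pred = nx.Graph()
G_pred.add_nodes_from([(0, {'type': 'living'}), (1, {'type': 'kitchen'})])
G_pred.add_edge(0, 1)

# Exact graph edit distance; feasible here because room graphs are small.
ged = nx.graph_edit_distance(G_pred, G_true,
                             node_match=lambda a, b: a['type'] == b['type'])
print(ged)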

Ha0Tang (Author) commented Nov 25, 2020

Thank you so much:)

Ha0Tang closed this as completed Nov 25, 2020
Ha0Tang (Author) commented Nov 27, 2020

@ennauata after running python compute_FID.py, I got two folders: a fake folder containing 50,000 images and a real folder containing only 5,000 images. Is this correct?

Ha0Tang reopened this Nov 27, 2020
ennauata (Owner) commented
This sounds correct @Ha0Tang. One way to compute FID would be to generate one sample for each graph (5k fake) and compare with the corresponding GT (5k real). Another way is to generate multiple samples for each graph (say 10x5k = 50k fake) and compare with the GT (5k real). The latter is a bit trickier, but it may be the best we can do for measuring diversity over the same graphs, because we have only one GT for each graph.
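If it helps, one common way to compute FID between two image folders is the pytorch-fid package and its command-line entry point. This may not be the exact implementation linked above, and the folder names below are placeholders for the real and fake folders written by compute_FID.py:

# install once: pip install pytorch-fid
# then compare the generated images against the ground-truth images
python -m pytorch_fid ./real ./fake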

Ha0Tang (Author) commented Nov 29, 2020

I see @ennauata, now I have another question. Do we need to train 5 models to evaluate the 5 different groups? Specifically,

  • To evaluate 1-3, we need to train a model on 4-6, 7-9, 10-12, and 13+;
  • To evaluate 4-6, we need to train a model on 1-3, 7-9, 10-12, and 13+;
  • To evaluate 7-9, we need to train a model on 1-3, 4-6, 10-12, and 13+;
  • To evaluate 10-12, we need to train a model on 1-3, 4-6, 7-9, and 13+;
  • To evaluate 13+, we need to train a model on 1-3, 4-6, 7-9, and 10-12.

Is this correct?

ennauata (Owner) commented
Yes, one model for each target group.
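As a hypothetical illustration of that leave-one-group-out protocol (the room_group helper, the houses list, and the 'rooms' field are made up for the example, not the repository's data-loading code):

# Toy data: each house is just a dict with a list of rooms.
houses = [{'rooms': list(range(n))} for n in (3, 5, 8, 11, 14)]

def room_group(num_rooms):
    # Map a room count to the groups discussed above.
    if num_rooms <= 3:
        return '1-3'
    if num_rooms <= 6:
        return '4-6'
    if num_rooms <= 9:
        return '7-9'
    if num_rooms <= 12:
        return '10-12'
    return '13+'

target_group = '10-12'  # the group we want to evaluate on
# Train on every other group, evaluate on the held-out one.
train_set = [h for h in houses if room_group(len(h['rooms'])) != target_group]
test_set = [h for h in houses if room_group(len(h['rooms'])) == target_group]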

Ha0Tang (Author) commented Nov 30, 2020

@ennauata thanks, now I have two more questions. What does --num_variations mean? And how do you select an image to compare with the other methods in Fig. 6 of your paper, given that you generate 4 images for each input graph?

ennauata (Owner) commented Nov 30, 2020

--num_variations is the number of samples to generate per input graph. You can control num_variations to create an image like Figure 6. To pick out specific samples, you can filter out the undesired ones. A quick way to do that is:

# Keep only the input graphs we want in the figure (here graph indices 1, 3 and 4);
# g is assumed to be the index of the current input graph in the generation loop.
target_graphs = [1, 3, 4]
if g not in target_graphs:
    continue
...

Ha0Tang (Author) commented Nov 30, 2020

Thanks again. Can you share the generated results of the other baselines (i.e., CNN-only, GCN, Ashual et al., and Johnson et al.) with me? My email is [email protected]; that way I can compare against these methods directly in my paper. Thanks a lot.

Ha0Tang (Author) commented Dec 8, 2020

@ennauata when I run python evaluate_parallel.py, I get the following error:

Traceback (most recent call last):
  File "evaluate_parallel.py", line 239, in <module>
    run_parallel(graphs)
  File "evaluate_parallel.py", line 225, in run_parallel
    results += Parallel(n_jobs=num_cores)(delayed(processInput)(G_pred, G_true, _id) for G_pred, G_true, _id in graphs[lower:upper])
  File "/home/ht/anaconda3/envs/pytorch/lib/python3.6/site-packages/joblib/parallel.py", line 1061, in __call__
    self.retrieve()
  File "/home/ht/anaconda3/envs/pytorch/lib/python3.6/site-packages/joblib/parallel.py", line 940, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/ht/anaconda3/envs/pytorch/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/ht/anaconda3/envs/pytorch/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/ht/anaconda3/envs/pytorch/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
RuntimeError: can't start new thread

Any ideas on how to fix this?

ennauata (Owner) commented Dec 8, 2020

@Ha0Tang, this is a workaround I found for computing the metrics in parallel. My knowledge in this area is limited and I am not sure why this code does not work on your machine. I would recommend removing the parallel computation to make it work, or finding a Python library that works on your machine for computing the metrics in parallel.
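For example, the joblib call in run_parallel could be replaced by a plain loop (a sketch only, assuming processInput(G_pred, G_true, _id) and the graphs list as they appear in the traceback above):

# Serial fallback: evaluate each (prediction, ground truth, id) triple in turn.
results = []
for G_pred, G_true, _id in graphs:
    results.append(processInput(G_pred, G_true, _id))

Setting n_jobs=1 in the existing Parallel(...) call also makes joblib run sequentially, which may sidestep the thread limit.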

spencer0shaw commented
Hello! When I try to run evaluate_parallel.py and variation_bbs_with_target_graph_segments_suppl.py, the output is blank even when using the model you provided. Why is that? Thank you!

mikrocosmoss commented
Hello author, can you share the generated results of the other baselines (i.e., CNN-only, GCN, Ashual et al., and Johnson et al.) with me? My email is [email protected]. Thank you!
