Thank you for publishing the synthetic dataset for cut vertices and edges.
I'm trying to run the experiment, but I found one issue on which I'd like to ask for clarification.
In synthetic_wrapper.py, you generate a replacement graph for every item in PygPCQM4Mv2Dataset (line 395), so in total there are 3,746,620 graphs. But because the new graphs are generated from a small set of parameters using the generators counter_example_1_1, counter_example_1_2, counter_example_2_1, counter_example_2_2, and counter_exmple_3, most of the 3,746,620 graphs are actually duplicates.
For example, when I counted unique graphs by comparing the edge_index tensors, only 1024 out of 3,746,620 were unique, which means only about 0.03% of the graphs are distinct. I expect I would find even more duplicates if I ran proper isomorphism tests on those 1024 graphs.
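For reference, this is roughly how I counted the unique graphs. The helper below is just my own sketch (the names edge_index_key and count_unique are mine), assuming the wrapped dataset yields standard torch_geometric Data objects:

```python
import hashlib

def edge_index_key(data):
    # Hash the raw edge_index bytes: graphs with identical edge_index tensors
    # (same node ordering) collapse to the same key; node/edge features are ignored.
    return hashlib.sha1(data.edge_index.cpu().numpy().tobytes()).hexdigest()

def count_unique(dataset):
    # dataset: any iterable of torch_geometric.data.Data objects, e.g. the
    # wrapped synthetic dataset built by synthetic_wrapper.py.
    return len({edge_index_key(data) for data in dataset})
```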
It seems wasteful to benchmark on all 3,746,620 graphs; why not create a smaller set of unique graphs and benchmark on that instead? A sketch of what I mean is below.
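Concretely, a deduplicated benchmark could be built with something like the following (deduplicate is a name I made up, reusing edge_index_key from the sketch above):

```python
def deduplicate(dataset):
    # Keep the first occurrence of every distinct edge_index; the result is a
    # small list of Data objects that still covers every structurally distinct
    # counterexample graph.
    seen, unique_graphs = set(), []
    for data in dataset:
        key = edge_index_key(data)
        if key not in seen:
            seen.add(key)
            unique_graphs.append(data)
    return unique_graphs
```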
Also, to add to my question: this setup might leak test samples into training, since graphs in the test set may be duplicates of graphs in the training set.
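A quick way to check for such overlap, assuming the wrapper reuses the split indices from PygPCQM4Mv2Dataset.get_idx_split(), would be something like this sketch (split_overlap is my own name, again reusing edge_index_key from above):

```python
def split_overlap(dataset, split_idx, eval_split="valid"):
    # split_idx: dict of index tensors, e.g. from PygPCQM4Mv2Dataset.get_idx_split().
    # Counts how many graphs in the evaluation split are structurally identical
    # (same edge_index) to some graph in the training split.
    train_keys = {edge_index_key(dataset[int(i)]) for i in split_idx["train"]}
    eval_keys = [edge_index_key(dataset[int(i)]) for i in split_idx[eval_split]]
    return sum(key in train_keys for key in eval_keys)
```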