-
Notifications
You must be signed in to change notification settings - Fork 0
/
03_results_exp1.tex
62 lines (50 loc) · 5.08 KB
/
03_results_exp1.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
% Discussion
Table \ref{tab:ex1-baselines} shows that our method clearly outperforms all baselines. Best results on both datasets were found with the no-BP cost matrix, MLP-norm, and small weight decay for regularization. Interestingly, the only difference between both configurations is the Sinkhorn regularization constant $\lambda$, which is smaller for the synthetic graphs. However, somewhat concerning is the gap between the validation and test performance on the real-world dataset, which also appears with the model of \cite{bai2019}. This phenomenon will be discussed in the ablation study.
\begin{table}[htbp]
\addtolength{\tabcolsep}{-1pt}
\fontsize{9pt}{10.25pt}\selectfont
\centering
\renewcommand{\arraystretch}{1.2}
\begin{tabular}{|l|c|c|c|c|}
\hline
\multirow{2}{*}{} & \multicolumn{2}{c|}{Pref-Attachment} & \multicolumn{2}{c|}{AIDS} \\ \hhline{|~|-|-|-|-|}
& Val & Test & Val & Test \\ \hhline{|=|=|=|=|=|}
\cite{riba2018} & $12.2 \pm 0.2$ & $12.1 \pm 0.6$ & $15.5 \pm 0.3$ & $15.6 \pm 0.3$ \\ \hline
\cite{bai2019} & $7.7 \pm 1.0$ & $9.6 \pm 2.5$ & $4.2 \pm 0.3$ & $8.7 \pm 0.1$ \\ \hline
\cite{li2019} & $5.5 \pm 0.1 $ & $7.8 \pm 0.3$ & $10.6 \pm 0.3$ & $11.7 \pm 0.9$ \\ \hline
GDN & $4.2 \pm 0.1$ & $\boldsymbol{4.5 \pm 0.2}$ & $3.3 \pm 0.04$ & $\boldsymbol{6.2 \pm 1.0}$ \\ \hline
\end{tabular}
\caption{RMSE to ground truth GED with standard deviation across 3 runs.}
\label{tab:ex1-baselines}
\end{table}
The ablation study in Table \ref{tab:ex1-ablation} indicates that:
\begin{itemize}
\itemsep0em
\item \textbf{The Euclidean norm is superior to the manhattan norm, and a trainable MLP can improve results even further.}
\item \textbf{There comes no immediate benefit from the full-size cost matrix.} In all but one case the no-BP cost matrix is at least as good as the BP variant in validation and test performance. Even though intuitively it feels convenient to have the option to explicitly add or remove nodes, it may be the case that we can get the same result by substituting one node for another in the case of the GED. However, using the no-BP matrix could certainly lead to problems when attempting to learn "sparse matchings" where only a few nodes have a match in the other graph. Moreover, it is not surprising that the results of both variants are very similar because the Sinkhorn normalization will place a lot of weight onto the 0 entries in the cost matrix. It does that to a point where summing over the part of the rows (or columns depending on which graph has more nodes) which had 0 entries before normalization yields values close to 1. Therefore, the values that contribute to the distance come mostly from entries which also show up in our no-BP variant matrix.
\item \textbf{The AIDS validation set does not properly represent the entire distribution.} The difference in validation and test performance on the synthetic graphs can be explained by the variation between consecutive epochs. Since we stop on the best validation result it is no surprise that the test result at that exact point is slightly worse on average. In contrast, the massive gap in performance, and the high standard deviation on the AIDS test set indicate a shift of distributions between the validation and the test set. Some runs generalized well from the training data to the validation set, but they consistently performed worse on the test set across all epochs. Interestingly, not all runs behaved that way. We also collected many runs where the test performance is close to validation performance\footnote{Often these runs had higher Sinkhorn regularization but we did not run further experiments to verify that impression.}. Apparently the manhattan norm is more susceptible to overfit in this manner than the MLP norm.
\end{itemize}
\begin{table}[htbp]
\addtolength{\tabcolsep}{-1pt}
\fontsize{9pt}{10.25pt}\selectfont
\centering
\renewcommand{\arraystretch}{1.2}
\begin{tabular}{|c|c|c|c|c|c|}
\hline
\multicolumn{2}{|c|}{} & \multicolumn{2}{c|}{Pref-Attachment} & \multicolumn{2}{c|}{AIDS} \\ \hline
BP & Norm & Val & Test & Val & Test \\ \hhline{|=|=|=|=|=|=|}
\multirow{3}{*}{yes} & $p=1$ & $5.2 \pm 0.1$ & $6.4 \pm 0.7$ & $3.9 \pm 0.1$ & $17.1 \pm 10.9$ \\ \hhline{|~|-|-|-|-|-|}
& $p=2$ & $4.8 \pm 0.1$ & $5.7 \pm 0.2$ & $3.9 \pm 0.2$ & $8.6 \pm 2.6$ \\ \hhline{|~|-|-|-|-|-|}
& MLP & $4.5 \pm 0.1$ & $\boldsymbol{4.5 \pm 0.3}$ & $3.5 \pm 0.03$ & $\boldsymbol{4.7 \pm 0.1}$ \\ \hline
\multirow{3}{*}{no} & $p=1$ & $4.9 \pm 0.2$ & $5.8 \pm 0.2$ & $3.9 \pm 0.3$ & $16.2 \pm 6.0$ \\ \hhline{|~|-|-|-|-|-|}
& $p=2$ & $4.8 \pm 0.1$ & $5.7 \pm 0.6$ & $3.7 \pm 0.2$ & $6.8 \pm 3.4$ \\ \hhline{|~|-|-|-|-|-|}
& MLP & $4.2 \pm 0.1$ & $\boldsymbol{4.5 \pm 0.2}$ & $3.3 \pm 0.04$ & $6.2 \pm 1.0$ \\ \hline
\end{tabular}
\caption{Ablation Study: RMSE with standard deviation across 3 runs.}
\label{tab:ex1-ablation}
\end{table}
% Note bai has those 115 / 54 mse's
% Best settings?
% 0.0005 weight decay (not 0!)
% big learning rate
% sinkhorn_reg=0.2