Currently we use R^2 when tuning the models and then log R^2 and MSE when evaluating model performance.
Users should be able to choose which metrics to use. The available choices should include metrics that take predictive uncertainty into account (e.g., picking the model with the best calibration).
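
One possible shape for this is a small metric registry keyed by name, sketched below. Everything in it is an assumption for illustration rather than existing code: the names `METRICS`, `select_best`, and `calibration_error` are hypothetical, each model is assumed to return a predictive mean and standard deviation, and the calibration metric assumes Gaussian predictive distributions (it compares the nominal coverage of central intervals with the observed coverage).

```python
from statistics import NormalDist

import numpy as np


def r2(y_true, y_pred, y_std=None):
    """Coefficient of determination; y_std is accepted but ignored."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mse(y_true, y_pred, y_std=None):
    """Mean squared error; y_std is accepted but ignored."""
    return float(np.mean((y_true - y_pred) ** 2))


def calibration_error(y_true, y_pred, y_std, levels=(0.5, 0.8, 0.9, 0.95)):
    """Mean absolute gap between nominal and observed interval coverage.

    Assumes Gaussian predictive distributions N(y_pred, y_std^2).
    """
    if y_std is None:
        raise ValueError("calibration_error needs predictive standard deviations")
    z = np.abs(y_true - y_pred) / y_std  # standardized absolute residuals
    gaps = []
    for p in levels:
        # Half-width (in standard deviations) of the central p-interval.
        z_crit = NormalDist().inv_cdf(0.5 + p / 2.0)
        observed = float(np.mean(z <= z_crit))
        gaps.append(abs(observed - p))
    return float(np.mean(gaps))


# Hypothetical registry: name -> (metric function, higher_is_better).
METRICS = {
    "r2": (r2, True),
    "mse": (mse, False),
    "calibration_error": (calibration_error, False),
}


def select_best(candidates, y_true, metric="r2"):
    """Score each candidate's (mean, std) predictions; return (best_name, scores)."""
    fn, higher_is_better = METRICS[metric]
    y_true = np.asarray(y_true, dtype=float)
    scores = {}
    for name, (mu, std) in candidates.items():
        mu = np.asarray(mu, dtype=float)
        std = None if std is None else np.asarray(std, dtype=float)
        scores[name] = fn(y_true, mu, std)
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get), scores
```

With such a registry, the metric used for tuning would be a single string argument (defaulting to `"r2"` to match the current behaviour), and the set of metrics to log could be a list of the same keys. Note that a point-accuracy metric and a calibration metric can prefer different models, for example when an accurate model reports intervals that are far too narrow.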