Some things to think about:

- what can we parallelise?
- how do we exploit locality, data reuse, caching, etc.?

Also, we've already made use of some GPyTorch features, but is there anything else to consider? This might become especially important as we start thinking about ensembles.
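To make the parallelism and reuse questions a bit more concrete, here is a minimal sketch of one possible pattern for an ensemble. It assumes the ensemble members are GPs that share training data and differ only in RBF lengthscale. It is written in plain PyTorch rather than GPyTorch, and every name in it (`rbf_kernel`, `lengthscales`, the toy data) is made up for illustration. The pairwise distances are computed once and shared, all members are factorised in one batched Cholesky call, and each factor is reused for both the predictive mean and the variance. GPyTorch kernels and means accept a `batch_shape` argument, which may turn out to be the more natural way to get the same effect.

```python
import torch


def rbf_kernel(x1, x2, lengthscale):
    """Batched RBF kernel (unit outputscale): one (n, m) matrix per lengthscale."""
    # Pairwise squared distances are computed once and shared by every member.
    sq_dist = torch.cdist(x1, x2).pow(2)                # (n, m)
    scale = lengthscale.view(-1, 1, 1) ** 2             # (k, 1, 1)
    return torch.exp(-0.5 * sq_dist.unsqueeze(0) / scale)  # (k, n, m)


torch.manual_seed(0)
n, m, d, k = 50, 5, 2, 4  # train points, test points, input dims, ensemble size

# Toy data, purely illustrative.
train_x = torch.randn(n, d, dtype=torch.float64)
train_y = torch.sin(train_x.sum(dim=-1))
test_x = torch.randn(m, d, dtype=torch.float64)

# Assumption: members differ only in lengthscale and share a fixed noise level.
lengthscales = torch.tensor([0.5, 1.0, 2.0, 4.0], dtype=torch.float64)
noise = 1e-2

# One batched Cholesky factorisation covers all k members.
K = rbf_kernel(train_x, train_x, lengthscales)
K = K + noise * torch.eye(n, dtype=torch.float64)       # (k, n, n)
L = torch.linalg.cholesky(K)                            # (k, n, n)

# The factor L is reused twice: once for the mean, once for the variance.
rhs = train_y.repeat(k, 1).unsqueeze(-1)                # (k, n, 1)
alpha = torch.cholesky_solve(rhs, L)                    # (k, n, 1)

K_star = rbf_kernel(test_x, train_x, lengthscales)      # (k, m, n)
mean = (K_star @ alpha).squeeze(-1)                     # (k, m)

v = torch.linalg.solve_triangular(L, K_star.transpose(-1, -2), upper=False)
var = 1.0 - v.pow(2).sum(dim=-2)                        # (k, m), latent variance

print(mean.shape, var.shape)
```

Whether this beats a simple loop over members will depend on problem size and hardware, so it is something to profile rather than assume.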