Overparameterized Neural Networks: Theory and Empirics
Here we highlight work on the large-width limit of neural networks by the creators of the Neural Tangents library and close collaborators within Google.
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent, NeurIPS 2019.
Synopsis: Infinitely-wide deep networks evolve exactly as linear models under gradient descent; see the sketch below.
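The linearization studied in this paper is exposed directly in the library. Below is a minimal sketch, assuming an illustrative fully-connected architecture and toy inputs, that compares a network to its first-order Taylor expansion around initialization via `nt.linearize`.

```python
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Illustrative wide fully-connected network (an assumption; any stax model works).
init_fn, apply_fn, _ = stax.serial(
    stax.Dense(2048), stax.Relu(), stax.Dense(1)
)

key = random.PRNGKey(0)
x = random.normal(key, (8, 16))          # toy inputs (assumption)
_, params = init_fn(key, x.shape)

# First-order Taylor expansion of the network in its parameters around `params`.
# The paper shows that, as width grows, gradient descent on the original network
# tracks gradient descent on this linear model.
apply_fn_lin = nt.linearize(apply_fn, params)

# At the expansion point the two functions agree exactly.
print(apply_fn(params, x) - apply_fn_lin(params, x))
```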
Finite Versus Infinite Neural Networks: an Empirical Study, NeurIPS 2020 Spotlight.
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks, NeurIPS 2020.
Synopsis: Wide networks learn a linear model of the data in the early stages of training.
The large learning rate phase of deep learning: the catapult mechanism, in submission.
Synopsis: Gradient descent dynamics at small versus large learning rates are separated by a phase transition as networks become wider.
Disentangling trainability and generalization in deep neural networks, ICML 2020.
Sensitivity and generalization in neural networks: an empirical study, ICLR 2018.
The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization, ICML 2020.
Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition, NeurIPS 2020.
Temperature check: theory and practice for training models with softmax-cross-entropy losses, preprint.
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study, ICML 2019.
Explaining Neural Scaling Laws, under review.
Deep Neural Networks as Gaussian Processes, ICLR 2018.
Synopsis: Infinitely-wide deep neural networks (fully-connected) are in exact correspondence with GPs. Predictions can be obtained via Bayesian inference.
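A minimal sketch of this correspondence using Neural Tangents, assuming an illustrative fully-connected architecture and toy regression data: the closed-form `nngp` kernel yields the exact GP posterior prediction of the infinitely-wide network.

```python
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width kernel of an illustrative deep fully-connected network.
_, _, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1)
)

key_x, key_y = random.split(random.PRNGKey(1))
x_train = random.normal(key_x, (20, 32))   # toy regression data (assumption)
y_train = random.normal(key_y, (20, 1))
x_test = random.normal(key_x, (5, 32))

# Exact Bayesian (GP) posterior mean for the infinitely-wide network.
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_test_nngp = predict_fn(x_test=x_test, get='nngp')
print(y_test_nngp.shape)  # (5, 1)
```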
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes, ICLR 2019.
Synopsis: The same correspondence holds for CNNs.
Infinite attention: NNGP and NTK for deep attention networks, ICML 2020.
Synopsis: The same correspondence holds for attention architectures.
On the infinite width limit of neural networks with a standard parameterization, preprint 2020.
Exact posterior distributions of wide Bayesian neural networks, ICML 2020 UDL Workshop; BayLearn 2020.
Synopsis: The exact posteriors of infinitely-wide Bayesian neural networks coincide with the corresponding GP posteriors.
Neural Tangents: Fast and Easy Infinite Neural Networks in Python (github), ICLR 2020 spotlight.
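The basic workflow of the library, as a short sketch (the architecture and input shapes below are placeholders): `stax` layers return a `kernel_fn` that computes the closed-form NNGP and NTK kernels of the corresponding infinite-width network.

```python
from jax import random
from neural_tangents import stax

# Example architecture (an assumption; the library also provides convolutional,
# pooling, attention, and many other layers).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(1024), stax.Erf(), stax.Dense(1)
)

key = random.PRNGKey(0)
x1 = random.normal(key, (10, 100))
x2 = random.normal(key, (6, 100))

# Closed-form NNGP and NTK kernels of the infinite-width limit.
kernels = kernel_fn(x1, x2, ('nngp', 'ntk'))
print(kernels.nngp.shape, kernels.ntk.shape)  # both (10, 6)
```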
Fast Finite Width Neural Tangent Kernel (github), ICML 2022.
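For finite networks, the library also exposes the empirical (finite-width) NTK studied in this paper. A hedged sketch with an illustrative architecture follows, using only the library's default implementation.

```python
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Concrete finite-width network (an assumption for illustration).
init_fn, apply_fn, _ = stax.serial(
    stax.Dense(256), stax.Relu(), stax.Dense(1)
)

key = random.PRNGKey(0)
x1 = random.normal(key, (4, 8))
x2 = random.normal(key, (3, 8))
_, params = init_fn(key, x1.shape)

# Empirical NTK at these particular parameters. The paper benchmarks several
# algorithms for this computation; here only the default is used.
ntk_fn = nt.empirical_ntk_fn(apply_fn)
ntk = ntk_fn(x1, x2, params)
print(ntk.shape)  # (4, 3) with the default trace over output axes
```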
Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling (github), ICML 2022.
Fast Neural Kernel Embeddings for General Activations (github), NeurIPS 2022.
Deep Information Propagation, ICLR 2017.
Mean Field Residual Networks: On the Edge of Chaos, NeurIPS 2017.
Dynamical Isometry and a Mean Field Theory of CNNs, ICML 2018.
A Mean Field Theory of Batch Normalization, ICLR 2019.
Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs, preprint 2019.
Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit, ICLR 2021.
Towards NNGP-guided Neural Architecture Search, preprint 2020.
Dataset Meta-Learning from Kernel-Ridge Regression, ICLR 2021.
The Emergence of Spectral Universality in Deep Networks, AISTATS 2018.
Nonlinear random matrix theory for deep learning, NeurIPS 2017.
A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions, preprint.
Geometry of neural network loss surfaces via random matrix theory, ICML 2017.
Resurrecting the Sigmoid in Deep Learning Through Dynamical Isometry: Theory and Practice, NeurIPS 2017.
Exponential expressivity in deep neural networks through transient chaos, NeurIPS 2016.
On the expressive power of deep neural networks, ICML 2017.
Statistical mechanics of deep learning, Annual Review of Condensed Matter Physics 2020.
Synopsis: A review paper covering error landscapes, mean field theory of signal propagation, infinite-width networks, and probabilistic models.
A Correspondence Between Random Neural Networks and Statistical Field Theory, preprint.
Information in Infinite Ensembles of Infinitely-Wide Neural Networks, Proceedings of the 2nd Symposium on Advances in Approximate Bayesian Inference, 2020.