TRAINABILITY AND ACCURACY OF ARTIFICIAL NEURAL NETWORKS - ERIC VANDEN-EIJNDEN - THE COURANT INSTITUTE OF MATHEMATICAL SCIENCES, NEW YORK UNIVERSITY
The methods and models of machine learning (ML) are rapidly becoming de facto tools for the analysis and interpretation of large data sets. Complex tasks such as speech and image recognition, automatic translation, and decision making, which were out of reach a decade ago, are now routinely performed by computers with a high degree of reliability using (deep) neural networks (DNNs). This performance suggests that DNNs may approximate high-dimensional functions with controllably small errors, potentially outperforming standard interpolation methods, based e.g. on Galerkin truncation or finite elements, that have been the workhorses of scientific computing. In support of this prospect, Prof. Vanden-Eijnden will present in this talk results about the trainability and accuracy of neural networks, obtained by mapping the parameters of the network to a system of interacting particles relaxing on a potential determined by the loss function. This mapping can be used to prove a dynamical variant of the universal approximation theorem, showing that the optimal neural network representation can be attained by (stochastic) gradient descent, with an approximation error scaling as the inverse of the network size. He will also show how these findings can be used to accelerate the training of networks and optimize their architecture, using e.g. nonlocal transport involving birth/death processes in parameter space.
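To make the particle picture concrete, the following is a minimal sketch under assumptions that the abstract does not state: a single-hidden-layer network with n units, a squared loss against a target function f, and a 1/n normalization of the output. The symbols c_i, z_i, \varphi, \mu, F, K, and C_f are introduced here purely for illustration.

```latex
% Assumed setting (illustrative, not taken from the abstract):
% target f, data distribution \mu, unit \varphi(x, z), parameters \theta_i = (c_i, z_i).
\begin{align*}
f_n(x) &= \frac{1}{n} \sum_{i=1}^{n} c_i \, \varphi(x, z_i), \\
L(\theta_1, \dots, \theta_n)
  &= \frac{1}{2} \int \bigl| f(x) - f_n(x) \bigr|^2 \, d\mu(x) \\
  &= C_f - \frac{1}{n} \sum_{i=1}^{n} c_i \, F(z_i)
     + \frac{1}{2 n^2} \sum_{i,j=1}^{n} c_i c_j \, K(z_i, z_j),
\end{align*}
% where the constant, the single-particle term, and the pair interaction are
\begin{align*}
C_f = \frac{1}{2} \int |f(x)|^2 \, d\mu(x), \quad
F(z) = \int f(x) \, \varphi(x, z) \, d\mu(x), \quad
K(z, z') = \int \varphi(x, z) \, \varphi(x, z') \, d\mu(x).
\end{align*}
```

Read this way, each unit \theta_i = (c_i, z_i) plays the role of a particle: F acts as an external potential, K as a pairwise interaction, and gradient descent on the parameters corresponds to the particle system relaxing on the energy L.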