# Overparameterized Regression

src: Arora et al. (2018), section 3 on $\ell_p$ regression.

## Summary

They show that for the simple (scalar-output) linear regression problem with $\ell_p$ loss, i.e. the loss function

$$\mathbb{E}_{(\mathbf{x}, y) \sim S}\left[\frac{1}{p}\big(y - \mathbf{x}^\top \mathbf{w}\big)^p\right],$$

if you overparameterize ever so slightly to

$$\mathbb{E}_{(\mathbf{x}, y) \sim S}\left[\frac{1}{p}\big(y - \mathbf{x}^\top \mathbf{w}_1 w_2\big)^p\right],$$

then gradient descent on $(\mathbf{w}_1, w_2)$ behaves, in terms of the end-to-end parameter $\mathbf{w} = \mathbf{w}_1 w_2$, like an accelerated method: the induced update combines an adaptive learning rate with a momentum-like term built from the gradient history.
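
To see where this comes from, here is a quick first-order sketch of the end-to-end update (my own back-of-the-envelope derivation, not copied from the paper). Write $L$ for the original loss, $\mathbf{w}^{(t)} = \mathbf{w}_1^{(t)} w_2^{(t)}$, and take one gradient step of size $\eta$ on each factor:

$$
\begin{aligned}
\mathbf{w}_1^{(t+1)} &= \mathbf{w}_1^{(t)} - \eta\, w_2^{(t)}\, \nabla L\big(\mathbf{w}^{(t)}\big), \qquad
w_2^{(t+1)} = w_2^{(t)} - \eta\, \mathbf{w}_1^{(t)\top} \nabla L\big(\mathbf{w}^{(t)}\big), \\
\mathbf{w}^{(t+1)} &= \mathbf{w}_1^{(t+1)} w_2^{(t+1)}
= \mathbf{w}^{(t)} - \eta \Big[ \big(w_2^{(t)}\big)^2 I + \mathbf{w}_1^{(t)} \mathbf{w}_1^{(t)\top} \Big] \nabla L\big(\mathbf{w}^{(t)}\big) + O(\eta^2).
\end{aligned}
$$

So to first order, the gradient is preconditioned by $\big(w_2^{(t)}\big)^2 I + \mathbf{w}_1^{(t)} \mathbf{w}_1^{(t)\top}$: an adaptive step size plus a rank-one term along the current parameter direction. Since the factors are themselves shaped by all past gradients, the effective update depends on the gradient history, which is (roughly) the momentum-like behavior the paper formalizes.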

## Thoughts

I remember Arora presenting this result (or something like it) a few years back, and finding it very intriguing. I think this goes nicely with the theme of [[statistics-vs-ml]], though it's a slightly different angle. Essentially: we statisticians are afraid of overparameterization, because it removes specificity and leads to ambiguity. But when it comes to these gradient methods, it actually helps to overparameterize!


  1. Arora, S., Cohen, N. & Hazan, E., 2018. On the optimization of deep networks: Implicit acceleration by overparameterization. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), pp. 372–389.
Last updated on 1/29/2022