Overparameterized Regression
src: (Arora et al., 2018), section 3 on $\ell_p$ regression.
Summary
They show that in the simple scalar linear regression problem with $\ell_p$ loss, i.e. loss function
$$L(w) \;=\; \mathbb{E}_{(x,y)\sim S}\!\left[\tfrac{1}{p}\,\big(y - x^\top w\big)^p\right],$$
if you overparameterize ever so slightly, replacing $w$ by the product $w_1 w_2$ of a vector $w_1$ and a scalar $w_2$,
$$L(w_1, w_2) \;=\; \mathbb{E}_{(x,y)\sim S}\!\left[\tfrac{1}{p}\,\big(y - x^\top w_1 w_2\big)^p\right],$$
then gradient descent on $(w_1, w_2)$ behaves, in terms of the end-to-end parameter $w = w_1 w_2$, like an accelerated version of gradient descent: roughly, an update with an adaptive step size plus a momentum-like term that accumulates past gradients.
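As a quick sanity check of the idea (not the paper's own experiment), here is a minimal sketch comparing plain gradient descent on the $\ell_p$ loss with gradient descent on the overparameterized $w_1 w_2$ form; the synthetic data, $p = 4$, step size, and initialization are all illustrative assumptions.

```python
# Minimal sketch: vanilla GD on the l_p regression loss vs. GD on the
# overparameterized factorization w = w1 * w2 (w1 a vector, w2 a scalar).
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 5, 4                      # p even so (y - x^T w)^p is a valid loss
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def loss(w):
    return np.mean((y - X @ w) ** p) / p

def grad(w):
    r = y - X @ w
    return -(X.T @ (r ** (p - 1))) / n   # gradient of (1/p) * mean(r^p) w.r.t. w

eta, T = 1e-3, 2000

# 1) vanilla gradient descent on w
w = np.zeros(d)
for _ in range(T):
    w -= eta * grad(w)

# 2) gradient descent on the overparameterized objective L(w1, w2) = L(w1 * w2)
w1, w2 = np.zeros(d), 1.0
for _ in range(T):
    g = grad(w1 * w2)                    # gradient w.r.t. the end-to-end parameter
    w1, w2 = w1 - eta * w2 * g, w2 - eta * (w1 @ g)

print("vanilla GD loss:        ", loss(w))
print("overparameterized loss: ", loss(w1 * w2))
```

Note that the overparameterized version is still linear in its end-to-end behavior; the only change is in the optimization dynamics, which is exactly the point of the result.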
Thoughts
I remember Arora presenting this particular result (or something like it) a few years back, and finding it very intriguing. I think this goes nicely with the theme of [[statistics-vs-ml]], though from a slightly different angle. Essentially: we statisticians are afraid of overparameterization, because it removes specificity and leads to ambiguity. But when it comes to these gradient methods, it actually helps to overparameterize!
- Arora, S., Cohen, N. & Hazan, E., 2018. On the optimization of deep networks: Implicit acceleration by overparameterization. In 35th International Conference on Machine Learning, ICML 2018, pp. 372–389.