Implicit stochastic gradient descent (ISGD) is a variant of standard SGD in which the main update of the algorithm is implicit, i.e., the new iterate appears on both sides of the update equation. This adds numerical stability and robustness to the specification of the learning rate, which is a known weakness of the standard method. The contrast between SGD and ISGD has natural analogues in (i) forward vs. backward Euler methods in numerical analysis; (ii) standard gradient vs. proximal methods in optimization; and (iii) LMS vs. NLMS in signal processing.
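
To make the contrast concrete, below is a minimal sketch in R (not the sgd package API; all names and the toy simulation are illustrative only) of the two updates for linear regression with squared loss, a case where the implicit update has a simple closed form.

```r
# Explicit vs. implicit SGD for least squares (illustrative sketch).
# Model: y_i = x_i' theta + noise; loss is squared error.
set.seed(1)
n <- 1e4; p <- 10
X <- matrix(rnorm(n * p), n, p)
theta_true <- rep(1, p)
y <- as.numeric(X %*% theta_true + rnorm(n))

sgd_ls <- function(X, y, lr = 0.5, implicit = TRUE) {
  theta <- rep(0, ncol(X))
  for (i in seq_len(nrow(X))) {
    xi <- X[i, ]
    gamma <- lr / i                       # decaying learning rate gamma_i = lr / i
    resid <- y[i] - sum(xi * theta)       # residual at the current iterate
    if (implicit) {
      # Implicit update theta_i = theta_{i-1} + gamma * (y_i - x_i' theta_i) * x_i
      # solved in closed form:
      # theta_i = theta_{i-1} + gamma / (1 + gamma * ||x_i||^2) * resid * x_i
      theta <- theta + (gamma / (1 + gamma * sum(xi^2))) * resid * xi
    } else {
      # Standard (explicit) SGD update
      theta <- theta + gamma * resid * xi
    }
  }
  theta
}

# With an aggressive learning rate the explicit iterates can blow up early on,
# while the implicit iterates stay bounded.
theta_implicit <- sgd_ls(X, y, lr = 5, implicit = TRUE)
theta_explicit <- sgd_ls(X, y, lr = 5, implicit = FALSE)
```

The implicit update rescales each step by 1 / (1 + gamma * ||x_i||^2), which is exactly the LMS vs. NLMS contrast mentioned above: large or poorly scaled learning rates are automatically damped rather than causing divergence.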

Tutorials

Please check the “SGD series” on my GitHub pages for practical tutorials on SGD.

Papers

Toulis, P., Horel, T., Airoldi, EM. (2020). The proximal Robbins-Monro method. Journal of the Royal Statistical Society, Series B (to appear). pdf code slides www arxiv | (winner of IBM 2018 award)

Chee, J. and Toulis, P. (2018). Convergence diagnostics for stochastic gradient descent with constant step size. AISTATS'18, oral. pdf slides arxiv

Toulis, P. and Airoldi, EM. (2017). Asymptotic and finite-sample properties of estimators based on stochastic gradients. Annals of Statistics, 45(4), pp. 1694-1727. pdf code slides errata supplement errata arxiv

Tran, D., Toulis, P. and Airoldi, EM. (2016). Towards stability and optimality in stochastic gradient descent. AISTATS'16. pdf arxiv www

Toulis, P. and Airoldi, EM. (2015). Scalable estimation strategies based on stochastic approximations: classical results and new insights. Statistics and Computing, 25(4), pp. 781-795. pdf www

Toulis, P., Rennie, J., and Airoldi, EM. (2014). Statistical analysis of stochastic gradient methods for generalized linear models. ICML’14, oral. pdf slides www


Code

sgd R package:

— on CRAN: https://cran.r-project.org/package=sgd

— on GitHub: https://github.com/airoldilab/sgd
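
For a quick start with the package, the snippet below is a minimal usage sketch adapted from the package README; the exact argument names (in particular the `sgd.control` options and the "implicit" method label) may differ across versions, so consult `?sgd` in the installed package.

```r
library(sgd)

# Simulate data from a linear model with intercept.
N <- 1e4; d <- 10
X <- matrix(rnorm(N * d), ncol = d)
theta <- rep(5, d + 1)
y <- as.numeric(cbind(1, X) %*% theta + rnorm(N))
dat <- data.frame(y = y, x = X)

# Fit a linear model by implicit SGD (method name assumed to be "implicit").
fit <- sgd(y ~ ., data = dat, model = "lm",
           sgd.control = list(method = "implicit"))
coef(fit)
```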