method for nonsmooth nonconvex optimization. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 5564–5574. Curran Associates, Inc., 2018. URL http://papers.nips.cc/paper/ 7800-a-simple-proximal-stochastic-gradient-method-for-nonsmooth-n pdf. Y. Liu, F. Shang, and J. Cheng. Accelerated variance reduced stochastic admm. In Thirty-First AAAI Conference on Artificial Intelligence, pages 2287–2293, 2017. S. Minsker. On some extensions of Bernstein’s inequality for self-adjoint operators. Statistics & Probability Letters, 127:111–119, 2017. J. F. Mota, J. M. Xavier, P. M. Aguiar, and M. P¨ uschel. A proof of convergence for the alternating direction method of multipliers applied to polyhedral-constrained functions. arXiv preprint arXiv:1112.2295, 2011. T. Murata and T. Suzuki. Gradient descent in rkhs with importance labeling, 2020. L. M. Nguyen, J. Liu, K. Scheinberg, and M. Tak´ aˇ c. SARAH: A novel method for machine learning problems using stochastic recursive gradient. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2613–2621, International Convention Centre, Sydney, Australia, 06–11 42 / 42