Divergence Minimization for Deep Direct Density Ratio Estimation,” in International Conference on Machine Learning.
• Kato, M., Imaizumi, M., McAlinn, K., Yasui, S., and Kakehi, H. (2022), “Learning Causal Relationships from Conditional Moment Restrictions by Importance Weighting,” in International Conference on Learning Representations.
• Kato, M., Imaizumi, M., and Minami, K. (2022), “Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics.”
• Kanamori, T., Hido, S., and Sugiyama, M. (2009), “A least-squares approach to direct importance estimation,” Journal of Machine Learning Research, 10, 1391–1445.
• Kiryo, R., Niu, G., du Plessis, M. C., and Sugiyama, M. (2017), “Positive-Unlabeled Learning with Non-Negative Risk Estimator,” in Conference on Neural Information Processing Systems.
• Imbens, G. W. and Lancaster, T. (1996), “Efficient estimation and stratified sampling,” Journal of Econometrics, 74, 289–318.
• Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), “Double/debiased machine learning for treatment and structural parameters,” Econometrics Journal, 21, C1–C68.
• Good, I. J. and Gaskins, R. A. (1971), “Nonparametric Roughness Penalties for Probability Densities,” Biometrika, 58, 255–277.
• Sugiyama, M., Nakajima, S., Kashima, H., von Bünau, P., and Kawanabe, M. (2007), “Direct importance estimation with model selection and its application to covariate shift adaptation,” in Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS'07), Red Hook, NY, USA: Curran Associates Inc., pp. 1433–1440.
• Sugiyama, M., Suzuki, T., and Kanamori, T. (2011), “Density Ratio Matching under the Bregman Divergence: A Unified Framework of Density Ratio Estimation,” Annals of the Institute of Statistical Mathematics, 64.
• Sugiyama, M., Suzuki, T., and Kanamori, T. (2012), Density Ratio Estimation in Machine Learning, New York, NY, USA: Cambridge University Press, 1st ed.
• Sugiyama, M. (2016), “Introduction to Statistical Machine Learning.”
• Silverman, B. W. (1982), “On the Estimation of a Probability Density Function by the Maximum Penalized Likelihood Method,” The Annals of Statistics, 10, 795–810.
• Suzuki, T., Sugiyama, M., Sese, J., and Kanamori, T. (2008), “Approximating mutual information by maximum likelihood density ratio estimation,” in Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, vol. 4 of Proceedings of Machine Learning Research, pp. 5–20, PMLR.
• Uehara, M., Sato, I., Suzuki, M., Nakayama, K., and Matsuo, Y. (2016), “Generative Adversarial Nets from a Density Ratio Estimation Perspective.”
• Tran, D., Ranganath, R., and Blei, D. M. (2017), “Hierarchical Implicit Models and Likelihood-Free Variational Inference,” in International Conference on Neural Information Processing Systems, Red Hook, NY, USA, pp. 5529–5539.
• Nguyen, X., Wainwright, M. J., and Jordan, M. (2008), “Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization,” in Conference on Neural Information Processing Systems, vol. 20.
• Newey, W. K. and Powell, J. L. (2003), “Instrumental variable estimation of nonparametric models,” Econometrica, 71, 1565–1578.
• Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., and Kanamori, T. (2011), “Statistical outlier detection using direct density ratio estimation,” Knowledge and Information Systems, 26, 309–336.
• Lai, T. and Robbins, H. (1985), “Asymptotically efficient adaptive allocation rules,” Advances in Applied Mathematics.
• Kaufmann, E., Cappé, O., and Garivier, A. (2016), “On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models,” Journal of Machine Learning Research, 17, 1–42.
• Fan, X., Grama, I., and Liu, Q. (2013), “Cramér large deviation expansions for martingales under Bernstein’s condition,” Stochastic Processes and their Applications, 123, 3919–3942.
• Fan, X., Grama, I., and Liu, Q. (2014), “A generalization of Cramér large deviations for martingales,” Comptes Rendus Mathematique, 352, 853–858.
• Shimodaira, H. (2000), “Improving predictive inference under covariate shift by weighting the log-likelihood function,” Journal of Statistical Planning and Inference, 90, 227–244.