
Collaborative Topic Models for Users and Texts

Chong Wang, senior research scientist at Baidu, presents his research on probabilistic topic models.

Video here: https://www.hakkalabs.co/articles/collaborative-topic-models-for-users-and-texts

Hakka Labs

May 13, 2015

Transcript

  1. Collaborative Topic Models for Users and Texts

    Chong Wang. Work done while at Princeton Univ. with David Blei. Current affiliation: Baidu AI Lab. Mar 26, 2015. Some materials were adapted from David Blei’s slides. Chong Wang Collaborative Topic Models for Users and Texts 1
  2. Outline

    Introduction to Topic Modeling; Collaborative Topic Models for Users and Texts; An Interactive Demonstration.
  3. Some of My Work related to Topic Modeling
  4. Text data: hierarchical organization

    (Wang & Blei, NIPS 2009) (Paisley, Wang, Blei & Jordan, PAMI 2014)
  5. Image data: classification and annotation (Wang, Blei & Fei-Fei, CVPR 2009)

    Class: snowboarding; annotations: skier, ski, tree, water, boat, building, sky, residential area. Predicted class: snowboarding; predicted annotations: athlete, sky, tree, water, plant, ski, skier. [Figure: graphical model of the approach; nodes represent random variables and edges denote possible dependence.]
  6. Usage data: document/music recommendation

    Document recommendation (Wang & Blei, KDD 2011). Music recommendation (Weston, Wang, Weiss & Berenzweig, ICML 2012).
  7. Network data: community detection (Gopalan, Wang & Blei, NIPS 2013)

    [Figure: discovered community structure and node popularities in a giant component of the netscience collaboration network (left), with labeled authors including Barabasi, Jeong, Newman, Sole, Pastor-Satorras and Holme, and in a political blog network; panels labeled AMP and MMSB. Each link denotes a collaboration between two authors.]
  8. Topic modeling

    Documents exhibit multiple topics (themes). [Figure: topics (gene 0.04, dna 0.02, genetic 0.01, …; life 0.02, evolve 0.01, organism 0.01, …; brain 0.04, neuron 0.02, nerve 0.01, …; data 0.02, number 0.02, computer 0.01, …), documents, and topic proportions and assignments.] Figure 1: The intuitions behind latent Dirichlet allocation. We assume that some number of “topics,” which are distributions over words, exist for the whole collection (far left). Each document is assumed to be generated as follows. First choose a distribution over the topics (the histogram at right); then, for each word, choose a topic assignment (the colored coins) and choose the word from the corresponding topic.
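The generative story in the caption above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the two toy topics, the six-word vocabulary, the Dirichlet parameter and the helper names (`sample_dirichlet`, `sample_categorical`, `generate_document`) are all assumptions made for the example.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, num_words, rng):
    """Generate one document following the LDA generative process."""
    theta = sample_dirichlet(alpha, rng)        # distribution over topics
    assignments, words = [], []
    for _ in range(num_words):
        z = sample_categorical(theta, rng)      # topic assignment for this word
        w = sample_categorical(topics[z], rng)  # word drawn from that topic
        assignments.append(z)
        words.append(vocab[w])
    return theta, assignments, words

# Toy setup (hypothetical numbers): each topic is a distribution over the vocabulary.
vocab = ["gene", "dna", "genetic", "data", "number", "computer"]
topics = [
    [0.5, 0.3, 0.2, 0.0, 0.0, 0.0],  # a "genetics" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "computing" topic
]
rng = random.Random(0)
theta, z, words = generate_document(topics, vocab, [0.5, 0.5], 8, rng)
```

Inference runs this story in reverse: given only the words, it estimates the topics, the per-document proportions and the per-word assignments.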
  9. Latent Dirichlet allocation (LDA), Blei et al., 2003

    [Graphical model: topics β, plate K.] Example topics: Bayesian, model, inference, …; input, output, system, …; cortex, cortical, areas, …. Labels: Topics, Topic proportions.
  10. Latent Dirichlet allocation (LDA), Blei et al., 2003

    [Graphical model: β, α, θ; plates K and D.] Example topics: Bayesian, model, inference, …; input, output, system, …; cortex, cortical, areas, …. Labels: Topics, Topic proportions.
  11. Latent Dirichlet allocation (LDA), Blei et al., 2003

    [Graphical model: z, w, β, α, θ; plates N, K and D.] Example topics: Bayesian, model, inference, …; input, output, system, …; cortex, cortical, areas, …. Labels: Topics, Topic proportions, Topic assignments.
  12. LDA model: inference

    [Graphical model: z, w, β, α, θ; plates N, K and D.] Labels: Topics, Topic proportions, Topic assignments.
  13. LDA model: inference

    [Graphical model: z, w, β, α, θ; plates N, K and D.] Example topics: Bayesian, model, inference, …; input, output, system, …; cortex, cortical, areas, …. Labels: Topics, Topic proportions, Topic assignments.
  14. Example: a 200-topic LDA model

    Data: article titles and abstracts from CiteULike. 16,980 articles, 1.6M words, 8K unique terms.
  15. Learned topics

    gene, genes, expression, tissues, regulation, coexpression, tissuespecific, expressed, tissue, regulatory; nodes, wireless, protocol, routing, protocols, node, sensor, peertopeer, scalable, hoc; distribution, random, probability, distributions, sampling, stochastic, markov, density, estimation, statistics; learning, machine, training, vector, learn, machines, kernel, learned, classifiers, classifier. [Figure labels: wireless, gene, probability, classifier.]
  16. Learned topic proportions for one article

    [Figure: topic proportions.] Top topics: relative, importance, give, original, respect, obtain, ranking; large, small, numbers, larger, extremely, amounts, smaller; web, semantic, pages, page, metadata, standards, rdf, xml.
  17. Learned topic proportions for another one

    [Figure: topic proportions.] Top topics: estimate, estimates, likelihood, maximum, estimated, missing; distribution, random, probability, distributions, sampling; algorithm, signal, input, signals, output, exact, performs. Article shown: Maximum Likelihood from Incomplete Data via the EM Algorithm. By A. P. DEMPSTER, N. M. LAIRD and D. B. RUBIN, Harvard University and Educational Testing Service. [Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the RESEARCH SECTION on Wednesday, December 8th, 1976, Professor S. D. SILVEY in the Chair.] A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis. Keywords: MAXIMUM LIKELIHOOD; INCOMPLETE DATA; EM ALGORITHM; POSTERIOR MODE. 1. INTRODUCTION. This paper presents a general approach to iterative computation of maximum-likelihood estimates when the observations can be viewed as incomplete data. Since each iteration of the algorithm consists of an expectation step followed by a maximization step we call it the EM algorithm. The EM process is remarkable in part because of the simplicity and generality of the associated theory, and in part because of the wide range of examples which fall under its umbrella. When the underlying complete data come from an exponential family whose maximum-likelihood estimates are easily computed, then each maximization step of an EM algorithm is likewise easily computed.
  18. People read documents

    These user-text data tell us how people read documents.
  19. People read documents

    These user-text data tell us how people read documents. Given these data, we hope to: help people find documents that they are interested in; learn about what the documents mean to the people who read them; learn about the people reading the documents.
  20. Collaborative topic models (Wang & Blei, KDD 2011)

    [Figure: STATS and VISION labels and the first page of the EM paper, “Maximum Likelihood from Incomplete Data via the EM Algorithm” by A. P. Dempster, N. M. Laird and D. B. Rubin.] The EM paper.
  21. Collaborative topic models (Wang & Blei, KDD 2011)

    [Figure: STATS and VISION labels and the first page of the EM paper.] Recommend to STATS people. The EM paper.
  22. Collaborative topic models (Wang & Blei, KDD 2011)

    [Figure: STATS and VISION labels and the first page of the EM paper.] Recommend to STATS people. The EM paper.
  23. Collaborative topic models (Wang & Blei, KDD 2011)

    [Figure: STATS and VISION labels, shown twice, and the first page of the EM paper.] Recommend to STATS people. The EM paper.
  24. Collaborative topic models (Wang & Blei, KDD 2011)

    [Figure: STATS and VISION labels, shown twice, and the first page of the EM paper.] Recommend to STATS people. Recommend to both STATS and VISION people. The EM paper.
  25. Collaborative topic models

    [Figure labels: STATS, VISION.] Topic proportions + Corrections = Article representation. User behavior changes the way we should look at the data.
  26. Collaborative topic models

    [Graphical model combining topic modeling and matrix factorization; variables z, w, β, α, θ, ε, u, r, v; plates N, K, I and J.] Labels: Topics (β), Topic proportions (θ), Corrections (ε), User preferences (u), Document words (w), Ratings (r). C. Wang and D. Blei, KDD 2011; P. Gopalan, L. Charlin and D. Blei, NIPS 2014 (a better formulation).
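The idea "topic proportions + corrections = article representation" can be made concrete with a small sketch. It assumes the formulation of the KDD 2011 paper, where the article vector is v = θ + ε and the expected rating is the inner product of the user vector u with v. The two-dimensional vectors (one STATS coordinate, one VISION coordinate) and all numbers are made up for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def article_vector(theta, epsilon):
    """Article representation: topic proportions plus corrections."""
    return [t + e for t, e in zip(theta, epsilon)]

def predicted_rating(user, theta, epsilon):
    """Expected rating: user preferences times article representation."""
    return dot(user, article_vector(theta, epsilon))

# Hypothetical numbers. Coordinates: [STATS, VISION].
theta = [0.75, 0.25]         # the text of the EM paper is mostly about statistics
epsilon = [0.0, 0.5]         # correction learned because vision people also read it
stats_reader = [1.0, 0.0]
vision_reader = [0.0, 1.0]

content_only = dot(vision_reader, theta)                           # text alone
with_correction = predicted_rating(vision_reader, theta, epsilon)  # text plus readers
```

With these toy numbers, the content alone gives the vision reader a low score, while the corrected representation scores the paper equally for both readers, which matches the "recommend to both STATS and VISION people" slide.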
  27. The data

    From citeulike.org: 5.5K users and 17K research articles with abstracts. From mendeley.com: 80K users and 261K research articles with abstracts.
  28. Two types of recommendations

    [Figure: users-by-articles matrix. Articles: “Maximum likelihood from incomplete data via the EM algorithm”, “Conditional random fields”, “Introduction to variational methods for graphical models”, “The mathematics of statistical machine translation”, and “Your new article”.] In-matrix prediction: articles that already have readers. Out-of-matrix prediction: new articles that no one has read yet.
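The difference between the two prediction types can be sketched as follows. The sketch assumes, as in the KDD 2011 paper, that an article with readers is scored with θ + ε, while a brand-new article has no learned correction and is scored with its topic proportions θ alone. The function names and all numbers are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def score(user, theta, epsilon=None):
    """Predicted preference of a user for an article.

    In-matrix: the article has readers, so a learned correction is available.
    Out-of-matrix: no readers yet, so fall back to the topic proportions.
    """
    if epsilon is None:
        item = list(theta)
    else:
        item = [t + e for t, e in zip(theta, epsilon)]
    return dot(user, item)

def recommend(user, articles):
    """Return article titles sorted from highest to lowest score."""
    scored = [(score(user, a["theta"], a.get("epsilon")), title)
              for title, a in articles.items()]
    return [title for _, title in sorted(scored, reverse=True)]

# Hypothetical numbers. Coordinates: [STATS, VISION].
articles = {
    "The EM paper": {"theta": [0.75, 0.25], "epsilon": [0.0, 0.5]},
    "Conditional random fields": {"theta": [1.0, 0.0], "epsilon": [0.0, 0.25]},
    "Your new article": {"theta": [0.5, 0.5]},  # no readers yet, so no correction
}
vision_reader = [0.0, 1.0]
ranking = recommend(vision_reader, articles)
```

A pure matrix factorization model has nothing to say about "Your new article", since it has no ratings; the topic proportions are what make the out-of-matrix case possible here.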
  29. Recommendation performance: CiteULike

    [Figure: recall (axis ticks 0.2 to 0.8) against number of recommended articles (50 to 200), in two panels, in-matrix and out-of-matrix. Methods: CoTM, LDA, MF.]
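The plots report recall as a function of the number of recommended articles. A minimal sketch of that metric, assuming the usual definition (the fraction of the articles a user likes that appear among the top M recommendations); the function name and toy data are made up for illustration.

```python
def recall_at_m(ranked_articles, liked_articles, m):
    """Fraction of a user's liked articles that appear in the top-m recommendations."""
    liked = set(liked_articles)
    if not liked:
        raise ValueError("recall is undefined for a user with no liked articles")
    hits = sum(1 for article in ranked_articles[:m] if article in liked)
    return hits / len(liked)

# Toy example: five recommendations, three held-out liked articles.
ranked = ["a", "b", "c", "d", "e"]
liked = {"b", "e", "z"}
recall_top2 = recall_at_m(ranked, liked, 2)  # only "b" is in the top 2
recall_top5 = recall_at_m(ranked, liked, 5)  # "b" and "e" are in the top 5
```

Recall can only grow as M grows, which is why the curves in such plots rise from left to right.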
  30. Recommendation performance: Mendeley

    [Figure: recall (axis ticks 0.00 to 0.25) against number of recommended articles (50 to 200), in two panels, in-matrix and out-of-matrix. Methods: CoTM, LDA, MF.]
  31. More than recommendation

    [Figure: first page of the EM paper, “Maximum Likelihood from Incomplete Data via the EM Algorithm” by A. P. Dempster, N. M. Laird and D. B. Rubin.]
  32. Before users read it, CiteULike

    [Figure: weight per topic for the article.] Prominent topics: estimation, likelihood, maximum, parameters, methods, estimators; algorithm, algorithms, optimization, problem, efficient, problems.
  33. After users read it, CiteULike

    [Figure: weight per topic for the article.] Prominent topics: estimation, likelihood, maximum, parameters, methods, estimators; algorithm, algorithms, optimization, problem, efficient, problems; image, images, segmentation, algorithm, registration, camera; bayesian, model, inference, models, probability, probabilistic.
  34. Another article from CiteULike

    “Phase-of-firing coding of natural visual stimuli in primary visual cortex.” [Figure: weight per topic (weights 0.0 to 0.5, topics 0 to 200).] Prominent topic: neurons, responses, neuronal, spike, cortical, stimuli, stimulus.
  35. More than recommendation

    [Figure: users-by-articles matrix. Articles: “Maximum likelihood from incomplete data via the EM algorithm”, “Conditional Random Fields”, “Introduction to Variational Methods for Graphical Models”, “The Mathematics of Statistical Machine Translation”.] We can look at posterior estimates to find: widely read articles in a field; articles in a field that are widely read in other fields; articles from other fields that are widely read in a field. These are possible through interpretable latent topics.
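One way to picture these three kinds of articles is a simple thresholding rule over the fitted quantities. The rule below is an illustrative assumption, not the procedure used in the talk: an article counts as "about" topic k when its topic proportion θ_k is large, and as "popular" in topic k when its corrected weight θ_k + ε_k is large. The threshold and all numbers are hypothetical.

```python
def describe_article(theta, epsilon, k, threshold=0.5):
    """Label an article relative to topic k (illustrative heuristic only)."""
    v = [t + e for t, e in zip(theta, epsilon)]  # corrected representation
    about = theta[k] >= threshold
    popular_here = v[k] >= threshold
    popular_elsewhere = any(v[j] >= threshold and theta[j] < threshold
                            for j in range(len(v)) if j != k)
    labels = []
    if about and popular_here:
        labels.append("about this topic, popular in this topic")
    if about and popular_elsewhere:
        labels.append("about this topic, popular in other topics")
    if not about and popular_here:
        labels.append("NOT about this topic, popular in this topic")
    return labels

# Hypothetical numbers. Topic 0 plays the role of the topic being explored.
em_like = describe_article([0.75, 0.25], [0.0, 0.5], k=0)
outsider = describe_article([0.0, 1.0], [0.5, 0.0], k=0)
```

In this toy setup the first article is about topic 0 and also read by people interested in topic 1, while the second is not about topic 0 at all but is still read by people interested in it.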
  36. Topic: Maximum Likelihood

    Topic: estimates, likelihood, maximum, parameters, method. About this topic, popular in this topic: “Maximum Likelihood Estimation of Population Parameters”; “Bootstrap Methods: Another Look at the Jackknife”; “R. A. Fisher and the Making of Maximum Likelihood”. About this topic, popular in other topics: “Maximum Likelihood from Incomplete Data via the EM Algorithm”; “Bootstrap Methods: Another Look at the Jackknife”; “Tutorial on Maximum Likelihood Estimation”. NOT about this topic, popular in this topic: “Random Forests”; “Identification of Causal Effects Using Instrumental Variables”; “Matrix Computations”.
  37. Topic: Network science

    Topic: networks, topology, connected, nodes, links, degree. About this topic, popular in this topic: “Assortative Mixing in Networks”; “Characterizing the Dynamical Importance of Network Nodes and Links”; “Subgraph Centrality in Complex Networks”. About this topic, popular in other topics: “Assortative Mixing in Networks”; “The Structure and Function of Complex Networks”; “Statistical Mechanics of Complex Networks”. NOT about this topic, popular in this topic: “Power Law Distributions in Empirical Data”; “Graph Structure in the Web”; “The Origin of Bursts and Heavy Tails in Human Dynamics”.
  38. The “corrections” phenomenon is not alone (Neiswanger, Wang, Ho & Xing, UAI 2014)

    [Figure: topic weights for the Simple English Wikipedia article “Sistine Chapel” and for the linked articles Italy, Chapel and Christian. Panels: Initial Topics; Topics after Random Offsets; Offsets Learned from Links (Random Offsets). Topics shown include: built, side, large, design; italy, italian, china, russian; church, christ, jesus, god; english, knight, translated, restoration; building, built, tower, architecture; philosophy, ideas, study, knowledge; people, make, place, live.] Text: "The Sistine Chapel is a large chapel in the Vatican Palace, the place in Italy where the Pope lives. The Chapel was built between 1473 and 1481 by Giovanni dei Dolci for Pope Sistus IV...The Sistine Chapel is famous for its fresco paintings by the Renaissance painter Michelangelo..." In-Links (Citing Documents): (1) Raphael, (2) Ten Commandments, (3) Chapel, (4) Apostolic Palace, (5) St. Peter's Basilica. Predicted Links: (1) Chapel, (2) Christian, (3) Italy. Similar observation in citation networks.
  39. Summary

    Collaborative topic models: blend content-based and rating-based recommendations; discover patterns in how people read / how documents are read; suggest new ways of doing document recommendations.
  40. Thank you!

    BTW: Baidu AI Lab is hiring research scientists and software engineers! chongwang@baidu.com