Topic Model: An Introduction

  • Published on 11-Aug-2014


DESCRIPTION

This is an introduction to Topic Modeling, covering tf-idf, LSA, pLSA, LDA, EM, and some other related materials. There are definitely some mistakes; please correct them with your wisdom. Thank you~

Transcript

  • Topic Model: Text Mining. Yueshen Xu (xyshzjucs@zju.edu.cn), Middleware, CCNT, ZJU, 6/11/2014. Text Mining & NLP & ML.
  • Outline: Basic Concepts; Application and Background; Famous Researchers; Language Model; Vector Space Model (VSM); Term Frequency-Inverse Document Frequency (TF-IDF); Latent Semantic Analysis (LSA); Probabilistic Latent Semantic Analysis (pLSA); Expectation-Maximization Algorithm (EM) & Maximum-Likelihood Estimation (MLE)
  • Outline (Cont.): Latent Dirichlet Allocation (LDA); Conjugate Prior; Poisson Distribution; Variational Distribution and Variational Inference (VD & VI); Markov Chain Monte Carlo (MCMC); Metropolis-Hastings Sampling (MH); Gibbs Sampling and GS for LDA; Bayesian Theory vs. Probability Theory
  • Concepts: Latent Semantic Analysis, Topic Model, Text Mining, Natural Language Processing, Computational Linguistics, Information Retrieval, Dimension Reduction, Expectation-Maximization (EM), Data Mining, Machine Learning, Machine Translation, Latent Factor Model. Aim: find the topic that a word or a document belongs to.
  • Application: the Latent Factor Model (LFM) has become a fundamental technique in the modern search engine, recommender system, tag extraction, blog clustering, Twitter topic mining, news (text) summarization, etc. Search engine: PageRank answers "how important is this web page?"; LFM answers "how relevant is this web page?" and "how relevant is the user's query to one document?". Other uses: recommender systems, opinion extraction, spam detection, tag extraction, text summarization, abstract generation, Twitter topic mining. Example text: "Steve Jobs had left us for about two years... Apple's price will fall down."
  • Famous Researchers: David Blei (Princeton, LDA); ChengXiang Zhai (UIUC, Presidential Early Career Award); W. Bruce Croft (UMass, Language Model); Bing Liu (UIC, Opinion Mining); John D. Lafferty (CMU, CRF & IBM); Thomas Hofmann (Brown, pLSA); Andrew McCallum (UMass, CRF & IBM); Susan Dumais (Microsoft, LSI)
  • Language Model: Unigram Language Model == Zero-order Markov Chain; Bigram Language Model == First-order Markov Chain; N-gram Language Model == (N-1)-order Markov Chain; Mixture-unigram Language Model. Unigram: $p(w|M) = \prod_{w_i \in w} p(w_i|M)$; Bag of Words (BoW): no order, no grammar, only multiplicity. Bigram: $p(w|M) = \prod_{w_i \in w} p(w_i|w_{i-1}, M)$. Mixture-unigram: $p(w) = \sum_z p(z) \prod_{n=1}^{N} p(w_n|z)$ (plate notation: w, N, M, z).
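To make the unigram/bigram contrast concrete, here is a minimal Python sketch assuming maximum-likelihood counts over a made-up toy corpus (the corpus and helper names are illustrative, not from the slides):

```python
from collections import Counter

# Toy corpus; purely illustrative.
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
tokens = [w for d in docs for w in d]

# Unigram LM (zero-order Markov chain, BoW): p(w|M) = count(w) / total.
uni = Counter(tokens)
total = sum(uni.values())

# Bigram LM (first-order Markov chain): p(w_i | w_{i-1}, M).
big = Counter((a, b) for d in docs for a, b in zip(d, d[1:]))

def p_unigram(sent):
    prob = 1.0
    for w in sent:
        prob *= uni[w] / total
    return prob

def p_bigram(sent):
    prob = uni[sent[0]] / total          # start with a unigram for the first word
    for a, b in zip(sent, sent[1:]):
        prob *= big[(a, b)] / uni[a]     # MLE estimate of p(b|a)
    return prob

sent = ["the", "cat", "sat"]
print(p_unigram(sent), p_bigram(sent))
```

The bigram score rewards seen word orders, while the unigram score ignores order entirely, which is exactly the BoW assumption.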
  • Vector Space Model: A document is represented as a vector of identifiers. Identifier: Boolean (0, 1); Term Count (how many times the term occurs); Term Frequency (how frequent the term is in this document); TF-IDF (how important the term is in the corpus; most used). Relevance ranking: $d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$, $q = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$, $\mathrm{sim}(d_j, q) = \cos\theta = \frac{d_j \cdot q}{\|d_j\|\,\|q\|}$. First used in SMART (Gerard Salton, Cornell; namesake of the SIGIR Gerard Salton Award).
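A minimal sketch of the cosine ranking formula above, with made-up term weights (requires NumPy; the numbers are illustrative only):

```python
import numpy as np

# Term-weight vectors for a document d_j and a query q over the same t terms.
d_j = np.array([0.0, 2.0, 1.0, 0.0, 3.0])
q   = np.array([0.0, 1.0, 0.0, 0.0, 1.0])

# sim(d_j, q) = cos(theta) = (d_j . q) / (||d_j|| * ||q||)
sim = d_j @ q / (np.linalg.norm(d_j) * np.linalg.norm(q))
print(sim)
```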
  • TF-IDF: a mixture language model, a linear combination of certain distributions (e.g., Gaussian), gives better performance. TF: Term Frequency; IDF: Inverse Document Frequency. $tf_{ij} = \frac{n_{ij}}{\sum_k n_{kj}}$ (term $i$, document $j$, $n_{ij}$: count of $i$ in $j$): how important the term is in this document. $idf_i = \log\frac{N}{1 + |\{d \in D : t_i \in d\}|}$ ($N$ documents in the corpus): how important the term is in this corpus. $\mathrm{tfidf}(t_i, d_j, D) = tf_{ij} \times idf_i$.
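A short sketch of these three formulas, assuming the slide's IDF variant with 1 + document frequency in the denominator; the toy corpus is invented for illustration:

```python
import math
from collections import Counter

# Toy corpus D; each document is a bag of terms (illustrative only).
D = [["apple", "pie", "apple"], ["apple", "phone"], ["pie", "recipe"]]
N = len(D)

def tf(term, doc):
    # tf_ij = n_ij / sum_k n_kj
    counts = Counter(doc)
    return counts[term] / sum(counts.values())

def idf(term):
    # idf_i = log(N / (1 + |{d in D : t_i in d}|)), as on the slide
    df = sum(1 for d in D if term in d)
    return math.log(N / (1 + df))

def tfidf(term, doc):
    return tf(term, doc) * idf(term)

# "apple" appears in most documents, so its weight collapses toward zero;
# the rarer "phone" keeps a positive weight.
print(tfidf("apple", D[0]), tfidf("phone", D[1]))
```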
  • Latent Semantic Indexing: challenges (defects of VSM): comparing documents in the same concept space; comparing documents across languages; synonymy, e.g., buy - purchase, user - consumer; polysemy, e.g., book - book, draw - draw. Key idea: dimensionality reduction of the word-document co-occurrence matrix; construction of a latent semantic space. VSM: Word - Document. LSI: Word - Concept - Document, where Concept is also called Aspect, Topic, or Latent Factor.
  • Singular Value Decomposition: LSI ~= SVD. $N = U \Sigma V^T$, where $U$, $V$ are orthogonal matrices and $\Sigma$ is the diagonal matrix of the singular values of $N$ ($N$: $t \times d$ term-document matrix; $U$: $t \times m$; $\Sigma$: $m \times m$; $V^T$: $m \times d$). Keeping only the $k < m$ largest singular values yields the rank-$k$ latent semantic space.
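A hedged sketch of truncated SVD for LSI using NumPy; the term-document counts and the choice k = 2 are arbitrary and purely for illustration:

```python
import numpy as np

# Toy t x d term-document count matrix N (5 terms, 4 documents).
N = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 2., 0., 1.],
              [0., 0., 3., 1.],
              [1., 0., 1., 2.]])

U, s, Vt = np.linalg.svd(N, full_matrices=False)

k = 2  # keep the k largest singular values (k < m)
N_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation of N
doc_coords = np.diag(s[:k]) @ Vt[:k, :]        # documents in the latent space
print(np.round(doc_coords, 2))
```

Documents (and queries folded in the same way) can then be compared by cosine similarity in this k-dimensional concept space instead of the raw term space.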
  • Latent Dirichlet Allocation: a hierarchical Bayesian model; Bayesian pLSI (ACM-Infosys Award). Generative process of a document $d$ in a corpus according to LDA: choose $N \sim \mathrm{Poisson}(\xi)$ (why? not important); for each document $d = \{w_1, w_2, \ldots\}$: choose $\theta \sim \mathrm{Dir}(\alpha)$ (why?), $\theta = (\theta_1, \ldots, \theta_K)$, $\sum_{k=1}^{K}\theta_k = 1$, $K$ fixed; for each of the $N$ words in $d$: a) choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$ (why?); b) choose a word $w_n$ from $p(w_n|z_n, \beta)$, a multinomial probability conditioned on $z_n$. One word has one topic; one document has multiple topics. Contrast with pLSA: the number of $p(z|d)$ parameters is linear in the number of documents, which leads to overfitting; LDA regularizes this with Dirichlet priors (Dirichlet-Multinomial conjugacy). (Plate notation: $z$, $w$, $N$, $M$, $K$.)
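The generative story can be simulated directly. A minimal sketch, assuming made-up values for K, V, the hyper-parameters, and a fixed topic-word matrix beta (none of these numbers come from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

K, V = 3, 8                               # topics, vocabulary size (made-up)
alpha = np.full(K, 0.5)                   # Dirichlet hyper-parameter
beta = rng.dirichlet(np.ones(V), size=K)  # K x V topic-word matrix

def generate_document(xi=20):
    N = rng.poisson(xi)                   # document length ~ Poisson(xi)
    theta = rng.dirichlet(alpha)          # topic mixture ~ Dir(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)        # topic ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])      # word ~ Multinomial(beta_z)
        words.append(w)
    return words

print(generate_document())
```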
  • Latent Dirichlet Allocation (figure: the LDA graphical model)
  • Conjugate Prior & Distributions: Conjugate prior: if the posterior $p(\theta|x)$ is in the same family as the prior $p(\theta)$, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior of the likelihood $p(x|\theta)$: $p(\theta|x) \propto p(x|\theta)p(\theta)$. Conjugate pairs: Binomial Distribution with Beta Distribution; Multinomial Distribution with Dirichlet Distribution. Binomial: $\mathrm{Bin}(m|N,\mu) = \binom{N}{m}\mu^{m}(1-\mu)^{N-m}$ (the likelihood), $\binom{N}{m} = \frac{N!}{(N-m)!\,m!}$. Beta: $\mathrm{Beta}(\mu|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\mu^{a-1}(1-\mu)^{b-1}$, where $\Gamma(a) = \int_0^{\infty} t^{a-1}e^{-t}\,dt$. Why do prior and posterior need to be conjugate distributions?
  • Conjugate Prior & Distributions (Cont.): Parameter estimation: $p(\mu|m,l,a,b) \propto C(m, m+l)\,\mu^{m}(1-\mu)^{l}\cdot\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\mu^{a-1}(1-\mu)^{b-1}$, hence $p(\mu|m,l,a,b) = \frac{\Gamma(m+a+l+b)}{\Gamma(m+a)\Gamma(l+b)}\mu^{m+a-1}(1-\mu)^{l+b-1}$: a Beta distribution again! Multinomial & Dirichlet Distribution: $x/\mu$ is multivariate, e.g., $x = (0,0,1,0,0,0)$: the event that outcome 3 happens. The probability distribution of only one event: $p(x|\mu) = \prod_{k=1}^{K}\mu_k^{x_k}$, $\mu = (\mu_1, \mu_2, \ldots, \mu_K)$.
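A tiny numeric check of the Beta-Binomial update, assuming SciPy and invented counts; conjugacy means the posterior is available in closed form, with no integration:

```python
from scipy import stats

a, b = 2.0, 2.0          # Beta prior hyper-parameters (illustrative)
m, l = 7, 3              # observed successes and failures (illustrative)

# Conjugacy: posterior is Beta(m + a, l + b).
posterior = stats.beta(m + a, l + b)
print(posterior.mean())  # (m + a) / (m + a + l + b) = 9/14
```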
  • Conjugate Prior & Distributions: Multinomial & Dirichlet Distribution (Cont.): $\mathrm{Mult}(m_1, m_2, \ldots, m_K|\mu, N) = \frac{N!}{m_1!\,m_2!\cdots m_K!}\prod_{k=1}^{K}\mu_k^{m_k}$: the likelihood function of $\mu$; Mult is the exact probability distribution of the counts $m_k$. In Bayesian theory, we need to find a conjugate prior of $\mu$ for Mult, where $0 < \mu_k < 1$ and $\sum_{k=1}^{K}\mu_k = 1$: the Dirichlet distribution, $\mathrm{Dir}(\mu|\alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k-1}$, with $\alpha$ a vector and $\alpha_0 = \sum_{k=1}^{K}\alpha_k$. Hyper-parameter: a parameter appearing in the probability distribution function (pdf) placed over another parameter.
  • Conjugate Prior & Distributions: Multinomial & Dirichlet Distribution (Cont.): posterior $p(\mu|D, \alpha) \propto p(D|\mu)\,\mathrm{Dir}(\mu|\alpha) \propto \prod_{k=1}^{K}\mu_k^{\alpha_k + m_k - 1}$. Dirichlet? $p(\mu|D,\alpha) = \mathrm{Dir}(\mu|\alpha + m) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1)\cdots\Gamma(\alpha_K + m_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k + m_k - 1}$. Dirichlet! Why? Gamma is a mysterious function. For the Beta case, $\mu \sim \mathrm{Beta}(a,b)$: $E[\mu] = \int_0^1 \mu\,\mathrm{Beta}(\mu|a,b)\,d\mu = \frac{a}{a+b}$; likewise for the Dirichlet, $\mu \sim \mathrm{Dir}(\alpha)$: $E[\mu] = \left(\frac{\alpha_1}{\sum_k \alpha_k}, \frac{\alpha_2}{\sum_k \alpha_k}, \ldots, \frac{\alpha_K}{\sum_k \alpha_k}\right)$.
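The Dirichlet-Multinomial update is the same one-liner in code. A sketch with an invented prior and invented counts:

```python
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])     # Dirichlet prior (illustrative)
m = np.array([5, 0, 2])               # observed counts per category

alpha_post = alpha + m                # posterior is Dir(alpha + m), by conjugacy
print(alpha_post / alpha_post.sum())  # posterior mean E[mu]
```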
  • Poisson Distribution: Why the Poisson distribution? Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc.: the number of births per hour during a given day; the number of particles emitted by a radioactive source in a given time; the number of cases of a disease in different towns. For $\mathrm{Bin}(n,p)$, when $n$ is large and $p$ is small, $p(X=k) \approx \frac{\lambda^k}{k!}e^{-\lambda}$ with $\lambda = np$. Poisson distribution: $p(X=k|\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$. Related Gamma function: $\Gamma(x+1) = x\,\Gamma(x)$, and $\Gamma(n+1) = n!$ for integer $n$ (Poisson is discrete; Gamma is continuous).
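A quick numeric sketch of the binomial-to-Poisson limit, assuming SciPy and arbitrary n and p with n large and p small:

```python
from scipy import stats

n, p = 1000, 0.003          # large n, small p (illustrative values)
lam = n * p                 # lambda = np

# The two pmfs agree closely for every k in this regime.
for k in range(6):
    print(k, stats.binom.pmf(k, n, p), stats.poisson.pmf(k, lam))
```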
  • Solution for LDA: $\alpha$, $\beta$: corpus-level parameters; $\theta$: document-level variable; $z$, $w$: word-level variables. A conditionally independent hierarchical model; a parametric Bayes model. $\beta$ is the $K \times V$ topic-word matrix with $\beta_{ij} = p(w^j|z^i)$. Joint distribution: $p(\theta, \mathbf{z}, \mathbf{w}|\alpha, \beta) = p(\theta|\alpha)\prod_{n=1}^{N}p(z_n|\theta)\,p(w_n|z_n, \beta)$. Solving process: marginalize $z$ and $\theta$ by a multiple integral, $p(\mathbf{w}|\alpha,\beta) = \int p(\theta|\alpha)\left(\prod_{n=1}^{N}\sum_{z_n}p(z_n|\theta)\,p(w_n|z_n,\beta)\right)d\theta$.
  • Solution for LDA (Cont.): LDA is arguably the most significant generative model in the machine learning community in the recent ten years. Rewritten in terms of the model parameters: $p(\mathbf{w}|\alpha,\beta) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)}\int\left(\prod_{i=1}^{K}\theta_i^{\alpha_i-1}\right)\left(\prod_{n=1}^{N}\sum_{i=1}^{K}\prod_{j=1}^{V}(\theta_i\beta_{ij})^{w_n^j}\right)d\theta$. $\theta$ and $\mathbf{z}$: what we need to solve out. Two routes: Variational Inference (deterministic inference) and Gibbs Sampling (stochastic inference). Why variational inference? Simplify the dependency structure. Why sampling? Approximate the statistical properties of the population with those of samples.
  • Variational Inference: VI (inference through a variational distribution) aims to use an approximating distribution $Q(H)$ that has a simpler dependency structure than that of the exact posterior distribution $P(H|V)$ ($H$: hidden variables; $V$: visible variables). $Q(H)$: variational distribution; $P(H|V)$: true posterior distribution. Dissimilarity between $P$ and $Q$? The Kullback-Leibler divergence: $KL(Q\|P) = \sum_H Q(H)\log\frac{Q(H)}{P(H|V)} = \log P(V) - L$, where the lower bound $L = \langle\log P(H,V)\rangle_{Q(H)} + \text{entropy of } Q$.
  • Variational Inference (Cont.): For LDA the true posterior is $p(\theta, \mathbf{z}|\mathbf{w}, \alpha, \beta)$; take the variational distribution $q(\theta, \mathbf{z}|\gamma, \phi) = q(\theta|\gamma)\prod_{n=1}^{N}q(z_n|\phi_n)$, treating $\theta$ and $\mathbf{z}$ as (approximately) independent to facilitate computation. We want $(\gamma^*, \phi^*) = \arg\min_{\gamma,\phi} KL\big(q(\theta,\mathbf{z}|\gamma,\phi)\,\|\,p(\theta,\mathbf{z}|\mathbf{w},\alpha,\beta)\big)$, but we don't know the exact analytical form of this KL. Instead, $\log p(\mathbf{w}|\alpha,\beta) = L(\gamma,\phi;\alpha,\beta) + KL(q\|p)$, so minimizing KL == maximizing $L$.
  • Variational Inference (Cont.): $L(\gamma,\phi;\alpha,\beta) = E_q[\log p(\theta|\alpha)] + E_q[\log p(\mathbf{z}|\theta)] + E_q[\log p(\mathbf{w}|\mathbf{z},\beta)] - E_q[\log q(\theta)] - E_q[\log q(\mathbf{z})]$. The expectations reduce to digamma terms: $E_q[\log\theta_i] = \Psi(\gamma_i) - \Psi(\sum_{j=1}^{K}\gamma_j)$, so, e.g., $E_q[\log p(\mathbf{z}|\theta)] = \sum_{n=1}^{N}\sum_{i=1}^{K}\phi_{ni}\left(\Psi(\gamma_i) - \Psi(\sum_{j=1}^{K}\gamma_j)\right)$ and $E_q[\log p(\mathbf{w}|\mathbf{z},\beta)] = \sum_{n=1}^{N}\sum_{i=1}^{K}\sum_{j=1}^{V}\phi_{ni}\,w_n^j\log\beta_{ij}$.
  • Variational Inference (Cont.): The update for $\gamma$ is much like a posterior Dirichlet: $\gamma_i = \alpha_i + \sum_{n=1}^{N}\phi_{ni}$. Maximize $L$ with respect to $\phi_{ni}$ under the constraint $\sum_{j=1}^{K}\phi_{nj} = 1$ (Lagrange multiplier): $L_{[\phi_{ni}]} = \phi_{ni}\left(\Psi(\gamma_i) - \Psi(\sum_{j}\gamma_j)\right) + \phi_{ni}\log\beta_{iv} - \phi_{ni}\log\phi_{ni} + \lambda_n\left(\sum_{j=1}^{K}\phi_{nj} - 1\right)$. Taking the derivative with respect to $\phi_{ni}$ and setting it to zero: $\Psi(\gamma_i) - \Psi(\sum_j\gamma_j) + \log\beta_{iv} - \log\phi_{ni} - 1 + \lambda_n = 0$, hence $\phi_{ni} \propto \beta_{iv}\exp\left(\Psi(\gamma_i) - \Psi(\sum_{j=1}^{K}\gamma_j)\right)$.
  • Variational Inference (Cont.): You can refer to the original paper for more. Variational EM algorithm. Aim: $(\alpha^*, \beta^*) = \arg\max_{\alpha,\beta}\sum_{d=1}^{M}\log p(\mathbf{w}_d|\alpha,\beta)$. Initialize $\alpha$, $\beta$. E-step: compute $(\gamma_d^*, \phi_d^*)$ for each document through variational inference, approximating the likelihood. M-step: maximize the resulting lower bound on the likelihood with respect to $\alpha$, $\beta$. Repeat until convergence.
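A minimal sketch of the per-document E-step (the phi and gamma coordinate-ascent updates derived above), assuming NumPy/SciPy and a made-up fixed beta; this is a simplified illustration of the update equations, not the original authors' implementation:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(1)
K, V, N = 3, 10, 6                        # topics, vocab, doc length (made-up)
alpha = np.full(K, 0.1)
beta = rng.dirichlet(np.ones(V), size=K)  # fixed K x V topic-word matrix
w = rng.integers(0, V, size=N)            # word ids of one document

# Coordinate ascent on the lower bound L for a single document.
phi = np.full((N, K), 1.0 / K)
gamma = alpha + N / K
for _ in range(50):
    # phi_ni ∝ beta_{i, w_n} * exp(Psi(gamma_i) - Psi(sum_j gamma_j))
    log_phi = np.log(beta[:, w].T) + digamma(gamma) - digamma(gamma.sum())
    phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
    phi /= phi.sum(axis=1, keepdims=True)
    # gamma_i = alpha_i + sum_n phi_ni
    gamma = alpha + phi.sum(axis=0)
print(np.round(gamma, 3))
```

The M-step would then re-estimate beta (and alpha) from the converged phi of all documents.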
  • Markov Chain Monte Carlo: MCMC basics: the (first-order) Markov chain and its stationary distribution, the foundation of Gibbs sampling. General Markov property: $p(X_{t+n}|X_t, X_{t-1}, \ldots, X_1) = p(X_{t+n}|X_t)$. First-order: $p(X_{t+1}|X_t, \ldots, X_1) = p(X_{t+1}|X_t)$. One-step transition probability matrix ($X_m \to X_{m+1}$): $P = \begin{pmatrix} p(1|1) & p(2|1) & \cdots & p(|S|\,|1) \\ p(1|2) & p(2|2) & \cdots & p(|S|\,|2) \\ \vdots & \vdots & & \vdots \\ p(1|\,|S|) & p(2|\,|S|) & \cdots & p(|S|\,|\,|S|) \end{pmatrix}$.
  • Markov Chain Monte Carlo (Cont.): Initialization probability: $\pi_0 = \{\pi_0(1), \pi_0(2), \ldots, \pi_0(|S|)\}$; then $\pi_1 = \pi_0 P$, $\pi_2 = \pi_1 P = \pi_0 P^2$, ..., $\pi_n = \pi_0 P^n$: the Chapman-Kolmogorov equation. Convergence theorem: under the premise of connectivity (irreducibility) of $P$, $\lim_{n\to\infty} P^n$ exists and all of its rows equal the stationary distribution $\pi = \{\pi(1), \pi(2), \ldots, \pi(|S|)\}$ with $\pi(j) = \sum_{i=1}^{|S|}\pi(i)P(i,j)$, regardless of $\pi_0$. Sampling view: $x_0 \sim \pi_0$, $x_1 \sim \pi_1$, ..., $x_n \sim \pi_n = \pi$, $x_{n+1} \sim \pi$, $x_{n+2} \sim \pi$, ...: after convergence, every draw comes from the stationary distribution.
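A small numeric sketch of convergence to the stationary distribution, with an invented 3-state transition matrix:

```python
import numpy as np

# A small transition matrix P (rows sum to 1; illustrative).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

pi = np.array([1.0, 0.0, 0.0])   # any initial distribution pi_0
for _ in range(200):
    pi = pi @ P                  # pi_n = pi_0 P^n (Chapman-Kolmogorov)
print(pi)                        # converges to the stationary distribution

# Every row of P^n converges to that same stationary distribution.
print(np.linalg.matrix_power(P, 200)[0])
```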
  • Markov Chain Monte Carlo (Cont.): MCMC sampling: we should construct a relationship between the target distribution $p(x)$ and the MC transition process. Detailed balance condition: in a common MC, if for all $i, j$: $\pi(i)P(i,j) = \pi(j)P(j,i)$, then $\pi(\cdot)$ is the stationary distribution of this MC (a sufficient condition). Proof: $\sum_i \pi(i)P(i,j) = \sum_i \pi(j)P(j,i) = \pi(j)$, i.e., $\pi$ solves $\pi P = \pi$. Done. For a common MC with proposal $q(i,j)$ (also written $q(j|i)$ or $q(i \to j)$) and an arbitrary probability distribution $p(x)$ (the dimension of $x$ is arbitrary), the transformation is to introduce acceptance probabilities $\alpha(i,j) = p(j)\,q(j,i)$ and $\alpha(j,i) = p(i)\,q(i,j)$, so that $p(i)\,q(i,j)\,\alpha(i,j) = p(j)\,q(j,i)\,\alpha(j,i)$: the modified chain $Q(i,j) = q(i,j)\,\alpha(i,j)$ satisfies detailed balance for $p$.
  • Markov Chain Monte Carlo (Cont.): MCMC sampling (cont.): Step 1: initialize $X_0 = x_0$. Step 2: for $t = 0, 1, 2, \ldots$: sample $y \sim q(y|x_t)$; sample $u$ from Uniform[0,1]; if $u < \alpha(x_t, y) = p(y)\,q(y, x_t)$ then $X_{t+1} = y$, else $X_{t+1} = x_t$. This acceptance rate can be very low, so Metropolis-Hastings sampling rescales it: Step 1: initialize $X_0 = x_0$. Step 2: for $t = 0, 1, 2, \ldots, n, n+1, n+2, \ldots$: sample $y \sim q(y|x_t)$ and $u$ from Uniform[0,1]; if $u < \alpha(x_t, y) = \min\left\{\frac{p(y)\,q(y, x_t)}{p(x_t)\,q(x_t, y)}, 1\right\}$ then $X_{t+1} = y$, else $X_{t+1} = x_t$. The first $n$ iterations form the burn-in period before convergence.
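A hedged sketch of Metropolis-Hastings with a symmetric Gaussian random-walk proposal (so the q terms cancel); the target density is an invented unnormalized mixture:

```python
import numpy as np

rng = np.random.default_rng(2)

def p(x):
    # Unnormalized target: a two-component Gaussian mixture (illustrative).
    return np.exp(-0.5 * (x - 2) ** 2) + 0.5 * np.exp(-0.5 * (x + 2) ** 2)

x, samples = 0.0, []
for t in range(20000):
    y = x + rng.normal(0.0, 1.0)         # symmetric proposal q(y|x)
    # alpha = min{ p(y)q(x|y) / (p(x)q(y|x)), 1 }; q cancels when symmetric
    if rng.uniform() < min(p(y) / p(x), 1.0):
        x = y
    if t >= 2000:                        # discard the burn-in period
        samples.append(x)
print(np.mean(samples))
```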
  • Gibbs Sampling: MH is not suitable with regard to high-dimensional variables. Gibbs sampling (two dimensions): take A(x1,y1) and B(x1,y2). Then $p(x_1, y_1)\,p(y_2|x_1) = p(x_1)\,p(y_1|x_1)\,p(y_2|x_1)$ and $p(x_1, y_2)\,p(y_1|x_1) = p(x_1)\,p(y_2|x_1)\,p(y_1|x_1)$, hence $p(A)\,p(y_2|x_1) = p(B)\,p(y_1|x_1)$: moves along the line $x = x_1$ (and similarly along $y = y_1$, e.g., to C(x2,y1)) satisfy detailed balance with no rejection step.
  • Gibbs Sampling (Cont.): We can construct the transition probability matrix $Q$ accordingly: $Q(A \to B) = p(y_B|x_1)$ if $x_A = x_B = x_1$; $Q(A \to C) = p(x_C|y_1)$ if $y_A = y_C = y_1$; $Q(A \to D) = 0$ otherwise. Detailed balance condition: $p(X)\,Q(X \to Y) = p(Y)\,Q(Y \to X)$. Gibbs sampling (in two dimensions): Step 1: initialize $X_0 = x_0$, $Y_0 = y_0$. Step 2: for $t = 0, 1, 2, \ldots$
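A minimal two-dimensional Gibbs sampler sketch for a standard bivariate Gaussian, whose full conditionals are known in closed form; rho and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8     # correlation of a standard bivariate Gaussian (illustrative)

x, y, samples = 0.0, 0.0, []
for t in range(20000):
    # Alternately sample each coordinate from its full conditional:
    # x | y ~ N(rho * y, 1 - rho^2),  y | x ~ N(rho * x, 1 - rho^2)
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
    if t >= 2000:                        # discard the burn-in period
        samples.append((x, y))
samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])      # close to rho
```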
