Topic Models, LDA and all that

2011-03-23

Just make sure we are on the same page: "LDA" can mean Linear Discriminant Analysis (Fisher Linear Discriminant Analysis) or Latent Dirichlet Allocation. Here it means the latter.

Roadmap: Latent Dirichlet Allocation; LDA extensions --- MedLDA

Keywords: posterior approximation, sampling, variational methods, optimization

My afterthoughts: Approximation methods are useful! (Gibbs sampling, variational methods.) (Convex) optimization is useful! Math is almighty!

Overview: Beta, Gamma, Dirichlet

Probability Recap
- Chain rule (conditional independence)
- Bayes' rule
- Marginal distribution
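For reference, the three facts in standard notation (a minimal recap with generic symbols, not recovered from the slides):

\[
p(a, b, c) = p(a)\, p(b \mid a)\, p(c \mid a, b) \qquad \text{(chain rule)}
\]
\[
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \qquad \text{(Bayes' rule)}
\]
\[
p(x) = \int p(x, \theta)\, d\theta \qquad \text{(marginal distribution)}
\]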

Gamma and Beta functions
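The two functions in their usual definitions (standard facts, stated here for reference):

\[
\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt, \qquad \Gamma(n) = (n-1)! \ \text{ for integer } n \ge 1
\]
\[
B(a, b) = \int_0^1 t^{a-1} (1-t)^{b-1}\, dt = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}
\]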

Dirichlet Distribution

In probability and statistics, the Dirichlet distribution (after Johann Peter Gustav Lejeune Dirichlet), often denoted Dir(α), is a family of continuous multivariate probability distributions parametrized by a vector α of positive reals. It is the multivariate generalization of the beta distribution. Dirichlet distributions are very often used as prior distributions in Bayesian statistics; in fact, the Dirichlet distribution is the conjugate prior of the categorical and multinomial distributions. That is, its probability density function returns the belief that the probabilities of K rival events are x_i, given that each event has been observed α_i − 1 times.
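Concretely, the standard density on the simplex (x_k ≥ 0, Σ_k x_k = 1):

\[
\mathrm{Dir}(x_1, \dots, x_K; \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} x_k^{\alpha_k - 1},
\qquad B(\alpha) = \frac{\prod_k \Gamma(\alpha_k)}{\Gamma\!\big(\sum_k \alpha_k\big)}.
\]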

Multinomial Distribution
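For n trials over K categories with probabilities p_1, …, p_K, the standard pmf (for reference):

\[
p(n_1, \dots, n_K; n, p) = \frac{n!}{n_1! \cdots n_K!} \prod_{k=1}^{K} p_k^{n_k},
\qquad \sum_k n_k = n.
\]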

Conjugate distributions

Very Important!

In Bayesian probability theory, if the posterior distributions p(θ|x) are in the same family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood.

Let the likelihood function be considered fixed; the likelihood function is usually well-determined from a statement of the data-generating process. It is clear that different choices of the prior distribution p(θ) may make the integral more or less difficult to calculate, and the product p(x|θ) p(θ) may take one algebraic form or another. For certain choices of the prior, the posterior has the same algebraic form as the prior (generally with different parameter values). Such a choice is a conjugate prior.

A conjugate prior is an algebraic convenience, giving a closed-form expression for the posterior; otherwise a difficult numerical integration may be necessary. Further, conjugate priors may give intuition, by more transparently showing how a likelihood function updates a distribution.
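A worked one-liner of the idea (a standard example, not from the slides): a Beta(a, b) prior with a binomial likelihood. Observing s successes in n trials updates the prior to Beta(a + s, b + n − s), because

\[
p(\theta \mid x) \;\propto\; \underbrace{\theta^{s} (1-\theta)^{n-s}}_{\text{likelihood}} \;\cdot\; \underbrace{\theta^{a-1} (1-\theta)^{b-1}}_{\text{prior}}
\;=\; \theta^{(a+s)-1} (1-\theta)^{(b+n-s)-1}.
\]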


Conjugate distributions

David Barber: Bayesian Reasoning and Machine Learning

Bayesian Network

Bayesian Network, problems to solve: representation, inference, learning

Bayesian Network: where to learn more
- David Barber, Bayesian Reasoning and Machine Learning
- Daphne Koller and Nir Friedman, Probabilistic Graphical Models
- Bishop, Pattern Recognition and Machine Learning, Ch. 8
- Eric Xing, Probabilistic Graphical Models

Topic Model Overview: LDA, inference, MedLDA

Topic Model Overview
- LDA-based topic models
- Hyperspace Analogue to Language (HAL) (Lund and Burgess, 1996)
- Bound Encoding of the Aggregate Language Environment (BEAGLE) (Jones and Mewhort, 2007)

Researchers in Topic Model

D Blei

Andrew McCallum

Michael I. Jordan

Andrew Ng

John Lafferty

Eric Xing

Fei-Fei Li

Mark Steyvers

Researchers in Topic Model

Hanna Wallach

Yee Whye Teh

Jun Zhu

David Mimno

Why Latent?

Latent variables, hyperparameters, tractability (from Radford Neal's CSC2541, Bayesian Methods for Machine Learning)

Goal and Motivation of Topic Models: to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments.

Previous work: tf-idf (1983)
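One common form of the weighting (an assumption about which variant the slide used):

\[
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)},
\]

where tf(t, d) counts occurrences of term t in document d, df(t) is the number of documents containing t, and N is the collection size.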

Previous work: LSI (1990). Latent Semantic Indexing applies an SVD to the (term-by-document) matrix, typically with tf-idf weighting.
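A sketch of the decomposition (standard LSI; X is the term-by-document matrix and k the number of latent dimensions):

\[
X \;\approx\; U_k \Sigma_k V_k^\top,
\]

a rank-k truncated SVD; documents are then compared in the k-dimensional latent space rather than in raw term space.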

Previous work: pLSI (1999), a.k.a. the Aspect Model

The lineage: tf-idf (1983), LSI (1990), pLSI (1999), LDA (2003). LDA keeps the bag-of-words assumption.

LDA

LDA as a graphical model

Why it is so named

An analog: Modeling Shared Tastes in Online Communities (Laura Dietz, NIPS '09 workshop)

LDA procedure

LDA: to see a document
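The standard LDA generative process, for K topics, a vocabulary of size V, and documents d with N_d words (notation assumed, following Blei et al. 2003):

\begin{align*}
\phi_k &\sim \mathrm{Dir}(\beta) && k = 1, \dots, K \\
\theta_d &\sim \mathrm{Dir}(\alpha) && \text{for each document } d \\
z_{d,n} &\sim \mathrm{Mult}(\theta_d) && \text{for each word position } n = 1, \dots, N_d \\
w_{d,n} &\sim \mathrm{Mult}(\phi_{z_{d,n}})
\end{align*}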

Frequentist vs. Bayesian (from Jerry Zhu's CS 731, Advanced Artificial Intelligence)

LDA by Human and Computer

LDA: Five topics from a 50-topic LDA model fit to Science from 1980-2002

LDA: Personas. Demo: http://personas.media.mit.edu/personasWeb.html

LDA: where to learn more --- Surveys
- David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
- David M. Blei and John D. Lafferty. Topic models. Taylor and Francis, 2009.
- Ali Daud, Juanzi Li, Lizhu Zhou, and Faqir Muhammad. Knowledge discovery through directed probabilistic topic models: a survey. Frontiers of Computer Science in China, 4(2):280-301, January 2010.
- Mark Steyvers and Tom Griffiths. Probabilistic topic models. In Latent Semantic Analysis: A Road to Meaning. Laurence Erlbaum, July 2006.


Inference: getting the important parameters of LDA (variational inference, Gibbs sampling)

Inference Methods Overview

Inference Methods: comparison of the two major families
- Stochastic: MCMC, Metropolis-Hastings, Gibbs, etc.
- Deterministic: Mean Field, Belief Propagation, Variational Bayes, Expectation Propagation

Variational Inference

Variational inference turns inference into (convex) optimization. The basic idea of convexity-based variational inference is to make use of Jensen's inequality to obtain an adjustable lower bound on the log likelihood. Essentially, one considers a family of lower bounds, indexed by a set of variational parameters. The variational parameters are chosen by an optimization procedure that attempts to find the tightest possible lower bound.
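In symbols, for any distribution q over the latent variables z, Jensen's inequality gives the familiar lower bound (generic notation, not yet specific to LDA):

\[
\log p(x) = \log \int q(z)\, \frac{p(x, z)}{q(z)}\, dz
\;\ge\; \mathbb{E}_q[\log p(x, z)] - \mathbb{E}_q[\log q(z)] =: \mathcal{L}(q),
\]

and the gap is exactly KL(q(z) ‖ p(z | x)), so tightening the bound means matching the posterior.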

Mean field: choose a fully factorized q and minimize the KL divergence to the true posterior.

Mean field variational inference
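The mean-field choice, with the resulting coordinate-ascent update (standard form, stated for reference):

\[
q(z) = \prod_i q_i(z_i), \qquad
q_j^{*}(z_j) \;\propto\; \exp\!\left( \mathbb{E}_{q_{-j}}\!\left[ \log p(x, z) \right] \right),
\]

i.e. each factor is updated in turn while the others are held fixed, monotonically decreasing KL(q ‖ p(· | x)).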

Variational inference in LDA: Overview

Variational Inference in LDA: Beautiful math (via Jensen's inequality)
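For LDA specifically, with a factorized q(θ, z | γ, φ) = q(θ | γ) ∏_n q(z_n | φ_n), the two coordinate updates from Blei et al. (2003) are (Ψ is the digamma function; per-document notation assumed):

\[
\phi_{n,k} \;\propto\; \beta_{k, w_n} \exp\!\left( \Psi(\gamma_k) - \Psi\!\Big(\sum_j \gamma_j\Big) \right),
\qquad
\gamma_k = \alpha_k + \sum_{n} \phi_{n,k},
\]

iterated to convergence within each document.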

Variational Inference: Review

Variational Inference: where to learn more
- Martin Wainwright. Graphical models and variational methods: Message-passing, convex relaxations, and all that. ICML 2008 Tutorial.
- M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, Vol. 1, Numbers 1-2, pp. 1-305, December 2008.

MCMC in LDA: Overview
- Sampling in general: why sampling is necessary and why it is hard
- Importance sampling, rejection sampling
- Markov Chain Monte Carlo: Metropolis-Hastings, Gibbs sampling
- Collapsed Gibbs in LDA

MCMC Overview: pioneers of sampling

Nicholas C. Metropolis

Josiah W. Gibbs

Andrey Markov

Sampling Example: population statistics (the 2005 national 1% population sample survey, results released 2006-03-16)

Sampling: why is it so damn hard?

Sampling: Rejection sampling

Accept / Reject
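A minimal runnable sketch of the accept/reject loop (my own illustration, not the speaker's code; the target Beta(2, 5), the Uniform(0, 1) proposal, and the envelope constant M are arbitrary choices):

    # Rejection sampling sketch: accept x ~ q with probability p(x) / (M * q(x)),
    # where p(x) <= M * q(x) everywhere. Illustrative target p = Beta(2, 5),
    # proposal q = Uniform(0, 1), so q(x) = 1 on [0, 1].
    import numpy as np
    from scipy.stats import beta

    def rejection_sample(n_samples, M=2.5, seed=0):
        rng = np.random.default_rng(seed)
        samples = []
        while len(samples) < n_samples:
            x = rng.uniform()                   # propose x ~ Uniform(0, 1)
            u = rng.uniform()                   # acceptance coin-flip
            if u <= beta.pdf(x, 2, 5) / M:      # accept w.p. p(x) / (M * q(x))
                samples.append(x)               # keep accepted draws
            # rejected draws are thrown away entirely
        return np.array(samples)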

Sampling: Importance sampling

In rejection sampling, throwing away an x seems a waste, and the rejection is the only thing we learn about the original distribution; importance sampling instead keeps every draw and reweights it.
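A minimal importance-sampling sketch (again my own illustration, using the same Beta(2, 5) target): every draw is kept and weighted by w = p(x)/q(x), here to estimate E_p[x]:

    # Importance sampling sketch: self-normalized estimate of E_p[x]
    # using draws from q = Uniform(0, 1) (density 1 on [0, 1]).
    import numpy as np
    from scipy.stats import beta

    rng = np.random.default_rng(0)
    x = rng.uniform(size=100_000)         # draws from the proposal q
    w = beta.pdf(x, 2, 5)                 # importance weights p(x) / q(x)
    estimate = np.sum(w * x) / np.sum(w)  # self-normalized estimate of E_p[x]
    print(estimate)                       # close to 2 / (2 + 5) = 0.2857...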

Sampling: Metropolis-Hastings Method

Construct a Markov chain whose distribution converges, as t grows, to the target distribution.
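A minimal sketch of one Metropolis-Hastings chain (illustrative assumptions: a 1-D standard-normal target known only up to a constant, and a symmetric random-walk proposal, so the Hastings correction reduces to a ratio of target densities):

    import numpy as np

    def log_target(x):
        return -0.5 * x * x                # log N(0, 1) up to an additive constant

    def metropolis_hastings(n_steps, step=1.0, seed=0):
        rng = np.random.default_rng(seed)
        x, chain = 0.0, np.empty(n_steps)
        for t in range(n_steps):
            proposal = x + rng.normal(0.0, step)          # symmetric proposal
            log_alpha = log_target(proposal) - log_target(x)
            if np.log(rng.uniform()) < log_alpha:         # accept w.p. min(1, ratio)
                x = proposal
            chain[t] = x                                  # on rejection, repeat x
        return chain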

Sampling: Gibbs Sampling. Sample coordinates 1, 2, ..., i, ... in turn, each from its full conditional given the current values of all the others.
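The scheme written out (standard notation, for reference):

\begin{align*}
z_1^{(t+1)} &\sim p\big(z_1 \mid z_2^{(t)}, \dots, z_n^{(t)}\big) \\
z_2^{(t+1)} &\sim p\big(z_2 \mid z_1^{(t+1)}, z_3^{(t)}, \dots, z_n^{(t)}\big) \\
&\;\;\vdots \\
z_n^{(t+1)} &\sim p\big(z_n \mid z_1^{(t+1)}, \dots, z_{n-1}^{(t+1)}\big)
\end{align*}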

Gibbs Sampling in LDA: Joint distribution

Collapsed: integrate θ and φ out analytically (possible thanks to Dirichlet-multinomial conjugacy) and sample only the topic assignments z.

Gibbs Sampling in LDA: Marginal distribution
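The resulting per-token conditional (the standard collapsed-Gibbs update, as in Griffiths and Steyvers; the count notation is assumed, with superscript −i meaning token i is excluded):

\[
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\big(n_{d,k}^{-i} + \alpha\big) \cdot
\frac{n_{k, w_i}^{-i} + \beta}{n_k^{-i} + V\beta},
\]

where n_{d,k} counts tokens of topic k in document d, n_{k,w} counts assignments of word w to topic k, n_k is the total count for topic k, and V is the vocabulary size.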

Sampling: Gibbs Sampling code in Python

    # Random initialization of topic assignments and count matrices
    # (the counts are what the collapsed Gibbs updates later use).
    for m in xrange(n_docs):
        for i, w in enumerate(word_indices(matrix[m, :])):
            z = np.random.randint(self.n_topics)  # assign a random topic
            self.nmz[m, z] += 1    # count of topic z in document m
            self.nm[m] += 1        # number of words in document m
            self.nzw[z, w] += 1    # count of word w assigned to topic z
            self.nz[z] += 1        # total words assigned to topic z
            self.topics[(m, i)] = z

Really simple!

Gibbs Sampling: where to learn more
- D. J. C. MacKay. Information theory, inference, and learning algorithms. Cambridge University Press, 2003.
- Gregor Heinrich. Parameter estimation for text analysis. Technical Report, 2009.
- Michael I. Jordan and Yair Weiss. Graphical models: Probabilistic inference.
- Christophe Andrieu, N. de Freitas, A. Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5-43, 2003.
- Yi Wang. Distributed Gibbs Sampling of Latent Dirichlet Allocation: The Gritty Details. Technical Report, 2007.

Evolution of Topic Models: correlated topic models, dynamic topic models, temporal topic models

Correlated + Dynamic Topic Models
- David M. Blei and John D. Lafferty. Correlated Topic Models. In Advances in Neural Information Processing Systems 18, 2006.
- David M. Blei and John D. Lafferty. A correlated topic model of Science. The Annals of Applied Statistics, 1(1):17-35, 2007.
- David M. Blei and John D. Lafferty. Dynamic topic models. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pages 113-120, 2006.

Correlated + Dynamic Topic Models

Correlated Topic Models

Dynamic Topic Models

DTM

Dynamic Topic Models: top-10 words and example articles from Science

Topics over Time

Published in: KDD '06 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Topics over Time

Topics over Time: topic discovery on State-of-the-Union Addresses (TOT vs. LDA)

Topics over Time: topic evolution on NIPS data

Topics over Time: co-occurring topics on NIPS data

Topics over Time: review

LDA @ 2010

Work by Hanna Wallach

NIPS 09

ICML 09

Rethinking LDA: Why Priors Matter

Supervised LDA
- David Blei and Jon D. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems, pages 1-22, 2008.
- Daniel Ramage, David Hall, Ramesh Nallapati, and C. D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, pages 248-256. Association for Computational Linguistics, 2009.
- Jun Zhu, Amr Ahmed, and Eric Xing. MedLDA: Maximum Margin Supervised Topic Models. Journal of Machine Learning Research, 1:1-48, 2010.

Supervised LDA

Naïve

Supervised topic models (sLDA)

The response is modeled with a Generalized Linear Model (GLM).

Supervised LDA: why use a GLM?

A GLM accommodates many response types (e.g., Gamma, Weibull).
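In sLDA the per-document response enters through the empirical topic proportions (the standard formulation from Blei and McAuliffe, with g the GLM link function; notation assumed):

\[
\bar{z}_d = \frac{1}{N_d} \sum_{n=1}^{N_d} z_{d,n},
\qquad
\mathbb{E}\big[y_d \mid \bar{z}_d\big] = g^{-1}\big(\eta^\top \bar{z}_d\big),
\]

so swapping the response family (Gaussian, Poisson, etc.) only changes the GLM, not the topic machinery.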

Semi-Supervised LDA

Unlabeled + labeled data

MedLDA: Maximum Entropy Discrimination Latent Dirichlet Allocation

MedTM: Maximum Entropy Discrimination Topic Models

MedLDA vs. LDA: topic discovery (2D embedding on 20 Newsgroups data)

MedLDA: classification on 20 Newsgroups data

NIPS 09 Workshop on Applications for Topic Models: Text and Beyond (@RT)

Reconstructing Pompeian Households

Is it so?

Software Analysis with Unsupervised Topic Models
- 12,151 Java projects from SourceForge and Apache; 4,632 projects analyzed
- 366,287 source files, 38.7 million lines of code
- written by 9,250 developers

Future directions: LDA vs. k-means; MedLDA extensions; LDA with an asymmetric Dirichlet prior; data augmentation; further LDA variants.

Topics not covered: Hierarchical Topic Models, nested Chinese Restaurant Process, Indian Buffet Process

Topic Models Background: General
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
- D. Blei and M. Jordan. Variational inference for Dirichlet process mixtures. Journal of Bayesian Analysis, 1:121-144, 2006.
- M. Steyvers and T. Griffiths. Probabilistic Topic Models. In Latent Semantic Analysis: A Road to Meaning, T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, eds. Laurence Erlbaum, 2006.
- Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566-1581, 2006.
- J. Zhu, A. Ahmed, and E. P. Xing. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification. The 26th International Conference on Machine Learning, 2009.

Topic Models Background: Inference and Evaluation
- A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On Smoothing and Inference for Topic Models. In Uncertainty in Artificial Intelligence, 2009.
- W. Li and A. McCallum. Pachinko Allocation: DAG-structured Mixture Models of Topic Correlations. In International Conference on Machine Learning, 2006.
- I. Porteous, A. Asuncion, D. Newman, A. Ihler, P. Smyth, and M. Welling. Fast Collapsed Gibbs Sampling for Latent Dirichlet Allocation. In Knowledge Discovery and Data Mining, 2008.
- H. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. Evaluation Methods for Topic Models. In International Conference on Machine Learning, 2009.
- M. Welling, Y. Teh, and B. Kappen. Hybrid Variational/Gibbs Inference in Topic Models. In Uncertainty in Artificial Intelligence, 2008.

Topic Models Background: Biology
- E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed-membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981-2014, 2008.
- P. Agius, Y. Ying, and C. Campbell. Bayesian Unsupervised Learning with Multiple Data Types. Statistical Applications in Genetics and Molecular Biology, 3(1):27, 2009.
- P. Flaherty, G. Giaever, J. Kumm, Michael I. Jordan, and Adam P. Arkin. A Latent Variable Model for Chemogenomic Profiling. Bioinformatics, 21(15):3286-93, August 2005.
- S. Shringarpure and E. P. Xing. mStruct: Inference of Population Structure in Light of Both Genetic Admixing and Allele Mutations. Genetics, 182(2), 2009.

Topic Models Background: Natural Language Processing
- J. Boyd-Graber and D. Blei. Syntactic topic models. In Neural Information Processing Systems, 2009.
- T. G...
