Konstantin Vorontsov - Additive regularization of matrix decompositions and probabilistic topic modeling

  • Published on
    07-Jul-2015
Transcript

  • 1. Additive Regularization of Matrix Factorization for Probabilistic Topic Modeling. Konstantin Vorontsov (Yandex, CC RAS, MIPT, HSE, MSU), voron@forecsys.ru. Analysis of Images, Social Networks and Texts (AIST), Ekaterinburg, 10–12 April 2014. 1 / 51

2. Outline: 1) topic modeling, PLSA and LDA; 2) additive regularization of topic models; 3) experiments with combinations of regularizers.

3. Probabilistic topic model of a text collection. W is the vocabulary of terms, D is the collection of documents, n_dw is the number of occurrences of term w ∈ W in document d ∈ D. Each topic t is described by a distribution p(w|t) over terms; each document d has a distribution p(t|d) over topics. The model explains the observed term frequencies p(w|d).

4. Applications of topic modeling: information retrieval, expert search, and other text-mining tasks.

5. Preprocessing: every document is reduced to a bag of words d = {w1, …, w_nd}.

6.–7. [Slides lost in transcription.]

8. Each document d is a probabilistic mixture of topics: p(w|d) = Σ_{t∈T} p(w|t) p(t|d). [Illustration: sample topic-word and document-topic probabilities such as 0.018, 0.013, 0.011, …]

9. Generative model. The collection is a sample (d_i, w_i, t_i), i = 1, …, n, drawn from a distribution p(d, w, t). Conditional independence hypothesis: p(w|d, t) = p(w|t). Then p(w|d) = Σ_{t∈T} φ_wt θ_td, where φ_wt = p(w|t), t ∈ T, and θ_td = p(t|d), d ∈ D, are the model parameters, estimated from the empirical frequencies p̂(w|d) = n_dw / n_d.

10. Log-likelihood maximization:
L(Φ, Θ) = Σ_{d∈D} Σ_{w∈d} n_dw ln Σ_{t∈T} φ_wt θ_td → max over Φ, Θ,
subject to φ_wt ≥ 0, Σ_{w∈W} φ_wt = 1; θ_td ≥ 0, Σ_{t∈T} θ_td = 1.
Equivalently, a stochastic matrix factorization F ≈ ΦΘ, where F = (p̂(w|d))_{W×D}, Φ = (φ_wt)_{W×T} with φ_wt = p(w|t), and Θ = (θ_td)_{T×D} with θ_td = p(t|d).

11. Probabilistic Latent Semantic Analysis [Hofmann, 1999]. EM-algorithm:
E-step: n_dwt = n_dw φ_wt θ_td / Σ_{s∈T} φ_ws θ_sd;
M-step: φ_wt = n_wt / n_t, n_wt = Σ_{d∈D} n_dwt, n_t = Σ_{w∈W} n_wt; θ_td = n_td / n_d, n_td = Σ_{w∈d} n_dwt, n_d = Σ_{t∈T} n_td.
The EM-algorithm alternates the E-step and the M-step until convergence.

12. Interpretation of the EM-algorithm. The E-step is the Bayes formula p(t|d, w) = φ_wt θ_td / Σ_{s∈T} φ_ws θ_sd, n_dwt = n_dw p(t|d, w) for each (d, w, t). The M-step is frequency estimation: φ_wt = n_wt / n_t = Σ_{d∈D} n_dwt / Σ_{d∈D} Σ_{w∈d} n_dwt; θ_td = n_td / n_d = Σ_{w∈d} n_dwt / Σ_{w∈W} Σ_{t∈T} n_dwt. In short: p(t|d, w) ∝ φ_wt θ_td; φ_wt ∝ n_wt; θ_td ∝ n_td.

13. Rational EM-algorithm (the E-step is embedded into the M-step):
Input: collection D, number of topics |T|, number of iterations i_max; output: Φ, Θ.
1: initialize φ_wt, θ_td for all d ∈ D, w ∈ W, t ∈ T;
2: for i = 1, …, i_max:
3:   n_wt, n_td, n_t, n_d := 0 for all d ∈ D, w ∈ W, t ∈ T;
4:   for all d ∈ D, w ∈ d:
5:     p(t|d, w) = φ_wt θ_td / Σ_s φ_ws θ_sd for all t ∈ T;
6:     n_wt, n_td, n_t, n_d += n_dw p(t|d, w) for all t ∈ T;
7:   φ_wt := n_wt / n_t for all w ∈ W, t ∈ T;
8:   θ_td := n_td / n_d for all d ∈ D, t ∈ T.

14. Online EM-algorithm (for large collections the data is processed by batches D_j):
1: initialize φ_wt for all w ∈ W, t ∈ T;
2: n_wt := 0, n_t := 0 for all w ∈ W, t ∈ T;
3: for batches D_j, j = 1, …, J:
4:   ñ_wt := 0, ñ_t := 0 for all w ∈ W, t ∈ T;
5:   for all d ∈ D_j:
6:     initialize θ_td for all t ∈ T;
7:     repeat
8:       p(t|d, w) = φ_wt θ_td / Σ_s φ_ws θ_sd for all w ∈ d, t ∈ T;
9:       θ_td := (1/n_d) Σ_{w∈d} n_dw p(t|d, w) for all t ∈ T;
10:     until θ_d converges;
11:     ñ_wt, ñ_t += n_dw p(t|d, w) for all w ∈ d, t ∈ T;
12:   n_wt := ρ_j n_wt + ñ_wt; n_t := ρ_j n_t + ñ_t for all w ∈ W, t ∈ T;
13:   φ_wt := n_wt / n_t for all w ∈ W, t ∈ T.

15. Latent Dirichlet Allocation [Blei, Ng, Jordan, 2003]: Dirichlet priors on the columns φ_t = p(w|t) and θ_d = p(t|d) turn the M-step estimates
PLSA: φ_wt = n_wt / n_t, θ_td = n_td / n_d
into the smoothed estimates
LDA: φ_wt = (n_wt + β_w) / (n_t + β_0), θ_td = (n_td + α_t) / (n_d + α_0).
Asuncion A., Welling M., Smyth P., Teh Y. W. On smoothing and inference for topic models // Int'l conf. on Uncertainty in Artificial Intelligence, 2009.
Vorontsov K. V., Potapenko A. A. (on EM-like algorithms for probabilistic topic modeling, in Russian) // 2013. Vol. 1, no. 6. Pp. 657–686.

16. Topic models of time-varying corpora: Jianwen Zhang, Yangqiu Song, Changshui Zhang, Shixia Liu. Evolutionary Hierarchical Dirichlet Processes for Multiple Correlated Time-varying Corpora // KDD'10, July 25–28, 2010.

17. Topic evolution visualization: Weiwei Cui, Shixia Liu, Li Tan, Conglei Shi, Yangqiu Song, Zekai J. Gao, Xin Tong, Huamin Qu. TextFlow: Towards Better Understanding of Evolving Topics in Text // IEEE Transactions on Visualization and Computer Graphics, Vol. 17, No. 12, December 2011.

18.–19. N-gram topic models: Shoaib Jameel, Wai Lam. An N-Gram Topic Model for Time-Stamped Documents // 35th ECIR 2013, Moscow, March 24–27. Pp. 292–304.

20. Citation influence: Laura Dietz, Steffen Bickel, Tobias Scheffer. Unsupervised prediction of citation influences // ICML 2007. Pp. 233–240.

21. Correlated topics: D. Blei, J. Lafferty. A correlated topic model of Science // Annals of Applied Statistics, 2007. Vol. 1. Pp. 17–35.

22. Multilingual topic models: I. Vulic, W. De Smet, J. Tang, M.-F. Moens. Probabilistic topic modeling in multilingual settings: a short overview of its methodology with applications // NIPS workshop, 7–8 December 2012. Pp. 1–11.

23. Visualization: A. Chaney, D. Blei. Visualizing topic models // International AAAI Conference on Social Media and Weblogs, 2012.

24. Jason Chuang, Christopher D. Manning, Jeffrey Heer. Termite: Visualization Techniques for Assessing Textual Topic Models // Advanced Visual Interfaces, 2012.

25. Survey: Ali Daud, Juanzi Li, Lizhu Zhou, Faqir Muhammad. Knowledge discovery through directed probabilistic topic models: a survey // Frontiers of Computer Science in China, Vol. 4, No. 2, 2010. Pp. 280–301. See also www.MachineLearning.ru and the Topic Modeling Bibliography: http://mimno.infosci.cornell.edu/topics.html

26. The factorization is not unique: ΦΘ = (ΦS)(S⁻¹Θ) = Φ′Θ′ for any nonsingular S that keeps Φ′ and Θ′ stochastic, so the problem is ill-posed and regularization is needed. [Plots: restoration errors of PLSA and LDA on semi-synthetic data generated with Dirichlet parameters α = β = 0.01 and α = β = 0.1.]

27. Additive Regularization of Topic Models (ARTM). Add n regularization criteria R_i(Φ, Θ), i = 1, …, n, to the log-likelihood:
Σ_{d∈D} Σ_{w∈d} n_dw ln Σ_{t∈T} φ_wt θ_td + Σ_{i=1}^n τ_i R_i(Φ, Θ) → max over Φ, Θ,
subject to φ_wt ≥ 0, Σ_{w∈W} φ_wt = 1; θ_td ≥ 0, Σ_{t∈T} θ_td = 1, where τ_i > 0 are regularization coefficients.

28. Regularized EM-algorithm. E-step as in PLSA: n_dwt = n_dw φ_wt θ_td / Σ_{s∈T} φ_ws θ_sd. M-step:
φ_wt = n_wt / n_t, n_wt = (Σ_{d∈D} n_dwt + φ_wt ∂R/∂φ_wt)_+, n_t = Σ_{w∈W} n_wt;
θ_td = n_td / n_d, n_td = (Σ_{w∈d} n_dwt + θ_td ∂R/∂θ_td)_+, n_d = Σ_{t∈T} n_td.
For R(Φ, Θ) = 0 this is exactly the EM-algorithm of PLSA.

29. Kullback–Leibler divergence of distributions P = (p_i)_{i=1}^n and Q = (q_i)_{i=1}^n: KL(P‖Q) = Σ_{i=1}^n p_i ln (p_i / q_i). Properties: 1) KL(P‖Q) ≥ 0, and KL(P‖Q) = 0 iff P = Q; 2) KL minimization is likelihood maximization: KL(P‖Q(α)) = Σ_i p_i ln (p_i / q_i(α)) → min over α is equivalent to Σ_i p_i ln q_i(α) → max over α; 3) KL is asymmetric: KL(P‖Q) can be much smaller than KL(Q‖P). [Plots: example density pairs with KL(P‖Q) = 0.442 vs KL(Q‖P) = 2.966; KL(P‖Q) = 0.444 vs KL(Q‖P) = 0.444; KL(P‖Q) = 2.969 vs KL(Q‖P) = 2.969.]

30. Regularizer 1: smoothing (recovers LDA). Pull the columns φ_t towards a fixed distribution β = (β_w) and θ_d towards α = (α_t): Σ_{t∈T} KL_w(β_w ‖ φ_wt) → min; Σ_{d∈D} KL_t(α_t ‖ θ_td) → min. Equivalently,
R(Φ, Θ) = β_0 Σ_{t∈T} Σ_{w∈W} β_w ln φ_wt + α_0 Σ_{d∈D} Σ_{t∈T} α_t ln θ_td → max,
and the regularized M-step gives the LDA-like estimates φ_wt ∝ n_wt + β_0 β_w, θ_td ∝ n_td + α_0 α_t.
D. Blei, A. Ng, M. Jordan. Latent Dirichlet allocation // Journal of Machine Learning Research, 2003. Vol. 3. Pp. 993–1022.

31. Regularizer 2: semi-supervised smoothing (partial labeling). Given topic sets T_d for documents d ∈ D_0 and word sets W_t for topics t ∈ T_0, with fixed distributions β⁰_wt on W_t and α⁰_td on T_d:
R(Φ, Θ) = β_0 Σ_{t∈T_0} Σ_{w∈W_t} β⁰_wt ln φ_wt + α_0 Σ_{d∈D_0} Σ_{t∈T_d} α⁰_td ln θ_td → max,
giving the LDA-like estimates φ_wt ∝ n_wt + β_0 β⁰_wt, θ_td ∝ n_td + α_0 α⁰_td.
Nigam K., McCallum A., Thrun S., Mitchell T. Text classification from labeled and unlabeled documents using EM // Machine Learning, 2000.

32. Regularizer 2 with an arbitrary smoothing function μ in place of ln:
R(Φ, Θ) = β_0 Σ_{t∈T_0} Σ_{w∈W_t} β⁰_wt μ(φ_wt) + α_0 Σ_{d∈D_0} Σ_{t∈T_d} α⁰_td μ(θ_td) → max,
giving φ_wt ∝ n_wt + β_0 β⁰_wt φ_wt μ′(φ_wt), θ_td ∝ n_td + α_0 α⁰_td θ_td μ′(θ_td). For μ(z) = z the Θ-part maximizes cov(α⁰_d, θ_d).

33. Regularizer 3: sparsing (anti-LDA). Most of the values φ_wt, θ_td should be zeros; push φ_t and θ_d away from the fixed distributions β_w, α_t (e.g. uniform):
R(Φ, Θ) = −β_0 Σ_{t∈T} Σ_{w∈W} β_w ln φ_wt − α_0 Σ_{d∈D} Σ_{t∈T} α_t ln θ_td → max,
giving the anti-LDA estimates φ_wt ∝ (n_wt − β_0 β_w)_+, θ_td ∝ (n_td − α_0 α_t)_+.
Varadarajan J., Emonet R., Odobez J.-M. A sparsity constraint for topic models: application to temporal activity mining // NIPS-2010 Workshop on Practical Applications of Sparse Modeling: Open Issues and New Directions.

34. Regularizer 4: eliminating insignificant topics. Sparse the global topic distribution p(t) = Σ_d p(d) θ_td by pushing it away from the uniform distribution:
R(Θ) = −τ Σ_{t∈T} ln Σ_{d∈D} p(d) θ_td → max,
giving θ_td ∝ (n_td − τ (n_d / n_t) θ_td)_+. Topics t with small n_t = Σ_d Σ_w n_dwt are gradually driven out of the model.

35. Regularizer 5: topic decorrelation. Make the topics as different as possible by minimizing the covariances of the columns φ_t:
R(Φ) = −(τ/2) Σ_{t∈T} Σ_{s∈T\t} Σ_{w∈W} φ_wt φ_ws → max,
giving φ_wt ∝ (n_wt − τ φ_wt Σ_{s∈T\t} φ_ws)_+.
Tan Y., Ou Z. Topic-weak-correlated latent Dirichlet allocation // 7th Int'l Symp. Chinese Spoken Language Processing (ISCSLP), 2010. Pp. 224–228.

36. Regularizer 6: word coherence. Let C_uw be co-occurrence counts of word pairs u, w ∈ W and p(w|u) = N_uw / N_u. An alternative estimate of p(w|t) is p̂(w|t) = Σ_u p(w|u) p(u|t) = (1/n_t) Σ_u C_uw n_ut. Pull φ_wt towards it:
R(Φ) = τ Σ_{t∈T} n_t Σ_{w∈W} p̂(w|t) ln φ_wt → max,
giving φ_wt ∝ n_wt + τ Σ_{u∈W\w} C_uw n_ut.
Mimno D., Wallach H. M., Talley E., Leenders M., McCallum A. Optimizing semantic coherence in topic models // Empirical Methods in Natural Language Processing, EMNLP-2011. Pp. 262–272.

37. Regularizer 7: citations and links. Let n_dc be the number of links from document d to document c; make the topic profiles of linked documents similar:
R(Θ) = τ Σ_{d,c∈D} n_dc cov(θ_d, θ_c) → max, cov(θ_d, θ_c) = Σ_{t∈T} θ_td θ_tc,
giving θ_td ∝ n_td + τ θ_td Σ_{c∈D} n_dc θ_tc.
Dietz L., Bickel S., Scheffer T. Unsupervised prediction of citation influences // ICML 2007. Pp. 233–240.

38. Regularizer 8: document classification. Given classes C (authors, time stamps, conferences, journals, …), model the class distribution of a document as p(c|d) = Σ_{t∈T} p(c|t) p(t|d) = Σ_{t∈T} ψ_ct θ_td and fit it to the observed class frequencies m_dc:
R(Ψ, Θ) = τ Σ_{d∈D} Σ_{c∈C} m_dc ln Σ_{t∈T} ψ_ct θ_td → max.
Rubin T. N., Chambers A., Smyth P., Steyvers M. Statistical topic models for multi-label document classification // Machine Learning, 2012.

39. Regularizer 8: EM-algorithm with the third stochastic matrix Ψ = (ψ_ct).
E-step: p(t|d, w) = φ_wt θ_td / Σ_{s∈T} φ_ws θ_sd; p(t|d, c) = ψ_ct θ_td / Σ_{s∈T} ψ_cs θ_sd.
M-step: φ_wt ∝ n_wt, n_wt = Σ_{d∈D} n_dw p(t|d, w); θ_td ∝ n_td + m_td, n_td = Σ_{w∈W} n_dw p(t|d, w), m_td = Σ_{c∈C} m_dc p(t|d, c); ψ_ct ∝ m_ct, m_ct = Σ_{d∈D} m_dc p(t|d, c).

40. Regularizer 9: variants of the classification regularizer. If only the class sets C_d are known, take m_dc = n_d (1/|C_d|)[c ∈ C_d]. The linear variant
R(Ψ, Θ) = τ Σ_{d∈D} Σ_{c∈C} m_dc Σ_{t∈T} ψ_ct θ_td → max
leads to the hard assignment ψ_ct = [c = c(t)], where c(t) = arg max over c ∈ C of Σ_{d∈D} m_dc θ_td, i.e. each topic t is attached to a single class c.

41. Regularizer 10: label regularization. Let Y be a label set, y(d) the label of document d, and D_y ⊆ D the documents labeled y ∈ Y. Variant 1: sparse p(t|y) = Σ_{d∈D_y} θ_td p(d):
R_1(Θ) = −τ_1 Σ_{y∈Y} Σ_{t∈T} ln p(t|y) → max,
giving θ_td ∝ (n_td − τ_1 θ_td p(d) / p(t|y(d)))_+. Variant 2: make p(t|y) close for adjacent labels:
R_2(Θ) = −τ_2 Σ_{y∈Y} Σ_{t∈T} |p(t|y) − p(t|y−1)| → max.

42. Any subset of such regularizers can be combined additively ;)

43. Choosing the vector of regularization coefficients τ = (τ_i)_{i=1}^n: regularization-path techniques, as for L1- and L2-regularization (Elastic Net), and strategies such as starting from unregularized PLSA.

44. Part 3: experiments on combining sparsing, smoothing and decorrelation of the W×T matrix Φ and the T×D matrix Θ.

45. Experimental setup: combinations of regularizers 1, 3, 4 and 5; collection: |D| = 1566 NIPS (Neural Information Processing Systems) papers, n ≈ 2.3·10⁶, |W| ≈ 1.3·10⁴; held-out test set: |D| = 174.

46. Quality measures: hold-out perplexity P = exp(−L/n); topic coherence [Newman, 2010]; topic kernel W_t = {w : p(t|w) > 0.25} with kernel size |W_t|, purity Σ_{w∈W_t} p(w|t) and contrast (1/|W_t|) Σ_{w∈W_t} p(t|w).
Newman D., Lau J. H., Grieser K., Baldwin T. Automatic evaluation of topic coherence // Human Language Technologies, HLT-2010. Pp. 100–108.

47.–48. [Plots: perplexity (about 2000–3600), sparsity of Φ and Θ, kernel size, purity and contrast over EM iterations for PLSA vs. combined ARTM regularization.]

49. Conclusions: combined regularization makes Φ and Θ highly sparse (about 98% zeros) with almost no loss of perplexity; typical kernel sizes are 50–150 words.

50. Ongoing work: parallel and distributed implementations, the open-source project BigARTM.

51. Contacts and references: voron@yandex-team.ru; www.MachineLearning.ru (user Vokov).
Vorontsov K. V. (on additive regularization of topic models, in Russian) // 2014, no. 3.
Vorontsov K. V., Potapenko A. A. Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization // AIST'14. Springer. 2014.
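The rational EM-algorithm of slides 11–13 can be sketched in a few lines of NumPy. This is my own vectorized illustration, not the author's code; the name `plsa_em` and its defaults are assumptions:

```python
import numpy as np

def plsa_em(ndw, T, n_iter=100, seed=0):
    """Rational EM for PLSA (slides 11-13). ndw: |W| x |D| matrix of counts n_dw.
    Returns Phi (|W| x T, columns p(w|t)) and Theta (T x |D|, columns p(t|d))."""
    rng = np.random.default_rng(seed)
    W, D = ndw.shape
    phi = rng.random((W, T)); phi /= phi.sum(axis=0)        # p(w|t)
    theta = rng.random((T, D)); theta /= theta.sum(axis=0)  # p(t|d)
    for _ in range(n_iter):
        Z = np.maximum(phi @ theta, 1e-30)   # p(w|d), guarded against zeros
        A = ndw / Z                          # n_dw / p(w|d)
        nwt = phi * (A @ theta.T)            # n_wt = sum_d n_dw * p(t|d,w)
        ntd = theta * (phi.T @ A)            # n_td = sum_w n_dw * p(t|d,w)
        phi = nwt / np.maximum(nwt.sum(axis=0), 1e-30)      # M-step: phi_wt = n_wt / n_t
        theta = ntd / np.maximum(ntd.sum(axis=0), 1e-30)    # M-step: theta_td = n_td / n_d
    return phi, theta
```

As in the slide-13 pseudocode, the E-step quantities n_dwt are never materialized: each iteration accumulates the counters n_wt and n_td directly.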
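Slide 14's online EM can be prototyped similarly. This sketch is my simplification: the batch interface, the fixed `n_inner` iteration count for θ_d, and the constant decay `rho` standing in for ρ_j are all assumptions:

```python
import numpy as np

def online_em(batches, phi, n_inner=10, rho=0.9):
    """Online EM sketch (slide 14): phi is |W| x |T|; each batch is a |W| x |D_j|
    count matrix. theta_d is re-estimated per document; only (n_wt, n_t) persist."""
    W, T = phi.shape
    nwt = np.zeros((W, T))
    for ndw in batches:
        nwt_batch = np.zeros((W, T))
        for d in range(ndw.shape[1]):
            nd = ndw[:, d]                            # word counts of one document
            theta = np.full(T, 1.0 / T)
            for _ in range(n_inner):                  # inner iterations for theta_d
                Z = np.maximum(phi @ theta, 1e-30)
                ptdw = phi * theta[None, :] / Z[:, None]      # p(t|d,w), |W| x |T|
                theta = (nd[:, None] * ptdw).sum(axis=0) / max(nd.sum(), 1e-30)
            nwt_batch += nd[:, None] * ptdw           # accumulate batch counters
        nwt = rho * nwt + nwt_batch                   # decayed counter merge
        phi = nwt / np.maximum(nwt.sum(axis=0), 1e-30)
    return phi
```

Memory stays constant in the collection size: θ_d is discarded after each document, mirroring steps 5–11 of the slide.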
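The regularized M-step of slide 28 reduces every regularizer to one additive term. A hedged NumPy sketch (the helper name `artm_phi_update` is mine) showing how smoothing (slide 30), sparsing (slide 33) and decorrelation (slide 35) plug into φ_wt ∝ (n_wt + φ_wt ∂R/∂φ_wt)_+:

```python
import numpy as np

def artm_phi_update(nwt, r_term):
    """Generic ARTM M-step for Phi (slide 28):
    phi_wt ∝ (n_wt + phi_wt * dR/dphi_wt)_+ ; r_term holds phi_wt * dR/dphi_wt."""
    x = np.maximum(nwt + r_term, 0.0)
    s = x.sum(axis=0)
    s[s == 0] = 1.0                       # a fully zeroed topic stays a zero column
    return x / s

nwt = np.array([[8.0, 1.0],
                [3.0, 6.0],
                [0.5, 0.5]])              # toy |W| = 3 x |T| = 2 counters

# Smoothing (slide 30): phi_wt * dR/dphi_wt = beta0 * beta_w -> the LDA update.
lda_like = artm_phi_update(nwt, 0.5 * np.ones_like(nwt))

# Sparsing (slide 33): phi_wt * dR/dphi_wt = -beta0 * beta_w -> small counts vanish.
sparsed = artm_phi_update(nwt, -1.0 * np.ones_like(nwt))

# Decorrelation (slide 35): phi_wt * dR/dphi_wt = -tau * phi_wt * sum_{s != t} phi_ws.
phi = nwt / nwt.sum(axis=0)
decorr = artm_phi_update(nwt, -0.3 * phi * (phi.sum(axis=1, keepdims=True) - phi))
```

The point of the sketch is that swapping regularizers changes only `r_term`; the update rule itself never changes.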
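The KL-divergence properties of slide 29, in particular the asymmetry, are easy to verify numerically; a minimal sketch (the helper name `kl` is mine):

```python
import numpy as np

def kl(p, q):
    """KL(P||Q) = sum_i p_i ln(p_i / q_i), with the convention 0 * ln(0/q) = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

p = np.array([0.7, 0.2, 0.1])     # a peaked distribution
q = np.array([1/3, 1/3, 1/3])     # the uniform distribution
# kl(p, q) and kl(q, p) differ, illustrating property 3 on slide 29.
```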
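The quality measures of slide 46 follow directly from Φ, Θ and p(t); a sketch with my own function names, assuming p(t) is supplied:

```python
import numpy as np

def perplexity(ndw, phi, theta):
    """Hold-out perplexity P = exp(-L/n), L = sum_{d,w} n_dw ln p(w|d) (slide 46)."""
    pwd = np.maximum(phi @ theta, 1e-30)      # p(w|d) = sum_t phi_wt theta_td
    L = float(np.sum(ndw * np.log(pwd)))
    return float(np.exp(-L / ndw.sum()))

def kernel_purity_contrast(phi, pt, threshold=0.25):
    """Topic kernels W_t = {w : p(t|w) > threshold}, with per-topic
    purity sum_{w in W_t} p(w|t) and contrast (1/|W_t|) sum_{w in W_t} p(t|w)."""
    joint = phi * pt[None, :]                                 # p(w, t) = p(w|t) p(t)
    ptw = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-30)  # p(t|w)
    kernel = ptw > threshold
    size = kernel.sum(axis=0)
    purity = (phi * kernel).sum(axis=0)
    contrast = (ptw * kernel).sum(axis=0) / np.maximum(size, 1)
    return size, purity, contrast
```

With two perfectly separated topics, each kernel covers its topic's words and both purity and contrast reach 1.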
