Automatic summarization

  • Published on
    06-Jul-2015

Transcript

  • 1. @hitoshi_ni, 2014/01/27

  • 7. (DARPA TIDES program)

  • 9. (Mani01)

  • 19. ROUGE

  • 26. ROUGE (Lin04); n-gram

  • 27. Amazon Mechanical Turk

  • 28. ROUGE

  • 29. $D$, $U$, $S$: $S^{*} = \arg\max_{S} f(S; D, U)$; $\arg\max_{S} f(S; w)$

  • 30. (Paice+90, Gillick+09) Sequential Labeling (Hirao+10) (Jing00;Clarke+07;Nomoto+07;Zajic+07), STSG (Cohn+07;Cohn+08), QSG (Woodsend+10) (Paice+90), SVM (Gillick+09) N/A Dynamic Programming (Cohn+07;Nomoto+07;Cohn+09;Hirao+09), ILP (Clarke+06;Woodsend+10) (Jing00) (Clarke+06) CRF (Nomoto+07) Structured SVM (Cohn+07) (Filatova+04;Yih+07;Gillick+09;+08) (Althaus+04) (Nishikawa+10) (McDonald07;+09) (+10) ROUGE (Lin04) (+06) Pyramid (Nenkova+07) (+07) Naive Bayes (Kupiec+95), Maximum Entropy (Osborne02) Logistic Regression (Yih+07) SVM (Hirao+02) n-gram (Lin04), Summary Content Unit (Nenkova+07) (Barzilay+02;Okazaki+04) (Lapata+03) SVM (Bollegala+06) (Filatova+04), Greedy (Lapata+03) N/A Stack Decoder A* (Soricut+06) (Yih+07) ILP ILP (McDonald07) (Althaus+04;Nishikawa+10) Lagrange Relaxation (Nishikawa+12;Almeida+13;Nishino+13)

  • 34. (Luhn58;Edmundson69;Pollock75) (Luhn58;Aone+98) (Furui+04;Lin+09;Xie+09;Higashinaka+10) (Muresan+01;Sandu+10) (Carenini+06;Lerman+09) (Sharifi+10;Takamura+11;+13)

  • 35. (Barzilay+97;Radev+04)

  • 36. Filatova (Filatova+04)

  • 37. (McDonald07;+09) $n$; $s_1, s_2, \ldots, s_n$; $l_1, l_2, \ldots, l_n$; $L$

  • 39. (Filatova+04)

  • 42. $n$, $m$; $M = (c_{1,1}, c_{1,2}, \ldots, c_{n,m-1}, c_{n,m})$; $w_1, \ldots, w_m$; $l_1, l_2, \ldots, l_n$; $L$ (Filatova+04) (Yih+07) (+08;Gillick+09)

  • 43. (+10)

  • 46. (Dagan+06)

  • 47. NP; $n$; $i$, $j$; $e \in [0, 1]$; $M = (e_{1,2}, \ldots, e_{n-1,n})$; $l_1, l_2, \ldots, l_n$; $L$

  • 48. (Lin+10;Lin+11;Morita+13) (Nishikawa+12;Almeida+13;Nishino+13)

  • 49. (Luhn58;Edmundson69) Naive Bayes (Kupiec+95), Maximum Entropy Classifier (Osborne02), SVM (Hirao+02), Logistic Regression (Yih+07); bag-of-words

  • 50. Aspect-Polarity (Carenini+06;Lerman+09); Deep learning

  • 51. Structured SVM (Takamura+10;Berg-Kirkpatrick+11;Lee+12;Almeida+13); ROUGE; MERT (Sandu+10;Xie+10;Lee+13)

  • 52. (Jing00) (Jing00;Zajic+07) (Clarke+06), (Turner+05) Sequential Labeling (Hirao+10)
    Turner and Charniak: Supervised and unsupervised learning for sentence compression. ACL 2005.

  • 56. tf-idf; $p_{\mathrm{dep}}(\cdot \mid \cdot, \cdot)$; n-gram; $p_{n\text{-gram}}(\cdot \mid \cdot, \cdot)$

  • 57. $T$; $v_1, \ldots, v_n$; $e_1, \ldots, e_m$; $w_1, \ldots, w_n$; $c_1, \ldots, c_n$; $l_1, l_2, \ldots, l_n$; $L$

  • 58. (Galley+07) (Cohn+09) (Woodsend+10) STSG; syntax-based MT
    Cohn and Lapata: Sentence Compression as Tree Transduction. JAIR 34, pp. 637-674, 2009.

  • 59. Cohn and Lapata: Sentence Compression as Tree Transduction. JAIR 34, pp. 637-674, 2009.

  • 62. (Lapata03, Barzilay+05)

  • 63. (Althaus+04) $n$; $s_1, \ldots, s_n$; $s_0$, $s_{n+1}$; $M = (c_{0,1}, c_{0,2}, \ldots, c_{n-1,n+1}, c_{n,n+1})$

  • 64. (Martins+09;+09;Woodsend+10;Berg-Kirkpatrick+11;Woodsend+12;Morita+13) (Almeida+13) (Nishikawa+10;Christensen+13)

  • 66.
    Jurafsky and Martin. Speech and Language Processing (2nd ed.). Prentice Hall, 2008.
    Mani. Automatic Summarization. John Benjamins Pub Co, 2001.
    Mani and Maybury (eds.). Advances in Automatic Text Summarization. MIT Press, 1999.
    Nenkova and McKeown. Automatic Summarization. now Publishers Inc., 2011.
    and . . , 9(4):97-116, 2002.
    and . . , 2005.
    (eds.). . . 1989.
    Sparck-Jones and Endres-Niggemeyer. Automatic Summarizing. Information Processing and Management, 31(5):625-630, 1995.

  • 67.
    Althaus et al. Computing Locally Coherent Discourse. ACL 2004.
    Almeida et al. Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning. ACL 2013.
    Aone et al. Trainable, Scalable Summarization Using Robust NLP and Machine Learning. COLING 1998.
    Barzilay et al. Using Lexical Chains for Text Summarization. ISTS 1997.
    Barzilay et al. Inferring Strategies for Sentence Ordering in Multidocument News Summarization. JAIR, 17, 2002.
    Barzilay et al. Modeling Local Coherence: An Entity-based Approach. ACL 2005.
    Berg-Kirkpatrick et al. Jointly Learning to Extract and Compress. ACL 2011.
    Bollegala et al. A Bottom-up Approach to Sentence Ordering for Multi-document Summarization. COLING/ACL 2006.
    Carenini et al. Multi-document summarization of evaluative text. EACL 2006.
    Christensen et al. Towards Coherent Multi-Document Summarization. NAACL 2013.
    Clarke et al. Constraint-based Sentence Compression: An Integer Programming Approach. COLING/ACL 2006.
    Cohn et al. Large Margin Synchronous Generation and its Application to Sentence Compression. EMNLP/CoNLL 2007.
    Cohn et al. Sentence Compression as Tree Transduction. JAIR, 34, 2009.
    Dagan et al. The PASCAL Recognising Textual Entailment Challenge. Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment. 2006.
    Edmundson. New Methods in Automatic Extracting. Journal of the ACM, 16(2), 1969.
    Filatova et al. A formal model for information selection in multi-sentence text extraction. COLING 2004.
    Furui et al. Speech-to-Text and Speech-to-Speech Summarization. IEEE Trans. on Speech and Audio Processing, 12(4), 2004.

  • 68.
    Galley et al. Lexicalized Markov Grammars for Sentence Compression. NAACL-HLT 2007.
    Gillick. Sentence Boundary Detection and the Problem with the U.S. NAACL-HLT 2009.
    Gillick et al. A Scalable Global Model for Summarization. NAACL-HLT Workshop on ILP for NLP 2009.
    Higashinaka et al. Improving HMM-based Extractive Summarization for Multi-Domain Contact Center Dialogues. SLT 2010.
    Hirao et al. Extracting Important Sentences with Support Vector Machines. COLING 2002.
    et al. . , 47(6), 2006.
    et al. . , 22(2), 2007.
    Hirao et al. A Syntax-Free Approach to Japanese Sentence Compression. ACL-IJCNLP 2009.
    Jing. Sentence reduction for automatic text summarization. ANLP 2000.
    et al. Twitter . 2013.
    Kupiec et al. A Trainable Document Summarizer. SIGIR 1995.
    Lapata. Probabilistic Text Structuring: Experiments with Sentence Ordering. ACL 2003.
    Lee et al. Unsupervised Domain Adaptation for Spoken Document Summarization with Structured Support Vector Machine. ICASSP 2013.
    Lerman et al. Sentiment Summarization: Evaluating and Learning User Preferences. EACL 2009.
    Lin. ROUGE: A Package for Automatic Evaluation of Summaries. ACL Workshop on Text Summarization Branches Out 2004.
    Lin. Graph-based Submodular Selection for Extractive Summarization. ASRU 2009.
    Lin. Multi-document Summarization via Budgeted Maximization of Submodular Functions. NAACL 2010.
    Lin. A Class of Submodular Functions for Document Summarization. ACL 2011.
    Luhn. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2), 1958.
    Martins et al. Summarization with a Joint Model for Sentence Extraction and Compression. NAACL Workshop on ILP for NLP, 2009.
    McDonald. A Study of Global Inference Algorithms in Multi-document Summarization. ECIR 2007.

  • 69.
    Morita et al. Subtree Extractive Summarization via Submodular Maximization. ACL 2013.
    Muresan et