# Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory

27-Jan-2016


• Yuru Jiang, Rou Song

Beijing University of Technology

• Example c1: c2: c3: c4:

PClause Sequence

• c1: c2: c3: c4:

t1 t2 t3 t4

What we have done

• Identification Process, Identification Algorithm, CTCs Scoring Function

• Example 2: c1: c2: c3: c4:

t1 = c1, t2 =

• if t1: c2: then t2 =

CTCs of c2

• t1, CTCs of c2, Topic Clause of c3, c3

• if CTCs of c2: c3: then t3 =

CTCs of c2

• if one CTC of c2: c3: then one group of CTCs of c3 is:

• t1, CTCs of c2, CTCs of c3

• How to choose the best path?

• Question 1: How to calculate the value of each node in the CTC tree? Answer: the CTCs Scoring Function.

Question 2: How to calculate the path value from each leaf node to the root node? Answer: the sum of the node values on the path.
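The path selection described above can be sketched as follows. This is a minimal illustration, assuming the CTC tree is a plain tree whose nodes already carry a score and that the best path is the root-to-leaf path with the largest sum; the `Node` class, the labels, and the numeric values are hypothetical and do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a CTC tree (hypothetical structure, for illustration only)."""
    label: str                      # e.g. a candidate topic clause
    value: float                    # node value, assumed to come from the scoring function
    children: list[Node] = field(default_factory=list)


def best_path(root: Node) -> tuple[float, list[str]]:
    """Return (path value, labels) of the root-to-leaf path with the largest sum of node values."""
    if not root.children:
        return root.value, [root.label]
    # Evaluate every subtree and keep the one whose best path has the largest sum.
    child_value, child_labels = max(
        (best_path(child) for child in root.children),
        key=lambda result: result[0],
    )
    return root.value + child_value, [root.label] + child_labels


# Made-up values: t1 at the root, two CTCs of c2 below it, then CTCs of c3 as leaves.
tree = Node("t1", 1.0, [
    Node("c2_ctc_a", 0.5, [Node("c3_ctc_a", 0.75), Node("c3_ctc_b", 0.25)]),
    Node("c2_ctc_b", 0.75, [Node("c3_ctc_c", 0.25)]),
])
value, labels = best_path(tree)
print(value, labels)
```

Note that the branch through `c2_ctc_b` has the higher-valued second-level node, yet the path through `c2_ctc_a` wins on the total; this is why the whole path is scored rather than each level being chosen greedily.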

• Given a CTC d of PClause c, the topic clause most similar to d is found in the Topic Clause Corpus, and that similarity is denoted sim_CT(d). If sim(x, y) is the similarity of any two strings x and y, then sim_CT(d) is defined as the maximum of sim(d, t) over all topic clauses t in the Topic Clause Corpus.

• If CTset(c) is the set of CTCs of c, then the topic clause of c is the CTC d in CTset(c) with the largest sim_CT(d).

Accuracy rate is 0.6499

Reference: Yuru Jiang, Rou Song: Topic Clause Identification Based on Generalized Topic Theory. Journal of Chinese Information Processing 26(5) (2012)
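The scoring and selection steps described on the preceding slides can be sketched as follows. The slides do not say how sim(x, y) is computed, so this sketch substitutes the character-level ratio from Python's `difflib.SequenceMatcher` as a stand-in; the function names `sim_ct` and `topic_clause` and the sample strings are likewise assumptions made for illustration.

```python
from difflib import SequenceMatcher


def sim(x: str, y: str) -> float:
    """Stand-in string similarity in [0, 1]; an assumption, not the authors' measure."""
    return SequenceMatcher(None, x, y).ratio()


def sim_ct(d: str, corpus: list[str]) -> float:
    """Similarity between CTC d and the most similar topic clause in the corpus."""
    return max(sim(d, t) for t in corpus)


def topic_clause(ct_set: list[str], corpus: list[str]) -> str:
    """Pick the CTC in ct_set with the largest sim_ct score."""
    return max(ct_set, key=lambda d: sim_ct(d, corpus))


# Hypothetical topic clause corpus and candidate set for one PClause.
corpus = ["the fish lives in rivers", "the body is long"]
ct_set = ["the fish is long", "the body is long"]
print(topic_clause(ct_set, corpus))
```

Because the second candidate appears verbatim in the corpus, its score is 1.0 and it is selected.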

• Accuracy rate is 0.7625 > 0.6499 > baseline

• Example 3: d_tcpreA H H C d_c t1A H st1A C H t2A H H C

t_tcpreA B C C t_c tA B C C

• Corpus, Evaluation Criteria, Experiment Result, Analysis

• 202 texts about fish in the Biology volume of China Encyclopedia

15 texts are used for testing in the experiment

A K-1 test is used

• For N PClauses, if the number of PClauses whose topic clauses are correctly identified is hitN, then the identification accuracy rate is hitN/N.
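The evaluation criterion above amounts to a simple ratio. A minimal sketch, assuming predicted and gold topic clauses are compared by exact string match (the slides do not state the matching rule) and using made-up labels:

```python
def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Identification accuracy hitN / N over N PClauses."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one PClause is required")
    # hitN: number of PClauses whose topic clause was identified correctly.
    hit_n = sum(p == g for p, g in zip(predicted, gold))
    return hit_n / len(gold)


# Four PClauses, three of them with a correctly identified topic clause.
predicted = ["t_a", "t_b", "t_x", "t_d"]
gold = ["t_a", "t_b", "t_c", "t_d"]
print(accuracy(predicted, gold))
```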

• Fig. 2. PClause Count and Accuracy Rate for Topic Clause Identification across the 15 test texts

• CTCs Scoring Function

CTC Tree

Extend to other text

