Ch1 - Data Warehouses and Data Mining

  • Published on
    06-Aug-2015

Transcript

<p>DATA MINING. Chapter 1: DATA WAREHOUSING &amp; DATA MINING (Introduction to Data Warehousing &amp; Data Mining)</p>
<p>1. Overview 2. Data warehousing 3. Decision support &amp; online analytical processing (OLAP) 4. Data mining</p>
<p>Data, Information, Knowledge. Data is a collection of raw facts organized in logical forms. The smallest unit of data a computer recognizes is a single character, for example the letter A, the digit 1, or the symbol *. A character is represented by 8 bits, and bits are commonly used to measure information. Knowledge is integrated information, comprising facts and the relationships among them; it can be viewed as data at a high level of abstraction and generalization. Knowledge discovery is the process of identifying patterns or models in data that can be analyzed and synthesized and that are valid, useful, and understandable.</p>
<p>Data warehousing: a process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research, 4/1996]</p>
<p>What is a data warehouse? W. H. Inmon: a data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data that supports decision making.</p>
<p>A data warehouse comprises: one or more tools for extracting data; and an integrated, subject-oriented, nonvolatile database assembled by setting up data tables.</p>
<p>Purpose of a data warehouse. The main goal: it must be able to meet all of the users' information requirements.</p>
<p>Help the organization's staff do their work well and efficiently, for example by making sound, fast decisions, selling more goods, raising productivity, and earning higher profits. Help the organization identify, manage, and run its projects and business operations efficiently and accurately.
Integrate data and metadata from many different sources.</p>
<p>Data warehouse solutions aim to:</p>
<p>o Improve data quality by cleaning and refining data along specific subject areas</p>
<p>o Aggregate and link data</p>
<p>o Synchronize the data sources with the data warehouse (DW)</p>
<p>o Identify and unify the operational database management systems as standard tools that serve the DW</p>
<p>o Manage metadata</p>
<p>o Provide information that is integrated, summarized, or linked, and organized by subject</p>
<p>o Serve decision support systems (DSS), operational information systems, or ad hoc queries</p>
<p>Properties of a data warehouse:</p>
<p>Integration</p>
<p>Time-stamped, historical data</p>
<p>Stable data (nonvolatility)</p>
<p>Data that does not fluctuate</p>
<p>Summarized data</p>
<p>A data warehouse has seven components:</p>
<p>Source data, together with the tools that extract, clean, and transform it.</p>
<p>A metadata repository.</p>
<p>Warehouse construction techniques.</p>
<p>Subject-oriented data stores (data marts): several of these can be combined into a single "smart" data warehouse. Conversely, a data warehouse can be split into several smart data stores.</p>
<p>Query, reporting, online analytical processing (OLAP), and data mining tools: the techniques for exploiting the warehouse to produce knowledge.
Data warehouse administration.</p>
<p>An information delivery system.</p>
<p>A data warehouse is a very large database.</p>
<p>[Figure: bar chart of the percentage of respondents (0% to 35%) by data warehouse size, in bins from 5GB up to 500GB-1TB, comparing initial size with the size projected for 2Q96. Source: META Group, Inc.]</p>
<p>Terabytes (10^12 bytes): Walmart, 24 terabytes</p>
<p>Petabytes (10^15 bytes): geographic information systems</p>
<p>Exabytes (10^18 bytes): national medical records</p>
<p>Zettabytes (10^21 bytes): weather images</p>
<p>Yottabytes (10^24 bytes): intelligence agency videos</p>
<p>Differences between operational database systems and information systems</p>
<table>
<tr><th>Feature</th><th>Operational database</th><th>Information system</th></tr>
<tr><td>Characteristic</td><td>Operational processing</td><td>Informational processing</td></tr>
<tr><td>Orientation</td><td>Transaction</td><td>Analysis</td></tr>
<tr><td>Users</td><td>Clerks, database administrators, database professionals</td><td>Managers, analysts, executives</td></tr>
<tr><td>Function</td><td>Day-to-day operations</td><td>Decision support</td></tr>
<tr><td>Data</td><td>Current</td><td>Historical (long-term)</td></tr>
<tr><td>View</td><td>Detailed, flat relational</td><td>Summarized, multidimensional</td></tr>
<tr><td>Database design</td><td>Application-oriented</td><td>Subject-oriented</td></tr>
<tr><td>Unit of work</td><td>Short, simple transaction</td><td>Complex query</td></tr>
<tr><td>Access</td><td>Read/write</td><td>Mostly read-only</td></tr>
<tr><td>Focus</td><td>Data in</td><td>Information out</td></tr>
<tr><td>Number of records accessed</td><td>Tens</td><td>Millions</td></tr>
<tr><td>Number of users</td><td>Thousands</td><td>Hundreds</td></tr>
<tr><td>Data size</td><td>100 MB to GB</td><td>100 GB to TB</td></tr>
<tr><td>Priority</td><td>High performance, high availability</td><td>High flexibility, end-user autonomy</td></tr>
<tr><td>Metric</td><td>Transaction processing speed</td><td>Query speed</td></tr>
</table>
<p>Data warehousing: applies techniques for consolidating and managing data from many different sources. Its purpose is to answer business questions and support decisions that previously could not be addressed.
A decision support database is built and maintained separately from the organization's operational database.</p>
<p>A data warehouse is exploited in three main ways:</p>
<p>1. Traditional use: queries and reports over refined data</p>
<p>2. Online analytical processing (OLAP): analysis and hypothesis testing, without generating the hypotheses themselves</p>
<p>3. Data mining: producing knowledge from data</p>
<p>ONLINE ANALYTICAL PROCESSING (OLAP). In-depth decision support, with four main characteristics:</p>
<p>Multidimensional data analysis</p>
<p>Advanced database support</p>
<p>An easy-to-use interface for end users</p>
<p>Support for client/server architecture</p>
<p>Data in the warehouse is presented in a multidimensional form called a cube. Each dimension describes some characteristic of the data.</p>
<p>MULTIDIMENSIONAL DATA ANALYSIS TECHNIQUES. Advanced data presentation functions:</p>
<p>o 3-D graphics, pivot tables, crosstabs</p>
<p>o Compatibility with spreadsheets and statistical packages</p>
<p>o Advanced data aggregation, consolidation, and classification across the time dimension</p>
<p>o Advanced computational functions</p>
<p>o Advanced data modeling functions</p>
<p>ADVANCED DATABASE SUPPORT. Advanced data access features:</p>
<p>o Access to many kinds of DBMSs, flat files, and data sources inside and outside the system</p>
<p>o Access to aggregated data warehouse data</p>
<p>o Advanced data navigation (drill-downs and roll-ups)</p>
<p>o The ability to map user requests to the appropriate data sources</p>
<p>o Support for very large databases</p>
<p>EASY-TO-USE END-USER INTERFACE</p>
<p>o A graphical interface</p>
<p>o Many utilities that make data access easy</p>
<p>CLIENT/SERVER ARCHITECTURE</p>
<p>o Serves as the foundation for designing, installing, and developing many new systems</p>
<p>o Divides the OLAP system into several components that define its architecture. The components can be placed:</p>
<p>On the same computer</p>
<p>Distributed across several computers</p>
<p>OLAP ARCHITECTURE. Three main components:</p>
<p>Graphical user interface (GUI)</p>
<p>Analytical processing logic</p>
<p>Data processing logic</p>
<p>RELATIONAL OLAP (ROLAP). Relational online analytical processing is OLAP that uses relational databases and a family of query tools to store and analyze multidimensional data. It provides:</p>
<p>Support for multidimensional database schemas</p>
<p>Query capability
and a high-performance data access language; support for large databases.</p>
<p>MULTIDIMENSIONAL DATABASE SCHEMA SUPPORT. Decision support data tends to be:</p>
<p>o Nonnormalized</p>
<p>o Duplicated</p>
<p>o Preaggregated</p>
<p>Data models used in OLAP:</p>
<p>Star schema</p>
<p>Fact constellation schema</p>
<p>Snowflake schema</p>
<p>These are special design techniques for representing multidimensional data. They optimize data query operations rather than data update operations.</p>
<p>STAR SCHEMA</p>
<p>- A specialized design for representing multidimensional data</p>
<p>- Optimizes data query operations rather than data update operations</p>
<p>- Maps decision support data into a relational data model</p>
<p>Four components: facts, dimensions, attributes, attribute hierarchies</p>
<p>FACTS. Numeric measurements (values) that represent a specific business aspect or activity. They are stored in a fact table at the center of the star schema. The fact table contains facts that are linked to their dimensions. Facts can be computed or derived at run time, and they are updated periodically with data from operational databases.</p>
<p>Fact table: used to track changes in the data. Its structure consists of foreign keys that are the primary keys of the dimension tables.</p>
<p>Measure: a quantity that can be computed over the attributes of the fact table.</p>
<p>DIMENSIONS. Each dimension describes some characteristic of the data. Dimension tables describe the characteristics of dimensions such as the time dimension, the customer dimension, the product dimension,</p>
<p>ATTRIBUTES. Dimension tables contain attributes. Attributes are used to search, filter, and classify facts. Dimensions describe the characteristics of facts through their attributes. There is no mathematical limit on the number of dimensions (3-D is simply easy to model).</p>
<p>ATTRIBUTE HIERARCHIES. This concept describes a ranked ordering (the level of detail of the data). For example, for the time dimension, we have the following hierarchy: day</p>
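<p>The star schema components described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustrative sketch, not part of the original slides: the table names (fact_sales, dim_time, dim_product), the sample rows, and the month and year levels of the time hierarchy are all assumptions made for the example, since the transcript names only the day level. The fact table holds foreign keys to each dimension plus one numeric measure, and a roll-up aggregates that measure at a coarser level of the time hierarchy.</p>

```python
import sqlite3


def build_star_schema():
    """Create a small in-memory star schema (all names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_time (
            time_id INTEGER PRIMARY KEY,
            day     TEXT,    -- finest level of the assumed hierarchy
            month   TEXT,    -- assumed intermediate level
            year    INTEGER  -- assumed coarsest level
        );
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        -- Fact table: one foreign key per dimension plus a numeric measure.
        CREATE TABLE fact_sales (
            time_id    INTEGER REFERENCES dim_time(time_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount     REAL
        );
    """)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)", [
        (1, "2015-01-05", "2015-01", 2015),
        (2, "2015-01-20", "2015-01", 2015),
        (3, "2015-02-03", "2015-02", 2015),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "pen", "stationery"),
        (2, "notebook", "stationery"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        (1, 1, 10.0), (1, 2, 25.0), (2, 1, 5.0), (3, 2, 40.0),
    ])
    return conn


def roll_up(conn, level):
    """Sum the 'amount' measure at one level of the time hierarchy."""
    if level not in ("day", "month", "year"):
        raise ValueError(f"unknown level: {level}")
    # 'level' is checked against a fixed whitelist above, so it is safe
    # to interpolate it as a column name.
    query = f"""
        SELECT t.{level}, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time AS t ON t.time_id = f.time_id
        GROUP BY t.{level}
        ORDER BY t.{level}
    """
    return conn.execute(query).fetchall()


conn = build_star_schema()
by_day = roll_up(conn, "day")      # most detailed view
by_month = roll_up(conn, "month")  # roll-up: day -> month
by_year = roll_up(conn, "year")    # roll-up: month -> year
```

<p>Reading the three results from by_year back to by_day corresponds to a drill-down; reading them in the other direction corresponds to a roll-up. The measure is stored only at the finest level and aggregated on demand, which is one possible design; a warehouse might instead keep preaggregated totals, as the section on decision support data notes.</p>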