Вторая PostgreSQL-встреча: полнотекстовый поиск. Часть 1

  • Published on
    11-Apr-2015

  • View
    1.964

  • Download
    3

Embed Size (px)

DESCRIPTION

PostgreSQL-: . 1: PostgreSQL 8.3, ( )

Transcript

<p> P</p> <p> PostgreSQL 8.3</p> <p> , </p> <p>2008</p> <p> - Prerequisites</p> <p> PostgreSQL 8.3+ ( contrib/tsearch2) contrib/pg_trgm PgSQL</p> <p> , </p> <p>2008</p> <p> P , RDBMS (ACID) , (online) /</p> <p> , </p> <p>2008</p> <p>id --Title -----Abstract ----------Keywords ------------Body</p> <p>Did ---Aid ----</p> <p>id -Author ---------</p> <p>Title || Abstract || Keywords || Body || Author</p> <p> , </p> <p>2008</p> <p>tsvector tsquery (&amp;, |, !) tsquery @@ tsvector</p> <p> , </p> <p>2008</p> <p> tsvector</p> <p>':1 :2 :3 :4'::tsvector ':1 :2 :3'::tsvector to_tsvector([CFG,], TEXT):select to_tsvector('simple', ' '); to_tsvector --------------------------------'':3 '':2 '':1</p> <p> , </p> <p>2008</p> <p> tsquery</p> <p>' &amp; '::tsquery to_tsquery([CFG,] TEXT):select to_tsquery('simple', ' &amp; '); to_tsquery ----------------'' &amp; ''</p> <p>plainto_tsquery([CFG,] TEXT):select plainto_tsquery('simple', ' '); plainto_tsquery ----------------'' &amp; ''</p> <p> , </p> <p>2008</p> <p>tsvector @@ tsquery</p> <p># select to_tsvector('simple', ' ') @@ to_tsquery('simple', ' &amp; '); ?column? ---------t # select count(*) from posts where fts @@ plainto_tsquery(' ?')</p> <p> , </p> <p>2008</p> <p> posts ( , ) tsvector ! ?!</p> <p> , </p> <p>2008</p> <p> - </p> <p>GIN</p> <p>GIST </p> <p> , </p> <p>2008</p> <p> ?</p> <p> # \dFp+</p> <p> , </p> <p>2008</p> <p>to_tsvector()DOCUMENT PARSER (token, token_type) dicts(token_type) YES i=0 ask DICT[i] YES IS STOP ? NO NO NO YES i=i+1 tsvector2008</p> <p>i /usr/local/pgsql/share/tsearch_data/ru_ru.affix % iconv -f koi8-ru -t utf-8 &lt; ru_RU.dic &gt; /usr/local/pgsql/share/tsearch_data/ru_ru.dict</p> <p> , </p> <p>2008</p> <p>Ispell - </p> <p># CREATE TEXT SEARCH DICTIONARY ru_ispell ( TEMPLATE = ispell, DictFile = ru_ru, AffFile = ru_ru, StopWords = russian ); # CREATE TEXT SEARCH CONFIGURATION ru (COPY = russian); # ALTER TEXT SEARCH CONFIGURATION ru ALTER MAPPING FOR hword, hword_part, word WITH ru_ispell, russian_stem;</p> <p> , </p> <p>2008</p> <p>?# SELECT to_tsvector('ru', ' '); to_tsvector ----------------------------------'':3 '':1 '':2 # SELECT to_tsvector('ru', ' '); to_tsvector -------------'':1,3 # SELECT to_tsvector('ru', ' '); to_tsvector --------------'':1,2 # SELECT to_tsvector('ru', ' '); to_tsvector ---------------------------':2 '':1 , 2008</p> <p>\dF \dFd \dFt \dFp ts_debug() ts_lexize() ts_parse()</p> <p> , </p> <p>2008</p> <p>ts_rank([real[],] tsvector, tsquery[, int4] ) ts_rank_cd([real[],] tsvector, tsquery [, int4])</p> <p>4 A,B,C,D (default):SELECT setweight(to_tsvector('ru', ''), 'A') || to_tsvector('ru', ' '); SELECT id, ts_rank_cd(fts, q) AS r FROM posts, plainto_tsquery('ru', '') AS q WHERE fts @@ q ORDER BY r DESC LIMIT 10;</p> <p> , </p> <p>2008</p> <p>Headline/ ts_headline([CFG,] TEXT, TSQUERY [, OPTS])# SELECT nextval, ts_headline('ru', msg_text, q), ts_rank_cd(fts, q) AS r FROM posts, plainto_tsquery('ru', ' ') AS q WHERE fts @@ q ORDER BY r DESC LIMIT 10;</p> <p>:</p> <p>StartSel, StopSel MaxWords, MinWords ShortWord HighlightAll</p> <p> , </p> <p>2008</p> <p>Synonym Thesaurus 3.14159265358979323846 =&gt; 3.14 Regex 2008</p> <p> , </p> <p> - 1</p> <p>default_text_search_config tsvector_update_trigger tsvector_update_trigger_column tsvector || tsvector setweight(), strip() tsquery &amp;&amp; tsquery tsquery || tsquery !! tsquery numnode(tsquery), length(tsvector), querytree(tsquery)2008</p> <p> , </p> <p> - 2</p> <p>SELECT id, ts_headline('ru', msg_text, q), r FROM ( SELECT *, ts_rank_cd(fts, q) AS r FROM posts, plainto_tsquery('ru', ' ') AS q WHERE fts @@ q ORDER BY r DESC LIMIT 10 ) AS res; SELECT ... @@ to_tsquery(':AB'); UPDATE tbl SET fts= setweight( coalesce( to_tsvector(title),''),'B') || setweight( coalesce( to_tsvector(keywords),''),'A') || setweight( coalesce( to_tsvector(body),''),'D'); (tsvector || NULL) =&gt; NULL , 2008</p> <p> - 3</p> <p>SELECT * FROM ts_stat('SELECT fts FROM posts') ORDER BY ndoc DESC; To be or not to be. ! - . c , .</p> <p> , </p> <p>2008</p> <p> - 4</p> <p> . ? Query rewriting: - New York =&gt; Gottham, NYC, Big Apple - =&gt; ? strip()</p> <p> , </p> <p>2008</p> <p> - 5</p> <p>VLDB (Very Large Database)</p> <p>Partitioning, sharding ( , !) GIN GiST gin_fuzzy_search_limit</p> <p> , </p> <p>2008</p> <p> - 6</p> <p> ''::tsquery to_tsquery() Y Y N Y</p> <p> Y Y</p> <p>plain_to_tsquery()</p> <p>N</p> <p>Y</p> <p>N</p> <p> to_tsquery ||, &amp;&amp; !! , 2008</p> <p>pg_trgm - </p> <p>% cd contrib/pg_trgm % gmake &amp;&amp; su -c 'gmake install' &amp;&amp; gmake installcheck % psql DB &lt; /usr/local/pgsql/share/contrib/pg_trgm.sql</p> <p> , </p> <p>2008</p> <p>pg_trgm# SELECT show_trgm(' '); show_trgm -------------------------------------------------------------{" ",,,," ",,,," "," "," "," "} # SELECT similarity(' ' , ' '); similarity -----------0.352941 # SELECT show_limit(); show_limit -----------0.3 # SELECT ' ' % ' '; ?column? ---------t , 2008</p> <p>pg_trgm</p> <p># SELECT *, similarity(name, ' ') AS s FROM keywords WHERE name % ' ' ORDER BY s DESC; id | name | s ---------+---------------+---------6386 | | 0.642857 6388 | | 0.642857 2927 | | 0.375 1428 | | 0.315789 GIN GiST , 2008</p> <p>http://www.postgresql.org/docs/8.3/static/ http://www.sai.msu.su/~megera/postgres/ http://www.sigaev.ru/</p> <p> , </p> <p>2008</p> <p> ?</p> <p> , </p> <p>2008</p>