This document describes the new tsearch2 features introduced during PostgreSQL 8.2 development (April 2006). These features and their backport release for PostgreSQL 8.1.X were sponsored by the University of Mannheim. The thesaurus dictionary was funded by Georgia Public Library Service and LibLime, Inc. GIN support was sponsored by jfg:networks (http://www.jfg-networks.com/).
The online version of this document is available at http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch2WhatsNew
=# create index fts_idx on titles using gin(fts_index);
UTF-8 support was tested with the Russian and Greek languages. See the UTF-8 Greek test page for more information.
Important steps:
initdb -D /usr/local/pgsql-dev/data.el_utf8 --locale=el_GR.utf8
iconv -f iso-8859-7 -t utf-8 ......
set client_encoding='ISO-8859-7';
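The iconv step above re-encodes the Greek test data from ISO-8859-7 to UTF-8. As a minimal illustration of what that conversion does to the bytes, the following Python sketch performs the same recoding in memory. The sample word and the function name `recode_greek` are assumptions made for this example and are not part of the test setup.

```python
def recode_greek(data: bytes) -> bytes:
    """Re-encode ISO-8859-7 bytes as UTF-8, as `iconv -f iso-8859-7 -t utf-8` would."""
    return data.decode("iso-8859-7").encode("utf-8")

# Hypothetical sample: a Greek word as it would be stored in an ISO-8859-7 file.
sample = "λέξη".encode("iso-8859-7")
converted = recode_greek(sample)

# ISO-8859-7 uses one byte per Greek letter, UTF-8 uses two.
print(len(sample), len(converted))
print(converted.decode("utf-8"))
```

The byte length doubles for Greek text, which is why data loaded into a UTF-8 cluster must be converted first, or sent with client_encoding set so that the server converts it.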
As always, don't forget the configuration. Below is the setup for the Greek language that we used for testing:
begin;
delete from pg_ts_cfg where ts_name = 'utf8';
insert into pg_ts_cfg values('utf8','default','el_GR.utf8');

-- dictionaries
DELETE FROM pg_ts_dict WHERE dict_name = 'el_ispell';
INSERT INTO pg_ts_dict
    (SELECT 'el_ispell',
            dict_init,
            'DictFile="/tmp/greek.utf8/el_GR_utf8.dict",'
            'AffFile="/tmp/greek.utf8/el.u8.aff"',
            dict_lexize
     FROM pg_ts_dict
     WHERE dict_name = 'ispell_template');

-- tokens to index
delete from pg_ts_cfgmap where ts_name = 'utf8';
insert into pg_ts_cfgmap values('utf8','nlhword','{el_ispell}');
insert into pg_ts_cfgmap values('utf8','nlword','{el_ispell}');
insert into pg_ts_cfgmap values('utf8','nlpart_hword','{el_ispell}');
end;
We do not normalize the document weight to 1, because we have no global information about the document collection.
The current interface of rank_cd is:
rank_cd('{0.1, 0.2, 0.4, 1.0}',tsvector,tsquery,normalization)
The normalization argument selects the normalization method(s) and is the bitwise OR of the values of the following methods:
Example: normalize the document rank by the logarithm of the document length and take extent density into account.
rank_cd('{0.1, 0.2, 0.4, 1.0}',tsvector,tsquery,1|4)
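The normalization argument is a bit mask, so `1|4` reaches rank_cd as the integer 5. The Python sketch below shows how such a mask is composed and decoded. The constant names are hypothetical labels chosen for this example. Only the values 1 and 4, with their meanings as inferred from the example above, come from this document.

```python
# Hypothetical names; the values 1 and 4 are taken from the rank_cd example above.
NORM_LOG_LENGTH = 1      # normalize by the logarithm of the document length
NORM_EXTENT_DENSITY = 4  # take extent density into account

FLAG_LABELS = {
    NORM_LOG_LENGTH: "log of document length",
    NORM_EXTENT_DENSITY: "extent density",
}

def describe(mask):
    """Return the labels of the known flags that are set in mask."""
    return [label for flag, label in FLAG_LABELS.items() if mask & flag]

mask = NORM_LOG_LENGTH | NORM_EXTENT_DENSITY
print(mask, describe(mask))
```

Because each method uses a distinct bit, any combination of methods maps to a unique integer, and 0 means that no normalization is applied.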
Examples:
contrib_regression=# select rank_cd('{1, 1, 1, 1}','1:1,20 2:2 3:3 4:4','1&2'::tsquery,0);
 rank_cd
---------
 1.05556
(1 row)

contrib_regression=# select rank_cd('{1, 1, 1, 1}','1:1,20 2:2 3:3 4:4','1&2'::tsquery,1);
 rank_cd
----------
 0.589117
(1 row)
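The two results above differ by a factor that matches the natural logarithm of 6. One plausible reading, offered as an observation about the printed numbers rather than a description of the implementation, is that method 1 divides the rank by ln(length + 1), where the sample tsvector has five word positions. The short Python check below reproduces the second value from the first under that assumption.

```python
import math

rank_unnormalized = 1.05556  # printed result with normalization = 0
positions = 5                # assumed length: '1:1,20 2:2 3:3 4:4' lists five positions

# Assumption: method 1 divides the rank by ln(length + 1).
rank_normalized = rank_unnormalized / math.log(positions + 1)

print(round(rank_normalized, 4))
```

The result agrees with the printed 0.589117 to about five decimal places. The remaining difference comes from starting with the rounded value 1.05556.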
Query rewriting is a set of functions and operators for the tsquery type.
FUNCTIONS:
Query rewriting is flexible.
OPERATORS:
Operators can be used to speed up query rewriting, for example by filtering out non-candidate tuples.
INDEX SUPPORT:
It is possible to create an index to speed up the @ and ~ operators.
create index qq on test_tsquery using gist (keyword gist_tp_tsquery_ops);
EXAMPLES: See sql/tsearch2.sql for more examples.
contrib_regression=# \d test_tsquery
     Table "public.test_tsquery"
   Column   |  Type   | Modifiers
------------+---------+-----------
 txtkeyword | text    |
 txtsample  | text    |
 keyword    | tsquery |
 sample     | tsquery |
Indexes:
    "bt_tsq" UNIQUE, btree (keyword)
    "qq" gist (keyword)
Replace 'new york' with its synonyms:
contrib_regression=# select rewrite('foo & bar & qq & new & york', 'new & york'::tsquery, 'big & apple | nyc | new & york & city');
                                     rewrite
----------------------------------------------------------------------------------
 'qq' & 'foo' & 'bar' & ( 'city' & 'york' & 'new' | ( 'nyc' | 'apple' & 'big' ) )
(1 row)
Example with synonyms from table:
contrib_regression=# select rewrite('moscow & hotel', 'select keyword, sample from test_tsquery'::text );
              rewrite
-----------------------------------
 ( 'moskva' | 'moscow' ) & 'hotel'
Filtering:
contrib_regression=# select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('default', 'moscow') as query where keyword ~ query;
       rewrite
---------------------
 'moskva' | 'moscow'
(1 row)