This module provides the unaccent text search dictionary and the unaccent() function, both of which remove accents from input text.
The unaccent dictionary is a filtering dictionary: its output is always passed to the next dictionary (if any), whereas a standard dictionary that recognizes a token ends the chain. Currently, it supports the most important accents from European languages. To change the set of handled accents, edit the accents.src file, which should be UTF-8 encoded.
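The filtering behaviour can be modelled outside the database. The following is a minimal Python sketch, not the module's implementation: the names toy_unaccent, toy_stem and run_chain are hypothetical, the accent table is a tiny hand-written stand-in for accents.src, and the "stemmer" only lowercases and drops a trailing 's'. It shows the two cases a filtering dictionary has to handle: returning nothing when the token is unchanged, and handing a rewritten token to the next dictionary.

```python
from typing import Callable, List, Optional

# Hypothetical, tiny accent table (assumption); the module's real rules
# come from accents.src.
ACCENT_MAP = str.maketrans({
    "\u00f4": "o",  # ô
    "\u00e9": "e",  # é
    "\u00e8": "e",  # è
    "\u00e0": "a",  # à
    "\u00e7": "c",  # ç
})


def toy_unaccent(token: str) -> Optional[List[str]]:
    """Filtering step: None if nothing changed, else the rewritten token."""
    stripped = token.translate(ACCENT_MAP)
    return None if stripped == token else [stripped]


def toy_stem(token: str) -> List[str]:
    """Stand-in for a real stemmer: lowercase and drop a trailing 's'."""
    word = token.lower()
    return [word[:-1] if word.endswith("s") else word]


def run_chain(
    token: str,
    filtering: Callable[[str], Optional[List[str]]],
    terminal: Callable[[str], List[str]],
) -> List[str]:
    """The filtering step never ends the chain: the next dictionary sees
    its output, or the original token if it returned None."""
    filtered = filtering(token)
    next_input = token if filtered is None else filtered[0]
    return terminal(next_input)


print(toy_unaccent("Hotels"))                       # None
print(toy_unaccent("Hôtel"))                        # ['Hotel']
print(run_chain("Hôtels", toy_unaccent, toy_stem))  # ['hotel']
```

In both cases the second step still runs, which is the point of a filtering dictionary: it rewrites tokens for later dictionaries and never produces the final lexeme itself.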
Compatibility: PostgreSQL version 8.4+
Installation:
    cd unaccent && make && make install
    psql DB_NAME < unaccent.sql
Examples:
1. The unaccent dictionary does nothing and returns NULL (the lexeme 'Hotels' will be passed to the next dictionary, if any).
=# select ts_lexize('unaccent','Hotels') is NULL;
 ?column?
----------
 t
(1 row)
2. The unaccent dictionary removes the accent and returns 'Hotel' (the lexeme 'Hotel' will be passed to the next dictionary, if any).
=# select ts_lexize('unaccent','Hôtel') is NULL;
 ?column?
----------
 f
(1 row)
3. A simple configuration for the French language
CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
ALTER TEXT SEARCH CONFIGURATION fr
    ALTER MAPPING FOR hword, hword_part, word
    WITH unaccent, french_stem;

=# select to_tsvector('fr','Hôtels de la Mer');
    to_tsvector
-------------------
 'hotel':1 'mer':4
(1 row)

 'Hôtels' -> 'Hotels' -> 'hotel'
      unaccent    french_stem

=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
 ?column?
----------
 t
(1 row)

=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
      ts_headline
------------------------
 <b>Hôtel</b> de la Mer
(1 row)
text unaccent(text) - removes accents from the input text
=# select unaccent('Hôtels');
 unaccent
----------
 Hotels
(1 row)