The problem: Find what keywords are in text. We have papers and kw (keywords). kw.ts is a plainto_tsquery(kw.name).
Table "public.papers" Column | Type | Modifiers -------------------+----------+----------- id | integer | oai_id | text | datestamp | date | title | text | authors | text | description | text | comment | text | creation_date | date | modification_date | date | fts | tsvector | Indexes: "fts_idx" gin (fts) "fts_types_idx" gin (document_token_types(comment)) "id_idx" btree (id) Table "public.kw" Column | Type | Modifiers -----------+-----------------------+----------- key_id | integer | name | character varying(64) | status_id | integer | ts | tsquery | arxiv=# select papers.title, array_agg(kw.name) from papers join kw on papers.fts @@ kw.ts and papers.id=2 group by papers.title; title | array_agg ------------------------------------------+--------------------------------- Sparsity-certifying Graph Decompositions | {color,Williams,trees,colorful} (1 row)
In principle, it's possible to use this approach to find plagiarism.
Other way is to use ts_stat) function, which decomposes tsvector on words.
CREATE OR REPLACE FUNCTION ts_stat(tsvector, weights text, OUT word text, OUT ndoc integer, OUT nentry integer) RETURNS SETOF record AS $$ SELECT ts_stat('SELECT ' || quote_literal( $1::text ) || '::tsvector', quote_literal( $2::text) ); $$ LANGUAGE SQL RETURNS NULL ON NULL INPUT IMMUTABLE; select ARRAY ( select (ts_stat(fts,'*')).word from papers where id=2);