For integer arrays GIN is 10x slower than GiST on update. It's about 10x faster than GiST on select. Updating of 100 rows in 100,000 takes < 0.5s for GIN index, which is quite reasonable speed.
Linux 2.6.19.1, Slackware, x86, Intel(R) Pentium(R) 4 CPU 2.40GHz, 2Gb RAM.
PostgreSQL 8.2.4
Test table tt(id integer, a int[]) with GIN index created
test=# \d tt Table "public.tt" Column | Type | Modifiers --------+-----------+----------- id | integer | a | integer[] | Indexes: "gin_idx" gin (a)
For creating GiST index
create index gist_idx on tt using gist(a gist__intbig_ops);
Example of populating a table - 10000 rows, with integer arrays cardinality 500 and maximal array length of 100.
sh ./test-id.sh -l 100 -n 10000 -v 500 | psql test -c "copy tt from stdin with delimiter as '|'"
Example of update SQL - update first 300 rows
update tt set a=array( select(1000*random())::integer+generate_series( 0,200+id) ) where id < 300;
Update array column by random array of length 200, vary #rows ( All timings in ms !)
-l 100 -n 10000 -v 500
Create index:
-l 100 -n 100000 -v 500
Create index:
-l 100 -n 10000 -v 5000
Create index:
-l 100 -n 100000 -v 5000
Create index (real):
Simulate typical news archive - 100,000 documents with length < 500 words and dictionary size 100,000 words.
-l 500 -n 100000 -v 100000
Create index:
Search performance:
explain analyze select * from tt where a @> '{2}';
GiST:
Bitmap Heap Scan on tt (cost=29.27..405.87 rows=100 width=979) (actual time=68.714..6311.757 rows=678 loops=1) Filter: (a @> '{2}'::integer[]) -> Bitmap Index Scan on gist_idx (cost=0.00..29.24 rows=100 width=0) (actual time=43.430..43.430 rows=10774 loops=1) Index Cond: (a @> '{2}'::integer[]) Total runtime: 6312.807 ms
GIN:
Bitmap Heap Scan on tt (cost=150.50..527.11 rows=100 width=982) (actual time=0.610..11.277 rows=678 loops=1) Recheck Cond: (a @> '{2}'::integer[]) -> Bitmap Index Scan on gin_idx (cost=0.00..150.48 rows=100 width=0) (actual time=0.433..0.433 rows=678 loops=1) Index Cond: (a @> '{2}'::integer[]) Total runtime: 11.892 ms
GiST ia a way slower than GIN index, because GiST found 10774 documents where only 678 documents are right answer and remaining are false drops !
Скрипт test-id.sh for generating test load. It's not optimized and could be horribly slow !
#!/bin/sh # default values # array length MAXLEN=4 # number of rows MAXNUM=10 # cardinality MAXVAL=100 while getopts l:n:v: opt do case "$opt" in l) MAXLEN="$OPTARG";; n) MAXNUM="$OPTARG";; v) MAXVAL="$OPTARG";; esac done for ((i=0;i<$MAXNUM;i++)) do len=$(( $RANDOM % $MAXLEN))+1 for ((j=0;j<$len;j++)) do if [ $j == '0' ]; then val=$(( $RANDOM % $MAXVAL)) else val=$val,$(( $RANDOM % $MAXVAL)) fi done echo $i\|{$val} done