Zen: GinTestUpdate

Testing GIN update performance

For integer arrays GIN is 10x slower than GiST on update. It's about 10x faster than GiST on select. Updating of 100 rows in 100,000 takes < 0.5s for GIN index, which is quite reasonable speed.

Test environment

Linux 2.6.19.1, Slackware, x86, Intel(R) Pentium(R) 4 CPU 2.40GHz, 2Gb RAM.

PostgreSQL 8.2.4

shared_buffers = 256MB
work_mem = 8MB
maintenance_work_mem = 64MB
effective_cache_size = 256MB

Test table tt(id integer, a int[]) with GIN index created

test=# \d tt 
       Table "public.tt"
 Column |   Type    | Modifiers 
--------+-----------+-----------
 id     | integer   | 
 a      | integer[] | 
Indexes:
    "gin_idx" gin (a)

For creating GiST index

create index gist_idx on tt using gist(a gist__intbig_ops);

Load data

Example of populating a table - 10000 rows, with integer arrays cardinality 500 and maximal array length of 100.

sh ./test-id.sh  -l 100 -n 10000 -v 500 | psql test -c "copy tt from stdin with delimiter as '|'"

Example of update SQL - update first 300 rows

update tt set a=array( select(1000*random())::integer+generate_series( 0,200+id) ) 
where id < 300;

Results

Update array column by random array of length 200, vary #rows ( All timings in ms !)

Low cardinality (500)

-l 100 -n 10000 -v 500

Create index:

GIN 2804.709
GiST 1458.526
#updates, GIN, GiST
10 - 28.916, 21.273
100 - 453.513, 28.504
1000 - 2824.042, 159.474
1000 - 27909.337, 2092.837

-l 100 -n 100000 -v 500

Create index:

GIN 7669.875
GiST 1458.526
10 - 94.637, 54.870
100 - 473.569, 69.470
1000 - 2407.273, 243.742
10000 - 30856.287, 2155.403

Big cardinality (5000)

-l 100 -n 10000 -v 5000

Create index:

GIN 2303.915
GiST 1167.952
#updates, GIN, GiST
10 - 40.031, 17.197
100 - 194.322, 26.901
1000 - 2265.735, 172.825
1000 - 32416.623, 2054.121

-l 100 -n 100000 -v 5000

Create index (real):

GIN 9746.955
GiST 13600.256
10 - 158.897, 47.338
100 - 250.516, 65.378
1000 - 2871.439, 425.907
10000 - 31537.869, 2538.468

Very Big cardinality (100000)

Simulate typical news archive - 100,000 documents with length < 500 words and dictionary size 100,000 words.

-l 500 -n 100000 -v 100000

Create index:

GIN 146765.140
GiST 12721.421
#updates, GIN, GiST
10 - 160.569, 72.390
100 - 543.285, 83.026
1000 - 3702.067, 284.039
1000 - 32571.510, 2969.296

Search performance:

explain analyze select * from tt where a @> '{2}';

GiST:

 Bitmap Heap Scan on tt  (cost=29.27..405.87 rows=100 width=979) (actual time=68.714..6311.757 rows=678 loops=1)
   Filter: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on gist_idx  (cost=0.00..29.24 rows=100 width=0) (actual time=43.430..43.430 rows=10774 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 6312.807 ms

GIN:

 Bitmap Heap Scan on tt  (cost=150.50..527.11 rows=100 width=982) (actual time=0.610..11.277 rows=678 loops=1)
   Recheck Cond: (a @> '{2}'::integer[])
   ->  Bitmap Index Scan on gin_idx  (cost=0.00..150.48 rows=100 width=0) (actual time=0.433..0.433 rows=678 loops=1)
         Index Cond: (a @> '{2}'::integer[])
 Total runtime: 11.892 ms

GiST ia a way slower than GIN index, because GiST found 10774 documents where only 678 documents are right answer and remaining are false drops !

Support script

Скрипт test-id.sh for generating test load. It's not optimized and could be horribly slow !

#!/bin/sh

# default values
# array length
MAXLEN=4
# number of rows
MAXNUM=10
# cardinality
MAXVAL=100

while getopts l:n:v: opt
do
        case "$opt" in
                l) MAXLEN="$OPTARG";;
                n) MAXNUM="$OPTARG";;
                v) MAXVAL="$OPTARG";;
        esac
done

for ((i=0;i<$MAXNUM;i++)) do
 len=$(( $RANDOM % $MAXLEN))+1
 for ((j=0;j<$len;j++)) do
         if [ $j == '0' ]; then
                val=$(( $RANDOM % $MAXVAL))
         else
                val=$val,$(( $RANDOM % $MAXVAL))
         fi
 done
 echo $i\|{$val}
done

GinTestUpdate

Projects

Links

Testing GIN update performance

Test environment

Load data

Results

Low cardinality (500)

Big cardinality (5000)

Very Big cardinality (100000)

Support script