gin index utilizes maintenance_work_mem for bulk insert to gin index (CREATE INDEX command). ginInsertEntry() creates tree in memory and when tree exceeds maintenance_work_mem, it flushed to disk. Since 8.4, bulk insert is also used to speedup inserts of new entries (gin fastupdate feature). Btree used in GIN code behaves very bad with presorted data and even limiting of maxdepth didn't help , so we decided to play with red-black tree, which is self-balanced tree.
Test environment: My desktop machine with Linux 2.6.24.3-smp, i686 Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz GenuineIntel, 8 GB RAM,
Pg settings (only relevant): shared_buffers = 512MB work_mem = 32MB maintenance_work_mem = 256MB checkpoint_segments = 16 effective_cache_size = 1GB
1. Original test from http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php
Test data are arrays of length 6 with identical values, which sequentially increased from row to row.
--http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php drop table a; -- insert and create index select array[n,n,n,n,n,n] as ar into a from generate_series(1,1000000) s n; create index arr_gin onsee ginInsertEntry(), a using gin ( ar ); truncate a; drop index arr_gin ; -- insert after index create index arr_gin on a using gin ( ar ); insert into a select array[n, n, n, n, n, n] from generate_series(1,1000000) as n; test=# select * from a limit 2; ar --------------- {1,1,1,1,1,1} {2,2,2,2,2,2}
Logs of 3 runs on HEAD:
DROP TABLE DROP TABLE DROP TABLE Time: 1040.375 ms Time: 1043.168 ms Time: 1040.275 ms SELECT SELECT SELECT Time: 2058.678 ms Time: 1841.830 ms Time: 1809.520 ms CREATE INDEX CREATE INDEX CREATE INDEX Time: 6517.479 ms Time: 6492.176 ms Time: 7650.608 ms TRUNCATE TABLE TRUNCATE TABLE TRUNCATE TABLE Time: 75.211 ms Time: 68.449 ms Time: 66.927 ms DROP INDEX DROP INDEX DROP INDEX Time: 2.417 ms Time: 2.627 ms Time: 2.532 ms CREATE INDEX CREATE INDEX CREATE INDEX Time: 1.106 ms Time: 1.088 ms Time: 1.140 ms INSERT 0 1000000 INSERT 0 1000000 INSERT 0 1000000 Time: 9600.281 ms Time: 9309.620 ms Time: 8826.133 ms
The same for HEAD+rbtree-0.9
DROP TABLE DROP TABLE DROP TABLE Time: 1039.268 ms Time: 1039.257 ms Time: 34.951 ms SELECT SELECT SELECT Time: 1806.630 ms Time: 2195.750 ms Time: 2632.235 ms CREATE INDEX CREATE INDEX CREATE INDEX Time: 9346.265 ms Time: 6877.033 ms Time: 7012.791 ms TRUNCATE TABLE TRUNCATE TABLE TRUNCATE TABLE Time: 56.326 ms Time: 41.171 ms Time: 1225.004 ms DROP INDEX DROP INDEX DROP INDEX Time: 2.514 ms Time: 2.649 ms Time: 2.720 ms CREATE INDEX CREATE INDEX CREATE INDEX Time: 1.249 ms Time: 1.696 ms Time: 1.377 ms INSERT 0 1000000 INSERT 0 1000000 INSERT 0 1000000 Time: 14326.864 ms Time: 9293.894 ms Time: 8357.859 ms
The same for HEAD+rbtree-0.10
DROP TABLE DROP TABLE DROP TABLE Time: 1046.992 ms Time: 1039.482 ms Time: 1040.836 ms SELECT SELECT SELECT Time: 2005.670 ms Time: 2092.013 ms Time: 2068.169 ms CREATE INDEX CREATE INDEX CREATE INDEX Time: 6750.898 ms Time: 8019.199 ms Time: 7984.485 ms TRUNCATE TABLE TRUNCATE TABLE TRUNCATE TABLE Time: 45.027 ms Time: 53.622 ms Time: 50.384 ms DROP INDEX DROP INDEX DROP INDEX Time: 2.398 ms Time: 2.600 ms Time: 2.557 ms CREATE INDEX CREATE INDEX CREATE INDEX Time: 1.780 ms Time: 1.352 ms Time: 1.534 ms INSERT 0 1000000 INSERT 0 1000000 INSERT 0 1000000 Time: 8783.836 ms Time: 8971.075 ms Time: 8446.531 ms
rbtree is used only in bulk insert into index (ins,idx sequence). We can use timings with inserts (idx,ins) to verify test results.
Summary:
The best results for CREATE INDEX are: 6492.176 (Btree+Hack), 6877.033 (rbtree-0.9), 6750.898 (rbtree-0.10). We see that difference between 6492.176 and 6750.898 is about 4%, which is in the accuracy of experiments.
2. Random array test - 100 000 integer arrays, array length is random between 5 and 1000, cardinality is 500 000. Data: http://www.sai.msu.su/~megera/postgres/files/arr-5-1000-500000-100000.sql.gz
zcat arr-5-1000-500000-100000.sql.gz| psql test create index a_idx on tt using gin(a);
HEAD HEAD+rbtree-0.9 HEAD+rbtree-0.10 idx 97833.943 108109.422 93022.371 94076.822 95377.997 94435.548 92993.065 95167.072 91723.859
Summary:
There is no visible performance difference between HEAD (btree+hack) and HEAD+rbtree-0.10.
3. Real data from Wikipedia links Data: http://www.sai.msu.su/~megera/postgres/files/links2.sql.gz
zcat links2.sql.gz | psql test Indexes will built as follows - btree, gin(idin), gin(idout)
test=# \d links2 Table "public.links2" Column | Type | Modifiers --------+-----------+----------- id | integer | idout | integer[] | idin | integer[] | Indexes: "id_idx_link2" btree (id) "idin_rbtree_idx" gin (idin) "idout_rbtree_idx" gin (idout)
HEAD HEAD+rbtree-0.10 ins 180160.087 165348.671 btree 32624.773 31304.554 idin 9006050.013 767966.869 idout 3535996.914 720310.768
Summary: HEAD+rbtree-0.10 is substantially (more than 10 times) better that HEAD with btree hack !
Links: