Yesterday I again was beaten by GIN problem (PostgreSQL 8.4) with sorted data, see my post about rbtree. I tried to index following table, which represents interlinks between wikipedia articles.
wplinks=# \d links2 Table "public.links2" Column | Type | Modifiers --------+-----------+----------- id | integer | idout | integer[] | idin | integer[] |
To my surprise timings for indexes creation were very different - ~ 6x !
wplinks=# vacuum analyze links2; VACUUM Time: 65080.606 ms wplinks=# create index idout_idx on links2 using gin(idout); CREATE INDEX Time: 6033572.537 ms wplinks=# create index idin_idx on links2 using gin(idin); vacuum analyze; CREATE INDEX Time: 35946958.226 ms
Today, I tried rbtree version of GIN and got very nice result - 18x better creation time for idin and almost 4x better than old idout.
wplinks=# create index idout_rbtree_idx on links2 using gin(idout); CREATE INDEX Time: 1560819.955 ms wplinks=# create index idin_rbtree_idx on links2 using gin(idin); CREATE INDEX Time: 1994567.418 ms
Notice, that 8.4 already have Tom's hack to protect GIN against skewed data. Certainly, we need rbtree in PostgreSQL !