Update: We wrote a paper GiST programming tutorial, in Russian.
Hellerstein et al. [HNP95] introduced an index structure, called a Generalized Search Tree (GiST), which is a generalized form of an R-tree [Gut84]. The GiST provides a possibility to create custom data types with indexed access methods and extensible set of queries for specific domain experts not a database one.
GiST was implemented in an early version of PostgreSQL by J. Hellerstein and P.Aoki, more details is available from "The GiST Indexing Project" (http://gist.cs.berkeley.edu/) at Berkeley. As an "university" project it has a limited number of features and was in rare use. Since 7.1 version of PostgreSQL the GiST was taken up by Oleg Bartunov and Teodor Sigaev, see "GiST development" (http://www.sai.msu.su/~postgres/gist) for information about current development.
Current implementation of GiST supports:
(p, ptr)
pairs, where p
is a predicate
(attribute),which is used as a
search key, and ptr
is the pointer to data for a leaf node or
a pointer to another tree node for a non-leaf node.
GiST has the following properties:
min
and max
index entries unless it is the root
(p,ptr)
in a leaf node, p
is true when
instantiated with the values from the indicated tuple
(i.e., p
holds for the tuple).
(p, ptr)
in a non-leaf node,
p
is true when instantiated with the values of any tuple
reachable from ptr
.
In principle, the keys of a GiST may be any arbitrary predicates that hold for each datum below the key. In practice, the keys come from a user-defined object class, which provides a set of six methods required by the GiST:
E = (p, ptr)
, and a query predicate q
, returns
false if p ^ q
can be guaranteed unsatisfiable, and true otherwise.
(p1,ptr1), ... (pn,ptrn)
, returns
some predicate r
that holds for all tuples stored below ptr1
through ptrn
.
E = (p, ptr)
, returns an entry (pp, ptr)
where pp
is a compressed representation of p
.
E = (pp, ptr)
, returns an
entry (r, ptr)
where r
is a decompressed representation
of pp
.
E1 = (p1, ptr1), E2 = (p2, ptr2)
, returns
a domain specific penalty for inserting E2
into the subtree
rooted at E1
, which is used to aid the splitting process of the
insertion operation.
P
of M+1
entries (p, ptr)
,
splits P
into two sets of entries P1
and P2
, each of size kM
, where k
is the minimum fill factor.
The three methods of GiST provide algorithms for search, insertion and deletion operations:
(p, ptr)
and, if required, maintains the
balance of the tree using method Union (as in INSERT).
Literature: