Well, I finally decided to abandon galaxy table from our catalogue and to use hyperleda as a reference data for galaxies. This will greatly simplify maintenance of our catalog and make it more transparent. But, I see a problem with support of several reference tables for galaxies. For example, hyperleda is ok for close galaxies, but modern sn surveys look much further and we certainly need other catalogs with faint galaxies. I need to figure out how to modify SQL scheme. Ideally, we want be able to:
CREATE TABLE history ( id bigint PRIMARY KEY, who text, when timestamp without time zone, what text, old ???? );
For now, I use triggers to store modification datetime in mod_date column.
The problem:
It's necessary to maintain catalog with data collected from different (primary) catalogs. We want to find out the best database schema which supports synchronization of secondary catalog with primary one.
It's possible to store every property of a galaxy as a separate row, so we could maintain separate table with references to the source catalog and record id in source catalog.
CREATE TABLE cats ( id integer PRIMARY KEY, name text ); -- a list of properties of galaxy CREATE TABLE properties ( id integer PRIMARY KEY, name text ); CREATE TABLE galaxy ( id integer PRIMARY KEY, name text ); CREATE TABLE galaxy_property ( gid integer REFERENCES galaxy(id), pid integer REFERENCES properties(id), value float, UNIQUE(gid,pid) ); CREATE TABLE map ( gid integer REFERENCES galaxy(id), -- galaxy id pid integer REFERENCES properties(id), -- property id cid integer REFERENCES cats(id), -- catalog id cidid integer, -- id in catalog cid UNIQUE(gid,pid,cid) );
t=# select * from galaxy; id | name ----+------ 1 | N224 t=# select * from galaxy_property where gid=1; gid | pid | value -----+-----+------- 1 | 1 | 0.74 1 | 2 | 41.3 1 | 3 | 0 t=# select * from properties; id | name ----+------ 1 | ra 2 | dec 3 | z t=# select * from cats; id | name ----+------- 1 | LEDA 2 | NED 3 | SDSS 4 | OTHER (4 rows)
Benefits:
Drawbacks:
Instead of maintaining separate row for each attribute, we use custom composite type, which encapsulates knowledge about value and source catalog where this value comes from. See table galaxy_properties instead of galaxy_property'.
CREATE TYPE val_cid as ( value float, cid integer); CREATE TABLE galaxy_properties ( gid integer REFERENCES galaxy(id), ra val_cid, dec val_cid, z val_cid );
Notice, composite type designated as BLOB on picture!
t=# select * from galaxy_properties; gid | ra | dec | z -----+----------+----------+------- 1 | (0.74,1) | (41.3,1) | (0,3) (1 row)
For composite type we should use special syntax to access specific component.
t=# select * from galaxy_properties where (z).cid = 3; gid | ra | dec | z -----+----------+----------+------- 1 | (0.74,1) | (41.3,1) | (0,3) (1 row)
Benefits:
Drawbacks:
Example query - identify source (source catalog, record id in source catalog) of redshift of galaxy with gid = 1.
t=# select g.id, g.name as galaxy, c.name as catalog,m.cidid from galaxy_properties gp, cats c, map m,galaxy g,properties p where (gp.z).cid = c.id and gp.gid = 1 and m.gid = gp.gid and m.cid = c.id and g.id = gp.gid and m.pid=p.id and p.name='z'; id | galaxy | catalog | cidid ----+--------+---------+------- 1 | N224 | SDSS | 178 (1 row)
What if we combine tables galaxy and galaxy_properties ? Then, if assume, that (gid,pid) uniquely connected with (cid,cidid), we could a bit easier identified the source of a specific column.
CREATE TABLE galaxies ( gid integer PRIMARY KEY, name text, ra float, dec float, z float );
t=# select g.gid, g.name as galaxy, c.name as catalog,m.cidid from galaxies g, cats c, map m, properties p where g.gid = 1 and m.gid = g.gid and m.cid = c.id and m.pid=p.id and p.name='z'; gid | galaxy | catalog | cidid -----+--------+---------+------- 1 | N224 | SDSS | 178 (1 row)
Example of update query:
Update coordinates (ra,dec) in table galaxies using information from Leda catalogue (cid=1).
t=# update galaxies set ra=foo.ra,dec=foo.dec from (select l.al2000 as ra, l.de2000 as dec, g.id from galaxies g, map m, leda.m000 l where g.id = m.gid and m.pid=2 and m.cid=1 and m.cidid = l.pgc ) as foo where foo.id = galaxies.id;