Earth Sciences data capture background

The Seismology group within the ANU Research School of Earth Sciences (RSES) runs an extensive monitoring and data collection program based on instruments deployed at various locations around Australia. The instruments log data in instrument-specific formats.

Data is typically recorded on SD cards and retrieved manually. The memory cards are returned to Canberra for further processing. Data is downloaded from these cards and converted to the MiniSEED format in common use within seismology. Data is then submitted to the Incorporated Research Institutions for Seismology (IRIS) repository.

While full SEED files, in common with many data formats, contain both data and an embedded metadata payload, MiniSEED files are data only. Metadata is normally stored in dataless SEED format files; that is, SEED format files that contain only the metadata payload.
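To illustrate what "data only" means in practice, the following is a minimal sketch that decodes the 48-byte fixed header at the start of a MiniSEED data record. The field layout follows the SEED 2.4 fixed section of data header as commonly documented; big-endian byte order is an assumption (real files may be little-endian), and the function name and the station and network codes used below are hypothetical. The header identifies the channel and timing of the samples, but carries no coordinates or instrument response, which is why a separate dataless SEED file is needed.

```python
import struct
from datetime import datetime, timedelta

# Assumed layout of the 48-byte fixed header of a MiniSEED 2.x data record.
# Big-endian byte order is assumed here; real files may be little-endian.
FIXED_HEADER = struct.Struct(">6scc5s2s3s2sHHBBBBHHhhBBBBiHH")


def parse_fixed_header(record: bytes) -> dict:
    """Decode the identifying fields of one data record (illustrative helper)."""
    fields = FIXED_HEADER.unpack(record[:FIXED_HEADER.size])
    (_seq, _quality, _reserved, station, location, channel, network,
     year, day_of_year, hour, minute, second, _unused, frac,
     n_samples, rate_factor, rate_multiplier) = fields[:17]
    # The start time is stored as year, day of year, time of day,
    # and a fraction in units of 0.0001 s.
    start = datetime(year, 1, 1) + timedelta(
        days=day_of_year - 1, hours=hour, minutes=minute,
        seconds=second, microseconds=frac * 100)
    return {
        "network": network.decode("ascii").strip(),
        "station": station.decode("ascii").strip(),
        "location": location.decode("ascii").strip(),
        "channel": channel.decode("ascii").strip(),
        "start": start,
        "n_samples": n_samples,
        "rate_factor": rate_factor,
        "rate_multiplier": rate_multiplier,
    }
```

Note that the returned dictionary holds only identifiers, timing and sample counts; station location, sensor type and response information would have to come from the dataless SEED metadata.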

Data is currently stored locally on a Unix server within RSES. The ANU is currently developing a comprehensive data management strategy and the location of the data may be subject to change.

The amount of data held is currently in the range of 5-10 TB, growing by approximately 1 TB per annum. Some of the data is under embargo and may only be released after the embargo expiry date. Publicly available non-embargoed metadata will be made available both via ANDS and the IRIS collection. It is currently envisaged that approximately 50% of the data will not be subject to embargo.

Collection metadata in RIF-CS format will be made available via Research Data Australia (RDA) to provide a starting point for the development of an online collection of Australian seismological data.

The Seismology group within the Research School of Earth Sciences (RSES) runs an extensive monitoring and data collection program based on instruments deployed at various remote locations around Australia.

Approximately 60 instruments are involved. The majority of these are Güralp CMG models (principally C3E and T3E) although a small number of Streckeisen instruments are also used.

 

This program operates as a national facility for portable seismic instrumentation, with up to 100 instruments of four different types in the field at a time. Three components of ground motion are sampled at a rate of at least 25 samples per second, generating several tens of gigabytes per instrument per month in various instrument-specific formats.

The instruments are typically left unattended for long periods of time in the desert. Data is typically recorded on SD cards and retrieved manually. The memory cards are returned to Canberra for further processing. Much of the metadata (including GPS) comes from the instruments, with some added manually later during the ingest workflow.

The data is converted to the MiniSEED format in common use within seismology and submitted to the IRIS repository. While full SEED files, in common with many data formats, contain both data and an embedded metadata payload, MiniSEED files are data only. Metadata is normally stored in dataless SEED format files, i.e. SEED format files that contain only the metadata payload. The project will enhance this workflow to utilise the full SEED specification, and will also investigate making the data available using an XML version of the SEED format.

Some of the data is under embargo and may only be released after the embargo expiry date. Publicly available non-embargoed metadata will be made available both via RDA and the IRIS collection. The project will automate this process, allowing for the application of appropriate embargo periods.
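The embargo rule described above can be sketched as a simple date check. This is a minimal illustration, not the project's implementation: the `DatasetRecord` class, its field names and the helper functions are all hypothetical, and it assumes that each dataset carries at most one embargo expiry date and that release is permitted only strictly after that date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical description of one dataset held in the local store."""
    identifier: str
    embargo_until: Optional[date]  # None means the dataset was never embargoed


def is_releasable(record: DatasetRecord, today: date) -> bool:
    # Assumption: data may be released only after the expiry date,
    # so the expiry date itself is still treated as embargoed.
    return record.embargo_until is None or today > record.embargo_until


def select_releasable(records: Iterable[DatasetRecord], today: date) -> List[DatasetRecord]:
    """Return the records whose metadata may be published on the given day."""
    return [r for r in records if is_releasable(r, today)]
```

An automated publishing step could run such a filter on each pass, so that a dataset becomes visible without manual intervention once its embargo has lapsed.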

The project will make Collection, Activity and Party metadata in RIF-CS format available to RDA (and also Service metadata if appropriate), to provide a starting point for the development of an online collection of Australian seismological data.