Last updated: 17 Dec, 2003
Standards for Ocean Profile Data Management and Exchange
Introduction
This project was formulated as a result of a DFO Science study that looked at how different kinds of data were managed in the sector. One clear finding was that although many regions handle the same data, they do so in different ways. These differences can be complementary or redundant. It was suggested that a more detailed study of data management and exchange be undertaken, documenting current practices and converging on standard practices where appropriate. Such a task could be very large, given the diverse types of data handled by Science. To focus the problem, we chose to limit the details to data collected as ocean profiles.
The proposal for the work is provided in Annex 1. It describes five areas to address, and these form the five major sections of this report.
Participants are mindful that Canada is not alone in managing ocean data, and we wanted to ensure that this project considered international standards where they exist. In practice, this was done only in the section on QC standards. It was not considered useful to examine the formats, variables or exchange practices of international bodies. We chose to keep our work focused on matters internal to Canada, with due regard to how data exchange with international partners would be affected.
Formats
This was the first area tackled, and was done by surveying the formats of profile data that are routinely sent to MEDS. We first present a brief summary of the structure of each format, then compare the formats with one another. A sample of each format is provided in the annexes. Following this, we compare each format against the content that was built for the xml bricks study. A brief introduction to xml bricks is given, followed by tables that compare the content of the profile formats and the bricks.
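At its core, comparing a format against the bricks content is a set comparison: which fields the two share, which brick content the format cannot carry, and which format content has no brick equivalent. The sketch below illustrates this; the field names and the two formats (`format_a`, `format_b`) are hypothetical placeholders, not the actual formats surveyed or the actual bricks content.

```python
# Hypothetical content lists; the real formats and xml bricks define their own fields.
BRICK_CONTENT = {"platform", "latitude", "longitude", "time", "depth",
                 "temperature", "salinity", "qc_flag"}

FORMAT_CONTENT = {
    "format_a": {"platform", "latitude", "longitude", "time", "depth", "temperature"},
    "format_b": {"latitude", "longitude", "time", "pressure", "temperature",
                 "salinity", "qc_flag"},
}


def compare_to_bricks(fields, bricks):
    """Split a format's fields into shared, missing and extra relative to the bricks."""
    return {
        "shared": fields & bricks,
        "missing": bricks - fields,  # brick content the format cannot carry
        "extra": fields - bricks,    # format content with no brick equivalent
    }


for name, fields in FORMAT_CONTENT.items():
    result = compare_to_bricks(fields, BRICK_CONTENT)
    print(name, "missing:", sorted(result["missing"]), "extra:", sorted(result["extra"]))
```

Each row of a comparison table then corresponds to one format, with the "missing" and "extra" sets showing where content would be lost or left unmapped.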
Comparing contents of profiles and xml bricks
QC Standards
This was the next major area in the standards project, and was done by examining QC documentation from different countries and organizations. A web page was created that organized all the various links to documentation into a table. This page was then used to compile documentation explaining what each place does for quality control. This documentation includes tables which compare the QC procedures of various countries to those for GTSPP.
Evaluation and comparison of QC procedures
Duplicates Handling
The issue of duplicates handling was approached by building on the preliminary analysis conducted by Senciall, Kennedy, and Benjamin. How to identify duplicates was the first area covered, followed by a description of how MEDS deals with duplicates. A section describing an alternative way of handling duplicates was also written.
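A common first step in identifying duplicates is to flag pairs of stations that are close in both position and time, so they can be inspected rather than deleted automatically. The sketch below assumes this approach; the `Station` fields and the 5 km / 30 minute tolerances are hypothetical choices, not the criteria used by MEDS or in the analysis cited above.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Station:
    station_id: str  # hypothetical identifier
    source: str      # contributing institute
    lat: float       # decimal degrees
    lon: float       # decimal degrees
    time: datetime


def distance_km(a, b):
    """Great-circle (haversine) distance between two stations in km."""
    r = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def candidate_duplicates(stations, max_km=5.0, max_dt=timedelta(minutes=30)):
    """Return id pairs close enough in space and time to be checked as duplicates."""
    pairs = []
    for a, b in combinations(stations, 2):
        if abs(a.time - b.time) <= max_dt and distance_km(a, b) <= max_km:
            pairs.append((a.station_id, b.station_id))
    return pairs


stations = [
    Station("A1", "BIO", 44.50, -63.50, datetime(2003, 6, 1, 12, 0)),
    Station("M7", "MEDS", 44.51, -63.49, datetime(2003, 6, 1, 12, 10)),
    Station("A2", "BIO", 44.50, -63.50, datetime(2003, 6, 2, 12, 0)),
]
print(candidate_duplicates(stations))  # [('A1', 'M7')]
```

Comparing every pair is quadratic; for a large archive, sorting by time first and only comparing stations within the time window would scale better.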
Common Data Dictionary
Data dictionary (xls file)
Ensuring timely and complete archives
Current exchange practices between MEDS and other organizations were documented to allow for comparison. For each organization, a list of questions concerning exchange practices was answered. The answers allow weaknesses in maintaining timely and complete archives at MEDS to be identified.
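One way to make "timely and complete" concrete is to compare a provider's inventory of stations with what the archive actually holds, reporting stations that never arrived and stations that arrived late. The sketch below assumes each side can supply a station id with a date; the 30-day lag threshold and the example ids are hypothetical.

```python
from datetime import date


def audit_holdings(provider, archive, max_lag_days=30):
    """Compare a provider's inventory with the archive's holdings.

    provider: station id -> date the profile was collected
    archive:  station id -> date the archive received it
    Returns (ids missing from the archive, ids received after max_lag_days).
    """
    missing = sorted(set(provider) - set(archive))
    late = sorted(
        sid for sid in provider.keys() & archive.keys()
        if (archive[sid] - provider[sid]).days > max_lag_days
    )
    return missing, late


provider = {"S1": date(2003, 5, 1), "S2": date(2003, 5, 2), "S3": date(2003, 5, 3)}
archive = {"S1": date(2003, 5, 10), "S2": date(2003, 8, 1)}
print(audit_holdings(provider, archive))  # (['S3'], ['S2'])
```

Run periodically for each organization, such a report would show whether gaps come from missing submissions or from delays.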
Documentation of exchange practices
Annex 1: Project Proposal
Project Proposal
Bob Keeley - ISDM
Joe Linguanti - IOS
Bernard Pelchat - IML
Robert Nowlen - Gulf
John O'Neill - BIO
Dave Senciall - NWAFC
Electronic data exchange is an everyday procedure. At present, each region of DFO has its own techniques for processing, storing and exchanging data with the others. This diversity causes problems because the data and information content that is collected, stored and exchanged differs from region to region. The objective of this project is to standardize all aspects of data and information management and exchange procedures, starting with ocean profile data. With experience, we intend to broaden the coverage of these standards to other data held by DFO Science.
MEDS acts as the designated archive for a number of data types, including ocean profile data. A significant portion of the associated workload is consumed by dealing with the different formats of data received from contributors. All Institutes receiving data face similar problems. In developing standards for ocean profile data, we will reduce format-handling overheads but, more importantly, standardize the data and information content. All this will improve the effectiveness of managing data within DFO Science.
A standardization project requires a methodical approach to ensure that agreement is reached as work progresses. The major aspects to be addressed are described below, with greater detail provided in the Technical Description section.
Developing standards for all aspects of handling data within DFO Science is a large undertaking. We will approach this problem by starting in an area that is familiar and where the regions already cooperate well. Ocean profile data is a suitable choice because it meets these criteria. Many of the regions and MEDS have a long history of managing and exchanging these types of data, and the problems associated with standardization are fairly well understood. There is also complementary work on setting standards for biological and chemical discrete samples that is very relevant to this project. We intend to build on and supplement the activities of that project.
The BioChem project is a multi-region project that, at present, handles plankton and water sample data within a single relational database structure. To do this, all regions must reach agreement on database design and content. The design is now mostly settled, and participants are beginning to address questions of content. At a recent BioChem meeting, three sub projects were initiated to start this process. Each of them is relevant to the goal of developing standards.
The first sub project was to poll participants and develop a draft on common quality control procedures for water sample data. That draft is complete and will form the basis of the standard. It does not yet address how climatologies should be produced or used, and other issues also remain unresolved. The draft is a good start, but the outstanding questions still need to be resolved.
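Many standard QC procedures are simple tests that attach a flag to each value without altering it, so that every region applying the same test and limits produces the same flags. The sketch below shows a range test of this kind; the flag values assume a GTSPP-style convention (1 good, 4 bad), and the limits in `RANGES` are hypothetical, not values from the draft.

```python
GOOD, BAD = 1, 4  # assumed GTSPP-style flag values

# Hypothetical limits; an agreed standard would define its own.
RANGES = {"temperature": (-2.5, 40.0), "salinity": (0.0, 41.0)}


def range_test(variable, values):
    """Flag each value GOOD if it lies within the variable's allowed range, else BAD."""
    low, high = RANGES[variable]
    return [GOOD if low <= v <= high else BAD for v in values]


print(range_test("temperature", [4.2, 12.8, 55.0, -3.0]))  # [1, 1, 4, 4]
```

Keeping the limits in a shared table rather than in code is one way to let all participants adopt the same agreed values.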
The second sub project was to address management of duplicates. There is a draft document that describes various facets of this issue and has been circulated for discussion. Resolution of the questions raised is needed.
The third sub project was to poll participants to develop a common data dictionary. A first draft of this has been circulated. Some Institutes use codes based on GF-3 while others do not. We must agree on whether a single data dictionary is possible, or whether multiple dictionaries with one-to-one translators are the appropriate strategy.
This third sub project is closely related to another project, carried out last fiscal year, concerning the use of xml for exchanging ocean profile data. BIO, MEDS and IOS were involved in this project, which was funded by the SSF. A report is in final preparation and will be available soon. The project developed a strategy for dealing with the varied ways in which the three partners referred to both data and metadata in their archive and processing systems. Developing a data dictionary was one aspect of the project.
A larger part of the xml project was to develop an xml file structure that reflects the inherent structures of ocean profile data and metadata. The structure was developed far enough that working software was written both to write the common xml structure from the partners' native archive formats and to read an xml file and translate it back to the native formats. We intend to build on this xml project in developing standards by including the exchange of data in xml as a fourth sub project of the Standards Proposal.
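If multiple dictionaries with translators are chosen, each translator is only safe if it is one-to-one: two local codes mapping to the same common code could not be translated back unambiguously. The sketch below builds forward and reverse maps and rejects mappings that are not one-to-one. The local codes `T90` and `SAL` are hypothetical; `TEMP` and `PSAL` are used as examples of GF-3 style parameter codes.

```python
def build_translator(local_to_common):
    """Build forward and reverse code maps, refusing mappings that are not one-to-one."""
    reverse = {}
    for local, common in local_to_common.items():
        if common in reverse:
            raise ValueError(f"{local!r} and {reverse[common]!r} both map to {common!r}")
        reverse[common] = local
    return dict(local_to_common), reverse


# Hypothetical regional codes mapped to GF-3 style parameter codes.
forward, reverse = build_translator({"T90": "TEMP", "SAL": "PSAL"})
print(forward["SAL"], reverse["TEMP"])  # PSAL T90
```

A single common dictionary avoids this check altogether, which is one argument for that option where the regions can agree on it.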
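The write-then-read cycle described above can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch: the element and attribute names (`profile`, `position`, `observations`, `obs`) are hypothetical and do not reproduce the structure developed in the xml project.

```python
import xml.etree.ElementTree as ET


def profile_to_xml(station_id, lat, lon, depths, temps):
    """Write one native profile record into a common (hypothetical) xml structure."""
    root = ET.Element("profile", id=station_id)
    ET.SubElement(root, "position", lat=str(lat), lon=str(lon))
    obs = ET.SubElement(root, "observations")
    for d, t in zip(depths, temps):
        ET.SubElement(obs, "obs", depth=str(d), temp=str(t))
    return ET.tostring(root, encoding="unicode")


def xml_to_profile(text):
    """Read the common xml back into a native-style dictionary."""
    root = ET.fromstring(text)
    pos = root.find("position")
    rows = root.find("observations").findall("obs")
    return {
        "id": root.get("id"),
        "lat": float(pos.get("lat")),
        "lon": float(pos.get("lon")),
        "depths": [float(o.get("depth")) for o in rows],
        "temps": [float(o.get("temp")) for o in rows],
    }


xml_text = profile_to_xml("BIO-001", 44.5, -63.5, [0.0, 10.0], [12.1, 8.4])
print(xml_to_profile(xml_text)["temps"])  # [12.1, 8.4]
```

With one writer and one reader per partner, each partner translates only between its own native format and the common structure, rather than between every pair of formats.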
Together, the above elements determine how effectively a designated data centre can keep its holdings as up to date as those of the data providers. The solutions to each of these sub projects must have practical application to ensuring timely and effective data exchange. We propose a fifth sub project that will look at improving the timeliness and completeness of designated data centre holdings. It will contribute ideas to the other four sub projects and may develop standards beyond those considered in them.
This project must do more than develop the standards that we think can be used. It must, in fact, start to implement these standards as a way to test our ideas and to be sure that we reap the rewards that standardization will provide. We must therefore ensure that the project has both a development and implementation stage.
The development stage is already well underway for identifying the issues relating to quality control procedures, duplicates management, a data dictionary and a data exchange structure. We must continue to pursue this development with all partners agreeing to a common standard for each of these issues.
We propose to use the work initiated by BioChem and to meet the standards definition requirements of that project. The difference between this project and the BioChem work is not the standards per se, but the manner in which they are implemented. The traditional way in which profile data has been handled has given the regions much greater autonomy, so implementing agreed-upon standards will present its own special challenges.
Details of sub-projects
Each participant will need to address this as part of the implementation phase.
The results of these discussions will be documents that specify the standards to which each data provider will adhere in their own processing systems and when exchanging data.
Each participant will develop an annex to the standards document described above, giving details of the steps to be taken to implement the standards. This work will have resource implications, so each annex will need to estimate them.
| Item | 2003/04 Contract | 2003/04 In Kind (days) |
|---|---|---|
| MEDS: student | 10K | |
| MEDS: host meeting of participants | 10K | |
| MEDS: contributions to design | | 20 |
| BIO: contributions to design | | 20 |
| BIO: build standard QC tests | 5K | |
| IML: implementation of some agreed standards | 5K | |
| IML: contributions to design | | 20 |
| NWAFC: student | 10K | |
| NWAFC: contributions to design and implementation | | 25 |
| Gulf: contributions to design | | 10 |
| Pacific: Documentation by contractor | 5K | |
| Pacific: contributions to design | | 20 |
| Pacific: Implementation of agreed standards | 5K | |
| Total | 50K | 115 |
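The totals for the resource table can be checked with a small sum over its line items. This sketch assumes, as the units in the table suggest, that values written with K are contract amounts in thousands and that plain numbers are in-kind days.

```python
# Line items from the resource table: (item, contract in thousands, in-kind days).
ITEMS = [
    ("MEDS: student", 10, 0),
    ("MEDS: host meeting of participants", 10, 0),
    ("MEDS: contributions to design", 0, 20),
    ("BIO: contributions to design", 0, 20),
    ("BIO: build standard QC tests", 5, 0),
    ("IML: implementation of some agreed standards", 5, 0),
    ("IML: contributions to design", 0, 20),
    ("NWAFC: student", 10, 0),
    ("NWAFC: contributions to design and implementation", 0, 25),
    ("Gulf: contributions to design", 0, 10),
    ("Pacific: Documentation by contractor", 5, 0),
    ("Pacific: contributions to design", 0, 20),
    ("Pacific: Implementation of agreed standards", 5, 0),
]

contract_k = sum(contract for _, contract, _ in ITEMS)
in_kind_days = sum(days for _, _, days in ITEMS)
print(f"Contract: {contract_k}K, In kind: {in_kind_days} days")  # Contract: 50K, In kind: 115 days
```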