Fisheries and Oceans Canada

Quality Control

Quality Control Procedures for Ocean Profile Data

Data quality control (QC) at Integrated Science Data Management (ISDM) is a procedure of verification and validation. To validate the data, they are first reformatted to the MEDS internal processing format. In doing so, the data are checked to confirm that they are readable and can be interpreted. Data which have format errors in the original source form, or which have invalid values (such as characters where numbers should be, or nonsensical contents of a parameter such as date/time or profile depth order), are reviewed by programmers, who correct them by adjusting the reformat procedures to handle these inconsistencies.

Once the data are reformatted, the contents are QC'ed to verify that the numbers and codes actually represent physical quantities and that these are reasonable given the location and time of the observation. ISDM has adopted an approach which combines specialized computer code, which organizes the data and tests values according to common rules, with displays of the data as plots of ship tracks and data profiles. These displays have selection and editing capabilities that allow trained personnel to review and flag the data, or to correct values where the correction is obvious. Typically, the QC procedure results in flags being set or corrections being made where the data show instrument failures or human errors.

The subtler inaccuracies (such as those caused by instrument noise or signal processing algorithms), and whether or not the observation is representative of the ambient conditions (considering errors due to small scale variability, or inherent randomness in convective water flow), are not readily apparent in the ISDM system. Automated tests have tolerances which allow for these inaccuracies. The QC technician nevertheless often spots these problems and flags them where values go beyond reasonable bounds.

The Software System

Ocean observations at ISDM are reformatted, QC'ed, archived and retrieved on a system running the VMS operating system on a network of Digital Equipment Corporation (DEC) VAX minicomputers and workstations. The programming languages are the DEC command language DCL and the DEC implementation of an enhanced FORTRAN. The GKS library is used to generate graphics for the system.

The QC system is a collection of applications and executables, combining automated tests with interactive review, that can handle both real-time and delayed mode data. The real-time system handles data received primarily through the GTS and processes them through to the BATHY and TESAC archive files. The delayed mode system, which differs in allowing an operator to select an arbitrary file of formatted data, is used for the higher resolution data received well after data collection.

There are three main components to the quality control of ocean profile data. The first component examines the characteristics of the platform's track to identify errors in either position or time. The second component examines the various profiles of observations to identify values which appear to be in error. The third component is software to identify duplicate profiles, which arise either because the same data were received more than once, or because data of lower resolution (such as a BATHY message) arrive first and are followed later by the XBT cast on which the BATHY message was based.

The Procedure for Checking a Platform's Track

The track QC procedure examines the position and date/time and, in the case of real-time messages, the call sign of the observations. To carry out these checks, the data are ordered by call sign (treated as the cruise number) and, within each cruise, by date and time. Each cruise is passed through tests which check that the date is valid (dates in the future or too far in the past are rejected), that the latitude and longitude are valid, that the station location is not positioned over land (using a bathymetry file with values every 5 minutes of latitude and longitude and an algorithm that accounts for the resolution of the file), and that the inferred speed between stations is reasonable. Each cruise is plotted to show the cruise number, the track (with scales and land for reference), and the platform speed from station to station (calculated from the time and space differences between stations).
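The inferred-speed test described above can be sketched as follows. This is a minimal illustration in Python, not the ISDM FORTRAN code: the function names, the 30-knot limit, and the use of a great-circle (haversine) distance are all assumptions made for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
KM_PER_NAUTICAL_MILE = 1.852
MAX_SPEED_KNOTS = 30.0  # hypothetical limit, not taken from the ISDM system


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two positions in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_failures(stations, max_knots=MAX_SPEED_KNOTS):
    """Return indices (in time order) of stations reached at an unreasonable speed.

    stations: list of (datetime, latitude, longitude) tuples for one cruise.
    """
    ordered = sorted(stations, key=lambda s: s[0])
    failures = []
    for i in range(1, len(ordered)):
        t0, lat0, lon0 = ordered[i - 1]
        t1, lat1, lon1 = ordered[i]
        hours = (t1 - t0).total_seconds() / 3600.0
        dist_km = haversine_km(lat0, lon0, lat1, lon1)
        if hours == 0:
            # Same time but different position: the speed is undefined, so flag it.
            if dist_km > 0:
                failures.append(i)
            continue
        knots = dist_km / KM_PER_NAUTICAL_MILE / hours
        if knots > max_knots:
            failures.append(i)
    return failures


cruise = [
    (datetime(2020, 1, 1, 0, 0), 45.0, -60.0),
    (datetime(2020, 1, 1, 6, 0), 45.5, -60.0),  # about 5 knots from the first station
    (datetime(2020, 1, 1, 7, 0), 50.0, -60.0),  # about 270 knots: suspect position or time
]
suspect = speed_failures(cruise)
```

In an interactive system, a failure like this would only be reported; deciding whether the position or the time is wrong is left to the technician.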

At the same time as these displays are shown, the software tests for possible errors and presents the results directly on the screen. If a test fails, an appropriate error message is shown, together with a scrollable and editable table of date/time, latitude/longitude, and their flags. A QC technician then examines the plot (and error messages) and assigns flags or corrects values. The interface allows the technician to select stations, edit values, and see the results of their changes, in order to experiment with solutions and find the logical reason (and fix) for the problem in the data. With real-time data, stations with different call signs may be merged into a single cruise where this is appropriate because an erroneous call sign was reported.

The QC Procedure for Profile Checking

The profile checking software automatically tests each station profile, sets flags accordingly, and displays a plot of the profile and error messages for review and flagging by the QC technician. This is carried out in the following stages.  First, a file of stations is opened by the software, and the technician uses a menu to move through this file, station by station. A station is read, and the profiles are identified and tested.

Tests include a group to examine global ranges, bathymetry, single valued profiles and monotonically increasing depths for all known parameter types (e.g. temperature, salinity, oxygen, nitrates, etc.). Next follows a set of statistical tests including regional range, global profile envelopes, and a test against the Levitus Climatology. Other tests look for spikes, for gradients that are too pronounced, for density inversions (when temperature and salinity are present) and for temperature inversions (when only temperature is present).
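Three of the tests named above (global range, monotonically increasing depths, and spikes) can be sketched as follows. The source does not give the formulas or thresholds ISDM uses, so everything here is an assumption for illustration: the function names, the example range and threshold, and the spike formulation, which is one widely used in ocean profile QC.

```python
def check_global_range(values, lowest, highest):
    """Return indices of values outside the closed interval [lowest, highest]."""
    return [i for i, v in enumerate(values) if not (lowest <= v <= highest)]


def check_increasing_depths(depths):
    """Return indices of depths that do not exceed the preceding depth."""
    return [i for i in range(1, len(depths)) if depths[i] <= depths[i - 1]]


def check_spikes(values, threshold):
    """Return indices of interior points that stand out from both neighbours.

    Assumed test value: |v2 - (v3 + v1) / 2| - |(v3 - v1) / 2|,
    where v1 and v3 are the neighbours above and below v2.
    """
    spikes = []
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        test_value = abs(v2 - (v3 + v1) / 2) - abs((v3 - v1) / 2)
        if test_value > threshold:
            spikes.append(i)
    return spikes


depths = [0.0, 10.0, 20.0, 15.0, 30.0]         # the fourth depth is out of order
temperatures = [12.0, 11.8, 25.0, 11.5, 11.2]  # the third value is a spike

range_failures = check_global_range(temperatures, -2.5, 40.0)  # hypothetical limits
depth_failures = check_increasing_depths(depths)
spike_failures = check_spikes(temperatures, threshold=2.0)     # hypothetical threshold
```

Each function only returns the indices that fail. Keeping detection separate from flagging reflects the workflow described here, in which the technician reviews the results before flags are finalized.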

Flags are set according to the severity of the test failure, based solely on the type of test. The profile is always plotted for examination by the technician. Where both temperature and salinity are present, both are plotted, accompanied by a plot of calculated density. Flags are shown by graphical indicators. The QC technician examines the plot and sets flags by selecting points and menu items using a mouse. The interface provides a wide range of functionality: the QC technician can list the station as a text file, list flags and other specific information such as the climatology, adjust scales and zoom, plot by arbitrary parameters (e.g. a T/S plot), and show the cruise track and the location of the station in question.

The QC Procedure for Duplicates Checking

Duplicates checking is necessary to identify data which are versions of the same observation. An exact match means that one of the versions carries no additional information; it is redundant and is usually deleted. Two or more data records are often found to be the same observation but to differ in their method of analysis or reporting. In this case all the records are kept, and all but the best one are flagged as duplicates. For example, TESAC messages reported in real-time also arrive at ISDM in a much higher resolution form as delayed mode data. Another example is bottle data used to calibrate a particular CTD profile, together with the CTD profile itself.

Duplicate handling at ISDM is a mix of software rules and algorithms, and a presentation for review and editing by the QC technician. The automatic step is carried out for a particular set of data by first determining the date/time range of the data. All data from the archive in the same range are retrieved and combined with the input data. The data are read through in date/time order to select groups of stations which fall into a common time window and area (for MEDS this is 15 minutes of time and 5 km). Each of these groups is ordered by preference according to data type (CTD, XBT, TESAC, BATHY...) and the originator or institute from which the data were delivered. Then exact matches are removed, where the characteristics considered are the date/time, latitude/longitude, types and values of all profiles, and instrument type.
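The grouping step can be sketched as follows, using the 15 minute and 5 km window quoted above. The greedy grouping around the earliest station, the dictionary layout of a station, and the numeric preference ranks (taken from the order in which the data types are listed) are assumptions for illustration and are not confirmed details of the ISDM software.

```python
import math
from datetime import datetime, timedelta

TIME_WINDOW = timedelta(minutes=15)
DISTANCE_WINDOW_KM = 5.0
# Assumed ranking, lower is preferred, following the order listed in the text.
TYPE_PREFERENCE = {"CTD": 0, "XBT": 1, "TESAC": 2, "BATHY": 3}


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def candidate_groups(stations):
    """Group stations that fall within the time and distance window of a seed station.

    stations: list of dicts with keys "time", "lat", "lon" and "type".
    Returns only groups with more than one member, each sorted by type preference.
    """
    ordered = sorted(stations, key=lambda s: s["time"])
    used = [False] * len(ordered)
    groups = []
    for i, seed in enumerate(ordered):
        if used[i]:
            continue
        used[i] = True
        group = [seed]
        for j in range(i + 1, len(ordered)):
            other = ordered[j]
            if other["time"] - seed["time"] > TIME_WINDOW:
                break  # stations are time ordered, so no later station can match
            near = distance_km(seed["lat"], seed["lon"], other["lat"], other["lon"]) <= DISTANCE_WINDOW_KM
            if not used[j] and near:
                used[j] = True
                group.append(other)
        if len(group) > 1:
            group.sort(key=lambda s: TYPE_PREFERENCE.get(s["type"], len(TYPE_PREFERENCE)))
            groups.append(group)
    return groups


stations = [
    {"time": datetime(2020, 1, 1, 12, 0), "lat": 45.00, "lon": -60.0, "type": "BATHY"},
    {"time": datetime(2020, 1, 1, 12, 5), "lat": 45.01, "lon": -60.0, "type": "XBT"},
    {"time": datetime(2020, 1, 1, 14, 0), "lat": 45.00, "lon": -60.0, "type": "CTD"},
]
groups = candidate_groups(stations)
```

Here the BATHY message and the XBT cast form one candidate group with the XBT ranked first, while the later CTD station falls outside the time window and is left alone.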

Where data are collected within the space and time window defined above but are not exact matches, the subsurface data are compared by an algorithm that works as follows. It selects a common profile type (e.g. temperature) from each station and identifies the instrument type (CTD, XBT...). It sets allowable tolerances for comparing the profiles, using a table of accuracies by instrument type. It then compares the profiles depth by depth, linearly interpolating values from the profile with the lower vertical resolution to the exact depths of the profile with the higher vertical resolution. Finally, it returns a ratio of trials to failures for interpretation by the main program rules. Where duplication is proven, the duplicate checker uses program rules to select the best of the exact or inexact duplicate profiles, based on criteria evaluated in the previous steps.
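The depth-by-depth comparison can be sketched as follows. The accuracy values, the rule of adding the two instrument accuracies to form a tolerance, and the use of the number of levels as a proxy for vertical resolution are hypothetical choices made for this example. The source states only that a table of accuracies by instrument type is used.

```python
# Hypothetical temperature accuracies in degrees C; not the ISDM table.
ACCURACY_DEG_C = {"CTD": 0.01, "XBT": 0.2}


def interpolate(depths, values, target):
    """Linearly interpolate a profile to a target depth.

    Assumes strictly increasing depths. Returns None outside the sampled range.
    """
    if target < depths[0] or target > depths[-1]:
        return None
    for k in range(1, len(depths)):
        if target <= depths[k]:
            fraction = (target - depths[k - 1]) / (depths[k] - depths[k - 1])
            return values[k - 1] + fraction * (values[k] - values[k - 1])
    return None


def compare_profiles(a, b):
    """Compare two profiles and return (trials, failures).

    a, b: dicts with keys "type", "depths" and "values".
    The profile with fewer levels is treated as the lower resolution one.
    """
    if len(a["depths"]) >= len(b["depths"]):
        fine, coarse = a, b
    else:
        fine, coarse = b, a
    tolerance = ACCURACY_DEG_C[fine["type"]] + ACCURACY_DEG_C[coarse["type"]]
    trials = 0
    failures = 0
    for depth, value in zip(fine["depths"], fine["values"]):
        estimate = interpolate(coarse["depths"], coarse["values"], depth)
        if estimate is None:
            continue  # no overlap at this depth, so it is not counted as a trial
        trials += 1
        if abs(value - estimate) > tolerance:
            failures += 1
    return trials, failures


ctd = {"type": "CTD", "depths": [0.0, 5.0, 10.0, 15.0, 20.0], "values": [10.0, 9.5, 9.0, 8.5, 8.0]}
xbt = {"type": "XBT", "depths": [0.0, 10.0, 20.0], "values": [10.1, 9.0, 7.0]}
trials, failures = compare_profiles(ctd, xbt)
```

In this example the profiles agree down to 10 m and diverge below, giving 5 trials and 2 failures. How such a ratio is turned into a duplicate decision is left to the program rules and is not specified in the source.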

Sometimes the automatic check cannot determine whether stations and profiles are duplicates or, if they are, which profile is the most desirable one. This often happens when data are very close in time and space, but are actually different casts of the same instrument. It may also occur when a data originator delivers a "correction" or updated version of a station after their own revised analysis. These cases are isolated and reported (or displayed in the delayed mode system) for the QC technician to review and flag through an interactive session.