Automated Data Quality Checks

Deployment Number
Instrument deployment number defined in OOI's Asset Management system.
Preferred Method
The data delivery method selected for review. For uncabled instruments, this is the recovered instrument method whenever it is available, because it provides the most complete dataset. If recovered instrument data are not available, recovered host (from the Data Concentrator Logger) or telemetered data are reviewed instead.
Stream
Data stream name for the preferred method containing science data for review.
Deployment Days
Number of days the instrument was deployed.
File Days
Number of days for which there is at least 1 timestamp available for the instrument.
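As a rough illustration of the two day counts above, the sketch below uses plain Python datetimes. The function names are hypothetical, and counting both the start and end calendar days as deployment days is an assumption, not something the definitions state.

```python
from datetime import datetime


def deployment_days(deploy_start, deploy_end):
    """Days deployed, counting both the start and end calendar days (assumption)."""
    return (deploy_end.date() - deploy_start.date()).days + 1


def file_days(timestamps):
    """Number of distinct calendar days that contain at least one timestamp."""
    return len({t.date() for t in timestamps})


n_deployed = deployment_days(datetime(2018, 1, 1), datetime(2018, 1, 10))
n_file = file_days([
    datetime(2018, 1, 1, 0, 0),
    datetime(2018, 1, 1, 12, 0),  # same day as the first timestamp
    datetime(2018, 1, 3, 6, 0),
])
print(n_deployed, n_file)
```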
Start Gap
Number of missing days at the start of a deployment: comparison of the deployment start date to the data start date.
End Gap
Number of missing days at the end of a deployment: comparison of the deployment end date to the data end date.
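The Start Gap and End Gap comparisons could be sketched as follows. The function name is hypothetical, and measuring the gaps in whole calendar days (clamped at zero) is an assumption.

```python
from datetime import datetime


def start_end_gaps(deploy_start, deploy_end, data_start, data_end):
    """Return (start_gap, end_gap) in whole calendar days, never negative."""
    start_gap = max((data_start.date() - deploy_start.date()).days, 0)
    end_gap = max((deploy_end.date() - data_end.date()).days, 0)
    return start_gap, end_gap


gaps = start_end_gaps(
    deploy_start=datetime(2018, 1, 1, 12),
    deploy_end=datetime(2018, 6, 30, 12),
    data_start=datetime(2018, 1, 4, 0),    # data begin 3 days late
    data_end=datetime(2018, 6, 25, 23),    # data stop 5 days early
)
print(gaps)  # (3, 5)
```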
Gaps Count
Number of gaps within a data file (exclusive of missing data at the beginning and end of a deployment). A gap is defined as more than 1 day of missing data.
Gap Days
Number of days of missing data within a data file (exclusive of missing data at the beginning and end of a deployment).
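One way to count internal gaps and gap days is to look at the spacing between consecutive timestamps. This is a minimal sketch: the function name is hypothetical, and reporting gap days as the whole-day part of each gap is an assumption.

```python
from datetime import datetime, timedelta


def internal_gaps(timestamps, threshold=timedelta(days=1)):
    """Return (gap_count, gap_days) for gaps between consecutive timestamps.

    Only spacing longer than `threshold` counts as a gap. Missing data before
    the first or after the last timestamp is not considered here.
    """
    ordered = sorted(timestamps)
    gaps = [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]
    gap_days = sum((b - a).days for a, b in gaps)  # whole days only (assumption)
    return len(gaps), gap_days


times = [
    datetime(2018, 1, 1, 0),
    datetime(2018, 1, 1, 12),
    datetime(2018, 1, 5, 0),   # 3.5 days after the previous timestamp
    datetime(2018, 1, 5, 6),
    datetime(2018, 1, 8, 12),  # 3.25 days after the previous timestamp
]
gap_summary = internal_gaps(times)
print(gap_summary)  # (2, 6)
```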
Timestamps
Number of timestamps in a data file.
Sampling Rate
Sampling rates are calculated from the differences between consecutive timestamps. The most common sampling rate is the one that accounts for more than 50% of those differences.
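The sampling rate calculation might look like the sketch below. The function name is hypothetical, timestamps are assumed to be in seconds, and rounding the differences to whole seconds before counting them is an assumption.

```python
from collections import Counter


def most_common_sampling_rate(times_s, min_fraction=0.5):
    """Return the dominant timestamp spacing in seconds, or None.

    A spacing is reported only if it makes up more than `min_fraction`
    of all consecutive timestamp differences.
    """
    diffs = [round(b - a) for a, b in zip(times_s, times_s[1:])]
    if not diffs:
        return None
    rate, count = Counter(diffs).most_common(1)[0]
    return rate if count / len(diffs) > min_fraction else None


regular = most_common_sampling_rate([0, 60, 120, 180, 300, 360])  # 4 of 5 diffs are 60 s
irregular = most_common_sampling_rate([0, 10, 30, 60, 100])       # no spacing exceeds 50%
print(regular, irregular)  # 60 None
```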
Pressure Comparison
Comparison of the instrument deployment depth defined in OOI's Asset Management system with the average (for fixed instruments) or maximum (for mobile instruments) pressure calculated from the data file after eliminating data outside of global ranges and outliers (values beyond 3 standard deviations).
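The pressure value used in this comparison could be derived roughly as follows. This is a sketch under stated assumptions: the function name and example values are hypothetical, outliers are judged against the mean using the population standard deviation, and the range filter is applied before the outlier filter.

```python
import math
from statistics import mean, pstdev


def representative_pressure(pressures, global_range, mobile=False, n_std=3):
    """Mean (fixed) or maximum (mobile) pressure after range and outlier filtering."""
    lo, hi = global_range
    in_range = [p for p in pressures if not math.isnan(p) and lo <= p <= hi]
    if not in_range:
        return None
    mu, sigma = mean(in_range), pstdev(in_range)
    # Keep values within n_std standard deviations of the mean (assumption).
    kept = [p for p in in_range if abs(p - mu) <= n_std * sigma]
    return max(kept) if mobile else mean(kept)


# Hypothetical data: one value outside the global range, one in-range outlier.
pressures = [30.0] * 20 + [31.0] * 20 + [9999.0, 80.0]
fixed_value = representative_pressure(pressures, global_range=(0, 6000))
mobile_value = representative_pressure(pressures, global_range=(0, 6000), mobile=True)
print(fixed_value, mobile_value)
```

The result can then be set against the deployment depth from Asset Management to flag large disagreements.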
Time Order
Test that timestamps in the file are unique and in ascending order.
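A minimal sketch of the time order test, reporting uniqueness and ordering separately. The function name and the returned keys are hypothetical.

```python
def check_time_order(timestamps):
    """Report whether timestamps are unique and in ascending order."""
    unique = len(set(timestamps)) == len(timestamps)
    ascending = all(a <= b for a, b in zip(timestamps, timestamps[1:]))
    return {"unique": unique, "ascending": ascending}


good = check_time_order([0, 60, 120])
duplicated = check_time_order([0, 60, 60])
shuffled = check_time_order([0, 120, 60])
print(good, duplicated, shuffled)
```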
Valid Data
For each science variable, the binned percent of data that are not NaNs, fill values, outside global ranges, or beyond 5 standard deviations. Bins: 99 = >99%, 95 = 95-99%, 75 = 75-95%, 50 = 50-75%, 25 = 25-50%, 0 = 0-25%. For example, {'99':4, '95':1} means 4 science variables have >99% valid data points, and 1 science variable has between 95-99% valid data points.
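The binning described above can be sketched as follows. All names and example values are hypothetical. Treating each bin's lower edge as inclusive (so exactly 99% falls in the 95 bin) and computing the standard deviation after the range filter are assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def percent_valid(values, fill_value, global_range, n_std=5):
    """Percent of values that are not NaN, fill, out of range, or outliers."""
    if not values:
        return 0.0
    lo, hi = global_range
    in_range = [v for v in values
                if not math.isnan(v) and v != fill_value and lo <= v <= hi]
    if not in_range:
        return 0.0
    mu, sigma = mean(in_range), pstdev(in_range)
    valid = [v for v in in_range if abs(v - mu) <= n_std * sigma]
    return 100.0 * len(valid) / len(values)


def bin_label(pct):
    """Map a percentage onto the bin labels 99, 95, 75, 50, 25, 0."""
    if pct > 99:
        return "99"
    for edge in (95, 75, 50, 25):
        if pct >= edge:  # lower edge inclusive (assumption)
            return str(edge)
    return "0"


# Hypothetical science variables: (values, fill value, global range).
variables = {
    "temperature": ([10.0 + 0.01 * i for i in range(200)], -9999.0, (-5.0, 40.0)),
    "salinity": ([35.0] * 97 + [-9999.0] * 3, -9999.0, (0.0, 42.0)),
}
summary = dict(Counter(bin_label(percent_valid(*args)) for args in variables.values()))
print(summary)
```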
Missing Data
Test fails if data are available in a stream from a "non-preferred" delivery method but the same data are not available in the preferred data stream. The summary provides the number of gaps and the number of days of data that are missing from the preferred dataset but should be available.
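One possible shape for this test is sketched below. It works at daily resolution and treats each run of consecutive missing days as one gap; both choices are assumptions, and the function name and returned keys are hypothetical.

```python
from datetime import date, timedelta


def missing_in_preferred(preferred_days, other_days):
    """Summarize days present in a non-preferred stream but absent from the preferred one."""
    missing = sorted(set(other_days) - set(preferred_days))
    n_gaps = 0
    previous = None
    for day in missing:
        # A new gap starts whenever the missing days stop being consecutive.
        if previous is None or day - previous > timedelta(days=1):
            n_gaps += 1
        previous = day
    return {"passed": not missing, "n_gaps": n_gaps, "n_days": len(missing)}


preferred = [date(2018, 1, 1), date(2018, 1, 2), date(2018, 1, 6)]
telemetered = [date(2018, 1, d) for d in range(1, 8)]
result = missing_in_preferred(preferred, telemetered)
print(result)
```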
Data Comparison
Compare data values with matching timestamps for science variables among all delivery methods.
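A comparison between two delivery methods might be sketched like this for a single science variable. The function name, the dictionary layout (timestamp mapped to value), and the relative tolerance are all assumptions.

```python
import math


def compare_methods(preferred, other, rel_tol=1e-9):
    """Compare values at timestamps shared by two delivery methods.

    Each argument maps a timestamp to a value for one science variable.
    NaN never compares as close, so a NaN on either side counts as a mismatch.
    """
    shared = sorted(set(preferred) & set(other))
    mismatched = [t for t in shared
                  if not math.isclose(preferred[t], other[t], rel_tol=rel_tol)]
    return {"n_compared": len(shared), "n_mismatched": len(mismatched)}


recovered = {0: 10.0, 60: 10.5, 120: 11.0}
telemetered = {60: 10.5, 120: 11.2, 180: 11.5}
comparison = compare_methods(recovered, telemetered)
print(comparison)
```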
Missing Coordinates
Check the coordinates in the data file against the expected coordinates: obs, time, lat, lon, and pressure (pressure is expected only for instruments not located on a surface buoy).
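The coordinate check reduces to a membership test against the expected names listed above. The function name and the `on_surface_buoy` flag are hypothetical.

```python
def missing_coordinates(file_coords, on_surface_buoy=False):
    """Return the expected coordinate names that are absent from the file."""
    expected = ["obs", "time", "lat", "lon"]
    if not on_surface_buoy:
        expected.append("pressure")  # only expected off the surface buoy
    return [name for name in expected if name not in file_coords]


subsurface = missing_coordinates(["obs", "time", "lat", "lon"])
surface = missing_coordinates(["obs", "time", "lat", "lon"], on_surface_buoy=True)
print(subsurface, surface)  # ['pressure'] []
```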
Review
The status of the review: Todo = data need to be reviewed, Tested = automated tests are complete, Blocked = automated tests are complete and an issue is preventing completion of the review, In Progress = automated tests are complete and Human In The Loop review is in progress, Complete = automated tests and Human In The Loop review are complete.