The NCCSV format is designed so that spreadsheet software such as Excel and Google Sheets can import an NCCSV file as a csv file, with all of the information in the spreadsheet's cells ready for editing. Or, a spreadsheet can be created from scratch following the NCCSV conventions. Regardless of the source of the spreadsheet, if it is then exported as a .csv file, it will conform to the NCCSV specification and no information will be lost. The only differences between NCCSV files and the analogous spreadsheet files which follow these conventions are:
See the Spreadsheet section below for more information.
Streamable - Like CSV files in general, NCCSV files are streamable. Thus, if an NCCSV file is generated on the fly by a data server such as ERDDAP™, the server can start to stream data to the requester before all of the data has been gathered. This is a useful and desirable feature. NetCDF files, by contrast, are not streamable.
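As a rough illustration of what "streamable" means in practice, the sketch below emits an NCCSV file one line at a time from a Python generator, so the first lines can be sent before the last data row exists. The function name stream_nccsv and its parameters are hypothetical, and the sketch assumes the values are already correctly quoted for CSV.

```python
def stream_nccsv(metadata_lines, header, row_source):
    """Yield the lines of an NCCSV file one at a time.

    metadata_lines: already-formatted metadata lines (without *END_METADATA*)
    header:         list of variable names for the data section
    row_source:     any iterable (for example a generator) of lists of
                    already-formatted string values
    """
    for line in metadata_lines:
        yield line + "\n"
    yield "*END_METADATA*\n"
    yield ",".join(header) + "\n"
    # Rows are pulled from row_source only when the consumer asks for them,
    # so output can begin before all of the data has been gathered.
    for row in row_source:
        yield ",".join(row) + "\n"
    yield "*END_DATA*\n"
```

A server-side caller could write each yielded line to the response as soon as it is produced.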
ERDDAP™ - This specification is designed so that NCCSV files and the .nc files that can be created from them can be used by an ERDDAP™ data server (via the EDDTableFromNccsvFiles and EDDTableFromNcFiles dataset types), but this specification is external to ERDDAP™. ERDDAP™ has several required global attributes and many recommended global and variable attributes, mostly based on CF and ACDD attributes (see https://erddap.github.io/setupDatasetsXml.html#globalAttributes).
Balance — The design of the NCCSV format is a balance of several requirements:
Other Specifications - This specification refers to several other specifications and libraries that it is designed to work with, but this specification is not a part of any of those other specifications, nor does it need any changes to them, nor does it conflict with them. If a detail related to one of these standards is not specified here, see the related specification. Notably, this includes:
Notation - In this specification, brackets, [ ], denote optional items.
NCCSV files must contain only 7-bit ASCII characters. Because of this, the character set or encoding used to write and read the file may be any character set or encoding which is compatible with the 7-bit ASCII character set, e.g., ISO-8859-1. ERDDAP™ reads and writes NCCSV files with the ISO-8859-1 charset.
NCCSV files may use either newline (\n) (which is common on Linux and Mac OS X computers) or carriageReturn plus newline (\r\n) (which is common on Windows computers) as end-of-line markers, but not both.
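The two file-level rules above (7-bit ASCII only, and one consistent end-of-line marker) can be checked on the raw bytes of a file. The following is a minimal sketch; check_nccsv_bytes is a hypothetical helper, not part of ERDDAP™ or any NCCSV library.

```python
def check_nccsv_bytes(raw: bytes) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # Rule 1: only 7-bit ASCII, so every byte value must be below 128.
    bad = [i for i, b in enumerate(raw) if b > 127]
    if bad:
        problems.append(f"non-ASCII byte at offset {bad[0]}")
    # Rule 2: either \n or \r\n line endings, but not a mixture of both.
    crlf = raw.count(b"\r\n")
    lf_only = raw.count(b"\n") - crlf
    if crlf and lf_only:
        problems.append("mixed \\n and \\r\\n line endings")
    return problems
```

For example, check_nccsv_bytes(open(path, "rb").read()) could be run before handing a file to a converter.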
.nccsvMetadata — When both the creator and the reader are expecting it, it is also possible and sometimes useful to make a variant of an NCCSV file which contains just the metadata section (including the *END_METADATA* line). The result provides a complete description of the file's attributes, variable names, and data types, thus serving the same purpose as the .das plus .dds responses from an OPeNDAP server. ERDDAP™ will return this variation if you request fileType=.nccsvMetadata from an ERDDAP™ dataset.
Conventions - The first line of an NCCSV file is the first line of the metadata section and must have a *GLOBAL* Conventions attribute listing all of the conventions used in the file as a String containing a CSV list, for example:
*GLOBAL*,Conventions,"COARDS, CF-1.6, ACDD-1.3, NCCSV-1.0"
One of the conventions listed must be NCCSV-1.0, which refers to the current version of this specification.
*END_METADATA* - The end of the metadata section of an NCCSV file must be denoted by a line with only
*END_METADATA*
It is recommended but not required that all of the attributes for a given variable appear on adjacent lines of the metadata section. If an NCCSV file is converted into a NetCDF file, the order that the variableNames first appear in the metadata section will be the order of the variables in the NetCDF file.
Optional blank lines are allowed in the metadata section after the required first line with *GLOBAL* Conventions information (see below) and before the required last line with *END_METADATA*.
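The structural rules for the metadata section (a first line with the *GLOBAL* Conventions attribute, optional blank lines, and a closing *END_METADATA* line) can be sketched as a small reader. This is only an illustration: read_metadata is a hypothetical name, and Python's csv module is used as an approximation of NCCSV parsing (it handles doubled double quotes but does not decode backslash sequences such as \n or \u20AC).

```python
import csv

def read_metadata(lines):
    """Collect metadata rows up to *END_METADATA*.

    'lines' is any iterable of text lines. Returns a list of rows, each a
    list of strings: [variableName, attributeName, value, ...].
    """
    rows = []
    for row in csv.reader(lines):
        if row and row[0] == "*END_METADATA*":
            break
        if row:  # blank lines are allowed inside the metadata section
            rows.append(row)
    else:
        # The loop ended without finding the terminator line.
        raise ValueError("no *END_METADATA* line found")
    first = rows[0] if rows else []
    if len(first) < 3 or first[:2] != ["*GLOBAL*", "Conventions"]:
        raise ValueError("first line must be the *GLOBAL* Conventions attribute")
    conventions = [c.strip() for c in first[2].split(",")]
    if "NCCSV-1.0" not in conventions:
        raise ValueError("Conventions must list NCCSV-1.0")
    return rows
```

Because the function stops at *END_METADATA*, it also works on the metadata-only variant of a file.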
If a spreadsheet is created from an NCCSV file, the metadata section will appear with variable names in column A, attribute names in column B, and values in column C.
If a spreadsheet following these conventions is saved as a CSV file, there will often be extra commas at the end of the lines in the metadata section. The software that converts NCCSV files into .nc files will ignore the extra commas.
*SCALAR* — The special attributeName *SCALAR* can be used to create a scalar data variable and define its value. The data type of the *SCALAR* defines the data type for the variable, so do not specify a *DATA_TYPE* attribute for scalar variables. Note that there must not be data for the scalar variable in the Data Section of the NCCSV file.
For example, to create a scalar variable named "ship"
with the value "Okeanos Explorer" and a cf_role attribute, use:
ship,*SCALAR*,"Okeanos Explorer"
ship,cf_role,trajectory_id
When a scalar data variable is read into ERDDAP™,
the scalar value is converted into a column in the data table with the
same value on every row.
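The conversion of a scalar variable into a constant column can be pictured with the sketch below. The function expand_scalars and the dict-based representation of scalars are hypothetical, and appending the scalar columns after the existing columns is an assumption made for illustration; the text above does not say where the column is placed.

```python
def expand_scalars(column_names, rows, scalars):
    """Append each scalar variable as a column with the same value on every row.

    column_names: list of data-section variable names
    rows:         list of data rows (lists of values)
    scalars:      dict mapping scalar variable name -> its single value
    """
    names = list(column_names) + list(scalars)
    expanded = [list(row) + list(scalars.values()) for row in rows]
    return names, expanded
```

With the "ship" example above, every row of the resulting table would carry the value "Okeanos Explorer" in the ship column.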
value is the value of the metadata attribute. It must be an array of one or more values of a single data type: byte, short, int, long, float, double, String, or char.
No other data types are supported. Attributes with no value will be ignored.
If there is more than one sub-value, the sub-values must all be of the same data type and separated by commas, for example:
sst,actual_range,0.17f,23.58f
If there are multiple String values, use a single String with \n (newline) characters separating the substrings.
The definitions of the attribute data types are:
Suffix — Note that in the attributes section of an NCCSV file, all numeric attribute values must have a suffix letter (e.g., 'b') to identify the numeric data type (e.g., byte). But in the data section of an NCCSV file, numeric data values must never have these suffix letters (with the exception of 'L' for long integers) — the data type is specified by the *DATA_TYPE* attribute for the variable.
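To make the suffix rule concrete, here is a minimal sketch of classifying one attribute sub-value by its suffix letter. The suffix letters b, s, i, L, f, and d are taken from the sample file in this document; the function name parse_attribute_value and the regular expression are illustrative assumptions, and char values are returned without decoding backslash sequences.

```python
import re

# Suffix letters as they appear in the sample file's test attributes.
_SUFFIX_TYPES = {"b": "byte", "s": "short", "i": "int",
                 "L": "long", "f": "float", "d": "double"}
_NUMERIC = re.compile(r"^([-+]?[0-9.]+(?:[eE][-+]?[0-9]+)?)([bsiLfd])$")

def parse_attribute_value(token):
    """Return (data_type_name, value) for one attribute sub-value."""
    m = _NUMERIC.match(token)
    if m:
        text, suffix = m.groups()
        try:
            value = float(text) if suffix in "fd" else int(text)
            return _SUFFIX_TYPES[suffix], value
        except ValueError:
            pass  # not really a number; fall through and treat as text
    # A char attribute value is enclosed in single quotes, e.g. ','
    if len(token) >= 3 and token[0] == "'" and token[-1] == "'":
        return "char", token[1:-1]
    return "String", token
```

Range checking (for example, rejecting 300b) is left out to keep the sketch short.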
*DATA_TYPE* - The data type for each non-scalar variable must be specified by a *DATA_TYPE* attribute which can have a value of byte, short, int, long, float, double, String, or char (case insensitive). For example,
qc_flag,*DATA_TYPE*,byte
WARNING: Specifying the correct *DATA_TYPE* is your responsibility. Specifying the wrong data type (e.g., int when you should have specified float) will not generate an error message and may cause information to be lost (e.g., float values will be rounded to ints) when the NCCSV file is read by ERDDAP™ or converted into a NetCDF file.
char Discouraged - The use of char data values is discouraged because they are not widely supported in other file types. char values may be written in the data section as single characters or as Strings (notably, if you need to write a special character). If a String is found, the first character of the String will be used as the char's value. Zero length Strings and missing values will be converted to character \uFFFF. Note that NetCDF files only support single byte chars, so any chars greater than char #255 will be converted to '?' when writing NetCDF files. Unless a charset attribute is used to specify a different charset for a char variable, the ISO-8859-1 charset will be used.
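The char rules above can be summarized in a short sketch. Both function names are hypothetical; the first applies the "first character, or \uFFFF when empty" rule for data values, and the second applies the "greater than #255 becomes '?'" rule for writing NetCDF files.

```python
def data_char(field: str) -> str:
    """Reduce a data-section field to a single char.

    A String contributes its first character; a zero-length String or a
    missing value becomes the character U+FFFF.
    """
    return field[0] if field else "\uffff"

def netcdf_char(ch: str) -> str:
    """Map a char to what a single-byte NetCDF char can hold ('?' if above 255)."""
    return ch if ord(ch) <= 255 else "?"
```

For instance, the euro sign (U+20AC) in the sample file would be written to a NetCDF file as '?', while the u-umlaut (U+00FC) would be preserved.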
long Discouraged - Although many file types (e.g., NetCDF-4 and json) and ERDDAP™ support long data values, the use of long data values in NCCSV files is currently discouraged because they are currently not supported by Excel, CF, and NetCDF-3 files. If you want to specify long data values in an NCCSV file (or in the corresponding Excel spreadsheet), you must use the suffix 'L' so that Excel doesn't treat the numbers as floating point numbers with lower precision. Currently, if an NCCSV file is converted into a NetCDF-3 .nc file, long data values will be converted into double values, causing a loss of precision for very large values (less than -2^53 or greater than 2^53).
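The precision loss described above comes from the 53-bit significand of a double. The following sketch shows which long values survive a round trip through a double; survives_double is a hypothetical helper, and Python's float is used because it is an IEEE 754 double.

```python
def survives_double(n: int) -> bool:
    """True if the integer n is unchanged after conversion to a double and back."""
    return int(float(n)) == n

# 2^53 is exactly representable, but 2^53 + 1 is rounded to a neighbouring
# value, so precision is lost beyond +/- 2^53.
limit = 2 ** 53
print(survives_double(limit))      # True
print(survives_double(limit + 1))  # False
```

Of the testLong values in the sample file, 1234567890123456L is within the safe range, while 9223372036854775806L is not.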
CF, ACDD, and ERDDAP™ Metadata - Since it is envisioned that most NCCSV files, or the .nc files created from them, will be read into ERDDAP™, it is strongly recommended that NCCSV files include the metadata attributes which are required or recommended by ERDDAP™ (see https://erddap.github.io/setupDatasetsXml.html#globalAttributes). The attributes are almost all from the CF and ACDD metadata standards and serve to properly describe the dataset (who, what, when, where, why, how) to someone who otherwise knows nothing about the dataset. Of particular importance, almost all numeric variables should have a units attribute with a UDUNITS-compatible value, e.g.,
sst,units,degree_C
It is fine to include additional attributes which are not from the CF or ACDD standards or from ERDDAP.
The second through the penultimate lines of the data section must have a comma-separated list of values. Each row of data must have the same number of values as the comma-separated list of variable names. Spaces before or after values are not allowed because they cause problems when importing the file into spreadsheet programs. Each column in this section must contain only values of the data type specified by the *DATA_TYPE* attribute for that variable. Unlike in the attributes section, numeric values in the data section must not have suffix letters to denote the data type (with the exception of 'L' for long values). Unlike in the attributes section, char values in the data section may omit the enclosing single quotes if they are not needed for disambiguation (thus, ',' and '\'' must be quoted as shown here). There may be any number of these data rows in an NCCSV file, but currently ERDDAP™ can only read NCCSV files with up to about 2 billion rows. In general, it is recommended that you split large datasets into multiple NCCSV data files with fewer than 1 million rows each.
*END_DATA* - The end of the data section must be denoted by a line with only
*END_DATA*
If there is additional content in the NCCSV file after the *END_DATA* line, it will be ignored when the NCCSV file is converted into an .nc file. Such content is therefore discouraged.
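The data-section rules (a first line of variable names, rows with the same number of values, a closing *END_DATA* line, and anything after it ignored) can be sketched as follows. The function read_data_section is hypothetical, and Python's csv module only approximates NCCSV parsing because it does not decode backslash sequences.

```python
import csv

def read_data_section(lines):
    """Read the data section from an iterable of text lines.

    'lines' must start at the line of variable names (the line after
    *END_METADATA*). Returns (variable_names, rows). Content after the
    *END_DATA* line is never read.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = []
    for number, row in enumerate(reader, start=1):
        if row and row[0] == "*END_DATA*":
            return header, rows
        if len(row) != len(header):
            raise ValueError(
                f"data row {number} has {len(row)} values, expected {len(header)}")
        rows.append(row)
    raise ValueError("no *END_DATA* line found")
```

An empty trailing field, as in a row ending with a comma, still counts as a value, so such rows pass the length check.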
In a spreadsheet following these conventions, the variable names and data values will be in multiple columns. See the example below.
Numeric missing values may be written as a numeric value identified by a
missing_value or _FillValue attribute for that variable.
For example, see the second value on this data row:
Bell M. Shimada,99,123.4
This is the recommended way to handle missing values for
byte, short, int, and long variables.
float or double NaN values may be written as NaN.
For example, see the second value on this data row:
Bell M. Shimada,NaN,123.4
String and numeric missing values may be indicated by an empty field.
For example, see the second value on this data row:
Bell M. Shimada,,123.4
For byte, short, int, and long variables,
the NCCSV converter utility and ERDDAP™ will convert an empty field
into the maximum allowed value for that data type (e.g., 127 for bytes).
If you do this, be sure to add a missing_value or
_FillValue attribute for that variable to identify this value,
e.g.,
variableName,_FillValue,127b
For float and double variables, an empty field will be converted to NaN.
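The missing-value rules above can be condensed into one conversion function. This is a sketch under stated assumptions: convert_field is a hypothetical name, the integer maxima are the usual signed 8-, 16-, 32-, and 64-bit limits (matching the 127 example above and the test attributes in the sample file), and no range checking is done.

```python
import math

# Maximum value of each integer type, used for empty (missing) fields.
INT_MAX = {"byte": 127, "short": 32767,
           "int": 2147483647, "long": 9223372036854775807}

def convert_field(field: str, data_type: str):
    """Convert one data-section field according to its *DATA_TYPE*."""
    kind = data_type.lower()  # *DATA_TYPE* values are case insensitive
    if kind in INT_MAX:
        if field == "":
            return INT_MAX[kind]  # empty field -> maximum value of the type
        # long values carry an 'L' suffix in the data section
        return int(field.rstrip("L")) if kind == "long" else int(field)
    if kind in ("float", "double"):
        # an empty field and the literal NaN both become NaN
        return float(field) if field else math.nan
    return field  # String and char values are returned unchanged here
```

A reader that uses this conversion should also record the matching missing_value or _FillValue attribute, as recommended above.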
DateTime values represented as numeric values must have a
units attribute which specifies the "units since dateTime"
as required by CF and specified by UDUNITS, e.g.,
time,units,seconds since 1970-01-01T00:00:00Z
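For the specific units shown above, a numeric dateTime value is a count of seconds from the epoch 1970-01-01T00:00:00Z. A minimal sketch of the conversion follows; the function name is hypothetical, and other "units since dateTime" combinations would need their own base time and unit.

```python
from datetime import datetime, timedelta, timezone

def seconds_since_epoch_to_datetime(seconds: float) -> datetime:
    """Interpret a value whose units are 'seconds since 1970-01-01T00:00:00Z'."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=seconds)
```

For example, the value 1490229900 corresponds to 2017-03-23T00:45:00Z, the time on the first data row of the sample file.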
DateTime values represented as String values must have a
String *DATA_TYPE* attribute and a units attribute
which specifies a dateTime pattern as specified by the
Java DateTimeFormatter class
(https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html). For example,
time,units,yyyy-MM-dd'T'HH:mm:ssZ
All dateTime values for a given data variable must use the same format.
In most cases, the dateTime pattern you need for the units
attribute will be a variation of one of these formats:
Precision —
When a software library converts an .nc file into an NCCSV file,
all dateTime values will be written as Strings with the
ISO 8601:2004(E) dateTime format, e.g., 1970-01-01T00:00:00Z .
You can control the precision with the ERDDAP-specific attribute
time_precision. See
https://erddap.github.io/setupDatasetsXml.html#time_precision.
Time Zone —
The default time zone for dateTime values is the Zulu (or GMT)
time zone, which has no daylight saving time periods.
If a dateTime variable has dateTime values from a different time zone,
you must specify this with the ERDDAP-specific attribute time_zone.
This is a requirement for ERDDAP™ (see
https://erddap.github.io/setupDatasetsXml.html#time_zone).
*GLOBAL*,Conventions,"COARDS, CF-1.6, ACDD-1.3, NCCSV-1.0"
*GLOBAL*,cdm_trajectory_variables,"ship"
*GLOBAL*,creator_email,erd.data@noaa.gov
*GLOBAL*,creator_name,Bob Simons
*GLOBAL*,creator_type,person
*GLOBAL*,creator_url,https://www.pfeg.noaa.gov
*GLOBAL*,featureType,trajectory
*GLOBAL*,infoUrl,https://erddap.github.io/NCCSV.html
*GLOBAL*,institution,"NOAA NMFS SWFSC ERD, NOAA PMEL"
*GLOBAL*,license,"""NCCSV Demonstration"" by Bob Simons and Steve Hankin is licensed under CC BY 4.0, https://creativecommons.org/licenses/by/4.0/ ."
*GLOBAL*,keywords,"NOAA, sea, ship, sst, surface, temperature, trajectory"
*GLOBAL*,standard_name_vocabulary,CF Standard Name Table v55
*GLOBAL*,subsetVariables,"ship"
*GLOBAL*,summary,"This is a paragraph or two describing the dataset."
*GLOBAL*,title,"NCCSV Demonstration"
ship,*DATA_TYPE*,String
ship,cf_role,trajectory_id
time,*DATA_TYPE*,String
time,standard_name,time
time,units,"yyyy-MM-dd'T'HH:mm:ssZ"
lat,*DATA_TYPE*,double
lat,units,degrees_north
lon,*DATA_TYPE*,double
"lon","units","degrees_east"
status,*DATA_TYPE*,char
status,comment,"From http://some.url.gov/someProjectDocument , Table C"
testLong,*DATA_TYPE*,long
testLong,units,1
sst,*DATA_TYPE*,float
sst,standard_name,sea_surface_temperature
sst,actual_range,0.17f,23.58f
sst,units,degree_C
sst,missing_value,99f
sst,testBytes,-128b,0b,127b
sst,testShorts,-32768s,0s,32767s
sst,testInts,-2147483648i,0i,2147483647i
sst,testLongs,-9223372036854775808L,0L,9223372036854775807L
sst,testFloats,-3.40282347e38f,0f,3.40282347E+38f
sst,testDoubles,-1.79769313486231570e308d,0d,1.79769313486231570E+308d
sst,testChars,"','","'""'","'\u20AC'"
sst,testStrings," a~,\n'z""\u20AC"
*END_METADATA*
ship,time,lat,lon,status,testLong,sst
Bell M. Shimada,2017-03-23T00:45:00Z,28.0002,-130.2576,A,-9223372036854775808L,10.9
Bell M. Shimada,2017-03-23T01:45:00Z,28.0003,-130.3472,\u20AC,-1234567890123456L,
"Bell M. Shimada","2017-03-23T02:45:00Z",28.0001,-130.4305,"'\t'",0L,10.7
Bell M. Shimada,2017-03-23T12:45:00Z,27.9998,-131.5578,"'""'",1234567890123456L,99
Bell M. Shimada,2017-03-23T21:45:00Z,28.0003,-132.0014,\u00fc,9223372036854775806L,10.0
Bell M. Shimada,2017-03-23T23:45:00Z,28.0002,-132.1591,,NaN
Notes:
The only differences between NCCSV files and the analogous spreadsheets that follow these conventions are:
If a spreadsheet following these conventions is saved as a CSV file, there will often be extra commas at the end of many of the lines. The software that converts NCCSV files into .nc files will ignore the extra commas.
To import an NCCSV file into Excel:
To create an NCCSV file from an Excel spreadsheet:
In Excel, the sample NCCSV file above appears as
To create an NCCSV file from a Google Sheets spreadsheet:
ERDDAP, Version 2.25