This presentation is written in reveal.js.
Click the link to learn more about how to navigate reveal.js presentations.
The Network Common Data Format (netCDF) file is a self-describing, portable, scalable, appendable, sharable and archivable binary data format.
Let’s start with the netCDF classic model.
There are three main components to the classic netCDF data structure: dimensions, variables, and attributes.
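To make the three components concrete, here is a minimal Python sketch of how they relate. Dimensions are named lengths, variables are typed arrays whose shape is given by a list of dimension names, and attributes are name/value metadata attached to a variable or to the whole file. The class and method names are hypothetical and chosen only for illustration; this is not the netCDF library's API. The sketch enforces one real rule of the classic model: at most one unlimited (record) dimension.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Hypothetical type: classic attribute values are text or a 1-D list of numbers.
AttrValue = Union[str, List[float], List[int]]


@dataclass
class Dimension:
    name: str
    size: Optional[int]  # None marks the unlimited (record) dimension


@dataclass
class Variable:
    name: str
    dims: List[str]  # dimension names, outermost first
    dtype: str  # a classic external type such as "float" or "short"
    attributes: Dict[str, AttrValue] = field(default_factory=dict)


@dataclass
class ClassicDataset:
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, AttrValue] = field(default_factory=dict)  # global attributes

    def add_dimension(self, name: str, size: Optional[int]) -> None:
        # The classic model allows only one unlimited dimension per file.
        if size is None and any(d.size is None for d in self.dimensions.values()):
            raise ValueError("the classic model allows only one unlimited dimension")
        self.dimensions[name] = Dimension(name, size)

    def add_variable(self, name: str, dims: List[str], dtype: str,
                     **attributes: AttrValue) -> Variable:
        # Every dimension a variable uses must already be defined.
        missing = [d for d in dims if d not in self.dimensions]
        if missing:
            raise KeyError(f"undefined dimensions: {missing}")
        var = Variable(name, list(dims), dtype, dict(attributes))
        self.variables[name] = var
        return var

    def shape(self, name: str) -> tuple:
        # A variable's shape is just the sizes of its dimensions, in order.
        return tuple(self.dimensions[d].size for d in self.variables[name].dims)
```

For example, after adding time (unlimited), lat (360), and lon (720), a tmx variable declared on ["time", "lat", "lon"] reports the shape (None, 360, 720).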
A quick note on Unified Modeling Language (UML)
Notice how the enhanced netCDF model resembles HDF: it adds a hierarchy of groups and introduces user-defined data types.
That’s because it uses HDF5 as its base format!
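A hedged sketch of the group hierarchy in the enhanced model: groups nest like directories, and dimension lookup walks up toward the root group. This follows the netCDF-4 rule that a dimension defined in a group is visible in that group and all of its descendants. The Group class and its methods are hypothetical, not the netCDF-4 library's API, and user-defined types are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Group:
    name: str
    # Excluded from repr/eq so the parent link does not cause infinite recursion.
    parent: Optional["Group"] = field(default=None, repr=False, compare=False)
    dimensions: Dict[str, int] = field(default_factory=dict)
    groups: Dict[str, "Group"] = field(default_factory=dict)

    def create_group(self, name: str) -> "Group":
        child = Group(name, parent=self)
        self.groups[name] = child
        return child

    def find_dimension(self, name: str) -> int:
        # Search this group first, then each ancestor up to the root,
        # so the closest definition wins.
        group: Optional[Group] = self
        while group is not None:
            if name in group.dimensions:
                return group.dimensions[name]
            group = group.parent
        raise KeyError(f"dimension {name!r} is not in scope for group {self.name!r}")

    def path(self) -> str:
        # Build an absolute path such as "/obs/surface"; the root is "/".
        parts = []
        group: Optional[Group] = self
        while group is not None and group.parent is not None:
            parts.append(group.name)
            group = group.parent
        return "/" + "/".join(reversed(parts))
```

A dimension such as time defined on the root group is therefore usable by variables in any subgroup, while a dimension defined in a subgroup stays invisible to its parent.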
For maximum portability, users are encouraged to use netCDF classic format to distribute data.
Exploring netCDF
Let’s begin with an example scenario.
What is the impact on global public health of rising maximum air temperatures?
What data do we need?
Where do we get it?
The Climatic Research Unit Time Series (CRU-TS)
Note: CRU data have been “versioning up” over the years and have recently moved from netCDF-3 (CRU TS v3.x) to netCDF-4 (CRU TS v4.x). QGIS does not handle HDF5/netCDF-4 formats without some finagling.
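Before loading a file into QGIS, you can check which format you actually downloaded by reading its first bytes. Classic netCDF files begin with "CDF" plus a version byte, while netCDF-4 files carry the HDF5 signature. The sketch below uses those published signatures; the function name is hypothetical, and it simplifies by checking for HDF5 only at byte 0 (HDF5 also allows the signature at later offsets when a user block is present).

```python
# Leading bytes ("magic numbers") of the formats discussed here.
CLASSIC_SIGNATURES = {
    b"CDF\x01": "netCDF-3 classic",
    b"CDF\x02": "netCDF-3 64-bit offset",
    b"CDF\x05": "netCDF-3 64-bit data (CDF-5)",
}
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
GZIP_SIGNATURE = b"\x1f\x8b"


def sniff_netcdf_format(path) -> str:
    """Guess a file's format from its first eight bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head[:4] in CLASSIC_SIGNATURES:
        return CLASSIC_SIGNATURES[head[:4]]
    if head == HDF5_SIGNATURE:
        # Assumes the HDF5 superblock starts at byte 0 (no user block).
        return "netCDF-4 / HDF5"
    if head[:2] == GZIP_SIGNATURE:
        return "gzip-compressed (decompress first)"
    return "unknown"
```

A CRU TS v3.x file should report a netCDF-3 format, which QGIS and scipy.io.netcdf can open directly.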
What’s available in CRU-TS?
How do we access it?
While you’re here, scroll through all the options available through the British Atmospheric Data Centre (BADC).
Find the cru directory.
Take a look at the script found in archive/badc/cru/software/third-party.
Does the formatting look familiar?
Find the most recent CRU TS file format document in archive/badc/cru/doc.
Explore archive/badc/cru/data and find the time series (TS) for monthly average daily maximum air temperature.
Download the data
There are several versions of the same data. Find the netCDF version (.dat.nc.gz) for the shortest time period that includes the latest data.
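If you copy the directory listing into Python, the selection rule can be written down explicitly: keep only the netCDF files for your variable, prefer the latest end year, and among those prefer the shortest period. The filename pattern is an assumption inferred from cru_ts3.26.2011.2017.tmx.dat.nc.gz (version, first year, last year, variable), and the function name is hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumed pattern: cru_ts<version>.<first year>.<last year>.<variable>.dat.nc.gz
CRU_NAME = re.compile(r"^cru_ts(\d+\.\d+)\.(\d{4})\.(\d{4})\.([a-z]+)\.dat\.nc\.gz$")


def pick_latest_shortest(filenames: Iterable[str], variable: str = "tmx") -> Optional[str]:
    """Return the netCDF file that ends latest, preferring the shortest span."""
    candidates = []
    for name in filenames:
        match = CRU_NAME.match(name)
        # Skips non-netCDF versions (e.g. plain .dat.gz) and other variables.
        if match and match.group(4) == variable:
            start, end = int(match.group(2)), int(match.group(3))
            candidates.append((end, start, name))
    if not candidates:
        return None
    # Highest end year wins; ties go to the latest start year, i.e. the shortest span.
    return max(candidates)[2]
```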
Decompress the file
Try it from the command line:
gzip --decompress cru_ts3.26.2011.2017.tmx.dat.nc.gz
Or try one of these alternatives: 7-Zip or Python’s gzip module.
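A minimal sketch of the Python gzip route, using only the standard library. It streams the decompressed bytes to disk in chunks rather than loading the whole file into memory, and unlike gzip --decompress it leaves the original .gz file in place. The helper name gunzip is hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional


def gunzip(src: str, dest: Optional[str] = None) -> Path:
    """Decompress src (e.g. *.nc.gz) to dest, defaulting to src minus '.gz'."""
    src_path = Path(src)
    # with_suffix("") drops only the final suffix, so ".dat.nc.gz" -> ".dat.nc".
    dest_path = Path(dest) if dest else src_path.with_suffix("")
    with gzip.open(src_path, "rb") as fin, open(dest_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copies in chunks
    return dest_path
```

For example, gunzip("cru_ts3.26.2011.2017.tmx.dat.nc.gz") writes cru_ts3.26.2011.2017.tmx.dat.nc next to the original.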
Open the decompressed file in QGIS or Panoply.
Panoply shows you the dimensions, variables and attributes…
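If you would rather inspect the header from Python, here is a hedged sketch of a Panoply-style summary. It assumes SciPy's netcdf_file class (importable as scipy.io.netcdf_file, the class behind scipy.io.netcdf), which reads only the classic netCDF-3 formats. It also reads the private _attributes dictionary that SciPy uses internally to hold attributes; that detail could change between SciPy versions. The function name summarize is hypothetical.

```python
from scipy.io import netcdf_file


def summarize(path):
    """Return the dimensions, variables and attributes of a classic netCDF file."""
    # mmap=False loads data into memory, so no arrays refer to the file after it closes.
    with netcdf_file(path, "r", mmap=False) as nc:
        return {
            # Maps dimension name -> length (None for the unlimited dimension).
            "dimensions": dict(nc.dimensions),
            # _attributes is SciPy's internal dict; text values come back as bytes.
            "global_attributes": dict(nc._attributes),
            "variables": {
                name: {
                    "dimensions": tuple(var.dimensions),
                    "shape": var.shape,
                    "attributes": dict(var._attributes),
                }
                for name, var in nc.variables.items()
            },
        }
```

Calling summarize("cru_ts3.26.2011.2017.tmx.dat.nc") should list the same time, lat, lon, and tmx entries that Panoply displays.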
Notice how the dimensions are also saved as variables, known as coordinate variables.
Storing values along each dimension is what gives the dimension its meaning.
What is lat[1]?
Check the lat variable and its associated attributes to see that it’s -89.25 degrees.
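Because lat is a coordinate variable, an index and a latitude are interchangeable. The sketch below assumes a regular 0.5-degree grid of cell centres, which is consistent with 360 latitudes, 720 longitudes, and lat[1] = -89.25; the longitude range of -179.75 to 179.75 is an additional assumption. The lat and lon variables in the real file remain the authority. The helper name nearest_index is hypothetical.

```python
# Assumption: regular 0.5-degree grid of cell centres (not read from the file).
RESOLUTION = 0.5
lats = [-90 + RESOLUTION / 2 + i * RESOLUTION for i in range(360)]   # -89.75 .. 89.75
lons = [-180 + RESOLUTION / 2 + j * RESOLUTION for j in range(720)]  # -179.75 .. 179.75


def nearest_index(coords, value):
    """Index of the grid-cell centre closest to value (simple linear scan)."""
    return min(range(len(coords)), key=lambda k: abs(coords[k] - value))
```

With this grid, lats[1] is -89.25, and a latitude of 51.6 falls in row 283, whose centre is 51.75.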
More interesting is the actual tmx data array.
It is N-dimensional, where N = 3 (time x lat x lon).
Notice that time, lat, and lon have lengths of 84, 360, and 720, and what each position along those dimensions represents can be found in their respective variables.
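A short worked example of what those lengths imply: 84 time steps are 7 years of monthly values (2011 to 2017, per the file name), and the array holds 84 x 360 x 720 = 21,772,800 values. tmx[t, i, j] is the value for month t at lat[i] and lon[j]. The helpers below assume the time axis starts in January 2011 and rely on netCDF storing arrays in row-major order (last dimension varying fastest); check the time variable's units attribute before trusting the first assumption. Both function names are hypothetical.

```python
# Assumptions: 84 monthly steps starting January 2011; row-major storage.
N_TIME, N_LAT, N_LON = 84, 360, 720
FIRST_YEAR = 2011


def time_index(year: int, month: int) -> int:
    """Position along the time dimension for a given year and month (1-12)."""
    t = (year - FIRST_YEAR) * 12 + (month - 1)
    if not 0 <= t < N_TIME:
        raise IndexError(f"{year}-{month:02d} is outside the file's time range")
    return t


def flat_offset(t: int, i: int, j: int) -> int:
    """Element offset of tmx[t, i, j] in a time x lat x lon row-major array."""
    return (t * N_LAT + i) * N_LON + j
```

For instance, December 2017 is time index 83, the last of the 84 steps.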
Bonus question
Let’s try it out.
We are going to use scipy.io.netcdf to read and write in the classic format (it reads netCDF-3 files only, not netCDF-4/HDF5).
Find nc_read.py and nc_write.py in the scripts folder of our spatial-data-discovery.github.io repository.
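The repository's nc_read.py and nc_write.py are what the demo uses. As a separate, hedged illustration of the same idea, this sketch writes a miniature tmx-like file in the classic format and reads it back. The sizes, values, attribute text, and the helper names write_tiny_tmx and read_tmx are all hypothetical; a real CRU file would also carry a time coordinate variable, omitted here. It imports netcdf_file from scipy.io, the public name of the class that scipy.io.netcdf provides.

```python
import numpy as np
from scipy.io import netcdf_file


def write_tiny_tmx(path):
    """Write a small time x lat x lon file in netCDF classic format."""
    # version=1 (the default) is the classic format; version=2 is 64-bit offset.
    with netcdf_file(path, "w", version=1) as nc:
        nc.title = "hypothetical miniature of a CRU TS tmx file"
        nc.createDimension("time", 2)
        nc.createDimension("lat", 2)
        nc.createDimension("lon", 3)

        # Coordinate variables: same name as the dimension they describe.
        lat = nc.createVariable("lat", "f4", ("lat",))
        lat[:] = [-89.75, -89.25]
        lat.units = "degrees_north"
        lon = nc.createVariable("lon", "f4", ("lon",))
        lon[:] = [-179.75, -179.25, -178.75]
        lon.units = "degrees_east"

        # The data variable, shaped by three dimensions.
        tmx = nc.createVariable("tmx", "f4", ("time", "lat", "lon"))
        tmx[:] = np.arange(2 * 2 * 3, dtype="f4").reshape(2, 2, 3)
        tmx.units = "degrees Celsius"


def read_tmx(path):
    """Return tmx's dimension names, its data, and the lat values."""
    with netcdf_file(path, "r", mmap=False) as nc:
        tmx = nc.variables["tmx"]
        # Copy so the arrays stay valid after the file is closed.
        return tmx.dimensions, tmx.data.copy(), nc.variables["lat"].data.copy()
```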
Please follow along with the demo.