Hierarchical Data Format Version 5 (HDF5) is a model for managing and storing data. It includes both the storage model (i.e., the file format, commonly saved with a .h5 or .hdf5 extension) and the libraries that provide programming interfaces to this model (e.g., h5py).
The HDF5 file is a portable, self-describing file format that supports large, complex, and heterogeneous data.
There are three main components to the HDF5 data structure:
It all begins with the root (/).
From there, you may define some groups.
And, in one of the groups, you put a dataset.
Then, you decide to add a dataset to your root group.
The world is your oyster.
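Here is a minimal sketch of that layout using h5py. The file name, group names, and dataset names are made up for illustration, and the arrays are placeholders rather than real data.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with h5py.File(path, "w") as f:
    # The File object itself acts as the root group "/".
    # From the root, define some groups.
    satellite = f.create_group("satellite")
    f.create_group("ground")

    # Put a dataset in one of the groups (placeholder values).
    satellite.create_dataset("evi", data=np.zeros((4, 8), dtype="float32"))

    # Add a dataset directly to the root group.
    f.create_dataset("notes", data=np.arange(3))

    # Collect the path of every object below the root.
    names = []
    f.visit(names.append)

print(sorted(names))
```

Paths inside the file work much like paths in a file system: the dataset above can be reached later as "/satellite/evi".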
Exploring HDF
Let’s take a look at an example scenario.
Find where all the plant life is located on Earth.
What data do we need?
Where do we get it?
Both satellites, NASA's Terra and Aqua, host the Moderate Resolution Imaging Spectroradiometer, or MODIS for short.
By combining spectral bands from MODIS, we can “see” vegetation coverage over the entire globe as 16-day to monthly averages (it takes a while for these satellites to image the whole Earth).
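To make "combining spectral bands" concrete, here is a small sketch of one such combination, the Enhanced Vegetation Index (EVI). The coefficients are the commonly published ones for the MODIS EVI; the reflectance values are invented for illustration and do not come from any real scene.

```python
def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy=1.0):
    """Enhanced Vegetation Index from surface reflectances (0 to 1).

    EVI = gain * (NIR - Red) / (NIR + c1 * Red - c2 * Blue + canopy)
    """
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy)


# Made-up reflectances: healthy vegetation reflects strongly in the
# near-infrared and absorbs red light; bare soil shows less contrast.
vegetated = evi(nir=0.50, red=0.05, blue=0.03)
bare = evi(nir=0.25, red=0.20, blue=0.10)

print(round(vegetated, 2), round(bare, 2))
```

Higher values indicate denser, greener vegetation, which is what lets a global grid of these values act as a map of plant life.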
How to Find This Data
*Some data are only available to select users/researchers.
How do we access it?
Challenge
Using the LP DAAC, find the latest monthly, 0.05-degree global Enhanced Vegetation Index (EVI) product from the Terra satellite in HDF file format.
What is the name of that file?
Let’s try it out.
We are going to use h5py to read and write the HDF5 file format.
Find hdf_read.py and hdf_write.py in the scripts folder of our spatial-data-discovery.github.io repository.
Please follow along with the demo.
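If you want something to experiment with on your own, here is a rough sketch of a write-then-read round trip in h5py. It is not the content of hdf_read.py or hdf_write.py; the file name, dataset path, attribute names, and values are all assumptions made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "evi_demo.h5")

# Made-up values on a tiny 3 x 4 grid (not real EVI data).
grid = np.linspace(0.0, 1.0, 12, dtype="float32").reshape(3, 4)

# Write: intermediate groups in the path are created automatically.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("vegetation/evi", data=grid)
    # Attributes carry the metadata that makes the file self-describing.
    dset.attrs["units"] = "unitless"
    dset.attrs["resolution_degrees"] = 0.05

# Read: open the file again and pull back the data and metadata.
with h5py.File(path, "r") as f:
    dset = f["vegetation/evi"]
    shape = dset.shape
    units = dset.attrs["units"]
    resolution = float(dset.attrs["resolution_degrees"])
    first_row = dset[0, :]   # read only a slice from disk
    whole = dset[()]         # read the entire array into memory

print(shape, units, resolution)
```

Slicing the dataset reads only the requested part from disk, which matters once the arrays are too large to hold in memory.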