Context

The WEkEO DIAS provides more than 400 Earth Observation datasets from several originating centers (more info on data available on WEkEO).

In this article, we go through the formats in which the data are delivered and give simple instructions for opening each of them with Python.

Table of data formats

Observation domain	Data format
Atmosphere	GRIB and NetCDF* in `.zip`
Climate Change	GRIB and NetCDF*
Emergency	GRIB and NetCDF in `.zip`
Land	GRIB, GeoTIFF and NetCDF* in `.zip`
Marine	NetCDF
Sentinel	NetCDF, `.SEN3`, `.SEN6`, `.SAFE` in `.zip`**

*the NetCDF format is experimental for these Services

**all Sentinel data are stored in a .zip file (learn how to unzip files)
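When a download has an unexpected or missing extension, the first bytes of the file usually reveal its container format. The sketch below is only an illustration: `sniff_format` is a hypothetical helper, not part of any WEkEO tool, and it assumes the file starts directly with its signature (GRIB files in particular may occasionally carry extra leading bytes). An HDF5 signature suggests NetCDF-4 but does not prove it.

```python
# Leading bytes ("magic numbers") of the container formats listed in the table.
SIGNATURES = [
    (b"GRIB", "GRIB"),
    (b"CDF", "NetCDF (classic)"),
    (b"\x89HDF\r\n\x1a\n", "HDF5 (possibly NetCDF-4)"),
    (b"II*\x00", "TIFF / GeoTIFF (little-endian)"),
    (b"MM\x00*", "TIFF / GeoTIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP archive"),
]


def sniff_format(path):
    """Guess a file's format from its first bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"
```

Calling `sniff_format("mydownload.bin")` on a file returns one of the labels above, or `"unknown"` when none of the signatures match.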

📝 In the following examples, we will use different datasets over Italy.

NetCDF format

Let's see here how to open a NetCDF data file.

We will focus on the atmospheric temperature of January 1978, provided in the product ERA5 hourly data on pressure levels from 1950 to 1978 (preliminary version) (datasetID = EO:ECMWF:DAT:REANALYSIS_ERA5_PRESSURE_LEVELS_PRELIMINARY_BACK_EXTENSION).

The main packages we use are:

xarray: to open the dataset
matplotlib (together with cartopy for the projection and coastlines): to customize our map

This short snippet opens the NetCDF file and displays all of its information (dimensions, coordinates, variables, attributes):

import xarray as xr
dataset = xr.open_dataset("ERA5_CAMS_1978.nc")
dataset

Now, to generate a quick map, select a time step and call the .plot() method of the resulting xarray DataArray:

time = "1978-01-01"
dataset.t.sel(time=time).plot()
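One detail worth checking before plotting: on hourly data, a date-only string such as "1978-01-01" selects every time step of that day, and .plot() only draws a map when the selection is two-dimensional. The sketch below illustrates this on a small synthetic dataset (random values, not real ERA5 data); the variable and coordinate names are assumptions chosen to resemble the example above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for an hourly temperature field (not real ERA5 data).
times = pd.date_range("1978-01-01", periods=48, freq="h")
lat = np.linspace(47.0, 36.0, 12)
lon = np.linspace(6.0, 19.0, 14)
values = 280.0 + np.random.default_rng(0).normal(size=(48, 12, 14))

demo = xr.Dataset(
    {"t": (("time", "latitude", "longitude"), values)},
    coords={"time": times, "latitude": lat, "longitude": lon},
)

# A date-only string is coarser than the hourly index: it selects the whole day.
one_day = demo.t.sel(time="1978-01-01")

# A full timestamp matches exactly one step and drops the time dimension.
one_step = demo.t.sel(time="1978-01-01T06:00")

print(one_day.sizes["time"], one_step.dims)
```

If your file also contains several pressure levels, the same idea applies: select a single level as well so that only latitude and longitude remain before calling .plot().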

We can then use matplotlib and cartopy to enhance the plot. Here we draw the atmospheric temperature on a georeferenced map and add the coastlines and land outlines:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt

f = plt.figure(figsize=(15, 10))
ax = plt.axes(projection=ccrs.PlateCarree())   # regular lat/lon map projection
ax.coastlines()
ax.add_feature(cfeature.LAND, zorder=1, edgecolor='k')

# draw the temperature field on the map axes created above
dataset.t.sel(time=time).plot(ax=ax, transform=ccrs.PlateCarree())
plt.title(f"Atmospheric Temperature (K) on {time}", size=15)

GRIB format

For the GRIB data format, we downloaded the ERA5-Land hourly data from 1950 to present product (datasetID = "EO:ECMWF:DAT:REANALYSIS_ERA5_LAND"), and its leaf area index (lai_hv) for the first week of January 2022.

We recommend installing cfgrib via conda:

conda install -c conda-forge cfgrib

As with the NetCDF data, we use xarray to open the file:

import xarray as xr
grib = xr.load_dataset("mydirectory/ERA_CLMS_2022.grib", engine = "cfgrib")
grib

📌 Note: it is important here to specify the engine type engine = "cfgrib".

The command above opens the .grib file and displays a summary of the downloaded data:

💡WEkEO Pro Tip: make sure you have the cfgrib package installed, otherwise the error message "ValueError: unrecognized engine cfgrib must be one of: ['netcdf4', 'scipy', 'store']" will be displayed.
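To avoid running into that error halfway through a notebook, you can check up front whether the package is visible to your Python environment. This is a minimal sketch using only the standard library; `has_package` is a hypothetical helper name. Note that it only confirms the Python package can be found, not that the underlying ecCodes library works correctly.

```python
import importlib.util


def has_package(name):
    """Return True if a top-level package can be found (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


if has_package("cfgrib"):
    print("cfgrib found: engine='cfgrib' should be available to xarray")
else:
    print("cfgrib missing: install it with 'conda install -c conda-forge cfgrib'")
```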

Finally, a simple line of code to plot and view the data:

grib.lai_hv.sel(time="2022-01-01").plot()

GeoTIFF format

For the GeoTIFF format, let's focus on the Copernicus Land Total productivity (PPI) data from the Vegetation Phenology and Productivity, yearly, product (datasetID = "EO:EEA:DAT:CLMS_HRVPP_VPP"), in southern Italy.

First, open the .tif file. We will use the rasterio package (install it first if it is not already available):

import os

import rasterio as rs
import rasterio.plot

# set the directory where .tif files are stored
data_dir = './tiff'
all_tiff = []

# loop to open and plot all .tif files
for path in os.listdir(data_dir):
    if os.path.isfile(os.path.join(data_dir, path)):
        all_tiff.append(path)
        print('file_name = ', path)               # file's name 
        with rs.open(os.path.join(data_dir, path)) as file:
            print("data info : ", file.profile)   # file's information
            rasterio.plot.show(file)              # plot
print(all_tiff)

Thus, for each GeoTIFF in the data_dir directory, we will obtain:

The file name:

file_name = VPP_2020_S2_T33TXE-010m_V101_s1_TPROD.tif

General file information, such as the data format, data type, crs, and more:

data info : {'driver': 'GTiff', 'dtype': 'uint16', 'nodata': 65535.0, 'width': 10980, 'height': 10980, 'count': 1, 'crs': CRS.from_epsg(32633), 'transform': Affine(10.0, 0.0, 600000.0, 0.0, -10.0, 4500000.0), 'blockxsize': 512, 'blockysize': 512, 'tiled': True, 'compress': 'deflate', 'interleave': 'band'}

The basic generated image:

📌 Note: rasterio also exposes some information about the .tif file as attributes of the opened dataset (called file in the loop above, tiff below):

tiff.bounds: indicates the spatial bounding box
tiff.count: number of bands
tiff.width: number of columns of the raster dataset
tiff.height: number of rows of the raster dataset
tiff.crs: coordinate reference system
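These attributes are related to each other: for a north-up raster, the bounding box follows directly from the affine transform and the raster size. The short calculation below reuses the values printed in the file information above and needs no extra package; the helper name `pixel_to_map` is ours, introduced only for illustration.

```python
# Values copied from the file information shown above.
width, height = 10980, 10980
# Affine(a, b, c, d, e, f): x = a*col + b*row + c, y = d*col + e*row + f
a, b, c, d, e, f = 10.0, 0.0, 600000.0, 0.0, -10.0, 4500000.0


def pixel_to_map(col, row):
    """Convert a pixel corner (col, row) to map coordinates in the raster CRS."""
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


left, top = pixel_to_map(0, 0)                 # upper-left corner
right, bottom = pixel_to_map(width, height)    # lower-right corner

print(f"left={left}, bottom={bottom}, right={right}, top={top}")
```

With 10 m pixels and 10980 columns and rows, the tile spans 109.8 km in each direction, so the box runs from 600000 to 709800 in x and from 4390200 to 4500000 in y (EPSG:32633, in metres). These are the values we would expect tiff.bounds to report for this file.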

For more information about this Python package, please consult the rasterio documentation page.

Sentinel data

All Sentinel data are packaged in a .zip archive when downloaded, in which you'll find files with mission-specific extensions:

Sentinel-1 and Sentinel-2 products (supplied by ESA) are provided in .SAFE format (cf. sentiwiki)
Sentinel-3 products (supplied by ESA or Eumetsat) are provided in .SEN3 format (a tailored version of SAFE format)
Sentinel-5P products (supplied by ESA) are provided in NetCDF format
Sentinel-6 products (supplied by Eumetsat) are provided in .SEN6 format (a tailored version of SAFE format)

To open downloaded Sentinel data, you simply need to unzip all the files:

import os
import zipfile

extension = ".zip"
path = "Directory"

for item in os.listdir(path):                # loop through items in path
    if item.endswith(extension):             # check for ".zip" extension
        file_name = os.path.join(path, item) # get full path of files
        zip_ref = zipfile.ZipFile(file_name) # create zipfile to read it
        zip_ref.extractall(path)             # extract file to dir
        zip_ref.close()                      # close file
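As a variation on the loop above, here is a sketch based on pathlib and a context manager, which closes each archive automatically. It also returns the extracted products so you can see what to open next. The function name `unzip_products` and the list of suffixes are assumptions made for this example.

```python
import zipfile
from pathlib import Path

# Extensions of the Sentinel products described above (assumed list).
PRODUCT_SUFFIXES = (".SAFE", ".SEN3", ".SEN6", ".nc")


def unzip_products(directory):
    """Extract every .zip in `directory` and return the products found there."""
    directory = Path(directory)
    for archive in sorted(directory.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:   # closed automatically on exit
            zf.extractall(directory)           # extract next to the archive
    # List top-level entries that look like Sentinel products.
    return sorted(p.name for p in directory.iterdir()
                  if p.suffix in PRODUCT_SUFFIXES)
```

Calling `unzip_products("Directory")` extracts all archives in that folder and returns the names of the resulting .SAFE, .SEN3, .SEN6 or .nc entries.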

💡WEkEO Pro Tip: for data containing NetCDF files, you can check our previous section to learn how to open a .nc file via xarray! 😃

And that's it!

Now you know all about the WEkEO data formats and how to read them! 😎

What's next?

More examples are available in our previous WEkEO trainings, which you can access online from our JupyterHub!

Additional resources can be found in our Help Center. Should you require further assistance or wish to provide feedback, feel free to contact us through a chat session available in the bottom right corner of the page.
