In which format are the WEkEO data delivered?

Let's see the different data formats available in the WEkEO DIAS and some simple steps to open them!

Written by David Bina
Updated over a week ago

Context


WEkEO DIAS provides Earth observation data from several originating centers (see What is WEkEO?).

In this article, we present the different formats of the provided data and simple tips on how to open them in Python.

Observation Domain | Data Format
Atmosphere | GRIB and NetCDF* in .zip
Climate Change | GRIB and NetCDF*
Emergency | GRIB and NetCDF in .zip
Land | GRIB, GeoTIFF and NetCDF* in .zip
Marine | NetCDF
Sentinel | NetCDF** in .SEN3, .SEN6, .SAFE

*: the NetCDF format is experimental.

**: for Sentinel data, the NetCDF files are stored in specific compressed folders (see the Sentinel data section of this article).
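As a rough guide, the table above can be turned into a small lookup that suggests how to open a downloaded file from its extension. This is only a sketch: the function name suggest_reader and the hint texts are illustrative and not part of any WEkEO tool, and the extensions .nc, .grib and .tif are assumed to be the ones carried by the downloaded files.

```python
from pathlib import Path

# Hypothetical mapping from a (lower-cased) file extension to a hint on how to open it.
READER_HINTS = {
    ".nc": "xarray.open_dataset(path)",
    ".grib": 'xarray.load_dataset(path, engine="cfgrib")',
    ".tif": "rasterio.open(path)",
    ".zip": "extract with zipfile, then open the files inside",
    ".sen3": "append .zip, extract, then open the NetCDF files with xarray",
    ".sen6": "append .zip, extract, then open the NetCDF files with xarray",
    ".safe": "append .zip, extract, then open the NetCDF files with xarray",
}


def suggest_reader(filename):
    """Return a short hint on how to open the file, based on its extension only."""
    suffix = Path(filename).suffix.lower()
    return READER_HINTS.get(suffix, "unknown format: inspect the file manually")


print(suggest_reader("ERA5_CAMS_1978.nc"))
print(suggest_reader("mydirectory/ERA_CLMS_2022.grib"))
```

The lookup only inspects the file name, so a file with no extension (or an unexpected one) falls back to the "unknown format" hint.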

In the following examples, we will use different datasets (each identified by its datasetID) covering Italy.

NetCDF data format


Let's see here how to open a NetCDF data file.

We will focus on the atmospheric temperature of January 1978, provided in the product ERA5 hourly data on pressure levels from 1950 to 1978 (preliminary version) (datasetID = EO:ECMWF:DAT:REANALYSIS_ERA5_PRESSURE_LEVELS_PRELIMINARY_BACK_EXTENSION).

The main packages we use are xarray (to open the dataset) and matplotlib (to customize our map):

import xarray as xr

ds = xr.open_dataset("ERA5_CAMS_1978.nc")
ds

This single line gives access to all the information (dimensions, coordinates, variables, attributes) of the downloaded file.

Now, to generate a quick map, select the variable and time of interest and call its .plot() method as follows:

time = "1978-01-01"
ds.t.sel(time=time).plot()
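Hourly data on pressure levels usually holds several time steps per day and several levels, so it can help to check what a selection returns before plotting it. The sketch below uses a small synthetic dataset rather than the downloaded file; the dimension names (time, level, latitude, longitude) and the coordinate values are assumptions made for illustration and may differ in your file.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the downloaded file: 4 time steps (every 6 hours), 2 pressure levels.
times = np.datetime64("1978-01-01T00") + np.arange(0, 24, 6).astype("timedelta64[h]")
ds = xr.Dataset(
    {"t": (("time", "level", "latitude", "longitude"), np.full((4, 2, 3, 5), 280.0))},
    coords={
        "time": times,
        "level": [850, 500],
        "latitude": [47.0, 42.0, 37.0],
        "longitude": [6.0, 9.0, 12.0, 15.0, 18.0],
    },
)

# Selecting with a date string keeps every time step of that day.
whole_day = ds.t.sel(time="1978-01-01")

# Selecting one exact time and one level yields a 2D (latitude, longitude) field.
field = ds.t.sel(time=np.datetime64("1978-01-01T06"), level=850)

print(whole_day.dims)
print(field.dims)
```

A 2D field such as field is what produces a map when .plot() is called; with more dimensions left in the selection, xarray falls back to a different kind of plot.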

However, since we are visualizing atmospheric temperature, adding the coastlines and georeferencing the map makes it easier to read. For this we need matplotlib together with the cartopy package, which provides the ccrs and cfeature modules used below:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

f = plt.figure(figsize=(15,10))
ax = plt.axes(projection=ccrs.PlateCarree())  # georeferenced axes
ax.coastlines()
ax.add_feature(cfeature.LAND, zorder=1, edgecolor='k')

ds.t.sel(time=time).plot(ax=ax)  # draw on the map axes created above
plt.title(f"Atmospheric Temperature (K) on {time}", size = 15)

GRIB data format


For the GRIB data format, we downloaded the leaf area index (lai_hv) for the first week of January 2022 from the ERA5-Land hourly data from 1950 to present product (datasetID = "EO:ECMWF:DAT:REANALYSIS_ERA5_LAND").

We recommend installing cfgrib via conda:

conda install -c conda-forge cfgrib

As for the data in NetCDF format, the main package we will use is xarray:

import xarray as xr

gb = xr.load_dataset("mydirectory/ERA_CLMS_2022.grib", engine = "cfgrib")
gb

📌 Note: it is important here to specify the engine type, engine = "cfgrib".

This command opens the .grib file and lets you explore the downloaded data.

💡 WEkEO Pro Tip: make sure you have the cfgrib package installed, otherwise the error message "ValueError: unrecognized engine cfgrib must be one of: ['netcdf4', 'scipy', 'store']" will be displayed.
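One way to catch this before calling xarray is to check whether the cfgrib module can be found in the current environment. The following is a minimal sketch using only the Python standard library; the helper name engine_available is hypothetical. Note that it only checks that the Python package is present, not that the underlying GRIB decoding library works.

```python
import importlib.util


def engine_available(module_name="cfgrib"):
    """Return True if the given module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if not engine_available("cfgrib"):
    print("cfgrib is not installed: run `conda install -c conda-forge cfgrib` first")
```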

Finally, a simple line of code to plot and view the data:

gb.lai_hv.sel(time="2022-01-01").plot()

GeoTIFF data format


For the GeoTIFF data format, let's focus on the Total productivity (PPI) data from the Copernicus Land product Vegetation Phenology and Productivity, yearly (datasetID = "EO:EEA:DAT:CLMS_HRVPP_VPP"), over southern Italy.

First, open the .tif file. We will use the rasterio package (install it first if needed):

import os

import rasterio as rs
import rasterio.plot

# set the directory where .tif files are stored
data_dir = './tiff'
all_tiff = []

# loop to open and plot all .tif files
for path in os.listdir(data_dir):
    if os.path.isfile(os.path.join(data_dir, path)):
        all_tiff.append(path)
        print('file_name = ', path) # file's name
        with rs.open(os.path.join(data_dir, path)) as file:
            print("data info : ", file.profile) # file's information
            rasterio.plot.show(file) # plot
print(all_tiff)

Thus, for each GeoTIFF in the data_dir directory, we will obtain:

  • the file name

file_name = VPP_2020_S2_T33TXE-010m_V101_s1_TPROD.tif

  • general file information, such as the data format, data type, crs, and more:

data info : {'driver': 'GTiff', 'dtype': 'uint16', 'nodata': 65535.0, 'width': 10980, 'height': 10980, 'count': 1, 'crs': CRS.from_epsg(32633), 'transform': Affine(10.0, 0.0, 600000.0, 0.0, -10.0, 4500000.0), 'blockxsize': 512, 'blockysize': 512, 'tiled': True, 'compress': 'deflate', 'interleave': 'band'}

  • and the basic generated image.

📌 Note

The rasterio package also exposes some information about the .tif file as attributes of the opened dataset (called tiff here):

  • tiff.bounds: indicates the spatial bounding box

  • tiff.count: number of bands

  • tiff.width: number of columns of the raster dataset

  • tiff.height: number of rows of the raster dataset

  • tiff.crs: coordinate reference system

For more information about this Python package, please consult the rasterio documentation page.
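To illustrate how the transform shown in the file profile relates to tiff.bounds, the bounding box can be recomputed by hand from the six affine coefficients and the raster size printed above. The helper name bounds_from_affine is hypothetical; rasterio computes the bounds for you, and the sketch below only reproduces the arithmetic in plain Python.

```python
def bounds_from_affine(a, b, c, d, e, f, width, height):
    """Return (left, bottom, right, top) for a raster of width x height pixels.

    The affine transform maps a pixel (col, row) to map coordinates:
        x = a * col + b * row + c
        y = d * col + e * row + f
    """
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [a * col + b * row + c for col, row in corners]
    ys = [d * col + e * row + f for col, row in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Coefficients, width and height copied from the "data info" output above (EPSG:32633, metres).
bounds = bounds_from_affine(10.0, 0.0, 600000.0, 0.0, -10.0, 4500000.0, 10980, 10980)
print(bounds)  # (600000.0, 4390200.0, 709800.0, 4500000.0)
```

With 10 m pixels and 10980 columns and rows, the tile spans 109.8 km in each direction.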

Sentinel data


Sentinel observation data have specific storage file extensions:

  • .SEN3: Sentinel-3 data

  • .SAFE: Sentinel-2 data

  • .SEN6: Sentinel-6 data

Despite these different extensions, the process of opening the downloaded data remains the same. Let's see what it is!

Since these extensions simply denote a compressed folder, the easiest way is to:

  1. Rename the file by appending .zip to its name.

  2. Decompress the "new" .zip file.

Once all the data are extracted, it will be possible to access the downloaded NetCDF files using xarray as we have seen above (cf. the NetCDF data format section).
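Before renaming anything, it can be useful to confirm that a downloaded file really is a zip archive. A minimal sketch with the standard zipfile module follows; the helper name is_zip_archive and the demo file names are placeholders created in a temporary directory, not real Sentinel products.

```python
import os
import tempfile
import zipfile


def is_zip_archive(path):
    """Return True if path is an existing file that zipfile recognises as a zip archive."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder archive that mimics a downloaded product with a .SEN3 extension.
    demo = os.path.join(tmp, "demo_product.SEN3")
    with zipfile.ZipFile(demo, "w") as zf:
        zf.writestr("demo_product.SEN3/placeholder.nc", b"placeholder")

    # Plain text file, for comparison.
    plain = os.path.join(tmp, "notes.txt")
    with open(plain, "w") as fh:
        fh.write("not an archive")

    archive_ok = is_zip_archive(demo)
    plain_ok = is_zip_archive(plain)

print(archive_ok, plain_ok)
```

The check looks at the file content, not at the extension, so it works the same whether or not .zip has been appended to the name.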

For this short demo, we downloaded the Sentinel-3 OLCI data (stored in .SEN3 format).

As always, the first step of a script is to import the packages we will use:

import xarray as xr
import glob
import os
import shlex
import zipfile

So let's start with the first step: renaming the downloaded file to .zip. For convenience, we will also store our working path in a variable:

# working path
path = f'{os.getcwd()}/sentinel'
path

# rename S3 data
for item in os.listdir(path): # loop through items in path
    if item.endswith('.SEN3') and not item.endswith('.zip'):
        os.rename(os.path.join(path, item), os.path.join(path, item+".zip"))

💡 WEkEO Pro Tip: if you run into permission issues when renaming the archive with the code above, rename it manually: right-click on the file, choose rename, and add .zip to the end of the filename.

The file is renamed as follows:

  • from: S3B_OL_1_EFR____20220101T070244_20220101T070544_20220102T113328_0180_061_063_2520_MAR_O_NT_002

  • to: S3B_OL_1_EFR____20220101T070244_20220101T070544_20220102T113328_0180_061_063_2520_MAR_O_NT_002.zip

Let's now work on the .zip file and extract all the NetCDF files it contains:

extension = ".zip"

for item in os.listdir(path): # loop through items in path
    if item.endswith(extension): # check for ".zip" extension
        file_name = os.path.join(path, item) # get full path of files
        with zipfile.ZipFile(file_name) as zip_ref: # open the archive (closed automatically)
            zip_ref.extractall(path) # extract files to dir

In this way, you can explore the "new" unzipped folder with extension .SEN3 and work with the NetCDF data it contains.
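To see which NetCDF files the extracted folder contains, the directory tree can be searched for .nc files. The sketch below uses pathlib from the standard library; the helper name list_netcdf_files is hypothetical, and the demo folder and file names are placeholders created in a temporary directory rather than real product content.

```python
import tempfile
from pathlib import Path


def list_netcdf_files(folder):
    """Return all .nc files found under folder, searching sub-folders as well."""
    return sorted(Path(folder).rglob("*.nc"))


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder folder standing in for an extracted .SEN3 product.
    product = Path(tmp) / "demo_product.SEN3"
    product.mkdir()
    for name in ("band_a.nc", "band_b.nc", "manifest.xml"):
        (product / name).write_bytes(b"placeholder")

    found = [p.name for p in list_netcdf_files(tmp)]

print(found)
```

Each returned path can then be passed to xr.open_dataset, as in the NetCDF example earlier in this article.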

📌 Note: Sentinel-5P data is downloaded in a folder with no extension (e.g. S5P_OFFL_L2__NO2____20210205T104439_20210205T122609_17182_01_010400_20210207T042548), but the zip/unzip process is the same as described above.

What's next?


More examples are available in our previous WEkEO trainings, which can be accessed online from our JupyterHub!

We are user-driven and we implement users' suggestions, so feel free to contact us:

  • through a chat session available in the bottom right corner of the page

  • via e-mail to our support team (supportATwekeo.eu)
