Exploring WEkEO Earthkit: From data retrieval to visualization

Learn how to use WEkEO Earthkit for data access and visualization with practical examples.

Written by Alexandre
Updated this week

Context


EarthKit simplifies workflows in weather and climate science by streamlining data access, processing, analysis, and visualization. Designed to eliminate technical complexities, it enables efficient data handling without concerns about formats or file management.

As an open-source Python project led by the European Centre for Medium-Range Weather Forecasts (ECMWF), EarthKit now integrates the WEkEO Harmonized Data Access (HDA) into the earthkit package. This integration allows seamless retrieval of most WEkEO datasets through EarthKit’s wrapper functions, further enhancing accessibility and efficiency in meteorological and climate research.

This article gives an overview of how to use the Earthkit WEkEO Plugin for data access, analysis, and visualization.

The code used in this article is available in two accompanying notebooks.

Setting up Earthkit


Before accessing and processing data, EarthKit must be installed and configured. This section covers the installation process and how to set up caching to optimize dataset retrieval and reuse.

Installation

The WEkEO plugins are included in the earthkit package, so you only need to install the earthkit package to run this notebook:

! pip install earthkit

Next, import the necessary packages and the freshly installed earthkit package:

import os
import earthkit.data as ekd
import earthkit.plots
import xarray as xr
from earthkit.data import settings, cache
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature

Configuring Cache and Settings

EarthKit's cache feature detects repeated dataset requests and reuses previously downloaded data. A temporary cache or a custom cache directory can be configured, as shown below with the "cache" folder.

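# Use a persistent, user-defined cache directory instead of a temporary one.
s = {"cache-policy": "user",
     "user-cache-directory": "./cache"}
settings.set(s)
cache.directory()  # shows the cache directory currently in use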

Accessing and managing data


EarthKit simplifies data retrieval and management by providing seamless access to WEkEO datasets. This section covers how to download datasets, leverage caching for efficiency, and convert data into a structured format for analysis.

Download datasets using Earthkit

The earthkit.data.from_source function loads datasets from the WEkEO HDA. On the first download, enter your WEkEO username and password in the prompt that appears below the code cell.

The first argument is always "wekeo", the second is the dataset ID, and the request keyword holds the request for the WEkEO HDA. You can reuse past requests or retrieve the API request from the WEkEO Viewer.

ds_eum = ekd.from_source("wekeo", 
"EO:ECMWF:DAT:REANALYSIS_ERA5_SINGLE_LEVELS_MONTHLY_MEANS",
request = {
"dataset_id": "EO:ECMWF:DAT:REANALYSIS_ERA5_SINGLE_LEVELS_MONTHLY_MEANS",
"product_type": ["monthly_averaged_reanalysis_by_hour_of_day"],
"variable": ["2m_temperature"],
"year": ["2019"],
"month": ["01"],
"time": ["00:00","01:00","02:00","03:00","04:00","05:00",
"06:00","07:00","08:00","09:00","10:00","11:00",
"12:00","13:00","14:00","15:00","16:00","17:00",
"18:00","19:00","20:00","21:00","22:00","23:00"],
"data_format": "netcdf",
"download_format": "zip",
"itemsPerPage": 200,
"startIndex": 0
})
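
Writing out all 24 hours by hand is easy to get wrong. As a minimal sketch, the request above could be assembled by a small helper; the function name era5_monthly_request is hypothetical and not part of EarthKit or the HDA, and the keys simply mirror the request shown above.

```python
def era5_monthly_request(year: str, month: str, variable: str = "2m_temperature") -> dict:
    """Build a request dict like the one above (hypothetical helper, not an EarthKit API)."""
    return {
        "dataset_id": "EO:ECMWF:DAT:REANALYSIS_ERA5_SINGLE_LEVELS_MONTHLY_MEANS",
        "product_type": ["monthly_averaged_reanalysis_by_hour_of_day"],
        "variable": [variable],
        "year": [year],
        "month": [month],
        # Generates "00:00", "01:00", ..., "23:00"
        "time": [f"{hour:02d}:00" for hour in range(24)],
        "data_format": "netcdf",
        "download_format": "zip",
        "itemsPerPage": 200,
        "startIndex": 0,
    }


request = era5_monthly_request("2019", "01")
print(len(request["time"]), request["time"][0], request["time"][-1])
```

The resulting dictionary can then be passed as request=request, with request["dataset_id"] as the second argument to ekd.from_source.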

The data is cached, so no file management is needed. It will be deleted when the cache is cleared. Re-running ekd.from_source with the same dataset and parameters uses the cached data instead of downloading again, making execution faster.

More information on the caching of Earthkit can be found in the official documentation: Earthkit Caching Documentation.
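
To build intuition for why an identical request is served from the cache, the sketch below derives a deterministic key from a dataset ID and its request. This is a conceptual illustration only and is not EarthKit's actual implementation; the function name request_cache_key and the dataset ID "EO:EXAMPLE" are made up for the example.

```python
import hashlib
import json


def request_cache_key(dataset_id: str, request: dict) -> str:
    """Return a deterministic key for a request (illustration, not EarthKit internals)."""
    # Sorting the keys makes the key independent of dictionary ordering.
    payload = json.dumps({"dataset": dataset_id, "request": request}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = request_cache_key("EO:EXAMPLE", {"year": ["2019"], "month": ["01"]})
b = request_cache_key("EO:EXAMPLE", {"month": ["01"], "year": ["2019"]})
c = request_cache_key("EO:EXAMPLE", {"year": ["2019"], "month": ["02"]})

print(a == b)  # same parameters, different key order: True
print(a == c)  # different month: False
```

The practical consequence is the same as described above: changing any parameter of the request leads to a new download, while repeating a request unchanged does not.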

Converting data to xArray

The to_xarray() method of the downloaded data object converts the data into an xArray Dataset.

t_monthly = ds_eum.to_xarray()
t_monthly

Output:

<xarray.Dataset> Size: 100MB
Dimensions: (valid_time: 24, latitude: 721, longitude: 1440)
Coordinates:
number int64 8B ...
* valid_time (valid_time) datetime64[ns] 192B 2019-01-01 ... 2019-01-01T23...
* latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
* longitude (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
expver (valid_time) <U4 384B dask.array<chunksize=(24,), meta=np.ndarray>
Data variables:
t2m (valid_time, latitude, longitude) float32 100MB dask.array<chunksize=(12, 361, 720), meta=np.ndarray>
Attributes:
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts


Data exploration and processing


Once data is retrieved and converted into xArray, it can be explored and processed for analysis. This section covers how to select subsets, perform calculations on time steps, and merge multiple datasets for comparison.

Exploring data with xArray

After converting to xArray, you can explore, analyze, and visualize the data using xArray functions.

The example below selects a subset by slicing latitude and longitude to approximate Germany. It then averages over latitude and longitude using mean(), leaving one value for each of the 24 hours, and visualizes the result as a line chart with plot.line.

germany = t_monthly.t2m.sel(latitude=slice(56, 47), longitude=slice(5, 16))
germany.mean(dim=["latitude", "longitude"]).plot.line(x="valid_time")
plt.title("Diurnal Temperature Cycle for Germany, Jan. 2019")

Output:
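
The dataset shown earlier stores latitude from 90 down to -90 and longitude from 0 to just under 360. That is why the latitude slice runs from north to south, and it also means that western longitudes given as negative numbers need converting before slicing. The helpers below are a hypothetical sketch (the function names and the example bounds are illustrative, not part of xArray or EarthKit).

```python
def to_0_360(lon_deg: float) -> float:
    """Map a longitude from the [-180, 180] convention to [0, 360)."""
    return lon_deg % 360.0


def crosses_prime_meridian(west: float, east: float) -> bool:
    """True if a west-to-east box wraps around 0 degrees on a 0-360 axis."""
    return to_0_360(west) > to_0_360(east)


# A box entirely west of Greenwich (illustrative bounds) converts cleanly.
west, east = to_0_360(-9.5), to_0_360(-6.25)
print(west, east)  # 350.5 353.75

# A box from 10 W to 5 E wraps around 0 degrees. A single slice(350, 5) would be
# empty on an ascending axis, so such a region needs two slices joined afterwards.
print(crosses_prime_meridian(-10.0, 5.0))  # True
print(crosses_prime_meridian(5.0, 16.0))   # False
```

For the Germany example the bounds 5 to 16 are already positive, so no conversion is needed there.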

Performing arithmetic operations on time steps

Arithmetic operations can be performed on different time steps of the dataset. Here, two data arrays from single time steps are subtracted to create a temperature difference map, showing the temperature change between 00:00 UTC and 12:00 UTC globally. This difference reflects the day-night cycle inversion across the globe.

# Index 0 is 00:00 UTC and index 12 is 12:00 UTC (indices count from zero).
diff = t_monthly.t2m.isel(valid_time=0) - t_monthly.t2m.isel(valid_time=12)
diff.plot()
plt.title("Temperature Difference between 00:00 UTC and 12:00 UTC")

Output:
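
Because isel selects by position, it helps to check which position belongs to which hour. The sketch below rebuilds the 24 hourly timestamps of 1 January 2019 with the standard library, mirroring the valid_time coordinate shown earlier, and looks up the position of a given hour. The helper name position_of is hypothetical.

```python
from datetime import datetime, timedelta

# Hourly timestamps 2019-01-01 00:00 ... 23:00, like the valid_time coordinate.
start = datetime(2019, 1, 1, 0, 0)
valid_times = [start + timedelta(hours=i) for i in range(24)]


def position_of(hour: int) -> int:
    """Return the zero-based position of the timestamp with the given hour."""
    return next(i for i, t in enumerate(valid_times) if t.hour == hour)


noon = position_of(12)
print(noon, valid_times[noon].isoformat())  # 12 2019-01-01T12:00:00
```

Positions count from zero, so 12:00 sits at position 12 and position 11 is 11:00. Label-based selection with sel is an alternative that avoids this kind of off-by-one mistake.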

Combining multiple datasets

With the WEkEO Earthkit Plugin, datasets from different sources can be downloaded and combined. This section covers handling multiple datasets by adding daily temperature data from January 1, 2019, for comparison with the previously downloaded monthly temperature means.

ds_day = ekd.from_source("wekeo", "EO:ECMWF:DAT:REANALYSIS_ERA5_SINGLE_LEVELS", request = {
"dataset_id": "EO:ECMWF:DAT:REANALYSIS_ERA5_SINGLE_LEVELS",
"product_type": ["reanalysis"],
"variable": ["2m_temperature"],
"year": ["2019"],
"month": ["01"],
"day": ["01"],
"time": ["00:00","01:00","02:00","03:00","04:00","05:00",
"06:00","07:00","08:00","09:00","10:00","11:00",
"12:00","13:00","14:00","15:00","16:00","17:00",
"18:00","19:00","20:00","21:00","22:00","23:00"],
"data_format": "netcdf",
"download_format": "zip",
"itemsPerPage": 200,
"startIndex": 0
})

t_daily= ds_day.to_xarray()
t_daily

Output:

<xarray.Dataset> Size: 100MB
Dimensions: (valid_time: 24, latitude: 721, longitude: 1440)
Coordinates:
number int64 8B ...
* valid_time (valid_time) datetime64[ns] 192B 2019-01-01 ... 2019-01-01T23...
* latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
* longitude (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
expver (valid_time) <U4 384B dask.array<chunksize=(24,), meta=np.ndarray>
Data variables:
t2m (valid_time, latitude, longitude) float32 100MB dask.array<chunksize=(12, 361, 720), meta=np.ndarray>
Attributes:
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2025-02-10T09:27 GRIB to CDM+CF via cfgrib-0.9.1...


Since both datasets use the same variable name (t2m) for temperature, at least one must be renamed before merging:

t_daily = t_daily.rename({'t2m': 't2m_daily'})

The merge function combines datasets with identical dimensions. For datasets with different characteristics, consider using concat, combine_by_coords, or combine_nested. Detailed documentation on these functions is available in the xArray documentation.

t = t_monthly.merge(t_daily)
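
The rename-then-merge pattern can be tried without downloading anything. The sketch below uses two tiny synthetic datasets as stand-ins for the monthly and daily data; the values and the 3 x 2 x 2 shape are made up, and only the variable names follow the article.

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-ins for the two datasets (values are made up).
dims = ("valid_time", "latitude", "longitude")
coords = {
    "valid_time": np.arange(3),
    "latitude": [50.0, 49.75],
    "longitude": [8.0, 8.25],
}
monthly = xr.Dataset({"t2m": (dims, np.full((3, 2, 2), 270.0))}, coords=coords)
daily = xr.Dataset({"t2m": (dims, np.full((3, 2, 2), 272.0))}, coords=coords)

# Rename one variable so both can live in the same dataset, then merge.
daily = daily.rename({"t2m": "t2m_daily"})
merged = monthly.merge(daily)

# Both variables share the same coordinates, so they can be compared directly.
deviation = merged.t2m_daily - merged.t2m
print(sorted(merged.data_vars))  # ['t2m', 't2m_daily']
print(float(deviation.mean()))   # 2.0
```

After the merge, both variables are indexed by the same valid_time, latitude, and longitude coordinates, which is what makes the direct subtraction possible.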

You can now compare the daily temperature on the 1st of January 2019 against the mean temperature in January in Germany.

fig, (ax1, ax2) = plt.subplots(1,2, figsize=(15, 5))
fig.suptitle("Comparison of the daily temperatures on 01.01.2019 with the monthly means of January 2019 over Germany")

t.t2m_daily.sel( latitude=slice( 56, 47), longitude = slice(5, 16)).mean(dim=["latitude", "longitude"]).plot.line(x="valid_time", label="Daily", ax=ax1)
t.t2m.sel( latitude=slice( 56, 47), longitude = slice(5, 16)).mean(dim=["latitude", "longitude"]).plot.line(x="valid_time", label="Monthly Mean", ax=ax1)

diff = t.sel( latitude=slice( 56, 47), longitude = slice(5, 16)).t2m_daily - t.t2m
diff.isel(valid_time=11).plot(ax=ax2)
ax1.legend()

Output:

Saving and exporting processed data


Exporting the created data subsets for later use is recommended, especially for large datasets. Saving intermediate results prevents the need to redownload data when the Jupyter Hub cache is cleared, significantly saving time.

t.to_zarr("temp_europe_daily_monthly_201901.zarr")


Data visualization with earthkit-plots


Earthkit provides built-in plotting capabilities to visualize data retrieved from the HDA. This section presents examples of how to plot different types of data using earthkit-plots. The available visualization methods vary depending on the dataset. For a complete overview, refer to the earthkit-plots documentation.

Loading datasets for visualization

To proceed with visualization, two different datasets will be used:

  • ERA5 Air Temperature (2m) – Monthly averaged reanalysis data by the hour of the day.

  • CLMS Land Surface Temperature (LST) – 10-daily median values.

Load the datasets using earthkit.data.from_source:

ds_era5 = ekd.from_source("wekeo", "EO:ECMWF:DAT:REANALYSIS_ERA5_SINGLE_LEVELS_MONTHLY_MEANS", request={
"dataset_id": "EO:ECMWF:DAT:REANALYSIS_ERA5_SINGLE_LEVELS_MONTHLY_MEANS",
"product_type": ["monthly_averaged_reanalysis_by_hour_of_day"],
"variable": ["2m_temperature"],
"year": ["2020"],
"month": ["07"],
"time": ["00:00","01:00","02:00","03:00","04:00","05:00",
"06:00","07:00","08:00","09:00","10:00","11:00",
"12:00","13:00","14:00","15:00","16:00","17:00",
"18:00","19:00","20:00","21:00","22:00","23:00"],
"data_format": "netcdf",
"download_format": "zip",
"itemsPerPage": 200,
"startIndex": 0
})

ds_clms = ekd.from_source("wekeo", "EO:CLMS:DAT:CLMS_GLOBAL_LST_5KM_V1_10DAILY-DAILY-CYCLE_NETCDF", request = {
"dataset_id": "EO:CLMS:DAT:CLMS_GLOBAL_LST_5KM_V1_10DAILY-DAILY-CYCLE_NETCDF",
"productType": "LST10",
"resolution": "5000",
"startdate": "2020-07-01T00:00:00.000Z",
"enddate": "2020-07-02T23:59:59.999Z",
"itemsPerPage": 200,
"startIndex": 0
} )
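
The CLMS request above takes its time range as two ISO 8601 strings with millisecond precision. As a sketch, the helper below (a hypothetical function, not part of EarthKit or the HDA) produces strings in that same format for a start day and a number of days, using only the standard library.

```python
from datetime import datetime, timedelta, timezone


def hda_date_range(first_day: str, n_days: int) -> tuple:
    """Return (startdate, enddate) strings formatted like the request above."""

    def fmt(dt: datetime) -> str:
        # Milliseconds are derived from the microsecond field.
        return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

    start = datetime.strptime(first_day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # The end is the last millisecond of the final day.
    end = start + timedelta(days=n_days) - timedelta(milliseconds=1)
    return fmt(start), fmt(end)


startdate, enddate = hda_date_range("2020-07-01", 2)
print(startdate)  # 2020-07-01T00:00:00.000Z
print(enddate)    # 2020-07-02T23:59:59.999Z
```

The two values can be placed under the "startdate" and "enddate" keys of the request.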

With the data loaded, the next step is visualization using earthkit-plots.

Plotting two different datasets side by side (same region)

This example visualizes and compares two datasets for the same region using earthkit-plots:

  1. ERA5 Air Temperature (2m) at 12:00 in July 2020.

  2. CLMS Land Surface Temperature (LST) Median at 12:00 on July 1, 2020.

The side-by-side maps, created for Europe with the earthkit-plots Figure class, reveal differences between air and land surface temperatures:

figure = earthkit.plots.Figure(rows=1, columns=2)

dt_map = figure.add_map(domain="Europe")
dt_map.plot(ds_era5[11], units="kelvin")
dt_map.legend(location="bottom")
dt_map.title("ERA5 Air Temperature (2m) at 12:00 in July 2020")


eum_map = figure.add_map(domain="Europe")
eum_map.plot(ds_clms[59], units="kelvin")
eum_map.legend(location="bottom")
eum_map.title("CLMS LST Median at 12:00 on 1 July 2020")

figure.land()
figure.gridlines()

figure.show()

Output:

Plotting two different datasets side by side (different regions)

This example compares datasets across regions using earthkit-plots:

  • ERA5 Air Temperature (2m) at 12:00 in July 2020 over Germany.

  • CLMS LST Median at 12:00 on July 1, 2020, over Spain.

The side-by-side maps, created with the earthkit-plots Figure class, highlight regional temperature variations:

figure = earthkit.plots.Figure(rows=1, columns=2)

dt_map = figure.add_map(domain="Germany")
dt_map.plot(ds_era5[11], units="kelvin")
dt_map.legend(location="bottom")
dt_map.title("ERA5 Air Temperature (2m) at 12:00 in July 2020")


eum_map = figure.add_map(domain="Spain")
eum_map.plot(ds_clms[59], units="kelvin")
eum_map.legend(location="bottom")
eum_map.title("CLMS LST Median at 12:00 on 1 July 2020")

figure.land()
figure.gridlines()

figure.show()

Output:

Plotting each record of a dataset in a subplot

This example demonstrates how to visualize multiple time steps of a dataset using earthkit-plots. Each subplot represents a different timestamp of ERA5 Air Temperature (2m) over Europe, allowing for an easy comparison of temporal variations:

figure = earthkit.plots.Figure(size=(12, 18), rows=6, columns=4)

# Draw one map per record (one record per hour of the day).
for i in range(len(ds_era5.ls())):
    dt_map = figure.add_map(domain="Europe")
    dt_map.plot(ds_era5[i], units="kelvin")
    date = str(ds_era5[i].slices[0].value.strftime("%Y-%m-%d %H:%M:%S"))
    dt_map.title("ERA5 Air Temperature (2m) at \n"+date, fontsize=10)

# A single legend shared by all subplots.
figure.legend(location="bottom")

Output:
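
The 6 by 4 grid used for this figure matches the 24 hourly records. For a different number of records, the grid can be computed rather than hard-coded. The helper below is a hypothetical sketch using only the standard library; its result would be passed to the rows and columns arguments of the Figure class.

```python
import math


def grid_shape(n_records: int, columns: int = 4) -> tuple:
    """Return (rows, columns) with enough cells for n_records subplots."""
    rows = math.ceil(n_records / columns)
    return rows, columns


rows, columns = grid_shape(24)
print(rows, columns)  # 6 4
```

When the number of records is not a multiple of the column count, the last row is only partly filled; for example, 10 records in 4 columns need 3 rows.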

What's next?


Several articles are available in the WEkEO Help Center, but if you have any questions, issues or suggestions, feel free to contact us through a chat session available in the bottom right corner of the page.
