How to Use the hdar Package for Accessing the WEkEO HDA API in R?

Let's see how the Harmonised Data Access (HDA) API can be used in R.

Written by David Bina
Updated this week

Context


The hdar R package provides seamless access to the WEkEO Harmonised Data Access (HDA) API, enabling users to programmatically query and download data from within R.

To utilize the HDA service and library, you must first register for a WEkEO account.

Registration is straightforward and can be completed through the following link: Register for WEkEO. Once your account is set up, you will be able to access the HDA services immediately.

Now let's see how to use the HDA API with R! 🙌

Installation


To start using the hdar package, you first need to install and load it into your R environment. The stable version is available on CRAN and can be installed with the following command:

if(!require("hdar")){install.packages("hdar")}

Next, load the package into your R session:

library(hdar)

If you want to access the development version of the package, you can install it directly from GitHub.

Install the development version from GitHub

Make sure you have the devtools package installed, then use the following command:

devtools::install_github("eea/hdar@develop")

Authentication


To interact with the HDA service, you need to authenticate by providing your username and password. The Client class allows you to pass these credentials directly and optionally save them to a configuration file for future use. If credentials are not specified as parameters, the client will read them from the ~/.hdarc file.

Step 1. Creating and Authenticating the Client

Next, you need to set up the user credentials. There are two available methods to do this.

Method 1 (occasional users)

Set up WEkEO credentials directly by passing your username and password. Here's an example of how to authenticate, with the option to save these credentials for future use in the ~/.hdarc file:

username <- "your_username"
password <- "your_password"
client <- Client$new(username, password, save_credentials = TRUE)

Method 2 (regular users)

Once the credentials are saved, you can initialize the Client class without passing the credentials. The client will read the credentials from the ~/.hdarc file:

client <- Client$new()
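
If you would rather not hard-code credentials in a script, one possible approach is to read them from environment variables before creating the client. The sketch below is only an illustration: the helper read_wekeo_credentials() and the variable names WEKEO_USERNAME and WEKEO_PASSWORD are hypothetical and not part of hdar.

```r
# Hypothetical helper: read WEkEO credentials from environment variables.
# The variable names are an assumption made for this sketch.
read_wekeo_credentials <- function(user_var = "WEKEO_USERNAME",
                                   pass_var = "WEKEO_PASSWORD") {
  username <- Sys.getenv(user_var, unset = "")
  password <- Sys.getenv(pass_var, unset = "")
  if (!nzchar(username) || !nzchar(password)) {
    stop("Set ", user_var, " and ", pass_var, " before creating the client.")
  }
  list(username = username, password = password)
}

# Example usage (requires library(hdar) and a valid WEkEO account):
# creds <- read_wekeo_credentials()
# client <- Client$new(creds$username, creds$password, save_credentials = FALSE)
```

The helper fails early with a clear message when a variable is missing, so a misconfigured session is caught before any request is sent.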

Step 2. Checking Authentication

Once the client is created, you can check whether it has been authenticated properly by calling the get_token() method. For example:

client$get_token() 

If the authentication is successful, this will return a token indicating that you are connected.
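
In a script, you may want to stop early when authentication fails instead of discovering it later. The following is a minimal sketch, assuming that get_token() either returns the token as a non-empty character string or signals an error; the helper has_valid_token() is hypothetical, and the stand-in objects only mimic a client for illustration.

```r
# Hypothetical helper: TRUE if the client yields a non-empty token string.
# Assumes get_token() returns a character string or raises an error.
has_valid_token <- function(client) {
  token <- tryCatch(client$get_token(), error = function(e) NULL)
  is.character(token) && length(token) == 1 && !is.na(token) && nzchar(token)
}

# Stand-ins that mimic a working and a failing client
ok_client  <- list(get_token = function() "example-token")
bad_client <- list(get_token = function() stop("authentication failed"))

has_valid_token(ok_client)   # TRUE
has_valid_token(bad_client)  # FALSE
```

With a real client you would call has_valid_token(client) right after Client$new().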

Copernicus Terms and Conditions (T&C)


Copernicus data is free to use and modify, but the T&Cs must be accepted before you can download the data. The hdar package provides a convenient feature to review and accept or reject the T&Cs for each individual Copernicus service.

To display the T&Cs in your browser and read them, you can use the following command:

client$show_terms()

Once you've reviewed the terms, you can accept or reject individual T&Cs, or all at once, using the following command:

client$terms_and_conditions()

This will display a list of the T&Cs, showing whether they have been accepted (TRUE) or not (FALSE), like this:

                                               term_id accepted
1                           Copernicus_General_License    FALSE
2                          Copernicus_Sentinel_License    FALSE
3                       EUMETSAT_Core_Products_Licence    FALSE
4                     EUMETSAT_Copernicus_Data_Licence    FALSE
5  Copernicus_DEM_Instance_COP-DEM-GLO-90-F_Global_90m    FALSE
6  Copernicus_DEM_Instance_COP-DEM-GLO-30-F_Global_30m    FALSE
7                             Copernicus_ECMWF_License    FALSE
8       Copernicus_Land_Monitoring_Service_Data_Policy    FALSE
9            Copernicus_Marine_Service_Product_License    FALSE
10                        CNES_Open_2.0_ETALAB_Licence    FALSE

To accept all T&Cs at once, you can use the following command:

client$terms_and_conditions(term_id = 'all')

This will mark all terms as accepted, as shown in the following output:

                                               term_id accepted
1                           Copernicus_General_License     TRUE
2                          Copernicus_Sentinel_License     TRUE
3                       EUMETSAT_Core_Products_Licence     TRUE
4                     EUMETSAT_Copernicus_Data_Licence     TRUE
5  Copernicus_DEM_Instance_COP-DEM-GLO-90-F_Global_90m     TRUE
6  Copernicus_DEM_Instance_COP-DEM-GLO-30-F_Global_30m     TRUE
7                             Copernicus_ECMWF_License     TRUE
8       Copernicus_Land_Monitoring_Service_Data_Policy     TRUE
9            Copernicus_Marine_Service_Product_License     TRUE
10                        CNES_Open_2.0_ETALAB_Licence     TRUE

This simplifies the process of agreeing to the required terms so you can proceed with downloading the data.
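
If you prefer to accept only what you need, it can help to list the terms that are still pending. The sketch below assumes that terms_and_conditions() returns a data frame with the columns term_id and accepted, as the printed output suggests; the data frame here is a hand-made stand-in, not a live response.

```r
# Stand-in for the data frame shown above (three rows only)
terms <- data.frame(
  term_id = c("Copernicus_General_License",
              "Copernicus_ECMWF_License",
              "Copernicus_Marine_Service_Product_License"),
  accepted = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# IDs of the terms that have not been accepted yet
pending <- terms$term_id[!terms$accepted]
print(pending)

# Assumption: term_id also accepts a single ID instead of 'all'
# for (id in pending) client$terms_and_conditions(term_id = id)
```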

Finding Datasets


WEkEO provides access to a wide variety of products. To help you find what you need, the Client class includes a method called datasets, which lists all available datasets and allows you to filter them by a text pattern if desired.

Retrieving Available Datasets

The basic usage of the datasets method is simple. To retrieve a complete list of all datasets available on WEkEO, which can take about 2 minutes, use the following command:

all_datasets <- client$datasets()

If you want to list the dataset IDs of all retrieved datasets, you can do so with this command:

sapply(all_datasets, FUN = function(x) { x$dataset_id })

Filtering Datasets

You can also filter the datasets by providing a text pattern. This is useful when you are looking for datasets that match a specific keyword or phrase.

filtered_datasets <- client$datasets("Seasonal Trajectories")

# To list the dataset IDs from the filtered results:
sapply(filtered_datasets, FUN = function(x) { x$dataset_id })

The output will display dataset IDs like the following:

"EO:EEA:DAT:CLMS_HRVPP_VPP-LAEA" "EO:EEA:DAT:CLMS_HRVPP_ST"       "EO:EEA:DAT:CLMS_HRVPP_ST-LAEA" "EO:EEA:DAT:CLMS_HRVPP_VPP"

Similarly, if you are looking for datasets related to "Baltic," you can filter them as follows:

filtered_datasets <- client$datasets("Baltic")

# To list the dataset IDs from the filtered results:
sapply(filtered_datasets, FUN = function(x) { x$dataset_id })

The output for this query will be:

 [1] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_BGC_003_007:cmems_mod_bal_bgc-pp_anfc_P1D-i_202311"
 [2] "EO:MO:DAT:NWSHELF_MULTIYEAR_PHY_004_009:cmems_mod_nws_phy-sst_my_7km-2D_PT1H-i_202112"
 [3] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L4_MY_009_134:cmems_obs-oc_bal_bgc-plankton_my_l4-multi-1km_P1M_202211"
 [4] "EO:MO:DAT:SST_BAL_PHY_SUBSKIN_L4_NRT_010_034:cmems_obs-sst_bal_phy-subskin_nrt_l4_PT1H-m_202211"
 [5] "EO:MO:DAT:BALTICSEA_MULTIYEAR_PHY_003_011:cmems_mod_bal_phy_my_P1Y-m_202303"
 [6] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L3_NRT_009_131:cmems_obs-oc_bal_bgc-transp_nrt_l3-olci-300m_P1D_202207"
 [7] "EO:MO:DAT:BALTICSEA_MULTIYEAR_BGC_003_012:cmems_mod_bal_bgc_my_P1Y-m_202303"
 [8] "EO:MO:DAT:SST_BAL_SST_L4_REP_OBSERVATIONS_010_016:DMI_BAL_SST_L4_REP_OBSERVATIONS_010_016_202012"
 [9] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_PHY_003_006:cmems_mod_bal_phy_anfc_PT15M-i_202311"
[10] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L3_MY_009_133:cmems_obs-oc_bal_bgc-plankton_my_l3-multi-1km_P1D_202207"
[11] "EO:MO:DAT:SST_BAL_PHY_L3S_MY_010_040:cmems_obs-sst_bal_phy_my_l3s_P1D-m_202211"
[12] "EO:MO:DAT:SEAICE_BAL_SEAICE_L4_NRT_OBSERVATIONS_011_004:FMI-BAL-SEAICE_THICK-L4-NRT-OBS"
[13] "EO:MO:DAT:SEAICE_BAL_PHY_L4_MY_011_019:cmems_obs-si_bal_seaice-conc_my_1km_202112"
[14] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_WAV_003_010:cmems_mod_bal_wav_anfc_PT1H-i_202311"
[15] "EO:MO:DAT:BALTICSEA_REANALYSIS_WAV_003_015:dataset-bal-reanalysis-wav-hourly_202003"
[16] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L4_NRT_009_132:cmems_obs-oc_bal_bgc-plankton_nrt_l4-olci-300m_P1M_202207"
[17] "EO:MO:DAT:SST_BAL_SST_L3S_NRT_OBSERVATIONS_010_032:DMI-BALTIC-SST-L3S-NRT-OBS_FULL_TIME_SERIE_201904"
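
Long lists of IDs are easier to scan as a small table. The sketch below condenses a dataset list into a data frame, assuming each entry is a list with a dataset_id field and, where available, a title field. The datasets object is a hand-made stand-in for the value returned by client$datasets(), and field_or_na() is a hypothetical helper.

```r
# Stand-in for the list returned by client$datasets(); only two fields shown
datasets <- list(
  list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST"),  # no title in this stand-in
  list(dataset_id = "EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES",
       title = "Near surface meteorological variables from 1979 to 2019 derived from bias-corrected reanalysis")
)

# Hypothetical helper: return a field as a string, or NA if it is missing
field_or_na <- function(x, name) {
  value <- x[[name]]
  if (is.null(value)) NA_character_ else as.character(value)[1]
}

overview <- data.frame(
  dataset_id = vapply(datasets, field_or_na, character(1), name = "dataset_id"),
  title = vapply(datasets, field_or_na, character(1), name = "title"),
  stringsAsFactors = FALSE
)
print(overview)
```

Using vapply() instead of sapply() guarantees a character vector even when a field is missing from some entries.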

Understanding the Results

The datasets method returns a list of datasets along with relevant information such as names, descriptions, and other metadata. This provides key details about each dataset, helping users understand its purpose and content. Here is an example of the output for the dataset EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES:

client$datasets("EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES")

The result includes the following information:

$terms
"Copernicus_ECMWF_License"

$dataset_id
"EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES"

$title
"Near surface meteorological variables from 1979 to 2019 derived from bias-corrected reanalysis"

$abstract
"This dataset provides bias-corrected reconstruction of near-surface meteorological variables derived from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalyses (ERA5). It is intended to be used as a meteorological forcing dataset for land surface and hydrological models. \nThe dataset has been obtained using the same methodology used to derive the widely used water, energy and climate change (WATCH) forcing data, and is thus also referred to as WATCH Forcing Data methodology applied to ERA5 (WFDE5). The data are derived from the ERA5 reanalysis product that have been re-gridded to a half-degree resolution. Data have been adjusted using an elevation correction and monthly-scale bias corrections based on Climatic Research Unit (CRU) data (for temperature, diurnal temperature range, cloud-cover, wet days number and precipitation fields) and Global Precipitation Climatology Centre (GPCC) data (for precipitation fields only). Additional corrections are included for varying atmospheric aerosol-loading and separate precipitation gauge observations. For full details please refer to the product user-guide.\nThis dataset was produced on behalf of Copernicus Climate Change Service (C3S) and was generated entirely within the Climate Data Store (CDS) Toolbox. The toolbox source code is provided in the documentation tab.\n\nVariables in the dataset/application are:\nGrid-point altitude, Near-surface air temperature, Near-surface specific humidity, Near-surface wind speed, Rainfall flux, Snowfall flux, Surface air pressure, Surface downwelling longwave radiation, Surface downwelling shortwave radiation"

$doi
NULL

$thumbnails
"https://datastore.copernicus-climate.eu/c3s/published-forms-v2/c3sprod/derived-near-surface-meteorological-variables/overview.jpg"

Creating a Query Template


To search for a specific product, you first need to create a query template. You can use the WEkEO viewer to define your search parameters and copy-paste the resulting JSON query directly into your script.

Alternatively, you can use the function generate_query_template to create a query template for a specific dataset. This offers a flexible and programmatic way to define your search parameters within R.

Using the WEkEO Viewer

You can use the WEkEO Viewer to visually select the desired dataset and define the search parameters, such as the region of interest, the variable to retrieve, and the time period. Once you have configured your query, you can copy the resulting JSON and use it in your R script to execute the search (see how to get the query).

For example, a query from the WEkEO viewer might look like this:

query <- '{
  "dataset_id": "EO:ECMWF:DAT:CEMS_GLOFAS_HISTORICAL",
  "system_version": ["version_4_0"],
  "hydrological_model": ["lisflood"],
  "product_type": ["consolidated"],
  "variable": ["river_discharge_in_the_last_24_hours"],
  "hyear": ["2024"],
  "hmonth": ["june"],
  "hday": ["01"],
  "format": "grib",
  "bbox": [
    11.77115199576009,
    44.56907885098417,
    13.0263737724595,
    45.40384015467251
  ],
  "itemsPerPage": 200,
  "startIndex": 0
}'

Using the generate_query_template Function

Alternatively, you can programmatically create a query using the generate_query_template function. This function generates a query template for a specified dataset by pulling information about available parameters, default values, and more from the /queryable endpoint of the HDA service. This is useful for customizing your search within R, without needing to manually copy and paste JSON queries.

Generating a Query Template

Here's an example of how to generate a query template for the dataset with the ID EO:EEA:DAT:CLMS_HRVPP_ST:

query_template <- client$generate_query_template("EO:EEA:DAT:CLMS_HRVPP_ST")

query_template

The output will look something like this:

{
"dataset_id": "EO:EEA:DAT:CLMS_HRVPP_ST",
"itemsPerPage": 11,
"startIndex": 0,
"uid": "__### Value of string type with pattern: [\\w-]+",
"productType": "PPI",
"_comment_productType": "One of",
"_values_productType": ["PPI", "QFLAG"],
"platformSerialIdentifier": "S2A, S2B",
"_comment_platformSerialIdentifier": "One of",
"_values_platformSerialIdentifier": ["S2A, S2B"],
"tileId": "__### Value of string type with pattern: [\\w-]+",
"productVersion": "__### Value of string type with pattern: [\\w-]+",
"resolution": "10",
"_comment_resolution": "One of",
"_values_resolution": ["10"],
"processingDate": "__### Value of string type with format: date-time",
"start": "__### Value of string type with format: date-time",
"end": "__### Value of string type with format: date-time",
"bbox": [-180, -90, 180, 90]
}

This generated template contains placeholders for the parameters you can adjust based on your specific query needs. It also provides default values and options for parameters like productType, resolution, and bbox (bounding box).

Modify and use the generated Query Template

You can and should customize the generated query template to match your specific needs.

  • Values that start with __### are placeholders indicating that the field requires user input. If a placeholder is not replaced, the field is automatically removed before the query is sent to the HDA service.

  • Similarly, fields with the prefix _comment_ provide useful information about the corresponding field, such as valid values, format, or data patterns. These comment fields will also be removed before the query is submitted.

    Placeholders are used when it is not possible to derive a value from the metadata, while comment fields appear when a value is already defined, offering additional context for customizing the query.

  • Additionally, fields prefixed with _values_ list all possible values for a given field. This allows you to programmatically reference them in your code, making customization easier and ensuring that you are using valid options when configuring the query.

To modify the query, you can transform the JSON template into an R list using the jsonlite::fromJSON() function:

library(jsonlite)
query_template <- fromJSON(query_template, flatten = FALSE)
query_template

Here is the output of the query template for the dataset with the ID "EO:EEA:DAT:CLMS_HRVPP_ST":

$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"

$itemsPerPage
[1] 11

$startIndex
[1] 0

$uid
[1] "__### Value of string type with pattern: [\\w-]+"

$productType
[1] "PPI"

$_comment_productType
[1] "One of"

$_values_productType
[1] "PPI"   "QFLAG"

$platformSerialIdentifier
[1] "S2A, S2B"

$_comment_platformSerialIdentifier
[1] "One of"

$_values_platformSerialIdentifier
[1] "S2A, S2B"

$tileId
[1] "__### Value of string type with pattern: [\\w-]+"

$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"

$resolution
[1] "10"

$_comment_resolution
[1] "One of"

$_values_resolution
[1] "10"

$processingDate
[1] "__### Value of string type with format: date-time"

$start
[1] "__### Value of string type with format: date-time"

$end
[1] "__### Value of string type with format: date-time"

$bbox
[1] -180  -90  180   90
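
Because the _values_ fields list the valid options, they can be used to guard against typos when you change a parameter. The sketch below illustrates the idea on a hand-made excerpt of the list above; the helper set_checked() is hypothetical and not part of hdar.

```r
# Stand-in for part of the list produced by fromJSON(query_template)
query <- list(
  productType = "PPI",
  `_values_productType` = c("PPI", "QFLAG")
)

# Hypothetical helper: set a field only if the value is listed in its
# matching _values_ entry (fields without such an entry are not checked)
set_checked <- function(query, field, value) {
  allowed <- query[[paste0("_values_", field)]]
  if (!is.null(allowed) && !(value %in% allowed)) {
    stop("'", value, "' is not a valid ", field,
         "; use one of: ", paste(allowed, collapse = ", "))
  }
  query[[field]] <- value
  query
}

query <- set_checked(query, "productType", "QFLAG")
query$productType
```

Note the backticks around `_values_productType`: names that start with an underscore are not syntactic in R, so they need backticks or [[ ]] access.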

How to use the query template in a search:

After transforming the JSON template into an R list, you can modify it as needed. For example, you might want to set a new bounding box (bbox) or limit the time range (start and end):

# Set a new bounding box
query_template$bbox <- c(11.1090, 46.6210, 11.2090, 46.7210)

# Limit the time range
query_template$start <- "2018-03-01T00:00:00.000Z"
query_template$end <- "2018-05-31T00:00:00.000Z"
query_template

This will update the template as follows:

$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"

$itemsPerPage
[1] 11

$startIndex
[1] 0

$uid
[1] "__### Value of string type with pattern: [\\w-]+"

$productType
[1] "PPI"

$_comment_productType
[1] "One of"

$_values_productType
[1] "PPI"   "QFLAG"

$platformSerialIdentifier
[1] "S2A, S2B"

$_comment_platformSerialIdentifier
[1] "One of"

$_values_platformSerialIdentifier
[1] "S2A, S2B"

$tileId
[1] "__### Value of string type with pattern: [\\w-]+"

$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"

$resolution
[1] "10"

$_comment_resolution
[1] "One of"

$_values_resolution
[1] "10"

$processingDate
[1] "__### Value of string type with format: date-time"

$start
[1] "2018-03-01T00:00:00.000Z"

$end
[1] "2018-05-31T00:00:00.000Z"

$bbox
[1] 11.109 46.621 11.209 46.721

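Once you've made the necessary modifications, convert the list back to JSON format using the jsonlite::toJSON() function. It's important to set auto_unbox = TRUE when converting: without it, length-one vectors such as dataset_id are written as single-element JSON arrays instead of plain values. Setting digits = 17 preserves the precision of numeric values such as the bbox coordinates, which toJSON() rounds to four decimal places by default: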

# Convert back to JSON format
query_template <- toJSON(query_template, auto_unbox = TRUE, digits = 17)
# don't forget to put auto_unbox = TRUE

This approach maintains the correct formatting of the query, making it ready for submission to the HDA service.
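
To see what the two toJSON() arguments change, here is a small self-contained sketch. It uses a hand-made list rather than a real template, and borrows the bbox coordinates from the WEkEO viewer example purely as sample numbers.

```r
library(jsonlite)

query <- list(
  dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST",
  bbox = c(11.77115199576009, 44.56907885098417,
           13.0263737724595, 45.40384015467251)
)

# Without auto_unbox, the length-one dataset_id becomes a JSON array
boxed <- toJSON(list(dataset_id = query$dataset_id))
# With auto_unbox = TRUE, it is written as a plain string
unboxed <- toJSON(list(dataset_id = query$dataset_id), auto_unbox = TRUE)
print(boxed)
print(unboxed)

# Round trip with the default number of digits and with digits = 17
rounded <- fromJSON(toJSON(query, auto_unbox = TRUE))$bbox
precise <- fromJSON(toJSON(query, auto_unbox = TRUE, digits = 17))$bbox

# Largest deviation from the original coordinates in each case
max(abs(rounded - query$bbox))
max(abs(precise - query$bbox))
```

The first deviation is noticeably larger than the second, which is why the digits argument matters for small areas of interest.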

Searching and Downloading Data


To search for data in the HDA service, you can use the search function provided by the Client class. This function allows you to search for datasets based on a query. The search results can then be downloaded using the download method of the SearchResults class.

Searching for Data

The search function takes a query and an optional limit parameter, which specifies the maximum number of results you want to retrieve. It only performs the search and does not download the data. The output of this function is an instance of the SearchResults class.

Here’s an example of how to search for data using a query. To cap the number of results, pass the limit parameter as well:

# Assuming 'client' is already created and authenticated, 'query_template' is defined
matches <- client$search(query_template)

# Output
[1] "Found 9 files"
[1] "Total Size 1.8 GB"

# Display the IDs of the search results
sapply(matches$results, FUN = function(x) { x$id })

This example would return:

[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI"
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI"
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI"
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI"
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI"

This shows that the search returned 9 files with a total size of 1.8 GB, and the IDs of each file are listed.
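
Before downloading, you may want to narrow the results further in R. The sketch below works on the IDs alone and assumes, based only on the example output above, that the second underscore-separated token of each ID is a timestamp of the form YYYYMMDDTHHMMSS. Other datasets may use a different naming scheme.

```r
# A few IDs copied from the example output above
ids <- c("ST_20180301T000000_S2_T32TPS-010m_V101_PPI",
         "ST_20180411T000000_S2_T32TPS-010m_V101_PPI",
         "ST_20180521T000000_S2_T32TPS-010m_V101_PPI")

# Assumption: the second token is the timestamp of the product
stamp <- vapply(strsplit(ids, "_", fixed = TRUE),
                function(parts) parts[2], character(1))
dates <- as.Date(substr(stamp, 1, 8), format = "%Y%m%d")

# Keep only products dated 1 April 2018 or later
april_or_later <- ids[dates >= as.Date("2018-04-01")]
print(april_or_later)
```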

Downloading the Files

The SearchResults class contains a public field results and a method called download that is responsible for downloading the search results. The download() function requires an output directory, which will be created if it doesn’t already exist. It also includes an optional force parameter:

  • When force = TRUE, the function will re-download the files even if they already exist in the specified directory, overwriting the existing files.

  • When force = FALSE (the default), the function will skip downloading files that already exist, saving both time and bandwidth.

Here is an example of how to download the files:

# Assuming 'matches' is an instance of SearchResults obtained from the search 
odir <- tempdir()
matches$download(odir)

When you run this, you will see output indicating the download progress:

The total size is 1.8 GB . Do you want to proceed? (Y/N):y
[1] "[Download] Start"
[1] "[Download] Downloading file 1/9"
[1] "[Download] Downloading file 2/9"
[1] "[Download] Downloading file 3/9"
[1] "[Download] Downloading file 4/9"
[1] "[Download] Downloading file 5/9"
[1] "[Download] Downloading file 6/9"
[1] "[Download] Downloading file 7/9"
[1] "[Download] Downloading file 8/9"
[1] "[Download] Downloading file 9/9"
[1] "[Download] DONE"

Once the download is complete, you can list the downloaded files:

list.files(odir)

This will return something like:

[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI.tif" 
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI.tif"
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI.tif"
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI.tif"
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI.tif"

After you are done with the files, you can clean up by deleting the directory and its contents:

unlink(odir, recursive = TRUE)

This command deletes the directory odir together with all files and subdirectories inside it (recursive = TRUE), so that no temporary files remain.
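
Since odir in the example above is tempdir() itself, the cleanup removes the R session's temporary directory, which other code in the same session may still rely on. A possible alternative, sketched below, is to download into a dedicated subdirectory and remove only that; the folder name hdar_downloads is arbitrary.

```r
# Use a dedicated subdirectory so cleanup leaves the session temp dir alone
odir <- file.path(tempdir(), "hdar_downloads")  # folder name is arbitrary
dir.create(odir, recursive = TRUE, showWarnings = FALSE)

# matches$download(odir)  # as in the example above; needs a SearchResults object

# Remove only the download folder when finished
unlink(odir, recursive = TRUE)
```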

And voilàààà!

You have downloaded data with the hdar R package😊

📌Note on R package Updates:

The R packages installed in WEkEO JupyterHub are managed centrally and cannot be updated by individual users. This ensures compatibility and stability within the environment.

If you require a specific version of an R package or need additional R packages that are not currently available, please don’t hesitate to contact WEkEO User Support. You can submit a request, and our team will review and consider adding it for you.

What's next?


We are user-driven and we implement users' suggestions, so feel free to contact us:

  • through a chat session available in the bottom right corner of the page

  • via e-mail to our support team (supportATwekeo.eu)

Regardless of how you choose to contact us, you will first be put in touch with our AI Agent Neo. At any time, you can reach a member of the WEkEO User Support team by clicking on "talk to a person" via chat, or by naturally requesting it in reply to Neo's email.
