Context

The hdar R package provides seamless access to the WEkEO Harmonised Data Access (HDA) API, enabling users to programmatically query and download data from R. You will find more information about the HDA in this article.

Without any further ado, let's see how to use the HDA API with R! 🙌

💡WEkEO Pro Tip: to use the HDA service and the hdar library, you first need a WEkEO account. Registration is free.

Installation

First of all, you need to install and load the hdar package into your R environment. The stable version is available on CRAN and can be installed with the following command:

if(!require("hdar")){install.packages("hdar")}

Next, load the package into your R session:

library(hdar)

💡WEkEO Pro Tip: if you want to access the development version of the package, you can install it directly from GitHub. Make sure you have the devtools package installed, then use the following command:

devtools::install_github("eea/hdar@develop")

Authentication

To interact with the HDA service, you need to authenticate by providing your WEkEO username and password. The Client allows you to pass these credentials directly and optionally save them to a configuration file for future use. If credentials are not specified as parameters, the Client will read them from the ~/.hdarc file.

Step 1. Create and authenticate through the Client

First, you need to set up your WEkEO credentials via one of the following methods.

Method 1 (occasional users)

You can pass the username and password directly to the Client as follows:

username <- "your_username"
password <- "your_password"
client <- Client$new(username, password)

Method 2 (regular users)

You can save your credentials for future use in the ~/.hdarc file:

username <- "your_username"
password <- "your_password"
client <- Client$new(username, password, save_credentials = TRUE)

Once the credentials are saved, you can initialize the Client without passing the credentials. The Client will read the credentials from the ~/.hdarc file:

client <- Client$new()
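
If you prefer not to hard-code credentials in a script, one option is to read them from environment variables before creating the Client. The sketch below is only an illustration: the variable names WEKEO_USERNAME and WEKEO_PASSWORD and the helper read_wekeo_credentials() are hypothetical and not part of hdar.

```r
# Hypothetical helper (not part of hdar): read credentials from environment
# variables so that they do not appear in the script itself.
read_wekeo_credentials <- function(user_var = "WEKEO_USERNAME",
                                   pass_var = "WEKEO_PASSWORD") {
  username <- Sys.getenv(user_var, unset = "")
  password <- Sys.getenv(pass_var, unset = "")
  if (!nzchar(username) || !nzchar(password)) {
    stop("Set ", user_var, " and ", pass_var, " before creating the Client.")
  }
  list(username = username, password = password)
}

# Usage, once both variables are set (for example in ~/.Renviron):
# creds  <- read_wekeo_credentials()
# client <- Client$new(creds$username, creds$password)
```

The helper stops with an explicit message when a variable is missing, which is usually easier to diagnose than a failed login later on.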

Step 2. Check for authentication

Once the Client is created, you can check if authentication was successful by calling a method that verifies it, for instance:

client$get_token()

If the authentication is successful, it will return a token indicating that you are connected.
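
In a script, it can be convenient to turn this check into a TRUE/FALSE answer. The following is a minimal sketch that assumes a failed authentication raises an R error, which this article does not confirm. The helper can_authenticate() is hypothetical, and the two stand-in objects only imitate a Client so that the example runs without a WEkEO account.

```r
# Hypothetical helper: TRUE if get_token() completes, FALSE if it raises an error.
can_authenticate <- function(client) {
  tryCatch({
    client$get_token()
    TRUE
  }, error = function(e) {
    message("Authentication failed: ", conditionMessage(e))
    FALSE
  })
}

# Stand-in objects that imitate a Client (not real hdar objects)
ok_client  <- list(get_token = function() "dummy-token")
bad_client <- list(get_token = function() stop("invalid credentials"))

can_authenticate(ok_client)   # TRUE
can_authenticate(bad_client)  # FALSE, after printing a message
```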

Copernicus Terms and Conditions (T&Cs)

Copernicus data is free to use and manipulate, but you must accept the Terms and Conditions before you can download it. The hdar package provides a convenient feature to review and accept or reject the T&Cs for each individual Copernicus service.

To display the T&Cs in your browser and read them, you can do:

client$show_terms()

Once you've reviewed the terms, you can accept or reject individual T&Cs, or all at once, using the following command:

client$terms_and_conditions()

This will display a list of the T&Cs, showing whether they have been accepted (TRUE) or not (FALSE), like the following:

See list of T&Cs not accepted by default

                                               term_id accepted
1                           Copernicus_General_License    FALSE
2                          Copernicus_Sentinel_License    FALSE
3                       EUMETSAT_Core_Products_Licence    FALSE
4                     EUMETSAT_Copernicus_Data_Licence    FALSE
5  Copernicus_DEM_Instance_COP-DEM-GLO-90-F_Global_90m    FALSE
6  Copernicus_DEM_Instance_COP-DEM-GLO-30-F_Global_30m    FALSE
7                             Copernicus_ECMWF_License    FALSE
8       Copernicus_Land_Monitoring_Service_Data_Policy    FALSE
9            Copernicus_Marine_Service_Product_License    FALSE
10                        CNES_Open_2.0_ETALAB_Licence    FALSE

To accept all T&Cs at once, you can use the following command:

client$terms_and_conditions(term_id = 'all')

This will mark all terms as accepted, as shown in the following output:

See list of T&Cs accepted

                                              term_id accepted
1                           Copernicus_General_License     TRUE
2                          Copernicus_Sentinel_License     TRUE
3                       EUMETSAT_Core_Products_Licence     TRUE
4                     EUMETSAT_Copernicus_Data_Licence     TRUE
5  Copernicus_DEM_Instance_COP-DEM-GLO-90-F_Global_90m     TRUE
6  Copernicus_DEM_Instance_COP-DEM-GLO-30-F_Global_30m     TRUE
7                             Copernicus_ECMWF_License     TRUE
8       Copernicus_Land_Monitoring_Service_Data_Policy     TRUE
9            Copernicus_Marine_Service_Product_License     TRUE
10                        CNES_Open_2.0_ETALAB_Licence     TRUE

This simplifies the process of agreeing to the required terms so you can proceed with downloading the data.
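
If you would rather see which terms are still pending before accepting everything, you can filter the table. This sketch assumes that terms_and_conditions() returns a data frame with the columns term_id and accepted, as the printed output above suggests. A small stand-in table is used here so that the example runs offline.

```r
# Stand-in for the table shown above; with a live client you would use
# terms <- client$terms_and_conditions()
terms <- data.frame(
  term_id  = c("Copernicus_General_License",
               "Copernicus_Sentinel_License",
               "Copernicus_ECMWF_License"),
  accepted = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Terms that still need to be accepted
pending <- terms$term_id[!terms$accepted]
pending
```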

Find specific datasets

WEkEO provides access to a wide variety of products. To help you find what you need, the Client includes a method called datasets, which lists all available datasets and optionally filters them by a text pattern.

Retrieve available datasets

The basic usage of the datasets method is simple. To retrieve a complete list of all datasets available on WEkEO, which can take about 2 minutes, use the following command:

all_datasets <- client$datasets()

Filter datasets

You can also filter the datasets by providing a text pattern. This is useful when you are looking for datasets that match a specific keyword or phrase:

filtered_datasets <- client$datasets("Seasonal Trajectories")

# To list the dataset IDs from the filtered results:
sapply(filtered_datasets, FUN = function(x) { x$dataset_id })

See output displaying datasetIDs

[1] "EO:EEA:DAT:CLMS_HRVPP_VPP-LAEA"
[2] "EO:EEA:DAT:CLMS_HRVPP_ST"
[3] "EO:EEA:DAT:CLMS_HRVPP_ST-LAEA"
[4] "EO:EEA:DAT:CLMS_HRVPP_VPP"

Similarly, if you are looking for datasets related to "Baltic", you can filter them as follows:

filtered_datasets <- client$datasets("Baltic")

# To list the dataset IDs from the filtered results:
sapply(filtered_datasets, FUN = function(x) { x$dataset_id })

See output of this query

[1] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_BGC_003_007:cmems_mod_bal_bgc-pp_anfc_P1D-i_202311"
[2] "EO:MO:DAT:NWSHELF_MULTIYEAR_PHY_004_009:cmems_mod_nws_phy-sst_my_7km-2D_PT1H-i_202112"
[3] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L4_MY_009_134:cmems_obs-oc_bal_bgc-plankton_my_l4-multi-1km_P1M_202211"
[4] "EO:MO:DAT:SST_BAL_PHY_SUBSKIN_L4_NRT_010_034:cmems_obs-sst_bal_phy-subskin_nrt_l4_PT1H-m_202211"
[5] "EO:MO:DAT:BALTICSEA_MULTIYEAR_PHY_003_011:cmems_mod_bal_phy_my_P1Y-m_202303"
[6] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L3_NRT_009_131:cmems_obs-oc_bal_bgc-transp_nrt_l3-olci-300m_P1D_202207"
[7] "EO:MO:DAT:BALTICSEA_MULTIYEAR_BGC_003_012:cmems_mod_bal_bgc_my_P1Y-m_202303"
[8] "EO:MO:DAT:SST_BAL_SST_L4_REP_OBSERVATIONS_010_016:DMI_BAL_SST_L4_REP_OBSERVATIONS_010_016_202012"
[9] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_PHY_003_006:cmems_mod_bal_phy_anfc_PT15M-i_202311"
[10] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L3_MY_009_133:cmems_obs-oc_bal_bgc-plankton_my_l3-multi-1km_P1D_202207"
[11] "EO:MO:DAT:SST_BAL_PHY_L3S_MY_010_040:cmems_obs-sst_bal_phy_my_l3s_P1D-m_202211"
[12] "EO:MO:DAT:SEAICE_BAL_SEAICE_L4_NRT_OBSERVATIONS_011_004:FMI-BAL-SEAICE_THICK-L4-NRT-OBS"
[13] "EO:MO:DAT:SEAICE_BAL_PHY_L4_MY_011_019:cmems_obs-si_bal_seaice-conc_my_1km_202112"
[14] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_WAV_003_010:cmems_mod_bal_wav_anfc_PT1H-i_202311"
[15] "EO:MO:DAT:BALTICSEA_REANALYSIS_WAV_003_015:dataset-bal-reanalysis-wav-hourly_202003"
[16] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L4_NRT_009_132:cmems_obs-oc_bal_bgc-plankton_nrt_l4-olci-300m_P1M_202207"

Understand the results

The datasets method returns a list of datasets along with relevant information such as names, descriptions, and other metadata. This provides key details about each dataset, helping users understand its purpose and content.

Here is an example of the output for the dataset EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES:

client$datasets("EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES")

See output result

$terms
"Copernicus_ECMWF_License"

$dataset_id
"EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES"

$title
"Near surface meteorological variables from 1979 to 2019 derived from bias-corrected reanalysis"

$abstract
"This dataset provides bias-corrected reconstruction of near-surface meteorological variables derived from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalyses (ERA5). It is intended to be used as a meteorological forcing dataset for land surface and hydrological models. \nThe dataset has been obtained using the same methodology used to derive the widely used water, energy and climate change (WATCH) forcing data, and is thus also referred to as WATCH Forcing Data methodology applied to ERA5 (WFDE5). The data are derived from the ERA5 reanalysis product that have been re-gridded to a half-degree resolution. Data have been adjusted using an elevation correction and monthly-scale bias corrections based on Climatic Research Unit (CRU) data (for temperature, diurnal temperature range, cloud-cover, wet days number and precipitation fields) and Global Precipitation Climatology Centre (GPCC) data (for precipitation fields only). Additional corrections are included for varying atmospheric aerosol-loading and separate precipitation gauge observations. For full details please refer to the product user-guide.\nThis dataset was produced on behalf of Copernicus Climate Change Service (C3S) and was generated entirely within the Climate Data Store (CDS) Toolbox. The toolbox source code is provided in the documentation tab.\n\nVariables in the dataset/application are:\nGrid-point altitude, Near-surface air temperature, Near-surface specific humidity, Near-surface wind speed, Rainfall flux, Snowfall flux, Surface air pressure, Surface downwelling longwave radiation, Surface downwelling shortwave radiation"

$doi
NULL

$thumbnails
"https://datastore.copernicus-climate.eu/c3s/published-forms-v2/c3sprod/derived-near-surface-meteorological-variables/overview.jpg"
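
When a search returns many datasets, a compact table of IDs and titles can be easier to scan than the full list. The sketch below assumes each element carries the dataset_id and title fields shown above. It uses a stand-in list so that it runs offline, and the second title is a placeholder, not a real WEkEO title.

```r
# Stand-in for the list returned by client$datasets(); only two fields are kept
datasets <- list(
  list(dataset_id = "EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES",
       title = "Near surface meteorological variables from 1979 to 2019 derived from bias-corrected reanalysis"),
  list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST",
       title = "Placeholder title for illustration")
)

# vapply() fails loudly if a field is missing, whereas sapply() would
# silently return a list instead of a character vector
summary_table <- data.frame(
  dataset_id = vapply(datasets, function(x) x$dataset_id, character(1)),
  title      = vapply(datasets, function(x) x$title, character(1)),
  stringsAsFactors = FALSE
)

summary_table$dataset_id
```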

Create a query template

To search for a specific product, you first need to create a query template. The most reliable way is the graphical approach offered by the Expert Data Viewer, which lets you visualize the data first and build the query without mistakes.

Using the Expert Data Viewer

You can use the Expert Data Viewer to visually select the desired dataset and define the search parameters, such as the region of interest, the variable to retrieve, and the time period. Once you have configured your query, you can copy the resulting JSON and use it in your R script (see how to get the query):

query <- '{
  "dataset_id": "EO:ECMWF:DAT:CEMS_GLOFAS_HISTORICAL",
  "system_version": [
    "version_4_0"
  ],
  "hydrological_model": [
    "lisflood"
  ],
  "product_type": [
    "consolidated"
  ],
  "variable": [
    "river_discharge_in_the_last_24_hours"
  ],
  "hyear": [
    "2024"
  ],
  "hmonth": [
    "june"
  ],
  "hday": [
    "01"
  ],
  "format": "grib",
  "bbox": [
    11.77115199576009,
    44.56907885098417,
    13.0263737724595,
    45.40384015467251
  ],
  "itemsPerPage": 200,
  "startIndex": 0
}'
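
Before sending a pasted query, it can help to confirm that the string is valid JSON and that the bounding box looks plausible. This is a minimal sketch using jsonlite. The query is a shortened copy of the one above, kept small for illustration only, and the bbox order (west, south, east, north) is an assumption inferred from the example values.

```r
library(jsonlite)

# Shortened copy of the query above (not meant to be submitted as is)
query <- '{
  "dataset_id": "EO:ECMWF:DAT:CEMS_GLOFAS_HISTORICAL",
  "format": "grib",
  "bbox": [11.77115199576009, 44.56907885098417, 13.0263737724595, 45.40384015467251]
}'

# validate() reports whether the string is syntactically valid JSON
stopifnot(validate(query))

# Parse to an R list to inspect individual fields
parsed <- fromJSON(query)
bbox <- parsed$bbox

# Assumed order: west, south, east, north
bbox_ok <- length(bbox) == 4 && bbox[1] < bbox[3] && bbox[2] < bbox[4]
bbox_ok
```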

Using the generate_query_template function

Alternatively, you can programmatically create a query using the generate_query_template function. This function generates a query template for a specified dataset by pulling information about available parameters, default values, and more from the /queryable endpoint of the HDA service.

This is useful for customizing your search within R, without needing to manually copy/paste JSON queries.

Generate a query template

Here's an example of how to generate a query template for the dataset EO:EEA:DAT:CLMS_HRVPP_ST:

query_template <- client$generate_query_template("EO:EEA:DAT:CLMS_HRVPP_ST")

query_template

See output query

{
  "dataset_id": "EO:EEA:DAT:CLMS_HRVPP_ST",
  "itemsPerPage": 11,
  "startIndex": 0,
  "uid": "__### Value of string type with pattern: [\\w-]+",
  "productType": "PPI",
  "_comment_productType": "One of",
  "_values_productType": ["PPI", "QFLAG"],
  "platformSerialIdentifier": "S2A, S2B",
  "_comment_platformSerialIdentifier": "One of",
  "_values_platformSerialIdentifier": ["S2A, S2B"],
  "tileId": "__### Value of string type with pattern: [\\w-]+",
  "productVersion": "__### Value of string type with pattern: [\\w-]+",
  "resolution": "10",
  "_comment_resolution": "One of",
  "_values_resolution": ["10"],
  "processingDate": "__### Value of string type with format: date-time",
  "start": "__### Value of string type with format: date-time",
  "end": "__### Value of string type with format: date-time",
  "bbox": [-180, -90, 180, 90]
}

This generated template contains placeholders for the parameters you can adjust based on your specific query needs. It also provides default values and options for parameters like productType, resolution, and bbox (bounding box).

Modify and use the generated query template

You can and should customize the generated query template to match your specific needs.

Fields that start with __### are placeholders indicating that they require user input. If these placeholders are not replaced, they are automatically removed before the query is sent to the HDA service.

Similarly, fields with the prefix _comment_ provide useful information about the corresponding field, such as valid values, format, or data patterns. These comment fields are also removed before the query is submitted.

Placeholders are used when it is not possible to derive a value from the metadata, while comment fields appear when a value is already defined, offering additional context for customizing the query.

Additionally, fields prefixed with _values_ list all possible values for a given field. You can reference them programmatically in your code, which makes customization easier and helps ensure that you use valid options when configuring the query.

To modify the query, you can transform the JSON template into an R list using the jsonlite::fromJSON() function:

library(jsonlite)
query_template <- fromJSON(query_template, flatten = FALSE)
query_template

See output query template for EO:EEA:DAT:CLMS_HRVPP_ST

$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"

$itemsPerPage
[1] 11

$startIndex
[1] 0

$uid
[1] "__### Value of string type with pattern: [\\w-]+"

$productType
[1] "PPI"

$_comment_productType
[1] "One of"

$_values_productType
[1] "PPI"   "QFLAG"

$platformSerialIdentifier
[1] "S2A, S2B"

$_comment_platformSerialIdentifier
[1] "One of"

$_values_platformSerialIdentifier
[1] "S2A, S2B"

$tileId
[1] "__### Value of string type with pattern: [\\w-]+"

$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"

$resolution
[1] "10"

$_comment_resolution
[1] "One of"

$_values_resolution
[1] "10"

$processingDate
[1] "__### Value of string type with format: date-time"

$start
[1] "__### Value of string type with format: date-time"

$end
[1] "__### Value of string type with format: date-time"

$bbox
[1] -180  -90  180   90
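
Once the template is an R list, the helper fields can be used directly. The sketch below works on a small stand-in for the parsed template. It picks a valid product type from the _values_ field and then previews which fields would remain after clean-up. The helper preview_query() is hypothetical and not part of hdar, which performs the clean-up itself. It also assumes that _values_ fields are dropped along with placeholders and comments, which this article does not state explicitly.

```r
# Small stand-in for the parsed template shown above
query_template <- list(
  dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST",
  uid = "__### Value of string type with pattern: [\\w-]+",
  productType = "PPI",
  `_comment_productType` = "One of",
  `_values_productType` = c("PPI", "QFLAG"),
  bbox = c(-180, -90, 180, 90)
)

# Names starting with "_" need [[ ]] or backticks to be accessed
valid_types <- query_template[["_values_productType"]]
query_template$productType <- valid_types[2]  # "QFLAG"

# Hypothetical preview helper: drops helper fields and unreplaced placeholders
preview_query <- function(tpl) {
  is_helper <- grepl("^_(comment|values)_", names(tpl))
  is_placeholder <- vapply(
    tpl,
    function(v) is.character(v) && length(v) == 1 && startsWith(v, "__###"),
    logical(1)
  )
  tpl[!(is_helper | is_placeholder)]
}

names(preview_query(query_template))
```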

How to use the query template in a search

After transforming the JSON template into an R list, you can modify it as needed. For example, you might want to set a new bounding box (bbox) or limit the time range (start and end):

# Set a new bounding box
query_template$bbox <- c(11.1090, 46.6210, 11.2090, 46.7210)

# Limit the time range
query_template$start <- "2018-03-01T00:00:00.000Z"
query_template$end   <- "2018-05-31T00:00:00.000Z"
query_template

See updated template

$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"

$itemsPerPage
[1] 11

$startIndex
[1] 0

$uid
[1] "__### Value of string type with pattern: [\\w-]+"

$productType
[1] "PPI"

$_comment_productType
[1] "One of"

$_values_productType
[1] "PPI"   "QFLAG"

$platformSerialIdentifier
[1] "S2A, S2B"

$_comment_platformSerialIdentifier
[1] "One of"

$_values_platformSerialIdentifier
[1] "S2A, S2B"

$tileId
[1] "__### Value of string type with pattern: [\\w-]+"

$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"

$resolution
[1] "10"

$_comment_resolution
[1] "One of"

$_values_resolution
[1] "10"

$processingDate
[1] "__### Value of string type with format: date-time"

$start
[1] "2018-03-01T00:00:00.000Z"

$end
[1] "2018-05-31T00:00:00.000Z"

$bbox
[1] 11.109 46.621 11.209 46.721

Once you've made the necessary modifications, convert the list back to JSON format using the jsonlite::toJSON() function. It's important to pass auto_unbox = TRUE when converting, so that length-one vectors such as dataset_id are written as plain JSON values rather than single-element arrays:

# Convert the modified list back to JSON format (auto_unbox = TRUE is required)
query <- toJSON(query_template, auto_unbox = TRUE)

This approach maintains the correct formatting of the query, making it ready for submission to the HDA service.
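
To see what auto_unbox changes, compare the two conversions on a small list. This is an illustrative sketch using only jsonlite, and small_query is a made-up subset of the template above.

```r
library(jsonlite)

small_query <- list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST",
                    bbox = c(11.109, 46.621, 11.209, 46.721))

boxed   <- as.character(toJSON(small_query))
unboxed <- as.character(toJSON(small_query, auto_unbox = TRUE))

cat(boxed, "\n")
# {"dataset_id":["EO:EEA:DAT:CLMS_HRVPP_ST"],"bbox":[11.109,46.621,11.209,46.721]}
cat(unboxed, "\n")
# {"dataset_id":"EO:EEA:DAT:CLMS_HRVPP_ST","bbox":[11.109,46.621,11.209,46.721]}
```

Without auto_unbox, dataset_id is wrapped in an array, while the multi-element bbox is an array in both cases.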

Search and download data

To search for data in the HDA service, you can use the search function provided by the Client class. This function searches for products that match a query. The search results can then be downloaded using the download method of the SearchResults class.

Search for data

The search function takes a query and an optional limit parameter, which specifies the maximum number of results you want to retrieve. It only performs the search and does not download the data. The output of this function is an instance of the SearchResults class.

Here’s an example of how to search for data using a query:

# Assuming 'client' is already created and authenticated
matches <- client$search(query)

# Output
[1] "Found 9 files"
[1] "Total Size 1.8 GB"

# Display the IDs of the search results
sapply(matches$results, FUN = function(x) { x$id })

See output result

[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI"
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI"
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI"
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI"
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI"

It shows that the search returned 9 files with a total size of 1.8 GB, and the names of each file are listed.
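
The result IDs can also be inspected before downloading, for instance to check which dates are covered. The sketch below assumes that the second underscore-separated field of each ID is a timestamp, which is inferred from the example output and not documented here. The IDs are copied from that output.

```r
# IDs copied from the output above (first three only)
ids <- c("ST_20180301T000000_S2_T32TPS-010m_V101_PPI",
         "ST_20180311T000000_S2_T32TPS-010m_V101_PPI",
         "ST_20180321T000000_S2_T32TPS-010m_V101_PPI")

# Split on "_" and read the second field as a timestamp (assumed layout)
fields <- strsplit(ids, "_", fixed = TRUE)
stamps <- vapply(fields, function(f) f[2], character(1))
dates  <- as.Date(substr(stamps, 1, 8), format = "%Y%m%d")

dates
range(dates)
```

With a live search, the ids vector would come from sapply(matches$results, FUN = function(x) { x$id }) instead.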

Download the files

The SearchResults class contains a public field results and a method called download that is responsible for downloading the search results. The download() function requires an output directory, which will be created if it doesn’t already exist. It also includes an optional force parameter:

When force = TRUE, the function will re-download the files even if they already exist in the specified directory, overwriting the existing files.
When force = FALSE (default value), the function will skip downloading files that already exist, saving both time and bandwidth.

Here is an example of how to download the files:

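# Assuming 'matches' is an instance of SearchResults obtained from the search
# A subdirectory of tempdir() keeps the session's own temporary directory intact on clean-up
output_directory <- file.path(tempdir(), "hdar_downloads")
matches$download(output_directory)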

See output result indicating download progress

The total size is 1.8 GB . Do you want to proceed? (Y/N):y
[1] "[Download] Start"
[1] "[Download] Downloading file 1/9"
[1] "[Download] Downloading file 2/9"
[1] "[Download] Downloading file 3/9"
[1] "[Download] Downloading file 4/9"
[1] "[Download] Downloading file 5/9"
[1] "[Download] Downloading file 6/9"
[1] "[Download] Downloading file 7/9"
[1] "[Download] Downloading file 8/9"
[1] "[Download] Downloading file 9/9"
[1] "[Download] DONE"

Once the download is complete, you can list the downloaded files:

list.files(output_directory)

See output result

[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI.tif" 
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI.tif" 
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI.tif" 
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI.tif" 
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI.tif"
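
To confirm that every search result actually arrived on disk, you can compare the result IDs with the file names. This is a sketch under the assumption, suggested by the listing above, that each file is named after its ID plus an extension. The helper missing_downloads() is hypothetical, and the demo uses empty stand-in files rather than real downloads.

```r
# Hypothetical check (not part of hdar): which expected IDs have no file yet?
missing_downloads <- function(ids, dir) {
  on_disk <- tools::file_path_sans_ext(list.files(dir))
  setdiff(ids, on_disk)
}

# Self-contained demo with an empty stand-in file in a scratch directory
demo_dir <- file.path(tempdir(), "hdar_demo")
dir.create(demo_dir, showWarnings = FALSE)
ids <- c("ST_20180301T000000_S2_T32TPS-010m_V101_PPI",
         "ST_20180311T000000_S2_T32TPS-010m_V101_PPI")
file.create(file.path(demo_dir, paste0(ids[1], ".tif")))

still_missing <- missing_downloads(ids, demo_dir)
still_missing  # the second ID, which has no file yet

unlink(demo_dir, recursive = TRUE)
```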

After you are done with the files, you can clean up by deleting the directory and its contents:

unlink(output_directory, recursive = TRUE)

This command deletes output_directory together with all the files and subdirectories it contains (recursive = TRUE), so no temporary files remain.

And voilàààà!

You have downloaded data with the hdar R package! 😊

📌Note: the R packages installed in WEkEO JupyterHub are managed centrally and cannot be updated by individual users. This ensures compatibility and stability within the environment. If you require a specific version of an R package or need additional R packages that are not currently available, please don’t hesitate to contact the WEkEO User Support.

Go further with a ready-to-use notebook

To help you explore and practice using the HDA API with R, we’ve prepared a dedicated R notebook that demonstrates how to query and download data using the hdar package.

You can access this notebook directly on GitHub, or find it pre-installed on the WEkEO JupyterHub under the following path:
public/notebooks/land/R_Land_S2_CLC_download.ipynb

The JupyterHub environment is fully configured and ready to use — no setup required!

What's next?

Additional resources can be found in our Help Center. Should you require further assistance or wish to provide feedback, feel free to contact us through a chat session available in the bottom right corner of the page.

How to download WEkEO data?

HDA API errors

How to use the HDA API in Python?

How to use the Harmonized Data Access REST API with cURL?

HDA API request gives no results: what should I do?