Context
The hdar R package provides seamless access to the WEkEO Harmonised Data Access (HDA) API, enabling users to programmatically query and download data from R. You will find more information about the HDA in this article.
Without any further ado, let's see how to use the HDA API with R! 🙌
💡WEkEO Pro Tip: to use the HDA service and library, you must first register for a free WEkEO account.
Installation
First, you need to install and load the hdar package in your R environment. The stable version is available on CRAN and can be installed with the following command:
if (!require("hdar")) install.packages("hdar")
Next, load the package into your R session:
library(hdar)
💡WEkEO Pro Tip: If you want to access the development version of the package, you can install it directly from GitHub. Make sure you have the devtools package installed, then use the following command:
devtools::install_github("eea/hdar@develop")
Authentication
To interact with the HDA service, you need to authenticate with your WEkEO username and password. The Client allows you to pass these credentials directly and optionally save them to a configuration file for future use. If credentials are not specified as parameters, the Client will read them from the ~/.hdarc file.
Step 1. Create and authenticate through the Client
Next, set up your user credentials. There are two ways to do this.
Method 1 (occasional users)
Set up your WEkEO credentials directly by passing your username and password. Here's an example of how to authenticate, with the option to save these credentials to the ~/.hdarc file for future use:
username <- "your_username"
password <- "your_password"
client <- Client$new(username, password, save_credentials = TRUE)
Method 2 (regular users)
Once the credentials are saved, you can initialize the Client without passing them explicitly. The Client will read the credentials from the ~/.hdarc file:
client <- Client$new()
Step 2. Check for authentication
Once the Client is created, you can check that it has been authenticated properly by calling the get_token() method, which verifies authentication. For example:
client$get_token()
If the authentication is successful, it will return a token indicating that you are connected.
Copernicus Terms and Conditions (T&Cs)
Copernicus data are free to use and modify, but the Terms and Conditions (T&Cs) must be accepted before the data can be downloaded. The hdar package provides a convenient way to review and accept or reject the T&Cs for each individual Copernicus service.
To display the T&Cs in your browser and read them, you can use the following command:
client$show_terms()
Once you've reviewed the terms, you can accept or reject individual T&Cs, or all at once, using the following command:
client$terms_and_conditions()
This will display a list of the T&Cs, showing whether each one has been accepted (TRUE) or not (FALSE), like the following:
See list of T&Cs not accepted by default
term_id accepted
1 Copernicus_General_License FALSE
2 Copernicus_Sentinel_License FALSE
3 EUMETSAT_Core_Products_Licence FALSE
4 EUMETSAT_Copernicus_Data_Licence FALSE
5 Copernicus_DEM_Instance_COP-DEM-GLO-90-F_Global_90m FALSE
6 Copernicus_DEM_Instance_COP-DEM-GLO-30-F_Global_30m FALSE
7 Copernicus_ECMWF_License FALSE
8 Copernicus_Land_Monitoring_Service_Data_Policy FALSE
9 Copernicus_Marine_Service_Product_License FALSE
10 CNES_Open_2.0_ETALAB_Licence FALSE
To accept all T&Cs at once, you can use the following command:
client$terms_and_conditions(term_id = 'all')
This will mark all terms as accepted, as shown in the following output:
See list of T&Cs accepted
term_id accepted
1 Copernicus_General_License TRUE
2 Copernicus_Sentinel_License TRUE
3 EUMETSAT_Core_Products_Licence TRUE
4 EUMETSAT_Copernicus_Data_Licence TRUE
5 Copernicus_DEM_Instance_COP-DEM-GLO-90-F_Global_90m TRUE
6 Copernicus_DEM_Instance_COP-DEM-GLO-30-F_Global_30m TRUE
7 Copernicus_ECMWF_License TRUE
8 Copernicus_Land_Monitoring_Service_Data_Policy TRUE
9 Copernicus_Marine_Service_Product_License TRUE
10 CNES_Open_2.0_ETALAB_Licence TRUE
This simplifies the process of agreeing to the required terms so you can proceed with downloading the data.
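Since the displayed table pairs each term_id with an accepted flag, you can also check programmatically which terms still need acceptance. The sketch below uses a hand-made data frame standing in for the table returned by the client (the real method queries the service, and the field names here mirror the output shown above):

```r
# Hand-made stand-in for the T&Cs table shown above (illustrative subset)
tcs <- data.frame(
  term_id  = c("Copernicus_General_License", "Copernicus_Sentinel_License"),
  accepted = c(TRUE, FALSE)
)

# Terms that have not been accepted yet
pending <- tcs$term_id[!tcs$accepted]
pending
#> [1] "Copernicus_Sentinel_License"
```

The same filtering works on the real result if the returned object is a data frame with these columns.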
Find specific datasets
WEkEO provides access to a wide variety of products. To help you find what you need, the Client includes a method called datasets(), which lists all available datasets and lets you filter them by a text pattern.
Retrieve available datasets
The basic usage of the datasets() method is simple. To retrieve a complete list of all datasets available on WEkEO, which can take about 2 minutes, use the following command:
all_datasets <- client$datasets()
If you want to list all dataset IDs from the retrieved datasets, you can do so as follows:
sapply(all_datasets, FUN = function(x) { x$dataset_id })
Filter datasets
You can also filter the datasets by providing a text pattern. This is useful when you are looking for datasets that match a specific keyword or phrase:
filtered_datasets <- client$datasets("Seasonal Trajectories")
# To list the dataset IDs from the filtered results:
sapply(filtered_datasets, FUN = function(x) { x$dataset_id })
See output displaying datasetIDs
"EO:EEA:DAT:CLMS_HRVPP_VPP-LAEA" "EO:EEA:DAT:CLMS_HRVPP_ST" "EO:EEA:DAT:CLMS_HRVPP_ST-LAEA" "EO:EEA:DAT:CLMS_HRVPP_VPP"
Similarly, if you are looking for datasets related to "Baltic," you can filter them as follows:
filtered_datasets <- client$datasets("Baltic")
# To list the dataset IDs from the filtered results:
sapply(filtered_datasets, FUN = function(x) { x$dataset_id })
See output of this query
[1] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_BGC_003_007:cmems_mod_bal_bgc-pp_anfc_P1D-i_202311"
[2] "EO:MO:DAT:NWSHELF_MULTIYEAR_PHY_004_009:cmems_mod_nws_phy-sst_my_7km-2D_PT1H-i_202112"
[3] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L4_MY_009_134:cmems_obs-oc_bal_bgc-plankton_my_l4-multi-1km_P1M_202211"
[4] "EO:MO:DAT:SST_BAL_PHY_SUBSKIN_L4_NRT_010_034:cmems_obs-sst_bal_phy-subskin_nrt_l4_PT1H-m_202211"
[5] "EO:MO:DAT:BALTICSEA_MULTIYEAR_PHY_003_011:cmems_mod_bal_phy_my_P1Y-m_202303"
[6] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L3_NRT_009_131:cmems_obs-oc_bal_bgc-transp_nrt_l3-olci-300m_P1D_202207"
[7] "EO:MO:DAT:BALTICSEA_MULTIYEAR_BGC_003_012:cmems_mod_bal_bgc_my_P1Y-m_202303"
[8] "EO:MO:DAT:SST_BAL_SST_L4_REP_OBSERVATIONS_010_016:DMI_BAL_SST_L4_REP_OBSERVATIONS_010_016_202012"
[9] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_PHY_003_006:cmems_mod_bal_phy_anfc_PT15M-i_202311"
[10] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L3_MY_009_133:cmems_obs-oc_bal_bgc-plankton_my_l3-multi-1km_P1D_202207"
[11] "EO:MO:DAT:SST_BAL_PHY_L3S_MY_010_040:cmems_obs-sst_bal_phy_my_l3s_P1D-m_202211"
[12] "EO:MO:DAT:SEAICE_BAL_SEAICE_L4_NRT_OBSERVATIONS_011_004:FMI-BAL-SEAICE_THICK-L4-NRT-OBS"
[13] "EO:MO:DAT:SEAICE_BAL_PHY_L4_MY_011_019:cmems_obs-si_bal_seaice-conc_my_1km_202112"
[14] "EO:MO:DAT:BALTICSEA_ANALYSISFORECAST_WAV_003_010:cmems_mod_bal_wav_anfc_PT1H-i_202311"
[15] "EO:MO:DAT:BALTICSEA_REANALYSIS_WAV_003_015:dataset-bal-reanalysis-wav-hourly_202003"
[16] "EO:MO:DAT:OCEANCOLOUR_BAL_BGC_L4_NRT_009_132:cmems_obs-oc_bal_bgc-plankton_nrt_l4-olci-300m_P1M_202207"
[17] "EO:MO:DAT:SST_BAL_SST_L3S_NRT_OBSERVATIONS_010_032:DMI-BALTIC-SST-L3S-NRT-OBS_FULL_TIME_SERIE_201904"
Understand the results
The datasets() method returns a list of datasets along with relevant information such as names, descriptions, and other metadata. This provides key details about each dataset, helping users understand its purpose and content.
Here is an example of the output for the dataset EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES:
client$datasets("EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES")
See output result
$terms
"Copernicus_ECMWF_License"
$dataset_id
"EO:ECMWF:DAT:DERIVED_NEAR_SURFACE_METEOROLOGICAL_VARIABLES"
$title
"Near surface meteorological variables from 1979 to 2019 derived from bias-corrected reanalysis"
$abstract
"This dataset provides bias-corrected reconstruction of near-surface meteorological variables derived from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalyses (ERA5). It is intended to be used as a meteorological forcing dataset for land surface and hydrological models. \nThe dataset has been obtained using the same methodology used to derive the widely used water, energy and climate change (WATCH) forcing data, and is thus also referred to as WATCH Forcing Data methodology applied to ERA5 (WFDE5). The data are derived from the ERA5 reanalysis product that have been re-gridded to a half-degree resolution. Data have been adjusted using an elevation correction and monthly-scale bias corrections based on Climatic Research Unit (CRU) data (for temperature, diurnal temperature range, cloud-cover, wet days number and precipitation fields) and Global Precipitation Climatology Centre (GPCC) data (for precipitation fields only). Additional corrections are included for varying atmospheric aerosol-loading and separate precipitation gauge observations. For full details please refer to the product user-guide.\nThis dataset was produced on behalf of Copernicus Climate Change Service (C3S) and was generated entirely within the Climate Data Store (CDS) Toolbox. The toolbox source code is provided in the documentation tab.\n\nVariables in the dataset/application are:\nGrid-point altitude, Near-surface air temperature, Near-surface specific humidity, Near-surface wind speed, Rainfall flux, Snowfall flux, Surface air pressure, Surface downwelling longwave radiation, Surface downwelling shortwave radiation"
$doi
NULL
$thumbnails
"https://datastore.copernicus-climate.eu/c3s/published-forms-v2/c3sprod/derived-near-surface-meteorological-variables/overview.jpg"
Create a query template
To search for a specific product, you first need to create a query template. You can use the WEkEO viewer to define your search parameters and copy/paste the resulting JSON query directly into your script.
Alternatively, you can use the generate_query_template() function to create a query template for a specific dataset. This offers a flexible, programmatic way to define your search parameters within R.
Using the WEkEO Viewer
You can use the WEkEO Viewer to visually select the desired dataset and define the search parameters, such as the region of interest, the variable to retrieve, and the time period. Once you have configured your query, you can copy the resulting JSON and use it in your R script to execute the search (see how to get the query):
See example of query using the Viewer
query <- '{
"dataset_id": "EO:ECMWF:DAT:CEMS_GLOFAS_HISTORICAL",
"system_version": [
"version_4_0"
],
"hydrological_model": [
"lisflood"
],
"product_type": [
"consolidated"
],
"variable": [
"river_discharge_in_the_last_24_hours"
],
"hyear": [
"2024"
],
"hmonth": [
"june"
],
"hday": [
"01"
],
"format": "grib",
"bbox": [
11.77115199576009,
44.56907885098417,
13.0263737724595,
45.40384015467251
],
"itemsPerPage": 200,
"startIndex": 0
}'
Using the generate_query_template function
Alternatively, you can programmatically create a query using the generate_query_template() function. This function generates a query template for a specified dataset by pulling information about available parameters, default values, and more from the /queryable endpoint of the HDA service. This is useful for customizing your search within R, without needing to manually copy and paste JSON queries.
Generate a query template
Here's an example of how to generate a query template for the dataset EO:EEA:DAT:CLMS_HRVPP_ST:
query_template <- client$generate_query_template("EO:EEA:DAT:CLMS_HRVPP_ST")
query_template
See output query
{
"dataset_id": "EO:EEA:DAT:CLMS_HRVPP_ST",
"itemsPerPage": 11,
"startIndex": 0,
"uid": "__### Value of string type with pattern: [\\w-]+",
"productType": "PPI",
"_comment_productType": "One of",
"_values_productType": ["PPI", "QFLAG"],
"platformSerialIdentifier": "S2A, S2B",
"_comment_platformSerialIdentifier": "One of",
"_values_platformSerialIdentifier": ["S2A, S2B"],
"tileId": "__### Value of string type with pattern: [\\w-]+",
"productVersion": "__### Value of string type with pattern: [\\w-]+",
"resolution": "10",
"_comment_resolution": "One of",
"_values_resolution": ["10"],
"processingDate": "__### Value of string type with format: date-time",
"start": "__### Value of string type with format: date-time",
"end": "__### Value of string type with format: date-time",
"bbox": [-180, -90, 180, 90]
}
This generated template contains placeholders for the parameters you can adjust based on your specific query needs. It also provides default values and options for parameters like productType, resolution, and bbox (bounding box).
Modify and use the generated query template
You can and should customize the generated query template to match your specific needs.
Fields that start with __### are placeholders indicating that they require user input. Placeholders appear when a value cannot be derived from the metadata. If they are not replaced, they are automatically removed before the query is sent to the HDA service.
Fields with the prefix _comment_ provide useful information about the corresponding field, such as valid values, format, or data patterns. They appear when a value is already defined, offering additional context for customizing the query. These comment fields are also removed before the query is submitted.
Finally, fields prefixed with _values_ list all possible values for a given field. This allows you to reference them programmatically, making customization easier and ensuring that you use valid options when configuring the query.
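To illustrate how these helper fields can be used programmatically, here is a self-contained sketch. It builds a small hand-made fragment standing in for a generated template (the field values are illustrative, taken from the output above), picks a valid productType from the _values_ helper field, and replaces the tileId placeholder:

```r
library(jsonlite)

# Hand-made fragment mimicking part of a generated template (illustrative values)
template <- fromJSON('{
  "productType": "PPI",
  "_comment_productType": "One of",
  "_values_productType": ["PPI", "QFLAG"],
  "tileId": "__### Value of string type with pattern: [\\\\w-]+"
}')

# Pick a valid value programmatically from the _values_ helper field
template$productType <- template[["_values_productType"]][2]

# Replace a __### placeholder with an actual value
template$tileId <- "32TPS"

template$productType
#> [1] "QFLAG"
```

Selecting from the _values_ vector rather than typing the value by hand guards against typos in option names.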
To modify the query, you can transform the JSON template into an R list using the jsonlite::fromJSON()
function:
library(jsonlite)
query_template <- fromJSON(query_template, flatten = FALSE)
query_template
See output query template for EO:EEA:DAT:CLMS_HRVPP_ST
$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"
$itemsPerPage
[1] 11
$startIndex
[1] 0
$uid
[1] "__### Value of string type with pattern: [\\w-]+"
$productType
[1] "PPI"
$_comment_productType
[1] "One of"
$_values_productType
[1] "PPI"   "QFLAG"
$platformSerialIdentifier
[1] "S2A, S2B"
$_comment_platformSerialIdentifier
[1] "One of"
$_values_platformSerialIdentifier
[1] "S2A, S2B"
$tileId
[1] "__### Value of string type with pattern: [\\w-]+"
$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"
$resolution
[1] "10"
$_comment_resolution
[1] "One of"
$_values_resolution
[1] "10"
$processingDate
[1] "__### Value of string type with format: date-time"
$start
[1] "__### Value of string type with format: date-time"
$end
[1] "__### Value of string type with format: date-time"
$bbox
[1] -180  -90  180   90
How to use the query template in a search
After transforming the JSON template into an R list, you can modify it as needed. For example, you might want to set a new bounding box (bbox) or limit the time range (start and end):
# Set a new bounding box
query_template$bbox <- c(11.1090, 46.6210, 11.2090, 46.7210)
# Limit the time range
query_template$start <- "2018-03-01T00:00:00.000Z"
query_template$end <- "2018-05-31T00:00:00.000Z"
query_template
See updated template
$dataset_id
[1] "EO:EEA:DAT:CLMS_HRVPP_ST"
$itemsPerPage
[1] 11
$startIndex
[1] 0
$uid
[1] "__### Value of string type with pattern: [\\w-]+"
$productType
[1] "PPI"
$_comment_productType
[1] "One of"
$_values_productType
[1] "PPI"   "QFLAG"
$platformSerialIdentifier
[1] "S2A, S2B"
$_comment_platformSerialIdentifier
[1] "One of"
$_values_platformSerialIdentifier
[1] "S2A, S2B"
$tileId
[1] "__### Value of string type with pattern: [\\w-]+"
$productVersion
[1] "__### Value of string type with pattern: [\\w-]+"
$resolution
[1] "10"
$_comment_resolution
[1] "One of"
$_values_resolution
[1] "10"
$processingDate
[1] "__### Value of string type with format: date-time"
$start
[1] "2018-03-01T00:00:00.000Z"
$end
[1] "2018-05-31T00:00:00.000Z"
$bbox
[1] 11.109 46.621 11.209 46.721
Once you've made the necessary modifications, convert the list back to JSON format using the jsonlite::toJSON() function. It's important to use the auto_unbox = TRUE flag when converting, as this ensures that the JSON is formatted correctly, particularly for single-element arrays:
# Convert back to JSON format
query_template <- toJSON(query_template, auto_unbox = TRUE, digits = 17)
# don't forget to put auto_unbox = TRUE
This approach maintains the correct formatting of the query, making it ready for submission to the HDA service.
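To see why the auto_unbox flag matters, compare the two conversions below on a minimal list (self-contained, jsonlite only):

```r
library(jsonlite)

x <- list(dataset_id = "EO:EEA:DAT:CLMS_HRVPP_ST",
          bbox = c(-180, -90, 180, 90))

# Without auto_unbox, the length-one dataset_id becomes a one-element JSON array
as.character(toJSON(x))
#> {"dataset_id":["EO:EEA:DAT:CLMS_HRVPP_ST"],"bbox":[-180,-90,180,90]}

# With auto_unbox = TRUE, it is written as a plain JSON string
as.character(toJSON(x, auto_unbox = TRUE))
#> {"dataset_id":"EO:EEA:DAT:CLMS_HRVPP_ST","bbox":[-180,-90,180,90]}
```

The bbox stays an array in both cases because it has four elements; only length-one vectors are affected by the flag.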
Search and download data
To search for data in the HDA service, you can use the search() function provided by the Client class. This function lets you search for datasets based on a query. The search results can then be downloaded using the download() method of the SearchResults class.
Search for data
The search() function takes a query and an optional limit parameter, which specifies the maximum number of results to retrieve. It only performs the search and does not download the data. The output of this function is an instance of the SearchResults class.
Here’s an example of how to search for data using a query and limit the results:
# Assuming 'client' is already created and authenticated, 'query_template' is defined
matches <- client$search(query_template)
# Output
[1] "Found 9 files"
[1] "Total Size 1.8 GB"
# Display the IDs of the search results
sapply(matches$results, FUN = function(x) { x$id })
See output result
[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI"
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI"
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI"
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI"
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI"
It shows that the search returned 9 files with a total size of 1.8 GB, and the names of each file are listed.
Download the files
The SearchResults class contains a public field results and a method called download() that downloads the search results. The download() function requires an output directory, which will be created if it doesn't already exist. It also accepts an optional force parameter:
When force = TRUE, the function re-downloads files even if they already exist in the specified directory, overwriting them.
When force = FALSE (the default), the function skips files that already exist, saving both time and bandwidth.
Here is an example of how to download the files:
# Assuming 'matches' is an instance of SearchResults obtained from the search
output_directory <- tempdir()
matches$download(output_directory)
See output result indicating download progress
The total size is 1.8 GB . Do you want to proceed? (Y/N):y
[1] "[Download] Start"
[1] "[Download] Downloading file 1/9"
[1] "[Download] Downloading file 2/9"
[1] "[Download] Downloading file 3/9"
[1] "[Download] Downloading file 4/9"
[1] "[Download] Downloading file 5/9"
[1] "[Download] Downloading file 6/9"
[1] "[Download] Downloading file 7/9"
[1] "[Download] Downloading file 8/9"
[1] "[Download] Downloading file 9/9"
[1] "[Download] DONE"
Once the download is complete, you can list the downloaded files:
list.files(output_directory)
See output result
[1] "ST_20180301T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180311T000000_S2_T32TPS-010m_V101_PPI.tif"
[3] "ST_20180321T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180401T000000_S2_T32TPS-010m_V101_PPI.tif"
[5] "ST_20180411T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180421T000000_S2_T32TPS-010m_V101_PPI.tif"
[7] "ST_20180501T000000_S2_T32TPS-010m_V101_PPI.tif" "ST_20180511T000000_S2_T32TPS-010m_V101_PPI.tif"
[9] "ST_20180521T000000_S2_T32TPS-010m_V101_PPI.tif"
After you are done with the files, you can clean up by deleting the directory and its contents:
unlink(output_directory, recursive = TRUE)
This command deletes the directory output_directory and all files within it, ensuring no temporary files remain. The recursive = TRUE option ensures that all subdirectories and files inside output_directory are also removed.
And voilàààà!
You have downloaded data with the hdar R package! 😊
📌Note: the R packages installed in WEkEO JupyterHub are managed centrally and cannot be updated by individual users. This ensures compatibility and stability within the environment. If you require a specific version of an R package or need additional R packages that are not currently available, please don’t hesitate to contact the WEkEO User Support.
What's next?
We are user-driven and we implement users' suggestions, so feel free to contact us:
through a chat session available in the bottom right corner of the page
via our contact webpage
via e-mail to our support team (supportATwekeo.eu)
Regardless of how you choose to contact us, you will first be put in touch with our AI Agent Neo. At any time, you can reach a member of the WEkEO User Support team by clicking on "talk to a person" via chat, or by naturally requesting it in reply to Neo's email.