Context

WEkEO provides access to a wide range of data, which can be downloaded using the Harmonized Data Access (HDA) service. For more detailed information, please refer to the article What is the Harmonized Data Access (HDA) API?

The HDA API has a Python client that helps you quickly download and process the data you need. Let's see how to use the HDA API with Python! 🙌

💡WEkEO Pro Tip: to utilize the HDA service and library, you must first register for a WEkEO account.

Step by step Guide

You can follow the steps of this article to download data using this notebook:

Download Notebook Example

Official documentation

For further information, please visit the WEkEO HDA API documentation.

Step 1. Install the latest version of `hda`

Run one of the following commands in a terminal to install the latest version of hda:

Using pip:
```
pip install hda -U 
```
Using Mamba or Conda (you can replace mamba with conda):
```
mamba install conda-forge::hda
```

💡WEkEO Pro Tip: all required packages, including the hda library, are already installed in the default environment wekeolab of the WEkEO JupyterHub!

Step 2. Import `hda` module

In a Python script, import the required objects from the hda module:

from hda import Client, Configuration

Step 3. Configure credentials and load `hda` Client

Before downloading using the HDA, you need to configure your WEkEO credentials and load the hda Client. There are two methods to do so.

Method 1 (occasional users)

The first method is the most direct: pass your WEkEO credentials in the Python script itself, via Configuration:

# Configure user's credentials without a .hdarc
conf = Configuration(user="username", password="password")
hda_client = Client(config=conf)
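If you would rather not keep a password in a script that might be shared, one option is to read the credentials from environment variables and pass them to Configuration. The sketch below is only an illustration: the variable names WEKEO_USER and WEKEO_PASSWORD and the helper credentials_from_env are hypothetical, not something defined by the hda package.

```python
import os


def credentials_from_env(user_var="WEKEO_USER", password_var="WEKEO_PASSWORD"):
    """Return keyword arguments for Configuration, read from the environment.

    The variable names are hypothetical; use whatever suits your setup.
    """
    try:
        return {
            "user": os.environ[user_var],
            "password": os.environ[password_var],
        }
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Usage with the objects imported in Step 2 (left as comments here):
# conf = Configuration(**credentials_from_env())
# hda_client = Client(config=conf)
```

The helper only builds the user and password arguments shown in Method 1, so the rest of the workflow stays unchanged.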

Method 2 (regular users)

The second method creates a configuration file (.hdarc) so that, unlike with the first method, you do not have to specify your credentials every time. Simply run the following in a Python script:

import getpass
from pathlib import Path

# Default location expected by the hda package
hdarc = Path.home() / '.hdarc'

# Create the file only if it does not already exist
if not hdarc.is_file():
    USERNAME = input('Enter your username: ')
    PASSWORD = getpass.getpass('Enter your password: ')

    with open(hdarc, 'w') as f:
        f.write(f'user:{USERNAME}\n')
        f.write(f'password:{PASSWORD}\n')

hda_client = Client()

💡 WEkEO Pro Tip: you only need to do this once. Subsequent calls to hda_client = Client() will retrieve the credentials from the created file.

⚠️ If you created a .hdarc file before March 2024, you need to remove the url indicated in it.
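To check an existing .hdarc for the issues mentioned above, a small stdlib-only script can help. This is a minimal sketch that assumes the file holds one key:value pair per line, as written by the snippet in Method 2; the hda package's own parsing may differ, and the helper names read_hdarc_keys and hdarc_problems are hypothetical.

```python
from pathlib import Path


def read_hdarc_keys(path):
    """Return the keys found in a .hdarc file (assumes 'key:value' lines)."""
    keys = []
    for line in Path(path).read_text().splitlines():
        if ":" in line:
            # Split on the first colon only, so values may contain colons
            key, _value = line.split(":", 1)
            keys.append(key.strip())
    return keys


def hdarc_problems(path):
    """List readable problems: missing credentials or a leftover url entry."""
    keys = read_hdarc_keys(path)
    problems = [
        f"missing '{name}' entry"
        for name in ("user", "password")
        if name not in keys
    ]
    if "url" in keys:
        problems.append("leftover 'url' entry (remove it)")
    return problems


# Usage (left as a comment because the file may not exist yet):
# print(hdarc_problems(Path.home() / ".hdarc"))
```

An empty list means the file contains both credentials and no url entry. The script never prints the stored values, only the key names.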

Step 4. Create the request

You can now define the JSON request (see how to get the query) and run the search:

# The JSON query loaded in the "query" variable
query = {
  "dataset_id": "EO:EEA:DAT:CLMS_HRVPP_VPP",
  "productType": "TPROD",
  "productGroupId": "s1",
  "bbox": [
    -9.53592042,
    42.46825465,
    -7.0363102799999995,
    43.99700636
  ],
  "startdate": "2020-01-01T00:00:00.000Z",
  "enddate": "2021-01-01T23:59:59.999Z",
  "itemsPerPage": 200,
  "startIndex": 0
}

# Run the search with the query passed as a parameter
matches = hda_client.search(query)

# List the results
print(matches)

The JSON request loaded in the query variable specifies the parameters for searching the dataset. The hda_client.search(query) function is used to perform the search based on these parameters, and the results are stored in the matches variable.
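When the same query has to be repeated for other areas or periods, it can be convenient to build the dictionary from Python values rather than editing the JSON by hand. The following is a rough sketch, not part of the hda package: build_query and hda_timestamp are hypothetical helpers, the bounding box order is assumed to be [west, south, east, north] (which matches the example values), and naive datetimes are assumed to be UTC.

```python
from datetime import datetime, timezone


def hda_timestamp(moment):
    """Format a datetime like the example query: 2020-01-01T00:00:00.000Z."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are meant as UTC
        moment = moment.replace(tzinfo=timezone.utc)
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return f"{utc:%Y-%m-%dT%H:%M:%S}.{millis:03d}Z"


def build_query(dataset_id, bbox, start, end, **extra):
    """Assemble a query dictionary shaped like the example above."""
    west, south, east, north = bbox
    if west >= east or south >= north:
        raise ValueError("bbox is expected as [west, south, east, north]")
    if start > end:
        raise ValueError("start must not be later than end")
    query = {
        "dataset_id": dataset_id,
        "bbox": [west, south, east, north],
        "startdate": hda_timestamp(start),
        "enddate": hda_timestamp(end),
    }
    query.update(extra)  # dataset-specific keys such as productType
    return query


query = build_query(
    "EO:EEA:DAT:CLMS_HRVPP_VPP",
    bbox=[-9.53592042, 42.46825465, -7.0363102799999995, 43.99700636],
    start=datetime(2020, 1, 1),
    end=datetime(2021, 1, 1, 23, 59, 59, 999000),
    productType="TPROD",
    productGroupId="s1",
    itemsPerPage=200,
    startIndex=0,
)
print(query["startdate"], query["enddate"])
```

The resulting dictionary has the same content as the hand-written query and can be passed to hda_client.search(query) in the same way. The available keys depend on the dataset, so the extra keyword arguments are passed through untouched.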

Step 5. Download data

To download all the matching files, simply run this command:

# Download results in a directory (e.g. '/tmp')
matches.download(download_dir="/tmp")

The download runs as a single batch operation: every match is saved to the specified directory (/tmp in our example), so you do not need to start each download individually.

💡WEkEO Pro Tip: the code above downloads all the results, which may amount to many files and/or a large volume of data. See the Advanced functions section to learn how to restrict the download to specific files.

📌Note: You can also download data to a bucket, but you first need to upgrade your plan to get a tenant.

⚠️ Requests and orders are subject to limits. More details in this article.

Advanced functions

Filter the results

It is possible to choose the files to download by slicing the matches object:

matches.download(): download all results
matches[0].download(): download only the first result
matches[-1].download(): download only the last result
matches[:10].download(): download only the first 10 results
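The same slicing can be applied to successive windows to download a large result set in smaller batches, which makes progress easier to follow. This is only a sketch: download_in_batches is a hypothetical helper, and it assumes that len(matches.results) gives the number of results and that a slice accepts the same download_dir argument as the full matches object.

```python
def download_in_batches(matches, download_dir, batch_size=10):
    """Download results slice by slice and return the number of batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total = len(matches.results)
    batches = 0
    for start in range(0, total, batch_size):
        # Same slicing as matches[:10], applied to successive windows
        matches[start:start + batch_size].download(download_dir=download_dir)
        batches += 1
        last = min(start + batch_size, total) - 1
        print(f"Batch {batches}: results {start} to {last}")
    return batches


# Usage with the matches object from Step 4 (left as a comment):
# download_in_batches(matches, "/tmp", batch_size=10)
```

If a batch fails, only that window needs to be retried, for example with matches[20:30].download().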

Display names of files to be downloaded

You can browse the matches object and display names of files via their id:

for item in matches.results:
    print(item['id'])
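Since each result exposes its id, the ids can also drive the selection of what to download. The sketch below combines the loop above with the indexing shown earlier; download_matching is a hypothetical helper, and it assumes that matches.results is in the same order as matches[i] and that a single result accepts the download_dir argument.

```python
def download_matching(matches, keyword, download_dir):
    """Download only the results whose id contains `keyword`; return their ids."""
    selected = []
    for index, item in enumerate(matches.results):
        if keyword in item["id"]:
            # Index the matches object to download this single result
            matches[index].download(download_dir=download_dir)
            selected.append(item["id"])
    return selected


# Usage with the matches object from Step 4 (the keyword is illustrative):
# downloaded = download_matching(matches, "2020", "/tmp")
# print(downloaded)
```

Printing the ids first, as shown above, is a simple way to pick a keyword that actually occurs in the file names.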

Display information of a dataset

Use the dataset() function with the dataset_id of your choice to display information about that dataset:

hda_client.dataset('EO:EEA:DAT:CLMS_HRVPP_VPP')

It returns a JSON object that includes the abstract, the dataset_id and other properties of the given dataset.

Display list of available datasets

Use the datasets(limit=None) function to display the list of available datasets. Specify a limit to control the number of datasets displayed, or leave it empty to show all:

hda_client.datasets(3)

The line above returns a list of 3 available datasets, each represented as a JSON object containing the abstract, dataset_id and other properties.
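To get a compact overview, you can keep only the identifiers from the returned entries. The following sketch assumes, as described above, that each entry behaves like a dictionary with a dataset_id key; dataset_ids is a hypothetical helper and the sample entries are stand-ins, not real API output.

```python
def dataset_ids(datasets, prefix=""):
    """Return the dataset_id of each entry, optionally filtered by prefix."""
    ids = []
    for entry in datasets:
        identifier = entry.get("dataset_id")
        if identifier and identifier.startswith(prefix):
            ids.append(identifier)
    return ids


# Stand-in data for illustration only (the second id is made up)
sample = [
    {"dataset_id": "EO:EEA:DAT:CLMS_HRVPP_VPP", "abstract": "..."},
    {"dataset_id": "EXAMPLE:OTHER:DATASET", "abstract": "..."},
]
print(dataset_ids(sample, prefix="EO:EEA"))

# Usage with the client from Step 3 (left as a comment):
# print(dataset_ids(hda_client.datasets(), prefix="EO:EEA"))
```

Any identifier found this way can then be passed to hda_client.dataset() or used as the dataset_id of a query.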

Limit number of results returned by `search`

You can limit the number of results returned by the search() function:

matches = hda_client.search(query, 3)

With the line above, matches contains at most 3 items.

And tadaaaaa! 😃

Now you know how to use the HDA in Python to download WEkEO data!

What's next?

Feel free to check these articles that might be of interest to you:

How to create a custom environment on WEkEO JupyterHub?

HDA API errors

How to use the Harmonized Data Access REST API with cURL?

HDA API request gives no results: what should I do?

How to use the hdar package for accessing the WEkEO HDA API in R?

Additional resources can be found in our Help Center. Should you require further assistance or wish to provide feedback, feel free to contact us through a chat session available in the bottom right corner of the page.