Context

WEkEO provides access to a wide range of data, which can be downloaded using the Harmonized Data Access (HDA) service. For more detailed information, please refer to the article What is the Harmonized Data Access (HDA) API?

The HDA API has a Python client that helps you quickly download and process the data you need. Let's see how to use the HDA API with Python! 🙌

How to use the HDA API in Python?

You can follow the steps of this article to download data in this notebook:

Download Notebook Example

Official documentation

For further information, please visit the WEkEO HDA API documentation.

Step 1. Install the latest version of `hda`

Run one of the following commands in a terminal to install the latest version of hda:

using pip:
```
pip install hda -U 
```
using Mamba (you can replace mamba with conda):
```
mamba install conda-forge::hda
```
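To confirm which version ended up in your environment (for instance after running `pip install hda -U`), a small standard-library check can help. This is a sketch of our own, not part of the hda package; it only relies on `importlib.metadata` (Python 3.8+), so it also runs when hda is missing.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "hda") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


v = installed_version()
print(f"hda {v}" if v else "hda is not installed")
```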

💡WEkEO Pro Tip: all the required packages, including the hda library, are already installed in the default environment miniwekeolab of the WEkEO JupyterHub.

Step 2. Import `hda` module

In a Python script, import the required classes:

```
from hda import Client, Configuration
```

Step 3. Configure credentials and load `hda` Client

Next, configure the user's credentials and load the hda client.

Here are the two most common methods:

Method 1 (occasional users)

Configure your WEkEO credentials directly in the Python script:

```
# Configure the user's credentials without a .hdarc file
conf = Configuration(user="username", password="password")
hda_client = Client(config=conf)
```
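Hard-coding a password in a script is risky if the script is shared. One option, sketched below, is to read the credentials from environment variables at runtime and pass them to `Configuration`. The variable names `WEKEO_USERNAME` and `WEKEO_PASSWORD` are hypothetical choices for this example; hda does not read them on its own.

```python
import os


def credentials_from_env(user_var: str = "WEKEO_USERNAME",
                         password_var: str = "WEKEO_PASSWORD") -> dict:
    """Read WEkEO credentials from environment variables.

    The variable names are hypothetical defaults chosen for this example.
    Raises KeyError if one of them is not set.
    """
    missing = [name for name in (user_var, password_var) if name not in os.environ]
    if missing:
        raise KeyError(f"Missing environment variable(s): {', '.join(missing)}")
    return {"user": os.environ[user_var], "password": os.environ[password_var]}


# Usage with the hda client (requires the hda package and valid credentials):
# from hda import Client, Configuration
# conf = Configuration(**credentials_from_env())
# hda_client = Client(config=conf)
```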

Method 2 (regular users)

Create the .hdarc configuration file from a Python script as follows:

```
from pathlib import Path
import getpass

# Default location expected by the hda package
hdarc = Path.home() / '.hdarc'

# Create it only if it does not already exist
if not hdarc.is_file():
    USERNAME = input('Enter your username: ')
    PASSWORD = getpass.getpass('Enter your password: ')

    with open(hdarc, 'w') as f:
        f.write(f'user:{USERNAME}\n')
        f.write(f'password:{PASSWORD}\n')

# The client reads the credentials from ~/.hdarc
hda_client = Client()
```

⚠️ This step only needs to be done once. Subsequent calls to hda_client = Client() will retrieve the credentials from the created file.
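Because the .hdarc file stores your password in plain text, on Linux or macOS you may want to make it readable by your user only. A minimal sketch using `os.chmod` with mode `0o600`; this is a general precaution, not a requirement of the hda package.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(path: Path) -> None:
    """Set file permissions to read/write for the owner only (0o600, POSIX systems)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Usage: restrict_to_owner(Path.home() / ".hdarc")
```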

📌Note: be careful: if you created a .hdarc file before March 2024, you'll need to remove the url it contains.
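The cleanup can also be scripted. The sketch below assumes the outdated entry is a line starting with `url:` (the same `key:value` layout as the `user:` and `password:` lines written above); it rewrites the file without that line and leaves every other line untouched. Keep a backup of the file if in doubt.

```python
from pathlib import Path


def remove_url_line(path: Path) -> bool:
    """Remove any line starting with 'url:' from an hdarc-style file.

    Assumes the 'key:value' layout used in .hdarc. Returns True if the file changed.
    """
    lines = path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if not line.lstrip().lower().startswith("url:")]
    if len(kept) == len(lines):
        return False  # nothing to remove
    path.write_text("".join(kept))
    return True


# Usage: remove_url_line(Path.home() / ".hdarc")
```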

Step 4. Create the request and download data

We can now define the JSON query (see how to get the query) and run the search:

```
# The JSON query, stored in the "query" variable
query = {
  "dataset_id": "EO:EEA:DAT:CLMS_HRVPP_VPP",
  "productType": "TPROD",
  "productGroupId": "s1",
  "start": "2020-01-01T00:00:00.000Z",
  "end": "2021-01-01T00:00:00.000Z",
  "bbox": [
    -9.53592042,
    42.46825465,
    -7.0363102799999995,
    43.99700636
  ]
}

# Search for the products matching the query
matches = hda_client.search(query)

# Print the search results
print(matches)
```

The JSON query stored in the query variable defines the search parameters for the dataset. The hda_client.search(query) method runs the search with these parameters and stores the results in the matches variable.
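Writing dates and bounding boxes by hand is error-prone. The helper below is a hypothetical convenience (not part of hda) that rebuilds the example query from Python `date` objects and checks that the bounding box is ordered as in the example, i.e. `[west, south, east, north]` in decimal degrees. The `productType` and `productGroupId` values are copied from the example; other datasets accept different keys.

```python
from datetime import date


def build_query(dataset_id: str, start: date, end: date,
                bbox: list, **extra) -> dict:
    """Build an HDA-style search query like the example above.

    bbox is expected as [west, south, east, north] in decimal degrees.
    Extra keyword arguments (e.g. productType) are copied into the query as-is.
    """
    west, south, east, north = bbox
    if not (-180 <= west < east <= 180 and -90 <= south < north <= 90):
        raise ValueError(f"Invalid bbox {bbox}: expected [west, south, east, north]")
    if start >= end:
        raise ValueError("start must be earlier than end")
    fmt = "%Y-%m-%dT00:00:00.000Z"  # midnight, same string format as the example
    return {
        "dataset_id": dataset_id,
        **extra,
        "start": start.strftime(fmt),
        "end": end.strftime(fmt),
        "bbox": [west, south, east, north],
    }


query = build_query(
    "EO:EEA:DAT:CLMS_HRVPP_VPP",
    start=date(2020, 1, 1),
    end=date(2021, 1, 1),
    bbox=[-9.53592042, 42.46825465, -7.0363102799999995, 43.99700636],
    productType="TPROD",
    productGroupId="s1",
)
# matches = hda_client.search(query)
```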

To download the results, the following command is used:

```
# Download the last result into a directory (here '/tmp')
matches[-1].download(download_dir="/tmp")
```

For this example, we will fetch the last result.

The download() method downloads every match contained in the object it is called on (here, only the last one) and saves the files in the specified directory (/tmp). Calling it on several matches at once therefore performs a batch download, so you do not need to start each download individually.
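If you prefer a persistent folder over /tmp (which may be cleaned up automatically on some systems), you can create one beforehand and check its content afterwards. A sketch with `pathlib`; the folder name `wekeo_downloads` is an arbitrary choice, and the listing only covers top-level files in the folder.

```python
from pathlib import Path
from typing import List, Tuple


def prepare_download_dir(base: Path, name: str = "wekeo_downloads") -> Path:
    """Create (if needed) and return a download directory under `base`."""
    out_dir = base / name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


def summarize_downloads(out_dir: Path) -> List[Tuple[str, int]]:
    """List (file name, size in bytes) for each top-level file in `out_dir`, sorted by name."""
    return sorted((p.name, p.stat().st_size) for p in out_dir.iterdir() if p.is_file())


# Usage (the download call requires hda and valid credentials):
# out_dir = prepare_download_dir(Path.home())
# matches[-1].download(download_dir=str(out_dir))
# for name, size in summarize_downloads(out_dir):
#     print(f"{name}: {size / 1e6:.1f} MB")
```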

💡WEkEO Pro Tip: the code above downloads the last result in matches, but you can easily choose which files to download by indexing or slicing the matches object:

```
matches.download()      # Will download all results
matches[0].download()   # Will only download the first result
matches[-1].download()  # Will only download the last result
matches[:10].download() # Will only download the first 10 results
```
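Assuming matches follows the same indexing rules as a Python list, as the examples above suggest, you can preview which results an expression selects on a plain list. The names below are invented stand-ins; no hda call is involved.

```python
# Invented stand-ins for 12 search results, in the order search() returned them
results = [f"result_{i:02d}" for i in range(12)]

first = results[0]         # 'result_00'  (like matches[0])
last = results[-1]         # 'result_11'  (like matches[-1])
first_ten = results[:10]   # 'result_00' to 'result_09'  (like matches[:10])

# With fewer results than requested, a slice simply returns what is there
short = results[:4]
print(len(short[:10]))     # 4, not an error
```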

📌Note: You can also download them to a bucket, but you first need to upgrade your plan to get a tenant.

⚠️ Requests and orders are subject to limits. More details in this article.

`hda` functions

Display names of files to be downloaded
You can iterate over the results in the matches object and display each file's name via its id:
```
for item in matches.results:
    print(item['id'])
```
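When only some files are of interest, you can select them by id before downloading. The sketch below works on a list of dictionaries shaped like `matches.results` (each with an `'id'` key, as in the loop above); the sample ids are invented. The returned indices can then be used with `matches[i].download()`.

```python
from typing import Dict, List, Tuple


def find_results(results: List[Dict], text: str) -> List[Tuple[int, str]]:
    """Return (index, id) pairs for results whose id contains `text` (case-insensitive)."""
    needle = text.lower()
    return [(i, item["id"]) for i, item in enumerate(results) if needle in item["id"].lower()]


# Invented sample data standing in for matches.results
sample = [{"id": "PRODUCT_A_2020"}, {"id": "PRODUCT_B_2020"}, {"id": "product_a_2021"}]
print(find_results(sample, "product_a"))  # [(0, 'PRODUCT_A_2020'), (2, 'product_a_2021')]

# With real results (requires hda):
# for i, _ in find_results(matches.results, "some-text-in-the-id"):
#     matches[i].download(download_dir="/tmp")
```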
Display information of a dataset
Use the dataset() function with the dataset_id of your choice to display its information:
```
hda_client.dataset('EO:EEA:DAT:CLMS_HRVPP_VPP')
```
It returns a JSON object that includes the abstract, the dataset_id and other properties of the given dataset.
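The returned object can be long. The helper below (a sketch, not an hda function) prints a compact summary. It assumes the object is a Python dict whose keys are literally named `dataset_id` and `abstract`, as the description above suggests, and falls back gracefully if they are absent. The sample dictionary is invented.

```python
import textwrap


def summarize_dataset(info: dict, max_chars: int = 200) -> str:
    """Return a short, human-readable summary of a dataset description.

    Assumes a dict with 'dataset_id' and 'abstract' keys; missing keys are tolerated.
    """
    dataset_id = info.get("dataset_id", "<unknown id>")
    abstract = " ".join(str(info.get("abstract", "")).split())  # collapse whitespace
    if abstract:
        abstract = textwrap.shorten(abstract, width=max_chars, placeholder=" ...")
    else:
        abstract = "(no abstract)"
    others = sorted(k for k in info if k not in ("dataset_id", "abstract"))
    return "\n".join([
        dataset_id,
        abstract,
        "Other properties: " + (", ".join(others) if others else "(none)"),
    ])


# Invented sample standing in for the object returned by hda_client.dataset(...)
sample = {
    "dataset_id": "EO:EXAMPLE:DAT:DEMO",
    "abstract": "An invented abstract   used\nfor illustration.",
    "title": "Demo",
}
print(summarize_dataset(sample))

# With the real client (requires hda):
# print(summarize_dataset(hda_client.dataset('EO:EEA:DAT:CLMS_HRVPP_VPP')))
```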
Display list of available datasets
Use the datasets(limit=None) function to display the list of available datasets. Specify a limit to control the number of datasets displayed, or leave it empty to show all:
```
hda_client.datasets(3)
```
The line above returns a list of 3 available datasets, each represented as a JSON object containing the abstract, dataset_id and other properties.
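Since each entry carries its dataset_id, the list can be narrowed down locally, for example to one data provider. Ids such as `EO:EEA:DAT:CLMS_HRVPP_VPP` follow a colon-separated pattern, so the sketch below keeps entries whose `dataset_id` starts with a given prefix. It assumes `datasets()` returns a list of dicts with a `dataset_id` key; the sample entries are invented.

```python
from typing import Dict, List


def filter_datasets(datasets: List[Dict], prefix: str) -> List[str]:
    """Return the dataset_ids that start with `prefix`, sorted alphabetically."""
    return sorted(d["dataset_id"] for d in datasets
                  if d.get("dataset_id", "").startswith(prefix))


# Invented sample entries standing in for the output of hda_client.datasets()
sample = [
    {"dataset_id": "EO:EEA:DAT:EXAMPLE_B"},
    {"dataset_id": "EO:OTHER:DAT:EXAMPLE_C"},
    {"dataset_id": "EO:EEA:DAT:EXAMPLE_A"},
]
print(filter_datasets(sample, "EO:EEA:"))  # ['EO:EEA:DAT:EXAMPLE_A', 'EO:EEA:DAT:EXAMPLE_B']

# With the real client (requires hda; listing all datasets may take a while):
# print(filter_datasets(hda_client.datasets(), "EO:EEA:"))
```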
Limit number of results returned by the search function
You can limit the number of results returned by the search() function as follows:
```
matches = hda_client.search(query, 3)
```
From the line above, the number of items returned in matches is limited to 3.