Context

In some cases, you may want to further refine the results returned by your query. This can be useful when the available query parameters are not sufficient to isolate the desired files, when current subset methods do not work as expected, or simply when you prefer to apply a custom filtering approach.

The examples below show how to filter results based on a specific string in the file name or identifier, using Python.

Step-by-step guide

In a nutshell, we will set up the WEkEO credentials, define the query, run the search, filter the results, and download the matching files.

💡 WEkEO Pro Tip: if you are new to the HDA API in Python, read our introductory article first. It covers the basics of downloading data via HDA in Python.

Step 1. Configure HDA access credentials

First, set up your WEkEO credentials in order to connect to the HDA API. Replace YOUR_USERNAME and YOUR_PASSWORD in the code below with your own credentials:

from hda import Client, Configuration

# Replace with your own WEkEO credentials
conf = Configuration(user="YOUR_USERNAME", password="YOUR_PASSWORD")
hda_client = Client(config=conf)
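If you would rather not hard-code credentials in a script, one option is to read them from environment variables. The sketch below is only an illustration: the helper load_wekeo_credentials and the variable names WEKEO_USERNAME and WEKEO_PASSWORD are assumptions made for this example, not something the hda package defines.

```python
import os


def load_wekeo_credentials(user_var="WEKEO_USERNAME", password_var="WEKEO_PASSWORD"):
    """Read credentials from environment variables (hypothetical variable names)."""
    user = os.environ.get(user_var)
    password = os.environ.get(password_var)
    # Fail early with a readable message instead of sending empty credentials
    if not user or not password:
        raise RuntimeError(
            f"Set the {user_var} and {password_var} environment variables first."
        )
    return user, password


# Possible usage with the configuration shown above:
# user, password = load_wekeo_credentials()
# conf = Configuration(user=user, password=password)
```

The returned values can then be passed to Configuration exactly as in the snippet above.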

Step 2. Define the JSON query

Next, define the JSON query of your choice as a Python dictionary (see how to get the query):

query = {
    "dataset_id": "EO:CLMS:DAT:CLMS_GLOBAL_LAI_300M_V1_10DAILY_NETCDF",
    "productType": "LAI300",
    "resolution": "300",
    "bbox": [24.8461, 60.1964, 24.9771, 60.2175],
    "startdate": "2023-09-01T00:00:00.000Z",
    "enddate": "2023-09-30T23:59:59.999Z",
    "itemsPerPage": 200,
    "startIndex": 0
}
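If you often query whole months, the startdate and enddate strings can be generated rather than typed by hand. The helper below is a small sketch; month_range is a hypothetical name introduced here, and it simply reproduces the timestamp format used in the query above.

```python
import calendar


def month_range(year, month):
    """Return (startdate, enddate) strings covering a full calendar month."""
    # monthrange returns (weekday of first day, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    start = f"{year:04d}-{month:02d}-01T00:00:00.000Z"
    end = f"{year:04d}-{month:02d}-{last_day:02d}T23:59:59.999Z"
    return start, end


# September 2023, the period used in the query above
startdate, enddate = month_range(2023, 9)
```

The two values can then replace the hard-coded "startdate" and "enddate" entries of the query dictionary.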

Step 3. Run the search

Run the query to retrieve the list of matching files, and optionally preview their IDs:

matches = hda_client.search(query)

# (Optional) List the results
print(matches)
for item in matches.results:
    print(item['id'])

Step 4. Filter by a string from the ID and download matching files

Define the following parameters to match your needs:

pattern: substring to look for in the file IDs
download_dir: folder where the matching files will be downloaded
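In case the download folder does not exist yet, it can be created up front. This is a minimal sketch using only the standard library; ensure_download_dir is a hypothetical helper, and whether the download step creates the folder on its own is not assumed here.

```python
import tempfile
from pathlib import Path


def ensure_download_dir(path):
    """Create the folder (and any missing parents) and return it as a Path."""
    folder = Path(path).expanduser()
    folder.mkdir(parents=True, exist_ok=True)
    return folder


# Demonstration in a temporary location so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_download_dir(Path(tmp) / "wekeo" / "lai300")
    print(target.is_dir())
```

In a real script, you would call ensure_download_dir(download_dir) once before starting the downloads.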

Finally, run the following code to loop through the search results and download only files whose IDs contain the target string:

# --- Parameters to update ---
pattern = "RT6_202309"           # String to search for in file IDs
download_dir = "C:/path"         # Folder where matching files will be saved

# --- Filter and download ---
for index, item in enumerate(matches.results):
    # Get the ID of the item
    item_id = item['id']
    
    # Check if the pattern is present in the ID
    if pattern in item_id:
        print(f"Downloading item with ID: {item_id}")
        
        # Download the item
        matches[index].download(download_dir=download_dir)

And voilà!

This way, only the files matching your pattern will be downloaded, skipping all others! 🙌
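To try out a pattern before downloading anything, the same filtering logic can be run on plain dictionaries. The sketch below assumes each result is a dictionary with an 'id' key, as in the loop above. The function filter_by_id and the sample IDs are invented for illustration and are not real WEkEO file IDs. An optional regular-expression mode is included for cases where a simple substring is too coarse.

```python
import re


def filter_by_id(results, pattern, use_regex=False):
    """Return (index, id) pairs for the results whose 'id' matches the pattern."""
    if use_regex:
        compiled = re.compile(pattern)
        matches_id = lambda item_id: compiled.search(item_id) is not None
    else:
        # Plain substring test, same as `pattern in item_id` in the loop above
        matches_id = lambda item_id: pattern in item_id
    return [
        (index, item["id"])
        for index, item in enumerate(results)
        if matches_id(item["id"])
    ]


# Invented IDs, for illustration only
sample_results = [
    {"id": "example_LAI300-RT0_202309100000"},
    {"id": "example_LAI300-RT6_202309100000"},
    {"id": "example_LAI300-RT6_202308310000"},
    {"id": "example_LAI300-RT6_202309200000"},
]

selected = filter_by_id(sample_results, "RT6_202309")
selected_regex = filter_by_id(sample_results, r"RT[26]_202309\d{2}", use_regex=True)

for index, item_id in selected:
    print(index, item_id)
```

The returned indices line up with the positions in the result list, so they could be reused with matches[index].download(download_dir=download_dir) once the selection looks right.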

What's next?

Additional resources can be found in our Help Center. Should you require further assistance or wish to provide feedback, feel free to contact us through a chat session available in the bottom right corner of the page.

How to use the HDA API in Python?

How to use the Harmonized Data Access REST API with cURL?

How to use the hdar package for accessing the WEkEO HDA API in R?

WEkEO EOCanvas: Serverless Functions for Copernicus Data

Filter and download WEkEO data by file ID pattern in R
