Filter and download WEkEO data by file ID pattern in R

This article shows how to filter HDA query results to download only files whose ID contains a specific string, in R.

Written by Alexandre
Updated yesterday

Context


In some cases, you may want to further refine the results returned by your query. This can be useful when the available query parameters are not sufficient to isolate the desired files, when current subset methods do not work as expected, or simply when you prefer to apply a custom filtering approach.

The examples below show how to filter results based on a specific string in the file name or identifier, using R.

Step-by-step guide


In a nutshell, we will set up the WEkEO credentials, define the query, run the search, keep only the results whose ID contains a target string, and download the matching files.

💡WEkEO Pro Tip: if you are new to using the HDA API in R, please refer first to our introductory article to learn all the basic information about downloading data via HDA in R.

Step 1. Install and load required packages

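First, install the required R packages if you have not done so already, then load them to interact with the HDA API:

# Installation (run once, then keep commented out)
# install.packages(c("hdar", "jsonlite"))

# Load packages
library(hdar)
library(jsonlite)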

Step 2. Configure HDA access credentials

Then set up your WEkEO credentials in order to connect to the HDA API. Replace YOUR_USERNAME and YOUR_PASSWORD in the code below with your own credentials:

# Authentication
username <- "YOUR_USERNAME"
password <- "YOUR_PASSWORD"
client <- Client$new(username, password, save_credentials = TRUE)
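If you prefer not to hard-code your credentials in a script, one option is to read them from environment variables. The sketch below is only an illustration: the variable names WEKEO_USERNAME and WEKEO_PASSWORD are hypothetical, and the Client$new() call is left commented out so the snippet runs without a WEkEO account.

```r
# Read credentials from environment variables.
# WEKEO_USERNAME and WEKEO_PASSWORD are hypothetical names chosen for this
# example; define them in ~/.Renviron or in your shell before starting R.
username <- Sys.getenv("WEKEO_USERNAME", unset = NA)
password <- Sys.getenv("WEKEO_PASSWORD", unset = NA)

# TRUE only if both variables are defined and non-empty
credentials_ready <- !is.na(username) && !is.na(password) &&
  nzchar(username) && nzchar(password)

if (credentials_ready) {
  message("Credentials found in the environment.")
  # client <- hdar::Client$new(username, password, save_credentials = TRUE)
} else {
  message("Set WEKEO_USERNAME and WEKEO_PASSWORD before connecting.")
}
```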

Step 3. Create a function to filter by a target string in file IDs

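Let's define a function that keeps only the search results whose id contains at least one of the given strings. The strings are matched as literal text (not as regular expressions) and, by default, the comparison ignores case: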

# Filter search results by pattern(s) in file ID
filter_id_contains <- function(search_results, patterns, ignore_case = TRUE) {
  total_files <- length(search_results$results)
  if (total_files == 0) {
    cat("No results to filter.\n")
    return(search_results)
  }

  # Patterns are matched as literal substrings (fixed = TRUE), not regular expressions
  patterns <- as.character(patterns)
  if (ignore_case) patterns <- tolower(patterns)

  # TRUE for each result whose ID contains at least one of the patterns
  keep <- vapply(
    search_results$results,
    function(x) {
      id <- x$id
      if (is.null(id)) return(FALSE)  # skip entries without an ID
      if (ignore_case) id <- tolower(id)
      any(vapply(patterns, function(p) grepl(p, id, fixed = TRUE), logical(1)))
    },
    logical(1)
  )

  kept <- sum(keep)
  search_results$results <- search_results$results[keep]
  cat("Kept", kept, "of", total_files, "files matching pattern(s):", paste(patterns, collapse = ", "), "\n")
  return(search_results)
}

This function lets you narrow down the list of products to only those you are interested in before downloading.
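To make the matching rule more concrete, here is a small sketch that applies the same logic (lower-case both sides, then search for each pattern as a literal substring) to a plain character vector. The IDs are made up for illustration and are not actual HDA output.

```r
# Hypothetical file IDs, invented for this example only
ids <- c(
  "c_gls_LAI300-RT6_202309100000_GLOBE_OLCI_V1.1.2",
  "c_gls_LAI300-RT2_202309100000_GLOBE_OLCI_V1.1.2",
  "c_gls_LAI300-RT6_202308310000_GLOBE_OLCI_V1.1.2"
)
patterns <- c("rt6_202309", "RT2_")

# Same rule as in filter_id_contains(): ignore case, literal substring match,
# keep an ID as soon as one pattern is found in it
ids_lower <- tolower(ids)
patterns_lower <- tolower(patterns)
hits <- vapply(
  ids_lower,
  function(id) {
    any(vapply(patterns_lower, function(p) grepl(p, id, fixed = TRUE), logical(1)))
  },
  logical(1),
  USE.NAMES = FALSE
)
print(ids[hits])

# With fixed = TRUE, "." is a plain dot rather than a regex wildcard
literal_dot <- grepl("V1.1", "V1x1", fixed = TRUE)  # FALSE: no literal "V1.1"
regex_dot <- grepl("V1.1", "V1x1")                  # TRUE: "." matches "x"
```

Literal matching is a reasonable default here because file IDs often contain dots and other characters that have a special meaning in regular expressions.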

Step 4. Define the search query

You can now define the JSON query of your choice (see how to get the query):

# Define query
query_template <- '{
  "dataset_id": "EO:CLMS:DAT:CLMS_GLOBAL_LAI_300M_V1_10DAILY_NETCDF",
  "productType": "LAI300",
  "resolution": "300",
  "bbox": [
    24.8461,
    60.1964,
    24.9771,
    60.2175
  ],
  "startdate": "2023-09-01T00:00:00.000Z",
  "enddate": "2023-09-30T23:59:59.999Z",
  "itemsPerPage": 200,
  "startIndex": 0
}'
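If you would rather build the query from R variables (for example to change the dates in a loop), a possible approach is to write it as an R list and convert it with jsonlite. This is a sketch only: start_date, end_date and query_list are names introduced for this example, and the resulting string is meant to be used in place of query_template.

```r
library(jsonlite)

# Values that you may want to change programmatically (example names)
start_date <- "2023-09-01"
end_date <- "2023-09-30"

# Same query as above, written as an R list
query_list <- list(
  dataset_id = "EO:CLMS:DAT:CLMS_GLOBAL_LAI_300M_V1_10DAILY_NETCDF",
  productType = "LAI300",
  resolution = "300",
  bbox = c(24.8461, 60.1964, 24.9771, 60.2175),
  startdate = paste0(start_date, "T00:00:00.000Z"),
  enddate = paste0(end_date, "T23:59:59.999Z"),
  itemsPerPage = 200,
  startIndex = 0
)

# auto_unbox = TRUE writes length-1 values as scalars; bbox stays an array.
# digits = NA keeps full numeric precision for the coordinates.
query_json <- as.character(
  toJSON(query_list, auto_unbox = TRUE, digits = NA, pretty = TRUE)
)
cat(query_json, "\n")

# Parse the string back as a quick sanity check
parsed <- fromJSON(query_json)
```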

Step 5. Run the search

Run the query to retrieve the list of matching files:

matches <- client$search(query_template)
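Before choosing a pattern, it can help to look at the IDs that the search returned. The sketch below uses a mock object (matches_mock, with made-up IDs) that mimics the results list used in this article, so that it runs without a connection; with a real search you would read the IDs from matches$results in the same way.

```r
# Stand-in for the object returned by client$search(); the IDs are invented
matches_mock <- list(results = list(
  list(id = "c_gls_LAI300-RT0_202309100000_GLOBE_OLCI_V1.1.2"),
  list(id = "c_gls_LAI300-RT6_202309100000_GLOBE_OLCI_V1.1.2"),
  list(id = "c_gls_LAI300-RT6_202309200000_GLOBE_OLCI_V1.1.2")
))

# Collect every ID into a character vector and print it
ids <- vapply(matches_mock$results, function(x) x$id, character(1))
print(ids)

# Count how many IDs carry each "RTn" tag, to help pick a pattern
rt_tags <- regmatches(ids, regexpr("RT[0-9]", ids))
rt_counts <- table(rt_tags)
print(rt_counts)
```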

Step 6. Filter and download matching files

Define the following parameters to match your needs:

  • output_directory: folder where the matching files will be saved

  • pattern: string or character vector of strings to look for in the file IDs (a file is kept if its ID contains at least one of them)

Finally, run the following code to download only files whose IDs contain the target string:

# Parameters
output_directory <- "Download"
pattern <- "RT6_202309" # Example: filter by this substring in the file ID

# Filter by pattern and download
matches_filtered <- filter_id_contains(matches, pattern, ignore_case = TRUE)
matches_filtered$download(output_directory, prompt = FALSE)
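If you also want to skip files that are already present in the download folder, you could filter the results one more time before downloading. The sketch below is a hedged illustration: drop_existing is a hypothetical helper, and it assumes that each downloaded file name starts with the file ID, which you should check against your own downloads. It works on a plain list of results and uses a temporary folder with made-up IDs.

```r
# Hypothetical helper: drop results that already have a file in the folder.
# ASSUMPTION: the name of each downloaded file starts with the file ID.
drop_existing <- function(results, output_directory) {
  if (!dir.exists(output_directory)) return(results)
  existing_files <- list.files(output_directory)
  already_there <- vapply(
    results,
    function(x) any(startsWith(existing_files, x$id)),
    logical(1)
  )
  results[!already_there]
}

# Demo with a temporary folder and invented IDs
demo_dir <- tempfile("wekeo_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "PRODUCT_A.nc"))

demo_results <- list(list(id = "PRODUCT_A"), list(id = "PRODUCT_B"))
remaining <- drop_existing(demo_results, demo_dir)
remaining_ids <- vapply(remaining, function(x) x$id, character(1))
print(remaining_ids)
```

With real search results, the same helper could be applied to matches_filtered$results before calling the download, provided the file-naming assumption holds.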

And voilàààà!

All matching files are now downloaded directly to the folder you specified! 🙌

What's next?


Additional resources can be found in our Help Center. Should you require further assistance or wish to provide feedback, feel free to contact us through a chat session available in the bottom right corner of the page.
