ESGFClient

Overview

ESGFClient is a command-line tool written in C# that downloads products from the "ESGF Data Search". It takes as input either the location of an RDF file or a list of individual product URLs.

Installation

Install ESGFClient with:

yum install esgf-tools

and test the installation with

ESGFClient --help

How it works

ESGFClient is designed to search for and access data from the Earth System Grid Federation (ESGF).

The first step is to query the GEOWOW Terradue ESGF Catalogue at http://geowow.terradue.com/catalogue/esgf/rdf .
The query string is passed directly to ESGFClient through the --uri parameter.

Example:

http://geowow.terradue.com/catalogue/esgf/thetao/rdf?time_frequency=mon&experiment=rcp85&ensemble=r1i1p1&institute=MRI&count=1
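A query URL of this shape can be assembled from its parts. The following is a minimal sketch, assuming the variable name (here thetao) is a path segment and the filters are ordinary query parameters, as the example suggests; the helper name build_query is hypothetical and is not part of ESGFClient.

```python
from urllib.parse import urlencode

# Base endpoint taken from the example above.
CATALOGUE = "http://geowow.terradue.com/catalogue/esgf"


def build_query(variable, **filters):
    """Return the catalogue RDF query URL for one variable and a set of filters.

    Hypothetical helper: the path layout is inferred from the example URL only.
    """
    return f"{CATALOGUE}/{variable}/rdf?{urlencode(filters)}"


url = build_query(
    "thetao",
    time_frequency="mon",
    experiment="rcp85",
    ensemble="r1i1p1",
    institute="MRI",
    count=1,
)
print(url)
```

The resulting string is what would be handed to the --uri option.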

As you can see by opening the link in a web browser, this query returns RDF metadata about the subset of data selected by the given parameters (time_frequency, experiment, ensemble, institute and so on).
ESGFClient parses the response and retrieves the list of OPeNDAP online resources.

Example:

http://dapp2p.cccma.ec.gc.ca/thredds/dodsC/cmip5.output1.CCCma.CanESM2.rcp85.mon.ocean.Omon.r1i1p1.thetao.20120407.aggregation
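The extraction step can be illustrated as follows. This is only a sketch: the RDF schema returned by the catalogue is not described on this page, so the code assumes that OPeNDAP endpoints can be recognised by the THREDDS "/dodsC/" path segment seen in the example above. The sample document and the function name are hypothetical.

```python
import re

# Assumption: an OPeNDAP resource is any URL containing the "/dodsC/" segment.
OPENDAP_URL = re.compile(r"https?://[^\s\"'<>]+/dodsC/[^\s\"'<>]+")


def extract_opendap_urls(rdf_text):
    """Return the distinct OPeNDAP URLs found in an RDF document, in order."""
    found = []
    for url in OPENDAP_URL.findall(rdf_text):
        if url not in found:
            found.append(url)
    return found


# Hypothetical RDF fragment, for illustration only.
sample = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description>
    <link href="http://dapp2p.cccma.ec.gc.ca/thredds/dodsC/cmip5.output1.CCCma.CanESM2.rcp85.mon.ocean.Omon.r1i1p1.thetao.20120407.aggregation"/>
    <link href="http://dapp2p.cccma.ec.gc.ca/thredds/fileServer/some_file.nc"/>
  </rdf:Description>
</rdf:RDF>
"""

opendap_urls = extract_opendap_urls(sample)
print(opendap_urls)
```

A schema-aware XML parser would be more robust than a regular expression once the actual element names are known.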

Starting from this list, ESGFClient builds a new list of query strings by adding the remaining options (such as --dtstart, --zmax, etc.).
Since OPeNDAP expects array indexes rather than absolute values, ESGFClient converts the option values to indexes.
Finally, the list is injected into a revised version of the wget script provided by the ESGF Portal, and the script is launched.
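To make the idea of an index-based query string concrete, here is a minimal sketch of how an OPeNDAP (DAP2) constraint expression can be appended to a dataset URL. The [start:stride:stop] notation is standard DAP2 syntax with inclusive bounds; the function names, the dimension order and the example host are assumptions for illustration and do not reflect the client's internal code.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range; start and stop are both inclusive."""
    return f"[{start}:{stride}:{stop}]"


def build_request(base_url, variable, ranges, suffix="dods"):
    """Append a DAP2 constraint expression to an OPeNDAP dataset URL.

    ranges holds one (start, stop) index pair per dimension, in the order
    the variable declares its dimensions (assumed here: time, level, lat, lon).
    """
    constraint = variable + "".join(hyperslab(a, b) for a, b in ranges)
    return f"{base_url}.{suffix}?{constraint}"


# Hypothetical dataset URL and index ranges.
request = build_request(
    "http://example.org/thredds/dodsC/dataset",
    "thetao",
    [(0, 35), (18, 21), (0, 191), (0, 255)],
)
print(request)
```

Each request URL built this way selects only the requested slab of the variable, which is what keeps the download small.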

Usage

These are the options given:

Options:
  -u, --uri=VALUE            RDF's URI to parse and download
  -o, --output=VALUE         Output folder
  -r, --resource=VALUE       Resource to download
  -O, --openid=VALUE         OpenId used to access.
  -p, --password=VALUE       Password for OpenId used to access.
  -s, --dtstart=VALUE        The beginning of the time query  to restrict
                               the subset of data to download. YYYY-MM-
                               DDTHH:mm:ssZ.
  -e, --dtend=VALUE          The end of the time query  to restrict the
                               subset of data to download. YYYY-MM-DDTHH:mm:ssZ.
  --zM, --zmax=VALUE         Maximum level (z)
  --zm, --zmin=VALUE         Minimum level (z). By default it's equal to 0
  -h, --help                 Show this message and exit 

Download from rdf url

The Client allows the user to download only the OPeNDAP online resources.
It tries to retrieve from the RDF file all the parameters needed for a query: a time start and a time stop (for temporal queries), the level range (z dimension if it's provided by the variable) and the variable to query.

Example:

ESGFClient -u "http://geowow.terradue.com/catalogue/esgf/thetao/rdf?time_frequency=mon&experiment=rcp85&ensemble=r1i1p1&institute=MRI&count=1" -O "https://pcmdi9.llnl.gov/esgf-idp/openid/user" -p password -s 2006-01-17 -e 2008-12-16 --zmin 500 --zmax 700 -o "./tmp/"

To build a correct OPeNDAP query, the time and level search parameters are converted to the sets of indexes the server needs for a spatial and temporal query.
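The conversion from absolute values to indexes can be sketched as a search over the sorted coordinate axes. The example below assumes the level and time axes have already been read from the server, that time is stored as days since a reference date, and that a Gregorian calendar applies; climate models often use other calendars (for example noleap or 360_day), which this sketch does not handle. All names and axis values are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def index_range(axis, low, high):
    """Return (first, last) indexes of the sorted axis values inside [low, high].

    Returns None when no axis value falls in the interval.
    """
    first = bisect_left(axis, low)
    last = bisect_right(axis, high) - 1
    return (first, last) if first <= last else None


def days_since(reference, moment):
    """Express a datetime as fractional days since a reference datetime."""
    return (moment - reference).total_seconds() / 86400.0


# Hypothetical axes: depth levels in metres, monthly time stamps in days.
levels = [5.0, 50.0, 200.0, 500.0, 600.0, 700.0, 1000.0]
reference = datetime(2006, 1, 1)
times = [15.5, 45.0, 74.5, 105.0, 135.5]

z_range = index_range(levels, 500, 700)
t_range = index_range(
    times,
    days_since(reference, datetime(2006, 2, 1)),
    days_since(reference, datetime(2006, 4, 30)),
)
print(z_range, t_range)
```

The index pairs returned here are the values that go into the bracketed ranges of the OPeNDAP request.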

The download relies on a revised wget script built from a template called wgetTemplate.sh, which is filled with the URLs to download.
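Filling a script template with a URL list might look like the sketch below. The contents of wgetTemplate.sh are not shown on this page, so the template text and the $download_list placeholder are purely hypothetical stand-ins; only the general technique (substituting a generated list into a fixed script) is illustrated.

```python
from string import Template

# Hypothetical stand-in for wgetTemplate.sh. "$$" is an escaped literal "$",
# so shell variables survive the substitution.
WGET_TEMPLATE = Template("""#!/bin/bash
urls=(
$download_list
)
for url in "$${urls[@]}"; do
    wget "$$url"
done
""")


def render_script(urls):
    """Fill the template with one single-quoted URL per line."""
    lines = "\n".join(f"'{u}'" for u in urls)
    return WGET_TEMPLATE.substitute(download_list=lines)


script = render_script([
    "http://example.org/thredds/dodsC/dataset.dods?thetao[0:1:35][18:1:21]",
])
print(script)
```

Single-quoting each URL keeps the shell from interpreting the square brackets and the question mark of the constraint expression.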

The client then runs the wget script, which saves the files to the file system.

All these files are protected by OpenID, so the user has to supply credentials with the --openid and --password options. The security certificates, on the other hand, are downloaded automatically by the wget script.

After the OPeNDAP files (.dods, .das, .lev, .lat, etc.) are downloaded, an internal script converts the .dods binary file to a NetCDF file, according to the .das Dataset Attribute Structure description file. Once the NetCDF file is generated, the script deletes the downloaded temporary files.

For more details on the OPeNDAP formats, refer to http://docs.opendap.org/index.php/UserGuideOPeNDAPMessages

OPeNDAP limits

Since this version of ESGFClient relies exclusively on OPeNDAP servers, every download is subject to a size limit. Within the ESGF federation this limit is set to the default value (500 MBytes); if a query reaches it, the server returns a 403 error and the download fails. You can work within this limit by choosing suitable temporal and level slices for your sub-datasets.
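One way to choose slices is to estimate the size of a request and split the time range until each piece fits. The sketch below is a rough estimate only: it assumes 4 bytes per value (32-bit floats), reads "500 MBytes" as 500 MiB, and ignores response headers and coordinate data. The function name and the grid dimensions are hypothetical.

```python
LIMIT_BYTES = 500 * 1024 * 1024  # assumption: "500 MBytes" read as 500 MiB
BYTES_PER_VALUE = 4              # assumption: values are 32-bit floats


def split_time_range(t_first, t_last, values_per_step, limit=LIMIT_BYTES):
    """Split an inclusive time index range into chunks estimated below limit.

    values_per_step is the number of values in one time step
    (levels * latitudes * longitudes for the selected slab).
    """
    step_bytes = values_per_step * BYTES_PER_VALUE
    # Keep each chunk strictly below the limit.
    steps_per_chunk = (limit - 1) // step_bytes
    if steps_per_chunk < 1:
        raise ValueError("one time step exceeds the limit; narrow the level range")
    chunks = []
    start = t_first
    while start <= t_last:
        stop = min(start + steps_per_chunk - 1, t_last)
        chunks.append((start, stop))
        start = stop + 1
    return chunks


# Hypothetical slab: 3 levels on a 1000 x 2000 grid, 36 monthly steps.
chunks = split_time_range(0, 35, 3 * 1000 * 2000)
print(chunks)
```

Each chunk can then be requested separately, for example by running the client once per time window.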

Updated by Francesco Barchetta almost 11 years ago · 7 revisions