Essential Climate Variable extraction and plotting¶
Using data discovery (ESGF Gateway), access (CAS & ESGFClient), processing (Combiner) and visualization (UVCDAT).
Overview¶
In this example three NetCDF files are opened and the "Sea Surface Height above Geoid" (zos) variable is extracted and plotted, with the three datasets overlapped. The resulting image (PNG) files, named after the bounding box used, are saved and then published by the ciop framework. More than one bounding box can be passed to this service in order to obtain multiple PNG plot files.
The workflow consists of three jobs, one for each step: data access, combiner and zos plot.
System requirements¶
ESGFClient is needed to download the files, and UV-CDAT to process the data.
Data Access: ESGFClient¶
The input is a text file containing three URLs. These resources come from the OpenSearch interface to the ESGF data, exploiting the ESGF Gateway. At the end of the download, every NetCDF file is uploaded to HDFS with the ciop-publish utility.
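The dataaccess/run script itself is not listed on this page. The following is only a minimal, hypothetical sketch of what such a streaming executable could look like: the ESGFClient invocation, the parameter handling and the error codes below are assumptions for illustration, not the actual implementation.
#!/bin/bash
# hypothetical sketch of /application/dataaccess/run (illustrative only)
source ${ciop_job_include}

SUCCESS=0
ERR_MISSING_PARAM=10
ERR_DOWNLOAD=20

# credentials and client location come from the application.xml parameters
OPENID=`ciop-getparam openid`
[ $? != 0 ] && exit $ERR_MISSING_PARAM
OPENPASS=`ciop-getparam openpass`
[ $? != 0 ] && exit $ERR_MISSING_PARAM
ESGFCLIENT=`ciop-getparam esgfclient`
[ $? != 0 ] && exit $ERR_MISSING_PARAM

cd $TMPDIR

# every line received on the standard input (from /application/inputurls) is a NetCDF URL
while read url
do
  ciop-log "INFO" "downloading $url"
  # how OPENID/OPENPASS are passed to the client is not shown here;
  # the real ESGFClient command line may differ
  $ESGFCLIENT "$url"
  [ $? != 0 ] && exit $ERR_DOWNLOAD
done

# publish every downloaded NetCDF file to HDFS for the next node
ciop-publish `find $TMPDIR -name "*.nc" -print`

exit $SUCCESS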
Combiner¶
Since ESGFClient runs in parallel under the Hadoop framework, all the resulting HDFS locations need to be listed in a single file and combined with the bounding boxes given as a user parameter. This is the work of the combiner, and this is the content of the resulting file:
141,149,13,21;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
140,148,12,20;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
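The combiner/run script is not listed either; the following is just a minimal sketch of the logic it has to implement, under the assumption that the node receives on its standard input the HDFS locations published by the dataaccess node and reads the bounding boxes from the bboxes parameter (names and error codes below are illustrative only):
#!/bin/bash
# hypothetical sketch of /application/combiner/run (illustrative only)
source ${ciop_job_include}

ERR_MISSING_BBOXES=10

# semicolon-separated list of bounding boxes, e.g. "141,149,13,21;140,148,12,20"
BBOXES=`ciop-getparam bboxes`
[ $? != 0 ] && exit $ERR_MISSING_BBOXES

# collect the HDFS locations published by the dataaccess node
# into a single semicolon-terminated string
FILES=""
while read hdfsfile
do
  FILES="$FILES$hdfsfile;"
done

# emit one line per bounding box: <bbox>;<file1>;<file2>;...;
for bbox in `echo $BBOXES | tr ';' ' '`
do
  echo "$bbox;$FILES"
done

exit 0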
Data Processing: zosplot¶
The data processing step of the workflow is a Python script that exploits the UV-CDAT libraries. The output of this algorithm is a PNG file (one for every bounding box) showing the zos variable (Sea Surface Height above Geoid) plotted for the three institutes and overlapped.
Description of File System¶
Here is the directory tree of our Sandbox:
/application/
|__ application.xml
|__ combiner
| |__ run
|__ dataaccess
| |__ run
|__ inputparams
|__ share
| |__ uvcdat
| |__ 1.2.0
| |__ bin
| |__ .
| |__ .
| |__ python
| |__ Externals
| |__ include
| |__ lib
| |__ man
| |__ sample_data
| |__ share
|__ zosplot
|__ etc
| |__ zos_plot_params.py
|__ data
| |__ zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc
| |__ zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc
| |__ zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc
|__ run
- application.xml¶
In this XML file we keep the job templates for every job to be executed, together with a sample workflow used to test the framework.
Here we have three jobs, dataaccess, combiner and zosplot, and this is the actual application.xml:
<application id="cdat">
  <jobTemplates>
    <jobTemplate id="dataaccess">
      <streamingExecutable>/application/dataaccess/run</streamingExecutable>
      <defaultParameters>
        <parameter id="openid"/>
        <parameter id="openpass"/>
        <parameter id="esgfclient"/>
      </defaultParameters>
    </jobTemplate>
    <jobTemplate id="combiner">
      <streamingExecutable>/application/combiner/run</streamingExecutable>
      <defaultParameters>
        <parameter id="bboxes"/>
      </defaultParameters>
      <defaultJobconf>
        <property id="ciop.job.max.tasks">1</property>
      </defaultJobconf>
    </jobTemplate>
    <jobTemplate id="zosplot">
      <streamingExecutable>/application/zosplot/run</streamingExecutable>
      <defaultParameters>
        <parameter id="cdatscript"/>
      </defaultParameters>
    </jobTemplate>
  </jobTemplates>
  <workflow id="wcdat"> <!-- Sample workflow -->
    <workflowVersion>1.0</workflowVersion>
    <node id="vdataaccess"> <!-- workflow node unique id -->
      <job id="dataaccess"></job> <!-- job defined above -->
      <sources>
        <source refid="file:urls">/application/inputurls</source>
      </sources>
      <parameters> <!-- parameters of the job -->
        <parameter id="openid">https://pcmdi9.llnl.gov/esgf-idp/openid/username</parameter>
        <parameter id="openpass">password</parameter>
        <parameter id="esgfclient">ESGFClient</parameter>
      </parameters>
    </node>
    <node id="vcombiner"> <!-- workflow node unique id -->
      <job id="combiner"></job> <!-- job defined above -->
      <sources>
        <source refid="wf:node">vdataaccess</source>
      </sources>
      <parameters> <!-- parameters of the job -->
        <parameter id="bboxes">141,149,13,21;140,148,12,20</parameter>
      </parameters>
    </node>
    <node id="vzosplot"> <!-- workflow node unique id -->
      <job id="zosplot"></job> <!-- job defined above -->
      <sources>
        <source refid="wf:node">vcombiner</source>
      </sources>
      <parameters> <!-- parameters of the job -->
        <parameter id="cdatscript">zos_plot_params.py</parameter>
      </parameters>
    </node>
  </workflow>
</application>
This file acts as a kind of user interface where the work and its flow are defined. The first part is the jobTemplates section, where every job composing the workflow needs to be defined. The main tags of this metadata section are:
streamingExecutable: location of the executable job file;
defaultParameters: list of the parameter names to be passed to the "streamingExecutable" job;
Once these metadata have been filled in for every job, the actual workflow has to be defined.
A workflow is identified by a unique workflow id and consists of one or more nodes, one for each job instance. Every node is defined by a source, which can be a physical input (such as a file or a catalogue reference) if it is the first node, or the output of the previous node.
Each parameter defined in the jobTemplate section also needs to be given a valid value here.
- Example of executable script: zosplot/run¶
This is the main script where the list of instructions to be executed by the job is defined. It is a bash script; its most significant sections are listed below:
include all the ciop environment variables used in this file (e.g. $_CIOP_APPLICATION_PATH, $TMPDIR, etc.):
source ${ciop_job_include}
define the set of error/exit codes managed by the script:
SUCCESS=0
ERR_MISSING_CCCMA_PARAM=10
ERR_MISSING_BCC_PARAM=11
ERR_MISSING_MRI_PARAM=12
ERR_CDAT=13
ERR_MISSING_CDAT_SCRIPT=14
import the parameters defined in the application.xml file through the "ciop-getparam" utility:
CCCMA=`ciop-getparam cccma`
[ $? != 0 ] && exit $ERR_MISSING_CCCMA_PARAM
BCC=`ciop-getparam bcc`
[ $? != 0 ] && exit $ERR_MISSING_BCC_PARAM
MRI=`ciop-getparam mri`
[ $? != 0 ] && exit $ERR_MISSING_MRI_PARAM
CDAT_SCRIPT=`ciop-getparam cdatscript`
[ $? != 0 ] && exit $ERR_MISSING_CDAT_SCRIPT
ciop-copy is a ciop framework utility that can copy resources of many kinds (URLs, local files, etc.):
echo $CCCMA $BCC $MRI | ciop-copy -o $TMPDIR -
finally, run CDAT to execute the Python script defined by $CDAT_SCRIPT (in this example zosplot/etc/zos_plot_params.py), once for every bounding box read from the input:
while read bbox
do
cd $TMPDIR
ciop-log "INFO" "xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox"
xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox &> /tmp/cdatlog
[ $? != 0 ] && exit $ERR_CDAT
ciop-publish `find $TMPDIR -name "*.png" -print`
done
after each successful execution, the resulting PNG files are published with the framework utility "ciop-publish":
ciop-publish `find $TMPDIR -name "*.png" -print`
- zosplot/etc/zos_plot_params.py¶
The Python script that performs the actual processing.
Launch the job or the workflow¶
In order to test and debug a single job, it can be executed separately, although the previous nodes need to have been executed successfully at least once, otherwise a valid source will not be available. To launch a job, run:
ciop-simjob vzosplot
where vzosplot is the node id defined in application.xml.
During the execution a large amount of log output is displayed. The most significant line is a URL pointing to the job, like this:
http://sb-10-10-14-17.lab14.sandbox.ciop.int:50030/jobdetails.jsp?jobid=job_XXXXXXXX_XXX
where it is possible to find real-time information about the running jobs and, when needed, their error/debug logs.
When ready to launch the whole workflow, use:
ciop-simwf wcdat
In this case the output is different from the single-job execution: a real-time status of the flow is displayed, together with a URL to a web interface that graphically shows the status of the workflow and the logs of the related jobs.
The final result¶
These are the two image files (one for each given bounding box):