Revision 1 (Herve Caumont, 2013-06-20 10:50) → Revision 2/3 (Herve Caumont, 2013-10-25 17:06)
h1. Essential Climate Variable extraction and plotting

{{>toc}}

This example combines data discovery (ESGF Gateway), data access (CAS & ESGFClient), processing (Combiner) and visualization (UVCDAT).

h2. Overview

In this example three NetCDF files are opened, and the "Sea Surface Height above Geoid" (zos) variable is extracted from each of them and plotted on a single, overlapping chart. The resulting PNG images, named after the bounding box used, are saved and then published through the ciop framework. More than one bounding box can be passed to this service, producing one PNG plot per bounding box.

The workflow is made of three jobs, one for each step: data access, combiner and zos plot.

h2. System requirements

*ESGFClient* is needed to download the files, and [[lib-uvcdat|UV-CDAT]] is needed for the data processing.

h2. Data Access: ESGFClient

The input is a text file containing three URLs. These resources are found through the OpenSearch interface to the ESGF data, exposed by the *ESGFGateway*. When the download completes, every NetCDF file is uploaded to HDFS with the ciop-publish utility.

h2. Combiner

Since ESGFClient runs in parallel under the Hadoop framework, the HDFS locations of all the results must be collected into a single file and combined with the bounding boxes given as a user parameter. This is the job of the *combiner*. The resulting file contains one line per bounding box, followed by the HDFS locations of the three NetCDF files:

<pre><code class="xml">
141,149,13,21;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
140,148,12,20;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
</code></pre>

h2. Data Processing: zosplot

The data processing step of the workflow consists of a Python script that uses the [[lib-uvcdat|UV-CDAT]] libraries. Its output is one PNG file per bounding box, showing the zos variable (Sea Surface Height above Geoid) for three institutes, plotted on top of each other.

h2. Description of File System

Here is the directory tree of our Sandbox:

<pre>
/application/
|__ application.xml
|__ combiner
|   |__ run
|__ dataaccess
|   |__ run
|__ inputparams
|__ share
|   |__ uvcdat
|       |__ 1.2.0
|           |__ bin
|           |__ .
|           |__ .
|           |__ python
|           |__ Externals
|           |__ include
|           |__ lib
|           |__ man
|           |__ sample_data
|           |__ share
|__ zosplot
    |__ etc
    |   |__ zos_plot_params.py
    |__ data
    |   |__ zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc
    |   |__ zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc
    |   |__ zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc
    |__ run
</pre>

h2. - application.xml

In this XML file we keep the job templates for every job to be executed, along with some sample workflows to test the framework.
Here we have three jobs (dataaccess, combiner and zosplot), and this is the actual application.xml:

<pre><code class="xml">
<application id="cdat">
  <jobTemplates>
    <jobTemplate id="dataaccess">
      <streamingExecutable>/application/dataaccess/run</streamingExecutable>
      <defaultParameters>
        <parameter id="openid"/>
        <parameter id="openpass"/>
        <parameter id="esgfclient"/>
      </defaultParameters>
    </jobTemplate>
    <jobTemplate id="combiner">
      <streamingExecutable>/application/combiner/run</streamingExecutable>
      <defaultParameters>
        <parameter id="bboxes"/>
      </defaultParameters>
      <defaultJobconf>
        <property id="ciop.job.max.tasks">1</property>
      </defaultJobconf>
    </jobTemplate>
    <jobTemplate id="zosplot">
      <streamingExecutable>/application/zosplot/run</streamingExecutable>
      <defaultParameters>
        <parameter id="cdatscript"/>
      </defaultParameters>
    </jobTemplate>
  </jobTemplates>
  <workflow id="wcdat"> <!-- Sample workflow -->
    <workflowVersion>1.0</workflowVersion>
    <node id="vdataaccess"> <!-- workflow node unique id -->
      <job id="dataaccess"></job> <!-- job defined above -->
      <sources>
        <source refid="file:urls">/application/inputurls</source>
      </sources>
      <parameters> <!-- parameters of the job -->
        <parameter id="openid">https://pcmdi9.llnl.gov/esgf-idp/openid/username</parameter>
        <parameter id="openpass">password</parameter>
        <parameter id="esgfclient">ESGFClient</parameter>
      </parameters>
    </node>
    <node id="vcombiner"> <!-- workflow node unique id -->
      <job id="combiner"></job> <!-- job defined above -->
      <sources>
        <source refid="wf:node">vdataaccess</source>
      </sources>
      <parameters> <!-- parameters of the job -->
        <parameter id="bboxes">141,149,13,21;140,148,12,20</parameter>
      </parameters>
    </node>
    <node id="vzosplot"> <!-- workflow node unique id -->
      <job id="zosplot"></job> <!-- job defined above -->
      <sources>
        <source refid="wf:node">vcombiner</source>
      </sources>
      <parameters> <!-- parameters of the job -->
        <parameter id="cdatscript">zos_plot_params.py</parameter>
      </parameters>
    </node>
  </workflow>
</application>
</code></pre>

This file acts as a kind of user interface in which the work and its flow are defined.

The first part is the jobTemplates section, where every job composing the workflow must be defined. The main tags of this section are:

* *streamingExecutable*: location of the job's executable file;
* *defaultParameters*: list of the names of the parameters passed to the "streamingExecutable" job.

Once these metadata have been filled in for every job, an actual workflow must be defined. A workflow is identified by a unique workflow id and consists of one or more *node* elements, one for each job instance. Every node has a *source*, which is either a physical input (such as a file or a catalogue reference) for the first node, or the output of the previous node. In addition, each parameter declared in the jobTemplates section must be given a valid value here.

h2. - Example of executable script: zosplot/run

This is the main script, containing the list of instructions executed by the job. It is a bash script; its most significant sections are listed below.

Include the ciop environment used in this file (e.g. $_CIOP_APPLICATION_PATH, $TMPDIR, etc.):

<pre><code class="bash">
source ${ciop_job_include}
</code></pre>

Define the set of error/exit codes handled below:

<pre><code class="bash">
SUCCESS=0
ERR_MISSING_CCCMA_PARAM=10
ERR_MISSING_BCC_PARAM=11
ERR_MISSING_MRI_PARAM=12
ERR_CDAT=13
# exit code used when the cdatscript parameter is missing
ERR_MISSING_CDAT_SCRIPT=14
</code></pre>

Import the parameters defined in the application.xml file through the "ciop-getparam" call:

<pre><code class="bash">
CCCMA=$(ciop-getparam cccma)
[ $? -ne 0 ] && exit $ERR_MISSING_CCCMA_PARAM
BCC=$(ciop-getparam bcc)
[ $? -ne 0 ] && exit $ERR_MISSING_BCC_PARAM
MRI=$(ciop-getparam mri)
[ $? -ne 0 ] && exit $ERR_MISSING_MRI_PARAM
CDAT_SCRIPT=$(ciop-getparam cdatscript)
[ $? -ne 0 ] && exit $ERR_MISSING_CDAT_SCRIPT
</code></pre>

ciop-copy is a ciop framework utility that can copy data from many kinds of sources (URLs, local files, etc.).
<pre><code class="bash"> echo $CCCMA $BCC $MRI | ciop-copy -o $TMPDIR - </code></pre> finally run cdat to compile and execute the python script defined by $CDAT_SCRIPT (in this example is zosplot/etc/zos_plot_params.py). <pre><code class="bash"> while read bbox do cd $TMPDIR ciop-log "INFO" "xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox" xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox &> /tmp/cdatlog [ $? != 0 ] && exit $ERR_CDAT ciop-publish `find $TMPDIR -name "*.png" -print` done </code></pre> after the execution, in the case of processing successfully concluded, the result file is published by the framework utility "ciop-publish" . <pre><code class="bash"> ciop-publish `find $TMPDIR -name "*.png" -print` </code></pre> h2. - zosplot/etc/zos_plot_params.py the python script which makes the real job. h2. Launch the job or the workflow I order to test and debug every single job, this can be executed separately, although the previous nodes needs to be successfully executed at lest once, or a valid *source* will not be available. To launch a job run: <pre><code class="bash"> ciop-simjob vzosplot </code></pre> where vzplot comes from application.xml, as node id. In this example during the execution, a lot of log is displayed. The most significant line is represented by a URL to the job like this: http://sb-10-10-14-17.lab14.sandbox.ciop.int:50030/jobdetails.jsp?jobid=job_XXXXXXXX_XXX where it is possible to find real time information about the running jobs and eventually their error/debug log. When the user feels ready to launch the whole workflow, he can use: <pre><code class="bash"> ciop-simwf wcdat </code></pre> In this case the output will appear different from the single job execution. 
A real-time status of the workflow is displayed, along with the URL of a web interface that graphically shows the workflow status and the related job logs.

h2. The final result

These are the two image files (one for each given bounding box):

!{width: 60%}zos_rcp45_r1i1p1_bcc_cccma_mri_141.0_149.0_13.0_21.0.png!

!{width: 60%}zos_rcp45_r1i1p1_bcc_cccma_mri_140.0_148.0_12.0_20.0.png!
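Since each PNG is named after its bounding box, the published results can be checked against the "bboxes" parameter of the workflow. The following is a minimal bash sketch. It assumes that the naming pattern of the two files above (a fixed zos_rcp45_r1i1p1_bcc_cccma_mri_ prefix followed by the four bounding-box values with a ".0" suffix) holds for any integer bounding box. The functions expected_png and check_results are hypothetical helpers and are not part of the ciop framework.

<pre><code class="bash">
#!/usr/bin/env bash
# Hypothetical helper (not a ciop utility): print the PNG name expected for one
# bounding box, ASSUMING the pattern of the result files shown above, i.e.
# zos_rcp45_r1i1p1_bcc_cccma_mri_<v1>.0_<v2>.0_<v3>.0_<v4>.0.png,
# and integer values as in the "bboxes" parameter.
expected_png() {
  local bbox="$1"
  # "141,149,13,21" -> "141.0_149.0_13.0_21" (the last ".0" is added by printf)
  printf 'zos_rcp45_r1i1p1_bcc_cccma_mri_%s.0.png\n' "${bbox//,/.0_}"
}

# Hypothetical check: given the ";"-separated bboxes parameter and a directory,
# report each expected PNG as found or missing; returns 1 if any is missing.
check_results() {
  local bboxes="$1" dir="$2" bbox png missing=0
  # turn "a;b" into "a b" and let word splitting iterate over the boxes
  for bbox in ${bboxes//;/ }; do
    png=$(expected_png "$bbox")
    if [ -f "$dir/$png" ]; then
      echo "found:   $png"
    else
      echo "missing: $png"
      missing=1
    fi
  done
  return "$missing"
}
</code></pre>

For example, check_results "141,149,13,21;140,148,12,20" /path/to/results lists the two file names shown above and returns a non-zero status if either of them is absent from the given directory.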