App-ecv-plot » History » Version 1

Herve Caumont, 2013-06-20 10:50

h1. Essential Climate Variable extraction and plotting

Using data discovery (ESGF Gateway), access (CAS & ESGFClient), processing (Combiner) and visualization (UV-CDAT).

h2. Overview

In this example, three NetCDF files are opened, and the "Sea Surface Height above Geoid" (zos) variable is extracted from each of them and plotted as an overlay. The resulting image (png) files, named after the bounding box used, are saved and then published through the ciop framework. More than one bounding box can be passed to this service in order to produce multiple png plot files.
The workflow has three jobs, one for each step: data access, combiner and zos plot.

h2. System requirements

*ESGFClient* is needed to download the files, and [[lib-uvcdat|UV-CDAT]] is needed for the data processing.

h2. Data Access: ESGFClient

The input is a text file containing three URLs. These resources are obtained through the OpenSearch interface to the ESGF data, exposed by the *ESGFGateway*. Once the download completes, every NetCDF file is uploaded to hdfs with the ciop-publish utility.

h2. Combiner

Because the Hadoop framework runs the ESGFClient tasks in parallel, the hdfs locations of all the results need to be listed in a single file and combined with the bounding boxes supplied as a user parameter. This is the work of the *combiner*, and this is the content of the result file:

<pre><code class="xml">
141,149,13,21;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
140,148,12,20;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
</code></pre>

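The record layout above (one line per bounding box, followed by every hdfs location, each field terminated by a semicolon) can be sketched in Python as follows. This is only an illustration of the format, not the actual combiner/run script, which is not shown on this page. The function name @combine@ and the choice to sort the locations are assumptions made for the sketch.

```python
def combine(bboxes_param, hdfs_locations):
    """Build one record per bounding box: 'bbox;location;location;...;'.

    Hypothetical helper: it mirrors the layout of the combiner result
    file, not the real combiner implementation.
    """
    # The bboxes parameter separates bounding boxes with ';'.
    bboxes = [b.strip() for b in bboxes_param.split(";") if b.strip()]
    # Assumption: sort the locations so that the output does not depend
    # on the order in which the parallel download tasks finished.
    locations = sorted(loc.strip() for loc in hdfs_locations if loc.strip())
    # Every field, including the last one, is terminated by ';'.
    return [";".join([bbox] + locations) + ";" for bbox in bboxes]


if __name__ == "__main__":
    records = combine(
        "141,149,13,21;140,148,12,20",
        ["hdfs:///tmp/example/b.nc", "hdfs:///tmp/example/a.nc"],
    )
    for record in records:
        print(record)
```

Each bounding box is paired with the full list of files, so the plotting job receives everything it needs for one plot on a single line.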
h2. Data Processing: zosplot

The data processing step of the workflow consists of a python script that uses the [[lib-uvcdat|UV-CDAT]] libraries. The output of this algorithm is a png file (one for every bounding box) showing the variable zos (Sea Surface Height above Geoid) from the three institutes, overlaid on a single plot.

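As a rough sketch of the first thing such a script has to do, the snippet below splits one combiner record into its bounding box and file list, and derives an output file name in the style of the result images shown at the end of this page. The helper names @parse_record@ and @plot_name@ and the default prefix are assumptions; the naming logic of the real zos_plot_params.py is not documented here, and no UV-CDAT call is made.

```python
def parse_record(line):
    """Split 'n,n,n,n;path;path;...;' into (bbox, paths).

    Hypothetical helper illustrating the combiner record format.
    """
    # Drop the empty field produced by the trailing ';'.
    fields = [f for f in line.strip().split(";") if f]
    if len(fields) < 2:
        raise ValueError("expected a bounding box followed by at least one file")
    # float() raises ValueError on non-numeric coordinates.
    bbox = tuple(float(v) for v in fields[0].split(","))
    if len(bbox) != 4:
        raise ValueError("a bounding box needs four comma-separated numbers")
    return bbox, fields[1:]


def plot_name(bbox, prefix="zos_rcp45_r1i1p1_bcc_cccma_mri"):
    """Name the png after the bounding box (assumed naming scheme)."""
    return prefix + "_" + "_".join(str(v) for v in bbox) + ".png"


if __name__ == "__main__":
    bbox, paths = parse_record(
        "141,149,13,21;hdfs:///tmp/example/a.nc;hdfs:///tmp/example/b.nc;"
    )
    print(bbox, paths, plot_name(bbox))
```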
h2. Description of File System

Here is the directory tree of our Sandbox:

 /application/
 |__ application.xml
 |__ combiner
 |   |__ run
 |__ dataaccess
 |   |__ run
 |__ inputparams
 |__ share
 |   |__ uvcdat
 |       |__ 1.2.0
 |       |__ bin
 |       |   |__ .
 |       |   |__ .
 |       |   |__ python
 |       |__ Externals
 |       |__ include
 |       |__ lib
 |       |__ man
 |       |__ sample_data
 |       |__ share
 |__ zosplot
     |__ etc
     |   |__ zos_plot_params.py
     |__ data
     |   |__ zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc
     |   |__ zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc
     |   |__ zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc
     |__ run

h2. - application.xml

This xml file holds the job template of every job to be executed, together with some sample workflows used to test the framework.
Here we have three jobs (dataaccess, combiner and zosplot), and this is the actual application.xml:

<pre><code class="xml">
<application id="cdat">
        <jobTemplates>
                <jobTemplate id="dataaccess">
                        <streamingExecutable>/application/dataaccess/run</streamingExecutable>
                        <defaultParameters>
                                <parameter id="openid"/>
                                <parameter id="openpass"/>
                                <parameter id="esgfclient"/>
                        </defaultParameters>
                </jobTemplate>
                <jobTemplate id="combiner">
                        <streamingExecutable>/application/combiner/run</streamingExecutable>
                        <defaultParameters>
                                <parameter id="bboxes"/>
                        </defaultParameters>
                        <defaultJobconf>
                                <property id="ciop.job.max.tasks">1</property>
                        </defaultJobconf>
                </jobTemplate>
                <jobTemplate id="zosplot">
                        <streamingExecutable>/application/zosplot/run</streamingExecutable>
                        <defaultParameters>
                                <parameter id="cdatscript"/>
                        </defaultParameters>
                </jobTemplate>
        </jobTemplates>
        <workflow id="wcdat">                   <!-- Sample workflow -->
                <workflowVersion>1.0</workflowVersion>
                <node id="vdataaccess">         <!-- workflow node unique id -->
                        <job id="dataaccess"></job>     <!-- job defined above -->
                        <sources>
                                <source refid="file:urls" >/application/inputurls</source>
                        </sources>
                        <parameters>            <!-- parameters of the job -->
                                <parameter id="openid">https://pcmdi9.llnl.gov/esgf-idp/openid/username</parameter>
                                <parameter id="openpass">password</parameter>
                                <parameter id="esgfclient">ESGFClient</parameter>
                        </parameters>
                </node>
                <node id="vcombiner">           <!-- workflow node unique id -->
                        <job id="combiner"></job>       <!-- job defined above -->
                        <sources>
                                <source refid="wf:node" >vdataaccess</source>
                        </sources>
                        <parameters>            <!-- parameters of the job -->
                                <parameter id="bboxes">141,149,13,21;140,148,12,20</parameter>
                        </parameters>
                </node>
                <node id="vzosplot">            <!-- workflow node unique id -->
                        <job id="zosplot"></job>        <!-- job defined above -->
                        <sources>
                                <source refid="wf:node" >vcombiner</source>
                        </sources>
                        <parameters>            <!-- parameters of the job -->
                                <parameter id="cdatscript">zos_plot_params.py</parameter>
                        </parameters>
                </node>
        </workflow>
</application>
</code></pre>

This file acts as a kind of user interface in which the work and its flow are defined. The first part is the jobTemplates section, where every job composing the workflow must be defined. The main tags of this section are:

*streamingExecutable*: location of the executable job file;
*defaultParameters*: list of the parameter names to be passed to the "streamingExecutable" job.

Once these metadata have been filled in for every job, an actual workflow has to be defined.
A workflow is identified by a unique workflow id and consists of one or more *node* elements, one for each job instance. Every node has a *source*, which can be a physical input (such as a file or a catalogue reference) if it is the first node, or the output of the previous node.
Each parameter declared in the jobTemplates section must also be given a valid value here.

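The rules above can be checked mechanically before a workflow is launched. The following Python sketch is not part of the ciop framework; it is a hypothetical helper (@check_application@) that uses only the standard library and assumes the element and attribute names seen in the application.xml above. It reports nodes that reference an undefined job, parameters left without a value, and @wf:node@ sources that point to a node not defined earlier in the workflow.

```python
import xml.etree.ElementTree as ET


def check_application(xml_text):
    """Return a list of problems found in an application.xml document."""
    root = ET.fromstring(xml_text)
    # Map each jobTemplate id to the set of parameter ids it declares.
    templates = {
        t.get("id"): {p.get("id") for p in t.findall("defaultParameters/parameter")}
        for t in root.findall("jobTemplates/jobTemplate")
    }
    problems = []
    for workflow in root.findall("workflow"):
        seen = set()  # ids of the nodes defined earlier in this workflow
        for node in workflow.findall("node"):
            node_id = node.get("id")
            job = node.find("job")
            job_id = job.get("id") if job is not None else None
            if job_id not in templates:
                problems.append(f"{node_id}: unknown job '{job_id}'")
                seen.add(node_id)
                continue
            # Parameters that carry a non-empty value in this node.
            filled = {
                p.get("id")
                for p in node.findall("parameters/parameter")
                if (p.text or "").strip()
            }
            for missing in sorted(templates[job_id] - filled):
                problems.append(f"{node_id}: parameter '{missing}' has no value")
            # A wf:node source must name a node that appears before this one.
            for source in node.findall("sources/source"):
                target = (source.text or "").strip()
                if source.get("refid") == "wf:node" and target not in seen:
                    problems.append(
                        f"{node_id}: source node '{target}' is not defined earlier"
                    )
            seen.add(node_id)
    return problems


if __name__ == "__main__":
    with open("/application/application.xml") as handle:
        for problem in check_application(handle.read()):
            print(problem)
```

An empty list means the file is consistent with the three rules checked here; it does not guarantee that the framework itself will accept the file.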
h2. - Example of executable script: zosplot/run

This is the main script, which defines the list of instructions to be executed in the job. It is a bash script; its most significant sections are listed below.

First, include the whole ciop environment used in this file (e.g. $_CIOP_APPLICATION_PATH, $TMPDIR, etc.):

<pre><code class="bash">
source ${ciop_job_include}
</code></pre>

Define the set of error/exit codes to be handled below:

<pre><code class="bash">
SUCCESS=0
ERR_MISSING_CCCMA_PARAM=10
ERR_MISSING_BCC_PARAM=11
ERR_MISSING_MRI_PARAM=12
ERR_CDAT=13
# Needed by the cdatscript check below; the value 14 is an assumption,
# any unused non-zero code would do.
ERR_MISSING_CDAT_SCRIPT=14
</code></pre>

Import the parameters defined in the application.xml file through the "ciop-getparam" call:

<pre><code class="bash">
CCCMA=`ciop-getparam cccma`
[ $? != 0 ] && exit $ERR_MISSING_CCCMA_PARAM

BCC=`ciop-getparam bcc`
[ $? != 0 ] && exit $ERR_MISSING_BCC_PARAM

MRI=`ciop-getparam mri`
[ $? != 0 ] && exit $ERR_MISSING_MRI_PARAM

CDAT_SCRIPT=`ciop-getparam cdatscript`
[ $? != 0 ] && exit $ERR_MISSING_CDAT_SCRIPT
</code></pre>

ciop-copy is a utility from the ciop framework that can copy many kinds of resources (URLs, local files, etc.):

<pre><code class="bash">
echo $CCCMA $BCC $MRI | ciop-copy -o $TMPDIR -
</code></pre>

Finally, run cdat to execute the python script defined by $CDAT_SCRIPT (in this example, zosplot/etc/zos_plot_params.py), once for each bounding box read from the input:

<pre><code class="bash">
while read bbox
do
        cd $TMPDIR
        ciop-log "INFO" "xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox"
        xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox &> /tmp/cdatlog
        [ $? != 0 ] && exit $ERR_CDAT
        ciop-publish `find $TMPDIR -name "*.png" -print`

done
</code></pre>

After the execution, if the processing concluded successfully, the result file is published by the framework utility "ciop-publish":

<pre><code class="bash">
 ciop-publish `find $TMPDIR -name "*.png" -print`
</code></pre>

h2. - zosplot/etc/zos_plot_params.py

The python script that does the actual work.

h2. Launch the job or the workflow

In order to test and debug the jobs, each one can be executed separately. The preceding nodes must, however, have been executed successfully at least once; otherwise no valid *source* will be available. To launch a job, run:

<pre><code class="bash">
ciop-simjob vzosplot
</code></pre>

where vzosplot is the node id defined in application.xml.
During the execution of this example, a large amount of log output is displayed. The most significant line is the URL of the job, which looks like this:

http://sb-10-10-14-17.lab14.sandbox.ciop.int:50030/jobdetails.jsp?jobid=job_XXXXXXXX_XXX

This page gives real-time information about the running jobs and, if any, their error/debug logs.

When ready to launch the whole workflow, use:

<pre><code class="bash">
ciop-simwf wcdat
</code></pre>

In this case the output differs from that of a single job execution: a real-time status of the flow is displayed, together with a URL to a web interface that graphically shows the status of the workflow and the logs of its jobs.

h2. The final result

These are the two image files (one for each given bounding box):

!{width: 60%}zos_rcp45_r1i1p1_bcc_cccma_mri_141.0_149.0_13.0_21.0.png!

!{width: 60%}zos_rcp45_r1i1p1_bcc_cccma_mri_140.0_148.0_12.0_20.0.png!