App-ecv-plot » History » Version 2

Herve Caumont, 2013-10-25 17:06

h1. Essential Climate Variable extraction and plotting

{{>toc}}

This example combines data discovery (ESGF Gateway), access (CAS & ESGFClient), processing (Combiner) and visualization (UVCDAT).

h2. Overview

In this example three NetCDF files are opened, and the "Sea Surface Height above Geoid" (zos) variable is extracted from each of them and plotted as an overlay. The resulting image (png) files, named after the bounding box used, are saved and then published through the ciop framework. More than one bounding box can be passed to this service in order to produce multiple png plot files.
The workflow consists of three jobs, one per step: data access, combiner and zos plot.

h2. System requirements

*ESGFClient* is needed to download the files, and [[lib-uvcdat|UV-CDAT]] is needed for the data processing.

h2. Data Access: ESGFClient

The input is a text file containing three URLs. These resources are obtained by querying the OpenSearch interface to the ESGF data through the *ESGFGateway*. At the end of each download, the NetCDF file is uploaded to HDFS with the ciop-publish utility.

h2. Combiner

Since the Hadoop framework runs the ESGFClient tasks in parallel, the HDFS locations of all the results need to be listed in one single file and combined with the bounding boxes supplied as a user parameter. This is the work of the *combiner*, and this is the content of the result file:

<pre><code class="xml">
141,149,13,21;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
140,148,12,20;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc;hdfs:///tmp/sandbox/wcdat/vdataaccess/data/zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc;
</code></pre>

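
The record layout of the combiner result can be illustrated with a minimal Python sketch. The function names @build_combiner_lines@ and @parse_combiner_line@ are hypothetical, and the layout (bounding box first, then ";"-separated HDFS locations with a trailing ";") is inferred from the sample lines only; the actual combiner script is not shown on this page and may work differently.

```python
def build_combiner_lines(bboxes, paths):
    """Pair every bounding box of a "b1;b2;..." parameter with all paths.

    Assumption: one output line per bounding box, each ending with ';',
    as in the sample result file.
    """
    joined = ";".join(paths)
    return [f"{bbox};{joined};" for bbox in bboxes.split(";") if bbox]


def parse_combiner_line(line):
    """Split one combiner record into (bbox, paths)."""
    # The trailing ';' yields an empty last field, which is dropped here.
    fields = [field for field in line.strip().split(";") if field]
    if len(fields) < 2:
        raise ValueError("expected a bounding box and at least one path")
    bbox = tuple(float(value) for value in fields[0].split(","))
    if len(bbox) != 4:
        raise ValueError("a bounding box needs four comma-separated numbers")
    return bbox, fields[1:]
```

Calling @build_combiner_lines@ with the @bboxes@ value from application.xml and the three HDFS locations would give two lines shaped like the sample, and @parse_combiner_line@ turns each of them back into four numbers plus the list of files.
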

h2. Data Processing: zosplot

The data processing step of the workflow is a Python script that uses the [[lib-uvcdat|UV-CDAT]] libraries. For every bounding box it produces one png file in which the variable zos (Sea Surface Height above Geoid) is plotted for the three institutes as an overlay.

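
As an illustration of "one png per bounding box", the sketch below derives a file name from a bounding box string. The pattern is inferred from the two result file names shown in the final section of this page; the helper name @plot_filename@ and the default prefix are assumptions, and the real script may build its names differently.

```python
def plot_filename(bbox, prefix="zos_rcp45_r1i1p1_bcc_cccma_mri"):
    """Build a png file name from a "n1,n2,n3,n4" bounding box string.

    Assumption: the four numbers are written as floats and joined with
    underscores, as in the result files shown on this page.
    """
    values = [float(value) for value in bbox.split(",")]
    if len(values) != 4:
        raise ValueError("a bounding box needs four comma-separated numbers")
    return prefix + "_" + "_".join(str(value) for value in values) + ".png"
```

With the first bounding box of the sample workflow, @plot_filename("141,149,13,21")@ gives @zos_rcp45_r1i1p1_bcc_cccma_mri_141.0_149.0_13.0_21.0.png@.
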

h2. Description of File System

Here is the directory tree of our Sandbox:

 /application/
 |__ application.xml
 |__ combiner
 |    |__ run
 |__ dataaccess
 |    |__ run
 |__ inputparams
 |__ share
 |    |__ uvcdat
 |         |__ 1.2.0
 |         |__ bin
 |         |    |__ .
 |         |    |__ .
 |         |    |__ python
 |         |__ Externals
 |         |__ include
 |         |__ lib
 |         |__ man
 |         |__ sample_data
 |         |__ share
 |__ zosplot
      |__ etc
      |    |__ zos_plot_params.py
      |__ data
      |    |__ zos_Omon_bcc-csm1-1_rcp45_r1i1p1_200601-209912.nc
      |    |__ zos_Omon_CanCM4_rcp45_r1i1p1_200601-203512.nc
      |    |__ zos_Omon_MRI-CGCM3_rcp45_r1i1p1_200601-210012.nc
      |__ run

h2. - application.xml

This XML file holds the job template for every job to be executed, plus some sample workflows used to test the framework.
Here we have three jobs (dataaccess, combiner and zosplot), and this is the actual application.xml:

<pre><code class="xml">
<application id="cdat">
        <jobTemplates>
                <jobTemplate id="dataaccess">
                        <streamingExecutable>/application/dataaccess/run</streamingExecutable>
                        <defaultParameters>
                                <parameter id="openid"/>
                                <parameter id="openpass"/>
                                <parameter id="esgfclient"/>
                        </defaultParameters>
                </jobTemplate>
                <jobTemplate id="combiner">
                        <streamingExecutable>/application/combiner/run</streamingExecutable>
                        <defaultParameters>
                                <parameter id="bboxes"/>
                        </defaultParameters>
                        <defaultJobconf>
                                <property id="ciop.job.max.tasks">1</property>
                        </defaultJobconf>
                </jobTemplate>
                <jobTemplate id="zosplot">
                        <streamingExecutable>/application/zosplot/run</streamingExecutable>
                        <defaultParameters>
                                <parameter id="cdatscript"/>
                        </defaultParameters>
                </jobTemplate>
        </jobTemplates>
        <workflow id="wcdat">  <!-- Sample workflow -->
                <workflowVersion>1.0</workflowVersion>
                <node id="vdataaccess">  <!-- workflow node unique id -->
                        <job id="dataaccess"></job>  <!-- job defined above -->
                        <sources>
                                <source refid="file:urls" >/application/inputurls</source>
                        </sources>
                        <parameters>  <!-- parameters of the job -->
                                <parameter id="openid">https://pcmdi9.llnl.gov/esgf-idp/openid/username</parameter>
                                <parameter id="openpass">password</parameter>
                                <parameter id="esgfclient">ESGFClient</parameter>
                        </parameters>
                </node>
                <node id="vcombiner">  <!-- workflow node unique id -->
                        <job id="combiner"></job>  <!-- job defined above -->
                        <sources>
                                <source refid="wf:node" >vdataaccess</source>
                        </sources>
                        <parameters>  <!-- parameters of the job -->
                                <parameter id="bboxes">141,149,13,21;140,148,12,20</parameter>
                        </parameters>
                </node>
                <node id="vzosplot">  <!-- workflow node unique id -->
                        <job id="zosplot"></job>  <!-- job defined above -->
                        <sources>
                                <source refid="wf:node" >vcombiner</source>
                        </sources>
                        <parameters>  <!-- parameters of the job -->
                                <parameter id="cdatscript">zos_plot_params.py</parameter>
                        </parameters>
                </node>
        </workflow>
</application>
</code></pre>


This file acts as a kind of user interface in which the work and its flow are defined. The first part is the jobTemplates section, where every job composing the workflow needs to be defined. The main tags of this section are:

*streamingExecutable*: location of the executable job file;
*defaultParameters*: list of parameter names to be passed to the "streamingExecutable" job;

Once the user has filled in this metadata for every job, an actual workflow has to be defined.
A workflow is identified by a unique workflow id and consists of one or more *node* elements, one for each job instance. Every node has a *source*, which is either a physical input (such as a file or a catalogue reference) if it is the first node, or the output of the previous node.
Each parameter declared in the jobTemplates section must also be given a valid value here.

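
The rules above can be checked mechanically. The following is a minimal sketch that uses only the Python standard library; the function name @check_application@ is hypothetical, and the three checks (known job id, every declared parameter filled, @wf:node@ sources pointing to an earlier node) are a reading of the description above, not a validator shipped with the ciop framework.

```python
import xml.etree.ElementTree as ET


def check_application(xml_text):
    """Return a list of problems found in an application.xml document."""
    root = ET.fromstring(xml_text)
    # Parameter ids declared by each job template.
    declared = {}
    for template in root.findall("jobTemplates/jobTemplate"):
        params = template.findall("defaultParameters/parameter")
        declared[template.get("id")] = {p.get("id") for p in params}

    problems = []
    for workflow in root.findall("workflow"):
        earlier_nodes = set()
        for node in workflow.findall("node"):
            node_id = node.get("id")
            job = node.find("job")
            job_id = job.get("id") if job is not None else None
            if job_id not in declared:
                problems.append(f"{node_id}: unknown job {job_id!r}")
            else:
                # A parameter counts as filled when it has non-blank text.
                filled = {
                    p.get("id")
                    for p in node.findall("parameters/parameter")
                    if (p.text or "").strip()
                }
                for name in sorted(declared[job_id] - filled):
                    problems.append(f"{node_id}: parameter {name!r} has no value")
            for source in node.findall("sources/source"):
                target = (source.text or "").strip()
                if source.get("refid") == "wf:node" and target not in earlier_nodes:
                    problems.append(f"{node_id}: source {target!r} is not an earlier node")
            earlier_nodes.add(node_id)
    return problems
```

Passing the text of the application.xml shown above to @check_application@ should return an empty list, since every node refers to a declared job, fills all of its parameters and takes its source either from a file or from the node before it.
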

h2. - Example of executable script: zosplot/run

This is the main script, which defines the list of instructions to be executed by the job. It is a bash script; its most significant sections are listed here.

First, include the whole ciop environment used in this file (e.g. $_CIOP_APPLICATION_PATH, $TMPDIR, etc.):

<pre><code class="bash">
source ${ciop_job_include}
</code></pre>


Define the set of error/exit codes used further down in the script:

<pre><code class="bash">
SUCCESS=0
ERR_MISSING_CCCMA_PARAM=10
ERR_MISSING_BCC_PARAM=11
ERR_MISSING_MRI_PARAM=12
ERR_CDAT=13
# Used by the cdatscript parameter check but missing from the original list;
# the value 14 is an assumption.
ERR_MISSING_CDAT_SCRIPT=14
</code></pre>


Import the parameters defined in the application.xml file through the "ciop-getparam" call, exiting with the matching error code when a call fails:

<pre><code class="bash">
CCCMA=`ciop-getparam cccma`
[ $? != 0 ] && exit $ERR_MISSING_CCCMA_PARAM

BCC=`ciop-getparam bcc`
[ $? != 0 ] && exit $ERR_MISSING_BCC_PARAM

MRI=`ciop-getparam mri`
[ $? != 0 ] && exit $ERR_MISSING_MRI_PARAM

CDAT_SCRIPT=`ciop-getparam cdatscript`
[ $? != 0 ] && exit $ERR_MISSING_CDAT_SCRIPT
</code></pre>


ciop-copy is a utility from the ciop framework that can copy many kinds of resources (URLs, local files, etc.). Here it is given the three inputs and writes to $TMPDIR:

<pre><code class="bash">
echo $CCCMA $BCC $MRI | ciop-copy -o $TMPDIR -
</code></pre>


Finally, run cdat to execute the Python script named by $CDAT_SCRIPT (in this example, zosplot/etc/zos_plot_params.py), once for every line read from standard input:

<pre><code class="bash">
while read bbox
do
        cd $TMPDIR
        ciop-log "INFO" "xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox"
        xvfb-run python $_CIOP_APPLICATION_PATH/zosplot/etc/$CDAT_SCRIPT -cccma $CCCMA -bcc $BCC -mri $MRI -bbox $bbox &> /tmp/cdatlog
        [ $? != 0 ] && exit $ERR_CDAT
        ciop-publish `find $TMPDIR -name "*.png" -print`
done
</code></pre>


After each execution, if the processing concluded successfully, the result file is published with the framework utility "ciop-publish":

<pre><code class="bash">
ciop-publish `find $TMPDIR -name "*.png" -print`
</code></pre>


h2. - zosplot/etc/zos_plot_params.py

This is the Python script that does the actual work.

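
The script itself is not listed on this page. As a rough sketch of its command-line interface only, the snippet below parses the four options that zosplot/run passes to it (@-cccma@, @-bcc@, @-mri@ and @-bbox@). The option names come from that call; the helper names, the use of @argparse@ and the assumption that @-bbox@ carries four comma-separated numbers are illustrative and may not match the real script.

```python
import argparse


def parse_bbox(text):
    """Convert "n1,n2,n3,n4" into a tuple of four floats."""
    parts = text.split(",")
    if len(parts) != 4:
        raise argparse.ArgumentTypeError("expected four comma-separated numbers")
    try:
        return tuple(float(part) for part in parts)
    except ValueError:
        raise argparse.ArgumentTypeError("bounding box values must be numeric")


def parse_args(argv=None):
    """Parse the options used in the zosplot/run call (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Plot zos for one bounding box")
    # One NetCDF input per institute, named after the options in zosplot/run.
    for name in ("cccma", "bcc", "mri"):
        parser.add_argument("-" + name, required=True, help=name + " NetCDF file")
    parser.add_argument("-bbox", required=True, type=parse_bbox,
                        help="bounding box as four comma-separated numbers")
    return parser.parse_args(argv)
```

For example, @parse_args(["-cccma", "a.nc", "-bcc", "b.nc", "-mri", "c.nc", "-bbox", "141,149,13,21"])@ returns a namespace whose @bbox@ attribute holds the four numbers as floats.
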

h2. Launch the job or the workflow

In order to test and debug a single job, it can be executed separately, although the previous nodes need to have been executed successfully at least once, otherwise no valid *source* will be available. To launch a job, run:

<pre><code class="bash">
ciop-simjob vzosplot
</code></pre>


where vzosplot is the node id taken from application.xml.
In this example a lot of log output is displayed during the execution. The most significant line is a URL to the job, like this:

http://sb-10-10-14-17.lab14.sandbox.ciop.int:50030/jobdetails.jsp?jobid=job_XXXXXXXX_XXX

At this address it is possible to find real-time information about the running jobs and, if needed, their error/debug logs.


When the user feels ready to launch the whole workflow, they can use:

<pre><code class="bash">
ciop-simwf wcdat
</code></pre>

In this case the output differs from that of a single job execution. A real-time status of the flow is displayed, together with a URL to a web interface that graphically shows the status of the workflow and the logs of the related jobs.


h2. The final result

These are the two image files (one for each given bounding box):

!{width: 60%}zos_rcp45_r1i1p1_bcc_cccma_mri_141.0_149.0_13.0_21.0.png!

!{width: 60%}zos_rcp45_r1i1p1_bcc_cccma_mri_140.0_148.0_12.0_20.0.png!