Lib-nest » History » Version 2
Herve Caumont, 2013-06-20 10:50
h1. NEST tutorial

{{>toc}}

NEST: Next ESA Synthetic Aperture Radar (SAR) Toolbox

h2. Sandbox prerequisites

*We assume that you already have a Sandbox ready and that the following items are completed:*

* *You have accessed your Sandbox as described in the Getting Started guide*

h2. 1. Application concepts and terminology

In order to ease the execution of the tutorial, it is important to understand the concept of an application and its terminology. This section describes the example application used throughout this guide. When a word is set in *+underlined bold+*, it is a terminology keyword that always designates the same concept.

h3. 1.1 The application workflow

Our example in this tutorial is an application that processes ECMWF data (TIGGE data). It is composed of 6 steps that run independently, but in a specific order, and produce results that are inputs for the remaining steps.

The following figure illustrates the *+Workflow+* of our application as a directed acyclic graph (DAG). This is also how the Developer Cloud Sandbox service handles the execution of processes in terms of parallel computing and orchestration of the processing steps.

!! _<<insert here the DAG that represents the workflow >>_

Each box represents a *+Job+*, which is a step of our application process. The arrows represent the data flow between the *+jobs+*. When a *+job+* is connected to another, the *+output+* of the first *+job+* is passed as *+input+* to the other.

It is important to keep in mind that in the Sandbox framework, *+input+* and *+output+* are text references (e.g. to data). Indeed, when a *+job+* processes its *+input+*, it actually reads the references *line by line*, as described in the next figure.

!https://ciop.eo.esa.int/attachments/40/focus-align.png!

It is therefore important to define precisely the inter-*+job+* references.

h3. 1.2 The job

Each *+job+* has a set of basic characteristics:

* a unique *+Job name+* in the workflow (e.g. 'PreProc')
* zero, one or several *+sources+* that define the *+jobs+* interdependency. In the example, the *+job+* 'Interfere' has 2 dependencies: 'AlignSlave' and 'MakeTropo'.
* a maximum number of simultaneous *+tasks+* into which it can be forked. This is further explained in section [[Sandbox Application Integration Tutorial#1.3 The processing task|1.3 The processing task]].
* a *+processing trigger+*, which is a software executable of the *+job template+* that handles the *+input+*/*+output+* streaming process. Practically, it is the executable that reads the *+input+* lines and writes the *+output+* lines.

The job characteristics above are mandatory in the *+workflow+* definition.
If they are incomplete, the Sandbox framework reports an error in the workflow.

h3. 1.3 The processing task

To exploit the parallelism offered by the Sandbox framework, a *+job+* may process its *+input+* in several *+tasks+*. In principle, the Sandbox framework will run those *+tasks+* in parallel. This is an important and sometimes complex paradigm that can be addressed in different ways.
The following questions and answers describe the parallelism paradigm of the Sandbox framework.

* Is it *+task+* parallelism or *+job+* parallelism?

> In this section, we definitely speak about task parallelism. Job parallelism is one level above.

* How is a *+job+* divided into *+tasks+*?

> It is actually the application developer who chooses the granularity of the *+job+* division. The computing framework simply divides the *+input+* flow (*k* lines) into the *n* *+tasks+*. In the example provided in this tutorial, if the *+job+* 'PreProc' produces an *+output+* of 11 lines and the computing resources divide the *+job+* 'AlignSlave' into 4 *+tasks+*, the following division is done:


!https://ciop.eo.esa.int/attachments/41/parallelism.png!
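
The exact splitting policy belongs to the framework; the following stand-alone sketch (not CIOP code, file names invented for illustration) shows the idea of dividing *k* input lines into *n* contiguous chunks, one per task:

```shell
# Sketch: divide k=11 input lines among n=4 tasks in contiguous chunks,
# the way the framework conceptually splits a job's input flow.
k=11   # lines produced by 'PreProc'
n=4    # tasks allocated to 'AlignSlave'

# generate 11 dummy input lines standing in for data references
seq 1 $k > input.txt

# chunk size, rounded up
chunk=$(( (k + n - 1) / n ))

# one slice per task: task.00, task.01, task.02, task.03
split -l $chunk -d input.txt task.

# each task would then process only its own slice
wc -l task.*
```

With 11 lines and 4 tasks this yields slices of 3, 3, 3 and 2 lines, matching the division illustrated in the figure above.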

* Where does the processing loop stand?

> The processing loop stands in the +*processing trigger*+. As shown in this tutorial with the example, the *+processing trigger+* implements a loop that reads the *+task+* input *line by line*.
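
A minimal sketch of that canonical trigger loop (the @process_one@ function is a hypothetical placeholder for the real per-line work; on a real Sandbox the lines arrive on stdin):

```shell
# Sketch of the processing-trigger loop: the framework pipes the input
# references to the executable's stdin, one reference per line.
process_one() {
  echo "processed: $1"   # stand-in for the real per-line processing
}

count=0
while read -r line
do
  process_one "$line"
  count=$((count + 1))
done << EOF
ref/to/product_A
ref/to/product_B
EOF

echo "$count lines processed"
```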

h2. 2. Home Directory

The *+home+* directory of your Sandbox is attached to a separate savable disk and is thus persistent. This disk is mounted on the /home/{USERNAME} folder. In this folder you may upload, compile, manually test, etc. all your data. This is a free space where you can do anything, BUT:

> *+Be careful to never link any element (e.g. executable, auxiliary data) from the application directory to the home directory, as this is critical for the application+*. Indeed, the *+home+* directory is not present when using CIOP in Runtime Environment mode, so any linked elements won't be available, causing the processing phase to fail.

h2. 3. Application Directory

The *+application directory+* of your Sandbox is attached to a separate savable disk and is thus persistent. This disk is mounted on the /application folder.
The application directory is the place where the integrated application resides.
It should be a clean environment and thus +*SHOULD NOT*+ contain any temporary files, nor be used for compilations and/or manual testing. Instead, it is used for the simulation of the application *+jobs+* and +*workflow*+.

At the instantiation of your Sandbox, the *+application directory+* contains this sample application example, unless you configured the Sandbox with one of your application disks previously saved in your application library.
In the next sections the elements of the application directory are described.

h3. 3.1 Files and folders structure

The application directory follows some best practices in its folders and files structure to ease the subsequent deployment of the application to the Runtime Environment.
The folder structure of the application example, with the description of each item, is shown below:

!https://ciop.eo.esa.int/attachments/44/application_folder.png!

> Even if the names are quite similar in our tutorial example, *+job+* and +*job template*+ are not the same concept. _A *+job+* is an instance of a +*job template*+ in a given +*workflow*+._ This paradigm allows several *+jobs+* in a +*workflow*+ to point to the same *+job template+*. This is explained in more detail in the next section.
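
To make the distinction concrete, here is a hypothetical workflow fragment (node ids invented for illustration) in which two *+nodes+* instantiate the same +*job template*+ 'align':

```xml
<!-- Hypothetical fragment: two workflow nodes, one job template -->
<node id="AlignSlaveA">        <!-- first instance -->
	<job id="align"></job>     <!-- references job template 'align' -->
	...
</node>
<node id="AlignSlaveB">        <!-- second instance of the same template -->
	<job id="align"></job>
	...
</node>
```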

h3. 3.2 The Application XML definition file

The Application XML definition file is the reference of your application for the CIOP computing framework. It contains all the characteristics of the *+job templates+* and the *+workflows+*.

The Application XML definition file is described in the page [[Application XML definition file]].

h2. 4. -- Using sample datasets -- (section under revision)

This section guides you through the tutorial example to introduce the data manipulation tools.

There are mainly two command line tools to discover and access the data previously selected in your Sandbox ([[Getting_Started#2.1 Sandbox EO Data Services|see here]]):
* *ciop-catquery* to query the Sandbox catalogue containing all the metadata of the selected sample dataset
* *ciop-copy* to copy the data from a logical or physical location to a local directory of the Sandbox

Use

<pre>ciop-<command> -h</pre>

to display the CLI reference.

These commands can be used in the processing triggers of a +*job*+.

h3. 4.1 Query Sandbox catalogue

For the tutorial purpose, the first test is to ensure that the test dataset needed for the application integration and testing is complete.

(to be updated)

h3. 4.2 Copy data

To copy data from a reference link, as displayed in the previous section, just use the following command:

<pre><code class="ruby">[user@sb ~]$ ciop-copy http://localhost/catalogue/sandbox/ASA_IM__0P/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1/rdf</code></pre>

Output:

<pre>
[INFO ][ciop-copy][starting] url 'http://localhost/catalogue/sandbox/ASA_IM__0P/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1/rdf' > local '/application/'
[INFO ][ciop-copy][success] got URIs 'https://eo-virtual-archive4.esa.int/supersites/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1 '
[INFO ][ciop-copy][starting] url 'https://eo-virtual-archive4.esa.int/supersites/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1' > local '/application/'
[INFO ][ciop-copy][success] url 'https://eo-virtual-archive4.esa.int/supersites/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1' > local '/application/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1'
/home/user/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1
</pre>

The command displays information on _stderr_ by default and returns on _stdout_ the path of the copied data.
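
This stdout/stderr split is what makes the command usable from a script: command substitution keeps only the path, while the log lines still reach the task logs. A sketch of the pattern, using a mock function standing in for ciop-copy (which is only available inside a Sandbox):

```shell
# 'fake_ciop_copy' is a hypothetical mock that mimics ciop-copy's contract:
# log lines on stderr, the local path of the copied file on stdout.
fake_ciop_copy() {
  echo "[INFO ][ciop-copy][success] ..." >&2   # log goes to stderr
  echo "/home/user/ASA_IM__0CNPDE2004.N1"      # path goes to stdout
}

# command substitution captures only stdout
localFile=$(fake_ciop_copy 2>/dev/null)
echo "local copy at: $localFile"
```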

Many other data schemas are supported by the ciop-copy CLI, such as http, https, hdfs, etc.
There are also many other options, e.g. to specify the output directory or to unpack compressed data.
The complete reference is available here: [[ciop-copy CLI reference|ciop-copy usage]], or by using the inline help:
<pre><code class="ruby">[user@sb ~]$ ciop-copy -h</code></pre>

h3. 4.3 Using other sources of data in a job

So far we have introduced two types of data sources:
* data coming from a catalogue series
* data coming from a previous job in the workflow

In the first case, we define the workflow for the job imager:

<pre><code class="xml"><workflow id="testVomir">              <!-- Sample workflow -->
	<workflowVersion>1.0</workflowVersion>
	<node id="Vimage">                             <!-- workflow node unique id -->
		<job id="imager"></job>                    <!-- job defined above -->
		<sources>
			<source refid="cas:serie">ATS_TOA_1P</source>
		</sources>
		<parameters>                               <!-- parameters of the job -->
			<parameter id="volcano_db"></parameter>
		</parameters>
	</node>
	<node id="Quarc">
		<job id="quarcXML"/>
		<sources>
			<source refid="wf:node">Vimage</source>
		</sources>
	</node>
</workflow></code></pre>

In the second case, we define the workflow for the Quarc job:

<pre><code class="xml"><workflow id="testVomir">              <!-- Sample workflow -->
	<workflowVersion>1.0</workflowVersion>
	<node id="Vimage">                             <!-- workflow node unique id -->
		<job id="imager"></job>                    <!-- job defined above -->
		<sources>
			<source refid="cas:serie">ATS_TOA_1P</source>
		</sources>
		<parameters>                               <!-- parameters of the job -->
			<parameter id="volcano_db"></parameter>
		</parameters>
	</node>
	<node id="Quarc">
		<job id="quarcXML"/>
		<sources>
			<source refid="wf:node">Vimage</source>
		</sources>
	</node>
</workflow></code></pre>

It may be the case that the input data does not come from EO catalogues, and thus there is the need to define another source of data:

<pre><code class="xml"><workflow id="someworkflow">           <!-- Sample workflow -->
	<workflowVersion>1.0</workflowVersion>
	<node id="somenode">                           <!-- workflow node unique id -->
		<job id="somejobid"></job>                 <!-- job defined above -->
		<sources>
			<source refid="file:urls">/application/test.urls</source>
		</sources>
	</node>
</workflow></code></pre>

where the file test.urls contains the input lines that will be piped to the processing trigger executable.
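
A runnable sketch of what the framework does with such a source (the trigger script and file contents are invented for illustration; on a real Sandbox the trigger would be /application/somenode/run):

```shell
# Sketch: the lines of the file:urls source become the stdin of the trigger.
cat > test.urls << 'EOF'
http://example.com/product/1
http://example.com/product/2
EOF

# 'trigger.sh' is a hypothetical stand-in for the processing trigger
cat > trigger.sh << 'EOF'
#!/bin/bash
while read -r url; do
  echo "would process $url"
done
EOF
chmod +x trigger.sh

# the framework conceptually does the equivalent of this pipe
./trigger.sh < test.urls > out.txt
cat out.txt
```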

h2. 5. Job integration

In this section, the +*job template*+ 'align' and its instance, the *+job+* 'AlignSlave', are integrated using the tools previously introduced in this tutorial.

h3. 5.1 Installation and configuration of the GMTSAR toolbox on the Sandbox

The steps below install the GMTSAR toolbox on the Sandbox. They are specific to GMTSAR, but they show the common approach to follow when installing software on the Sandbox.

> The steps below are done in the +*home directory*+.

* Step - Download GMTSAR

The GMTSAR software is available on the University of California web server:

<pre><code class="ruby">
[user@sb ~]$ wget http://topex.ucsd.edu/gmtsar/tar/GMTSAR.tar
</code></pre>

Then the GMTSAR.tar archive is unpacked:

<pre><code class="ruby">
[user@sb ~]$ tar xvf GMTSAR.tar
</code></pre>

GMTSAR relies on GMT, which depends on netCDF. GMT is installed via yum:

<pre><code class="ruby">
[user@sb ~]$ cd GMTSAR
[user@sb GMTSAR]$ sudo yum search gmt
[user@sb GMTSAR]$ sudo yum install GMT-devel
[user@sb GMTSAR]$ sudo yum install netcdf-devel
[user@sb GMTSAR]$ make
</code></pre>

The steps above compile GMTSAR in the +*home directory*+. The required files (binaries, libraries, etc.) are then copied to the /application environment (remember that the +*home directory*+ is only available in the CIOP Sandbox mode and not in the Runtime Environment).
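
A sketch of that copy step, assuming (hypothetically) a build tree under ~/GMTSAR with bin/ and lib/ subfolders; APP_DIR would be /application on a real Sandbox, and a local directory is used here so the sketch can run anywhere:

```shell
# Hypothetical layout: copy built binaries/libraries next to the
# processing trigger, so PATH and LD_LIBRARY_PATH can later be set
# relative to `dirname $0` (as done in the trigger script in section 5.3).
APP_DIR=${APP_DIR:-./application}
SRC_DIR=${SRC_DIR:-./GMTSAR}

mkdir -p "$SRC_DIR/bin" "$SRC_DIR/lib"         # stand-in build tree
touch "$SRC_DIR/bin/pre_proc_batch.csh"        # stand-in binary

mkdir -p "$APP_DIR/preproc/bin" "$APP_DIR/preproc/lib"
cp -r "$SRC_DIR/bin/." "$APP_DIR/preproc/bin/"
cp -r "$SRC_DIR/lib/." "$APP_DIR/preproc/lib/"

ls "$APP_DIR/preproc/bin"
```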

h3. 5.2 Job template definition in the application.xml

The application.xml file has two main blocks: the job template section and the workflow template section.

The first part is to define the *+job templates+* in the workflow XML application definition file.
Each processing block of the GMTSAR workflow needs a *+job template+*.

Here is the *+job template+* for the pre-processing job:

<pre><code class="xml">
<jobTemplate id="preproc">
	<streamingExecutable>/application/preproc/run</streamingExecutable> <!-- processing trigger -->
	<defaultParameters>                                <!-- default parameters of the job -->
		<!-- Default values are specified here -->
		<parameter id="SAT"></parameter>               <!-- no default value -->
		<parameter id="master"></parameter>            <!-- no default value -->
		<parameter id="num_patches"></parameter>       <!-- no default value -->
		<parameter id="near_range"></parameter>        <!-- no default value -->
		<parameter id="earth_radius"></parameter>      <!-- no default value -->
		<parameter id="fd1">1</parameter>              <!-- default value: 1 -->
		<parameter id="stop_on_error">false</parameter><!-- don't stop on error by default -->
	</defaultParameters>
	<defaultJobconf>
		<property id="ciop.job.max.tasks">1</property> <!-- maximum number of parallel tasks -->
	</defaultJobconf>
</jobTemplate>
</code></pre>

To test this +*job*+ with *ciop-simjob*, we need to fill in the second part of the +*application.xml*+ and add a *+node+*:

<pre><code class="xml">
<node id="PreProc">                                    <!-- workflow node unique id -->
	<job id="preproc"></job>                           <!-- job template defined before -->
	<sources>
		<!-- the source is the series of the data selection -->
		<source refid="cas:serie">ASA_IM__0P</source>
	</sources>
	<parameters>                                       <!-- parameters of the job -->
		<parameter id="SAT">ENV</parameter>
		<parameter id="master">http://localhost/catalogue/sandbox/ASA_IM__0P/ASA_IM__0CNPDE20040602_091147_000000152027_00222_11799_1335.N1/rdf</parameter>
		<parameter id="near_range">978992.922</parameter>
		<parameter id="earth_radius">6378000</parameter>
		<parameter id="stop_on_error">true</parameter> <!-- during integration, preferably stop on error -->
	</parameters>
</node>
</code></pre>

> The complete application.xml is available here: TBW
> The application.xml and all its elements are described in detail in the page [[Application XML definition file]].

Since this processing step is the first in the *+workflow+* chain, it has a special source, which is the series 'ASA_IM__0P'. Practically, it means that when the *+job+* is submitted for execution, the computing framework will query the Sandbox catalogue for the ASA_IM__0P data registered as sample dataset, and will prepare a list of data references as *+input+* for the job 'PreProc'. In our example, the resulting list is:

<pre>
http://localhost/catalogue/sandbox/ASA_IM__0P/ASA_IM__0CNPDE20090412_092436_000000162078_00079_37207_1556.N1/rdf
http://localhost/catalogue/sandbox/ASA_IM__0P/ASA_IM__0CNPAM20080427_092430_000000172068_00079_32197_3368.N1/rdf
</pre>

h3. 5.3 Processing trigger script

In section 1.2, we saw that each +*job*+ must have a processing trigger, which is specified in the <streamingExecutable> element of the +*job template*+. In our example, this executable is a shell script:

<pre><code class="ruby">
#!/bin/bash

# FIRST OF ALL, LOAD CIOP INCLUDES
source ${ciop_job_include}

# If you want to have complete debug information during implementation
ciop-enable-debug

# All return codes are predefined
SUCCESS=0
ERR_BADARG=2
ERR_MISSING_PREPROC_BIN=3
ERR_MISSING_NEAR_PARAM=4
ERR_MISSING_RADIUS_PARAM=5
ERR_MISSING_MASTER_PARAM=6
ERR_MISSING_SAT_PARAM=7
ERR_MISSING_FD1_PARAM=8
ERR_MISSING_NUMPATCH_PARAM=9
ERR_INPUT_DATA_COPY=18
ERR_PREPROC_ERROR=19
ERR_NOOUTPUT=20
DEBUG_EXIT=66

# This function handles the exit of the executable
# with the corresponding error codes and returns a short message
# with the termination reason. It is important to have a synthetic and brief
# message because it will be raised through many upper levels of the
# computing framework, up to the user interface
function cleanExit ()
{
	local retval=$?
	local msg=""
	case "$retval" in
		$SUCCESS)
			msg="Processing successfully concluded";;
		$ERR_BADARG)
			msg="function checklibs called with non-directory parameter, returning $retval";;
		$ERR_MISSING_PREPROC_BIN)
			msg="binary 'pre_proc' not found in path, returning $retval";;
		$ERR_MISSING_NEAR_PARAM)
			msg="parameter 'near_range' missing or empty, returning $retval";;
		$ERR_MISSING_RADIUS_PARAM)
			msg="parameter 'earth_radius' missing or empty, returning $retval";;
		$ERR_MISSING_MASTER_PARAM)
			msg="parameter 'master' missing or empty, returning $retval";;
		$ERR_MISSING_FD1_PARAM)
			msg="parameter 'fd1' missing or empty, returning $retval";;
		$ERR_MISSING_SAT_PARAM)
			msg="parameter 'sat' missing or empty, returning $retval";;
		$ERR_MISSING_NUMPATCH_PARAM)
			msg="parameter 'num_patch' missing or empty, returning $retval";;
		$ERR_INPUT_DATA_COPY)
			msg="Unable to retrieve an input file";;
		$ERR_PREPROC_ERROR)
			msg="Error during processing, aborting task [$retval]";;
		$ERR_NOOUTPUT)
			msg="No output results";;
		$DEBUG_EXIT)
			msg="Breaking at debug exit";;
		*)
			msg="Unknown error";;
	esac
	[ "$retval" != 0 ] && ciop-log "ERROR" "Error $retval - $msg, processing aborted" || ciop-log "INFO" "$msg"
	exit "$retval"
}

# trap the exit signal to exit properly
trap cleanExit EXIT

# Use ciop-log to log messages at different levels: INFO, WARN, DEBUG
ciop-log "DEBUG" '##########################################################'
ciop-log "DEBUG" '# Set of useful environment variables                    #'
ciop-log "DEBUG" '##########################################################'
ciop-log "DEBUG" "TMPDIR          = $TMPDIR"            # The temporary directory for the task
ciop-log "DEBUG" "_JOB_ID         = ${_JOB_ID}"         # The job id
ciop-log "DEBUG" "_JOB_LOCAL_DIR  = ${_JOB_LOCAL_DIR}"  # The job specific shared scratch space
ciop-log "DEBUG" "_TASK_ID        = ${_TASK_ID}"        # The task id
ciop-log "DEBUG" "_TASK_LOCAL_DIR = ${_TASK_LOCAL_DIR}" # The task specific scratch space
ciop-log "DEBUG" "_TASK_NUM       = ${_TASK_NUM}"       # The number of tasks
ciop-log "DEBUG" "_TASK_INDEX     = ${_TASK_INDEX}"     # The id of the task within the job

# Get the processing trigger directory to link binaries and libraries
PREPROC_BASE_DIR=`dirname $0`
export PATH=$PREPROC_BASE_DIR/bin:$PATH
export LD_LIBRARY_PATH=$PREPROC_BASE_DIR/lib:$LD_LIBRARY_PATH

# set up the GMTSAR environment
${_CIOP_APPLICATION_PATH}/GMTSAR/gmtsar_config

# Test that all the necessary binaries are accessible;
# if not, exit with the corresponding error
PREPROC_BIN=`which pre_proc_batch.csh`
[ -z "$PREPROC_BIN" ] && exit $ERR_MISSING_PREPROC_BIN

# Processor environment:
# definition and creation of the input/output directories
OUTPUTDIR="$_TASK_LOCAL_DIR/output"   # results directory
INPUTDIR="$_TASK_LOCAL_DIR/input"     # data input directory
MASTERDIR="$_TASK_LOCAL_DIR/master"   # master data directory
mkdir -p $OUTPUTDIR $INPUTDIR $MASTERDIR

# Processing variables:
# retrieve the job parameters, exiting when a mandatory one is missing
NUMPATCH=`ciop-getparam num_patches`
[ $? != 0 ] && exit $ERR_MISSING_NUMPATCH_PARAM
NEAR=`ciop-getparam near_range`
[ $? != 0 ] && exit $ERR_MISSING_NEAR_PARAM
RADIUS=`ciop-getparam earth_radius`
[ $? != 0 ] && exit $ERR_MISSING_RADIUS_PARAM
FD1=`ciop-getparam fd1`
[ $? != 0 ] && exit $ERR_MISSING_FD1_PARAM
SAT=`ciop-getparam SAT`
[ $? != 0 ] && exit $ERR_MISSING_SAT_PARAM
MASTER=`ciop-getparam master`
[ $? != 0 ] && exit $ERR_MISSING_MASTER_PARAM
STOPONERROR=`ciop-getparam stop_on_error`
[ $? != 0 ] && STOPONERROR=false

# Create the batch.config parameter file
cat >${_TASK_LOCAL_DIR}/batch.config << EOF

num_patches = $NUMPATCH
earth_radius = $RADIUS
near_range = $NEAR
fd1 = $FD1

EOF

# The parameter 'master' is a reference to a data file;
# we need to copy it for the rest of our processing.
# This parameter is at job level, so if another parallel task on the same
# computing resource has already copied it, we save a useless copy
masterFile=`ciop-copy -c -o "$MASTERDIR" -r 10 "$MASTER"`
[[ -s $masterFile ]] || {
	ciop-log "ERROR" "Unable to retrieve master input at $MASTER" ; exit $ERR_INPUT_DATA_COPY ;
}
ciop-log "INFO" "Retrieved master input at $masterFile"

echo $masterFile >${_TASK_LOCAL_DIR}/data.in

# Begin the processing loop:
# read the input line by line into the url variable
while read url
do
	# First we copy the data into the input directory
	ciop-log "INFO" "Copying data $url" "preproc"
	# ciop-copy $url into $INPUTDIR, retrying 10 times in case of failure;
	# the local path of the copied file is returned in the $tmpFile variable
	tmpFile=`ciop-copy -o "$INPUTDIR" -r 10 "$url"`
	[[ -s $tmpFile ]] || {
		ciop-log "ERROR" "Unable to retrieve input file at $url" ;
		[[ $STOPONERROR == true ]] && exit $ERR_INPUT_DATA_COPY ;
	}
	ciop-log "INFO" "Retrieved input file $tmpFile"

	# append (not overwrite) the local path to the data stack
	echo $tmpFile >>${_TASK_LOCAL_DIR}/data.in

done

# Here we start the processing of the stack of data
ciop-log "INFO" "Processing stack of data" "preproc"
ciop-log "DEBUG" "$PREPROC_BIN $SAT data.in batch.config"

$PREPROC_BIN $SAT ${_TASK_LOCAL_DIR}/data.in ${_TASK_LOCAL_DIR}/batch.config >$OUTPUTDIR/preproc.log 2>&1
rcpp=$?

if [ "$rcpp" != 0 ]; then
	ciop-log "ERROR" "$PREPROC_BIN failed to process, return code $rcpp" "preproc"
	cat $OUTPUTDIR/preproc.log >&2
	exit $ERR_PREPROC_ERROR
fi

ciop-log "INFO" "Processing complete" "preproc"

# Then the results are "published" for the next job.
# Practically, the output is published to a job shared space
# and the directory is referenced as an url for the next job
ciop-publish $OUTPUTDIR/

exit 0

</code></pre>

> /\ !!! Keep in mind that the execution takes place in a non-interactive environment, so +error catching+ and +logging+ are very important. They enforce the robustness of your application and avoid losing time later in debugging !!!

Here is a summary of the framework tools used in this script, with their online help:
* *source ${ciop_job_include}* --> includes the library providing functions such as ciop-log, ciop-enable-debug and ciop-getparam
* *ciop-enable-debug* --> enables the DEBUG level for the logging system; otherwise only INFO and WARN messages are displayed
* *ciop-log* --> logs messages both in the interactive computing framework and in the processing stdout/err files. [[ciop-log CLI reference|ciop-log usage]]
* *ciop-getparam* --> retrieves a job parameter. [[ciop-getparam CLI reference|ciop-getparam usage]]
* *ciop-catquery* --> queries the EO catalogue of the Sandbox. [[ciop-catquery CLI reference|ciop-catquery usage]]
* *ciop-copy* --> copies a remote file to a local directory. [[ciop-copy CLI reference|ciop-copy usage]]
* *ciop-publish* --> copies *+task+* result files to the +*workflow*+ shared space. [[ciop-publish CLI reference|ciop-publish usage]]

h3. 5.4 Simulating a single job of the workflow

*ciop-simjob* simulates the execution of one processing +*job*+ of the +*workflow*+. [[ciop-simjob CLI reference|ciop-simjob usage]]

We will use it to test the first processing block of GMTSAR:

<pre><code class="ruby">ciop-simjob -f PreProc</code></pre>

This will output to _stdout_ the URL of the Hadoop Map/Reduce job. Open the link to check whether the processing is correctly executed.
The command will show the progress messages:
497 | |||
498 | <pre> |
||
499 | Deleted hdfs://sb-10-10-14-24.lab14.sandbox.ciop.int:8020/tmp/sandbox/sample/input.0 |
||
500 | rmr: cannot remove /tmp/sandbox/sample/PreProc/logs: No such file or directory. |
||
501 | mkdir: cannot create directory /tmp/sandbox/sample/PreProc: File exists |
||
502 | Deleted hdfs://sb-10-10-14-24.lab14.sandbox.ciop.int:8020/tmp/sandbox/sample/workflow-params.xml |
||
503 | Submitting job 25764 ... |
||
504 | 12/11/21 12:26:56 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead. |
||
505 | packageJobJar: [/var/lib/hadoop-0.20/cache/emathot/hadoop-unjar5187515757952179540/] [] /tmp/streamjob7738227981987732817.jar tmpDir=null |
||
506 | 12/11/21 12:26:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable |
||
507 | 12/11/21 12:26:58 WARN snappy.LoadSnappy: Snappy native library not loaded |
||
508 | 12/11/21 12:26:58 INFO mapred.FileInputFormat: Total input paths to process : 1 |
||
509 | 12/11/21 12:26:58 INFO streaming.StreamJob: getLocalDirs(): [/var/lib/hadoop-0.20/cache/emathot/mapred/local] |
||
510 | 12/11/21 12:26:58 INFO streaming.StreamJob: Running job: job_201211101342_0045 |
||
511 | 12/11/21 12:26:58 INFO streaming.StreamJob: To kill this job, run: |
||
512 | 12/11/21 12:26:58 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=sb-10-10-14-24.lab14.sandbox.ciop.int:8021 -kill job_201211101342_0045 |
||
513 | 12/11/21 12:26:58 INFO streaming.StreamJob: Tracking URL: http://sb-10-10-14-24.lab14.sandbox.ciop.int:50030/jobdetails.jsp?jobid=job_201211101342_0045 |
||
514 | 12/11/21 12:26:59 INFO streaming.StreamJob: map 0% reduce 0% |
||
515 | 12/11/21 12:27:06 INFO streaming.StreamJob: map 17% reduce 0% |
||
516 | 12/11/21 12:27:07 INFO streaming.StreamJob: map 33% reduce 0% |
||
517 | 12/11/21 12:27:13 INFO streaming.StreamJob: map 67% reduce 0% |
||
518 | 12/11/21 12:27:18 INFO streaming.StreamJob: map 83% reduce 0% |
||
519 | 12/11/21 12:27:19 INFO streaming.StreamJob: map 100% reduce 0% |
||
520 | 12/11/21 12:27:24 INFO streaming.StreamJob: map 100% reduce 33% |
||
521 | 12/11/21 12:27:27 INFO streaming.StreamJob: map 100% reduce 100% |
||
522 | ^@12/11/21 12:28:02 INFO streaming.StreamJob: map 100% reduce 0% |
||
523 | 12/11/21 12:28:05 INFO streaming.StreamJob: map 100% reduce 100% |
||
524 | 12/11/21 12:28:05 INFO streaming.StreamJob: To kill this job, run: |
||
525 | 12/11/21 12:28:05 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=sb-10-10-14-24.lab14.sandbox.ciop.int:8021 -kill job_201211101342_0045 |
||
526 | 12/11/21 12:28:05 INFO streaming.StreamJob: Tracking URL: http://sb-10-10-14-24.lab14.sandbox.ciop.int:50030/jobdetails.jsp?jobid=job_201211101342_0045 |
||
527 | 12/11/21 12:28:05 ERROR streaming.StreamJob: Job not successful. Error: NA |
||
528 | 12/11/21 12:28:05 INFO streaming.StreamJob: killJob... |
||
529 | Streaming Command Failed! |
||
530 | [INFO ][log] All data, output and logs available at /share//tmp/sandbox/sample/PreProc |
||
531 | </pre> |

At this point you can use your browser to open the _*Tracking URL*_:

!https://ciop.eo.esa.int/attachments/46/single_job_debug_1.png!

This is a single-thread job; see the reduce line in the table.
You can click on the kill job link in the reduce line. The page shows the task attempts, usually one.

!https://ciop.eo.esa.int/attachments/47/single_job_debug_2.png!

In this case, the job ended with exit code 19.

Click on the task link; the same information as before is shown, but with more details (e.g. the logs in the last column).

You can click on _*all*_.

_stdout_ and _stderr_ appear, and you can debug the processing job with this information.

h3. 5.6 -- Simulating a complete workflow -- (section under revision)

h2. 6. -- Application deployment -- (section under revision)

This section describes the procedure to deploy your application once it is ready and successfully integrated.

h3. 6.1 Deploy as a service

h3. 6.2 Test the application in pre-operations

h3. 6.3 Plan the production