Herve Caumont, 2013-06-18 14:49

h1. BEAM Arithm tutorial

{{>toc}}

h2. Introduction

BEAM is an open-source toolbox and development platform for viewing, analysing and processing remote sensing raster data. Originally developed to facilitate the utilisation of image data from Envisat's optical instruments, BEAM now supports a growing number of other raster data formats such as GeoTIFF and NetCDF as well as data formats of other EO sensors such as MODIS, AVHRR, AVNIR, PRISM and CHRIS/Proba. Various data and algorithms are supported by dedicated extension plug-ins.

BEAM Graph Processing Tool (gpt) is a tool used to execute BEAM raster data operators in batch mode. The operators can be used stand-alone or combined as a directed acyclic graph (DAG). Processing graphs are represented using XML.

Our tutorial uses the BandMaths operator and the Level 3 Binning Processor, applied to Envisat MERIS Level 1 Reduced Resolution products, to create an application that represents algal blooms.

> Definition (source Wikipedia): An algal bloom is a rapid increase or accumulation in the population of algae (typically microscopic) in an aquatic system. Algal blooms may occur in freshwater as well as marine environments. Typically, only one or a small number of phytoplankton species are involved, and some blooms may be recognized by discoloration of the water resulting from the high density of pigmented cells. Although there is no officially recognized threshold level, algae can be considered to be blooming at concentrations of hundreds to thousands of cells per milliliter, depending on the severity. Algal bloom concentrations may reach millions of cells per milliliter. Algal blooms are often green, but they can also be other colors such as yellow-brown or red, depending on the species of algae.

h2. The application

As introduced above, our application uses the *BandMaths* operator and the *Level 3 Binning* processor.

h3. The BandMaths Operator

The *BandMaths* operator can be used to create a product with multiple bands based on mathematical expressions. All products specified as source must have the same width and height, otherwise the operator will fail. The geo-coding information and metadata for the target product are taken from the first source product.

In our application we will apply the mathematical expression below to all input MERIS Level 1 Reduced Resolution products to detect the algal blooms:

<pre>
l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)
</pre>

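
The sketch below evaluates the expression for a single pixel with awk, which makes it easier to read. The function name @mci_pixel@ and the sample radiances are hypothetical. The sketch assumes the nested conditional reads as @INVALID ? 0 : (radiance_13 > 15 ? 0 : value)@. The last term subtracts a linear baseline, drawn between bands 8 and 10, from band 9. The constants 27.524 and 72.570 are presumably the spectral distances (in nm) from band 8 to band 9 and from band 8 to band 10. The offset of 100 places a zero baseline height at the value 100, and the palette used later by the binning job spans roughly 98 to 102.

<pre>
#!/bin/bash
# Evaluate the BandMaths expression for one pixel (illustration only).
# Arguments (hypothetical): invalid_flag radiance_8 radiance_9 radiance_10 radiance_13
mci_pixel() {
  awk -v inv="$1" -v r8="$2" -v r9="$3" -v r10="$4" -v r13="$5" 'BEGIN {
    if (inv != 0) { print 0; exit }      # l1_flags.INVALID ? 0
    if (r13 > 15) { print 0; exit }      # radiance_13 > 15 ? 0
    # linear baseline between bands 8 and 10, evaluated at band 9
    baseline = r8 + (r10 - r8) * 27.524 / 72.570
    printf "%.4f\n", 100 + r9 - baseline
  }'
}

mci_pixel 1 50 60 70 10   # flagged pixel: prints 0
mci_pixel 0 50 60 70 20   # radiance_13 above the threshold: prints 0
mci_pixel 0 50 55 50 10   # flat baseline, 100 + 55 - 50: prints 105.0000
</pre>
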
h3. The Level 3 Binning Processor

The term binning refers to the process of distributing the contributions of Level 2 pixels in satellite coordinates to a fixed Level 3 grid using a geographic reference system. In most cases a sinusoidal projection is used to realize a Level 3 grid comprising a fixed number of equal-area bins with global coverage. This is the case, for example, for the SeaWiFS Level 3 products.

As long as the area of an input pixel is small compared to the area of a bin, simple binning is sufficient. In this case, the geodetic center coordinate of the Level 2 pixel is used to find the bin in the Level 3 grid whose area contains this point. If the area of the contributing pixel is equal to or even larger than the bin area, simple binning produces composites with insufficient accuracy, and visual artefacts such as Moiré effects dominate the resulting datasets.

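
The simple binning case amounts to a lookup from a pixel's center coordinate to a bin number. The sketch below assumes a SeaWiFS-style integerized sinusoidal grid. It has @nrows@ latitude rows of equal height, and each row holds about @2*nrows*cos(latitude)@ equal-area bins. In this sketch, bins are numbered from 1, starting at the south pole and at longitude -180. The default of 2160 rows (about 9.28 km bins) is an assumption, chosen to match the @cellsize@ of 9.28 used later in the job template. This is not the BEAM implementation, and @bin_index@ is a hypothetical name.

<pre>
#!/bin/bash
# Map a (lat, lon) center coordinate to a bin number on an integerized
# sinusoidal grid (SeaWiFS-style layout assumed, illustration only).
# Usage: bin_index LAT LON [NROWS]
bin_index() {
  awk -v lat="$1" -v lon="$2" -v nrows="${3:-2160}" 'BEGIN {
    pi = atan2(0, -1)
    row = int((lat + 90) * nrows / 180)      # 0 = southernmost row
    if (row >= nrows) row = nrows - 1        # lat = +90 belongs to the last row
    first = 1                                # number of the first bin in this row
    for (r = 0; r < row; r++)
      first += nbins(r, nrows, pi)
    n = nbins(row, nrows, pi)
    col = int((lon + 180) * n / 360)
    if (col >= n) col = n - 1                # clamp lon = +180 into the last bin
    print first + col
  }
  # number of equal-area bins in row r: 2 * rows * cos(row center latitude), rounded
  function nbins(r, rows, p,    latc) {
    latc = (r + 0.5) * 180 / rows - 90
    return int(2 * rows * cos(latc * p / 180) + 0.5)
  }'
}

bin_index 45.5 12.3        # default 2160-row grid
bin_index 0 0 4            # coarse 4-row grid: rows hold 3, 7, 7 and 3 bins
</pre>
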
h3. The application workflow

Our application can be described as an activity diagram where the BandMaths operator is applied to all input MERIS Level 1 products, and its outputs are used as inputs to the Level 3 Binning processor. Since each BandMaths run is independent of the others, the MERIS Level 1 products can be processed in parallel. The Level 3 Binning processor, instead, needs all the outputs to accumulate the bin values and generate the Level 3 product.

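
The shape of this workflow can be mimicked locally with plain bash background jobs: independent parallel tasks, then a single aggregation step. In the sketch, @expression_step@ and @binning_step@ are hypothetical placeholders. The first only derives an output name for each input, and the second only collects all outputs into one line. In the real application, CIOP schedules the jobs on the Hadoop cluster.

<pre>
#!/bin/bash
# Hypothetical stand-ins for the two workflow nodes.
expression_step() {            # one input reference in, one output reference out
  echo "${1%.N1}.dim"
}
binning_step() {               # reads every output reference from stdin
  sort | paste -sd ',' -
}

run_workflow() {
  local workdir input i=0
  workdir=$(mktemp -d)
  # fan-out: one background task per input, each writing its own result file
  for input in "$@"; do
    expression_step "$input" > "$workdir/out.$i" &
    i=$((i + 1))
  done
  wait                         # barrier: binning needs all expression outputs
  # fan-in: a single task consumes all outputs
  cat "$workdir"/out.* | binning_step
  rm -rf "$workdir"
}

run_workflow MER_A.N1 MER_B.N1 MER_C.N1   # prints MER_A.dim,MER_B.dim,MER_C.dim
</pre>
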
h2. BeamArithm implementation

h3. Tutorial approach

The goal of this tutorial is to get you acquainted with CIOP as an environment to implement scientific applications.
The focus is on analysing an already implemented application rather than on installing software, editing files, copying data, etc.: all these steps have already been done!

h3. Tutorial requirements

You need access to a running sandbox. The procedure to start a sandbox is outside the scope of this tutorial.

h3. Tutorial files and artifacts installation on the sandbox

Log on to your sandbox and list the available tutorials with:

<pre>
[user@sb ~]$ ciop-tutorial list
</pre>

This will list the available tutorials:

<pre>
...
beam-arithm
...
</pre>

Get the tutorial description with:

<pre>
[user@sb ~]$ ciop-tutorial info beam-arithm
</pre>

This displays the tutorial information:

<pre>
TBW
</pre>

Install the tutorial:

<pre>
[user@sb ~]$ ciop-tutorial install beam-arithm
</pre>

This will take a few minutes. Once the installation is complete, the BeamArithm application is ready to run.

> Tip: check the [[ciop-tutorial]] CLI reference.

h3. Execute the BeamArithm processing steps one by one

CIOP allows you to process the nodes of the workflow independently.

> It may sound obvious, but to run the second node of the workflow, the first node must have run successfully at least once.

List the nodes of the workflow:

<pre>
[user@sb ~]$ ciop-simjob -n
</pre>

This will output:

<pre>
node_expression
node_binning
</pre>

where *node_expression* is the _BandMaths operator_ and *node_binning* is the _Level 3 Binning Processor_.

> Tip: check the [[ciop-simjob]] CLI reference.

Execute the *node_expression* workflow node:

<pre>
[user@sb ~]$ ciop-simjob node_expression
</pre>

The CIOP framework will take the MERIS Level 1 products and execute the BEAM BandMaths operator, taking advantage of the Hadoop Map/Reduce cluster.
The output of the command above provides you with a tracking URL. Open it in your favourite browser.

> Tip: if you execute a node more than once, do not forget to use the flag -f to remove the results of the previous execution.

After a few minutes, the generated outputs will be listed as HDFS resources. You can inspect the results in the HDFS mount point of your sandbox with:

<pre>
[user@sb ~]$ ls -l /share/tmp/sandbox/node_expression/data
</pre>

> Tip: remember that CIOP relies on the Hadoop HDFS distributed storage to manage input and output data. More information about the sandbox: [[Understanding the sandbox]]

Now, execute the *node_binning* node:

<pre>
[user@sb ~]$ ciop-simjob node_binning
</pre>

As for *node_expression*, after a few minutes the list of the generated products is shown.

h3. Execute the BeamArithm processing workflow

While executing single nodes is very practical for debugging the application's individual processing steps, CIOP also allows you to process the entire workflow automatically. To do so, run the command:

<pre>
[user@sb ~]$ ciop-simwf
</pre>

This will display an ASCII status output of the application workflow execution.

> Tip: check the [[ciop-simwf]] CLI reference.

> Tip: each workflow run has a unique identifier and the results of a run are never overwritten when executing the workflow again.

After a few minutes, the same outputs are generated and available as HDFS resources. As for the single node execution, these resources can be accessed on the HDFS mount point.
To do so, you need the run identifier. Obtain it with the command:

<pre>
[user@sb ~]$ ciop-simwf -l
</pre>

You should have a single run identifier. Use it to list the generated results:

<pre>
[user@sb ~]$ ls -l /share/tmp/sandbox/run/<run identifier>/node_binning/data
</pre>

This will list the same results as the single *node_binning* execution.

Now that you have seen CIOP manage the BeamArithm application, we will go through all the files composing the application.

h2. BeamArithm CIOP application breakdown

h3. The application descriptor file: application.xml

Each CIOP application is described by an application descriptor file. This file is always named application.xml and is found in the /application file system of your sandbox.

This file contains two main sections:
* a section where the application job templates are described
* a section with the workflow definition combining the job templates as a DAG

> Tip: check the DAG definition here: [[CIOP terminology and definitions]]

> Tip: learn about the application descriptor file here: [[Understanding the sandbox]]

h4. Job templates

The listing below shows the job templates section of the application descriptor file.

<pre><code class="xml">
<jobTemplates>
	<!-- BEAM BandMaths operator job template  -->
	<jobTemplate id="expression">
		<streamingExecutable>/application/expression/run</streamingExecutable>
		<defaultParameters>
			<parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
		</defaultParameters>
	</jobTemplate>
	<!-- BEAM Level 3 processor job template  -->
	<jobTemplate id="binning">
		<streamingExecutable>/application/binning/run</streamingExecutable>
		<defaultParameters>
			<parameter id="cellsize">9.28</parameter>
			<parameter id="bandname">out</parameter>
			<parameter id="bitmask">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
			<parameter id="bbox">-180,-90,180,90</parameter>
			<parameter id="algorithm">Minimum/Maximum</parameter>
			<parameter id="outputname">binned</parameter>
			<parameter id="resampling">binning</parameter>
			<parameter id="palette">#MCI_Palette
color0=0,0,0
color1=0,0,154
color2=54,99,250
color3=110,201,136
color4=166,245,8
color5=222,224,0
color6=234,136,0
color7=245,47,0
color8=255,255,255
numPoints=9
sample0=98.19878118960284
sample1=98.64947122314665
sample2=99.10016125669047
sample3=99.5508512902343
sample4=100.0015413237781
sample5=100.4522313573219
sample6=100.90292139086574
sample7=101.35361142440956
sample8=101.80430145795337</parameter>
			<parameter id="band">1</parameter>
			<parameter id="tailor">true</parameter>
		</defaultParameters>
		<defaultJobconf>
			<property id="ciop.job.max.tasks">1</property>
		</defaultJobconf>
	</jobTemplate>
</jobTemplates>
</code></pre>

> Tip: check the validity of the application descriptor file with [[ciop-appcheck]]

> Tip: learn more about the application descriptor file here: [[Understanding the sandbox]]

Each *job template* has a mandatory element defining its streaming executable.

Example: the streaming executable for the job template *expression* is:

<pre><code class="xml">
<streamingExecutable>/application/expression/run</streamingExecutable>
</code></pre>

> Tip: do not forget to give the streaming executable execute rights with _chmod_, e.g. @chmod 755 /application/expression/run@

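
A missing execute bit is an easy mistake to make, so a quick local check can help. The bash sketch below extracts every @streamingExecutable@ path from an application descriptor with @sed@ and tests it with @[ -x ]@. The function name @check_executables@ is hypothetical. The extraction assumes each element sits on a single line, as in the listing above, and that paths contain no spaces. [[ciop-appcheck]] remains the tool for validating the descriptor itself.

<pre>
#!/bin/bash
# Report whether every streaming executable declared in an application
# descriptor exists and is executable (one path per element, no spaces).
# Usage: check_executables /application/application.xml
check_executables() {
  local xml="$1" exe status=0
  for exe in $(sed -n 's:.*<streamingExecutable>\(.*\)</streamingExecutable>.*:\1:p' "$xml"); do
    if [ -x "$exe" ]; then
      echo "OK $exe"
    else
      echo "MISSING OR NOT EXECUTABLE $exe"
      status=1
    fi
  done
  return "$status"
}
</pre>

On a sandbox, @check_executables /application/application.xml@ should print one @OK@ line per job template.
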
Both job templates, *expression* and *binning*, define a set of default parameters.

Example: the job template *expression* defines a default value for the parameter *expression*:

<pre><code class="xml">
<defaultParameters>
	<parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
</defaultParameters>
</code></pre>

The job template *binning* also defines a default job configuration.

As explained above, the job template *binning* performs a temporal and spatial aggregation of the *expression* job outputs. The *binning* job must therefore run as a single task (the *expression* job, instead, exploits the parallelism offered by CIOP).
To express such a job configuration, we have added the XML tags:

<pre><code class="xml">
<defaultJobconf>
	<property id="ciop.job.max.tasks">1</property>
</defaultJobconf>
</code></pre>

> Tip: for a list of the possible default job properties, read [[Application descriptor]]

h4. Streaming executable for expression job template

It is important to keep in mind that job inputs and outputs are text references (e.g. to data).
Indeed, a job reads its input line by line, each line being one reference: a child job reads the outputs of its parent job (or of any other source, e.g. catalogued data) via _stdin_.

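
In other words, a streaming executable behaves like a filter. It reads one reference per line on _stdin_ and writes one reference per line for the next job. The sketch below shows only that contract. @process_references@ is a hypothetical name, and instead of retrieving and processing data it only derives an output name from each reference.

<pre>
#!/bin/bash
# Minimal filter: one input reference per stdin line,
# one output reference per stdout line.
process_references() {
  local ref
  while IFS= read -r ref; do
    [ -z "$ref" ] && continue          # skip empty lines
    # a real job would retrieve "$ref" and process it here
    echo "processed/$(basename "$ref")"
  done
}

printf '%s\n' hdfs://node/in/MER_A.N1 hdfs://node/in/MER_B.N1 | process_references
</pre>
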
> Tip: if you need to combine the results of a parent job with other values (e.g. bounding boxes to process over a set of input products), you will have to add a simple job that combines the outputs and values.

The streaming executable of the job template *expression* is a Bourne Again SHell (bash) script:

<pre>
#!/bin/bash

# source the ciop functions (e.g. ciop-log)
source ${ciop_job_include}

export BEAM_HOME=$_CIOP_APPLICATION_PATH/share/beam-4.11
export PATH=$BEAM_HOME/bin:$PATH

# define the exit codes
SUCCESS=0
ERR_NOINPUT=1
ERR_BEAM=2
ERR_NOPARAMS=5

# add a trap to exit gracefully
function cleanExit ()
{
   local retval=$?
   local msg=""
   case "$retval" in
     $SUCCESS)      msg="Processing successfully concluded";;
     $ERR_NOINPUT)  msg="Failed to retrieve input product $inputfile";;
     $ERR_BEAM)     msg="Beam failed to process product $inputfile (Java returned $res).";;
     $ERR_NOPARAMS) msg="Expression not defined";;
     *)             msg="Unknown error";;
   esac
   if [ "$retval" != "0" ]; then
     ciop-log "ERROR" "Error $retval - $msg, processing aborted"
   else
     ciop-log "INFO" "$msg"
   fi
   exit $retval
}
trap cleanExit EXIT

# create the output folder to store the output products
mkdir -p $TMPDIR/output
export OUTPUTDIR=$TMPDIR/output

# retrieve the parameter value from the workflow or the job default value
expression="`ciop-getparam expression`"

# run a check on the expression value, it can't be empty
[ -z "$expression" ] && exit $ERR_NOPARAMS

# loop and process all MERIS products
while read -r inputfile
do
	# report activity in log
	ciop-log "INFO" "Retrieving $inputfile from storage"

	# retrieve the remote MERIS Level 1 product to the local temporary folder
	retrieved=`ciop-copy -o $TMPDIR $inputfile`

	# check if the file was retrieved
	[ "$?" = "0" ] && [ -e "$retrieved" ] || exit $ERR_NOINPUT

	# report activity
	ciop-log "INFO" "Retrieved `basename $retrieved`, moving on to expression"
	outputname=`basename $retrieved`

	# write the BEAM graph: Read -> BandMaths -> Write
	BEAM_REQUEST=$TMPDIR/beam_request.xml
cat << EOF > $BEAM_REQUEST
<?xml version="1.0" encoding="UTF-8"?>
<graph>
  <version>1.0</version>
  <node id="1">
    <operator>Read</operator>
    <parameters>
      <file>$retrieved</file>
    </parameters>
  </node>
  <node id="2">
    <operator>BandMaths</operator>
    <sources>
      <source>1</source>
    </sources>
    <parameters>
      <targetBands>
        <targetBand>
          <name>out</name>
          <expression>$expression</expression>
          <description>Processed Band</description>
          <type>float32</type>
        </targetBand>
      </targetBands>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources>
      <source>2</source>
    </sources>
    <parameters>
      <file>$OUTPUTDIR/$outputname</file>
    </parameters>
  </node>
</graph>
EOF
	gpt.sh $BEAM_REQUEST &> /dev/null
	res=$?
	[ $res != 0 ] && exit $ERR_BEAM

	# package the BEAM-DIMAP output (.dim header + .data folder)
	cd $OUTPUTDIR
	outputname=`basename $retrieved`.dim
	outputfolder=`basename $retrieved`.data
	tar cfz $outputname.tgz $outputname $outputfolder &> /dev/null
	cd - &> /dev/null

	ciop-log "INFO" "Publishing $outputname and $outputfolder"
	ciop-publish $OUTPUTDIR/$outputname.tgz

	# cleanup
	rm -fr $retrieved $OUTPUTDIR/$outputname $OUTPUTDIR/$outputfolder $OUTPUTDIR/$outputname.tgz
done

exit 0
</pre>

The first line tells Linux to use the bash interpreter to run this script.

> Tip: always set the interpreter: there is no other way to tell CIOP how to execute the streaming executable.

The next block is mandatory, as it sources the CIOP functions (ciop-log, ciop-getparam, etc.) needed to write the streaming executable script:

<pre>
# source the ciop functions (e.g. ciop-log)
source ${ciop_job_include}
</pre>

After that, we set the environment variables BEAM needs in order to run:

<pre>
export BEAM_HOME=$_CIOP_APPLICATION_PATH/share/beam-4.11
export PATH=$BEAM_HOME/bin:$PATH
</pre>

Next, we set up error handling. Although this block is not mandatory, it is good practice to define clear error codes and use a _trap_ function:

<pre>
# define the exit codes
SUCCESS=0
ERR_NOINPUT=1
ERR_BEAM=2
ERR_NOPARAMS=5

# add a trap to exit gracefully
function cleanExit ()
{
   local retval=$?
   local msg=""
   case "$retval" in
     $SUCCESS)      msg="Processing successfully concluded";;
     $ERR_NOINPUT)  msg="Failed to retrieve input product $inputfile";;
     $ERR_BEAM)     msg="Beam failed to process product $inputfile (Java returned $res).";;
     $ERR_NOPARAMS) msg="Expression not defined";;
     *)             msg="Unknown error";;
   esac
   if [ "$retval" != "0" ]; then
     ciop-log "ERROR" "Error $retval - $msg, processing aborted"
   else
     ciop-log "INFO" "$msg"
   fi
   exit $retval
}
trap cleanExit EXIT
</pre>

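
The trap mechanism can be tried out without a sandbox. In the sketch below, a hypothetical @log@ function that writes to stderr replaces @ciop-log@. The job body runs in a subshell, so its @exit@ ends only that subshell. Whatever exit code the body uses, @cleanExit@ runs once, maps the code to a message, and passes the same code on. @run_job@ is also a hypothetical name.

<pre>
#!/bin/bash
SUCCESS=0
ERR_NOINPUT=1
ERR_BEAM=2
ERR_NOPARAMS=5

log() { echo "[$1] $2" >&2; }          # hypothetical stand-in for ciop-log

run_job() {
  (
    cleanExit() {
      local retval=$?
      local msg
      case "$retval" in
        "$SUCCESS")      msg="Processing successfully concluded" ;;
        "$ERR_NOINPUT")  msg="Failed to retrieve input" ;;
        "$ERR_BEAM")     msg="Beam failed" ;;
        "$ERR_NOPARAMS") msg="Expression not defined" ;;
        *)               msg="Unknown error" ;;
      esac
      if [ "$retval" -ne 0 ]; then
        log "ERROR" "Error $retval - $msg, processing aborted"
      else
        log "INFO" "$msg"
      fi
      exit "$retval"
    }
    trap cleanExit EXIT
    exit "$1"                          # simulate a job body ending with this code
  )
}

run_job 0    # logs: [INFO] Processing successfully concluded
run_job 5    # logs: [ERROR] Error 5 - Expression not defined, processing aborted
</pre>
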
The CIOP framework provides a temporary location ($TMPDIR) unique to each job/parameter execution (this is very important when more than one processing node is used).
In it, we define where our results will be written:

<pre>
# create the output folder to store the output products
mkdir -p $TMPDIR/output
export OUTPUTDIR=$TMPDIR/output
</pre>

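
If the streaming executable is run by hand outside CIOP, @$TMPDIR@ may be unset. In that case @$TMPDIR/output@ would expand to @/output@. The defensive variant below assumes that a throw-away directory from @mktemp -d@ is an acceptable stand-in for the CIOP-provided location. Inside CIOP the variable is already set, so the fallback is never used.

<pre>
#!/bin/bash
# Fall back to a private temporary directory when TMPDIR is unset or empty
# (inside CIOP the framework provides TMPDIR, so the fallback is skipped).
: "${TMPDIR:=$(mktemp -d)}"
export TMPDIR

OUTPUTDIR="$TMPDIR/output"
mkdir -p "$OUTPUTDIR" || exit 1
export OUTPUTDIR
echo "writing results to $OUTPUTDIR"
</pre>
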
Then, we read the processing parameters using ciop-getparam and do a simple check on the value (it cannot be empty):

<pre>
# retrieve the parameter value from the workflow or the job default value
expression="`ciop-getparam expression`"

# run a check on the expression value, it can't be empty
[ -z "$expression" ] && exit $ERR_NOPARAMS
</pre>

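
To exercise the parameter check on its own, @ciop-getparam@ can be replaced by a stand-in. The function @getparam_stub@ below is hypothetical. It reads the value from an environment variable named @PARAM_<ID>@, an invented convention for local testing only. The real command resolves the value from the workflow or from the job template defaults, as described above.

<pre>
#!/bin/bash
ERR_NOPARAMS=5

# Hypothetical local stand-in for ciop-getparam: prints $PARAM_<ID> (id upper-cased)
getparam_stub() {
  printenv "PARAM_$(echo "$1" | tr '[:lower:]' '[:upper:]')"
}

check_expression() {
  local expression
  expression="$(getparam_stub expression)"
  # same check as in the streaming executable: the value can't be empty
  [ -z "$expression" ] && return "$ERR_NOPARAMS"
  echo "expression: $expression"
}
</pre>

For example, @PARAM_EXPRESSION=radiance_9 check_expression@ prints the value, and with the variable unset the function returns 5.
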
At this point we loop over the input MERIS Level 1 products and copy each of them locally to the $TMPDIR location:

<pre>
# loop and process all MERIS products
while read -r inputfile
do
	# report activity in log
	ciop-log "INFO" "Retrieving $inputfile from storage"

	# retrieve the remote MERIS Level 1 product to the local temporary folder
	retrieved=`ciop-copy -o $TMPDIR $inputfile`

	# check if the file was retrieved
	[ "$?" = "0" ] && [ -e "$retrieved" ] || exit $ERR_NOINPUT

	...
done
</pre>

> Tip: always report activity using ciop-log: if you don't report any activity, CIOP will kill the process when the walltime is reached.

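
A long, silent step such as the @gpt.sh@ call can be wrapped so that activity is reported at regular intervals while it runs. The sketch below is a hypothetical pattern, not a CIOP feature, and @ciop_log@ is a local stand-in that writes to stderr. In a sandbox you would call @ciop-log@ instead. The heartbeat loop's stdout is discarded, so only the wrapped command's own output, such as published references, reaches the next job.

<pre>
#!/bin/bash
ciop_log() { echo "[$1] $2" >&2; }     # hypothetical stand-in for ciop-log

# Run a command while logging a heartbeat every INTERVAL seconds.
# Usage: run_with_heartbeat INTERVAL command [args...]
run_with_heartbeat() {
  local interval="$1" hb status
  shift
  # background heartbeat loop; its stdout is discarded
  ( while :; do ciop_log "INFO" "still running: $1"; sleep "$interval"; done ) > /dev/null &
  hb=$!
  "$@"                                 # the long-running command, in the foreground
  status=$?
  kill "$hb" 2> /dev/null              # stop the heartbeat
  wait "$hb" 2> /dev/null              # reap it quietly
  return "$status"                     # propagate the command's exit code
}

run_with_heartbeat 1 sleep 2
</pre>
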
We then apply the BandMaths operator to the retrieved MERIS Level 1 product:

<pre>
# loop and process all MERIS products
while read -r inputfile
do
	...

	# report activity
	ciop-log "INFO" "Retrieved `basename $retrieved`, moving on to expression"
	outputname=`basename $retrieved`

	# write the BEAM graph: Read -> BandMaths -> Write
	BEAM_REQUEST=$TMPDIR/beam_request.xml
cat << EOF > $BEAM_REQUEST
<?xml version="1.0" encoding="UTF-8"?>
<graph>
  <version>1.0</version>
  <node id="1">
    <operator>Read</operator>
    <parameters>
      <file>$retrieved</file>
    </parameters>
  </node>
  <node id="2">
    <operator>BandMaths</operator>
    <sources>
      <source>1</source>
    </sources>
    <parameters>
      <targetBands>
        <targetBand>
          <name>out</name>
          <expression>$expression</expression>
          <description>Processed Band</description>
          <type>float32</type>
        </targetBand>
      </targetBands>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources>
      <source>2</source>
    </sources>
    <parameters>
      <file>$OUTPUTDIR/$outputname</file>
    </parameters>
  </node>
</graph>
EOF
	gpt.sh $BEAM_REQUEST &> /dev/null
	res=$?
	[ $res != 0 ] && exit $ERR_BEAM

	...
done
</pre>

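
@$expression@ is pasted verbatim into the XML graph. The expression used in this tutorial is safe, because @>@ is allowed in XML character data. An expression containing @<@ or @&@, however, would produce an invalid graph file. The sketch below shows a small escaping helper that could be applied before writing the here-document. @xml_escape@ is hypothetical and not part of the original script.

<pre>
#!/bin/bash
# Escape the characters that are special in XML character data.
# The ampersand is replaced first, otherwise the entities added by the
# later substitutions would be escaped a second time.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

expression='radiance_13 < 15 ? radiance_9 : 0'
xml_expression=$(xml_escape "$expression")
echo "$xml_expression"    # radiance_13 &lt; 15 ? radiance_9 : 0
</pre>

The here-document would then use @<expression>$xml_expression</expression>@ in place of @$expression@.
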
At this stage the produced results are packaged and published with ciop-publish to the CIOP distributed filesystem, where they are available to the *binning* job:

<pre>
# loop and process all MERIS products
while read -r inputfile
do
	...

	# package the BEAM-DIMAP output (.dim header + .data folder)
	cd $OUTPUTDIR
	outputname=`basename $retrieved`.dim
	outputfolder=`basename $retrieved`.data
	tar cfz $outputname.tgz $outputname $outputfolder &> /dev/null
	cd - &> /dev/null

	ciop-log "INFO" "Publishing $outputname and $outputfolder"
	ciop-publish $OUTPUTDIR/$outputname.tgz

	# cleanup
	rm -fr $retrieved $OUTPUTDIR/$outputname $OUTPUTDIR/$outputfolder $OUTPUTDIR/$outputname.tgz
done
</pre>

> Tip: ciop-publish does more than a simple copy of data: it also "echoes" the destination URL, and these strings are used as input for the *binning* job.

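
The packaging step can be checked locally. The sketch below builds the same kind of @.dim.tgz@ archive from a dummy BEAM-DIMAP pair, made of a @.dim@ header and a @.data@ folder, and lists the archive's content. @package_dimap@ is a hypothetical helper, the product name is invented, and no CIOP command is involved.

<pre>
#!/bin/bash
# Package <name>.dim and <name>.data from DIR into DIR/<name>.dim.tgz
# and print the archive path. Usage: package_dimap DIR NAME
package_dimap() {
  local dir="$1" name="$2"
  ( cd "$dir" && tar czf "$name.dim.tgz" "$name.dim" "$name.data" ) || return 1
  echo "$dir/$name.dim.tgz"
}

# dummy product for the demo
demo=$(mktemp -d)
touch "$demo/MER_DEMO.dim"
mkdir "$demo/MER_DEMO.data"
touch "$demo/MER_DEMO.data/out.img"

archive=$(package_dimap "$demo" MER_DEMO)
tar tzf "$archive"
</pre>
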
This concludes the tutorial.