Lib-beam » History » Version 3

Herve Caumont, 2013-06-20 10:50

h1. BEAM Arithm tutorial

{{>toc}}

BEAM - Basic ERS & Envisat (A)ATSR and MERIS

h2. Introduction

BEAM is an open-source toolbox and development platform for viewing, analyzing and processing remote sensing raster data. Originally developed to facilitate the use of image data from Envisat's optical instruments, BEAM now supports a growing number of other raster data formats, such as GeoTIFF and NetCDF, as well as data formats of other EO sensors such as MODIS, AVHRR, AVNIR, PRISM and CHRIS/Proba. Additional data formats and algorithms are supported by dedicated extension plug-ins.

The BEAM Graph Processing Tool (gpt) executes BEAM raster data operators in batch mode. The operators can be used stand-alone or combined as a directed acyclic graph (DAG). Processing graphs are represented using XML.

Our tutorial uses the BandMaths operator and the Level 3 Binning Processor, applied to Envisat MERIS Level 1 Reduced Resolution products, to create an application that maps algal blooms.

> Definition (source: Wikipedia): An algal bloom is a rapid increase or accumulation in the population of algae (typically microscopic) in an aquatic system. Algal blooms may occur in freshwater as well as marine environments. Typically, only one or a small number of phytoplankton species are involved, and some blooms may be recognized by discoloration of the water resulting from the high density of pigmented cells. Although there is no officially recognized threshold level, algae can be considered to be blooming at concentrations of hundreds to thousands of cells per milliliter, depending on the severity. Algal bloom concentrations may reach millions of cells per milliliter. Algal blooms are often green, but they can also be other colors such as yellow-brown or red, depending on the species of algae.

h2. The application

As introduced above, our application uses the *BandMaths* operator and the *Level 3 Binning* processor.

h3. The BandMaths Operator

The *BandMaths* operator can be used to create a product with multiple bands based on mathematical expressions. All products specified as sources must have the same width and height, otherwise the operator will fail. The geo-coding information and metadata for the target product are taken from the first source product.

In our application we will apply the mathematical expression below to all input MERIS Level 1 Reduced Resolution products to detect algal blooms:

<pre>
l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)
</pre>
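To make the expression concrete, here is a small bash/awk sketch (not part of the tutorial's code; the helper name and the sample radiance values are invented for illustration) that evaluates it for a single pixel. The 27.524/72.570 factor linearly interpolates a baseline between radiance_8 and radiance_10 under band 9, so the expression measures the band 9 peak height above that baseline, offset by 100:

```shell
# Hypothetical helper evaluating the algal bloom expression for one pixel.
# Arguments: radiance_8 radiance_9 radiance_10 radiance_13
bloom_index() {
  awk -v r8="$1" -v r9="$2" -v r10="$3" -v r13="$4" 'BEGIN {
    if (r13 > 15) { print "0.00"; exit }           # bright pixel, masked to 0
    baseline = r8 + (r10 - r8) * 27.524 / 72.570   # baseline interpolated under band 9
    printf "%.2f\n", 100 + r9 - baseline           # peak height above baseline, plus 100 offset
  }'
}

bloom_index 60 70 50 10   # a valid water pixel
bloom_index 60 70 50 20   # radiance_13 > 15, masked out
```

The l1_flags.INVALID test has no equivalent in this sketch; in BEAM it masks invalid pixels to 0 before the radiance test is applied.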

h3. The Level 3 Binning Processor

The term binning refers to the process of distributing the contributions of Level 2 pixels, given in satellite coordinates, to a fixed Level 3 grid using a geographic reference system. In most cases a sinusoidal projection is used to realize a Level 3 grid comprising a fixed number of equal-area bins with global coverage. This is, for example, true for the SeaWiFS Level 3 products.

As long as the area of an input pixel is small compared to the area of a bin, simple binning is sufficient. In this case, the geodetic center coordinate of the Level 2 pixel is used to find the bin in the Level 3 grid whose area is intersected by this point. If the area of the contributing pixel is equal to or larger than the bin area, this simple binning will produce composites with insufficient accuracy, and visual artefacts such as Moiré effects will dominate the resulting datasets.
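The simple binning described above can be pictured with the following toy sketch (invented helper, not BEAM's actual implementation): the grid has num_rows latitude rows of equal height, and each row holds a number of equal-width bins proportional to the cosine of the row-centre latitude, which keeps the bin areas roughly equal:

```shell
# Toy sketch of simple binning: map a Level 2 pixel's geodetic centre
# (lat, lon in degrees) to a (row, column) bin of a sinusoidal Level 3 grid.
# num_rows is the number of latitude rows of the grid.
latlon_to_bin() {
  awk -v lat="$1" -v lon="$2" -v num_rows="$3" 'BEGIN {
    pi = atan2(0, -1)
    row = int((lat + 90.0) / 180.0 * num_rows)               # row 0 at the south pole
    if (row >= num_rows) row = num_rows - 1
    lat_c = (row + 0.5) * 180.0 / num_rows - 90.0            # row-centre latitude
    cols = int(2 * num_rows * cos(lat_c * pi / 180.0) + 0.5) # fewer bins near the poles
    if (cols < 1) cols = 1
    col = int((lon + 180.0) / 360.0 * cols)
    if (col >= cols) col = cols - 1
    print row, col
  }'
}

latlon_to_bin 45.0 0.0 2160   # 2160 rows of 1/12 degree roughly matches a 9.28 km cell size
```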

h3. The application workflow

Our application can be described as an activity diagram where the BandMaths operator is applied to all input MERIS Level 1 products, and its outputs are used as inputs to the Level 3 Binning processor. Since the BandMaths operator is an independent chore, each MERIS Level 1 product can be processed in parallel. The Level 3 Binning processor instead needs all the outputs to increment the values of the bins and generate the Level 3 product.

h2. BeamArithm implementation

h3. Tutorial approach

The goal of this tutorial is to get you acquainted with CIOP as an environment for implementing scientific applications.
The driver is to analyze the implemented application rather than have you install software, edit files, copy data, etc. All these steps have already been done!

h3. Tutorial requirements

You need access to a running Sandbox. The procedure to start a Sandbox is outside the scope of this tutorial.

h3. Tutorial files and artifacts installation on the Sandbox

Log on to your Sandbox. List the available tutorials with:

<pre>
[user@sb ~]$ ciop-tutorial list
</pre>

This will list the available tutorials:

<pre>
...
beam-arithm
...
</pre>

Get the tutorial description with:

<pre>
[user@sb ~]$ ciop-tutorial info beam-arithm
</pre>

This displays the tutorial information:

<pre>
TBW
</pre>

Install the tutorial:

<pre>
[user@sb ~]$ ciop-tutorial install beam-arithm
</pre>

This will take a few minutes. Once the installation has concluded, the BeamArithm application is ready to run.

> Tip: check the [[ciop-tutorial]] Command Line Interface (CLI) reference (UPCOMING)

h3. Execute the BeamArithm processing steps one by one

CIOP allows you to process the nodes of the workflow independently.

> It may sound obvious, but to run the second node of the workflow, the first node has to have run successfully at least once.

List the nodes of the workflow:

<pre>
[user@sb ~]$ ciop-simjob -n
</pre>

This will output:

<pre>
node_expression
node_binning
</pre>

where *node_expression* is the _BandMaths operator_ and *node_binning* is the _Level 3 Binning Processor_.

> Tip: check the [[ciop-simjob]] CLI reference

Execute the *node_expression* workflow node:

<pre>
[user@sb ~]$ ciop-simjob node_expression
</pre>

The CIOP framework will take the MERIS Level 1 products and execute the BEAM BandMaths operator, taking advantage of the Hadoop Map/Reduce cluster.
The output of the command above provides you with a tracking URL. Open it in your favorite browser.

> Tip: if you execute a node more than once, do not forget to use the @-f@ flag to remove the results of the previous execution

After a few minutes, the generated outputs will be listed as HDFS resources. You can inspect the results in the HDFS mount point of your sandbox with:

<pre>
[user@sb ~]$ ls -l /share/tmp/sandbox/node_expression/data
</pre>

> Tip: remember CIOP relies on the Hadoop HDFS distributed storage to manage input and output data. More information about the sandbox: [[Understanding the sandbox]]

Now, execute the *node_binning* node:

<pre>
[user@sb ~]$ ciop-simjob node_binning
</pre>

As for *node_expression*, after a few minutes the list of generated products is shown.

h3. Execute the BeamArithm processing workflow

While executing single nodes can be very practical for debugging the application's individual processing steps, CIOP also allows you to process the entire workflow automatically. To do so, run the command:

<pre>
[user@sb ~]$ ciop-simwf
</pre>

This will display an ASCII status output of the application workflow execution.

> Tip: check the [[ciop-simwf]] CLI reference

> Tip: each workflow run has a unique identifier, and the results of a run are never overwritten when executing the workflow again.

After a few minutes, the same outputs are generated and available as HDFS resources. As for the single node execution, these resources can be accessed on the HDFS mount point.
To do so, you need the run identifier. Obtain it with the command:

<pre>
[user@sb ~]$ ciop-simwf -l
</pre>

You should have a single run identifier. Use it to list the generated results:

<pre>
[user@sb ~]$ ls -l /share/tmp/sandbox/run/<run identifier>/node_binning/data
</pre>

This will list the same results as the single *node_binning* execution.

Now that you have seen CIOP manage the BeamArithm application, we will go through all the files composing the application.

h2. BeamArithm CIOP application breakdown

h3. The application descriptor file: application.xml

Each CIOP application is described by an application descriptor file. This file is always named application.xml and is found under /application in your sandbox file system.

This file contains two main sections:
* a section where the application job templates are described
* a section with the workflow definition, combining the job templates as a DAG
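In outline, the descriptor therefore looks like this (a skeleton for orientation only; the workflow element names shown are assumptions, since this tutorial only reproduces the job templates section):

<pre><code class="xml">
<application id="beam_arithm">
	<jobTemplates>
		<!-- one jobTemplate per processing step: expression, binning -->
	</jobTemplates>
	<workflow>
		<!-- the DAG: workflow nodes referencing the job templates and their sources -->
	</workflow>
</application>
</code></pre>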

> Tip: check the DAG definition here: [[CIOP terminology and definitions]]

> Tip: learn about the application descriptor file here: [[Understanding the sandbox]]

h4. Job templates

The listing below shows the job templates section of the application descriptor file.

<pre><code class="xml">
<jobTemplates>
	<!-- BEAM BandMaths operator job template -->
	<jobTemplate id="expression">
		<streamingExecutable>/application/expression/run</streamingExecutable>
		<defaultParameters>
			<parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
		</defaultParameters>
	</jobTemplate>
	<!-- BEAM Level 3 processor job template -->
	<jobTemplate id="binning">
		<streamingExecutable>/application/binning/run</streamingExecutable>
		<defaultParameters>
			<parameter id="cellsize">9.28</parameter>
			<parameter id="bandname">out</parameter>
			<parameter id="bitmask">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
			<parameter id="bbox">-180,-90,180,90</parameter>
			<parameter id="algorithm">Minimum/Maximum</parameter>
			<parameter id="outputname">binned</parameter>
			<parameter id="resampling">binning</parameter>
			<parameter id="palette">#MCI_Palette
color0=0,0,0
color1=0,0,154
color2=54,99,250
color3=110,201,136
color4=166,245,8
color5=222,224,0
color6=234,136,0
color7=245,47,0
color8=255,255,255
numPoints=9
sample0=98.19878118960284
sample1=98.64947122314665
sample2=99.10016125669047
sample3=99.5508512902343
sample4=100.0015413237781
sample5=100.4522313573219
sample6=100.90292139086574
sample7=101.35361142440956
sample8=101.80430145795337</parameter>
			<parameter id="band">1</parameter>
			<parameter id="tailor">true</parameter>
		</defaultParameters>
		<defaultJobconf>
			<property id="ciop.job.max.tasks">1</property>
		</defaultJobconf>
	</jobTemplate>
</jobTemplates>
</code></pre>

> Tip: check the validity of the application descriptor file with [[ciop-appcheck]]

> Tip: learn more about the application descriptor file here: [[Understanding the sandbox]]

Each *job template* has a mandatory element defining the streaming executable.

Example: the streaming executable for the job template *expression* is:

<pre><code class="xml">
<streamingExecutable>/application/expression/run</streamingExecutable>
</code></pre>

> Tip: do not forget to _chmod_ the streaming executable with executable rights, e.g. @chmod 755 /application/expression/run@

Both job templates, expression and binning, define a set of default parameters.

Example: the job template expression defines a default value for the parameter *expression*:

<pre><code class="xml">
<defaultParameters>
	<parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
</defaultParameters>
</code></pre>

The job template *binning* also defines the default job configuration.

As explained above, the job template *binning* does a temporal and spatial aggregation of the *expression* job outputs. The *binning* job will thus be a single job instance (the *expression* job instead exploits the parallelism offered by CIOP).
To express such a job configuration we've added the XML tags:

<pre><code class="xml">
<defaultJobconf>
	<property id="ciop.job.max.tasks">1</property>
</defaultJobconf>
</code></pre>

> Tip: for a list of possible job default properties read [[Application descriptor]]

h4. Streaming executable for the expression job template

It is important to keep in mind that job inputs and outputs are text references (e.g. to data).
Indeed, when a job processes input, it reads the references line by line: a child job reads via _stdin_ the outputs of its parent job (or any other source, e.g. catalogued data).

> Tip: if you need to combine the results of a parent job with other values (e.g. bounding boxes to process over a set of input products) you will have to add a simple job that combines the outputs and values
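Such a combining job's streaming executable could be as simple as the sketch below (hypothetical; the function name and bounding box values are invented, and in a real job the values would come from ciop-getparam). It pairs every input reference read from _stdin_ with every bounding box, so the child job receives one line per combination:

```shell
# Hypothetical combiner node: pair each input reference read from stdin
# with each bounding box, emitting one line per combination.
combine() {
  local bboxes="-10,35,0,45 0,35,10,45"   # invented example values
  local inputfile bbox
  while read inputfile; do
    for bbox in $bboxes; do
      echo "$inputfile;$bbox"
    done
  done
}

printf 'product_A\nproduct_B\n' | combine
```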

The job template *expression* streaming executable is a Bourne Again SHell (bash) script:

<pre>
#!/bin/bash

# source the ciop functions (e.g. ciop-log)
source ${ciop_job_include}

export BEAM_HOME=$_CIOP_APPLICATION_PATH/share/beam-4.11
export PATH=$BEAM_HOME/bin:$PATH

# define the exit codes
SUCCESS=0
ERR_NOINPUT=1
ERR_BEAM=2
ERR_NOPARAMS=5

# add a trap to exit gracefully
function cleanExit ()
{
   local retval=$?
   local msg=""
   case "$retval" in
     $SUCCESS)      msg="Processing successfully concluded";;
     $ERR_NOPARAMS) msg="Expression not defined";;
     $ERR_BEAM)     msg="Beam failed to process product $product (Java returned $res).";;
     *)             msg="Unknown error";;
   esac
   [ "$retval" != "0" ] && ciop-log "ERROR" "Error $retval - $msg, processing aborted" || ciop-log "INFO" "$msg"
   exit $retval
}
trap cleanExit EXIT

# create the output folder to store the output products
mkdir -p $TMPDIR/output
export OUTPUTDIR=$TMPDIR/output

# retrieve the parameter value from the workflow or the job default value
expression="`ciop-getparam expression`"

# run a check on the expression value, it can't be empty
[ -z "$expression" ] && exit $ERR_NOPARAMS

# loop and process all MERIS products
while read inputfile
do
	# report activity in log
	ciop-log "INFO" "Retrieving $inputfile from storage"

	# retrieve the remote product to the local temporary folder
	retrieved=`ciop-copy -o $TMPDIR $inputfile`

	# check if the file was retrieved
	[ "$?" == "0" -a -e "$retrieved" ] || exit $ERR_NOINPUT

	# report activity
	ciop-log "INFO" "Retrieved `basename $retrieved`, moving on to expression"
	outputname=`basename $retrieved`

	BEAM_REQUEST=$TMPDIR/beam_request.xml
cat << EOF > $BEAM_REQUEST
<?xml version="1.0" encoding="UTF-8"?>
<graph>
  <version>1.0</version>
  <node id="1">
    <operator>Read</operator>
      <parameters>
        <file>$retrieved</file>
      </parameters>
  </node>
  <node id="2">
    <operator>BandMaths</operator>
    <sources>
      <source>1</source>
    </sources>
    <parameters>
      <targetBands>
        <targetBand>
          <name>out</name>
          <expression>$expression</expression>
          <description>Processed Band</description>
          <type>float32</type>
        </targetBand>
      </targetBands>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources>
       <source>2</source>
    </sources>
    <parameters>
      <file>$OUTPUTDIR/$outputname</file>
   </parameters>
  </node>
</graph>
EOF
   gpt.sh $BEAM_REQUEST &> /dev/null
   res=$?
   [ $res != 0 ] && exit $ERR_BEAM

	cd $OUTPUTDIR

	outputname=`basename $retrieved`.dim
	outputfolder=`basename $retrieved`.data

	tar cfz $outputname.tgz $outputname $outputfolder &> /dev/null
	cd - &> /dev/null

	ciop-log "INFO" "Publishing $outputname and $outputfolder"
	ciop-publish $OUTPUTDIR/$outputname.tgz

	# cleanup
	rm -fr $retrieved $OUTPUTDIR/$outputname $OUTPUTDIR/$outputfolder $OUTPUTDIR/$outputname.tgz

done

exit 0
</pre>

The first line tells Linux to use the bash interpreter to run this script.

> Tip: always set the interpreter; there is no other way to tell CIOP how to execute the streaming executable

The next block is mandatory, as it makes available the CIOP functions (ciop-log, ciop-getparam, etc.) needed by the streaming executable script:

<pre>
# source the ciop functions (e.g. ciop-log)
source ${ciop_job_include}
</pre>

After that, we set a few environment variables needed for BEAM to work:

<pre>
export BEAM_HOME=$_CIOP_APPLICATION_PATH/share/beam-4.11
export PATH=$BEAM_HOME/bin:$PATH
</pre>

Next, we set up the error handling. Although this block is not mandatory, it is good practice to define clear error codes and use a _trap_ function:

<pre>
# define the exit codes
SUCCESS=0
ERR_NOINPUT=1
ERR_BEAM=2
ERR_NOPARAMS=5

# add a trap to exit gracefully
function cleanExit ()
{
   local retval=$?
   local msg=""
   case "$retval" in
     $SUCCESS)      msg="Processing successfully concluded";;
     $ERR_NOPARAMS) msg="Expression not defined";;
     $ERR_BEAM)     msg="Beam failed to process product $product (Java returned $res).";;
     *)             msg="Unknown error";;
   esac
   [ "$retval" != "0" ] && ciop-log "ERROR" "Error $retval - $msg, processing aborted" || ciop-log "INFO" "$msg"
   exit $retval
}
trap cleanExit EXIT
</pre>

The CIOP framework provides a temporary location unique to the job/parameter execution (very important if more than one processing node is used).
In it, we define where our results will be written:

<pre>
# create the output folder to store the output products
mkdir -p $TMPDIR/output
export OUTPUTDIR=$TMPDIR/output
</pre>

Then, we read the processing parameter using ciop-getparam and do a simple check on its value (it cannot be empty):

<pre>
# retrieve the parameter value from the workflow or the job default value
expression="`ciop-getparam expression`"

# run a check on the expression value, it can't be empty
[ -z "$expression" ] && exit $ERR_NOPARAMS
</pre>

At this point we loop over the input MERIS Level 1 products and copy them locally to the TMPDIR location:

<pre>
# loop and process all MERIS products
while read inputfile
do
	# report activity in log
	ciop-log "INFO" "Retrieving $inputfile from storage"

	# retrieve the remote product to the local temporary folder
	retrieved=`ciop-copy -o $TMPDIR $inputfile`

	# check if the file was retrieved
	[ "$?" == "0" -a -e "$retrieved" ] || exit $ERR_NOINPUT

	...
done
</pre>

> Tip: always report activity using ciop-log; if you don't report activity, CIOP will kill the process when the walltime is reached

We finally apply the BandMaths operator to the retrieved MERIS Level 1 product:

<pre>
# loop and process all MERIS products
while read inputfile
do
	...

	# report activity
	ciop-log "INFO" "Retrieved `basename $retrieved`, moving on to expression"
	outputname=`basename $retrieved`

	BEAM_REQUEST=$TMPDIR/beam_request.xml
cat << EOF > $BEAM_REQUEST
<?xml version="1.0" encoding="UTF-8"?>
<graph>
  <version>1.0</version>
  <node id="1">
    <operator>Read</operator>
      <parameters>
        <file>$retrieved</file>
      </parameters>
  </node>
  <node id="2">
    <operator>BandMaths</operator>
    <sources>
      <source>1</source>
    </sources>
    <parameters>
      <targetBands>
        <targetBand>
          <name>out</name>
          <expression>$expression</expression>
          <description>Processed Band</description>
          <type>float32</type>
        </targetBand>
      </targetBands>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources>
       <source>2</source>
    </sources>
    <parameters>
      <file>$OUTPUTDIR/$outputname</file>
   </parameters>
  </node>
</graph>
EOF
   gpt.sh $BEAM_REQUEST &> /dev/null
   res=$?
   [ $res != 0 ] && exit $ERR_BEAM

	...

done
</pre>

At this stage, the produced results are packaged and published in the CIOP distributed filesystem with ciop-publish, making them available to the *binning* job:

<pre>
# loop and process all MERIS products
while read inputfile
do
	...

	cd $OUTPUTDIR

	outputname=`basename $retrieved`.dim
	outputfolder=`basename $retrieved`.data

	tar cfz $outputname.tgz $outputname $outputfolder &> /dev/null
	cd - &> /dev/null

	ciop-log "INFO" "Publishing $outputname and $outputfolder"
	ciop-publish $OUTPUTDIR/$outputname.tgz

	# cleanup
	rm -fr $retrieved $OUTPUTDIR/$outputname $OUTPUTDIR/$outputfolder $OUTPUTDIR/$outputname.tgz
done
</pre>

> Tip: ciop-publish does more than a simple copy of the data; it also "echoes" the destination URLs, and these strings are used as the input of the *binning* job
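The mechanism can be pictured as a pipe between the two jobs (a deliberate simplification, not CIOP's actual implementation; the file name is invented):

```shell
# Simplified picture: the URL echoed by ciop-publish in the parent
# (expression) job becomes a stdin line of the child (binning) job.
parent() { echo "hdfs:///share/tmp/sandbox/node_expression/data/product_A.tgz"; }
child()  { while read ref; do echo "binning input: $ref"; done; }
parent | child
```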

This concludes the tutorial.