Herve Caumont, 2013-06-20 10:50


BEAM Arithm tutorial

BEAM - Basic ERS & Envisat (A)ATSR and MERIS

Introduction

BEAM is an open-source toolbox and development platform for viewing, analyzing and processing of remote sensing raster data. Originally developed to facilitate the utilization of image data from Envisat's optical instruments, BEAM now supports a growing number of other raster data formats such as GeoTIFF and NetCDF as well as data formats of other EO sensors such as MODIS, AVHRR, AVNIR, PRISM and CHRIS/Proba. Various data and algorithms are supported by dedicated extension plug-ins.

BEAM Graph Processing Tool (gpt) is a tool used to execute BEAM raster data operators in batch-mode. The operators can be used stand-alone or combined as a directed acyclic graph (DAG). Processing graphs are represented using XML.

Our tutorial uses the BandMaths operator and the Level 3 Binning Processor applied to Envisat MERIS Level 1 Reduced Resolution products to create an application to represent algal blooms.

Definition (source Wikipedia): An algal bloom is a rapid increase or accumulation in the population of algae (typically microscopic) in an aquatic system. Algal blooms may occur in freshwater as well as marine environments. Typically, only one or a small number of phytoplankton species are involved, and some blooms may be recognized by discoloration of the water resulting from the high density of pigmented cells. Although there is no officially recognized threshold level, algae can be considered to be blooming at concentrations of hundreds to thousands of cells per milliliter, depending on the severity. Algal bloom concentrations may reach millions of cells per milliliter. Algal blooms are often green, but they can also be other colors such as yellow-brown or red, depending on the species of algae.

The application

As introduced above, our application uses the BandMaths operator and the Level 3 Binning processor.

The BandMaths Operator

The BandMaths operator can be used to create a product with multiple bands based on mathematical expressions. All products specified as sources must have the same width and height, otherwise the operator will fail. The geo-coding information and metadata for the target product are taken from the first source product.

In our application we will apply the mathematical expression below to all input MERIS Level 1 Reduced Resolution products to detect the algal blooms:

l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)
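
The expression is easier to follow once unpacked. The sketch below is only an illustration and not part of the application: it evaluates the same formula for a single pixel with awk, assuming the conditional operator is read right to left (invalid pixels give 0, pixels with radiance_13 above 15 give 0, all other pixels give 100 plus the difference between radiance_9 and a linear interpolation between radiance_8 and radiance_10). The helper name eval_pixel and the sample radiances are made up for the example.

```shell
#!/bin/bash
# eval_pixel INVALID R13 R8 R9 R10
# Hypothetical helper: evaluates the tutorial expression for one pixel.
eval_pixel() {
  awk -v invalid="$1" -v r13="$2" -v r8="$3" -v r9="$4" -v r10="$5" 'BEGIN {
    if (invalid != 0) { print "0.000"; exit }      # l1_flags.INVALID ? 0
    if (r13 > 15)     { print "0.000"; exit }      # radiance_13 > 15 ? 0
    baseline = r8 + (r10 - r8) * 27.524 / 72.570   # interpolation between radiance_8 and radiance_10
    printf "%.3f\n", 100 + r9 - baseline
  }'
}

eval_pixel 0 10 50 52 50   # valid pixel with a flat baseline
eval_pixel 1 10 50 52 50   # pixel flagged as invalid
eval_pixel 0 20 50 52 50   # radiance_13 above the threshold
```

With a flat baseline (radiance_8 equal to radiance_10) the result is simply 100 + radiance_9 - radiance_8, which makes the role of the constant offset easy to see.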

The Level 3 Binning Processor

The term binning refers to the process of distributing the contributions of Level 2 pixels in satellite coordinates to a fixed Level 3 grid using a geographic reference system. In most cases a sinusoidal projection is used to realize a Level 3 grid comprising a fixed number of equal area bins with global coverage. This is for example true for the SeaWiFS Level 3 products.

As long as the area of an input pixel is small compared to the area of a bin, a simple binning is sufficient. In this case, the geodetic center coordinate of the Level 2 pixel is used to find the bin in the Level 3 grid whose area is intersected by this point. If the area of the contributing pixel is equal to or even larger than the bin area, this simple binning will produce composites with insufficient accuracy, and visual artefacts such as Moiré effects will dominate the resulting datasets.
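
The lookup used by simple binning can be sketched in a few lines. The example below assumes a sinusoidal equal-area grid in the style of the SeaWiFS Level 3 products, where each latitude row holds a number of bins proportional to the cosine of its center latitude. It is an illustration only, not BEAM's implementation; the helper name find_bin and the grid size of 2160 rows (rows of roughly 9.28 km) are assumptions.

```shell
#!/bin/bash
# find_bin NUMROWS LAT LON -> prints "row col" of the bin containing the point.
# Sketch of a sinusoidal equal-area grid lookup; not BEAM's actual code.
find_bin() {
  awk -v nrows="$1" -v lat="$2" -v lon="$3" 'BEGIN {
    pi = atan2(0, -1)
    row = int((lat + 90) / 180 * nrows)                   # latitude row, counted from the south pole
    if (row >= nrows) row = nrows - 1                     # lat = +90 falls in the last row
    latc = (row + 0.5) * 180 / nrows - 90                 # center latitude of that row
    nbins = int(2 * nrows * cos(latc * pi / 180) + 0.5)   # fewer bins towards the poles
    if (nbins < 1) nbins = 1
    col = int((lon + 180) / 360 * nbins)
    if (col >= nbins) col = nbins - 1                     # lon = +180 falls in the last bin
    print row, col
  }'
}

find_bin 2160 0 0      # a point on the equator, assumed 2160-row grid
find_bin 2 10 -170     # tiny 2-row grid, handy for checking by hand
```

Only the pixel center is tested here, which is exactly why this approach breaks down once a pixel covers as much area as a bin.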

The application workflow

Our application can be described as an activity diagram in which the BandMaths operator is applied to all input MERIS Level 1 products, and its outputs are used as inputs to the Level 3 Binning processor. Since each BandMaths run is independent of the others, the MERIS Level 1 products can be processed in parallel. The Level 3 Binning processor, in contrast, needs all the outputs to increment the values of the bins and generate the Level 3 product.
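
The shape of this workflow (many independent tasks followed by one aggregating task) can be mimicked in plain bash. The sketch below is not how CIOP schedules jobs; the functions expression_step and binning_step and the product names are hypothetical stand-ins that only echo text.

```shell
#!/bin/bash
# Hypothetical stand-ins for the two processing steps; they only echo text.
expression_step() { echo "expr($1)"; }
binning_step()    { sort | paste -sd, -; }

workdir=$(mktemp -d)

# Fan out: one background task per input product.
i=0
for product in MER_RR_A MER_RR_B MER_RR_C; do
  expression_step "$product" > "$workdir/part-$i" &
  i=$((i + 1))
done
wait   # block until every expression task has finished

# Fan in: a single binning task consumes all intermediate outputs.
result=$(cat "$workdir"/part-* | binning_step)
echo "$result"
rm -rf "$workdir"
```

The wait call marks the synchronization point: the aggregating step cannot start until every per-product task has delivered its output.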

BeamArithm implementation

Tutorial approach

The goal of this tutorial is to get you acquainted with CIOP as an environment to implement scientific applications.
The aim is to analyze the implemented application rather than have you install software, edit files, copy data etc. All these steps have already been done!

Tutorial requirements

You need access to a running Sandbox. The procedure to start a Sandbox is outside the scope of this tutorial.

Tutorial files and artifacts installation on the Sandbox

Log on to your Sandbox. List the available tutorials with:

[user@sb ~]$ ciop-tutorial list 

This will list the available tutorials:

...
beam-arithm
...

Get the tutorial description with:

[user@sb ~]$ ciop-tutorial info  beam-arithm

This displays the tutorial information:

TBW

Install the tutorial:

[user@sb ~]$ ciop-tutorial install beam-arithm

This will take a few minutes. Once the installation is concluded you get the BeamArithm application ready to run.

Tip: check the ciop-tutorial Command Line Interface (CLI) reference (UPCOMING)

Execute the BeamArithm processing steps one by one

CIOP allows you to process the nodes of the workflow independently.

It may sound obvious, but to run the second node of the workflow, the first node must have run successfully at least once.

List the nodes of the workflow:

[user@sb ~]$ ciop-simjob -n

This will output:

node_expression
node_binning

where node_expression is the BandMaths operator and node_binning is the Level 3 Binning Processor.

Tip: check the ciop-simjob CLI reference

Execute the node_expression workflow node:

[user@sb ~]$ ciop-simjob node_expression

The CIOP framework will take the MERIS Level 1 products and execute the BEAM BandMaths operator taking advantage of the Hadoop Map/Reduce cluster.
The output of the command above provides you with a tracking URL. Open it in your favorite browser.

Tip: if you execute a node more than once, do not forget to use the flag -f to remove the results of the previous execution

After a few minutes, the generated outputs will be listed as HDFS resources. You can inspect the results in the HDFS mount point of your sandbox with:

[user@sb ~]$ ls -l /share/tmp/sandbox/node_expression/data

Tip: remember CIOP relies on the Hadoop HDFS distributed storage to manage input and output data. More information about the sandbox: Understanding the sandbox

Now, execute the node_binning node:

[user@sb ~]$ ciop-simjob node_binning

As for the node_expression, after a few minutes, the list of the generated products is shown.
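
When repeating these steps during debugging, a small wrapper can run the nodes in order and stop at the first failure. This is a sketch under assumptions: the helper run_nodes is hypothetical, and placing the -f flag before the node name is a guess at the ciop-simjob syntax, so check the CLI reference before relying on it.

```shell
#!/bin/bash
# run_nodes RUNNER NODE... : run workflow nodes in order, stopping at the first failure.
# RUNNER is the command to call; on a Sandbox this would be ciop-simjob.
run_nodes() {
  local runner=$1 node
  shift
  for node in "$@"; do
    # -f removes the results of a previous execution of the node;
    # its position before the node name is an assumption.
    "$runner" -f "$node" || { echo "node $node failed, stopping" >&2; return 1; }
  done
}

# On a Sandbox (not run here):
#   run_nodes ciop-simjob node_expression node_binning
```

Passing the runner as an argument keeps the helper testable outside a Sandbox, for instance with a function that only echoes its arguments.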

Execute the BeamArithm processing workflow

While executing single nodes can be very practical for debugging the application's individual processing steps, CIOP also allows you to process the entire workflow automatically. To do so, run the command:

[user@sb ~]$ ciop-simwf

This will display an ASCII status output of the application workflow execution.

Tip: check the ciop-simwf CLI reference

Tip: each workflow run has a unique identifier and the results of a run are never overwritten when executing the workflow again.

After a few minutes, the same outputs are generated and available as HDFS resources. As for the single node execution, these resources can be accessed on the HDFS mount point.
To do so, you need the run identifier. Obtain it with the command:

[user@sb ~]$ ciop-simwf -l

You should have a single run identifier. Use it to list the generated results:

[user@sb ~]$ ls -l /share/tmp/sandbox/run/<run identifier>/node_binning/data

This will list the same results as the single node_binning execution.
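
Since each run gets its own folder, it can help to build the results path from the run identifier instead of typing it. The helper below is a hypothetical convenience that only reproduces the path layout of the ls command above; the run identifier shown is a placeholder.

```shell
#!/bin/bash
# results_dir RUN_ID NODE -> path of a node's results for a given workflow run.
results_dir() {
  echo "/share/tmp/sandbox/run/$1/$2/data"
}

# Hypothetical usage on a Sandbox, with RUN_ID copied from the output of ciop-simwf -l:
#   ls -l "$(results_dir "$RUN_ID" node_binning)"
results_dir example-run-id node_binning
```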

Now that you have seen CIOP manage the BeamArithm application, we will go through all files composing the application.

BeamArithm CIOP application breakdown

The application descriptor file: application.xml

Each CIOP application is described with an application descriptor file. This file is always named application.xml and is found in the /application file system in your sandbox.

This file contains two main sections:
  • a section where the application job templates are described
  • a section with the workflow definition combining the job templates as a DAG

Tip: check the DAG definition here: CIOP terminology and definitions

Tip: learn about the application descriptor file here: Understanding the sandbox

Job templates

The listing below shows the job templates section of the application descriptor file.

<jobTemplates>
        <!-- BEAM BandMaths operator job template  -->
        <jobTemplate id="expression">
            <streamingExecutable>/application/expression/run</streamingExecutable>
            <defaultParameters>                        
                <parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
            </defaultParameters>
        </jobTemplate>
        <!-- BEAM Level 3 processor job template  -->
        <jobTemplate id="binning">
            <streamingExecutable>/application/binning/run</streamingExecutable>
            <defaultParameters>                        
                <parameter id="cellsize">9.28</parameter>
                <parameter id="bandname">out</parameter>
                <parameter id="bitmask">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
                <parameter id="bbox">-180,-90,180,90</parameter>
                <parameter id="algorithm">Minimum/Maximum</parameter>
                <parameter id="outputname">binned</parameter>
                <parameter id="resampling">binning</parameter>
                <parameter id="palette">#MCI_Palette
color0=0,0,0
color1=0,0,154
color2=54,99,250
color3=110,201,136
color4=166,245,8
color5=222,224,0
color6=234,136,0
color7=245,47,0
color8=255,255,255
numPoints=9
sample0=98.19878118960284
sample1=98.64947122314665
sample2=99.10016125669047
sample3=99.5508512902343
sample4=100.0015413237781
sample5=100.4522313573219
sample6=100.90292139086574
sample7=101.35361142440956
sample8=101.80430145795337</parameter>
                <parameter id="band">1</parameter>
                <parameter id="tailor">true</parameter>
            </defaultParameters>
            <defaultJobconf>
                    <property id="ciop.job.max.tasks">1</property>
                </defaultJobconf>
        </jobTemplate>
    </jobTemplates>

Tip: check the validity of the application descriptor file with ciop-appcheck

Tip: learn more about the application descriptor file here: Understanding the sandbox
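
For a quick overview of a descriptor, the job template ids can be pulled out with sed. This is a line-oriented sketch rather than a real XML parser: it assumes each jobTemplate tag sits on its own line, as in the listing above. The helper name list_job_templates and the shortened stand-in descriptor are made up so the example runs outside a Sandbox.

```shell
#!/bin/bash
# list_job_templates FILE: print the id of every <jobTemplate> in a descriptor.
list_job_templates() {
  sed -n 's/.*<jobTemplate id="\([^"]*\)".*/\1/p' "$1"
}

# Small stand-in descriptor so the sketch runs outside a Sandbox.
descriptor=$(mktemp)
cat > "$descriptor" << 'EOF'
<jobTemplates>
    <jobTemplate id="expression">
        <streamingExecutable>/application/expression/run</streamingExecutable>
    </jobTemplate>
    <jobTemplate id="binning">
        <streamingExecutable>/application/binning/run</streamingExecutable>
    </jobTemplate>
</jobTemplates>
EOF

templates=$(list_job_templates "$descriptor")
echo "$templates"
rm -f "$descriptor"
```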

Each job template has a mandatory element defining the streaming executable.

Example: the streaming executable for the job template expression is:

<streamingExecutable>/application/expression/run</streamingExecutable>

Tip: do not forget to chmod the streaming executable with executable rights, e.g. chmod 755 /application/expression/run
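
A small guard can make this check repeatable. The helper ensure_executable below is a hypothetical sketch; it is demonstrated on a temporary file, whereas on a Sandbox the argument would be /application/expression/run.

```shell
#!/bin/bash
# ensure_executable FILE: add execute permission if it is missing.
ensure_executable() {
  [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
  [ -x "$1" ] || chmod 755 "$1"
}

# Demonstration on a temporary file.
demo_file=$(mktemp)
ensure_executable "$demo_file"
[ -x "$demo_file" ] && echo "executable"
rm -f "$demo_file"
```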

Both job templates, expression and binning, define a set of default parameters.

Example: the job template expression defines a default expression for the parameter expression:

<defaultParameters>                        
    <parameter id="expression">l1_flags.INVALID?0:radiance_13>15?0:100+radiance_9-(radiance_8+(radiance_10-radiance_8)*27.524/72.570)</parameter>
</defaultParameters>

The job template binning defines the default job configuration.

As explained above, the job template binning does a temporal and spatial aggregation of the expression job outputs. The binning job will thus be a single job instance (the expression job instead exploits the parallelism offered by CIOP).
To express such a job configuration we've added the XML tags:

<defaultJobconf>
    <property id="ciop.job.max.tasks">1</property>
</defaultJobconf>

Tip: for a list of possible job default properties, read Application descriptor

Streaming executable for expression job template

It is important to keep in mind that job inputs and outputs are text references (e.g. to data).
Indeed, when a job processes its input, it actually reads references line by line: a child job reads via stdin the outputs of its parent job (or of any other source, e.g. catalogued data).

Tip: if you need to combine results of a parent job with other values (e.g. bounding boxes to process over a set of input products) you will have to add a simple job that combines the outputs and values
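
The hand-over between jobs can be pictured as a pipe that carries one reference per line. The sketch below imitates this with two hypothetical functions, parent_job and child_job; the hdfs:// URLs are invented for the illustration and CIOP itself, not a shell pipe, connects the real jobs.

```shell
#!/bin/bash
# Hypothetical parent job: prints one reference per published result.
parent_job() {
  echo "hdfs://example/node_expression/data/product_A.tgz"
  echo "hdfs://example/node_expression/data/product_B.tgz"
}

# Hypothetical child job: reads the references from stdin, one per line.
child_job() {
  local ref count=0
  while read -r ref; do
    [ -z "$ref" ] && continue          # ignore empty lines
    count=$((count + 1))
    echo "input $count: $(basename "$ref")"
  done
}

parent_job | child_job
```

The child never sees the data itself at this point, only the strings; fetching the data behind each reference is a separate step inside the job.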

The job template expression streaming executable is a Bourne Again SHell (bash) script:

#!/bin/bash

# source the ciop functions (e.g. ciop-log)
source ${ciop_job_include}

export BEAM_HOME=$_CIOP_APPLICATION_PATH/share/beam-4.11
export PATH=$BEAM_HOME/bin:$PATH

# define the exit codes
SUCCESS=0
ERR_NOINPUT=1
ERR_BEAM=2
ERR_NOPARAMS=5

# add a trap to exit gracefully
function cleanExit ()
{
   local retval=$?
   local msg="" 
   case "$retval" in
     $SUCCESS)      msg="Processing successfully concluded";;
     $ERR_NOINPUT)  msg="Unable to retrieve an input product";;
     $ERR_NOPARAMS) msg="Expression not defined";;
     $ERR_BEAM)     msg="Beam failed to process product $inputfile (Java returned $res).";;
     *)             msg="Unknown error";;
   esac
   [ "$retval" != "0" ] && ciop-log "ERROR" "Error $retval - $msg, processing aborted" || ciop-log "INFO" "$msg" 
   exit $retval
}
trap cleanExit EXIT

# create the output folder to store the output products
mkdir -p $TMPDIR/output
export OUTPUTDIR=$TMPDIR/output

# retrieve the parameters value from workflow or job default value
expression="`ciop-getparam expression`" 

# run a check on the expression value, it can't be empty
[ -z "$expression" ] && exit $ERR_NOPARAMS

# loop and process all MERIS products
while read inputfile 
do
    # report activity in log
    ciop-log "INFO" "Retrieving $inputfile from storage" 

    # retrieve the remote MERIS product to the local temporary folder
    retrieved=`ciop-copy -o $TMPDIR $inputfile`

    # check if the file was retrieved
    [ "$?" == "0" -a -e "$retrieved" ] || exit $ERR_NOINPUT

    # report activity
    ciop-log "INFO" "Retrieved `basename $retrieved`, moving on to expression" 
    outputname=`basename $retrieved`

    BEAM_REQUEST=$TMPDIR/beam_request.xml
cat << EOF > $BEAM_REQUEST
<?xml version="1.0" encoding="UTF-8"?>
<graph>
  <version>1.0</version>
  <node id="1">
    <operator>Read</operator>
      <parameters>
        <file>$retrieved</file>
      </parameters>
  </node>
  <node id="2">
    <operator>BandMaths</operator>
    <sources>
      <source>1</source>
    </sources>
    <parameters>
      <targetBands>
        <targetBand>
          <name>out</name>
          <expression>$expression</expression>
          <description>Processed Band</description>
          <type>float32</type>
        </targetBand>
      </targetBands>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources>
       <source>2</source>
    </sources>
    <parameters>
      <file>$OUTPUTDIR/$outputname</file>
   </parameters>
  </node>
</graph>
EOF
   gpt.sh $BEAM_REQUEST &> /dev/null
   res=$?
   [ $res != 0 ] && exit $ERR_BEAM

    cd $OUTPUTDIR

    # BEAM writes a .dim header file and a .data folder
    outputname=`basename $retrieved`.dim
    outputfolder=`basename $retrieved`.data

    tar cfz $outputname.tgz $outputname $outputfolder &> /dev/null
    cd - &> /dev/null

    ciop-log "INFO" "Publishing $outputname and $outputfolder"
    ciop-publish $OUTPUTDIR/$outputname.tgz

    # cleanup
    rm -fr $retrieved $OUTPUTDIR/$outputname $OUTPUTDIR/$outputfolder $OUTPUTDIR/$outputname.tgz

done

exit 0

The first line tells Linux to use the bash interpreter to run this script.

Tip: always set the interpreter, there is no other way to tell CIOP how to execute the streaming executable

The next block is mandatory, as it loads the CIOP functions (ciop-log, ciop-getparam, etc.) needed to write the streaming executable script:

# source the ciop functions (e.g. ciop-log)
source ${ciop_job_include}

After that, we set a few environment variables needed to have BEAM working:

export BEAM_HOME=$_CIOP_APPLICATION_PATH/share/beam-4.11
export PATH=$BEAM_HOME/bin:$PATH

After that, we set up the error handling. Although this block is not mandatory, it is good practice to set clear error codes and use a trap function:

# define the exit codes
SUCCESS=0
ERR_NOINPUT=1
ERR_BEAM=2
ERR_NOPARAMS=5

# add a trap to exit gracefully
function cleanExit ()
{
   local retval=$?
   local msg="" 
   case "$retval" in
     $SUCCESS)      msg="Processing successfully concluded";;
     $ERR_NOINPUT)  msg="Unable to retrieve an input product";;
     $ERR_NOPARAMS) msg="Expression not defined";;
     $ERR_BEAM)     msg="Beam failed to process product $inputfile (Java returned $res).";;
     *)             msg="Unknown error";;
   esac
   [ "$retval" != "0" ] && ciop-log "ERROR" "Error $retval - $msg, processing aborted" || ciop-log "INFO" "$msg" 
   exit $retval
}
trap cleanExit EXIT
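
To see the mechanism in isolation, the sketch below reproduces the same pattern without any CIOP dependency. The function demo_job and its messages are hypothetical; it runs in a subshell so that exit ends the demo rather than the calling shell.

```shell
#!/bin/bash
# The "( ... )" body makes the function run in a subshell.
demo_job() (
  SUCCESS=0
  ERR_NOPARAMS=5
  report() {
    local retval=$?            # exit code that triggered the trap
    case "$retval" in
      "$SUCCESS")      echo "INFO: processing concluded" ;;
      "$ERR_NOPARAMS") echo "ERROR $retval: parameter not defined" ;;
      *)               echo "ERROR $retval: unknown error" ;;
    esac
    exit "$retval"
  }
  trap report EXIT
  [ -z "$1" ] && exit "$ERR_NOPARAMS"
  exit "$SUCCESS"
)

status=0
demo_job "" || status=$?
echo "exit status: $status"
demo_job "value"
```

Whatever path leads to exit, the trap runs once and sees the exit code, so the logging logic lives in a single place.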

The CIOP framework provides a temporary location unique to the job/parameter execution (very important if more than one processing node is used).
In it, we'll define where our results will be written:

# create the output folder to store the output products
mkdir -p $TMPDIR/output
export OUTPUTDIR=$TMPDIR/output

Then, we read the processing parameters using ciop-getparam and do a simple check on the value (it cannot be empty):

# retrieve the parameters value from workflow or job default value
expression="`ciop-getparam expression`" 

# run a check on the expression value, it can't be empty
[ -z "$expression" ] && exit $ERR_NOPARAMS

At this point we loop over the input MERIS Level 1 products and copy them locally to the TMPDIR location:

# loop and process all MERIS products
while read inputfile 
do
    # report activity in log
    ciop-log "INFO" "Retrieving $inputfile from storage" 

    # retrieve the remote MERIS product to the local temporary folder
    retrieved=`ciop-copy -o $TMPDIR $inputfile`

    # check if the file was retrieved
    [ "$?" == "0" -a -e "$retrieved" ] || exit $ERR_NOINPUT

    ...
done    

Tip: always report activity using ciop-log; if you don't, CIOP will kill the process once the walltime is reached

We finally apply the BandMaths operator to the retrieved MERIS Level 1 product:

# loop and process all MERIS products
while read inputfile 
do
    ...

    # report activity
    ciop-log "INFO" "Retrieved `basename $retrieved`, moving on to expression" 
    outputname=`basename $retrieved`

    BEAM_REQUEST=$TMPDIR/beam_request.xml
cat << EOF > $BEAM_REQUEST
<?xml version="1.0" encoding="UTF-8"?>
<graph>
  <version>1.0</version>
  <node id="1">
    <operator>Read</operator>
      <parameters>
        <file>$retrieved</file>
      </parameters>
  </node>
  <node id="2">
    <operator>BandMaths</operator>
    <sources>
      <source>1</source>
    </sources>
    <parameters>
      <targetBands>
        <targetBand>
          <name>out</name>
          <expression>$expression</expression>
          <description>Processed Band</description>
          <type>float32</type>
        </targetBand>
      </targetBands>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources>
       <source>2</source>
    </sources>
    <parameters>
      <file>$OUTPUTDIR/$outputname</file>
   </parameters>
  </node>
</graph>
EOF
   gpt.sh $BEAM_REQUEST &> /dev/null
   res=$?
   [ $res != 0 ] && exit $ERR_BEAM

   ...

done
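
One detail worth noting: the heredoc pastes $expression into the XML as is. The default expression only contains >, which XML tolerates in element text, but an expression with < or & would produce an invalid graph file. A possible safeguard, assuming gpt reads the graph with a standard XML parser that decodes entities again, is to escape the value first. The helper xml_escape and the sample string below are made up for the illustration.

```shell
#!/bin/bash
# xml_escape: read text on stdin and escape the characters XML reserves.
# The ampersand is handled first so that the entities added afterwards stay intact.
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

expression='radiance_13>15 & radiance_8<radiance_10'   # made-up string for the demo
escaped=$(printf '%s\n' "$expression" | xml_escape)
echo "<expression>$escaped</expression>"
```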

At this stage the produced results are packaged and published with ciop-publish to the CIOP distributed filesystem, where they are available to the binning job:

# loop and process all MERIS products
while read inputfile 
do
    ...

    cd $OUTPUTDIR

    # BEAM writes a .dim header file and a .data folder
    outputname=`basename $retrieved`.dim
    outputfolder=`basename $retrieved`.data

    tar cfz $outputname.tgz $outputname $outputfolder &> /dev/null
    cd - &> /dev/null

    ciop-log "INFO" "Publishing $outputname and $outputfolder"
    ciop-publish $OUTPUTDIR/$outputname.tgz

    # cleanup
    rm -fr $retrieved $OUTPUTDIR/$outputname $OUTPUTDIR/$outputfolder $OUTPUTDIR/$outputname.tgz
done

Tip: ciop-publish does more than a simple copy of data, it also "echoes" the destination URL, and these strings will be used as input for the binning job
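
The packaging step can be tried out without BEAM or CIOP. The sketch below builds an empty stand-in product (a .dim file plus a .data folder, with made-up names) and archives it with tar -C, an alternative to the cd / cd - pair that avoids changing the working directory of the script.

```shell
#!/bin/bash
# Build a stand-in product in a temporary folder (all names are made up).
workdir=$(mktemp -d)
name=MER_RR_demo
touch "$workdir/$name.dim"
mkdir "$workdir/$name.data"
touch "$workdir/$name.data/out.img"

# -C makes tar change into the folder first, so the archive holds relative paths.
tar -czf "$workdir/$name.tgz" -C "$workdir" "$name.dim" "$name.data"

# List the archive to check what a downstream job would unpack.
listing=$(tar -tzf "$workdir/$name.tgz")
echo "$listing"
rm -rf "$workdir"
```

Keeping relative paths in the archive matters because the job that unpacks it will most likely run in a different temporary folder.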

This concludes the tutorial.
