GDAL Simple job

Introduction

This simple job demonstrates two typical uses of a sandbox:
  • installing software packages
  • configuring a simple application that takes GeoTIFF files from a remote FTP site and converts them to a chosen format (e.g. PNG)

Pre-requisites

To follow this simple tutorial you need:
  • a running sandbox
  • access to the sandbox
  • less than 30 minutes of time

Step 1: install gdal on the sandbox

gdal is installed with yum (Yellowdog Updater, Modified).

Run the command below on your sandbox:

[user@sb ~]$ sudo yum -y install gdal
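
To confirm that the installation worked, a quick check like the one below can be run on the sandbox. This is a minimal sketch: check_tool is a hypothetical helper name, and it assumes the gdal package puts gdal_translate and gdalinfo on the PATH.

```shell
#!/bin/bash
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper, not part of gdal or the sandbox tools.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
        return 0
    fi
    echo "$tool: not found"
    return 1
}

# Print the installed GDAL version if the converter is present.
if check_tool gdal_translate; then
    gdal_translate --version
fi

# List the drivers used in this tutorial (PNG and JPEG) if gdalinfo is present.
if check_tool gdalinfo; then
    gdalinfo --formats | grep -E 'PNG|JPEG' || echo "PNG/JPEG drivers not listed"
fi
```

If both tools are reported as found and the PNG and JPEG drivers are listed, the sandbox should be ready for the next step.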

Step 2: create the simplejob

The simple job will require a folder under /application:

[user@sb ~]$ cd /application
[user@sb application]$ mkdir simplejob
[user@sb application]$ cd simplejob

Create the wrapper file which handles the input parameters

[user@sb simplejob]$ vi run

Note: You can use any editor you like

Paste the code below in the run file:

#!/bin/bash

# Project:       ${project.name}
# Author:        $Author: stripodi $ (Terradue Srl)
# Last update:   $Date: 2011-09-08 12:01:58 +0200 (Thu, 08 Sep 2011) $
# Element:       ${project.name}
# Context:       services/${project.artifactId}
# Version:       ${project.version} (${implementation.build})
# Description:   ${project.description}
#
# This document is the property of Terradue and contains information directly
# resulting from knowledge and experience of Terradue.
# Any changes to this code is forbidden without written consent from Terradue Srl
#
# Contact: info@terradue.com
# 2012-02-10 - NEST in jobConfig upgraded to version 4B-1.1

# source the ciop functions (e.g. ciop-log)
source ${ciop_job_include}

# define the exit codes
SUCCESS=0
ERR_NOINPUT=1
ERR_NOPARAMS=2
ERR_GDAL=4

# add a trap to exit gracefully
function cleanExit ()
{
   local retval=$?
   local msg="" 
   case "$retval" in
     $SUCCESS)      msg="Processing successfully concluded";;
     $ERR_NOINPUT)  msg="Input file could not be retrieved";;
     $ERR_NOPARAMS) msg="Output format not defined";;
     $ERR_GDAL)     msg="gdal_translate failed to convert the input file";;
     *)             msg="Unknown error";;
   esac
   [ "$retval" != "0" ] && ciop-log "ERROR" "Error $retval - $msg, processing aborted" || ciop-log "INFO" "$msg" 
   exit $retval
}
trap cleanExit EXIT

# retrieve the parameters value from workflow or job default value
format=`ciop-getparam format`

# run a check on the format value
[ -z "$format" ] && exit $ERR_NOPARAMS

# loop through all geotiff URLs passed as stdin
while read inputfile 
do
    # report activity in log
    ciop-log "INFO" "Retrieving $inputfile from storage" 

    # retrieve the remote geotiff product to the local temporary folder
    retrieved=$(ciop-copy -o "$TMPDIR" "$inputfile")

    # check if the file was retrieved
    [ "$?" == "0" -a -e "$retrieved" ] || exit $ERR_NOINPUT

    # report activity
    ciop-log "INFO" "Retrieved $retrieved" 

    # invoke gdal to convert the geotiff into selected format
    gdal_translate -of "$format" "$retrieved" "$OUTPUTDIR/$(basename "$retrieved")"

    # check error code
    [ "$?" != "0" ] && exit $ERR_GDAL || ciop-log "INFO" "Processed $inputfile" 
done

exit 0
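
The wrapper reports its outcome through a trap on EXIT: whatever exit code the script ends with, cleanExit runs last and logs the matching message. The standalone sketch below illustrates that pattern outside the sandbox. It is an illustration only: run_with_trap is a hypothetical name, echo stands in for ciop-log, and the format is passed as an argument instead of being read with ciop-getparam.

```shell
#!/bin/bash
# Standalone demo of the trap-on-EXIT pattern used by the wrapper.
# Assumptions: echo replaces ciop-log, and the format comes from $1.
SUCCESS=0
ERR_NOPARAMS=2

run_with_trap() {
    # Run in a subshell so that "exit" ends the demo, not the calling shell.
    (
        cleanExit() {
            local retval=$?   # exit code the subshell is ending with
            case "$retval" in
                $SUCCESS)      echo "INFO: Processing successfully concluded";;
                $ERR_NOPARAMS) echo "ERROR: Output format not defined";;
                *)             echo "ERROR: Unknown error ($retval)";;
            esac
            exit $retval
        }
        trap cleanExit EXIT

        format="$1"
        # Same check as in the wrapper: an empty format aborts the job.
        [ -z "$format" ] && exit $ERR_NOPARAMS
        exit $SUCCESS
    )
}

run_with_trap PNG   # prints: INFO: Processing successfully concluded
run_with_trap "" || echo "wrapper exited with code $?"
# prints: ERROR: Output format not defined
# prints: wrapper exited with code 2
```

Because the message is chosen from the exit code alone, every exit code the script can return needs its own branch in the case statement; otherwise the failure is logged as an unknown error.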

Create the application descriptor

Go up one level to /application:

[user@sb simplejob]$ cd /application
[user@sb application]$ vi application.xml

Paste the XML content:

<?xml version="1.0" encoding="UTF-8"?>
<application id="example"> <!-- you can type any id you want --> 
    <jobTemplates>
        <jobTemplate id="gdalformatconv"> <!-- this is the job name -->
            <streamingExecutable>/application/simplejob/run</streamingExecutable> <!-- this is the wrapper script --> 
            <defaultParameters>                            
                <parameter id="format">PNG</parameter> <!-- this sets the default value for parameter format -->
            </defaultParameters>
        </jobTemplate>
    </jobTemplates>
    <workflow id="workflow">    <!-- Sample workflow -->
        <workflowVersion>1.0</workflowVersion>
        <workflowDescription>My simple workflow</workflowDescription> <!-- provide a description to the workflow -->
        <node id="gdal">         <!-- workflow node unique id -->
            <job id="gdalformatconv"></job>        <!-- job defined above -->
            <sources>
                <source refid="file:urls" >/home/fbrito/geotiff.urls</source> <!-- the GeoTIFF URLs are listed in an ASCII file; replace fbrito with your username -->
            </sources>
            <parameters></parameters>
        </node>
    </workflow>
</application>
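
The descriptor links two ids: the job id referenced by a workflow node must match the id of a jobTemplate. A rough consistency check can be sketched with grep and sed. The helper names descriptor_ids and check_descriptor are hypothetical, and the text matching assumes each id attribute is written as in the example above (double quotes, id as the first attribute); it is not a real XML parser.

```shell
#!/bin/bash
# Print the id attribute of every <tag id="..."> element found in a file.
descriptor_ids() {
    local tag="$1" file="$2"
    grep -o "<$tag id=\"[^\"]*\"" "$file" | sed -e 's/.*id="//' -e 's/"$//'
}

# Check that every <job id="..."> used in a node has a matching jobTemplate.
check_descriptor() {
    local file="$1" job
    for job in $(descriptor_ids job "$file"); do
        if descriptor_ids jobTemplate "$file" | grep -qx "$job"; then
            echo "job $job: template found"
        else
            echo "job $job: no matching jobTemplate"
            return 1
        fi
    done
}

# Default path taken from this tutorial; pass another path as first argument.
descriptor="${1:-/application/application.xml}"
if [ -f "$descriptor" ]; then
    check_descriptor "$descriptor"
else
    echo "descriptor not found: $descriptor"
fi
```

For the descriptor above, the check is expected to report that the template for gdalformatconv was found.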

Create the URL list file in your home directory

[user@sb application]$ cd
[user@sb ~]$ vi geotiff.urls

Add a few URLs:

ftp://ftp.remotesensing.org/pub/geotiff/samples/spot/chicago/SP27GTIF.TIF
ftp://ftp.remotesensing.org/pub/geotiff/samples/spot/chicago/UTM2GTIF.TIF
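
As an alternative to an interactive editor, the list can be written with a here-document and then counted. A minimal sketch follows; write_urls and count_urls are hypothetical helper names.

```shell
#!/bin/bash
# Write the two sample URLs of this tutorial to the file given as $1.
write_urls() {
    cat > "$1" <<'EOF'
ftp://ftp.remotesensing.org/pub/geotiff/samples/spot/chicago/SP27GTIF.TIF
ftp://ftp.remotesensing.org/pub/geotiff/samples/spot/chicago/UTM2GTIF.TIF
EOF
}

# Count the lines that look like URLs (scheme://...).
count_urls() {
    grep -c '^[a-z][a-z]*://' "$1"
}
```

After loading these functions, running write_urls ~/geotiff.urls followed by count_urls ~/geotiff.urls should print 2, one per input product.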

Step 3: execute the job

Execute the simple job

Optionally list the nodes you can execute:

[user@sb ~]$ ciop-simjob -n

This will return:

gdal

Process it!

 
[user@sb ~]$ ciop-simjob gdal

The job is executed and a tracking URL is provided so that you can follow the progress and access the execution logs. Open the URL in a browser:

TBC

Check the gdal node results

The generated files are published on the local filesystem:

[user@sb ~]$ cd /share/tmp/TBC

There should be two image files in the default format, PNG.
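
Since the wrapper names each output after its input file, the file extension does not reveal the actual format. One way to check it is to look at the first bytes of each file. The sketch below does this with od; detect_format and report_formats are hypothetical helpers, and only the PNG, JPEG and TIFF signatures are recognised.

```shell
#!/bin/bash
# Guess the image format of a file from its leading signature bytes.
detect_format() {
    local sig
    # Dump up to the first 8 bytes as lowercase hex without separators.
    sig=$(od -An -tx1 -N8 "$1" | tr -d ' \n')
    case "$sig" in
        89504e470d0a1a0a)    echo PNG;;
        ffd8ff*)             echo JPEG;;
        49492a00*|4d4d002a*) echo TIFF;;
        *)                   echo unknown;;
    esac
}

# Print "name: format" for every regular file in the directory given as $1.
report_formats() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        echo "$(basename "$f"): $(detect_format "$f")"
    done
}
```

Calling report_formats with the results directory as its argument should list PNG for both products after the first run.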

Define another file format to test the node

Edit the application.xml file and add the following line inside the <parameters> element of the gdal node:

<parameter id="format">JPEG</parameter>

To obtain:

<?xml version="1.0" encoding="UTF-8"?>
<application id="example">
    <jobTemplates>
        <jobTemplate id="gdalformatconv">
            <streamingExecutable>/application/simplejob/run</streamingExecutable>
            <defaultParameters>                            
                <parameter id="format">PNG</parameter>
            </defaultParameters>
        </jobTemplate>
    </jobTemplates>
    <workflow id="workflow">                            <!-- Sample workflow -->
        <workflowVersion>1.0</workflowVersion>
        <workflowDescription>My simple workflow</workflowDescription>
        <node id="gdal">                            <!-- workflow node unique id -->
            <job id="gdalformatconv"></job>                    <!-- job defined above -->
            <sources>
                <source refid="file:urls" >/home/fbrito/geotiff.urls</source>
            </sources>
            <parameters>
                <parameter id="format">JPEG</parameter>
            </parameters>
        </node>
    </workflow>
</application>
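
The descriptor now holds two values for format: PNG in the job template defaults and JPEG in the node parameters. Judging by the results of the two runs in this tutorial, the node-level value takes precedence. The rule can be modelled as below; resolve_param is a hypothetical illustration, not the implementation of ciop-getparam.

```shell
#!/bin/bash
# Hypothetical model of the override rule: a non-empty node-level value wins,
# otherwise the job template default is used.
resolve_param() {
    local node_value="$1" default_value="$2"
    echo "${node_value:-$default_value}"
}

resolve_param "" PNG     # no node-level value: prints PNG
resolve_param JPEG PNG   # node-level value set: prints JPEG
```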

Run the job again, this time with the ciop-simjob -f flag to delete the results of the previous run:

[user@sb ~]$ ciop-simjob -f gdal

The generated files are published on the local filesystem:

[user@sb ~]$ cd /share/tmp/TBC

There should be two image files in the format defined at the workflow level: JPEG.

Run the application as a workflow

Our workflow has a single job but it's still a workflow!

You can trigger the workflow with:

[user@sb ~]$ ciop-simwf

You can track the workflow execution in the shell. Wait for the workflow to finish.

Use the command below to get the latest workflow run:

[user@sb ~]$ ciop-simwf -l
0000000-130405042430716-oozie-oozi-W

Use the value returned above to check the workflow results:

[user@sb ~]$ ll tmp/sandbox/run/0000000-130405042430716-oozie-oozi-W/gdal/output
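
The two commands above can be chained so that the run id does not have to be copied by hand. This is a sketch under two assumptions: ciop-simwf -l prints only the run id, as in the listing above, and the outputs of a node are stored under tmp/sandbox/run/<run id>/<node id>/output. The helper name node_output_dir is hypothetical.

```shell
#!/bin/bash
# Build the output path of a node for a given workflow run id.
# The directory layout is taken from the listing in this tutorial.
node_output_dir() {
    local run_id="$1" node="$2"
    echo "tmp/sandbox/run/$run_id/$node/output"
}

if command -v ciop-simwf >/dev/null 2>&1; then
    # Assumption: "ciop-simwf -l" prints only the id of the latest run.
    run_id=$(ciop-simwf -l)
    ls -l "$(node_output_dir "$run_id" gdal)"
else
    echo "ciop-simwf not available; run this on the sandbox"
fi
```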

You can optionally delete the generated results:

Conclusion

With this simple job you have learned:
  • how to install packages to run your application using yum
  • how to create a job template in your sandbox
  • how to write the job wrapper script
  • how to define the application descriptor
  • how to execute the job with the default parameter
  • how to execute the job with a parameter value defined at the workflow level
  • how to execute the workflow
  • how to check the generated results

Updated by Herve Caumont over 10 years ago · 3 revisions