Herve Caumont, 2013-09-20 23:39


Sandbox Terminology and Definitions

This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service.

Application template

The application template is a disk space organized as a structure of directories and files that follows a set of template rules.
Applications must conform to this template to integrate correctly as a processing workflow in Cluster mode. In addition, an application workflow must be defined as a directed acyclic graph (DAG).

Cloud Computing Operational Pilots (CIOP)

A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observations Sandbox Service. It led to the creation of a set of tools (the 'ciop' command line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service.

Directed Acyclic Graph (DAG)

In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles: no path of one or more edges starts and ends at the same vertex. DAGs may be used to model several different kinds of structures (http://en.wikipedia.org/wiki/Directed_acyclic_graph).

This notion is used in the G-POD processing flows and in ESA EO Toolboxes such as BEAM and NeST.
The CIOP tools embedded in Developer Cloud Sandboxes rely on the DAG to describe a scientific application workflow.
A CIOP workflow is a DAG whose nodes are jobs. The execution of a job may consist of one or more identical tasks.
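
As an illustration of the DAG constraint, the following minimal sketch checks that a workflow of jobs contains no cycle and returns one valid execution order. It uses Kahn's algorithm, a standard technique and not necessarily what the CIOP tools do internally. The job names and the dictionary representation are assumptions made for this example only.

```python
from collections import deque


def topological_order(edges):
    """Return the jobs in dependency order, or raise ValueError on a cycle.

    `edges` maps each job name to the list of jobs that consume its output.
    This representation is hypothetical, chosen for the example.
    """
    # Collect every job that appears as a source or as a target.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)

    # Count the incoming edges of every job.
    indegree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1

    # Kahn's algorithm: repeatedly take a job with no pending inputs.
    ready = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in edges.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    # Jobs left unprocessed are on a cycle or downstream of one.
    if len(order) != len(nodes):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


# Hypothetical three-job workflow: prepare -> process -> publish
workflow = {"prepare": ["process"], "process": ["publish"], "publish": []}
print(topological_order(workflow))
```

A workflow in which two jobs depend on each other, such as {"a": ["b"], "b": ["a"]}, raises ValueError because no job can be scheduled first.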

Hadoop Streaming

A Developer Cloud Sandboxes service comes with an Application Template that embeds a MapReduce framework to support distributed computing over large datasets on clusters of virtual machines. Hadoop Streaming is a utility that allows users to create and run jobs with any executable as the mapper and/or the reducer (MapReduce operations).

The provided implementation of MapReduce is used in a simple way: the map step applies a single shell command, executed on the virtual machines, to the URLs of the input data to be processed, and the reduce step gathers the URLs of the output data into a final results list.
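
The map and reduce steps described above can be sketched as two functions that read lines from one stream and write lines to another, which is the line-oriented contract Hadoop Streaming uses for mapper and reducer executables. This is only an illustration: process_url is a hypothetical placeholder for the real processing command, the ".result" suffix and the example.org URLs are invented, and the actual CIOP mapper and reducer are not shown in this page.

```python
import io


def process_url(url):
    # Hypothetical placeholder: a real mapper would run the processing
    # command on the input URL and return the URL of the data it produced.
    return url + ".result"


def mapper(stdin, stdout):
    # Map step: one input URL per line in, one output URL per line out.
    for line in stdin:
        url = line.strip()
        if url:
            stdout.write(process_url(url) + "\n")


def reducer(stdin, stdout):
    # Reduce step: gather the output URLs into a single results list.
    results = [line.strip() for line in stdin if line.strip()]
    for url in results:
        stdout.write(url + "\n")


# Local simulation of the two steps with in-memory streams.
inputs = io.StringIO("http://example.org/data/a\nhttp://example.org/data/b\n")
mapped = io.StringIO()
mapper(inputs, mapped)
mapped.seek(0)
final = io.StringIO()
reducer(mapped, final)
print(final.getvalue(), end="")
```

In an actual Hadoop Streaming job the two functions would live in separate executables reading sys.stdin and writing sys.stdout; the in-memory streams only make the sketch runnable on its own.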

This framework can be deployed according to the virtual machine mode: Sandbox or Cluster. A template can thus be instantiated first as a simulation environment (Sandbox mode) for development and testing, and can be deployed at any time as a collection of virtual machines (Cluster mode) for large-scale processing.

Sandbox

A collection of software libraries providing a defined application programming interface (API).

Task

To exploit the parallelism offered by the CIOP framework, a job may process its input in several tasks. In principle, the CIOP framework runs the tasks of a job in parallel.
A job can therefore have one or more tasks.
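
The relation between a job and its tasks can be illustrated with a small sketch in which one job splits its inputs into chunks and runs the same task function on each chunk in parallel. How the CIOP framework actually distributes inputs among tasks is not described here, so the round-robin split, the thread pool, and the names run_task and run_job are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(inputs):
    # Hypothetical task body: every task of a job runs this same function
    # on its own share of the inputs.
    return [item.upper() for item in inputs]


def run_job(inputs, n_tasks):
    # Split the inputs round-robin into at most n_tasks non-empty chunks.
    chunks = [inputs[i::n_tasks] for i in range(n_tasks)]
    chunks = [chunk for chunk in chunks if chunk]

    # Run one task per chunk in parallel and wait for all of them.
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        results = list(pool.map(run_task, chunks))

    # Flatten the per-task outputs into the job output.
    return [item for chunk in results for item in chunk]


print(sorted(run_job(["a", "b", "c", "d", "e"], n_tasks=2)))
```

With n_tasks=1 the job degenerates to a single task, which matches the statement that a job has one or more tasks.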

Virtual Machine template

Determines the properties of the virtual machines that can be created. A virtual machine template is always linked to a Cloud Provider. The template consists of a definition of the supported hardware types (e.g. “Small”, “HighPower”, ...) and the supported operating system.

Workflow

An application created with the Developer Cloud Sandboxes service is always identified as a processing workflow with an identifier.
When a job simulation is submitted with the ciop-simjob command, the workflow identifier is always the id defined in the application.xml file.
