Sandbox terminology (revision 4 of 5)
Herve Caumont, 2013-09-20 23:39
Sandbox Terminology and Definitions
This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service.
Application template
The application template is a disk area holding a tree of directories and files laid out according to a set of template rules.
Applications must conform to this template to integrate correctly as a processing workflow in Cluster mode. In addition, the application workflow must be defined as a directed acyclic graph (DAG).
Cloud Computing Operational Pilots (CIOP)
An ESA-funded project (March 2011 to September 2013) whose goal was to deliver a proof of concept for an Earth Observation Sandbox Service. It led to a set of tools, the 'ciop' command-line tools, that are deployed as a baseline in a Developer Cloud Sandboxes service.
Directed Acyclic Graph (DAG)
In mathematics and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles, that is, with no path that starts and ends at the same vertex. DAGs can model many different kinds of structures (http://en.wikipedia.org/wiki/Directed_acyclic_graph).
The concept is already used in the G-POD processing flows and in ESA EO toolboxes such as BEAM and NeST.
The CIOP tools embedded in Developer Cloud Sandboxes rely on a DAG to describe a scientific application workflow.
A CIOP workflow is a DAG whose nodes are jobs. A job may be executed as one or more identical tasks.
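Ordering the jobs of a workflow so that each one runs after the jobs it depends on is a topological sort, and it only succeeds when the graph is acyclic. The sketch below uses Kahn's algorithm in Python; the job names and the dictionary representation (each job mapped to its upstream jobs) are hypothetical and are not the CIOP tools' own format.

```python
from collections import deque


def execution_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Return the jobs in an order that respects their dependencies.

    `dependencies` maps each job to the jobs whose output it consumes
    (a hypothetical representation chosen for this sketch).
    Raises ValueError if the graph has a directed cycle, i.e. is not a DAG.
    """
    # Collect every job, including jobs that only appear as a dependency.
    jobs = set(dependencies)
    for parents in dependencies.values():
        jobs.update(parents)

    # In-degree = number of upstream jobs that have not been scheduled yet.
    indegree = {job: 0 for job in jobs}
    children: dict[str, list[str]] = {job: [] for job in jobs}
    for job, parents in dependencies.items():
        for parent in parents:
            indegree[job] += 1
            children[parent].append(job)

    # Kahn's algorithm: repeatedly schedule jobs with no pending inputs.
    ready = deque(sorted(job for job in jobs if indegree[job] == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in sorted(children[job]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Jobs left unscheduled are stuck on a cycle.
    if len(order) != len(jobs):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


# Hypothetical four-job workflow.
workflow = {
    "preprocess": [],
    "classify": ["preprocess"],
    "mosaic": ["classify"],
    "report": ["classify", "mosaic"],
}
print(execution_order(workflow))
# ['preprocess', 'classify', 'mosaic', 'report']
```

Jobs that end up in the ready queue at the same time have no dependency on each other, which is where a workflow engine could run them concurrently.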
Hadoop Streaming
A Developer Cloud Sandboxes service comes with an Application Template that embeds a MapReduce framework to support distributed computing over large datasets on clusters of virtual machines. Hadoop Streaming is a utility that lets users create and run MapReduce jobs with any executable as the mapper and/or the reducer.
The MapReduce implementation is used in a simple way: the map step passes the URLs of the input data to a single shell command executed on the virtual machines, and the reduce step collects the URLs of the output data into a final list of results.
The framework can be deployed according to the virtual machine mode, Sandbox or Cluster. A template can therefore first be instantiated as a simulation environment (Sandbox mode) for development and testing, and then be triggered at any time as a collection of virtual machines (Cluster mode) for large-scale processing.
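Hadoop Streaming exchanges records with the mapper and reducer executables as lines of text on standard input and output. The sketch below mimics the URL-in, URL-out pattern described above with plain Python iterables so it can be tried locally. It is an illustration under stated assumptions, not the mapper shipped with the Application Template: `run_processing_script` and `./process.sh` are hypothetical, and the dry run swaps the shell command for a string substitution.

```python
import subprocess
from typing import Callable, Iterable, Iterator


def run_processing_script(url: str) -> str:
    """Hypothetical processor: run ./process.sh on one URL, return the output URL it prints."""
    result = subprocess.run(
        ["./process.sh", url], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def mapper(lines: Iterable[str], process: Callable[[str], str]) -> Iterator[str]:
    """Map step: one processing call per input URL, emitting one output URL each."""
    for line in lines:
        url = line.strip()
        if url:  # ignore blank lines
            yield process(url)


def reducer(lines: Iterable[str]) -> list[str]:
    """Reduce step: merge output URLs into a single sorted, de-duplicated results list."""
    return sorted({line.strip() for line in lines if line.strip()})


# Local dry run: a string substitution stands in for the real processing script.
inputs = ["http://example.com/inputs/a.tif\n", "\n", "http://example.com/inputs/b.tif\n"]
outputs = mapper(inputs, lambda url: url.replace("/inputs/", "/outputs/"))
print(reducer(outputs))
# ['http://example.com/outputs/a.tif', 'http://example.com/outputs/b.tif']
```

Passing the processing step in as a function keeps the map logic identical whether it calls a real shell command or a stub, which mirrors the idea of developing in Sandbox mode before running the same logic in Cluster mode.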
Sandbox
A collection of software libraries providing a defined application programming interface (API).
Task
To exploit the parallelism offered by the CIOP framework, a job may split the processing of its input into several tasks, which the framework in principle runs in parallel.
A job therefore has one or more tasks.
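A minimal sketch of a job split into identical tasks, assuming a round-robin split of the job's input URLs and using local threads as a stand-in for the virtual machines. `split_into_tasks` and `run_job` are hypothetical names; the actual CIOP framework decides how many tasks a job gets and how its input is distributed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_into_tasks(inputs: list[str], n_tasks: int) -> list[list[str]]:
    """Distribute a job's inputs round-robin over at most n_tasks tasks."""
    if n_tasks < 1:
        raise ValueError("a job needs at least one task")
    chunks = [inputs[i::n_tasks] for i in range(n_tasks)]
    return [chunk for chunk in chunks if chunk]  # drop tasks with no input


def run_job(inputs: list[str], n_tasks: int,
            task: Callable[[list[str]], list[str]]) -> list[str]:
    """Run the same task on every chunk in parallel and concatenate the results."""
    chunks = split_into_tasks(inputs, n_tasks)
    with ThreadPoolExecutor(max_workers=len(chunks) or 1) as pool:
        # pool.map keeps results in the same order as the chunks.
        results = list(pool.map(task, chunks))
    return [item for chunk_result in results for item in chunk_result]


urls = [f"url{i}" for i in range(5)]
print(split_into_tasks(urls, 2))
# [['url0', 'url2', 'url4'], ['url1', 'url3']]
print(run_job(urls, 2, lambda chunk: [u.upper() for u in chunk]))
# ['URL0', 'URL2', 'URL4', 'URL1', 'URL3']
```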
Virtual Machine template
A virtual machine template determines the properties of the virtual machines that can be created from it. It is always linked to a Cloud Provider, and it defines the supported hardware types (e.g. “Small”, “HighPower”, ...) and the supported operating system.
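One way to picture a virtual machine template is as a small immutable record bound to a single provider. The class and field names below, and the sample provider and operating system, are hypothetical; they only illustrate that a template fixes the provider and operating system while offering a set of hardware types to choose from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMachineTemplate:
    """Hypothetical model of a VM template; it is always bound to one cloud provider."""
    cloud_provider: str
    hardware_types: frozenset[str]
    operating_system: str

    def instantiate(self, hardware_type: str) -> dict[str, str]:
        """Return a VM spec, refusing hardware types the template does not support."""
        if hardware_type not in self.hardware_types:
            raise ValueError(f"{hardware_type!r} is not supported by this template")
        return {
            "provider": self.cloud_provider,
            "hardware": hardware_type,
            "os": self.operating_system,
        }


template = VirtualMachineTemplate("ExampleCloud", frozenset({"Small", "HighPower"}), "ExampleLinux")
print(template.instantiate("Small"))
# {'provider': 'ExampleCloud', 'hardware': 'Small', 'os': 'ExampleLinux'}
```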
Workflow
An application created with the Developer Cloud Sandboxes service is always identified as a processing workflow with its own identifier.
When a job simulation is submitted with the ciop-simjob command, the workflow identifier is always the id defined in the application.xml file.
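To check which identifier a workflow declares, the application.xml file can be read with Python's standard ElementTree parser. This sketch assumes, for illustration only, that the identifier sits in the `id` attribute of a `<workflow>` element and that the file uses no XML namespace; the sample document and its ids are hypothetical, so check the element layout of your own application.xml.

```python
import xml.etree.ElementTree as ET


def workflow_id(application_xml: str) -> str:
    """Return the id attribute of the first <workflow> element (assumed layout)."""
    root = ET.fromstring(application_xml)
    # iter() also visits the root itself, so a bare <workflow> document works too.
    workflow = next(root.iter("workflow"), None)
    if workflow is None or "id" not in workflow.attrib:
        raise ValueError("no <workflow id=...> element found")
    return workflow.attrib["id"]


# Hypothetical, heavily trimmed application.xml content.
SAMPLE = """<application id="my_app">
  <workflow id="my_workflow">
    <node id="node_a"/>
  </workflow>
</application>"""

print(workflow_id(SAMPLE))  # my_workflow
```

To read the file from disk instead, `ET.parse("application.xml").getroot()` returns the same root element.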