Sandbox Terminology and Definitions
This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service.
Application template
The application template is a disk space holding a structure of directories and files that follows a set of template rules.
Applications must conform to this template to integrate correctly as a processing workflow in Cluster mode. In addition, an application workflow must be defined as a directed acyclic graph (DAG).
Cloud Computing Operational Pilots (CIOP)
A project funded by ESA (March 2011 to September 2013) to deliver a proof of concept for an Earth Observation Sandbox Service. It led to the creation of a set of tools, the 'ciop' command-line tools, that are deployed as a baseline in a Developer Cloud Sandboxes service.
Directed Acyclic Graph (DAG)
In mathematics and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles: by definition, it contains no directed path of one or more edges that starts and ends at the same vertex. DAGs can model many different kinds of structures (http://en.wikipedia.org/wiki/Directed_acyclic_graph).
This notion is used in the G-POD processing flows and in ESA EO toolboxes such as BEAM and NeST.
The CIOP tools embedded in Developer Cloud Sandboxes rely on a DAG to describe a scientific application workflow.
A CIOP workflow is a DAG whose nodes are jobs. Executing a job may involve one or more identical tasks.
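To illustrate the DAG property, the minimal sketch below represents a workflow as a mapping from each job to the jobs whose output it consumes, and uses Kahn's topological sort to reject cycles and derive one valid execution order. The job names, the dictionary representation and the execution_order function are hypothetical; this is not how the CIOP tools store or validate a workflow.

```python
from collections import deque

# Hypothetical workflow: each job maps to the jobs it depends on (its sources).
# Every job referenced as a source must also appear as a key.
workflow = {
    "preprocess": [],
    "classify": ["preprocess"],
    "mosaic": ["classify"],
    "report": ["classify", "mosaic"],
}


def execution_order(deps):
    """Return the jobs in a valid execution order (Kahn's algorithm).

    Raises ValueError if the graph contains a directed cycle, i.e. is not a DAG.
    """
    # Number of unfinished dependencies for every job.
    remaining = {job: len(sources) for job, sources in deps.items()}
    # Reverse edges: for each job, the jobs that consume its output.
    consumers = {job: [] for job in deps}
    for job, sources in deps.items():
        for src in sources:
            consumers[src].append(job)

    # Start with the jobs that depend on nothing.
    ready = deque(job for job, n in remaining.items() if n == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in consumers[job]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    # Jobs left unscheduled are part of (or blocked by) a cycle.
    if len(order) != len(deps):
        raise ValueError("workflow contains a cycle; it is not a DAG")
    return order


print(execution_order(workflow))
```

Here the expected order is preprocess, classify, mosaic, report; a workflow such as {"a": ["b"], "b": ["a"]} raises ValueError because neither job can ever start.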
Hadoop Streaming
A Developer Cloud Sandboxes service comes with an application template defined for the MapReduce framework, so that applications can perform distributed computing over large datasets on clusters of virtual machines.
Hadoop Streaming is a utility of the Hadoop MapReduce framework that allows users to create and run jobs with any executable as the mapper and/or the reducer (the two MapReduce operations).
The framework is used in a simple way: it maps each URL of the input data to be processed onto an execution of a single shell command on the virtual machines, and reduces the output data URLs to a final list of results.
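The map and reduce steps described above can be sketched locally in Python, without Hadoop. This is a simulation under stated assumptions: run_command, mapper and reducer are hypothetical names, the "shell command" is a stand-in that only prints a derived output URL, the .result naming rule is invented, and the example.com URLs are placeholders. In a real Hadoop Streaming job, the mapper and reducer are separate executables that read lines from standard input and write lines to standard output.

```python
import subprocess
import sys


def run_command(url):
    """Hypothetical processing step: one command execution for one input URL.

    The "command" here just prints a derived output URL using the Python
    interpreter; a real application would run its own executable instead.
    """
    out_url = url.rstrip("/") + ".result"  # invented naming rule
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", out_url],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def mapper(lines):
    """Map step: one input URL per line -> one command execution -> output URL."""
    for line in lines:
        url = line.strip()
        if url:  # skip blank lines
            yield run_command(url)


def reducer(output_urls):
    """Reduce step: gather output URLs into the final results list (sorted here)."""
    return sorted(output_urls)


inputs = ["http://example.com/data/scene1\n", "http://example.com/data/scene2\n"]
print(reducer(mapper(inputs)))
```

Keeping the per-URL work behind a single command mirrors the design described above: the framework only needs to distribute URLs, and the application decides what each command does with its input.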
The framework can be deployed according to the virtual machine mode, Sandbox or Cluster. A template can thus first be instantiated as a simulation environment (Sandbox mode) for development and testing, and then be triggered at any time as a collection of virtual machines (Cluster mode) deployed for large-scale processing.
Sandbox
A collection of software libraries providing a defined application programming interface (API).
Task
To exploit the parallelism offered by the CIOP framework, a job may process its input in several tasks, which the CIOP framework in principle runs in parallel within the job.
A job can then have one or more tasks.
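The relationship between a job and its tasks can be sketched as follows, assuming the job's input is a list of items split into contiguous chunks, one per task. A local ThreadPoolExecutor stands in for the framework's distribution of tasks over virtual machines; split_into_tasks, task and run_job are hypothetical names, and the "processed:" prefix is a placeholder for real processing.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(inputs, n_tasks):
    """Split a job's input list into at most n_tasks contiguous chunks."""
    size = max(1, math.ceil(len(inputs) / n_tasks))
    return [inputs[i:i + size] for i in range(0, len(inputs), size)]


def task(chunk):
    """One task: applies the same (placeholder) processing to every item in its chunk."""
    return [f"processed:{item}" for item in chunk]


def run_job(inputs, n_tasks=3):
    """Run identical tasks in parallel and merge their outputs in input order."""
    chunks = split_into_tasks(inputs, n_tasks)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        per_task = list(pool.map(task, chunks))  # map() keeps chunk order
    return [out for outputs in per_task for out in outputs]


print(run_job(["a", "b", "c", "d", "e"]))
```

With five inputs and three tasks, the chunks are ["a", "b"], ["c", "d"] and ["e"]; a job whose input fits in one chunk simply runs a single task.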
Virtual Machine template
Determines the properties of the virtual machines that can be created from it. A virtual machine template is always linked to a Cloud Provider. The template defines a supported hardware type (e.g. “Small”, “HighPower”, ...) and a supported operating system.
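As a data-structure view of this definition, the sketch below models a template as an immutable record that must name a cloud provider, a hardware type and an operating system. The class, its field names, the validation rules and the set of allowed hardware types are hypothetical; only the two hardware type names come from the examples above.

```python
from dataclasses import dataclass

# Hypothetical allowed values, taken from the examples in the definition.
SUPPORTED_HARDWARE = {"Small", "HighPower"}


@dataclass(frozen=True)
class VirtualMachineTemplate:
    cloud_provider: str    # every template is linked to a cloud provider
    hardware_type: str     # e.g. "Small" or "HighPower"
    operating_system: str  # supported operating system (placeholder value below)

    def __post_init__(self):
        # Reject templates that break the rules stated in the definition.
        if not self.cloud_provider:
            raise ValueError("a VM template must be linked to a cloud provider")
        if self.hardware_type not in SUPPORTED_HARDWARE:
            raise ValueError(f"unsupported hardware type: {self.hardware_type}")


template = VirtualMachineTemplate("example-provider", "Small", "example-os")
print(template)
```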
Workflow
An application created with the Developer Cloud Sandboxes service is always identified as a processing workflow with its own identifier.
When a job simulation is submitted with the ciop-simjob command, the workflow identifier is always the id defined in the application.xml file.
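The sketch below reads a workflow identifier from an application.xml document with Python's standard ElementTree module. It assumes, without confirmation from this page, that the identifier is the id attribute of a workflow element placed directly under an application root element; the sample XML, its ids and the workflow_id function are hypothetical, so check the layout against your own application.xml.

```python
import xml.etree.ElementTree as ET

# Minimal, hypothetical application.xml; only the parts read below are shown.
APPLICATION_XML = """<?xml version="1.0"?>
<application id="my_application">
  <workflow id="my_workflow">
    <node id="node_a"/>
  </workflow>
</application>"""


def workflow_id(xml_text):
    """Return the id attribute of the <workflow> element (assumed layout)."""
    root = ET.fromstring(xml_text)
    wf = root.find("workflow")  # direct child of the root element
    if wf is None or "id" not in wf.attrib:
        raise ValueError("no <workflow id=...> element found")
    return wf.get("id")


print(workflow_id(APPLICATION_XML))
```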
Updated by Herve Caumont about 11 years ago · 5 revisions