Revision 3 (Herve Caumont, 2013-09-20 18:02) → Revision 4/5 (Herve Caumont, 2013-09-20 23:39)
h1. Sandbox Terminology and Definitions

This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service.

h2. Application template

The application template is defined as a disk space with a structure of directories and files that follows a set of template rules. Applications must conform to this template in order to integrate correctly as a processing workflow in Cluster mode. Moreover, an application workflow must be defined as a directed acyclic graph (DAG).

h2. Cloud Computing Operational Pilots (CIOP)

A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observations Sandbox Service. It led to the creation of a set of tools (the 'ciop' command line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service.

h2. Directed Acyclic Graph (DAG)

In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles. By definition, a DAG is a directed graph with no path that starts and ends at the same vertex. DAGs may be used to model several different kinds of structures (http://en.wikipedia.org/wiki/Directed_acyclic_graph).

This notion is in place in the G-POD processing flows, and is used in the ESA EO Toolboxes such as BEAM and NeST.

The CIOP tools embedded in Developer Cloud Sandboxes rely on the DAG to describe a scientific application +*workflow*+. A CIOP workflow is a DAG whose nodes are jobs. The execution of a job may consist of one or more identical tasks.

h2. Hadoop MapReduce Streaming

A Developer Cloud Sandboxes service comes with an Application Template that embeds a Hadoop MapReduce Framework, to support distributed computing over large datasets on clusters of virtual machines. Hadoop Streaming is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer (MapReduce operations).
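As an illustration of the Hadoop Streaming contract, the sketch below shows a mapper and a reducer written as plain line-processing functions: a streaming executable reads records from standard input, one per line, and writes tab-separated key/value records to standard output. The function names, the sample file names and the choice of key (the file extension) are assumptions made for this example only; they are not part of the CIOP tools.

```python
from itertools import groupby


def run_mapper(lines):
    """Mapper: turn each non-empty input line into a 'key<TAB>value' record."""
    for line in lines:
        name = line.strip()
        if not name:
            continue
        # Illustrative key (an assumption): the file extension, or 'none'.
        key = name.rsplit(".", 1)[1] if "." in name else "none"
        yield f"{key}\t1"


def run_reducer(sorted_records):
    """Reducer: sum the values of consecutive records that share a key.

    Hadoop Streaming hands the mapper output to the reducer sorted by key,
    so grouping consecutive lines is sufficient.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Local simulation of the map -> sort -> reduce sequence.
mapped = list(run_mapper(["a.tif\n", "b.nc\n", "\n", "c.tif\n"]))
reduced = list(run_reducer(sorted(mapped)))
```

In a real streaming job each function would be wrapped in its own executable that reads @sys.stdin@ and prints its records; the local simulation above only mimics the sort step that the framework performs between the two phases.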
The provided implementation of MapReduce is used in a simple way: it maps URLs of input data to be processed over a "single shell command execution" on virtual machines, and reduces the output data URLs to a final results list.

This Framework can be deployed according to the virtual machine mode (sandbox or cluster), which supports the Developer Cloud Sandboxes service use cases for both application development and testing (Sandbox mode) and application deployment for large scale processing (Cluster mode).

h2. Sandbox

A collection of software libraries providing a defined application programming interface (API).

h2. Task

To exploit the parallelism offered by the CIOP framework, a +*job*+ may process its input in several tasks. In principle, the CIOP framework runs the tasks in parallel within a +*job*+. A +*job*+ can then have one or more +*tasks*+.

h2. Virtual Machine template

Determines the properties of the virtual machines that can be created. A virtual machine template is always linked to a Cloud Provider. The template is composed of a definition of the supported hardware type (e.g. “Small”, “HighPower”, ...) and the supported Operating System.

h2. Workflow

An application created with the Developer Cloud Sandboxes service is always identified as a process workflow with an identifier. When a job simulation is submitted with a ciop-simjob command, the workflow identifier is always the id that was defined in the application.xml file.
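The workflow and DAG definitions on this page can be made concrete with a minimal sketch: a workflow is described as a mapping from each job to the jobs it depends on, and a valid execution order exists only when that graph has no directed cycle. The dictionary layout, the job names and the @execution_order@ function are hypothetical; they do not reproduce the application.xml format or any ciop command.

```python
from collections import deque


def execution_order(dependencies):
    """Return the jobs in an order that respects their dependencies.

    `dependencies` maps each job name to the list of jobs it depends on.
    Raises ValueError when the graph contains a directed cycle.
    """
    # Collect every job, including those that only appear as a dependency.
    jobs = set(dependencies)
    for upstream in dependencies.values():
        jobs.update(upstream)

    # Number of unfinished upstream jobs per job, and the reverse edges.
    remaining = {job: len(set(dependencies.get(job, ()))) for job in jobs}
    downstream = {job: [] for job in jobs}
    for job, upstream in dependencies.items():
        for parent in set(upstream):
            downstream[parent].append(job)

    # Kahn's algorithm: repeatedly schedule jobs with no pending dependency.
    ready = deque(sorted(job for job, count in remaining.items() if count == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in sorted(downstream[job]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(jobs):
        raise ValueError("workflow is not a DAG: it contains a cycle")
    return order


# Hypothetical three-job workflow.
workflow = {"stage_in": [], "process": ["stage_in"], "publish": ["process"]}
order = execution_order(workflow)
```

A graph in which two jobs depend on each other has no valid order, so the function raises an error instead of returning a list; this is the property that the DAG requirement on application workflows guarantees.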