Sandbox terminology » History » Version 5
Herve Caumont, 2013-09-20 23:51
| 1 | 1 | Herve Caumont | h1. Sandbox Terminology and Definitions |
|---|---|---|---|
| 2 | |||
| 3 | 3 | Herve Caumont | This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service. |
| 4 | 1 | Herve Caumont | |
| 5 | 4 | Herve Caumont | h2. Application template |
| 6 | |||
| 7 | An application template is a disk space whose directories and files are organized according to a set of template rules. |
| 8 | Applications must conform to this template in order to integrate correctly as a processing workflow in Cluster mode. Moreover, an application workflow must be defined as a directed acyclic graph (DAG). |
| 9 | |||
| 10 | 3 | Herve Caumont | h2. Cloud Computing Operational Pilots (CIOP) |
| 11 | 1 | Herve Caumont | |
| 12 | 4 | Herve Caumont | A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observations Sandbox Service. It led to the creation of a set of tools (the 'ciop' command line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service. |
| 13 | 3 | Herve Caumont | |
| 14 | 1 | Herve Caumont | h2. Directed Acyclic Graph (DAG) |
| 15 | |||
| 16 | In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles, that is, no path starts and ends at the same vertex. DAGs may be used to model several different kinds of structures (see http://en.wikipedia.org/wiki/Directed_acyclic_graph). |
| 17 | 3 | Herve Caumont | |
| 18 | 1 | Herve Caumont | This notion is in place in the G-POD processing flows, and is used in the ESA EO Toolboxes such as BEAM and NeST. |
| 19 | 3 | Herve Caumont | The CIOP tools embedded in Developer Cloud Sandboxes rely on the DAG to describe a scientific application +*workflow*+. |
| 20 | 1 | Herve Caumont | A CIOP workflow is a DAG where nodes of the DAG are jobs. The execution of a job may have one or more identical tasks. |
| 21 | |||
| 22 | 4 | Herve Caumont | h2. Hadoop Streaming |
| 23 | 1 | Herve Caumont | |
| 24 | 5 | Herve Caumont | A Developer Cloud Sandboxes service comes with an Application template defined for the MapReduce Framework, so that applications can support distributed computing over large datasets, on clusters of virtual machines. |
| 25 | 1 | Herve Caumont | |
| 26 | 5 | Herve Caumont | Hadoop Streaming is a particular utility of the MapReduce Framework which allows users to create and run jobs with any executables as the mapper and/or the reducer (MapReduce operations). |
| 27 | |||
| 28 | The provided Framework is used in a simple way: it maps the URLs of the input data to be processed over executions of a single shell command on virtual machines, and reduces the output data URLs to a final results list. |
| 29 | 1 | Herve Caumont | |
| 30 | 4 | Herve Caumont | This Framework can be deployed according to the virtual machine mode: Sandbox or Cluster. That way, a template can be instantiated first as a simulation environment (Sandbox mode) for development and testing, and can then be triggered at any time as a collection of virtual machines (Cluster mode) deployed for large scale processing. |
| 31 | 3 | Herve Caumont | |
| 32 | 4 | Herve Caumont | h2. Sandbox |
| 33 | |||
| 34 | A collection of software libraries providing a defined application programming interface (API). |
| 35 | |||
| 36 | 1 | Herve Caumont | h2. Task |
| 37 | |||
| 38 | To exploit the parallelism offered by the CIOP framework, a +*job*+ may process its input in several tasks. In principle, the CIOP framework runs the tasks of a +*job*+ in parallel. |
| 39 | 3 | Herve Caumont | A +*job*+ can then have one or more +*tasks*+. |
| 40 | 4 | Herve Caumont | |
| 41 | h2. Virtual Machine template |
| 42 | |||
| 43 | A virtual machine template determines the properties of the virtual machines that can be created. It is always linked to a Cloud Provider. The template is composed of the definition of a supported hardware type (e.g. “Small”, “HighPower”, ...) and a supported operating system. |
| 44 | 3 | Herve Caumont | |
| 45 | h2. Workflow |
| 46 | |||
| 47 | An application created with the Developer Cloud Sandboxes service is always identified as a process workflow with an identifier. |
| 48 | When a job simulation is submitted with a ciop-simjob command, the workflow identifier is always the id that was defined in the application.xml file. |
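
The definitions on this page describe a workflow as a DAG whose nodes are jobs, where each job may run one or more identical tasks. The following is a minimal Python sketch of that idea only; it is not the CIOP implementation. The job names, the `execution_order` and `split_into_tasks` helpers, and the round-robin split of inputs are all hypothetical and chosen for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each job name maps to the set of jobs it depends on.
workflow = {
    "ingest": set(),
    "process": {"ingest"},
    "publish": {"process"},
}


def execution_order(dag):
    """Return the jobs in an order that respects their dependencies.

    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the detected cycle.
        raise ValueError(f"workflow is not a DAG: {exc.args[1]}") from exc


def split_into_tasks(inputs, n_tasks):
    """Distribute the inputs of one job round-robin over n_tasks identical tasks."""
    return [inputs[i::n_tasks] for i in range(n_tasks)]


if __name__ == "__main__":
    for job in execution_order(workflow):
        print(job)
    print(split_into_tasks(["url-a", "url-b", "url-c"], 2))
```

A cyclic dependency, such as two jobs that each depend on the other, is rejected by `execution_order`. This mirrors the requirement that an application workflow be defined as a DAG.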