Sandbox terminology » History » Version 4
Herve Caumont, 2013-09-20 23:39
1 | 1 | Herve Caumont | h1. Sandbox Terminology and Definitions |
---|---|---|---|
2 | |||
3 | 3 | Herve Caumont | This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service. |
4 | 1 | Herve Caumont | |
5 | 4 | Herve Caumont | h2. Application template |
6 | |||
7 | An application template is a disk space whose directories and files are organized according to a set of template rules. |
||
8 | Applications must conform to this template in order to integrate correctly as a processing workflow in Cluster mode. Moreover, an application workflow must be defined as a directed acyclic graph (DAG). |
||
9 | |||
10 | 3 | Herve Caumont | h2. Cloud Computing Operational Pilots (CIOP) |
11 | 1 | Herve Caumont | |
12 | 4 | Herve Caumont | A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observations Sandbox Service. It led to the creation of a set of tools (the 'ciop' command line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service. |
13 | 3 | Herve Caumont | |
14 | 1 | Herve Caumont | h2. Directed Acyclic Graph (DAG) |
15 | |||
16 | In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles: no directed path of one or more edges starts and ends at the same vertex. DAGs can be used to model several different kinds of structures (see http://en.wikipedia.org/wiki/Directed_acyclic_graph). |
||
17 | 3 | Herve Caumont | |
18 | 1 | Herve Caumont | This notion is already in place in the G-POD processing flows, and is used in ESA EO Toolboxes such as BEAM and NeST. |
19 | 3 | Herve Caumont | The CIOP tools embedded in Developer Cloud Sandboxes rely on the DAG to describe a scientific application +*workflow*+. |
20 | 1 | Herve Caumont | A CIOP workflow is a DAG whose nodes are jobs. The execution of a job may consist of one or more identical tasks. |
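The DAG requirement can be illustrated with a minimal Python sketch. It is not part of the CIOP tools: the job names and the @execution_order@ helper are hypothetical, and the workflow is modelled simply as a mapping from each job to the set of jobs it depends on. The standard-library @graphlib@ module (Python 3.9+) yields a valid execution order and rejects any workflow that contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(workflow):
    """Return the jobs in an order that respects their dependencies.

    `workflow` maps each job name to the set of jobs it depends on.
    Raises ValueError if the dependencies contain a cycle, i.e. if the
    workflow is not a DAG.
    """
    try:
        return list(TopologicalSorter(workflow).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes forming the detected cycle
        raise ValueError(f"workflow is not a DAG: {exc.args[1]}") from exc


# Hypothetical three-job workflow: each job depends on the previous one.
workflow = {
    "preprocess": set(),
    "classify": {"preprocess"},
    "publish": {"classify"},
}

if __name__ == "__main__":
    print(execution_order(workflow))
```

Adding a dependency from @preprocess@ back to @publish@ would close a cycle, and @execution_order@ would raise @ValueError@ instead of returning an order.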
21 | |||
22 | 4 | Herve Caumont | h2. Hadoop Streaming |
23 | 1 | Herve Caumont | |
24 | 4 | Herve Caumont | A Developer Cloud Sandboxes service comes with an Application Template that embeds a MapReduce Framework to support distributed computing over large datasets on clusters of virtual machines. Hadoop Streaming is a utility that allows users to create and run jobs with any executable as the mapper and/or the reducer (MapReduce operations). |
25 | 1 | Herve Caumont | |
26 | 4 | Herve Caumont | The provided implementation of MapReduce is used in a simple way: the map step distributes the URLs of the input data to the virtual machines, where each URL is processed by a single shell command execution, and the reduce step gathers the output data URLs into a final results list. |
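The map and reduce roles described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the CIOP or Hadoop Streaming implementation: @map_urls@ and @reduce_urls@ are hypothetical names, the processing command is a stand-in that only echoes its argument, and the reduce step is assumed to de-duplicate and sort the output URLs.

```python
import subprocess
import sys


def map_urls(lines, command):
    """Run `command` once per input URL and yield every line it prints.

    `command` is a list of program arguments; the URL is appended to it.
    """
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank input lines
        result = subprocess.run(
            command + [url], capture_output=True, text=True, check=True
        )
        yield from result.stdout.splitlines()


def reduce_urls(lines):
    """Gather the output URLs into one sorted list without duplicates."""
    return sorted({line.strip() for line in lines if line.strip()})


if __name__ == "__main__":
    # Stand-in for a real processing command: it prints the URL it received.
    echo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    inputs = ["http://example.org/data/1", "http://example.org/data/2"]
    for output_url in reduce_urls(map_urls(inputs, echo)):
        print(output_url)
```

With Hadoop Streaming, the mapper and the reducer would be separate executables reading from standard input and writing to standard output; they are kept as plain functions here so that the sketch stays self-contained.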
27 | 1 | Herve Caumont | |
28 | 4 | Herve Caumont | This Framework can be deployed according to the virtual machine mode: Sandbox or Cluster. That way, a template can first be instantiated as a simulation environment (Sandbox mode) for development and testing, and can be triggered at any time as a collection of virtual machines (Cluster mode) deployed for large-scale processing. |
29 | 3 | Herve Caumont | |
30 | 4 | Herve Caumont | h2. Sandbox |
31 | |||
32 | A collection of software libraries providing a defined application programming interface (API). |
||
33 | |||
34 | 1 | Herve Caumont | h2. Task |
35 | |||
36 | To exploit the parallelism offered by the CIOP framework, a +*job*+ may process its input in several tasks. In principle, the CIOP framework runs the tasks of a +*job*+ in parallel. |
||
37 | 3 | Herve Caumont | A +*job*+ can then have one or more +*tasks*+. |
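The relationship between a job and its tasks can be sketched in Python. This is only an analogy and says nothing about how the CIOP framework actually schedules tasks: the @split_into_tasks@ and @run_job@ helpers are hypothetical, and a thread pool stands in for the virtual machines that would run the tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(inputs, n_tasks):
    """Distribute the inputs round-robin over at most n_tasks non-empty chunks."""
    chunks = [inputs[i::n_tasks] for i in range(n_tasks)]
    return [chunk for chunk in chunks if chunk]


def run_job(inputs, process, n_tasks=4):
    """Run one job as several identical tasks executed in parallel.

    Every task applies the same `process` function to its own chunk of
    the job input; the per-task results are concatenated at the end.
    """
    tasks = split_into_tasks(inputs, n_tasks)

    def run_task(chunk):
        return [process(item) for item in chunk]

    with ThreadPoolExecutor(max_workers=len(tasks) or 1) as pool:
        per_task_results = list(pool.map(run_task, tasks))
    return [result for task in per_task_results for result in task]


if __name__ == "__main__":
    print(run_job(["a", "b", "c", "d", "e"], str.upper, n_tasks=2))
```

The tasks are identical in the sense that they all run the same code; only the slice of input they receive differs.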
38 | 4 | Herve Caumont | |
39 | h2. Virtual Machine template |
||
40 | |||
41 | A virtual machine template determines the properties of the virtual machines that can be created from it. It is always linked to a Cloud Provider. The template is composed of a definition of the supported hardware type (e.g. “Small”, “HighPower”, ...) and of the supported Operating System. |
||
42 | 3 | Herve Caumont | |
43 | h2. Workflow |
||
44 | |||
45 | An application created with the Developer Cloud Sandboxes service is always identified as a processing workflow with its own identifier. |
||
46 | When a job simulation is submitted with a ciop-simjob command, the workflow identifier is always the id that was defined in the application.xml file. |
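As an illustration, the workflow identifier could be read from an application.xml file with a few lines of Python. The XML layout below is an assumption made for this sketch (an @application@ root element containing a @workflow@ element that carries an @id@ attribute); the real file may be structured differently, and the @workflow_id@ helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified layout of an application.xml file (hypothetical values).
SAMPLE_APPLICATION_XML = """\
<application id="my_application">
  <workflow id="my_workflow" title="Example" abstract="Example workflow"/>
</application>
"""


def workflow_id(xml_text):
    """Return the id attribute of the first <workflow> child of the root element."""
    root = ET.fromstring(xml_text)
    workflow = root.find("workflow")
    if workflow is None or "id" not in workflow.attrib:
        raise ValueError("no workflow identifier found")
    return workflow.attrib["id"]


if __name__ == "__main__":
    print(workflow_id(SAMPLE_APPLICATION_XML))
```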