Sandbox terminology » History » Version 3
Herve Caumont, 2013-09-20 18:02
1 | 1 | Herve Caumont | h1. Sandbox Terminology and Definitions |
---|---|---|---|
2 | |||
3 | 3 | Herve Caumont | This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service. |
4 | 1 | Herve Caumont | |
5 | 3 | Herve Caumont | h2. Cloud Computing Operational Pilots (CIOP) |
6 | |||
7 | A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observation Sandbox Service. It led to the creation of a set of tools (the ciop command-line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service. |
||
8 | |||
9 | 1 | Herve Caumont | h2. Directed Acyclic Graph (DAG) |
10 | |||
11 | 3 | Herve Caumont | In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles: no path starts and ends at the same vertex. DAGs may be used to model several different kinds of structures (http://en.wikipedia.org/wiki/Directed_acyclic_graph). |
12 | 1 | Herve Caumont | |
13 | 3 | Herve Caumont | This notion is in place in the G-POD processing flows, and is used in the ESA EO Toolboxes such as BEAM and NeST. |
14 | The CIOP tools embedded in Developer Cloud Sandboxes rely on the DAG to describe a scientific application +*workflow*+. |
||
15 | A CIOP workflow is a DAG whose nodes are jobs. The execution of a job may involve one or more identical tasks. |
||
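As an illustration of the DAG notion only, the sketch below models a workflow as a mapping from each job to the jobs it depends on, then derives a valid execution order. The job names (@ingest@, @calibrate@, @classify@, @publish@) and the helper functions are hypothetical; this is not how the CIOP tools represent a workflow internally. It assumes Python 3.9 or later for the standard @graphlib@ module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a job, each value is the set of jobs
# that must finish before it can start (its incoming edges in the DAG).
workflow = {
    "ingest": set(),
    "calibrate": {"ingest"},
    "classify": {"calibrate"},
    "publish": {"calibrate", "classify"},
}


def is_acyclic(graph):
    """Return True if the dependency graph contains no directed cycle."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return False
    return True


def execution_order(graph):
    """Return one job order in which every job runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

A graph such as @{"a": {"b"}, "b": {"a"}}@ is rejected by @is_acyclic@, which is the property that makes a job ordering possible at all.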
16 | 1 | Herve Caumont | |
17 | 3 | Herve Caumont | h2. Hadoop MapReduce Streaming |
18 | |||
19 | A Developer Cloud Sandboxes service comes with an Application Template that embeds a Hadoop MapReduce Framework, to support distributed computing over large datasets on clusters of virtual machines. |
||
20 | |||
21 | The provided implementation of MapReduce is used in a simple way: it maps the URLs of the input data to "single shell command executions" on virtual machines, and reduces the output data URLs to a final results list. |
||
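The map and reduce roles described above can be sketched in a few lines. This is a local, single-machine illustration under stated assumptions, not the Hadoop Streaming setup shipped with the service: the stand-in command (a Python one-liner that appends @.out@ to its argument), the example URLs, and the choice to sort and de-duplicate in the reduce step are all hypothetical.

```python
import subprocess
import sys


def map_url(url):
    """Map step: run one command on one input URL, return the output URL it prints."""
    # Stand-in for the real processing command (assumption): echo the URL with ".out".
    command = [sys.executable, "-c", "import sys; print(sys.argv[1] + '.out')", url]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def reduce_urls(output_urls):
    """Reduce step: merge the per-input output URLs into one final results list."""
    # Sorting and de-duplicating is an illustrative choice, not a documented behaviour.
    return sorted(set(output_urls))


# Hypothetical input URLs.
inputs = ["http://example.org/data/b.tif", "http://example.org/data/a.tif"]
results = reduce_urls(map_url(u) for u in inputs)
print(results)
```

Here each input URL triggers exactly one command execution, and the reduce step only handles URLs, never the data itself.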
22 | |||
23 | This framework can be deployed according to the virtual machine mode (sandbox or cluster), which supports both Developer Cloud Sandboxes service use cases: application development and testing (Sandbox mode), and application deployment for large-scale processing (Cluster mode). |
||
24 | |||
25 | 1 | Herve Caumont | h2. Task |
26 | |||
27 | To exploit the parallelism offered by the CIOP framework, a +*job*+ may process its input in several tasks. In principle, the CIOP framework runs the tasks in parallel within a +*job*+. |
||
28 | A +*job*+ can therefore have one or more +*tasks*+. |
||
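To make the job/task relationship concrete, the sketch below splits a job's inputs into chunks and runs the same function on each chunk in parallel. The round-robin split, the thread pool, and the placeholder work (upper-casing each item) are assumptions chosen for illustration; the way the CIOP framework actually distributes inputs among tasks is not described here.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(inputs, n_tasks):
    """Round-robin partition of a job's inputs into at most n_tasks non-empty chunks."""
    chunks = [inputs[i::n_tasks] for i in range(n_tasks)]
    return [chunk for chunk in chunks if chunk]


def run_task(chunk):
    """Identical work performed by every task (placeholder: upper-case each item)."""
    return [item.upper() for item in chunk]


def run_job(inputs, n_tasks):
    """Run one job as several identical tasks in parallel and gather their outputs."""
    tasks = split_into_tasks(inputs, n_tasks)
    # max(1, ...) keeps the pool valid when the job has no input at all.
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        per_task = list(pool.map(run_task, tasks))
    return [item for chunk in per_task for item in chunk]


outputs = run_job(["a", "b", "c", "d", "e"], n_tasks=2)
print(sorted(outputs))
```

Every task runs the same code on a different slice of the input, which is what "identical tasks" means in the DAG section.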
29 | 3 | Herve Caumont | |
30 | h2. Workflow |
||
31 | |||
32 | An application created with the Developer Cloud Sandboxes service is always identified as a process workflow with an identifier. |
||
33 | When a job simulation is submitted with a ciop-simjob command, the workflow identifier is always the id that was defined in the application.xml file. |