Herve Caumont, 2013-09-20 18:02

h1. Sandbox Terminology and Definitions

This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service.

h2. Cloud Computing Operational Pilots (CIOP)

A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observations Sandbox Service. It led to the creation of a set of tools (the ciop command line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service.

h2. Directed Acyclic Graph (DAG)

In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles. By definition, a DAG has no path that starts and ends at the same vertex. DAGs may be used to model several different kinds of structures (http://en.wikipedia.org/wiki/Directed_acyclic_graph).

This notion is already in use in the G-POD processing flows and in ESA EO Toolboxes such as BEAM and NeST.

The CIOP tools embedded in Developer Cloud Sandboxes rely on the DAG to describe a scientific application +*workflow*+.

A CIOP workflow is a DAG whose nodes are jobs. The execution of a job may involve one or more identical tasks.
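
As an illustration of the idea that a workflow is a DAG whose nodes are jobs, the following minimal Python sketch derives a valid job execution order from the dependencies. It is not the CIOP implementation: the job names and the @execution_order@ helper are hypothetical, and the ordering relies on the standard @graphlib@ module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a job, each value is the set of jobs
# it depends on. The job names are illustrative, not CIOP identifiers.
workflow = {
    "preprocess": set(),
    "compute": {"preprocess"},
    "aggregate": {"compute"},
    "publish": {"aggregate", "preprocess"},
}


def execution_order(dag):
    """Return the jobs in an order that respects every dependency.

    Raises graphlib.CycleError if the graph contains a directed cycle,
    that is, if it is not a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

Because the graph has no cycle, such an order always exists. A graph in which two jobs depend on each other would be rejected with a @CycleError@.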

h2. Hadoop MapReduce Streaming

A Developer Cloud Sandboxes service comes with an Application Template that embeds a Hadoop MapReduce framework to support distributed computing over large datasets on clusters of virtual machines.

The provided MapReduce implementation is used in a simple way: it maps the URLs of the input data to be processed onto "single shell command" executions on virtual machines, and reduces the output data URLs to a final list of results.

This framework can be deployed according to the virtual machine mode (sandbox or cluster). This supports the two use cases of the Developer Cloud Sandboxes service: application development and testing (Sandbox mode), and application deployment for large-scale processing (Cluster mode).
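
The map and reduce steps described in this section can be pictured with the following Python sketch. It only mimics the pattern on a single machine and does not use Hadoop: the @map_url@ and @reduce_outputs@ helpers, the example URLs, and the use of @echo@ as a stand-in for the real processing command are all assumptions made for illustration. It expects a POSIX shell.

```python
import shlex
import subprocess


def map_url(url, command):
    """Run one shell command on one input URL and return its output lines."""
    result = subprocess.run(
        f"{command} {shlex.quote(url)}",
        shell=True,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def reduce_outputs(mapped):
    """Merge the per-input output lists into one sorted list without duplicates."""
    return sorted({url for outputs in mapped for url in outputs})


# Hypothetical input URLs; 'echo' stands in for the real processing command
# and simply returns the URL it was given.
inputs = ["http://example.org/data/a.tif", "http://example.org/data/b.tif"]
mapped = [map_url(url, "echo") for url in inputs]
results = reduce_outputs(mapped)
print(results)
```

In a real deployment the map step would be distributed over the virtual machines of the cluster. Here the calls run one after the other to keep the sketch short.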

h2. Task

To exploit the parallelism offered by the CIOP framework, a +*job*+ may process its input in several tasks. In principle, the CIOP framework runs the tasks of a +*job*+ in parallel.

A +*job*+ can therefore have one or more +*tasks*+.
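
The relation between a job and its tasks can be sketched in Python as follows. This is an assumption-laden illustration, not how the CIOP framework schedules work: @split_into_tasks@, @run_task@ and @run_job@ are hypothetical names, the upper-casing is a placeholder for the real processing, and threads stand in for whatever parallel execution the framework provides.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(inputs, n_tasks):
    """Distribute the job inputs round-robin over at most n_tasks tasks."""
    chunks = [inputs[i::n_tasks] for i in range(n_tasks)]
    return [chunk for chunk in chunks if chunk]


def run_task(task_inputs):
    """Identical processing applied by every task (placeholder: upper-case)."""
    return [item.upper() for item in task_inputs]


def run_job(inputs, n_tasks):
    """Run one job as several identical tasks executed in parallel."""
    tasks = split_into_tasks(inputs, n_tasks)
    # max(1, ...) keeps the pool valid even when there is nothing to process.
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        outputs = list(pool.map(run_task, tasks))
    # Flatten the per-task outputs into a single result list.
    return [item for output in outputs for item in output]


print(run_job(["a", "b", "c", "d", "e"], n_tasks=2))
```

Every task runs the same code on a different share of the input, which is what makes the tasks of a job identical.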

h2. Workflow

An application created with the Developer Cloud Sandboxes service is always described as a process workflow that carries an identifier.

When a job simulation is submitted with the ciop-simjob command, the workflow identifier is always the id defined in the application.xml file.