Sandbox terminology » History » Version 4

Herve Caumont, 2013-09-20 23:39

h1. Sandbox Terminology and Definitions

This page contains the terminology and definitions related to the use of the Developer Cloud Sandboxes service.

h2. Application template

The application template is a disk space whose directories and files follow a set of template rules.

Applications must conform to this template to integrate correctly as a processing workflow in Cluster mode. In addition, an application workflow must be defined as a directed acyclic graph (DAG).

h2. Cloud Computing Operational Pilots (CIOP)

A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observations Sandbox Service. It led to the creation of a set of tools (the 'ciop' command line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service.

h2. Directed Acyclic Graph (DAG)

In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles: no path starts and ends at the same vertex. DAGs can be used to model several different kinds of structures (see http://en.wikipedia.org/wiki/Directed_acyclic_graph).

This notion is already used in the G-POD processing flows and in ESA EO Toolboxes such as BEAM and NeST.

The CIOP tools embedded in Developer Cloud Sandboxes rely on the DAG to describe a scientific application +*workflow*+. A CIOP workflow is a DAG whose nodes are jobs. The execution of a job may involve one or more identical tasks.
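
As an illustration only, the sketch below models a workflow as a DAG of jobs in Python (3.9 or later, for the standard @graphlib@ module). The job names are hypothetical and this is not how the CIOP tools store a workflow; it only shows that a DAG always admits an execution order, while a graph with a cycle does not.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "download": set(),
    "calibrate": {"download"},
    "classify": {"calibrate"},
    "mosaic": {"calibrate", "classify"},
}


def execution_order(dag):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_dag(graph):
    """Return True if the graph has no directed cycle."""
    try:
        execution_order(graph)
    except CycleError:
        return False
    return True


order = execution_order(workflow)
print(order)  # ['download', 'calibrate', 'classify', 'mosaic']
```

A graph such as @{"a": {"b"}, "b": {"a"}}@ makes @is_dag@ return @False@, which is the kind of definition a workflow must avoid.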

h2. Hadoop Streaming

A Developer Cloud Sandboxes service comes with an Application Template that embeds a MapReduce framework to support distributed computing over large datasets on clusters of virtual machines. Hadoop Streaming is a utility that lets users create and run jobs with any executable as the mapper and/or the reducer (the MapReduce operations).

The provided MapReduce implementation is used in a simple way: the map step distributes the URLs of the input data over executions of a single shell command on the virtual machines, and the reduce step gathers the output data URLs into a final results list.

The framework is deployed according to the virtual machine mode: Sandbox or Cluster. A template can thus be instantiated first as a simulation environment (Sandbox mode) for development and testing, and can be triggered at any time as a collection of virtual machines (Cluster mode) deployed for large-scale processing.
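
The map and reduce steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation shipped with the service: the @process@ function is a hypothetical stand-in for the shell command, and the input and output URLs are invented for the example.

```python
from urllib.parse import urlsplit


def process(url):
    """Hypothetical stand-in for the shell command run on one input URL.

    A real mapper would run the application executable here and publish
    its result; this sketch only derives an output URL from the input name.
    """
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    return f"file:///tmp/results/{name}.out"


def mapper(lines):
    """Map step: one input URL per line in, one output URL out."""
    for line in lines:
        url = line.strip()
        if url:
            yield process(url)


def reducer(lines):
    """Reduce step: gather the output URLs into one sorted list without duplicates."""
    return sorted({line.strip() for line in lines if line.strip()})


# Under Hadoop Streaming each step would read records from standard input
# and write to standard output; here the two steps are chained in memory.
inputs = [
    "http://example.org/data/scene_b.tif",
    "http://example.org/data/scene_a.tif",
]
results = reducer(mapper(inputs))
print(results)
```

Because both steps only exchange lines of text, the same logic could equally be written as two shell scripts, which is what makes any executable usable as a mapper or reducer.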

h2. Sandbox

A collection of software libraries providing a defined application programming interface (API).

h2. Task

To exploit the parallelism offered by the CIOP framework, a +*job*+ may process its input in several +*tasks*+. In principle, the CIOP framework runs the tasks of a +*job*+ in parallel. A +*job*+ therefore has one or more +*tasks*+.
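
The relation between a job and its tasks can be pictured with the following Python sketch. It is an analogy using a local thread pool, not the CIOP scheduler: the function names, the way the input is split, and the placeholder processing are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_inputs(inputs, n_tasks):
    """Split the job input into at most n_tasks non-empty shares."""
    shares = [inputs[i::n_tasks] for i in range(n_tasks)]
    return [share for share in shares if share]


def run_task(share):
    """One task: every task applies the same processing to its own share."""
    return [item.upper() for item in share]  # placeholder processing


def run_job(inputs, n_tasks=3):
    """Run the identical tasks of one job in parallel and merge their outputs."""
    shares = split_inputs(inputs, n_tasks)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        outputs = list(pool.map(run_task, shares))
    return sorted(item for output in outputs for item in output)


job_output = run_job(["a", "b", "c", "d", "e"], n_tasks=3)
print(job_output)  # ['A', 'B', 'C', 'D', 'E']
```

A job with a single input item still runs as one task, which matches the "one or more" wording above.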

h2. Virtual Machine template

A virtual machine template determines the properties of the virtual machines that can be created from it. It is always linked to a Cloud Provider, and consists of a supported hardware type (e.g. “Small”, “HighPower”, ...) and a supported operating system.

h2. Workflow

An application created with the Developer Cloud Sandboxes service is always identified as a processing workflow with an identifier.

When a job simulation is submitted with the ciop-simjob command, the workflow identifier is always the id defined in the application.xml file.
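
To make the idea of a workflow identifier concrete, the sketch below reads an id from an XML document in Python. The XML content is hypothetical: the element and attribute names (@application@, @workflow@, @node@, @job@, @id@) are assumptions for illustration and should be checked against the actual application.xml schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical application.xml content; element and attribute names are assumed.
APPLICATION_XML = """
<application id="my_application">
  <workflow id="my_workflow" title="Example workflow">
    <node id="node_a"><job id="job_a"/></node>
  </workflow>
</application>
"""


def workflow_id(xml_text):
    """Return the id attribute of the first <workflow> element, or None."""
    root = ET.fromstring(xml_text)
    workflow = root if root.tag == "workflow" else root.find(".//workflow")
    return None if workflow is None else workflow.get("id")


found_id = workflow_id(APPLICATION_XML)
print(found_id)  # my_workflow
```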