Revision 2 (Herve Caumont, 2013-06-20 15:04) → Revision 3/5 (Herve Caumont, 2013-09-20 18:02)
h1. Sandbox Terminology and Definitions

This page contains the CIOP terminology and definitions related to the use of the Developer Cloud Sandboxes service.

h2. Cloud Computing Operational Pilots (CIOP)

A project funded by ESA (March 2011 - September 2013) to deliver a proof of concept for an Earth Observations Sandbox Service, and that led to the creation of a set of tools (the ciop command line tools) that are deployed as a baseline in a Developer Cloud Sandboxes service.

h2. Directed Acyclic Graph (DAG)

In mathematics and computer science, a Directed Acyclic Graph (DAG) is a directed graph with no directed cycles: by definition, it has no path that starts and ends at the same vertex. DAGs may be used to model several different kinds of structures (http://en.wikipedia.org/wiki/Directed_acyclic_graph).

This notion is in place in the G-POD processing flows and is used in the ESA EO toolboxes such as BEAM and NeST.

The CIOP tools embedded in Developer Cloud Sandboxes use the DAG to describe the science application +*workflow*+. A CIOP workflow is a DAG whose nodes are jobs. The execution of a job may involve one or more identical tasks.

h2. Hadoop MapReduce Streaming

A Developer Cloud Sandboxes service comes with an Application Template that embeds a Hadoop MapReduce framework, to support distributed computing over large datasets on clusters of virtual machines. The provided implementation of MapReduce is used in a simple way: it maps the URLs of the input data to be processed onto "single shell command executions" on virtual machines, and reduces the output data URLs to a final results list.
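
The map and reduce roles described above can be illustrated with a minimal Python sketch. It is not the CIOP or Hadoop implementation: the function names (@run_command@, @reduce_outputs@, @process@), the example URLs, and the stand-in command are all assumptions made for illustration, and the loop runs sequentially on one machine where Hadoop would distribute the map step across virtual machines.

```python
import subprocess
import sys


def run_command(input_url: str) -> str:
    """Map step: run one command for one input URL and return an output URL.

    The command is a stand-in (a Python one-liner that appends '.out');
    a real application would call its own executable here.
    """
    code = "import sys; print(sys.argv[1] + '.out')"
    completed = subprocess.run(
        [sys.executable, "-c", code, input_url],
        capture_output=True,
        text=True,
        check=True,  # fail loudly if the command exits with an error
    )
    return completed.stdout.strip()


def reduce_outputs(output_urls):
    """Reduce step: merge the output URLs into one sorted results list."""
    return sorted(set(output_urls))


def process(input_urls):
    """Map every input URL to a command execution, then reduce the outputs."""
    mapped = [run_command(url) for url in input_urls]
    return reduce_outputs(mapped)


if __name__ == "__main__":
    # Hypothetical input URLs, used only to exercise the sketch.
    inputs = ["http://example.org/data/b", "http://example.org/data/a"]
    for url in process(inputs):
        print(url)
```

Each input URL triggers exactly one command execution, which mirrors the "single shell command execution" idea; only the list of output URLs travels to the reduce step.
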
This framework can be deployed according to the virtual machine mode (sandbox or cluster), which supports the Developer Cloud Sandboxes service use cases for both application development and testing (Sandbox mode) and application deployment for large scale processing (Cluster mode).

h2. Task

To exploit the parallelism offered by the CIOP framework, a +*job*+ may process its input in several tasks. In principle, the CIOP framework runs the tasks in parallel within a +*job*+. A +*job*+ can thus have one or more +*tasks*+.

h2. Workflow

An application created with the Developer Cloud Sandboxes service is always identified as a process workflow with an identifier. When a job simulation is submitted with the ciop-simjob command, the workflow identifier is always the id that was defined in the application.xml file.
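
The relationship between the workflow, job, and task terms defined on this page can be sketched in Python. This is only an illustration under stated assumptions: the job names, the @WORKFLOW@ dictionary, and the helper functions are hypothetical and do not reflect the format of application.xml or the behaviour of the ciop tools. The data hand-off in @run_workflow@ also assumes a simple chain of jobs rather than a general DAG.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job id maps to the job ids it depends on.
WORKFLOW = {
    "ingest": [],
    "process": ["ingest"],
    "publish": ["process"],
}


def job_order(workflow):
    """Return job ids in an order that respects the DAG dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    that is, if the workflow is not a DAG.
    """
    return list(TopologicalSorter(workflow).static_order())


def run_job(job_id, inputs, max_tasks=2):
    """Run one job as several identical tasks, one per input, in parallel."""

    def task(item):
        # Stand-in for real processing: record which job handled the item.
        return f"{job_id}({item})"

    with ThreadPoolExecutor(max_workers=max_tasks) as pool:
        return list(pool.map(task, inputs))  # map() keeps the input order


def run_workflow(workflow, inputs):
    """Feed the outputs of each job into the next job in dependency order."""
    data = list(inputs)
    for job_id in job_order(workflow):
        data = run_job(job_id, data)
    return data


if __name__ == "__main__":
    print(job_order(WORKFLOW))
    print(run_workflow(WORKFLOW, ["a", "b"]))
```

Every task within a job runs the same function on a different input, which matches the definition of a job as one or more identical tasks.
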