Understanding the sandbox » History » Version 1
Herve Caumont, 2013-06-20 15:02
1 | 1 | Herve Caumont | h1. Understanding the Sandbox |
---|---|---|---|
2 | |||
3 | {{>toc}} |
||
4 | |||
5 | h2. The Sandbox filesystems |
||
6 | |||
7 | In the context of the application life-cycle, the Sandbox has three filesystems (or directories): |
||
8 | * /home/<user> that we refer to as *HOME* |
||
9 | * /application that we refer to as *APPLICATION* |
||
10 | * /share that we refer to as *SHARE* |
||
11 | |||
12 | h3. HOME directory |
||
13 | |||
14 | > A user's home directory is intended to contain that user's files; including text documents, music, pictures or videos, etc. It may also include their configuration files of preferred settings for any software they have used there and might have tailored to their liking: web browser bookmarks, favorite desktop wallpaper and themes, passwords to any external services accessed via a given software, etc. The user can install executable software in this directory, but it will only be available to users with permission to this directory. The home directory can be organized further with the use of sub-directories. |
||
15 | |||
16 | As such, *HOME* is used to store the user's files. It can be used to store source files (the compiled programs would then go in *APPLICATION*). |
||
17 | |||
18 | > At job or workflow execution time, the Sandbox uses a system user to execute the application. This system user cannot read files in *HOME*. |
||
19 | |||
20 | > When the application is run on the Sandbox Runtime Environment, the *HOME* directory is not available in any of the computing nodes. |
||
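Because the system user cannot read *HOME* and the directory is absent from the computing nodes, a wrapper script may want to reject any path that points there before a job starts. A minimal sketch, assuming a hypothetical helper named @is_under_home@ (it is not part of the Sandbox tools):

```shell
# Hypothetical helper (not part of the Sandbox tools): succeeds when the path
# given as $1 lies inside the home directory given as $2 (defaults to $HOME).
is_under_home() {
  home_dir="${2:-$HOME}"
  case "$1" in
    "$home_dir"|"$home_dir"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could call @is_under_home "$config_file"@ and stop with an error message when the check succeeds, so that the problem shows up before the job runs as the system user.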
21 | |||
22 | h3. APPLICATION filesystem |
||
23 | |||
24 | The *APPLICATION* filesystem contains all the files required to run the application. |
||
25 | |||
26 | The *APPLICATION* filesystem is available on the Sandbox as /application. |
||
27 | |||
28 | > Whenever an application wrapper script needs the *APPLICATION* value (/application), it should use the variable $_CIOP_APPLICATION_PATH, for example: |
||
29 | |||
30 | <pre> |
||
31 | export BEAM_HOME=$_CIOP_APPLICATION_PATH/common/beam-4.11 |
||
32 | </pre> |
||
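The same pattern can be wrapped in small functions so that a script still works when the variable is not set, for instance when it is tested outside the Sandbox. This is a sketch only: the function names and the fallback to /application are assumptions, not Sandbox conventions.

```shell
# Resolve the APPLICATION root: use $_CIOP_APPLICATION_PATH when it is set,
# otherwise fall back to /application (assumed fallback for local testing).
application_path() {
  printf '%s\n' "${_CIOP_APPLICATION_PATH:-/application}"
}

# Build the BEAM_HOME value used in the example above.
beam_home() {
  printf '%s/common/beam-4.11\n' "$(application_path)"
}
```

A wrapper script would then use @export BEAM_HOME="$(beam_home)"@.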
33 | |||
34 | The *APPLICATION* filesystem contains: |
||
35 | * the Application Descriptor File, named _application.xml_ and described here: [[Application descriptor]] |
||
36 | * a folder for each job template |
||
37 | |||
38 | A *job template* folder contains: |
||
39 | * the streaming executable, a script that deals with the _stdin_ managed by the Sandbox (e.g. EO data URLs to be passed to ciop-copy). There isn't a defined naming convention although it is often called _run_. |
||
40 | |||
41 | > Tip: The streaming executable reads its inputs via stdin, which is managed by the underlying Hadoop MapReduce streaming layer. |
||
42 | |||
43 | * a set of folders such as: |
||
44 | ** /application/<job template name>/bin, which stands for "binaries" and contains fundamental job utilities, some of which are needed by the job wrapper script |
||
45 | ** /application/<job template name>/etc containing job-wide configuration files |
||
46 | ** /application/<job template name>/lib containing the job libraries |
||
47 | ** ... |
||
48 | |||
49 | > There are no particular rules for naming or organizing the folders inside a job template folder. |
||
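The streaming executable described above can be sketched as a loop over stdin, one input reference per line. The function names are hypothetical, and the handler only prints its argument; a real job would typically fetch the input at that point (for example with ciop-copy) and then process it.

```shell
# Hypothetical per-input handler: a placeholder for the real processing step.
handle_input() {
  printf 'processing %s\n' "$1"
}

# Read one input reference per line from stdin, skipping empty lines.
# The extra test keeps a final line that lacks a trailing newline.
process_stdin() {
  while IFS= read -r input || [ -n "$input" ]; do
    [ -n "$input" ] || continue
    handle_input "$input"
  done
}
```

In a _run_ script, the last line would simply be a call to @process_stdin@.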
50 | |||
51 | The *APPLICATION* of a workflow with two jobs can then be represented as: |
||
52 | |||
53 | /application/ |
||
54 |   application.xml |
||
55 |   /job_template_1 |
||
56 |     run |
||
57 |     /bin |
||
58 |     /etc |
||
59 |   /job_template_2 |
||
60 |     run |
||
61 |     /bin |
||
62 |     /lib |
||
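The layout above can be created with a few commands. The sketch below is an illustration under assumptions: the function name is hypothetical, the job template names are the placeholders from the text, the files are created empty, and marking _run_ as executable is assumed rather than stated in this page.

```shell
# Create the two-job layout shown above under a root directory
# ($1, defaulting to /application). Files are created empty.
scaffold_application() {
  root="${1:-/application}"
  mkdir -p "$root/job_template_1/bin" "$root/job_template_1/etc" \
           "$root/job_template_2/bin" "$root/job_template_2/lib" || return 1
  touch "$root/application.xml" \
        "$root/job_template_1/run" "$root/job_template_2/run" || return 1
  # Assumption: the streaming executables need the execute permission.
  chmod +x "$root/job_template_1/run" "$root/job_template_2/run"
}
```

Passing a scratch directory as the first argument makes it possible to try the layout without touching /application.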
63 | |||
64 | h3. SHARE filesystem |
||
65 | |||
66 | The *SHARE* filesystem is the Sandbox distributed filesystem mount point. It is an HDFS filesystem used to store the application's job outputs generated by the execution of ciop-simjob and/or ciop-simwf. |
||
67 | |||
68 | The *SHARE* filesystem is available on the Sandbox as /share, and the HDFS distributed filesystem access point is /tmp. Thus, on the Sandbox, /share/tmp is the root of the distributed filesystem. |
||
69 | |||
70 | h4. SHARE for ciop-simjob |
||
71 | |||
72 | When ciop-simjob is invoked to run a node of the workflow, the outputs are found in: |
||
73 | |||
74 | <pre> |
||
75 | /share/tmp/sandbox/<workflow name>/<node name> |
||
76 | </pre> |
||
77 | |||
78 | A job can be executed several times, but each new execution deletes the results of the previous one. |
||
79 | |||
80 | > Tip: the workflow and node names are found in the Application Descriptor File, named _application.xml_ and described here: [[Application descriptor]] |
||
81 | |||
82 | > Tip: ciop-simjob -n lists the workflow node name(s); see the ciop-simjob reference page: [[ciop-simjob]] |
||
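The output location for ciop-simjob can be built from the workflow and node names. A minimal sketch, assuming a hypothetical helper name and the path pattern given above:

```shell
# Print the ciop-simjob output directory for a workflow ($1) and a node ($2),
# following the /share/tmp/sandbox/<workflow name>/<node name> pattern.
simjob_output_dir() {
  printf '/share/tmp/sandbox/%s/%s\n' "$1" "$2"
}
```

For example, @ls "$(simjob_output_dir my_workflow my_node)"@ would list the results of the latest execution, where both names are placeholders for the values found in _application.xml_.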
83 | |||
84 | h4. SHARE for ciop-simwf |
||
85 | |||
86 | When ciop-simwf is invoked to run the complete application workflow, the outputs are found in a dedicated folder under *SHARE*: |
||
87 | |||
88 | <pre> |
||
89 | /share/tmp/sandbox/run/<run identifier>/<node name>/data |
||
90 | </pre> |
||
91 | |||
92 | Unlike ciop-simjob, ciop-simwf keeps the outputs of all workflow execution runs. This makes it possible, for example, to compare the results obtained with different sets of parameters. |
||
93 | |||
94 | > Tip: check the [[Application descriptor]] page to define default parameter values and how to override these in the workflow |