Grid from scratch

Disclaimer:

The following instructions are suggestions, written to be useful to a user in the circumstances described below who follows them wisely. Please don't execute them if you are unsure what they do or how they will affect your system. No liability is accepted for damage, for locking yourself out of your system, or for lack of success.

Building a Grid on an Ubuntu machine with the Globus/GridWay Toolkit

Setting up a working infrastructure is probably one of the most difficult tasks for beginners who have heard about the benefits of grid computing. This document provides quick instructions to install and configure a trivial example of a grid. You will be able to monitor and submit jobs to a single cluster with one backend node, all within the same machine. Just follow the commands below, adapting them to your system's naming conventions where needed.

Installation

Prerequisites Make sure you have the Java runtime environment and the corresponding development kit installed (I recommend the Sun package if you agree to its license). Some other non-default applications will also be needed.

sudo apt-get install libssl-dev xinetd ganglia-monitor sun-java6-jre sun-java6-jdk
sudo apt-get install ant
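
Before going on, it can help to confirm that the tools are really on the PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name used only for illustration, not part of any package installed above.

```shell
# Print every command from the argument list that is not found on the PATH.
# (missing_tools is a hypothetical helper, not a Globus or Ubuntu tool.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Empty output means that all three tools were found.
missing_tools java javac ant
```

Any name it prints points to a package that still has to be installed.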

GridWay users To ease the configuration later on, we will add the GridWay users to the gwusers group.

sudo addgroup gwusers
sudo usermod -a -G gwusers `whoami`

Now, create the configuration profile /etc/profile.d/globus.sh with the following content

for i in `groups`; do
 if [ "$i" = 'gwusers' ]; then
  GLOBUS_HOME=/opt/globus
  GLOBUS_LOCATION=/opt/globus/4.2.1
  GW_LOCATION=/opt/globus/4.2.1
  JAVA_HOME=/usr/lib/jvm/java-6-sun
  ANT_HOME=/usr/share/ant
  PATH=$GLOBUS_LOCATION/bin:${PATH}
  LD_LIBRARY_PATH=$GW_LOCATION/lib:${LD_LIBRARY_PATH}
  export PATH LD_LIBRARY_PATH GLOBUS_HOME GLOBUS_LOCATION GW_LOCATION JAVA_HOME ANT_HOME
 fi
done

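and load it into the shell. The new group membership only takes effect in a new login session, so log out and back in first if the variables stay unset.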

. /etc/profile.d/globus.sh
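
The profile only exports the variables when gwusers is among the groups of the current session. As a rough check, the sketch below tests that membership; has_group is a hypothetical helper written for this guide, and the list of groups is passed in explicitly so the function is easy to try out.

```shell
# Succeed if the group named in $1 appears in the space-separated list in $2.
# (has_group is a hypothetical helper, not part of Globus or GridWay.)
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_group gwusers "$(id -nG 2>/dev/null)"; then
    echo "gwusers is active; GLOBUS_LOCATION=${GLOBUS_LOCATION:-<unset>}"
else
    echo "gwusers is not active in this session yet"
fi
```

If the second message appears right after running usermod, a fresh login is most likely all that is missing.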

Administrative user As the Globus Toolkit suggests, we are going to create a globus user who will later own and initialize the services.

sudo adduser --system --home $GLOBUS_HOME --ingroup gwusers --shell /bin/bash globus

Downloading Get the latest stable source code of the Globus Toolkit 4.2.1 (137MB) and, after a free registration, save it in the current directory.

Let's begin with the real installation of the package, with the following commands

sudo mv gt4.2.1-all-source-installer.tar.gz $GLOBUS_HOME
sudo chown globus:gwusers $GLOBUS_HOME/gt4.2.1-all-source-installer.tar.gz
sudo su - globus
gunzip -c gt4.2.1-all-source-installer.tar.gz | tar xvf -
cd gt4.2.1-all-source-installer
./configure --prefix=$GLOBUS_LOCATION
make

and get yourself a long coffee, since the complete Globus Toolkit takes hours to build. To finish this part, don't forget to

make install
exit

Setup of services

Certificate Authority We have to create a Certificate Authority (CA) that signs the certificates for hosts and users. The following steps set up a CA with the simplest method

sudo su - globus
cd $GLOBUS_LOCATION
source $GLOBUS_LOCATION/etc/globus-user-env.sh
$GLOBUS_LOCATION/setup/globus/setup-simple-ca

and follow the instructions. You will have to create a PEM pass phrase, which is needed later to sign all the certificates. The last part of the output advises you to finish the setup by running the setup-gsi script as root.

exit
sudo -E $GLOBUS_LOCATION/setup/globus_simple_ca_*_setup/setup-gsi

where the * wildcard matches the CA hash.
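
If more than one simple CA has been set up on the machine, the wildcard would match several directories. The sketch below resolves such a pattern only when it matches exactly one path; single_match is a hypothetical helper, and it assumes the matched paths contain no spaces.

```shell
# Print the only path matching the glob pattern in $1.
# Fail if the pattern matches nothing or more than one path.
# (single_match is a hypothetical helper; paths must not contain spaces.)
single_match() {
    set -- $1            # unquoted on purpose so the shell expands the glob
    [ "$#" -eq 1 ] && [ -e "$1" ] || return 1
    printf '%s\n' "$1"
}

# Possible usage with the path from this guide:
# setup_dir=$(single_match "$GLOBUS_LOCATION/setup/globus_simple_ca_*_setup") &&
#     sudo -E "$setup_dir/setup-gsi"
```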

Host Certificate Now that we trust the CA, let's ask for a certificate for our frontend host.

sudo -E $GLOBUS_LOCATION/bin/grid-cert-request -host `hostname --fqdn`

After the certificate has been created for the machine, we have to sign it with our brand new authority

sudo su - globus
grid-ca-sign -in /etc/grid-security/hostcert_request.pem -out $GLOBUS_HOME/hostcert.pem
exit

At this point, both the key and the certificate should belong to root, which is how gridftp accesses them; the container uses its own copies (containercert.pem and containerkey.pem), owned by globus.

sudo mv $GLOBUS_HOME/hostcert.pem /etc/grid-security/
sudo chown root:root /etc/grid-security/hostcert.pem
sudo bash -c "cd /etc/grid-security; cp hostcert.pem containercert.pem; cp hostkey.pem containerkey.pem; chown globus:gwusers containercert.pem containerkey.pem"
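
To review the result, the ownership of the four files can be listed in one go. This is a small sketch that assumes GNU stat, the default on Ubuntu; show_perms is a hypothetical helper name.

```shell
# Print "mode owner:group path" for each existing file given as an argument.
# Assumes GNU stat; show_perms is a hypothetical helper.
show_perms() {
    for f in "$@"; do
        [ -e "$f" ] && stat -c '%a %U:%G %n' "$f"
    done
    return 0
}

show_perms /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem \
           /etc/grid-security/containercert.pem /etc/grid-security/containerkey.pem
```

The host files should show root as owner, and the container copies globus:gwusers.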

User certificate A similar certificate has to be created for the user who plans to submit jobs to the grid. You will be asked for a User PEM phrase to “sign in” each time before using the grid.

source $GLOBUS_LOCATION/etc/globus-user-env.sh
grid-cert-request

The request has to be signed by the globus user:

sudo -E -H -u globus bash -c "$GLOBUS_LOCATION/bin/grid-ca-sign -in $HOME/.globus/usercert_request.pem -out $GLOBUS_HOME/usercert.pem"

Then place the signed certificate in the user's .globus directory with the correct ownership:

sudo mv $GLOBUS_HOME/usercert.pem $HOME/.globus/usercert.pem
sudo chown `id -u`:`id -g` $HOME/.globus/usercert.pem

Gridmap-file The grid-mapfile should map each certificate subject to the local user it belongs to

sudo -E bash -c "$GLOBUS_LOCATION/sbin/grid-mapfile-add-entry -dn \"`grid-cert-info -subject`\" -ln `whoami`"
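
Each grid-mapfile entry is a quoted certificate subject followed by a local user name. As an illustration, the sketch below looks up the local user for a given subject; mapped_user is a hypothetical helper, and it assumes the subject contains no backslashes.

```shell
# Print the local user mapped to the certificate subject $1 in the
# grid-mapfile $2. Each entry has the form: "subject" localuser
# (mapped_user is a hypothetical helper, not a Globus command.)
mapped_user() {
    awk -v dn="\"$1\"" 'index($0, dn " ") == 1 { print $NF }' "$2"
}

# Possible usage once the entry above has been added:
# mapped_user "$(grid-cert-info -subject)" /etc/grid-security/grid-mapfile
```

An empty result means the subject has no entry yet.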

Starting the web services container Copy the init script into place:

sudo cp $GLOBUS_LOCATION/etc/init.d/globus-ws-java-container /etc/init.d

Setting up GRAM4 Modify /etc/sudoers to allow globus to run jobs as any other user

sudo visudo

by adding the following 3 lines

Runas_Alias GLOBUSUSERS = ALL, !root;
globus ALL=(GLOBUSUSERS) NOPASSWD: /opt/globus/4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/4.2.1/libexec/globus-job-manager-script.pl *
globus ALL=(GLOBUSUSERS) NOPASSWD: /opt/globus/4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/4.2.1/libexec/globus-gram-local-proxy-tool *

Setting up Gridftp Create the file

sudo gedit /etc/xinetd.d/gridftp

with the following content

service gsiftp
{
instances               = 100
socket_type             = stream
wait                    = no
user                    = root
env                     += GLOBUS_LOCATION=/opt/globus/4.2.1
env                     += LD_LIBRARY_PATH=/opt/globus/4.2.1/lib
server                  = /opt/globus/4.2.1/sbin/globus-gridftp-server
server_args             = -i
log_on_success          += DURATION
disable                 = no
}

and add to the services

sudo gedit /etc/services

the appropriate line with the gsiftp port

gsiftp          2811/tcp
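
Some systems may already list gsiftp in /etc/services, in which case a second line is not needed. The sketch below adds the entry only when it is missing; ensure_gsiftp is a hypothetical helper and would have to be run from a root shell to touch the real file.

```shell
# Append the gsiftp entry to the services file $1 unless one is already there.
# (ensure_gsiftp is a hypothetical helper.)
ensure_gsiftp() {
    grep -q '^gsiftp[[:space:]]' "$1" ||
        printf 'gsiftp          2811/tcp\n' >> "$1"
}

# Possible usage from a root shell:
# ensure_gsiftp /etc/services
```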

Starting services Finally, just reload the xinetd services and start the brand new Globus container

sudo /etc/init.d/xinetd reload
sudo -u globus /etc/init.d/globus-ws-java-container start

Configuring GridWay

Allowing to run the MADs Modify /etc/sudoers to allow globus to run the GridWay MADs as any user within the gwusers group

sudo visudo

by adding the following 4 lines

Runas_Alias GWUSERS = %gwusers
Defaults>GWUSERS env_keep="GW_LOCATION GLOBUS_LOCATION"
globus ALL=(GWUSERS) NOPASSWD: /opt/globus/4.2.1/bin/gw_em_mad_ws *
globus ALL=(GWUSERS) NOPASSWD: /opt/globus/4.2.1/bin/gw_tm_mad_ftp *

Allowing Index Services Replace line 25 of $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml with the output of

printf "<upstream>https://`hostname --fqdn`:8443/wsrf/services/DefaultIndexService</upstream>\n"
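
The replacement can also be scripted. The following is a sketch with a hypothetical helper, replace_line, that prints a file with one line swapped out; it writes to standard output so the original stays untouched until the result has been reviewed. The temporary path in the usage lines is only an example.

```shell
# Print file $1 with line number $2 replaced by the text $3.
# (replace_line is a hypothetical helper; $3 must not contain backslashes.)
replace_line() {
    awk -v n="$2" -v text="$3" 'NR == n { print text; next } { print }' "$1"
}

# Possible usage, as the globus user:
# f=$GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml
# sed -n '25p' "$f"    # look at the line that is about to be replaced
# replace_line "$f" 25 "<upstream>https://$(hostname --fqdn):8443/wsrf/services/DefaultIndexService</upstream>" > /tmp/hierarchy.xml
```

Line 25 is the position given above; checking it with sed first guards against a file that differs from the one this guide was written for.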

Web Services Modify the GridWay configuration file by uncommenting the MADs for Web Services (lines 107-109)

sudo -u globus bash -c "mv $GLOBUS_LOCATION/etc/gridway/gwd.conf $GLOBUS_LOCATION/etc/gridway/gwd.conf~; cat $GLOBUS_LOCATION/etc/gridway/gwd.conf~ | sed s/'#IM_MAD = mds4:gw_im_mad_mds4_thr:-s cygnus.dacya.ucm.es:gridftp:ws'/\"IM_MAD = mds4:gw_im_mad_mds4_thr:-s `hostname --fqdn`:gridftp:ws\"/g| sed s/'#EM_MAD = ws:gw_em_mad_ws::rsl2'/'EM_MAD = ws:gw_em_mad_ws::rsl2'/g| sed s/'#TM_MAD = gridftp:gw_tm_mad_ftp:'/'TM_MAD = gridftp:gw_tm_mad_ftp:'/g > $GLOBUS_LOCATION/etc/gridway/gwd.conf"
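
The one-liner above is dense, so here is the same substitution written as a small filter that is easier to read and to try on a copy of the file. This is only a sketch: enable_ws_mads is a hypothetical helper name, and the three patterns are the ones used in the command above.

```shell
# Read a gwd.conf on stdin and write it to stdout with the three Web Services
# MAD lines uncommented; $1 is the host name to use in the IM_MAD line.
# (enable_ws_mads is a hypothetical helper.)
enable_ws_mads() {
    sed -e "s/#IM_MAD = mds4:gw_im_mad_mds4_thr:-s cygnus\.dacya\.ucm\.es:gridftp:ws/IM_MAD = mds4:gw_im_mad_mds4_thr:-s $1:gridftp:ws/" \
        -e 's/#EM_MAD = ws:gw_em_mad_ws::rsl2/EM_MAD = ws:gw_em_mad_ws::rsl2/' \
        -e 's/#TM_MAD = gridftp:gw_tm_mad_ftp:/TM_MAD = gridftp:gw_tm_mad_ftp:/'
}

# Possible usage on a backup copy:
# enable_ws_mads "$(hostname --fqdn)" < gwd.conf~ > gwd.conf
```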

Enabling Ganglia

sudo su - globus
mds-gluerp-configure none ganglia $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/ganglia-config.xml
mds-gluerp-configure fork ganglia $GLOBUS_LOCATION/etc/globus_wsrf_gram_Fork/gluerp-config.xml
exit

Write permissions

sudo chmod g+w $GLOBUS_LOCATION/var/gridway/

Launch the grid

This operation has to be done every time you boot the system or after manually stopping the services

sudo /etc/init.d/ganglia-monitor start
sudo -u globus /etc/init.d/globus-ws-java-container start
sudo su - globus -c "$GLOBUS_LOCATION/bin/gwd -m"

and enjoy!!

Test

The first check of your grid will be to monitor the front nodes you have access to:

gwhost -c 1

and the expected output, after a few seconds of information refresh, should be something like this

HID PRIO  OS              ARCH   MHZ %CPU  MEM(F/T)     DISK(F/T)     N(U/F/T) LRMS                 HOSTNAME            
0   1     Linux2.6.27-9- x86_6  800  133   90/1974   66907/81652        0/2/2 Fork                 myhost.mydomain
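
For scripted checks, the interesting columns can be pulled out of that listing. The sketch below assumes the column layout shown above, with HOSTNAME, LRMS and N(U/F/T) as the last three fields; gwhost_summary is a hypothetical helper.

```shell
# Read gwhost output on stdin, skip the header line, and print
# "hostname LRMS nodes" for every host line.
# (gwhost_summary is a hypothetical helper; it relies on the layout above.)
gwhost_summary() {
    awk 'NR > 1 && NF >= 3 { print $NF, $(NF-1), $(NF-2) }'
}

# Possible usage:
# gwhost | gwhost_summary
```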

Common fatal warnings & error messages

* Unknown host error: If you get something like this

[gwadmin@myhost bin]$ ./gwhost 
gethostbyname() : Unknown host
FAILED: failed connection to gwd

it is probably because your GridWay instance is running on a host without a Fully Qualified Domain Name (FQDN), so hostname -f fails as well. Try editing your /etc/hosts file.

 127.0.0.1      localhost.localdomain    localhost hostname
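
A quick way to see whether the host reports a qualified name is to test the output of hostname for a dot. This is a rough sketch with a hypothetical helper, is_fqdn; a dot in the name is only a heuristic and says nothing about whether DNS agrees.

```shell
# Succeed if the name in $1 contains at least one dot.
# (is_fqdn is a hypothetical helper and only a heuristic.)
is_fqdn() {
    case "$1" in
        *.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage:
# is_fqdn "$(hostname --fqdn)" || echo "host name is not fully qualified"
```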

It may also appear as:

error: globus_ftp_control: gss_init_sec_context failed
GSS Major Status: Unexpected Gatekeeper or Service Name
globus_gsi_gssapi: Authorization denied: The name of the remote host (ubuntu-desktop), and the expected name for the remote host (ubuntu-desktop.domainname) do not match. This happens when the name in the host certificate does not match the information obtained from DNS and is often a DNS configuration problem.

* Java error: configure indicates that java and ant are not available on your system:

checking for javac... no
configure: WARNING: A Java compiler is needed for some parts of the toolkit
configure: WARNING: This message can be ignored if you are only building the C parts of the toolkit
checking for ant... no
configure: WARNING: ant is needed for some parts of the toolkit
configure: WARNING: If you know you will not need one, this message can be ignored

* Writing permissions: You probably ran make without passing configure a prefix directory where you have write permission:

make: /usr/local/globus-4.2.0/sbin/gpt-build: Command not found
make: *** [globus_core-thr] Error 127

* libssl-dev missing: The development package of SSL (libssl-dev) is not installed on your system:

configure: error: Unable to compile with SSL

ERROR: Build has failed
make: *** [globus_system_openssl-thr] Error 1

* JAVA_HOME missing: configure indicates that your Java installation didn't set the environment variables:

configure: WARNING: JAVA_HOME is not set
configure: WARNING: Most Java versions will not work correctly without JAVA_HOME set

or the JAVA_HOME path is not properly set (it is usually something like /usr/lib/jvm/java-6-sun):

drmaa/org/ggf/drmaa/DrmaaJNI.c:1:17: error: jni.h: No such file or directory
In file included from drmaa/org/ggf/drmaa/DrmaaJNI.c:2:
drmaa/org/ggf/drmaa/DrmaaJNI.h:15: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:23: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:31: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:39: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:47: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.h:55: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:63: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:71: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:79: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:87: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jint'
drmaa/org/ggf/drmaa/DrmaaJNI.h:95: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.h:103: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:111: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.h:119: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:21: error: expected ')' before '*' token
drmaa/org/ggf/drmaa/DrmaaJNI.c:23: error: expected ')' before '*' token
drmaa/org/ggf/drmaa/DrmaaJNI.c:25: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:45: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:58: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:100: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:126: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:157: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:208: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:223: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:254: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:351: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jint'
drmaa/org/ggf/drmaa/DrmaaJNI.c:369: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:384: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:403: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:417: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:430: error: expected ')' before '*' token
drmaa/org/ggf/drmaa/DrmaaJNI.c:648: error: expected ')' before '*' token
make[2]: *** [drmaa/org/ggf/drmaa/DrmaaJNI.lo] Error 1
make[2]: Leaving directory `/home/alorca/work/globus/gt4.2.1-all-source-installer/source-trees/gridway/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/alorca/work/globus/gt4.2.1-all-source-installer/source-trees/gridway'

ERROR: Build has failed
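
The missing jni.h in the log above is shipped in the include directory of a JDK, so its presence is a reasonable test of JAVA_HOME before starting the long build. A minimal sketch, with has_jni as a hypothetical helper name:

```shell
# Succeed if the directory $1 looks like a JDK, i.e. contains include/jni.h.
# (has_jni is a hypothetical helper.)
has_jni() {
    [ -n "$1" ] && [ -f "$1/include/jni.h" ]
}

if has_jni "${JAVA_HOME:-}"; then
    echo "JAVA_HOME looks usable: $JAVA_HOME"
else
    echo "no include/jni.h under JAVA_HOME='${JAVA_HOME:-}'"
fi
```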