Disclaimer:
Setting up a working infrastructure is probably one of the most difficult tasks for beginners who have heard about the benefits of grid computing. This document provides quick instructions to install and get ready a trivial example of a grid. You will be able to monitor and submit jobs to a single cluster with one backend node, all within the same machine. Just follow the commands below, adapting them to your system's naming conventions where necessary.
Prerequisites
Make sure you have the Java Runtime Environment and the corresponding development kit installed (I recommend the Sun package if you agree to its license). Some other non-default applications will also be needed.
sudo apt-get install libssl-dev xinetd ganglia-monitor sun-java6-jre sun-java6-jdk
sudo apt-get install ant
GridWay users
To ease the configuration later on, we will add the GridWay users to the gwusers group.
sudo addgroup gwusers
sudo usermod -a -G gwusers `whoami`
Now, create the configuration profile /etc/profile.d/globus.sh with the following content
for i in `groups`; do
    if [ "$i" = 'gwusers' ]; then
        GLOBUS_HOME=/opt/globus
        GLOBUS_LOCATION=/opt/globus/4.2.1
        GW_LOCATION=/opt/globus/4.2.1
        JAVA_HOME=/usr/lib/jvm/java-6-sun
        ANT_HOME=/usr/share/ant
        PATH=$GLOBUS_LOCATION/bin:${PATH}
        LD_LIBRARY_PATH=$GW_LOCATION/lib:${LD_LIBRARY_PATH}
        export PATH LD_LIBRARY_PATH GLOBUS_HOME GLOBUS_LOCATION GW_LOCATION JAVA_HOME ANT_HOME
    fi
done
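and load it into the shell (the new gwusers membership only appears in the output of groups after you log in again, so start a new login session first)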
. /etc/profile.d/globus.sh
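To confirm that the profile really took effect, a small check like the one below can help. It is only a sketch: check_globus_env is a name made up for this example, and it just tests the five location variables that globus.sh exports.

```shell
# Report which of the variables exported by /etc/profile.d/globus.sh are unset.
# check_globus_env is a hypothetical helper, not part of Globus or GridWay.
check_globus_env() {
    missing=0
    for name in GLOBUS_HOME GLOBUS_LOCATION GW_LOCATION JAVA_HOME ANT_HOME; do
        # Indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}

check_globus_env || echo "log in again, then source the profile once more"
```

If any variable is reported as unset, the most likely cause is that the current session does not list gwusers among its groups yet.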
Administrative user
Also, as the Globus Toolkit suggests, we are going to create a globus user who will later own and initialize the services.
sudo adduser --system --home $GLOBUS_HOME --ingroup gwusers --shell /bin/bash globus
Downloading
Get the latest stable source code of the Globus Toolkit 4.2.1 (137MB) and, after a free registration, save it in the current directory.
Let's begin with the real installation of the package, with the following commands
sudo mv gt4.2.1-all-source-installer.tar.gz $GLOBUS_HOME
sudo chown globus:gwusers $GLOBUS_HOME/gt4.2.1-all-source-installer.tar.gz
sudo su - globus
gunzip -c gt4.2.1-all-source-installer.tar.gz | tar xvf -
cd gt4.2.1-all-source-installer
./configure --prefix=$GLOBUS_LOCATION
make
and prepare yourself a long coffee, since the complete Globus Toolkit takes hours to build. To finish this part, don't forget to
make install
exit
Certificate Authority
We have to create a Certificate Authority (CA) that signs the certificates for hosts and users. The following steps set up a CA with the simplest method
sudo su - globus
cd $GLOBUS_LOCATION
source $GLOBUS_LOCATION/etc/globus-user-env.sh
$GLOBUS_LOCATION/setup/globus/setup-simple-ca
and follow the instructions. You will have to create a PEM pass phrase, used later to sign all the certificates. The last part of the output advises you to finish the setup by running the setup-gsi script as root.
exit
sudo -E $GLOBUS_LOCATION/setup/globus_simple_ca_*_setup/setup-gsi
where the * wildcard stands for the CA hash.
Host Certificate
Now that we trust the CA, let's ask for a certificate for our frontend host.
sudo -E $GLOBUS_LOCATION/bin/grid-cert-request -host `hostname --fqdn`
After the certificate has been created for the machine, we have to sign it with our brand new authority
sudo su - globus
grid-ca-sign -in /etc/grid-security/hostcert_request.pem -out $GLOBUS_HOME/hostcert.pem
exit
At this point, both the host key and certificate should belong to root, while the container uses its own copies (containercert.pem and containerkey.pem) owned by globus.
sudo mv $GLOBUS_HOME/hostcert.pem /etc/grid-security/
sudo chown root:root /etc/grid-security/hostcert.pem
sudo bash -c "cd /etc/grid-security; cp hostcert.pem containercert.pem; cp hostkey.pem containerkey.pem; chown globus:gwusers containercert.pem containerkey.pem"
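A quick way to review the result of the steps above is to list owner and mode of the four files. The helper below is only a sketch: show_perms is a made-up name, and it relies on GNU stat (the -c option), which is what Ubuntu ships.

```shell
# Print "owner:group mode path" for each file, or flag it as missing.
# show_perms is a hypothetical helper; stat -c is the GNU coreutils syntax.
show_perms() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            stat -c '%U:%G %a %n' "$f"
        else
            echo "missing: $f"
        fi
    done
}

show_perms /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem \
           /etc/grid-security/containercert.pem /etc/grid-security/containerkey.pem
```

With the commands of this section, the host pair should show root:root and the container pair globus:gwusers. As far as I know, GSI also wants the key files readable by their owner only, so a key mode wider than 600 is worth a second look.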
User certificate
A similar certificate has to be created for the user who plans to submit jobs to the grid. You will be asked for a user PEM pass phrase, which you will need to “sign in” each time before using the grid.
source $GLOBUS_LOCATION/etc/globus-user-env.sh
grid-cert-request
The request must then be signed by the globus user:
sudo -E -H -u globus bash -c "$GLOBUS_LOCATION/bin/grid-ca-sign -in $HOME/.globus/usercert_request.pem -out $GLOBUS_HOME/usercert.pem"
Then place the user certificate in the user's .globus directory with the correct ownership:
sudo mv $GLOBUS_HOME/usercert.pem $HOME/.globus/usercert.pem
sudo chown `id -u`:`id -g` $HOME/.globus/usercert.pem
Gridmap-file
The gridmap-file maps each certificate subject (DN) to a local user name
sudo -E bash -c "$GLOBUS_LOCATION/sbin/grid-mapfile-add-entry -dn \"`grid-cert-info -subject`\" -ln `whoami`"
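Each entry of the grid-mapfile is a quoted subject followed by the local account. To see which account a given subject is mapped to, something like the following sketch may be handy. mapfile_user is a hypothetical helper name, and the subject in the comment is an invented example, not one your CA will produce.

```shell
# Print the local account mapped to a certificate subject (DN) in a grid-mapfile.
# Entry format:  "<subject DN>" <local user>
# e.g.           "/O=Grid/OU=Example/CN=Jane Doe" jdoe     (made-up entry)
# mapfile_user is a hypothetical helper name.
mapfile_user() {
    dn=$1
    mapfile=${2:-/etc/grid-security/grid-mapfile}
    # Split on double quotes: field 2 is the DN, field 3 the account name
    awk -F'"' -v dn="$dn" '
        $2 == dn { gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3 }
    ' "$mapfile"
}

# Intended use, once the environment of this guide is loaded:
# mapfile_user "$(grid-cert-info -subject)"
```

For the user of this guide, the answer should simply be the output of whoami.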
Starting the webservices container
The init script should be copied:
sudo cp $GLOBUS_LOCATION/etc/init.d/globus-ws-java-container /etc/init.d
Setting up GRAM4
Modify /etc/sudoers to allow globus to run jobs as any other user
sudo visudo
by adding the following 3 lines
Runas_Alias GLOBUSUSERS = ALL, !root;
globus ALL=(GLOBUSUSERS) NOPASSWD: /opt/globus/4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/4.2.1/libexec/globus-job-manager-script.pl *
globus ALL=(GLOBUSUSERS) NOPASSWD: /opt/globus/4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/4.2.1/libexec/globus-gram-local-proxy-tool *
Setting up GridFTP
Create the file
sudo gedit /etc/xinetd.d/gridftp
with the following content
service gsiftp
{
    instances       = 100
    socket_type     = stream
    wait            = no
    user            = root
    env             += GLOBUS_LOCATION=/opt/globus/4.2.1
    env             += LD_LIBRARY_PATH=/opt/globus/4.2.1/lib
    server          = /opt/globus/4.2.1/sbin/globus-gridftp-server
    server_args     = -i
    log_on_success  += DURATION
    disable         = no
}
and add to the services
sudo gedit /etc/services
the appropriate line with the gsiftp port
gsiftp 2811/tcp
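Before reloading xinetd it does no harm to confirm that the entry is really there and not commented out. The check below is a sketch under that assumption only; has_service is a hypothetical helper name.

```shell
# Succeed if a services file maps the given name to the given port/protocol.
# has_service is a hypothetical helper; the file defaults to /etc/services.
has_service() {
    name=$1
    port=$2
    file=${3:-/etc/services}
    # Strip comments, then compare the first two whitespace-separated fields
    awk -v n="$name" -v p="$port" '
        { sub(/#.*/, "") }
        $1 == n && $2 == p { found = 1 }
        END { exit !found }
    ' "$file"
}

has_service gsiftp 2811/tcp && echo "gsiftp is registered" || echo "gsiftp entry missing"
```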
Starting services
Finally, just reload the xinetd services and start the brand new Globus container
sudo /etc/init.d/xinetd reload
sudo -u globus /etc/init.d/globus-ws-java-container start
Allowing the MADs to run
Modify /etc/sudoers to allow globus to run jobs as any other user within the gwusers group
sudo visudo
by adding the following 4 lines
Runas_Alias GWUSERS = %gwusers
Defaults>GWUSERS env_keep="GW_LOCATION GLOBUS_LOCATION"
globus ALL=(GWUSERS) NOPASSWD: /opt/globus/4.2.1/bin/gw_em_mad_ws *
globus ALL=(GWUSERS) NOPASSWD: /opt/globus/4.2.1/bin/gw_tm_mad_ftp *
Allowing Index Services
Replace line 25 of $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml with the output of
printf "<upstream>https://`hostname --fqdn`:8443/wsrf/services/DefaultIndexService</upstream>\n"
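If you prefer not to edit the file by hand, the replacement can be scripted. The following is a minimal sketch, assuming that line 25 of your copy really is the upstream entry (check it first, for instance with sed -n 25p). replace_line is a hypothetical helper name, and it keeps a backup with a ~ suffix.

```shell
# Replace one line of a file with new text, keeping a backup copy named <file>~.
# replace_line is a hypothetical helper name.
replace_line() {
    file=$1
    lineno=$2
    text=$3
    cp "$file" "$file~"
    # Copy every line unchanged except line $lineno, which becomes $text
    awk -v n="$lineno" -v t="$text" 'NR == n { print t; next } { print }' "$file~" > "$file"
}

# Intended use, run as a user allowed to write the file:
# replace_line "$GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml" 25 \
#     "<upstream>https://$(hostname --fqdn):8443/wsrf/services/DefaultIndexService</upstream>"
```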
Web Services
Modify the GridWay configuration file by uncommenting the MADs for Web Services (lines 107-109)
sudo -u globus bash -c "mv $GLOBUS_LOCATION/etc/gridway/gwd.conf $GLOBUS_LOCATION/etc/gridway/gwd.conf~; \
    cat $GLOBUS_LOCATION/etc/gridway/gwd.conf~ | \
    sed s/'#IM_MAD = mds4:gw_im_mad_mds4_thr:-s cygnus.dacya.ucm.es:gridftp:ws'/\"IM_MAD = mds4:gw_im_mad_mds4_thr:-s `hostname --fqdn`:gridftp:ws\"/g | \
    sed s/'#EM_MAD = ws:gw_em_mad_ws::rsl2'/'EM_MAD = ws:gw_em_mad_ws::rsl2'/g | \
    sed s/'#TM_MAD = gridftp:gw_tm_mad_ftp:'/'TM_MAD = gridftp:gw_tm_mad_ftp:'/g \
    > $GLOBUS_LOCATION/etc/gridway/gwd.conf"
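Since the sed patterns only match if the commented lines look exactly as expected, it is worth checking the result. A minimal sketch, with active_mads as a hypothetical helper name:

```shell
# List the MAD lines that are active (not commented out) in a gwd.conf file.
# active_mads is a hypothetical helper name.
active_mads() {
    grep -E '^[[:space:]]*(IM|EM|TM)_MAD[[:space:]]*=' "$1"
}

# Intended use:
# active_mads "$GLOBUS_LOCATION/etc/gridway/gwd.conf"
```

The output should include the three lines for mds4, ws and gridftp, with your own host name in the IM_MAD entry. If one of them is missing, compare gwd.conf~ with the patterns used above.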
Enabling Ganglia
sudo su - globus
mds-gluerp-configure none ganglia $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/ganglia-config.xml
mds-gluerp-configure fork ganglia $GLOBUS_LOCATION/etc/globus_wsrf_gram_Fork/gluerp-config.xml
exit
Write permissions
sudo chmod g+w $GLOBUS_LOCATION/var/gridway/
The following commands have to be run every time you boot the system or after manually stopping the services
sudo /etc/init.d/ganglia-monitor start
sudo -u globus /etc/init.d/globus-ws-java-container start
sudo su - globus -c "$GLOBUS_LOCATION/bin/gwd -m"
and enjoy!!
The first check of your grid will be to monitor the front nodes you have access to:
gwhost -c 1
and the expected output, after a few seconds of information refresh, should be something like this
HID PRIO OS              ARCH  MHZ %CPU MEM(F/T) DISK(F/T)   N(U/F/T) LRMS HOSTNAME
0   1    Linux2.6.27-9-  x86_6 800 133  90/1974  66907/81652 0/2/2    Fork myhost.mydomain
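The N(U/F/T) column holds the used, free and total slots of each host. To pick out the hosts that still have free slots, the output can be filtered with awk. This is just a sketch: free_hosts is a made-up name, and it assumes that every one of the eleven columns is a single token without spaces, as in the sample above.

```shell
# Print "hostname lrms free/total" for every host with at least one free slot.
# Reads gwhost-style output on stdin; column 9 is N(U/F/T).
# free_hosts is a hypothetical helper name.
free_hosts() {
    awk 'NR > 1 {
        split($9, n, "/")              # n[1]=used, n[2]=free, n[3]=total
        if (n[2] + 0 > 0) print $11, $10, n[2] "/" n[3]
    }'
}

# Intended use:
# gwhost | free_hosts
```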
* Unknown host error: If you get something like this
[gwadmin@myhost bin]$ ./gwhost
gethostbyname() : Unknown host
FAILED: failed connection to gwd
it is probably because your GridWay instance is running on a host without a Fully Qualified Domain Name (FQDN), so hostname -f also fails. Try editing your /etc/hosts file.
127.0.0.1 localhost.localdomain localhost hostname
It may also appear as:
error: globus_ftp_control: gss_init_sec_context failed
GSS Major Status: Unexpected Gatekeeper or Service Name
globus_gsi_gssapi: Authorization denied: The name of the remote host (ubuntu-desktop), and the expected name for the remote host (ubuntu-desktop.domainname) do not match. This happens when the name in the host certificate does not match the information obtained from DNS and is often a DNS configuration problem.
* Java error: configure indicates that java and ant are not available in your system:
checking for javac... no
configure: WARNING: A Java compiler is needed for some parts of the toolkit
configure: WARNING: This message can be ignored if you are only building the C parts of the toolkit
checking for ant... no
configure: WARNING: ant is needed for some parts of the toolkit
configure: WARNING: If you know you will not need one, this message can be ignored
* Write permissions: You probably ran make without passing to configure a prefix directory where you have write permissions:
make: /usr/local/globus-4.2.0/sbin/gpt-build: Command not found
make: *** [globus_core-thr] Error 127
* libssl-dev missing: The developer kit of ssl (libssl-dev) is not available in your system:
configure: error: Unable to compile with SSL
ERROR: Build has failed
make: *** [globus_system_openssl-thr] Error 1
* JAVA_HOME missing: configure indicates that your java installation didn't set environment variables:
configure: WARNING: JAVA_HOME is not set
configure: WARNING: Most Java versions will not work correctly without JAVA_HOME set
or the JAVA_HOME path is not properly set (it is usually something like /usr/lib/jvm/java-6-sun):
drmaa/org/ggf/drmaa/DrmaaJNI.c:1:17: error: jni.h: No such file or directory
In file included from drmaa/org/ggf/drmaa/DrmaaJNI.c:2:
drmaa/org/ggf/drmaa/DrmaaJNI.h:15: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:23: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:31: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:39: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:47: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.h:55: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:63: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:71: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.h:79: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:87: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jint'
drmaa/org/ggf/drmaa/DrmaaJNI.h:95: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.h:103: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.h:111: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.h:119: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:21: error: expected ')' before '*' token
drmaa/org/ggf/drmaa/DrmaaJNI.c:23: error: expected ')' before '*' token
drmaa/org/ggf/drmaa/DrmaaJNI.c:25: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:45: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:58: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:100: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:126: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:157: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:208: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:223: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
drmaa/org/ggf/drmaa/DrmaaJNI.c:254: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:351: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jint'
drmaa/org/ggf/drmaa/DrmaaJNI.c:369: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:384: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
drmaa/org/ggf/drmaa/DrmaaJNI.c:403: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:417: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
drmaa/org/ggf/drmaa/DrmaaJNI.c:430: error: expected ')' before '*' token
drmaa/org/ggf/drmaa/DrmaaJNI.c:648: error: expected ')' before '*' token
make[2]: *** [drmaa/org/ggf/drmaa/DrmaaJNI.lo] Error 1
make[2]: Leaving directory `/home/alorca/work/globus/gt4.2.1-all-source-installer/source-trees/gridway/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/alorca/work/globus/gt4.2.1-all-source-installer/source-trees/gridway'
ERROR: Build has failed
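Several of the failures listed above can be caught before spending hours on the build. The checks below are a rough sketch, not an official pre-flight tool: all three function names are made up for this example, and the FQDN test only looks for a dot in the name, so localhost.localdomain would pass it as well.

```shell
# Pre-build checks for some of the failures listed above.
# All function names are hypothetical helpers written for this example.

# Print each command from the argument list that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed if the directory looks like a JDK, i.e. it ships the JNI header.
has_jni_header() {
    [ -n "$1" ] && [ -f "$1/include/jni.h" ]
}

# Succeed if the name contains a dot, as a fully qualified name does.
is_fqdn() {
    case $1 in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

for tool in $(missing_tools javac ant); do
    echo "not found: $tool"
done
has_jni_header "${JAVA_HOME:-}" || echo "JAVA_HOME does not point at a JDK with include/jni.h"
is_fqdn "$(hostname -f 2>/dev/null)" || echo "hostname -f does not return a fully qualified name"
```

No output means that javac and ant are on the PATH, JAVA_HOME contains include/jni.h, and the host name at least looks fully qualified.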