====== Grid from scratch ======

Disclaimer: The following instructions are suggestions, written to be useful for a user in the circumstances described below who follows them wisely. Please don't execute them if you are unsure what they do or how they will affect your system. No liability is accepted for damage, for locking yourself out, or for lack of success.

===== Building a Grid within an Ubuntu machine and the Globus/GridWay Toolkit =====

For beginners who have heard about the benefits of grid computing, one of the most difficult tasks is probably setting up a working infrastructure. This document provides lightning instructions to install and prepare a trivial example of a grid. You will be able to monitor and submit jobs to a single cluster with one backend node, all within the same machine. Just follow the commands below, adapting them to the naming conventions of your system where needed.

==== Installation ====

**Prerequisites**

Make sure you have the Java runtime environment and the corresponding development kit installed (I recommend the Sun package if you agree to its license). Some other non-default applications will also be needed.

  sudo apt-get install libssl-dev xinetd ganglia-monitor sun-java6-jre sun-java6-jdk
  sudo apt-get install ant

**GridWay users**

To ease the configuration later on, we will add the GridWay users to the ''gwusers'' group.

  sudo addgroup gwusers
  sudo usermod -a -G gwusers `whoami`

Now, create the configuration profile ''/etc/profile.d/globus.sh'' with the following content

  for i in `groups`; do
    if [ "$i" = 'gwusers' ]; then
      GLOBUS_HOME=/opt/globus
      GLOBUS_LOCATION=/opt/globus/4.2.1
      GW_LOCATION=/opt/globus/4.2.1
      JAVA_HOME=/usr/lib/jvm/java-6-sun
      ANT_HOME=/usr/share/ant
      PATH=$GLOBUS_LOCATION/bin:${PATH}
      LD_LIBRARY_PATH=$GW_LOCATION/lib:${LD_LIBRARY_PATH}
      export PATH LD_LIBRARY_PATH GLOBUS_HOME GLOBUS_LOCATION GW_LOCATION JAVA_HOME ANT_HOME
    fi
  done

and load it into the shell. Note that a new group membership is only picked up by new login sessions, so log out and back in (or run ''newgrp gwusers'') first; otherwise ''groups'' will not list ''gwusers'' and the profile sets nothing.

  . /etc/profile.d/globus.sh

**Administrative user**

As the Globus Toolkit suggests, we are also going to create a **globus** user who will later own and initialize the services:

  sudo adduser --system --home $GLOBUS_HOME --ingroup gwusers --shell /bin/bash globus

**Downloading**

Get the latest stable source code of the [[http://www-unix.globus.org/toolkit/survey/index.php?download=gt4.2.1-all-source-installer.tar.gz| Globus Toolkit 4.2.1]] (137MB) and, after a free registration, save it in the current directory. Let's begin the real installation of the package with the following commands

  sudo mv gt4.2.1-all-source-installer.tar.gz $GLOBUS_HOME
  sudo chown globus:gwusers $GLOBUS_HOME/gt4.2.1-all-source-installer.tar.gz
  sudo su - globus
  gunzip -c gt4.2.1-all-source-installer.tar.gz | tar xvf -
  cd gt4.2.1-all-source-installer
  ./configure --prefix=$GLOBUS_LOCATION
  make

and prepare yourself a long coffee, since the complete Globus Toolkit takes hours to build. To finish this part, don't forget to

  make install
  exit

==== Setup of services ====

**Certificate Authority**

We have to create a Certificate Authority (CA) that signs the certificates for hosts and users. The following steps set a CA up with the simplest method

  sudo su - globus
  cd $GLOBUS_LOCATION
  source $GLOBUS_LOCATION/etc/globus-user-env.sh
  $GLOBUS_LOCATION/setup/globus/setup-simple-ca

and follow the instructions. You will have to choose a PEM pass phrase, which is used later to sign all the certificates. The last part of the output advises you to finish the setup by running the ''setup-gsi'' script as root.

  exit
  sudo -E $GLOBUS_LOCATION/setup/globus_simple_ca_*_setup/setup-gsi

where the ''*'' wildcard matches the CA hash.

**Host Certificate**

Now that we trust the CA, let's ask for a certificate for our frontend host:

  sudo -E $GLOBUS_LOCATION/bin/grid-cert-request -host `hostname --fqdn`

After the certificate request has been created for the machine, we have to sign it with our brand new authority

  sudo su - globus
  grid-ca-sign -in /etc/grid-security/hostcert_request.pem -out $GLOBUS_HOME/hostcert.pem
  exit

At this point, both key and certificate should belong to root and be accessible by gridftp, while the container uses its own copies (the container equivalents), owned by ''globus''.

  sudo mv $GLOBUS_HOME/hostcert.pem /etc/grid-security/
  sudo chown root:root /etc/grid-security/hostcert.pem
  sudo bash -c "cd /etc/grid-security; cp hostcert.pem containercert.pem; cp hostkey.pem containerkey.pem; chown globus:gwusers containercert.pem containerkey.pem"

**User certificate**

A similar certificate has to be created for the user who plans to submit jobs to the grid. You will be asked for a user PEM pass phrase, which you need to "sign in" each time before using the grid.

  source $GLOBUS_LOCATION/etc/globus-user-env.sh
  grid-cert-request

The request then has to be signed by the ''globus'' user:

  sudo -E -H -u globus bash -c "$GLOBUS_LOCATION/bin/grid-ca-sign -in $HOME/.globus/usercert_request.pem -out $GLOBUS_HOME/usercert.pem"

Finally, place the user certificate in the user's ''.globus'' directory with the correct ownership:

  sudo mv $GLOBUS_HOME/usercert.pem $HOME/.globus/usercert.pem
  sudo chown `id -u`:`id -g` $HOME/.globus/usercert.pem

**Gridmap-file**

The gridmap-file maps certificate subjects to local user names. Add an entry for the current user:

  sudo -E bash -c "$GLOBUS_LOCATION/sbin/grid-mapfile-add-entry -dn \"`grid-cert-info -subject`\" -ln `whoami`"

**Starting the webservices container**

The init script should be copied:

  sudo cp $GLOBUS_LOCATION/etc/init.d/globus-ws-java-container /etc/init.d

**Setting up GRAM4**

Modify ''/etc/sudoers'' to allow ''globus'' to run jobs as any other user

  sudo visudo

by adding the following 3 lines

  Runas_Alias GLOBUSUSERS = ALL, !root;
  globus ALL=(GLOBUSUSERS) NOPASSWD: /opt/globus/4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/4.2.1/libexec/globus-job-manager-script.pl *
  globus ALL=(GLOBUSUSERS) NOPASSWD: /opt/globus/4.2.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/4.2.1/libexec/globus-gram-local-proxy-tool *

**Setting up Gridftp**

Create the file

  sudo gedit /etc/xinetd.d/gridftp

with the following content

  service gsiftp
  {
    instances = 100
    socket_type = stream
    wait = no
    user = root
    env += GLOBUS_LOCATION=/opt/globus/4.2.1
    env += LD_LIBRARY_PATH=/opt/globus/4.2.1/lib
    server = /opt/globus/4.2.1/sbin/globus-gridftp-server
    server_args = -i
    log_on_success += DURATION
    disable = no
  }

and add to the services file

  sudo gedit /etc/services

the appropriate line with the gsiftp port

  gsiftp 2811/tcp

**Starting services**

Finally, just reload the xinetd services and start the brand new globus container

  sudo /etc/init.d/xinetd reload
  sudo -u globus /etc/init.d/globus-ws-java-container start

==== Configuring GridWay ====

**Allowing to run the MADs**

Modify ''/etc/sudoers'' to allow ''globus'' to run jobs as any other user within the ''gwusers'' group

  sudo visudo

by adding the following 4 lines

  Runas_Alias GWUSERS = %gwusers
  Defaults>GWUSERS env_keep="GW_LOCATION GLOBUS_LOCATION"
  globus ALL=(GWUSERS) NOPASSWD: /opt/globus/4.2.1/bin/gw_em_mad_ws *
  globus ALL=(GWUSERS) NOPASSWD: /opt/globus/4.2.1/bin/gw_tm_mad_ftp *

**Allowing Index Services**

Replace line #25 of ''$GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml'' with the result of executing

  printf "https://`hostname --fqdn`:8443/wsrf/services/DefaultIndexService\n"

**Web Services**

Modify the GridWay configuration file by uncommenting the MADs for Web Services (lines 107-109). The following command keeps a backup in ''gwd.conf~'' and also replaces the example host name with your own:

  sudo -u globus bash -c "mv $GLOBUS_LOCATION/etc/gridway/gwd.conf $GLOBUS_LOCATION/etc/gridway/gwd.conf~; cat $GLOBUS_LOCATION/etc/gridway/gwd.conf~ | sed s/'#IM_MAD = mds4:gw_im_mad_mds4_thr:-s cygnus.dacya.ucm.es:gridftp:ws'/\"IM_MAD = mds4:gw_im_mad_mds4_thr:-s `hostname --fqdn`:gridftp:ws\"/g| sed s/'#EM_MAD = ws:gw_em_mad_ws::rsl2'/'EM_MAD = ws:gw_em_mad_ws::rsl2'/g| sed s/'#TM_MAD = gridftp:gw_tm_mad_ftp:'/'TM_MAD = gridftp:gw_tm_mad_ftp:'/g > $GLOBUS_LOCATION/etc/gridway/gwd.conf"

**Enabling Ganglia**

  sudo su - globus
  mds-gluerp-configure none ganglia $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/ganglia-config.xml
  mds-gluerp-configure fork ganglia $GLOBUS_LOCATION/etc/globus_wsrf_gram_Fork/gluerp-config.xml
  exit

**Write permissions**

  sudo chmod g+w $GLOBUS_LOCATION/var/gridway/

=== Launch the grid ===

This operation has to be done **every time you boot** the system, or after manually stopping the services

  sudo /etc/init.d/ganglia-monitor start
  sudo -u globus /etc/init.d/globus-ws-java-container start
  sudo su - globus -c "$GLOBUS_LOCATION/bin/gwd -m"

and enjoy!!

=== Test ===

The first check of your grid is to monitor the front nodes you have access to:

  gwhost -c 1

After a few seconds of information refresh, the expected output should be something like this

  HID PRIO OS              ARCH  MHZ %CPU MEM(F/T)  DISK(F/T)    N(U/F/T) LRMS HOSTNAME
  0   1    Linux2.6.27-9-  x86_6 800 133  90/1974   66907/81652  0/2/2    Fork myhost.mydomain

===== Common fatal warnings & error messages =====

  * **Unknown host** error: If you get something like this

    [gwadmin@myhost bin]$ ./gwhost
    gethostbyname() : Unknown host
    FAILED: failed connection to gwd

it is probably because your GridWay instance is running on a host without a Fully Qualified Domain Name (FQDN), so ''hostname -f'' fails as well. Try editing your ''/etc/hosts'' file.

    127.0.0.1 localhost.localdomain localhost hostname

It may also appear as:

    error: globus_ftp_control: gss_init_sec_context failed
    GSS Major Status: Unexpected Gatekeeper or Service Name
    globus_gsi_gssapi: Authorization denied: The name of the remote host (ubuntu-desktop), and the expected name for the remote host (ubuntu-desktop.domainname) do not match.

This happens when the name in the host certificate does not match the information obtained from DNS and is often a DNS configuration problem.

  * **Java error**: ''configure'' indicates that java and ant are **not** available in your system:

    checking for javac... no
    configure: WARNING: A Java compiler is needed for some parts of the toolkit
    configure: WARNING: This message can be ignored if you are only building the C parts of the toolkit
    checking for ant... no
    configure: WARNING: ant is needed for some parts of the toolkit
    configure: WARNING: If you know you will not need one, this message can be ignored

  * **Write permissions**: You probably ran ''make'' without giving ''configure'' a prefix directory in which you have write permission:

    make: /usr/local/globus-4.2.0/sbin/gpt-build: Command not found
    make: *** [globus_core-thr] Error 127

  * **libssl-dev missing**: The SSL development package (''libssl-dev'') is not available in your system:

    configure: error: Unable to compile with SSL
    ERROR: Build has failed
    make: *** [globus_system_openssl-thr] Error 1

  * **JAVA_HOME missing**: ''configure'' indicates that your java installation didn't set the environment variables:

    configure: WARNING: JAVA_HOME is not set
    configure: WARNING: Most Java versions will not work correctly without JAVA_HOME set

or the ''JAVA_HOME'' path is not properly set (it is usually something like ''/usr/lib/jvm/java-6-sun''):

    drmaa/org/ggf/drmaa/DrmaaJNI.c:1:17: error: jni.h: No such file or directory
    In file included from drmaa/org/ggf/drmaa/DrmaaJNI.c:2:
    drmaa/org/ggf/drmaa/DrmaaJNI.h:15: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:23: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:31: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:39: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:47: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:55: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:63: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:71: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:79: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:87: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jint'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:95: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:103: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:111: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.h:119: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:21: error: expected ')' before '*' token
    drmaa/org/ggf/drmaa/DrmaaJNI.c:23: error: expected ')' before '*' token
    drmaa/org/ggf/drmaa/DrmaaJNI.c:25: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:45: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:58: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:100: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:126: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:157: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:208: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:223: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'void'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:254: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:351: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jint'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:369: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:384: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jobject'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:403: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:417: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'jstring'
    drmaa/org/ggf/drmaa/DrmaaJNI.c:430: error: expected ')' before '*' token
    drmaa/org/ggf/drmaa/DrmaaJNI.c:648: error: expected ')' before '*' token
    make[2]: *** [drmaa/org/ggf/drmaa/DrmaaJNI.lo] Error 1
    make[2]: Leaving directory `/home/alorca/work/globus/gt4.2.1-all-source-installer/source-trees/gridway/src'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/home/alorca/work/globus/gt4.2.1-all-source-installer/source-trees/gridway'
    ERROR: Build has failed
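
Several of the failures listed in this section (no FQDN, missing ''javac'' or ''ant'', ''JAVA_HOME'' unset or without ''jni.h'') can be detected before starting the long build. The following is only a minimal sketch of such a pre-flight check, not part of the Globus or GridWay tooling: the function names ''is_fqdn'', ''jni_header_present'' and ''preflight'' are made up for illustration, and it assumes a POSIX shell in which ''hostname -f'' is available.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these names come from Globus or GridWay.

# Succeeds if the given name contains a dot, i.e. looks like host.domain.
is_fqdn() {
    case "$1" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Succeeds if include/jni.h exists under the given JDK directory.
jni_header_present() {
    [ -n "$1" ] && [ -f "$1/include/jni.h" ]
}

# Prints one line per check and returns non-zero if any check fails.
preflight() {
    status=0
    name=$(hostname -f 2>/dev/null)
    if is_fqdn "$name"; then
        echo "ok:   host name '$name' is fully qualified"
    else
        echo "FAIL: host name '$name' is not fully qualified, check /etc/hosts"
        status=1
    fi
    for tool in javac ant; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:   $tool found"
        else
            echo "FAIL: $tool not found in PATH"
            status=1
        fi
    done
    if jni_header_present "$JAVA_HOME"; then
        echo "ok:   jni.h found under JAVA_HOME"
    else
        echo "FAIL: JAVA_HOME is unset or has no include/jni.h"
        status=1
    fi
    return $status
}
```

After loading these functions into the shell, running ''preflight'' before ''./configure'' prints one line per check. The dot test in ''is_fqdn'' is deliberately crude: it only mirrors the ''ubuntu-desktop'' versus ''ubuntu-desktop.domainname'' mismatch shown above and does not query DNS.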