We have been asked by the GLUE schema working group to propose changes to the GLUE schema for version 2.0. However, it appears that we are not using GLUE 1.3 to its full potential. This page details some of the features already available in GLUE 1.3 that could be of use. See http://glueschema.forge.cnaf.infn.it/Spec/V13

Existing features

ComputingElement

This is the structure that describes a particular queue. As such it can be used to differentiate groups of nodes with different capabilities.

Capability

This is a list of strings advertised for a particular CE. We could use this to advertise MPI support at the queue level, rather than at the cluster level (which is effectively the whole site). For example, I manually edited /opt/glite/etc/gip/ldif/static-file-CE.ldif to add the following entries to a particular queue:

GlueCECapability: MPI
GlueCECapability: MPICH

I was then able to distinguish between two queues at the same site using this expression in my JDL:

Requirements = Member("MPI",other.GlueCECapability);

Previously we were unable to distinguish between these queues as the only thing we could match on was the "MPICH" GlueHostApplicationSoftwareRunTimeEnvironment tag advertised for the whole site. (This actually has a real use case at our site already: we advertise both Condor and PBS queues through our CE and only want MPI jobs on PBS.)

MaxSlotsPerJob

According to the spec (p. 10) this is "The maximum number of slots which could be allocated to a single job (defined to be 1 for a site accepting only standard jobs)." Sites allowing multi-processor jobs should set this to a value greater than 1, and it could be used to indicate the maximum number of processes (slots) that a single MPI job may use at a site.
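
As a rough illustration, a JDL requirement along the following lines could select queues that advertise MPI and allow at least 8 slots per job. This is a sketch rather than a tested recipe: it assumes the value is published under the LDAP attribute name GlueCEPolicyMaxSlotsPerJob and that the site actually fills it in. The threshold of 8 is arbitrary.

```
// Sketch: match only queues advertising MPI and allowing at least 8 slots per job.
// Assumes the attribute is published as GlueCEPolicyMaxSlotsPerJob; 8 is an example value.
Requirements = Member("MPI", other.GlueCECapability) && other.GlueCEPolicyMaxSlotsPerJob >= 8;
```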

SubCluster

The SubCluster structure represents a homogeneous set of nodes. It is at this level that detailed information such as CPU performance and network connectivity is specified.

ArchitectureSmpSize

This can be used to advertise the number of CPUs (or cores) on a single node. Users can then use it to locate sites with an appropriate configuration for their jobs. For example, if the user's MPI job is small (say 8 processes), they could search for sites with an SmpSize of 8 or more, where the whole job fits on a single node and could run quickly.
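
A search of this kind might be expressed in JDL roughly as follows. This is a sketch under assumptions: it supposes the value is published as GlueHostArchitectureSMPSize and that sites set it to the real per-node CPU/core count.

```
// Sketch: match only resources whose nodes have at least 8 CPUs/cores.
// Assumes the value is published as GlueHostArchitectureSMPSize and is set correctly.
Requirements = other.GlueHostArchitectureSMPSize >= 8;
```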

(I think some sites set this variable incorrectly to work around calculations performed wrongly by a particular VO. Will need to follow this up.)

Software

This entity was added in version 1.3 and doesn't seem to be widely used yet. However, it has the potential to be very useful for advertising versions of MPI. It contains the following subfields: LocalID, Name, Version, InstalledRoot, EnvironmentSetup (a script for setting up the environment) and ModuleName (for loading the appropriate module to set up the environment). We'll need to find information on how to use this.

Proposed changes

The SubCluster schema should include a hook for publishing the internal network technology (i.e. the interconnect). This should probably be something as simple as NetworkInterconnect with a defined set of values.
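
If such an attribute existed, matching on it from JDL might look like the sketch below. Everything in it is hypothetical: neither the attribute name GlueSubClusterNetworkInterconnect nor the value "Infiniband" is defined in GLUE 1.3. They only illustrate how the proposal could be used.

```
// Hypothetical: GlueSubClusterNetworkInterconnect is NOT part of GLUE 1.3.
// It stands in for the proposed NetworkInterconnect attribute; "Infiniband" is an example value.
Requirements = other.GlueSubClusterNetworkInterconnect == "Infiniband";
```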

mpi: Glue (last edited 2011-07-12 14:41:39 by localhost)