The department's networked filesystem uses IBM's General Parallel File System (GPFS) running on commodity hardware. GPFS is a clustered filesystem and should provide excellent scalability, in both cost and performance, as the department continues to grow. The GPFS software allows us to re-export the filesystems via NFS to Linux clients and via clustered CIFS to Windows clients.

This web page provides technical information about our filesystem cluster. This information is not needed to use the filesystem, but may be interesting to the curious user.

The Cluster

The department's GPFS cluster consists of over a dozen servers. These servers are divided into discrete groups, and each group performs a specific task. Each class of server is described below.

Network Shared Disk (NSD) Nodes

The cluster's physical storage currently consists of two IBM storage arrays attached to four IBM servers, which serve the disk to the rest of the cluster. For redundancy, each storage array is attached to two servers, so the cluster can withstand the failure of either NSD server in a pair. Users do not access these servers directly; they go through the NFS/CIFS gateways instead.
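The paired-server redundancy can be illustrated with a small Python model. This is a simplified sketch, not GPFS's actual implementation: it assumes each array has an ordered list of its two attached servers and is served by the first one that is still running. The names `nsd1` through `nsd4`, `array1`, and `array2` are hypothetical.

```python
# A toy model of NSD server pairs; all server and array names are hypothetical.
from typing import Dict, List, Optional, Set

# Each storage array is attached to two NSD servers, listed in order of preference.
SERVER_LISTS: Dict[str, List[str]] = {
    "array1": ["nsd1", "nsd2"],
    "array2": ["nsd3", "nsd4"],
}


def serving_node(array: str, down: Set[str]) -> Optional[str]:
    """Return the first server attached to `array` that is still up, or None."""
    for server in SERVER_LISTS[array]:
        if server not in down:
            return server
    return None  # both servers in the pair have failed, so the array is unreachable


def reachable(down: Set[str]) -> bool:
    """True if every array can still be served, given the set of failed servers."""
    return all(serving_node(array, down) is not None for array in SERVER_LISTS)


print(serving_node("array1", set()))     # nsd1
print(serving_node("array1", {"nsd1"}))  # nsd2 takes over
print(reachable({"nsd1", "nsd3"}))       # True: one server lost in each pair
print(reachable({"nsd1", "nsd2"}))       # False: both servers for array1 lost
```

The model shows why losing one server per pair is survivable while losing both servers attached to the same array is not.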

Manager Nodes

These nodes handle the configuration and management of the cluster. They monitor disk leases, detect failures and handle recovery, distribute configuration changes, manage disk quotas, and process changes to the filesystem. These servers are paired for redundancy.

NFS/CIFS Gateways

These nodes run NFS and CIFS servers in order to export the filesystem to Linux and Windows clients. These servers allow clients to access the GPFS filesystem without being part of the GPFS cluster themselves.

The NFS servers use clustered NFS (CNFS), which is built on top of the native Linux NFS server. If any one of the servers goes down, CNFS automatically fails over to another server in the cluster. However, all clients connected to the failed server will hang for 90 seconds during the failover. The NFS hostname uses DNS round robin to balance load across the servers.

The CIFS servers run Samba on top of the clustered trivial database (CTDB), which allows state (such as file locks) to be shared among all the CIFS servers in the cluster. If a CIFS server goes down, another server takes over. The CIFS hostname also uses DNS round robin for load balancing.
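DNS round robin, used for both the NFS and CIFS hostnames, can be sketched as a DNS server that rotates the order of a hostname's address records on each query. This is a toy Python model, not the department's DNS configuration: the hostname `nfs.example.edu` and the addresses are hypothetical, and it assumes clients connect to the first address returned, as many do.

```python
# A toy model of DNS round robin; the hostname and addresses are hypothetical.
from collections import deque
from typing import Deque, Dict, List

# One hostname with an address record for each gateway server.
RECORDS: Dict[str, Deque[str]] = {
    "nfs.example.edu": deque(["192.0.2.11", "192.0.2.12", "192.0.2.13"]),
}


def resolve(hostname: str) -> List[str]:
    """Return all addresses for hostname, rotating their order after each query.

    A client that connects to the first address returned will therefore land
    on a different server than the client that queried just before it.
    """
    records = RECORDS[hostname]
    answer = list(records)
    records.rotate(-1)  # move the first address to the end for the next query
    return answer


# Three successive clients each get a different server first.
first_choices = [resolve("nfs.example.edu")[0] for _ in range(3)]
print(first_choices)  # ['192.0.2.11', '192.0.2.12', '192.0.2.13']
```

Round robin only spreads new connections across the servers; it does not know whether a server is up. That is why failover is handled separately, by CNFS for NFS and by CTDB for CIFS.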

In addition, there are three dedicated nodes that provide NFS to the compute cluster. These servers are kept separate so that I/O-intensive grid jobs cannot disrupt the rest of the department.

Two additional dedicated nodes provide NFS to servers in the DMZ. These servers are kept separate to enhance security.

Backup Nodes

Finally, several nodes are dedicated to nightly backups. Separate servers are used because the backup software places heavy load on the machines it runs on.