Scalability Blog

Scaling tips, insights, updates, culture, and more from our Server Experts.
 

Using GlusterFS On Your Managed Server

You will first need to set up a distributed GlusterFS storage cluster. Once it is running, follow these instructions on the server that will mount it:

First, install the EPEL repository:

[root@webserver ~]# rpm -Uvh http://mirror.symnds.com/distributions/fedora-epel/6/x86_64/epel-release-6-8.noarch.rpm
Retrieving http://mirror.symnds.com/distributions/fedora-epel/6/x86_64/epel-release-6-8.noarch.rpm
warning: /var/tmp/rpm-tmp.CjOwN6: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing...                ########################################### [100%]
1:epel-release           ########################################### [100%]

Now we’ll install the necessary packages:

[root@webserver ~]# yum -y install glusterfs-fuse glusterfs

Copy the same host entries into /etc/hosts that the GlusterFS nodes use, so that names such as gluster1 resolve on the webserver.  Then create a directory, /mnt/glusterfs, to use as the mount point, and add an entry for any one of the nodes to /etc/fstab:

gluster1:/gluster /mnt/glusterfs glusterfs rw,allow_other,default_permissions,max_read=131072 0 0

To mount everything listed in /etc/fstab, run mount -a
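
If you script this step, it helps to avoid appending the same line to /etc/fstab twice. The following is a minimal sketch, assuming a hypothetical helper named add_fstab_entry (it is not part of GlusterFS). Its optional second argument exists only so the function can be tried on a scratch file first:

```shell
# Hypothetical helper (not part of GlusterFS): append an fstab entry
# only if that exact line is not already present in the file.
add_fstab_entry() {
    entry=$1
    fstab=${2:-/etc/fstab}
    # -x matches the whole line, -F treats the entry as a fixed string
    if ! grep -qxF -- "$entry" "$fstab" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$fstab"
    fi
}

# Example (commented out because it modifies /etc/fstab):
# mkdir -p /mnt/glusterfs
# add_fstab_entry 'gluster1:/gluster /mnt/glusterfs glusterfs rw,allow_other,default_permissions,max_read=131072 0 0'
# mount -a
```

Running the helper a second time with the same entry leaves the file unchanged.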

To manually mount the GlusterFS storage node:

mount -t glusterfs gluster1:/gluster /mnt/glusterfs
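
In a script, it can be useful to run the manual mount only when the volume is not already mounted. This is a rough sketch, assuming a hypothetical helper named is_mounted that reads the Linux mount table from /proc/mounts. The second argument is there only so the check can be tried against a sample file:

```shell
# Hypothetical helper: succeed if the given path appears as a mount point
# in a mounts table (defaults to /proc/mounts on Linux).
is_mounted() {
    target=$1
    table=${2:-/proc/mounts}
    # Field 2 of each line in /proc/mounts is the mount point
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$table"
}

# Example (commented out so the block has no side effects):
# is_mounted /mnt/glusterfs || mount -t glusterfs gluster1:/gluster /mnt/glusterfs
```

The comparison is an exact string match, so pass the path without a trailing slash.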

As a final touch, we can verify how much storage space we have:

[root@webserver ~]# df -h /mnt/glusterfs/
Filesystem            Size  Used Avail Use% Mounted on
gluster1:/gluster      99G  4.2G   90G   5% /mnt/glusterfs

That is 90GB of available space (99GB in total) distributed across 5 servers.
We can test this setup by writing a 1GB file to this mount:

[root@webserver ~]# dd if=/dev/zero of=/mnt/glusterfs/1GB bs=1024 count=1048576

1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 239.994 s, 4.5 MB/s
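
The command above writes the file in 1KB blocks, which means over a million separate write calls through the mount, and that may account for part of the low throughput. The sketch below repeats the test with 1MB blocks. The helper name write_test_file and the output file name are assumptions made for illustration, and no result is claimed for this cluster:

```shell
# Hypothetical helper: write a zero-filled test file of SIZE_MB mebibytes
# using 1 MiB blocks instead of 1 KiB blocks.
write_test_file() {
    path=$1
    size_mb=$2
    # bs=1048576 writes 1 MiB per call; count is the number of such blocks
    dd if=/dev/zero of="$path" bs=1048576 count="$size_mb"
}

# Example (commented out; writes 1 GiB to the GlusterFS mount):
# write_test_file /mnt/glusterfs/1GB-1M 1024
```

dd prints its own throughput summary, so the two runs can be compared directly.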

This file ended up on gluster1:

[root@gluster1 ~]# ls -lah /exp1/
total 1.1G
drwxr-xr-x  2 root root 4.0K Jan  1 11:12 .
drwxr-xr-x 23 root root 4.0K Jan  1 09:52 ..
-rw-r--r--  1 root root 1.0G Jan  1 11:16 1GB

GlusterFS places each file on a node by hashing its name, so different files should land on different servers.  We can verify that files are spread across all 5 servers by generating a hundred smaller files:

[root@webserver ~]# for i in $(seq 1 100); do dd if=/dev/zero of="/mnt/glusterfs/$i" bs=1024 count=1; done

Out of 100 files generated, the distribution was:

24 on gluster1, 24 on gluster2, 17 on gluster3, 20 on gluster4, and 15 on gluster5.
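
One way to obtain counts like these is to count the regular files in each node's brick directory. This is a minimal sketch, assuming a hypothetical helper named count_brick_files. Only the brick path /exp1 on gluster1 appears in this post, so the paths on the other nodes have to be filled in for your own cluster:

```shell
# Hypothetical helper: count regular files directly inside a brick directory.
# -maxdepth 1 skips subdirectories, including the internal .glusterfs
# directory on GlusterFS versions that create one.
count_brick_files() {
    find "$1" -maxdepth 1 -type f | wc -l
}

# Example, run on each storage node (adjust the path to that node's brick):
# count_brick_files /exp1
```

Note that on gluster1 the count also includes the 1GB test file written earlier, unless it has been removed.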

The files are therefore spread fairly evenly across the entire cluster.  With this setup you can add storage quickly, using hardware of varying storage capacity.

But what if you have files that are too large to fit on any one storage node?  You can create a striped GlusterFS volume that splits each file across all 5 storage nodes:

[root@gluster1 ~]# gluster volume create largefiles stripe 5 transport tcp gluster1:/large1 gluster2:/large2 gluster3:/large3 gluster4:/large4 gluster5:/large5
Creation of volume largefiles has been successful

Start the new volume:

[root@gluster1 ~]# gluster volume start largefiles
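
Before mounting, you may want to confirm that the volume actually started. The sketch below assumes that gluster volume info prints a line of the form "Status: Started", as it does in the GlusterFS releases I am aware of. The helper name volume_status is hypothetical:

```shell
# Hypothetical helper: read `gluster volume info <name>` output on stdin
# and print the value of its "Status:" line.
volume_status() {
    awk -F': ' '$1 == "Status" { print $2; exit }'
}

# Example (commented out; requires the gluster CLI on a storage node):
# gluster volume info largefiles | volume_status
```

If the output format differs on your version, adjust the field name accordingly.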

Now this volume can be mounted on your webserver:

[root@webserver ~]# mkdir /mnt/largefiles && mount -t glusterfs gluster1:/largefiles /mnt/largefiles

The great thing about this setup is that it can co-exist with other volumes and volume types.  Both the distributed and the striped volume draw on the same free space on each node, so you don’t have to worry about resizing.  Just remember to place really large files (greater than 18GB in our example, which is 90GB divided across 5 nodes) in /mnt/largefiles.  GlusterFS will automatically split each large file into blocks across the 5 storage nodes, and you will still have enough space on each node for smaller files.  The trade-off is availability: if one GlusterFS node were to go offline, you would lose access to the large files, since every such file is stored in pieces on all of the nodes.
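
The size rule above can be sketched as a small helper that picks a mount point from a file's size. The name pick_mount is hypothetical, and the 18GB threshold is taken from this post's example, so it is an assumption you would adjust to your own per-node capacity:

```shell
# Threshold from this post's example: 90 GB available across 5 nodes,
# i.e. 18 GB per node (an assumption; adjust for your cluster).
THRESHOLD_BYTES=$((18 * 1024 * 1024 * 1024))

# Hypothetical helper: print the mount point a file should go to,
# based on its size in bytes.
pick_mount() {
    size_bytes=$1
    if [ "$size_bytes" -gt "$THRESHOLD_BYTES" ]; then
        echo /mnt/largefiles
    else
        echo /mnt/glusterfs
    fi
}

# Example (commented out; raw.mov is a placeholder file name and
# `stat -c %s` is the GNU coreutils way to print a size in bytes):
# cp raw.mov "$(pick_mount "$(stat -c %s raw.mov)")/"
```

A file of exactly 18GB or less goes to the distributed volume; anything larger goes to the striped one.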

This setup can be used for storing raw videos in /mnt/largefiles and FFmpeg-encoded versions in /mnt/glusterfs.  For example, an original 50GB file can be stored in blocks on /mnt/largefiles, while the FLV, x264, DivX, WMV, and AVI versions are stored on different GlusterFS nodes under /mnt/glusterfs.  It is a great setup if you have an encoding server, a webserver, and storage nodes all accessing the same data.  Whether you are into video on demand, live streaming, or file sharing, GlusterFS can be a viable solution for distributed network file storage.