Setup a NAS on AWS
As part of Metal Toad’s Managed Services Team, I have worked extensively with custom cloud-based solutions. There is no doubt Amazon Web Services (AWS) leads the pack in cloud hosting, but when it comes to Network Attached Storage (NAS), I have found one shortfall: there is no fast solution.
Simple Storage Service (S3) is AWS’s file storage answer to NAS, and it is a great system. But without refactoring our code to serve images directly from S3, we are left with the S3 FUSE driver, which responds slowly when serving content.
One could also argue for Elastic Block Store (EBS), which solves the slowness we see with S3. But an EBS volume cannot be mounted on multiple EC2 machines at once, which defeats the purpose (a NAS needs to be shared).
When searching for a solution, many “How To’s” recommend a program called GlusterFS as a NAS storage solution on AWS. It is a software package that is fast and replicates data between machines. However, I am never satisfied taking other people’s word for “the best solution”. So, I decided to test Gluster against some of the other NAS options offered for AWS.
- GlusterFS is an open source distributed file system. It ties together 2 or more machines to create a virtual RAID of drives.
- GlusterFS is easy to set up, easy to use, very versatile, and it’s open source!! (so it’s free to install)
- With larger file systems it can be resource intensive, particularly in memory.
- Two m1.large instances are set up with 60GB EBS volumes that have Gluster running on them.
- Zadara Storage is known for purchasing facilities next to Amazon in the US-East and US-West-1 regions. They are connected to the AWS facility with multiple 10GB fiber channels. Zadara Storage lets you rent dedicated hard drives and controllers, which you then configure into RAID arrays and volumes with a NAS-like configuration tool.
- You don’t need to worry about calculating IOPS on EBS since you know exactly which type of drives you are getting.
- Great customer support and an on-boarding process that shows you how to best use their tools.
- The initial setup is a little tricky, but it is mitigated by their great on-boarding process.
- Additions aren’t made in real time. Each one generates a request that requires approval (this is usually done in a few minutes).
- (1) m1.large EC2 instance mounting a RAID 1 built out of 15k SAS drives.
- SoftNAS is a lot like your traditional NAS controller. The only difference is that instead of hard drives it uses EBS volumes.
- Very easy to set up and configure.
- The ability to build and replicate RAIDs out of EBS is versatile since you can reserve IOPS for EBS to get better speeds.
- High availability is harder to set up with these controllers than with Gluster or Zadara.
- (1) m1.large EC2 host and (1) m1.small SoftNAS machine with a RAID 1 made from (2) 20GB EBS volumes.
The tool I decided to use for the test is called IOZone. It is an older tool with many positive reviews on multiple forums. IOZone allows you to test different file sizes, record lengths, and IO options. The IO options include read, write, random read, random write, backwards read, and lots more.
I wanted to see the whole gamut of tests, so I ran iozone -A. The ‘-A’ flag runs a wide range of record lengths and file sizes. We ran the test for files up to 512MB, but the data in the graphs is capped at 8MB because 99.9% of our files are smaller than 8MB.
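For anyone who wants to script the same run, here is a minimal Python sketch. It assumes `iozone` is installed and on the PATH. The function names and the `/mnt/nas/iozone.tmp` path are made up for illustration, and `-f` (the scratch file IOZone reads and writes) is my addition so the test lands on the NAS mount rather than the current directory. Check `iozone -h` on your version first, since newer releases may prefer `-a` over `-A`.

```python
import shutil
import subprocess


def build_iozone_command(test_file):
    """Build the argument list for the automatic-mode run used in this post."""
    # -A: automatic mode, covering many record lengths and file sizes.
    # -f: scratch file to test against (assumption: a path on the NAS mount).
    return ["iozone", "-A", "-f", test_file]


def run_iozone(test_file):
    """Run IOZone and return its raw text report."""
    if shutil.which("iozone") is None:
        raise RuntimeError("iozone is not installed or not on PATH")
    completed = subprocess.run(
        build_iozone_command(test_file),
        capture_output=True,
        text=True,
        check=True,  # raise if iozone exits with an error
    )
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical mount point; replace with wherever your NAS is mounted.
    print(" ".join(build_iozone_command("/mnt/nas/iozone.tmp")))
```

Expect the full automatic run to take a long time on a network mount, so it is worth trying it against a local directory first.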
The IOZone definition of tests:
Write: This test measures the performance of writing a new file. When a new file is written, not only does the data need to be stored but also the overhead information for keeping track of where the data is located on the storage media. This overhead is called the “metadata”. It consists of the directory information, the space allocation and any other data associated with a file that is not part of the data contained in the file. It is normal for the initial write performance to be lower than the performance of re-writing a file due to this overhead information.
Re-write: This test measures the performance of writing a file that already exists. When a file is written that already exists the work required is less as the metadata already exists. It is normal for the rewrite performance to be higher than the performance of writing a new file.
Read: This test measures the performance of reading an existing file.
Re-Read: This test measures the performance of reading a file that was recently read. It is normal for the performance to be higher as the operating system generally maintains a cache of the data for files that were recently read. This cache can be used to satisfy reads and improves the performance.
Random Read: This test measures the performance of reading a file with accesses being made to random locations within the file. The performance of a system under this type of activity can be impacted by several factors such as: Size of operating system’s cache, number of disks, seek latencies, and others.
Random Write: This test measures the performance of writing a file with accesses being made to random locations within the file. Again the performance of a system under this type of activity can be impacted by several factors such as: Size of operating system’s cache, number of disks, seek latencies, and others.
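To make the six definitions concrete, here is a toy Python sketch that times each kind of access on a single scratch file. It is not how IOZone is implemented and its numbers are not comparable to IOZone’s. The function names, the default sizes, and the use of `fsync` are all my own choices for illustration.

```python
import os
import random
import tempfile
import time


def kb_per_second(num_bytes, seconds):
    # Clamp the divisor so a very fast run on a coarse clock cannot divide by zero.
    return (num_bytes / 1024) / max(seconds, 1e-9)


def toy_benchmark(directory, file_size=1024 * 1024, record_len=4096):
    """Time toy versions of the six tests inside `directory`; returns KB/s per test."""
    block = os.urandom(record_len)
    sequential = [i * record_len for i in range(file_size // record_len)]
    shuffled = random.sample(sequential, len(sequential))  # same offsets, random order
    total = len(sequential) * record_len

    def write_pass(path, mode, offsets):
        start = time.perf_counter()
        with open(path, mode) as f:
            for offset in offsets:
                f.seek(offset)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data out of the OS write buffer
        return kb_per_second(total, time.perf_counter() - start)

    def read_pass(path, offsets):
        start = time.perf_counter()
        with open(path, "rb") as f:
            for offset in offsets:
                f.seek(offset)
                f.read(record_len)
        return kb_per_second(total, time.perf_counter() - start)

    results = {}
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        path = os.path.join(scratch, "toy.dat")
        results["write"] = write_pass(path, "wb", sequential)      # new file: metadata is created
        results["re-write"] = write_pass(path, "r+b", sequential)  # existing file: metadata reused
        results["read"] = read_pass(path, sequential)
        results["re-read"] = read_pass(path, sequential)           # likely served from OS cache
        results["random read"] = read_pass(path, shuffled)
        results["random write"] = write_pass(path, "r+b", shuffled)
    return results


if __name__ == "__main__":
    for name, speed in toy_benchmark(tempfile.gettempdir()).items():
        print(f"{name:>12}: {speed:,.0f} KB/s")
```

Pointing `directory` at a NAS mount instead of the local temp directory gives a rough feel for how the six access patterns differ there, but use IOZone itself for real measurements.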
Below are the straight up comparisons of speeds for the tests we’re interested in.
Here we see that Zadara is better at write speeds on smaller files, though the results even out as the files get larger. Winner: Zadara.
For random writes, all of the solutions perform better as the record length gets bigger, but Gluster clearly outperforms the others. Winner: Gluster.
Rewrite is a much closer race than the random or straight write tests. I would call this a tie between Gluster and Zadara.
Read tests: Zadara performed much slower with small record lengths, and once the file got larger than 1MB, Gluster started slowing down. Winner: SoftNAS.
Random read: all clients performed equally well. Winner: Tie.
Re-read is very close as the record length grows, but for smaller record lengths SoftNAS has better speeds.
Since SoftNAS performed marginally better at reads and Zadara was better at writes, I wanted a closer look at the differences between them.
The above graph shows the difference between Zadara and SoftNAS. Using the Zadara speed, we subtract the SoftNAS speed, allowing us to see the difference between the speeds. The results will then show a negative number when SoftNAS is faster and positive when Zadara is faster.
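The subtraction is simple enough to sketch in a few lines of Python. This assumes the results have already been pulled into dictionaries keyed by (file size in KB, record length in KB). The function names are hypothetical and the numbers in the example are placeholders, not our measurements.

```python
def speed_difference(zadara, softnas, max_file_kb=8192):
    """Return Zadara speed minus SoftNAS speed for each shared test point.

    Points with a file size above max_file_kb are dropped, mirroring the
    8MB cap used for the graphs in this post.
    """
    shared = sorted(zadara.keys() & softnas.keys())
    return {key: zadara[key] - softnas[key] for key in shared if key[0] <= max_file_kb}


def faster(difference):
    """Name the faster system for one difference value."""
    if difference > 0:
        return "Zadara"
    if difference < 0:
        return "SoftNAS"
    return "tie"


if __name__ == "__main__":
    # Placeholder speeds in KB/s, made up purely to show the sign convention.
    zadara = {(64, 4): 1000.0, (1024, 64): 2000.0, (16384, 64): 3000.0}
    softnas = {(64, 4): 1500.0, (1024, 64): 1800.0, (16384, 64): 2500.0}
    for key, diff in speed_difference(zadara, softnas).items():
        print(key, diff, faster(diff))
```

A negative value means SoftNAS was faster at that point and a positive value means Zadara was faster, the same convention as the graph.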
As you can see, the writes lean slightly towards Zadara, but the reads are extremely fast for SoftNAS. Our primary use for the NAS would be to serve static files to various web servers, so read speed is a big plus.
IOZone ran file sizes up to 512MB, but I couldn’t get GlusterFS to finish without the NFS mount hanging.
As I mentioned above, the graphs stop at 8MB because of what we use the servers for, but once the file size reached 128MB, the write speed for Zadara greatly outperformed both SoftNAS and Gluster.
Gluster is best for quick setup and reduced cost, since it can run on the same instances as the servers that use it, though it can get bogged down if the server load gets too high.
Zadara is great if you need the added security of having dedicated drives, or if you have a write-intensive application.
SoftNAS works best at serving lots of small files quickly.
Pick the one that’s right for you.