2010-07-08

Flash Harry

I'm a big fan of RAID: lashing together more than one disk drive to provide redundancy so that an important computer can still be used when one disk fails (I'm confident that the few people likely to read this already know that it's a question of "when", not "if"). Occasionally I have cause to throw together an impromptu array from USB flash drives, either because I want to test something or because I need a little temporary space and don't have a spare external hard disk to hand. Today I started to do that because I'm using a laptop with a failed hard disk and I wanted some space for pkgsrc.

First I tested the read and write speeds of two 4 Gbyte USB flash drives. Each yielded about 12 Mbytes/sec write speed and about 18 Mbytes/sec read. I then used raidframe (NetBSD's software RAID facility) to write a stripe across both flash drives. This is RAID level 0, so it doesn't provide any redundancy, but it's a convenient way to stitch two drives together and usually provides a slight performance improvement. When I tested the speed of the array, I was surprised by the results. The read speed improved to about 24 Mbytes/sec, which was about what I expected. Writes, however, had slowed to about 3 Mbytes/sec, which would have been slow enough to impact the work I had in mind.
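The post doesn't record how these speeds were measured. As a rough sketch only, assuming a filesystem is already mounted on the device under test, sequential write throughput could be timed along these lines. The function name, the default sizes and the path in the usage note are all made up for illustration.

```python
import os
import time


def write_throughput(path, total_bytes=8 * 1024 * 1024, chunk_bytes=64 * 1024):
    """Write about total_bytes to path and return the speed in Mbytes/sec.

    The amount written is rounded up to a whole number of chunks.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            f.write(chunk)
            written += chunk_bytes
        f.flush()
        # Ask the OS to push the data out to the device, so we time the
        # drive rather than the in-memory cache.
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return written / elapsed / 1e6
```

Calling something like `write_throughput("/mnt/flash/testfile")` (a hypothetical mount point) on each single drive and then on the array gives figures that can be compared directly. A larger `total_bytes` makes the result less sensitive to caching.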

I've used raidframe enough that I doubted that was the cause, but I wanted to rule it out as the culprit. I tore down the raidframe array and used some older, simpler software called ccd (for concatenated disk) to stitch the USB flash drives together. When configuring ccd, I had the option of interleaving data (writing alternately to one drive and then the other) or simple concatenation (switching drives halfway through). I started with the interleave because on paper that helps with speed, but it yielded basically identical results to raidframe (3 Mbytes/sec writes). Switching to simple concatenation gave me what I was after: performance similar to the individual drives, but with the convenience of a single block device to write a filesystem on.
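To make the difference between the two layouts concrete, here is a small sketch of how a logical block number might map onto the drives in each case. It is an illustration of the general idea, not ccd's or raidframe's actual code, and the function names and block counts are invented.

```python
def interleaved(block, n_drives, interleave):
    """Map a logical block to (drive, block_on_drive) when striping.

    interleave is the number of consecutive blocks written to one drive
    before moving on to the next.
    """
    stripe, offset = divmod(block, interleave)
    drive = stripe % n_drives
    drive_block = (stripe // n_drives) * interleave + offset
    return drive, drive_block


def concatenated(block, drive_blocks):
    """Map a logical block to (drive, block_on_drive) when the drives
    are simply laid end to end. drive_blocks lists each drive's size."""
    for drive, size in enumerate(drive_blocks):
        if block < size:
            return drive, block
        block -= size
    raise ValueError("block is beyond the end of the array")
```

With two drives and an interleave of 4 blocks, a sequential write of logical blocks 0 to 15 hops between the drives every 4 blocks, so both are busy throughout. With concatenation the same write stays on the first drive until that drive is full, so only one device is being written to at any moment.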

I suspect the problem is in hardware and is some sort of "log jam" that forms when I throw more data at the laptop's USB ports than most people would. Whether or not that's true, I'm glad I took a moment to test the speed of the array before I started work.

2 comments:

Stuart Northcott said...

I've always been a fan of striping to improve speed, and in my current configuration I do just that. However, the next time I rebuild Windows I'm actually going to remove the RAID configuration and operate the drives separately.

I'm going to do this after reading that for real world use striping provides little performance benefit, and to be honest there's stuff I want to do that isn't easily supported in a raid configuration (booting from a vhd file, for example).

As far as mirroring goes I completely agree if you need the failed machine to remain available. As it stands I have a fairly robust backup schedule now (after losing everything when my Ubuntu installation died) to a Windows Home Server box which takes care of this (and surprisingly manages data duplication and the storage of identical files from multiple machines) and performs other duties. I can take the downtime hit rather than halve my capacity.

Another reason I go for this solution is that the only time I've been hit by a virus in the last 5 years it decided to wipe my discs - a mirrored RAID would not protect against this (I also lost a few files on USB drives before I realised what was happening and flipped the switch).

I guess my point (probably not aimed at you) is don't just rely on a raid mirror to keep a backup of data, keep a remote copy too.

Andrew Ball said...

I agree. Sadly I have met people (including the I.T. director of a county in Indiana) who thought that RAID meant they didn't need a backup strategy. I helped him understand that wasn't the case and hopefully he's spread that around but you know there are people out there who are going to learn the hard way. :-(