How to Implement Next-Generation Storage Infrastructure for Big Data

16.04.2012

"When you go to put it back together, you don't have to have every single piece," says Russ Kennedy, vice president of product strategy, marketing and customer solutions for Cleversafe. "The number of pieces you generate, we call that the width. We call the minimum number you need to put it back together the threshold. The difference between the number of pieces you create and the minimum number required to put it back together is what determines its reliability. Simultaneously, you can lose nodes and drives, and you can still get the data back in its original form. The highest amount of reliability you can get with RAID is dual parity. You can lose two drives. That's it. With our solution, you can lose up to six."
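Kennedy's width and threshold arithmetic can be sketched in a few lines of Python. This is a rough illustration, not Cleversafe's implementation: the function names are hypothetical, the 16-slice width and 10-slice threshold are assumed values chosen only because their difference is six, and the survival estimate assumes every slice fails independently with the same probability, which real drives and nodes do not guarantee.

```python
from math import comb


def tolerated_losses(width: int, threshold: int) -> int:
    """Slices that can be lost while the data stays recoverable."""
    if not 0 < threshold <= width:
        raise ValueError("threshold must be between 1 and width")
    return width - threshold


def storage_overhead(width: int, threshold: int) -> float:
    """Raw capacity consumed per unit of user data (assumes equal-size slices)."""
    if not 0 < threshold <= width:
        raise ValueError("threshold must be between 1 and width")
    return width / threshold


def survival_probability(width: int, threshold: int, p_fail: float) -> float:
    """Chance that at least `threshold` of `width` slices survive.

    Assumption: each slice is lost independently with probability p_fail.
    """
    p_ok = 1.0 - p_fail
    return sum(
        comb(width, k) * p_ok**k * p_fail ** (width - k)
        for k in range(threshold, width + 1)
    )


if __name__ == "__main__":
    # Hypothetical dispersal layout: 16 slices written, any 10 rebuild the data.
    print("erasure-coded losses tolerated:", tolerated_losses(16, 10))
    print("erasure-coded raw/usable ratio:", storage_overhead(16, 10))
    # Dual-parity RAID modeled the same way: 8 drives, any 6 rebuild the data.
    print("dual-parity losses tolerated:", tolerated_losses(8, 6))
```

Modeling dual-parity RAID with the same two numbers makes the comparison in the quote concrete: the gap between pieces written and pieces required is two for dual parity, and six for the assumed 16/10 layout.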

Erasure coding is also a software-based technology, meaning it can run on commodity hardware, which brings down the cost of scaling even further.

"Having identified the right technology, we went and looked at a number of different vendors who were providing solutions in that space," Day says. "We looked at building it ourselves. But we felt that if we could find a company that was a pretty close match to our requirements, with a system that was reasonably proven, that would be a much better approach for us."

Shutterfly brought four vendors to its lab for evaluation and built prototypes of the storage device it wanted for its data center. Day says he was looking for performance, availability, fault tolerance and manageability.