An interesting federation whitepaper recently hit my inbox, and the subject line grabbed my attention immediately. It was a whitepaper called “Virtualizing Hadoop in Large-Scale Infrastructure”, written in conjunction with our customer Adobe Systems out of Utah, and it was right in line with some of the comments I’ve been making recently with customers and at conferences/VMUGs. This paper further convinces me that Hadoop and object-based storage (think Amazon S3) are the future as it relates to storage consumption. If you are in IT, and you plan to be in IT for the next 3 or more years, now would be a great time to start getting up to speed on Hadoop as well as object-based storage. While I certainly think block- and file-based storage will be around for a long time, it appears the new “cool kids” on the block are HDFS and object storage.
Anyway, back to this whitepaper. The focus of the paper is Adobe’s IT department wanting to be more agile and responsive to business needs. They specifically called out a key objective to: “Build a virtualized HDaaS environment to deliver analytics through a self-service catalog to internal Adobe customers”. They wanted to utilize their Cisco UCS blades, EMC VNX, and EMC Isilon (Isilon was used for the Hadoop storage – more on that in a future blog post) as well as VMware’s Big Data Extensions (BDE). In addition, Adobe is convinced (as am I) that companies can gain a significant competitive advantage by mining the vast amount of information they collect on their sites. To the tune of over 8PB (PETABYTES!!!). THAT’S CRAZY! This data is mostly collected from site visits and web traffic, and then it’s tied back to revenue. It’s just one of the examples they used in the document.
The whitepaper outlines some of the key objectives of an HDaaS offering, as well as their sincere desire to figure out the possible performance consequences of virtualizing and then scaling Hadoop. It also points out some of the lessons learned, or what I like to call “banging your head against the wall” issues. Looking through the whitepaper, it’s clear that memory settings were really important. The paper also does an excellent job of sharing the various other whitepapers and documents used for guidance and recommendations.
So if you are interested in learning more about Hadoop, or have even already started down the path of implementing it, take a moment to read through this customer whitepaper. If nothing else, you might get some ideas on how you could utilize Hadoop in your own environment.