When creating a new Amazon Web Services (AWS) Hadoop cluster, putting together a configuration plan or topology is overwhelming for most people.
I’ve done this many times, and as part of my focus on tools and templates I thought I’d add a template you can use as a basic guideline for planning your Cloudera big data cluster. The template includes configurations for:
the cluster topology
metastore detail for Hive, YARN, Hue, Impala, Sqoop, Oozie, and Cloudera Manager
and additional detail for custom service descriptors (CSDs) for Storm and Redis
No Warranty Expressed or Implied
It’s not meant to be exhaustive, as there are many items not covered (AWS security groups, network optimization, dockerization, continuous integration, monitoring, etc.), but it is an example of a real-world cluster in AWS (instance and AZ details changed for security).
Cloudera Hadoop cluster configuration template for Amazon Web Services (AWS)
Fresh Elastic Block Store (EBS) volumes have first-write overhead
At my employer I architect Big Data hybrid cloud platforms for a global audience, and they have to be FAST. In our cluster provisioning I find we frequently overlook doing an initial write across our volumes to reduce write time during production compute workloads (called pre-warming the EBS volumes). Per Amazon (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-prewarm.html), failure to pre-warm EBS volumes incurs a 5-50% loss in effective IOPS. Worst case, that means you could DOUBLE the I/O portion of your HDFS writes until each sector has been touched by the kernel. Amazon asserts that this performance loss, amortized over the life of a disk, is inconsequential to most applications. One of our current clusters includes, as a baseline, eight 1 TB drives in each of 10 compute nodes. Our estimated pre-warm time is 30 hours per mount point, so done sequentially that’s 8 drives × 10 nodes × 30 hours = 2,400 hours to touch every drive block.
What does this imply? Without pre-warming we would have added as much as 2,400 additional hours of write latency during initial HDFS writes, and that latency could appear in many different places in the stack (HDFS direct writes, Hive PostgreSQL/MySQL metadata writes and management, log writes, etc.).
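If you want to sanity-check an estimate like our 30 hours per volume against your own hardware, you can time a small slice of the pre-warm pass and extrapolate. A minimal sketch, assuming /dev/xvdf is your target device and it is already unmounted per the precautions in the next section:

# time a 1 GiB read-and-rewrite slice of the device
time sudo dd if=/dev/xvdf of=/dev/xvdf conv=notrunc bs=1M count=1024
# a 1 TB volume holds roughly 1,000 such 1 GiB slices, so the
# full pre-warm will take about 1,000 x the elapsed time above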
Steps to optimize your EBS writes
Read the AWS document above carefully: the first method in their article will ERASE EVERYTHING ON THE DISK. The steps below will execute the pre-warm safely on disks with existing content.
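For context, the destructive variant in the AWS article writes zeros across the entire device, something like the sketch below. It is shown only so you can recognize and avoid it; do NOT run it on a volume holding data:

sudo dd if=/dev/zero of=/dev/xvdf bs=1M   # WIPES /dev/xvdf entirely
# the safe variant used in the steps below instead reads each block
# and writes it back unchanged:  dd if=DEV of=DEV conv=notrunc bs=1M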
To pre-warm the drives on your cluster:
stop the cluster services
ssh into each server
execute lsblk and note the device names and their mount points (the devices likely start at /dev/xvdf and increment the final letter from there: /dev/xvdg, /dev/xvdh, etc.)
unmount each one at a time with sudo umount /ONEMOUNTPOINT
Continue until all the data volumes are unmounted, meaning lsblk shows nothing in the MOUNTPOINT column next to each ‘disk’ entry
CAUTION: DO NOT DO THE FOLLOWING ON A MOUNTED DISK, AND MAKE SURE YOU USE THE SAME DEVICE FOR BOTH if= AND of=
execute the following, changing the if= and of= to the same device: sudo dd if=/YOURDEVICE of=/YOURSAMEDEVICE conv=notrunc bs=1M (example: sudo dd if=/dev/xvdf of=/dev/xvdf conv=notrunc bs=1M)
Wait. It’ll take a few minutes for a 32 GB drive, as shown in the Amazon write-up above, or a day or more for a 1 TB drive.
After ALL the dd processes on the server complete, reboot the server.
If you’d like to check on the process, or if your ssh session has expired and you want to ensure you’re still warming, execute ps aux | grep YOURDEVICE (example: ps aux | grep /dev/xvdf).
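GNU dd will also report actual progress (bytes copied and throughput) if you send it SIGUSR1, which tells you more than the ps check above. A minimal sketch, assuming the device being warmed is /dev/xvdf:

# ask the running dd process to print its I/O statistics
pgrep -f 'dd if=/dev/xvdf' | xargs -r sudo kill -USR1
# the statistics appear on dd's stderr (your terminal, or its nohup log)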
A far better approach, of course, would be to automate this as part of your cluster deployment process using Chef or an equivalent infrastructure automation tool.
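As a starting point for that automation, here is a minimal bash sketch (not production-hardened) that pre-warms every unmounted /dev/xvd* device in parallel. The device range and log path are assumptions; adjust both for your topology, and run it as root:

#!/bin/bash
# pre-warm all unmounted xvd block devices in parallel (sketch only)
for dev in /dev/xvd[f-z]; do
    [ -b "$dev" ] || continue                  # skip names that don't exist
    if ! grep -qs "^$dev " /proc/mounts; then  # only touch unmounted devices
        nohup dd if="$dev" of="$dev" conv=notrunc bs=1M \
            > "/var/log/prewarm-$(basename "$dev").log" 2>&1 &
    fi
done
wait   # block until every pre-warm pass finishes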
Below is an example topology for a Big Data development cluster as I’ve actually used for some customers. It’s composed of six Amazon Web Services (AWS) servers, each with a particular purpose. We have been able to perform a full lambda architecture using this topology, along with Teiid (for data abstraction), on terabytes of data. It’s not sufficient for a production cluster but is a good starting point for a development group. The total cost of this cluster as configured (less storage) is under $6/hour.
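For orientation only, a hypothetical role split for a six-node layout like this one might look as follows; the real assignments and instance types live in the template above, so treat every detail here as an assumption:

1 x Cloudera Manager / gateway node
2 x master nodes (NameNode, ResourceManager, Hive Metastore, Oozie)
3 x worker nodes (DataNode, NodeManager, Impala daemon)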