Planning and Communicating Your Cluster Design
When creating a new Hadoop cluster on Amazon Web Services (AWS), most people find it overwhelming to put together a configuration plan or topology.
I’ve done this many times and, as part of my focus on tools and templates, thought I’d share a template you can use as a basic guideline for planning your Cloudera big data cluster. The template includes configurations for:
- instance basics
- instance list
- operating system
- CDH version
- the cluster topology
- metastore detail for Hive, YARN, Hue, Impala, Sqoop, Oozie, and Cloudera Manager
- resource management
- additional detail for custom service descriptors (CSDs) for Storm and Redis
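To make the list above easier to act on, here is a minimal sketch of how the sections could be tracked as a checklist in Python. It is not part of the template itself: the key names, the helper functions `missing_sections` and `missing_metastores`, and every value in `draft_plan` are hypothetical placeholders chosen for illustration.

```python
# Section names mirror the list above; the key spellings are an assumption.
REQUIRED_SECTIONS = [
    "instance_basics",
    "instance_list",
    "operating_system",
    "cdh_version",
    "cluster_topology",
    "metastores",
    "resource_management",
    "custom_service_descriptors",
]

# Services the template records metastore detail for.
METASTORE_SERVICES = [
    "hive", "yarn", "hue", "impala", "sqoop", "oozie", "cloudera_manager",
]


def missing_sections(plan):
    """Return the template sections that are absent or empty in the plan."""
    return [name for name in REQUIRED_SECTIONS if not plan.get(name)]


def missing_metastores(plan):
    """Return the services that have no metastore entry in the plan yet."""
    metastores = plan.get("metastores", {})
    return [svc for svc in METASTORE_SERVICES if svc not in metastores]


# A partially filled draft; all values are placeholders, not recommendations.
draft_plan = {
    "instance_basics": {"region": "<region>", "instance_type": "<instance type>"},
    "instance_list": ["<master-1>", "<worker-1>"],
    "operating_system": "<os and version>",
    "cdh_version": "<cdh version>",
    "metastores": {"hive": "<db host>", "cloudera_manager": "<db host>"},
}

print(missing_sections(draft_plan))
print(missing_metastores(draft_plan))
```

Running a check like this before provisioning is one way to catch a section you forgot to fill in; a spreadsheet review does the same job if you prefer to stay in the template.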
No Warranty Expressed or Implied
The template is not meant to be exhaustive; many items are not covered (AWS security groups, network optimization, Dockerization, continuous integration, monitoring, etc.). It is, however, an example of a real-world cluster in AWS, with the instance and availability zone (AZ) details changed for security.
Cloudera Hadoop cluster configuration template for Amazon Web Services (AWS)
Please feel free to let me know how it works for you and whether you have any improvements to suggest.