Category Archives: Templates

Working templates for business intelligence/decision support, collaborative decision making, IT operations, IT strategy, analytics, big data, machine learning, and natural language processing.

AWS configuration and topology template for Cloudera Hadoop

Planning and Communicating Your Cluster Design

When creating a new Amazon Web Services (AWS) Hadoop cluster, most people find it overwhelming to put together a configuration plan or topology.

I’ve done this many times and, as part of my focus on tools and templates, thought I’d add a template you can use as a basic guideline for planning your Cloudera big data cluster. The template includes configurations for:

  • instance basics
  • instance list
  • storage
  • operating system
  • CDH version
  • the cluster topology
  • metastore detail for Hive, YARN, Hue, Impala, Sqoop, Oozie, and Cloudera Manager
  • high-availability
  • resource management
  • and additional detail for custom service descriptors (CSD) for Storm and Redis
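
If you want to keep a machine-readable copy of the plan alongside the template, the sections above could be sketched in Python roughly as follows. This is only an illustration and not part of the template itself: the class names, field names, host names, instance types, and availability zone are all hypothetical placeholders, and the two checks are just examples of the kind of sanity test you might run before provisioning.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One row of the instance list (all values are placeholders)."""
    name: str
    instance_type: str
    availability_zone: str
    storage_gb: int
    roles: list = field(default_factory=list)


@dataclass
class ClusterPlan:
    """Top-level plan: OS, CDH version, HA flag, and the topology."""
    os: str
    cdh_version: str
    high_availability: bool
    instances: list = field(default_factory=list)

    def role_counts(self):
        # Count how many instances carry each role across the topology.
        return Counter(role for inst in self.instances for role in inst.roles)

    def check(self):
        # Return a list of human-readable gaps found in the plan.
        problems = []
        for inst in self.instances:
            if not inst.roles:
                problems.append(f"{inst.name} has no roles assigned")
        if self.high_availability and self.role_counts()["NameNode"] < 2:
            problems.append("HA is enabled but fewer than two NameNodes are planned")
        return problems


# Hypothetical example plan; replace every value with your own.
plan = ClusterPlan(
    os="<operating system>",
    cdh_version="<CDH version>",
    high_availability=True,
    instances=[
        Instance("master-1", "m4.xlarge", "us-east-1a", 500, ["NameNode", "ResourceManager"]),
        Instance("worker-1", "m4.2xlarge", "us-east-1a", 2000, ["DataNode", "NodeManager"]),
        Instance("edge-1", "m4.large", "us-east-1a", 100),
    ],
)

for problem in plan.check():
    print(problem)
```

With the placeholder plan above, the check flags the edge node that has no roles and the missing second NameNode. The point is less the specific checks than having the topology in a form you can review and diff as the design changes.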

No Warranty Expressed or Implied

It’s not meant to be exhaustive: many items are not covered (AWS security groups, network optimization, dockerization, continuous integration, monitors, etc.). It is, however, an example of a real-world cluster in AWS, with the instance and AZ details changed for security.

Screenshot of the roles and services in the big data design template
Example list of EC2 instances for the cluster plan

Cloudera Hadoop cluster configuration template for Amazon Web Services (AWS)

AWS_topology_template

Please let me know how it works for you and whether you have any improvements to suggest.