Example Big Data dev cluster topology

Below is an example dev cluster topology for a Big Data development cluster as I’ve actually used for some customers.  It’s composed of 6 Amazon Web Service (AWS) servers, each with a particular purpose.  We have been able to perform full lambda using this topology along with Teiid (for data abstraction) on terabytes of data.  It’s not sufficient for a production cluster but is a good starting point for a development group.  The total cost of this cluster as configured (less storage) is under $6/hour.

Here’s a link to this dev_topology in Excel.

 

Service Category Server1 Server2 Server3 Server4 Server5 Server6
Cloudera Mgr Cluster Mgt Alert pub Server Host mon Svc Mon Event Svr Act Mon
HDFS Infra Namenode SNN/DN/JN/HA DN DN/JN DN DN/JN
Zookeeper Infra Server Server Server
YARN Infra Node Mgr Node Mgr JobHist Node Mgr RM/NM
Redis Infra Master Slave Slave
Hive Data Hive server Metastore Hcat
Impala Data App Master Cat Svr Daemon Daemon Daemon
Storm Data Nimbus/UI Supervisor Supervisor Supervisor
Hue UI Server
Pentaho BI UI BI Server
IP ADDRESS
AWS details              
Name m3.2xlarge m3.2xlarge m3.2xlarge r3.4xlarge r3.4xlarge r3.4xlarge
vCPU 8 8 8 16 16 16
Memory (Gb) 30.0 30.0 30.0 122.0 122.0 122.0
Instance storage (Gb) SSD 2 x 80 SSD 2 x 80 SSD 2 x 80 SSD 1 x 320 SSD 1 x 320 SSD 1 x 320
I/O High High High High High High
EBS option Yes Yes Yes Yes Yes Yes