Example Big Data dev cluster topology – A | I

Below is an example topology for a Big Data development cluster that I've actually used for several customers. It is composed of six Amazon Web Services (AWS) servers, each with a particular purpose. Using this topology along with Teiid (for data abstraction), we have been able to run a full Lambda architecture on terabytes of data. It is not sufficient for a production cluster, but it is a good starting point for a development group. The total cost of this cluster as configured (excluding storage) is under $6/hour.
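The under-$6/hour figure is a sum over the six instances in the AWS details table (three m3.2xlarge and three r3.4xlarge). The Python sketch below shows that arithmetic. The per-hour prices in it are hypothetical placeholders, not quoted AWS rates, so substitute current on-demand pricing for your region before relying on the result.

```python
# HYPOTHETICAL per-hour on-demand prices in USD. These are placeholders,
# not quoted AWS rates; replace them with current pricing for your region.
HOURLY_PRICE = {
    "m3.2xlarge": 0.50,
    "r3.4xlarge": 1.30,
}

# Instance mix taken from the AWS details table:
# Server1-3 are m3.2xlarge, Server4-6 are r3.4xlarge.
INSTANCES = ["m3.2xlarge"] * 3 + ["r3.4xlarge"] * 3


def cluster_hourly_cost(instances, prices):
    """Sum the compute cost per hour; storage (EBS, S3) is not included."""
    return sum(prices[itype] for itype in instances)


if __name__ == "__main__":
    cost = cluster_hourly_cost(INSTANCES, HOURLY_PRICE)
    print(f"Estimated compute cost: ${cost:.2f}/hour (storage excluded)")
    print("Under $6/hour:", cost < 6.0)
```

With the placeholder prices the sketch reports $5.40/hour. Storage is deliberately left out of the sum, matching the storage caveat above.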

Here’s a link to this dev_topology in Excel.

Service	Category	Server1	Server2	Server3	Server4	Server5	Server6
Cloudera Manager	Cluster Mgmt	Alert Publisher	Server	Host Monitor	Service Monitor	Event Server	Activity Monitor
HDFS	Infra	NameNode	SNN/DN/JN/HA	DN	DN/JN	DN	DN/JN
ZooKeeper	Infra	Server	Server	Server
YARN	Infra		Node Mgr	Node Mgr	JobHist	Node Mgr	RM/NM
Redis	Infra		Master	Slave	Slave
Hive	Data	Hive server			Metastore	Hcat
Impala	Data		App Master	Cat Svr	Daemon	Daemon	Daemon
Storm	Data	Nimbus/UI			Supervisor	Supervisor	Supervisor
Hue	UI				Server
Pentaho BI	UI						BI Server
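ZooKeeper (Server1 to Server3) and the HDFS JournalNodes (the "JN" entries on Server2, Server4 and Server6) are both majority-quorum services, which is why each sits on three hosts: a quorum of n members survives floor((n - 1) / 2) failures, so three hosts tolerate one failure and a fourth would add no extra tolerance. A minimal Python sketch of a placement check, assuming the role-to-host mapping is transcribed by hand from the table; ROLE_HOSTS, check_quorum and tolerated_failures are illustrative names, not part of Cloudera Manager or any other tool.

```python
# Placement of the quorum-based roles, transcribed by hand from the table.
ROLE_HOSTS = {
    "ZooKeeper Server": ["Server1", "Server2", "Server3"],
    "HDFS JournalNode": ["Server2", "Server4", "Server6"],
}


def tolerated_failures(n):
    """A majority quorum of n members survives floor((n - 1) / 2) failures."""
    return (n - 1) // 2


def check_quorum(role, hosts, minimum=3):
    """Return warnings for a majority-quorum ensemble (empty list means OK)."""
    warnings = []
    if len(hosts) < minimum:
        warnings.append(f"{role}: {len(hosts)} hosts, need at least {minimum}")
    if len(hosts) % 2 == 0:
        # An even ensemble tolerates no more failures than one host fewer.
        warnings.append(f"{role}: even host count ({len(hosts)})")
    if len(set(hosts)) != len(hosts):
        warnings.append(f"{role}: duplicate hosts in {hosts}")
    return warnings


if __name__ == "__main__":
    for role, hosts in ROLE_HOSTS.items():
        problems = check_quorum(role, hosts)
        status = "OK" if not problems else "; ".join(problems)
        print(f"{role}: {len(hosts)} hosts, tolerates "
              f"{tolerated_failures(len(hosts))} failure(s), {status}")
```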
	IP ADDRESS
AWS details
Instance type		m3.2xlarge	m3.2xlarge	m3.2xlarge	r3.4xlarge	r3.4xlarge	r3.4xlarge
vCPU		8	8	8	16	16	16
Memory (GiB)		30.0	30.0	30.0	122.0	122.0	122.0
Instance storage (GB)		SSD 2 x 80	SSD 2 x 80	SSD 2 x 80	SSD 1 x 320	SSD 1 x 320	SSD 1 x 320
I/O performance		High	High	High	High	High	High
EBS-optimized option		Yes	Yes	Yes	Yes	Yes	Yes
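Summing the AWS details rows gives the aggregate capacity of the dev cluster. A short Python sketch, with the per-instance specs transcribed from that table; SPECS, SERVERS and cluster_totals are illustrative names.

```python
# Per-instance specs transcribed from the AWS details table.
SPECS = {
    "m3.2xlarge": {"vcpu": 8, "memory_gib": 30.0, "ssd_gb": 2 * 80},
    "r3.4xlarge": {"vcpu": 16, "memory_gib": 122.0, "ssd_gb": 1 * 320},
}

# Instance type assigned to each server column.
SERVERS = {
    "Server1": "m3.2xlarge",
    "Server2": "m3.2xlarge",
    "Server3": "m3.2xlarge",
    "Server4": "r3.4xlarge",
    "Server5": "r3.4xlarge",
    "Server6": "r3.4xlarge",
}


def cluster_totals(servers, specs):
    """Add up vCPU, memory and local SSD across every server."""
    totals = {"vcpu": 0, "memory_gib": 0.0, "ssd_gb": 0}
    for itype in servers.values():
        for key in totals:
            totals[key] += specs[itype][key]
    return totals


if __name__ == "__main__":
    print(cluster_totals(SERVERS, SPECS))
    # {'vcpu': 72, 'memory_gib': 456.0, 'ssd_gb': 1440}
```

The three r3.4xlarge hosts carry 366 of the 456 GiB of memory, which is why the memory-hungry Impala and Storm worker roles are placed on Server4 to Server6.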