Planning and Communicating Your Cluster Design: When creating a new Amazon Web Services (AWS) Hadoop cluster, most people find it overwhelming to put together a configuration plan or topology. Below is a fillable Hadoop reference architecture template I’ve built that addresses the key aspects of planning, building, configuring, and communicating… Continue reading Fillable Hadoop reference architecture template for AWS clusters
Category: Technologies
Content on Knowledge Discovery in Databases (KDD), analytics, decision support, or data mining, ranging from the user-approachable to the technically focused.
Double your effective IO on AWS EBS-backed volumes
NOTE: This content is for archive purposes only. With generation 4+ EBS volumes, big data IO performance no longer requires volume prewarming. Fresh Elastic Block Storage volumes have first-write overhead. At my employer I architect Big Data hybrid cloud platforms for a global audience that have to be FAST. In our cluster provisioning I find we frequently… Continue reading Double your effective IO on AWS EBS-backed volumes
How to recover a corrupt HDFS namenode
Scenario 1: There was data, the logs say the namenode is not formatted, and the dfs.name.dir (check your config to see where it is) is empty. Cause: The data was emptied out of your namenode directory. Things to try (in order): FSCK (see scenario 2 below); recover the namenode with hadoop namenode -recover. If the output says some… Continue reading How to recover a corrupt HDFS namenode