NOTE: This content is for archive purposes only. With generation 4+ EBS volumes, big data IO performance no longer requires volume prewarming. Fresh Elastic Block Storage volumes have first-write overhead. At my employer I architect Big Data hybrid cloud platforms for a global audience, and they have to be FAST. In our cluster provisioning I find we frequently… Continue reading Double your effective IO on AWS EBS-backed volumes
How to recover a corrupt HDFS namenode
Scenario 1: There was data, the logs say "Namenode not formatted", and the namenode storage directory (dfs.name.dir; check your config to see where it is) is empty.
Cause: The data was emptied out of your namenode directory.
Things to try (in order):
1. FSCK (see scenario 2 below)
2. Recover the namenode: hadoop namenode -recover
If the output says some…
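
The ordering in Scenario 1 can be sketched as a small dry-run helper. This is only an illustration, not the author's procedure: the function names `is_dir_empty` and `recovery_plan` are hypothetical, the directory path is whatever your own config points to, and the recovery step is assumed to be the `hadoop namenode -recover` form of the command. The commands are listed in order, never executed.

```python
import os
import tempfile


def is_dir_empty(path):
    """Return True if the path is missing or contains no entries."""
    return not os.path.isdir(path) or not os.listdir(path)


def recovery_plan(name_dir):
    """List the commands to try, in order, when the namenode directory is empty.

    Hypothetical helper: it only builds the list and runs nothing.
    """
    if not is_dir_empty(name_dir):
        return []  # Scenario 1 does not apply: the directory still has data
    return [
        ["hadoop", "fsck", "/"],            # step 1: FSCK
        ["hadoop", "namenode", "-recover"],  # step 2: namenode recovery (assumed form)
    ]


if __name__ == "__main__":
    # Demonstrate with a throwaway empty directory instead of a real namenode path.
    with tempfile.TemporaryDirectory() as empty_dir:
        for cmd in recovery_plan(empty_dir):
            print(" ".join(cmd))
```

Keeping the plan as data rather than running it makes it easy to review each step before touching a damaged namenode.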
Big Data Virtualization
JBoss has a free data virtualization (NOT server virtualization) platform called Teiid. Its capabilities include serving data from multiple technologies (JDBC, ODBC, Thrift, REST, SOAP, etc.), merging and transforming data, fault tolerance, scalability, and the other features one would require of an enterprise service. This can stand in the technology portfolio as part of…