A topology for a big data production environment

I’ve attached an Excel file describing a full-featured Big Data (Hadoop) production topology. It is a good starting point for an architecture that supports the full Lambda architecture (streaming for seconds-old recency, batch for heavy lifting, and services to logically merge the two on demand). The cluster is composed of 21 AWS instances with EBS backing. The HDFS layer…
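To make the "logically merge the two on demand" step concrete, here is a minimal sketch of a serving-layer merge. It assumes both layers expose simple per-key counts; the names `merge_views`, `batch_view`, and `speed_view` and the sample data are hypothetical and not taken from the attached topology.

```python
from collections import Counter


def merge_views(batch_view, speed_view):
    """Combine batch totals with the increments seen since the last batch run.

    Hypothetical helper: assumes both views map a key to an additive count.
    """
    merged = Counter(batch_view)   # start from the precomputed batch totals
    merged.update(speed_view)      # Counter.update adds counts key by key
    return dict(merged)


# Illustrative data only: the batch layer covers history, the speed layer
# covers events that arrived after the last batch run.
batch_view = {"page_a": 1000, "page_b": 250}
speed_view = {"page_a": 7, "page_c": 3}

print(merge_views(batch_view, speed_view))
```

A real deployment would read the two views from the batch and streaming stores rather than from in-memory dictionaries, and the merge rule depends on the metric: sums compose this simply, while distinct counts or averages need more care.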

Categorized as Templates

Example Big Data dev cluster topology

Below is an example topology for a Big Data development cluster that I’ve actually used for some customers. It’s composed of 6 Amazon Web Services (AWS) servers, each with a particular purpose. We have been able to run the full Lambda architecture using this topology along with Teiid (for data abstraction) on terabytes of data.…

Approachable Data Mining Tutorials for the Non Data Miner

A list of several sources to learn data science in a hands-on format:
https://www.coursera.org/course/ml – The most approachable machine learning course available. And it’s free.
https://www.kaggle.com/wiki/Tutorials – Provides data sources, forums, scenarios, and real-world competitions to teach data mining
http://deeplearning.net/tutorial/ – Tutorial on Deep Learning – introduction to machine learning image analysis algorithms
http://tryr.codeschool.com/ – Interactive introduction to…