Tag Archives: policies

Big Data Technology Strategy: Is Hadoop Already Outdated?


Is Hadoop Already Outdated?

Logical architecture of the Hadoop stack
The Hadoop Ecosystem

An article posted to Information Age 18 February, 2013 Teradata CTO Stephen Brobst highlights the schism that has overtaken traditional Decision Support and the new-age Big Data camp, noting at a recent Stanford University very-large-database conference “The Hadoop guys were saying, ‘relational databases are dead, SQL programming is for dinosaurs, long live the new kings Hadoop and MapReduce.'”  (Swabey, 2013 ).  The inclusion of the Hadoop platform by name and the technology’s rapid ascendancy is striking in its proliferation progressing from initial release to core services in multinational platforms in less than six years (Hadoop Releases, 2013), yet it represents the lion’s share of the commercial Big Data marketplace.  Fanatical zeal aside, should it be the sole platform for knowledge management and creation?

Much is made of the dimensions by which we assign special treatment to “Big Data”.   These facets are known popularly as “The Three V’s”, which are defined by Gartner as “high-volume, high-velocity and high-variety information assets”.  Additional V’s are sometimes added to suit the audience as necessary including Veracity (What is big data?, 2013), Variability, and Value (Fan, 2013).  In the December 2013 issue of the ACM SIGKDD, Wei Fan and Albert Bifet explore the current and future state of Big Data.  They allude to signals that the technology adoption has overshot the technical ecosystem’s ability to give it proper perspective providing seven factors they consider to be controversial (Fan, 2012):

  •  There is no need to distinguish Big Data analytics from data analytics, as data will continue growing, and it will never be small again […]
  •  Big Data may be a hype to sell Hadoop based computing systems. Hadoop is not always the best tool […]
  • In real time analytics, data may be changing. In that case, what it is important is not the size of the data, it is its recency […]
  • Claims to accuracy are misleading […]
  • Bigger data are not always better data.  It depends if the data is noisy or not, and if it is representative of what we are looking for […]
  • …[Is it] ethical that people can be analyzed without knowing it […]
  • Limited access to Big Data creates new digital divides […]

Further supporting Fan and Bifet’s arguments, Stephen Brobst notes, “A lot of people are talking about the ‘velocity of big data’ but if that just means that data values are updating quickly, it’s nothing new.  What’s new is the velocity of change in the structure of data.” (Swabey, 2013).

Google (noticeably silent in the Big Data marketplace) abandoned the batch processing approach underlying Hadoop in favor of a real-time, service-based processing architecture originally called Dremel and outlined in a paper from Google research (Melnik, 3010).  Google’s BigQuery cloud service, used extensively at Google internally, takes a differing tack that “builds on ideas from web search and parallel DBMSs”—core competencies for the company.  In a January 2013 consortium organized by IBM and Arizona State University, Dr. K. Selcuk Candan (Candan, 2013) highlights six key outcomes which may be summarized as a need for better data fusion, data analysis algorithms, data models, scalable architectures, and real-time analysis.  While several vendors are visibly out front with custom Hadoop builds for real-time analysis, two non-Hadoop projects, S4 in the Apache Incubator and the production-ready Storm (http://storm-project.net/) show promise a general-purpose parallel computing engines.

While Apache Hadoop project has staged an impressive entrance, broken through the Relational and OLAP paradigms, and shown the viability of open source software, I intend to keep an eye on the companies that have avoided the hype such as Google (Regalado, 2013) and observe as the market polarizes into real-time analysis and those who never needed it.



Candan, K. Selcuk. (2013, June 25). Hunting for the Value Gaps in Data Management, Services, and Analytics.  Retrieved from http://wp.sigmod.org/?p=904 .

Fan, Wei, and Albert Bifet, Mining Big Data: Current Status, and Forecast to the Future, December 2014, Vol. 4, Issue 2.  Downloaded from http://www.kdd.org/sites/default/files/issues/14-2-2012-12/V14-02-01-Fan.pdf

Gilyadov, Camuel. (2013, July 2). OpenDremel: Google BigQuery / Dremel implementation.  Retrieved from http://bigdatacraft.com/opendremel

Hadoop Releases. (2013, June 14). Retrieved from http://hadoop.apache.org/releases.html

Melnik, Sergey, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis (2010). “Dremel: Interactive Analysis of Web-Scale Datasets”. Proc. of the 36th International Conference on Very Large Data Bases (VLDB).

Regalado, Antonio. (2013, June 11).  Just Don’t Call it Big Data: Why Google fears the totalitarian connotations of the buzzword big data.  Retrieved from http://www.technologyreview.com/view/515941/just-dont-call-it-big-data/

Swabey, Pete. (2013, February 18).  Teradata seeks compromise in the big data Holy Wars.  Retrieved from http://www.information-age.com/technology/information-management/123456802/teradata-seeks-compromise-in-the-big-data-holy-wars-

What is big data? (n.d.). Retrieved from http://www-01.ibm.com/software/data/bigdata/


SOFs and Big Data – A Not a Cultural Shift

NOTE: This is a repost by permission of an article by Mr. Richard Marshall. Mr. Marshall provides big data and analytics capabilities to the Special Operations community through his company, Blackstorm International. His website is http://blackstorm-int.com.

SOF Warriors


When you think of Special Operations Forces you think of the hard men that stormed the Osama Bin Ladin compound in the middle of the night, successfully delivering Justice and Honor. You do not think of tall thin kid, barely out of college with a European man-bag, converse shoes drinking a vanilla latte as the next warrior against the enemies of freedom.

Special Operations has always looked to gain the advantage in every action, seeking especially adept groups as seeking out competitive advantage. Too often these groups focus on the bleeding edge of operations and are often scarce resources used for a limited purpose.

In a situation that is not unique to SOF, there is a condition where the supportive functions of the organization do not benefit from the same attention the primary mission holders receive. While this is to be expected, organizations also need to ensure that the supporting elements’ business systems and processes are improved over time to avoid organizational drag.

In essence the lack of proliferation of qualified data scientists in all levels of the organization result in a lack of consistent business practices and a myopic focus in isolated business areas severely limits the value big data and analytics can bring to SOF.  What is needed is a set of practices and processes that are repeatable, can be expanded upon and easily translated across organizational boundaries. The potential for subordinate units being able to leverage Headquarters practices and resources thereby lowering the barriers successful analytics utilization is an ability not yet realized for most commands.

In fact, many consultants in this space will assert that commoditization is not possible within the discipline of BI/BA as every problem is different and that it takes different skills and approaches to solve the identified problems. This is a fallacy and is a stance usually designed to prolong consulting engagements and profitability.

It is a simple fact that much of the technology needed to develop an analytics program are already in existence within the organizations desiring analytics capability. There are benefits to purchasing scalable distributed storage solutions supporting big data applications; however these need to be balanced against the benefits of license optimization within the current infrastructure. Seldom is scalability a driving issue in COCOMS the way it is for other industries such as banking. The data are simply not that large.

Eventually we will begin to learn to utilize the additional deluge of data off our sensor platforms necessitating the need for a scalable infrastructure however the practice of working with the data must come first. Most likely, big data sets that are available in DoD will be more focused on efficiencies and utilization (performance management) rather than finding a bad guy. In fact, much of the data that fits the big data profile will be platform specific data that has little to do with SOF’s 8 primary mission areas.

So what will DoD organizations as the Combatant Command and subordinate organizations need to change to take advantage of this emergent approach to competitive advantage? SOF only needs to do what they have always done—operate outside their comfort zone:

  1. Realize that the contracting groups that are most likely to assist in this field will not come from their old ops buddies. The groups that will bring this success will have little or no knowledge of SOF Missions. They will have a deep knowledge of data, statistical analysis and presentation.
  2. Look to develop a set of business practices and policies that support decision making for the command that can be shared with subordinate units.
  3. Question Solutions. Look critically at the offerings within the community. Many organizations are trying to sell applications and hardware as bundled sets. Analyze the benefits of these platforms and what capability it will bring. Most organizations running a Microsoft infrastructure already have all the tools they need to develop an analytics capability.
  4. Focus on the practice. Build a framework and integrate the capability into every J-Code/staff section. Hire the personnel that can train and guide Command staff asking the questions that will lead to analytics solutions.
  5. Focus on the data. The practice of working with data has academically been reserved for a small group of science majors and professionals. As the data sets expand, staff members can assist the command in being mindful of the importance of all data and ensure that the organizations information is properly constructed and cared for.
  6. Knowledge Management. Knowledge Management offers a unique position for developing a global analytics solution due to the scope of their reach within CCMD’s. Though underutilized now, KM’s will mature into the focal point for future analytics operations, as keepers of the index.

There are plenty of opportunities for SOF warriors to squeeze more out of their data and current systems. The habit of consistently reaching outside existing comfort zones is a hallmark of profession. What SOF needs is a practice and a framework that can be shared and grown and a vehicle to deliver the tools needed by the new generation of leaders and operations specialists.  The nondescript, European man-bag-carrying warrior will be on point in our unconventional war against our enemies with enhanced, analytics-driven information as a key weapon in her arsenal.