I’ve started a meetup for local professionals in the decision science field around the Tampa Bay area to come together and learn about what’s happening in our area. If you are a data science professional, come join us and be a part of making the Tampa-St. Petersburg metro area the southeast center of excellence in big data and analytics. Visit http://www.meetup.com/Analytics-Professionals-of-Tampa/ to find events and to join. I hope to see you there.
Big Data Analytics in a Secure Environment
This 9-part article outlines the elements of using Big Data technologies for the analysis of classified information. Each part addresses one facet of big data analysis of classified information:
Part 1 – applied big data architecture
Part 2 – information flow
Part 3 – organizational alignment
Part 4 – roles and responsibilities
Part 5 – principal phases and earned value benchmarks
Part 6 – data fusion
Part 7 – knowledge creation
Part 8 – visualization
Part 9 – summary and review
Several companies have taken on the challenges of multi-level secure operating systems. Arguably the largest effort is Sun’s Trusted Solaris 2.5.1, which is based on Solaris 2.5.1, Common Desktop Environment 1.1, and Solstice AdminSuite 2.1. Its ITSEC certification, granted by the UK, is not presently accepted by NSA, so it does not serve as a pre-built secure OS capability. General Dynamics C4S has also built a capability based on a Linux OS that does not mandate a SPARC architecture, making it friendlier to open-source platforms. These initiatives create the potential for data fusion, real-time analytics, and predictive analytics across gov, NIPR, SIPR, JWICS, and coalition networks. The architecture is non-trivial in practice, but it is now possible to construct a generalized TOGAF Technical Reference Model based on Linux and the open-source Hadoop, Mahout, openNLP, Cassandra, Hive, and Pig projects.
The TOGAF Architectural Model
A reference architecture is a useful starting point for building an enterprise-specific architecture, and it reduces the risk that a design facet is skipped. The Open Group Architecture Framework (TOGAF) is one of the more widely adopted frameworks and is the concept upon which many domain-specific architectural standards are built. Per TOGAF, “The TOGAF Foundation Architecture is an architecture of generic services and functions that provides a foundation on which more specific architectures and architectural components can be built. This Foundation Architecture is embodied within the Technical Reference Model (TRM), which provides a model and taxonomy of generic platform services. The TRM is universally applicable and, therefore, can be used to build any system architecture.” 1
At its most fundamental, the TOGAF TRM is broken into Application Software, Application Platform, and Communications Infrastructure, connected by the Application Platform Interface and the Communications Infrastructure Interface as depicted in Figure 1. This construct provides a structure for top-down planning of service catalog elements and sets up follow-on plans for ITIL Service Catalog construction. Service elements connect infrastructure to applications and help visualize the dependencies between them.
Mapping the TRM to Open Source Big Data Technologies
Open Source Software (OSS) can be part of a cost-effective long-range strategy for many organizations. The US Government’s CIO in 2003 and again in 2009 declared that open source technologies should be considered closely when selecting technologies and corrected the misconception that government-created versions of these technologies must be openly distributable to the public. Since these declarations, the Apache Foundation technologies have figured prominently in the US Government’s strategic portfolio, especially within the big data and analytics domain. Widely adopted platforms with security accreditation include Hadoop, Mahout, openNLP, Hive, Pig, Cassandra, SOLR, Lucene, the Apache Web Server, and many others. A general mapping of these technologies against the target big data architecture, along with the capabilities of a secure operating system, indicates complete coverage of core, non-specialized capabilities.
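A coverage claim like this can be checked mechanically. The following is a minimal sketch only: the assignment of each technology to a TRM tier is an illustrative assumption, not a mapping taken from TOGAF or from this article, and the function names are hypothetical.

```python
# Minimal sketch: check that every TOGAF TRM tier has at least one component.
# The tier assignments below are illustrative assumptions, not an official mapping.

TRM_TIERS = (
    "Application Software",
    "Application Platform",
    "Communications Infrastructure",
)

# Hypothetical placement of the technologies named in this article.
COMPONENT_TIERS = {
    "Mahout": "Application Software",
    "openNLP": "Application Software",
    "SOLR": "Application Software",
    "Hadoop": "Application Platform",
    "Hive": "Application Platform",
    "Pig": "Application Platform",
    "Cassandra": "Application Platform",
    "Lucene": "Application Platform",
    "Apache Web Server": "Application Platform",
    "Secure OS": "Application Platform",
    "Multi-level secure router": "Communications Infrastructure",
}


def components_by_tier(mapping):
    """Group component names under the TRM tier they are assigned to."""
    grouped = {tier: [] for tier in TRM_TIERS}
    for component, tier in mapping.items():
        grouped[tier].append(component)
    return grouped


def uncovered_tiers(mapping):
    """Return the TRM tiers that have no component assigned."""
    grouped = components_by_tier(mapping)
    return [tier for tier in TRM_TIERS if not grouped[tier]]


if __name__ == "__main__":
    for tier, components in components_by_tier(COMPONENT_TIERS).items():
        print(f"{tier}: {', '.join(components)}")
    print("Uncovered tiers:", uncovered_tiers(COMPONENT_TIERS))
```

An empty list from `uncovered_tiers` means each tier has at least one candidate technology. It says nothing about whether the individual services within a tier are fully covered.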
In this design, a robust open source portfolio for the analysis of classified information provides structured data analysis, unstructured data analysis, knowledge discovery, and complete multi-level classification and caveat isolation. It includes:
- Secure OS – The secure operating system. Examples are a multi-level secure Linux or multi-level secure Solaris.
- Router – The multi-level secure router. This provides TCP packet extensions and routing based on security classification markings.
- Apache – The Apache Web Server, which provides HTML and other rendering services.
- HDFS – The Hadoop Distributed File System is the persistence (storage) layer that allows Hadoop to distribute data and operate on it.
- Hadoop – Massively parallel processing architecture for distributed, scalable computing applications.
- Mahout – A machine learning platform implemented on Hadoop for classification and prediction of discrete and continuous data.
- openNLP – A natural language processing platform on Hadoop for unstructured text analysis (sentence detection, tokenizing, part-of-speech tagging, entity extraction, etc.).
- SOLR – Open-source Apache search platform built on Lucene.
- Lucene – Full-text indexer and search engine. Lucene will accept outputs from Mahout and openNLP models to aid searching results of analysis.
- Hive – Apache Hive is a data warehouse infrastructure that may be used for content storage, retrieval, indexing, and other core DBMS functions.
- HiveQL – Hive’s SQL-like DDL/DML language, which is not ANSI-92 compliant, used to warehouse massively parallel datasets and operate upon them.
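The classification-based routing and caveat isolation in the list above can be pictured as a label dominance check. This is a minimal sketch under stated assumptions: the level ordering, the caveat names (`ALPHA`, `BRAVO`), and the `Label`, `dominates`, and `may_route` names are hypothetical and do not come from any particular secure OS or router product.

```python
from dataclasses import dataclass

# Hypothetical ordering of classification levels, lowest to highest.
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP SECRET": 3}


@dataclass(frozen=True)
class Label:
    """A security marking: one classification level plus a set of caveats."""
    level: str
    caveats: frozenset = frozenset()


def dominates(holder: Label, item: Label) -> bool:
    """True if holder's level is at least item's and holder has all of item's caveats."""
    return (
        LEVELS[holder.level] >= LEVELS[item.level]
        and item.caveats <= holder.caveats
    )


def may_route(packet: Label, network: Label) -> bool:
    """A packet may enter a network only if the network's label dominates the packet's."""
    return dominates(network, packet)


if __name__ == "__main__":
    secret_alpha = Label("SECRET", frozenset({"ALPHA"}))   # hypothetical caveat
    secret_net = Label("SECRET")
    ts_net = Label("TOP SECRET", frozenset({"ALPHA", "BRAVO"}))
    print(may_route(secret_alpha, secret_net))  # network lacks the ALPHA caveat
    print(may_route(secret_alpha, ts_net))      # higher level, caveats are a superset
```

The check treats the level and the caveat set as independent conditions. A network at a higher level still cannot receive a packet whose caveat it lacks, which is the isolation property the list describes.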
In the next part in the series we will look at the logical architecture and explore communication sequences within a few common scenarios.
- 1 The Open Group, “43. Foundation Architecture: Technical Reference Model”, http://pubs.opengroup.org/architecture/togaf9-doc/arch/chap43.html. Retrieved 29 May 2013.
When you think of Special Operations Forces, you think of the hard men who stormed the Osama bin Laden compound in the middle of the night, successfully delivering Justice and Honor. You do not think of a tall, thin kid, barely out of college, with a European man-bag and Converse shoes, drinking a vanilla latte, as the next warrior against the enemies of freedom.
Special Operations has always looked to gain the advantage in every action, seeking out especially adept groups as a source of competitive advantage. Too often these groups focus on the bleeding edge of operations and are scarce resources used for a limited purpose.
In a situation that is not unique to SOF, the supporting functions of the organization do not receive the same attention as the primary mission holders. While this is to be expected, organizations also need to ensure that the supporting elements’ business systems and processes are improved over time to avoid organizational drag.
In essence, the lack of qualified data scientists at all levels of the organization results in inconsistent business practices, and a myopic focus on isolated business areas severely limits the value big data and analytics can bring to SOF. What is needed is a set of practices and processes that are repeatable, can be expanded upon, and are easily translated across organizational boundaries. Subordinate units could then leverage Headquarters practices and resources, lowering the barriers to successful analytics utilization. Most commands have not yet realized this ability.
In fact, many consultants in this space will assert that commoditization is not possible within the discipline of BI/BA as every problem is different and that it takes different skills and approaches to solve the identified problems. This is a fallacy and is a stance usually designed to prolong consulting engagements and profitability.
It is a simple fact that much of the technology needed to develop an analytics program already exists within the organizations desiring analytics capability. There are benefits to purchasing scalable distributed storage solutions supporting big data applications; however, these need to be balanced against the benefits of license optimization within the current infrastructure. Seldom is scalability a driving issue in COCOMs the way it is for other industries such as banking. The data are simply not that large.
Eventually we will learn to use the additional deluge of data off our sensor platforms, which will require a scalable infrastructure; however, the practice of working with the data must come first. Most likely, the big data sets available in DoD will be focused more on efficiencies and utilization (performance management) than on finding a bad guy. In fact, much of the data that fits the big data profile will be platform-specific data that has little to do with SOF’s 8 primary mission areas.
So what will DoD organizations such as the Combatant Commands and their subordinate organizations need to change to take advantage of this emergent approach to competitive advantage? SOF only needs to do what it has always done: operate outside its comfort zone.
- Realize that the contracting groups that are most likely to assist in this field will not come from their old ops buddies. The groups that will bring this success will have little or no knowledge of SOF Missions. They will have a deep knowledge of data, statistical analysis and presentation.
- Look to develop a set of business practices and policies that support decision making for the command that can be shared with subordinate units.
- Question solutions. Look critically at the offerings within the community. Many organizations are trying to sell applications and hardware as bundled sets. Analyze the benefits of these platforms and what capability they will bring. Most organizations running a Microsoft infrastructure already have all the tools they need to develop an analytics capability.
- Focus on the practice. Build a framework and integrate the capability into every J-Code/staff section. Hire the personnel who can train and guide Command staff in asking the questions that will lead to analytics solutions.
- Focus on the data. The practice of working with data has academically been reserved for a small group of science majors and professionals. As the data sets expand, staff members can help the command stay mindful of the importance of all data and ensure that the organization’s information is properly constructed and cared for.
- Knowledge Management. Knowledge Management offers a unique position for developing a global analytics solution due to the scope of its reach within CCMDs. Though underutilized now, KMs will mature into the focal point for future analytics operations, as keepers of the index.
There are plenty of opportunities for SOF warriors to squeeze more out of their data and current systems. The habit of consistently reaching outside existing comfort zones is a hallmark of the profession. What SOF needs is a practice and a framework that can be shared and grown, and a vehicle to deliver the tools needed by the new generation of leaders and operations specialists. The nondescript, European man-bag-carrying warrior will be on point in our unconventional war against our enemies, with enhanced, analytics-driven information as a key weapon in her arsenal.