SOF and Big Data – Not a Cultural Shift

NOTE: This is a repost by permission of an article by Mr. Richard Marshall. Mr. Marshall provides big data and analytics capabilities to the Special Operations community through his company, Blackstorm International. His website is

SOF Warriors


When you think of Special Operations Forces, you think of the hard men who stormed the Osama bin Laden compound in the middle of the night, successfully delivering Justice and Honor. You do not think of a tall, thin kid, barely out of college, with a European man-bag and Converse shoes, drinking a vanilla latte, as the next warrior against the enemies of freedom.

Special Operations has always looked to gain the advantage in every action, seeking out especially adept groups as a source of competitive advantage. Too often these groups focus on the bleeding edge of operations and are scarce resources used for a limited purpose.

In a situation that is not unique to SOF, the support functions of the organization do not benefit from the same attention the primary mission holders receive. While this is to be expected, organizations also need to ensure that the supporting elements’ business systems and processes are improved over time to avoid organizational drag.

In essence, the lack of qualified data scientists at all levels of the organization results in inconsistent business practices, and a myopic focus on isolated business areas severely limits the value big data and analytics can bring to SOF. What is needed is a set of practices and processes that are repeatable, can be expanded upon, and are easily translated across organizational boundaries. The potential for subordinate units to leverage Headquarters practices and resources, thereby lowering the barriers to successful analytics utilization, is not yet realized for most commands.

In fact, many consultants in this space will assert that commoditization is not possible within the discipline of BI/BA as every problem is different and that it takes different skills and approaches to solve the identified problems. This is a fallacy and is a stance usually designed to prolong consulting engagements and profitability.

It is a simple fact that much of the technology needed to develop an analytics program is already in existence within the organizations desiring analytics capability. There are benefits to purchasing scalable distributed storage solutions supporting big data applications; however, these need to be balanced against the benefits of license optimization within the current infrastructure. Seldom is scalability a driving issue in COCOMs the way it is for other industries such as banking. The data are simply not that large.

Eventually we will learn to utilize the additional deluge of data off our sensor platforms, which will necessitate a scalable infrastructure; however, the practice of working with the data must come first. Most likely, the big data sets that are available in DoD will be more focused on efficiencies and utilization (performance management) than on finding a bad guy. In fact, much of the data that fits the big data profile will be platform-specific data that has little to do with SOF’s 8 primary mission areas.

So what will DoD organizations such as the Combatant Commands and their subordinate organizations need to change to take advantage of this emergent approach to competitive advantage? SOF only needs to do what it has always done, which is to operate outside its comfort zone:

  1. Realize that the contracting groups that are most likely to assist in this field will not come from their old ops buddies. The groups that will bring this success will have little or no knowledge of SOF Missions. They will have a deep knowledge of data, statistical analysis and presentation.
  2. Look to develop a set of business practices and policies that support decision making for the command that can be shared with subordinate units.
  3. Question Solutions. Look critically at the offerings within the community. Many organizations are trying to sell applications and hardware as bundled sets. Analyze the benefits of these platforms and what capability they will bring. Most organizations running a Microsoft infrastructure already have all the tools they need to develop an analytics capability.
  4. Focus on the practice. Build a framework and integrate the capability into every J-Code/staff section. Hire the personnel who can train and guide Command staff in asking the questions that will lead to analytics solutions.
  5. Focus on the data. The practice of working with data has academically been reserved for a small group of science majors and professionals. As the data sets expand, staff members can assist the command in being mindful of the importance of all data and ensure that the organization’s information is properly constructed and cared for.
  6. Knowledge Management. Knowledge Management is uniquely positioned for developing a global analytics solution due to the scope of its reach within CCMDs. Though underutilized now, KMs will mature into the focal point for future analytics operations, as keepers of the index.

There are plenty of opportunities for SOF warriors to squeeze more out of their data and current systems. The habit of consistently reaching outside existing comfort zones is a hallmark of the profession. What SOF needs is a practice and a framework that can be shared and grown, and a vehicle to deliver the tools needed by the new generation of leaders and operations specialists.  The nondescript, European man-bag-carrying warrior will be on point in our unconventional war against our enemies, with enhanced, analytics-driven information as a key weapon in her arsenal.

Automated Metadata Extraction for Competitive Intelligence

Artificial Intelligence for the Creation of Competitive Intelligence Tools


Often in prioritizing business development activities it is helpful to determine who is able to influence a decision and how they are related to those in the market space.  To make a defensible and actionable strategy it is useful to perform Influence Analysis and Network Analysis, which can form the kernel of a competitive intelligence analysis strategy.  The data required for analysis must be obtained by identifying and extracting target attribute values in unstructured and often very large (multi-terabyte or petabyte) data stores.  This necessitates a scalable infrastructure, distributed parallel computing capability, and fit-for-use natural language processing algorithms.  Herein I will demonstrate a target logical architecture and methodology for accomplishing the task.  Influence and Network Analysis by machine learning algorithm (naïve Bayes or perceptron, for example) will be covered in a later supporting article.

Recognizing Significance

Named-Entity Recognition is required for unstructured content extraction in this scenario.  This identification scheme may or may not employ stemming but will always require tokenizing, part-of-speech tagging, and the acquisition of a predefined model of attribute patterns to properly recognize and extract required metadata.  A powerful platform with these built-in capabilities is the Apache openNLP project, which includes typed attribute models for the name finder, an extensible name finder algorithm, an API that exposes a Lucene index consumer, and a scalable, distributed architecture.  The Apache Stanbol project in the incubator shows promise at semantic-based extraction and content enhancement but hasn’t been promoted outside the incubator yet.

Apache openNLP attribute recognition models are available in only a few languages with the original and largest being English.  The community publishes models in English for the Name Finder interface for dates, location, money, organization, percentage, person, and time (date).  Each is an appropriate candidate for term extraction for competitive intelligence analysis.

Logical Architecture

Natural Language Processing for Competitive Intelligence
openNLP in four node Hadoop cluster

The controlling requirement for metadata extraction from massive data sources is the ability to process very large datasets.  For this, Hadoop provides a flexible, fault-tolerant framework and processing model that readily supports the natural language processing needs.  The logical architecture for a small (<1TB) 4-node clustered Hadoop solution is as follows:


Process Flow

As below, the process to execute is standardized on the map/reduce patterns Distributed Task Execution, Union, Selection, and Intersection.  Pre-processing using a Graph Processing pattern in a distinctly separate map phase would likely hasten any Influence Analysis to be performed post-process.


Operations Sequence Diagram of openNLP with Map Reduce on Hadoop for Competitive Intelligence
Multi-node Sequence Diagram for openNLP with Map Reduce on Hadoop

The primary namenode initiates work and passes the data and map/reduce execution program to the task trackers, which in turn distribute it among worker nodes.  The worker nodes execute the map on HDFS-stored data and provide health and status to the task tracker, which reports it to the primary namenode.  On node map completion the primary namenode may redistribute map work to the worker node or order the reduce task, each by way of the task tracker.  The reduce task selects data from the HDFS interim result set, aggregates it, and streams it to a result file.  The result file is then used later for analysis by the machine learning algorithm of choice.
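
The map, group, and reduce steps described above can be sketched in a single process. This is only an illustration of the data flow, not Hadoop code: the function names are hypothetical, plain substring matching stands in for the openNLP name finder, and there is no distribution, HDFS, or task tracking.

```python
from collections import defaultdict


def map_phase(doc_id, text, known_names):
    """Emit (name, doc_id) for every known name found in the document."""
    for name in known_names:
        if name in text:
            yield name, doc_id


def shuffle(pairs):
    """Group mapped values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Aggregate one key: the number of distinct documents mentioning it."""
    return key, len(set(values))


# Toy inputs (assumed for illustration only).
documents = {"d1": "Amir Soofi wrote to Hugo Cruz.", "d2": "Hugo Cruz replied."}
names = ["Amir Soofi", "Hugo Cruz"]  # stand-in for a real name finder

mapped = [pair for doc_id, text in documents.items()
          for pair in map_phase(doc_id, text, names)]
results = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(results)
```

In a real cluster the map calls would run on the worker nodes holding each block of data, and the grouping step would be handled by the framework rather than by user code.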

File Structures

The input file is of a machine-readable ASCII text type and is unstructured.  Example:


From: Amir Soofi

Sent: Thursday, December 06, 2012 2:37 AM

To: Aaron Macarthur; Hugo Cruz

Cc: Donald Krapohl

Subject: RE: Language Comparison




FYI, Rick Marshall unofficially approved a 3-day trip starting 14 November for one person from the Enterprise team down to Jacksonville, FL to assist in the catalog reinstall.


I’ll be placing it in the travel portal soon for the official process, so that the option becomes officially available to us.


I think together we’ll be able to push through the environment differences better in person than over the phone.


Let us know whether your site can even accommodate a visitor, and when you’d like to exercise this option.




Amir Soofi


Principal Software Engineer, Enterprise



The output of the openNLP Name Find algorithm map task on this input:

From: <namefind/person>Amir Soofi</namefind/person>

Sent: <namefind/date>Thursday, December 06, 2012 2:37 AM</namefind/date>

To: <namefind/person>Aaron Macarthur</namefind/person>; <namefind/person>Hugo Cruz</namefind/person>

Cc: <namefind/person>Donald Krapohl</namefind/person>

Subject: RE: Language Comparison




FYI, <namefind/person>Rick Marshall</namefind/person> unofficially approved a 3-day trip starting <namefind/date>14 November</namefind/date> for one person from the Enterprise team down to <namefind/location>Jacksonville, FL</namefind/location> to assist in the catalog reinstall.


I’ll be placing it in the travel portal soon for the official process, so that the option becomes officially available to us.


I think together we’ll be able to push through the environment differences better in person than over the phone.


Let us know whether your site can even accommodate a visitor, and when you’d like to exercise this option.




<namefind/person>Amir Soofi</namefind/person>


Principal Software Engineer, Enterprise


The output of an example reduce task on this output:

{DocumentUniqueID, EntityKey, EntityType}

{234cba3231, Amir Soofi, Person}

{234cba3231, Thursday, December 06, 2012 2:37 AM, Date}

{234cba3231, Aaron Macarthur, Person}

{234cba3231, Hugo Cruz, Person}

{234cba3231, Donald Krapohl, Person}

{234cba3231, Rick Marshall, Person}

{234cba3231, 14 November, Date}

{234cba3231, Jacksonville/, FL, Location}

{234cba3231, Amir Soofi, Person}
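
As a rough illustration of how rows like those above could be derived from the tagged map output, the following Python sketch parses the tags with a regular expression. The tag format is copied from the sample output in this article; the function name `extract_entities` is hypothetical, this is not the openNLP API, and the comma escaping shown in the location row is not handled here.

```python
import re

# Tag format copied from the sample map output; the optional whitespace
# before ">" tolerates closing tags such as "</namefind/date >".
TAG_PATTERN = re.compile(r"<namefind/(\w+)>(.*?)</namefind/\1\s*>", re.DOTALL)


def extract_entities(doc_id, tagged_text):
    """Return (doc_id, entity, entity_type) tuples in order of appearance."""
    rows = []
    for match in TAG_PATTERN.finditer(tagged_text):
        entity_type = match.group(1).capitalize()
        entity = " ".join(match.group(2).split())  # collapse stray whitespace
        rows.append((doc_id, entity, entity_type))
    return rows


# A shortened excerpt of the tagged sample, for illustration.
sample = (
    "From: <namefind/person>Amir Soofi</namefind/person>\n"
    "Sent: <namefind/date>Thursday, December 06, 2012 2:37 AM</namefind/date >\n"
    "... down to <namefind/location>Jacksonville, FL</namefind/location> ..."
)

for row in extract_entities("234cba3231", sample):
    print(row)
```

Repeated mentions are kept as separate rows, matching the sample above where the sender appears twice; de-duplication, if wanted, is left to a later pass.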


A second reduce pass might yield combinations for network analysis (link strength below being calculated on instances of co-existence across unique documents):

{EntityKey, LinkedEntity, LinkStrength}

{Amir Soofi, Donald Krapohl, 6}

{Amir Soofi, Aaron Macarthur, 15}

{Amir Soofi, Jacksonville/, FL, 1}
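
One way to compute a link strength of this kind is to count, for each pair of entities, the number of unique documents in which both appear. The sketch below assumes that definition; the function name `link_strengths` and the toy rows are hypothetical, and pairs are unordered here, so each pair is reported once.

```python
from collections import Counter
from itertools import combinations


def link_strengths(rows):
    """Count, for each unordered entity pair, the documents containing both."""
    entities_by_doc = {}
    for doc_id, entity, _entity_type in rows:
        entities_by_doc.setdefault(doc_id, set()).add(entity)

    strengths = Counter()
    for entities in entities_by_doc.values():
        # sorted() gives each pair one canonical order, so (A, B) and (B, A) merge
        for pair in combinations(sorted(entities), 2):
            strengths[pair] += 1
    return strengths


# Toy first-pass rows in the {DocumentUniqueID, EntityKey, EntityType} shape.
rows = [
    ("doc1", "Amir Soofi", "Person"),
    ("doc1", "Donald Krapohl", "Person"),
    ("doc1", "Amir Soofi", "Person"),   # repeated mention in the same document
    ("doc2", "Amir Soofi", "Person"),
    ("doc2", "Donald Krapohl", "Person"),
    ("doc2", "Jacksonville, FL", "Location"),
]

for (entity, linked), strength in sorted(link_strengths(rows).items()):
    print(f"{{{entity}, {linked}, {strength}}}")
```

Using a set per document means a repeated mention within one document does not inflate the strength, which matches counting co-existence across unique documents.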


The data may then be consumed into the analysis tool of choice, such as RapidMiner, WEKA, PowerPivot, or SQL Server/SQL Server Analysis Services for further analysis.


openNLP on Hadoop can provide good metadata extraction for key information in unstructured data.  The information may be retrieved from competitor websites, SEC filings, Twitter activity, employee social network activity, or many other sources.  The data pre-processing and preparation effort in metadata extraction for competitive intelligence applications can be low relative to that of other analytical problems (contract semantic analysis, social analysis trending, etc.).  The steps outlined in this paper give a very high-level overview of a logical architecture and the key execution activities required to gather metadata for Influence Analysis and Network Analysis for competitive advantage.


Reorganization of a Healthcare Products Technology Organization – Proposal


Organizational Transition – Healthcare Products Division

[Redacted] Corporation

Executive Sponsor: Don Krapohl


Executive Summary

[Redacted] Corporation is presently functionally organized and biased strongly toward a single customer.  The project efforts that led to this will be drawing down in FY14.  By discounting several outmoded assumptions and capitalizing on changes in environment, the organization will be able to overcome space issues, reduce the impact of the partnership dissolution, add business development and test lab capabilities, provide the ability to more rapidly adapt to local changes, simplify communications, and provide line-of-sight accountability.  Changes within our market segment necessitate a more agile support structure that can cater to the needs of multiple customers.  Our post-realignment organization chart, transition timeline, and high-level transition plan are included herein.


Only 35% of personnel provide direct support that cannot be performed offsite

Engineering functions do not need direct customer facility access

Corporate intelligence analysts do not need to interface with customers

[Redacted site] and [Redacted site] locations are entirely unsuitable as warehouse space

Physical space is a persistent concern with partner co-location

Direct engineering support to partner companies is only operations of hosted infrastructure, the [Redacted] product line, and training.

There exists risk of O&M support reduction on award of new contract by customer [Redacted] in FY14

Future changes

The Health Mining Partnership Initiative is to be dismantled in FY13

Personnel requirements for space and equipment need to be defined prior to drawdown

Functional leads to be named by discipline

Contract re-compete and partnering agreements renegotiated in FY13


Major customers

Our customers are closely aligned with the proposed organization:  Intelligence producers and consumers (Knowledge division), [Redacted Medical Supply Customer] (IT Support division), [Redacted Healthcare Customer] (Support division), and the other [Redacted] organizations (Portfolio Support and Enterprise Software Engineering divisions).

Intelligence producers and consumers

Mission: Provide IT capabilities and tools for business intelligence enhancement, analytics, and dissemination.  Facilitate information transfer between producers and consumers of knowledge products.

Customer location: Widely distributed, none local.

[Redacted Healthcare Customer]

Mission: Provide direct support to [Redacted Healthcare Customer] for implementation and use of analytic capabilities.

Customer location: Partner-owned/leased office space in [Redacted Healthcare Customer] and annexes.  Some servers and network elements located off-site.

[Redacted Medical Supply Customer]

Mission: Provide limited direct analytic subject matter and product support. Assist in facilitating vendor relationships and engineering/scientific support. Supply hosted infrastructure as a service and datacenter management for warehouse facilities.


Customer location: Owned/leased office space in [Redacted] and annexes.  Some servers and network elements located on-site.

Program partners/Enterprise Support

Mission: Provide capabilities identical to those offered to other customers, including enhanced capabilities, expanded requirements, and capacity planning to continue support.

Customer location: None local.  Large majority not located within a primary facility.



Restructure the organization in a manner that balances customer-facing footprint, intelligence functions, enterprise engineering, and Enterprise Program support.

  1. Organizational
  • Balance O&M support for partners, form lines of management by industry segment for IT direct support and customer-specific engineering efforts.
  • Add a division for Program support that includes all non-operational support and infrastructure personnel.
  • Retain divisions for Enterprise Software Engineering and Knowledge Management.
  • Remove intelligence activities from single executive and align in their own division.
  2. Geographic
  • Emplace only IT direct support personnel at [Redacted Healthcare Customer] and [Redacted Medical Supply Customer].
  • Remove all other personnel to a central location.
  • Move all Enterprise-, Intel-, and Program-related functions away from partner sites.
  3. Communications
  • Email, VOIP, SharePoint, and other core services are presently hosted.
  • Portfolio performance, metrics, announcements, calendars, WARs, and other administrative coordination on SharePoint.
  • IT operations incident/request on BICES service desk tool.
  4. Financial
  • Identify cost-sharing potential with another organization with need of DISA collateral-rated office space with adjacent/attached warehouse/loading facilities.
  • Buy-out or subcontract Brandon facility for remainder of lease.


Final-state Organization Chart

information technology company reorganization post-reorg view


SLA-compliant connectivity can be attained in commercial space

Office equipment budget can be reallocated for the new facility

Larger organizational surface area adds noise to communications and decision cycle

Funding will come from management reserve at Atlanta CDC redesign project closure.




  • [Competitor X] is execution-aligned in this way
  • Ability to focus communication more tightly to/from/within customer efforts
  • Provides only one accountable route into and out of each group
  • More fundamental organization into operations, engineering, support, and knowledge
  • Resolves warehouse issues
  • Allows capability surge that limits space requests within partner locations
  • Breaks the organization into elements that simplify execution of SOPs and workflows
  • Optimizes on Service Level Management by coalescing incident/request processes


  • Must vacate offices in [Redacted Location] and [Redacted Location] and terminate leases
  • Likely to meet significant resistance from Healthcare Products and Energy Divisions as they compete for funding
  • Capital expenses on centralized facility will increase
  • Must SLA-accredit the facility quickly


Direct benefits

  • Easier accommodation of surge space needs
  • Fewer targets for customer off-book requests
  • Improved communications management throughout the division
  • Better alignment with modes of work (operations, support, engineering, knowledge).
  • Improved security for proprietary information

Indirect benefits

  • Adds business development (BD) capability and increases demonstration space
  • Adds to R&D capability
  • Provides potential for test lab environment
  • Collaboration between functional areas is enhanced


  • Loss of mass at [Redacted Healthcare Customer] could have unknown impacts on future orders (mitigated by scheduled OPT dissolution and probable forced loss of personnel).


Transition Plan

1 August – Plan announced

1-7 August – Discover office space options

7-10 August – Document infrastructure needs

13-17 August – Submit requests for equipment, sign lease for 1 Oct, begin facility accreditation request

20-31 August – Document and lay out workspace; order workstations, locks, safes, printers, and phones

3-14 September – Manage facility accreditation request, sign up for utilities and security monitoring

17-28 September – Stage equipment, schedule accreditation site inspection

1-12 October – Move office equipment, set up work areas, set up kitchen and coffee, move out of Brandon facility

15-26 October – Management reserve in case of schedule slip

29 October-2 November – Staff move-in of first group

5 November 2012-2 January 2013 – Facility burn-in

3-31 January 2013 – Emplace and configure remaining desks and workstations for second tenant group

1 February 2013 – Second group moves in

1 February 2013 – Transition complete



270-day staffing plan


As-is staffing

To-be staffing

Staff matrix 1 Nov

Staff matrix (Final) 1 Feb

[Redacted Healthcare   Customer] 21 on-site 5 on-site 9 on-site 5 on-site
7 statisticians/data miners–
1 business analyst 2 network engineers 2 network engineers
4 desktop support– 2 desktop support 2 desktop support 2 desktop support
2 trainers
1 web developer — 2 web developers
1 tech writer —
1 configuration manager —
1 manager 1 manager 1 manager 1 manager
1 requirements mgr — 1 requirements mgr/Deputy PgM
1 user experience engineer — 1 IA engineer
Transition complete Feb 2013
[Redacted Medical Supply Customer] 7 on-site 10 on-site 10 on-site 10 on-site
4 systems engineers 4 systems engineers 4 systems engineers 3 software engineers
1 network engineer 2 network engineers 2 network engineers 2 desktop support
1 storage engineer 1 storage engineer 1 storage engineer 1 line-of-business architect
1 manager 1 manager 1 manager 1 business analyst
2 desktop support 2 desktop support 1 report developer
Transition complete Nov 2012 1 business intelligence developer
1 manager
Corporate HQ 13 on-site 28 on-site 25 on-site 28 on-site
1 Director 1 Director 1 Director 1 Director
2 Project Managers 2 Project Managers 2 Project Managers 2 Project Managers
1 business analyst 1 business analyst 1 business analyst 7 statistician/data miners
1 program admin 1 program admin 1 program admin 3 Software Engineers
1 logistician 1 logistician 1 logistician 2 business intelligence developers
7 software engineers 7 software engineers 7 software engineers 1 enterprise architect
6 intel analysts 6 Intel 1 user experience engineer
2 trainers 2 trainers 1 web developer
2 web developers 1 executive assistant
1 technical writer 1 technical writer 1 logistician
1 IA engineer 1 business analyst
1 configuration manager 1 configuration manager 1 technical writer
1 requirements manager/Deputy PgM 1 configuration manager
1 enterprise systems engineer 1 enterprise systems engineer 1 enterprise network engineer
1 enterprise systems engineer
1 IA engineer
1 requirements/release manager
Transition complete Feb 2013