Big Data Virtualization

JBoss Enterprise has a free data virtualization (not server virtualization) platform called Teiid. Its capabilities include serving data from multiple technologies (JDBC, ODBC, Thrift, REST, SOAP, etc.), merging and transforming data, fault tolerance, scalability, and the other features one would require of an enterprise service. This can stand in the technology portfolio as part of…

Example Big Data dev cluster topology

Below is an example topology for a Big Data development cluster that I have actually used for some customers. It is composed of 6 Amazon Web Services (AWS) servers, each with a particular purpose. Using this topology along with Teiid (for data abstraction), we have been able to run a full lambda architecture on terabytes of data.…

The Structure of an OpenNLP NameFinder Model

Named Entity Models

Research labs and product teams intent on building upon OpenNLP and Solr (which can consume an OpenNLP NameFinder model) frequently find it important to write their own model parser or model builder classes. OpenNLP has built-in capabilities for this, but in the case of custom parsers the structure of the OpenNLP NameFinder model…