Data Processing & Analytics

A Cloud-Based Analytics Platform for Regulatory Compliance and Business Insights


Previously I wrote an article about what I called the “Intelligent Data Plane” where I described how analytic functions and data processing could exist together to allow businesses to make use of their data securely and ubiquitously across departments of an organization. The article described how cloud data platforms have emerged to help enterprises gain insights into the massive amounts of data available to them. I described two new companies, one adding value in the document analysis workflow area (making manual processes more efficient) and one in the manufacturing area (making manufacturing data available to management through analytics and reports) to show the promise of analytic processes in various markets. 

In this article I focus on the special requirements within the financial services industry for trade and communications oversight and regulatory reporting. I also highlight how a new company (SteelEye Limited) has built a unique cloud-based platform to serve the needs of this industry and also provide business insight into trade and communications data generally.

Background and Motivation

My experience running the engineering organization that handled communications surveillance and other analytics-based products within Bloomberg LP for five years taught me how critical the analysis of communications and trade data is for financial-services firms. Financial firms (some of the world’s largest banks, hedge funds and private equity entities) rely on round-the-clock access to the data they generate in their trading and general business operations, on efficient search of these data, and on analysis of their data in relation to financial regulations of all types.

My Intelligent Data Plane article illustrated how cloud-based platforms such as Snowflake Inc. allow companies to harness all of their data into a single view where analytics can be performed. These platforms are great resources for corporations as they allow one “place” where many relevant data sets can be viewed by different departments of the company and then analyzed with Business Intelligence Tools, Data Analytics packages, etc. 

Such platforms are fine for general business analysis of data but lack the functionality required for the analysis of financial transactions and the related data in company communications that accompany them. When it comes to the regulatory requirements of financial services firms, specialized platforms are needed.

Financial Service Firms – Special Requirements

The financial services sector is a demanding one. The data that is exchanged to represent transactions and communicate between firms has to be retained (record-keeping) and the data has to be processed and stored in a certain way. Preparing and indexing data for financial services regulatory applications requires special knowledge of the regulations and how regulators, in-house counsel interfacing with regulators and compliance officers need to search and access data. These requirements are more rigorous than those used for general enterprise data (for plain search purposes).

Search criteria around transactional data are important because legal and regulatory teams at financial institutions commonly need to verify what is inside their trades, electronic communications and other data, and how these data relate to one another. Data in various formats must be “normalized” so that key data attributes of trades, email, instant messages, and even chat room data can be identified with search queries. The time-based nature and custody of communications must be preserved when data is processed for indexing.
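As a minimal sketch of what such normalization might look like, the Python below maps a trade record and an email record into one common shape so that a single query can match both while preserving time order. The `NormalizedRecord` type and all field names (`executed_at`, `trader`, `sent_at`, `sender` and so on) are hypothetical, chosen for illustration only; they are not taken from any vendor's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedRecord:
    """Common shape shared by trades and communications (hypothetical schema)."""
    source: str          # "trade" or "email"
    timestamp: datetime  # stored in UTC so ordering across sources is preserved
    participants: tuple  # user identifiers, lower-cased for matching
    text: str            # searchable content


def _to_utc(iso_string: str) -> datetime:
    # Expects an ISO 8601 string with an explicit offset, e.g. "+00:00".
    return datetime.fromisoformat(iso_string).astimezone(timezone.utc)


def normalize_trade(raw: dict) -> NormalizedRecord:
    # Assumed raw fields: executed_at, trader, side, quantity, symbol
    return NormalizedRecord(
        source="trade",
        timestamp=_to_utc(raw["executed_at"]),
        participants=(raw["trader"].lower(),),
        text=f'{raw["side"]} {raw["quantity"]} {raw["symbol"]}',
    )


def normalize_email(raw: dict) -> NormalizedRecord:
    # Assumed raw fields: sent_at, sender, recipients, subject, body
    people = [raw["sender"], *raw["recipients"]]
    return NormalizedRecord(
        source="email",
        timestamp=_to_utc(raw["sent_at"]),
        participants=tuple(p.lower() for p in people),
        text=f'{raw["subject"]}\n{raw["body"]}',
    )


def search(records, participant: str, start: datetime, end: datetime):
    """Return records involving `participant` inside [start, end], oldest first."""
    hits = [
        r for r in records
        if participant.lower() in r.participants and start <= r.timestamp <= end
    ]
    return sorted(hits, key=lambda r: r.timestamp)
```

Because both record types share one timestamp and one participant field, a reviewer can ask "what did this person say and trade in this window?" with a single call to `search`.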

These large data sets representing transactions (and communications related to transactions) are often housed within disparate systems that each have a separate search and retrieval interface. It is generally a nightmare to try and coalesce a single view of all relevant data pertinent to a regulatory matter across these enterprise systems. 

For this monumental task, a special kind of platform is required: one that is tuned for regulatory reporting but also flexible enough to support general views into massive data sets. Such a platform was designed, built and brought to market by a group of seasoned financial services executives. They formed a company, SteelEye, and built a groundbreaking platform. They delivered this platform in a very compressed timeframe by having the correct expertise within the company and starting with a fresh perspective. The SteelEye cloud-based service, leveraging the near infinite processing and storage capability of the cloud, transcends the power and capability of other solutions by orders of magnitude.

Prior to the SteelEye platform, compliance teams would be required to log in to disparate systems, pull results and correlate them by manual means. This is of course a costly alternative to receiving a single view of results from the highly scalable SteelEye platform. SteelEye provides one view into a firm’s data, removing the manual processes required by multiple enterprise databases or search indices associated with single-use “silo-based repositories”.

SteelEye – A Unifying Cloud-Based Platform for Analytics and Regulatory Compliance

SteelEye is a compliance technology and data analytics firm that simplifies trade and communications oversight and regulatory reporting through a unique, fully integrated, and cloud-based platform. By connecting large volumes of data from multiple sources, SteelEye enables firms to meet regulatory obligations quickly, efficiently, and accurately. With SteelEye, firms gain full visibility and control of their trading and compliance operations, with cutting-edge analytics that provide timely insights on risks and opportunities. 

Founded in 2017, the company is led by Matt Smith (former Chief Information Officer at Noble Group and Senior Product Manager at Bloomberg) and serves clients in the buy and sell-side space across the UK, Europe, and the US. The company has several key case studies and client success stories documented here. A recent and significant funding event has been completed to accelerate growth into the North American Market in 2021.

This quote from the CEO (Matt Smith) encapsulates the vision and mission of the company:

“What makes the SteelEye platform an exciting and cutting-edge solution for financial firms is its ability to simplify key compliance processes by connecting and making sense of large quantities of data that naturally doesn’t fit together. This enables firms to meet multiple regulatory needs using just one platform and truly use their data to uncover new insights.” 

Significance of Team

The team at SteelEye have extensive experience in the regulatory and compliance market. As a consequence of this, they understand all regulatory requirements for building world-class compliance products. Beyond that, they fully understand the sheer scale of processing and analytics required within an industry that has to retain and manage hundreds of millions to billions of transactions per day. This uniquely positions the team to understand how to build not only compliance applications but also enterprise-scale applications for the general business analysis market. The blend of team members with both compliance/regulatory and technology experience is unique in my experience and sets SteelEye apart from its competitors. The team has built a suite of applications on top of a massively scalable data platform that can provide insight on both risks and opportunities.

Significance of Product

The team at SteelEye recognized that their market requires a true “Intelligent Data Platform”. The team have extensive experience in the various regulations to which their clients must adhere and have decades of combined experience that shines through in the platform’s applications and interfaces, which help with both financial compliance and business operations management. The applications built on top of the SteelEye data platform provide the support that compliance officers require and that is not available in general data analysis platforms. SteelEye applications leverage the same underlying data set representations for a variety of purposes.

SteelEye have produced reporting applications and an API that allow their clients to get the answers they need for regulators in record time.

Matt and I worked together at Bloomberg LP, so I was not surprised to find the platform brilliantly designed when he showed it to me. The user interface leads the seasoned professional or the novice compliance analyst through the process of finding relevant data for reporting or analysis purposes. The value of having all of a firm’s trade and communications data in one cloud platform where it can be analyzed against the requirements of a number of different regulations was evident in a quick demonstration.

Platform Capabilities

  • Data Platform
    • Record keeping
    • Trade Reconstruction
    • eDiscovery
  • Trade Oversight
    • Trade Surveillance
    • Holistic Surveillance
    • Best Execution and TCA
  • Communications Oversight
    • Communications Surveillance
  • Regulatory Reporting
    • MiFID II
    • EMIR
  • “Insights” – next generation reporting tools
    • API
    • SDK

Regulatory Support

The SteelEye platform can support a wide range of regulations and rules, including those imposed by Dodd-Frank, EMIR, MAR, MiFID II, SEC, FINRA and more.   

Key Attributes of Platform

  • Multiple compliance obligations satisfied through applications on one platform
    • Various applications for different regulations are supported on one view of a client’s data
      • Multiple data sources collected in the platform are viewed as a whole data set
      • All applications work on the data set in its entirety
    • Each application is tuned by subject matter regulatory experts 
  • Solutions are built to evolve
    • As regulations change, the platform is powerful and can adapt to these shifts
    • Applications are not constrained by capabilities of the platform
  • Automation enabled through Machine Learning (ML) and Advanced Analytics
    • to simplify processes, improve investigations and make better predictions

Flexible Platform Adds Value to and Beyond Compliance

Despite having vertical applications that support specific requirements of specific regulations, the platform also supports “ad hoc” analysis of data. Therefore, as regulators or business analysts need answers to questions that are not in a pure regulatory context, the platform can supply the answers.

The engineering team at SteelEye selected the Elasticsearch technology stack (sometimes referred to as an “ELK” stack) as a base for their platform’s analytic capabilities. This was an excellent choice as Elasticsearch has petabyte scale, is very extensible and presents APIs to allow for easy integration.

Having led the team at Bloomberg LP that rebuilt its entire data processing pipeline for compliance/surveillance, I respect a team that selects the right technology for not just solving the problems of the present, but also looking to the future and anticipating the need to extend the platform as the needs of customers evolve.

The Elasticsearch stack provides:

  • Petabyte scale
  • Data sharing – REST APIs are built into Elasticsearch
  • Flexibility – adding documents to the stack naturally enhances the data model, as documents (data representations) are stored as vector representations that form the basis for ML models
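To give a feel for the REST-based data sharing mentioned above, here is a minimal sketch of a search request body written in Elasticsearch's Query DSL (`bool`, `match`, `term` and `range` clauses). The index name `communications` and the field names `body`, `sender` and `sent_at` are hypothetical and say nothing about how SteelEye models its data; the sketch only builds and prints the JSON, and leaves out sending it to a cluster.

```python
import json


def build_search_body(phrase: str, sender: str, start_iso: str, end_iso: str) -> dict:
    """Build a Query DSL body: full-text match plus exact-value and date filters."""
    return {
        "query": {
            "bool": {
                # Scored full-text match on the message body
                "must": [{"match": {"body": phrase}}],
                # Non-scoring filters; `term` assumes `sender` is a keyword field
                "filter": [
                    {"term": {"sender": sender}},
                    {"range": {"sent_at": {"gte": start_iso, "lte": end_iso}}},
                ],
            }
        },
        # Oldest first, so the sequence of events is easy to follow
        "sort": [{"sent_at": {"order": "asc"}}],
    }


body = build_search_body(
    "allocation", "jdoe@example.com", "2021-03-01T00:00:00Z", "2021-03-31T23:59:59Z"
)
# This JSON would be sent in a request to the (hypothetical) index's search
# endpoint, e.g. POST /communications/_search
print(json.dumps(body, indent=2))
```

Any client that can issue an HTTP request with a JSON body can run such a query, which is a large part of why integration with other tools is straightforward.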

As mentioned above, the SteelEye platform has tuned applications for specific regulatory adherence but is also extensible enough to handle more “ad hoc” uses that transcend compliance alone. Also (as pointed out above) the platform can evolve to meet new regulatory requirements. SteelEye engineers leverage ML and advanced analytics within the platform for a variety of purposes.

The technology team anticipated the future needs of the platform by building both ML and classification capabilities into the product. This article explains how the Elasticsearch stack can be used for classification of data. Data classification is also used (particularly within Trade and Communications Surveillance) to reduce the occurrence of false positive results and make surveillance alerts more relevant. Examples of how SteelEye has implemented practical items that make surveillance and search results more relevant include:

  • Email classification to eliminate non-relevant emails (such as those containing commercial advertising and solicitation emails not related to business operations). Customers can exclude such email messages from being monitored in market abuse watches, and the system “learns” what these “look like” over time – getting better at identifying and excluding them.
  • Customers can select the areas of certain communications they wish to monitor, for example the email subject and body, which contain the most relevant content, and exclude sections like the email signature or “the rest of the thread”. 
  • The surveillance system of the product includes both ML-based classification and rules-based (lexicon-based) criteria (the latter being customary and required in many financial services contexts). The lexicon constructed by SteelEye is reported to be many times larger than competitive rule sets. This multi-layered approach (of employing ML plus required rule sets) is very intelligent and again is designed to reduce the number of false positive results reported by the system.
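The layering described in the list above can be sketched roughly as follows. This is an illustration only, not SteelEye's implementation: the lexicon terms are invented, the promotional-email check is a keyword stand-in for a trained classifier, and the signature and thread detection relies on common email conventions (a `--` delimiter line, `>` quoting, an "On ... wrote:" header).

```python
import re

# Hypothetical lexicon; real surveillance lexicons are far larger.
LEXICON = {"guarantee", "off the record", "delete this"}
# Keyword stand-in for a trained promotional-email classifier.
PROMO_MARKERS = {"unsubscribe", "limited time offer"}


def strip_signature_and_thread(body: str) -> str:
    """Keep only the text above the signature or the quoted earlier thread."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if (stripped == "--" or stripped.startswith(">")
                or re.match(r"On .+ wrote:$", stripped)):
            break
        kept.append(line)
    return "\n".join(kept)


def monitored_text(email: dict, zones=("subject", "body")) -> str:
    """Join only the zones the customer chose to monitor, lower-cased."""
    parts = []
    for zone in zones:
        text = email.get(zone, "")
        if zone == "body":
            text = strip_signature_and_thread(text)
        parts.append(text)
    return "\n".join(parts).lower()


def is_promotional(email: dict) -> bool:
    text = f'{email.get("subject", "")}\n{email.get("body", "")}'.lower()
    return any(marker in text for marker in PROMO_MARKERS)


def alerts(email: dict) -> list:
    """Layer 1 drops promotional mail; layer 2 applies the lexicon rules."""
    if is_promotional(email):
        return []
    text = monitored_text(email)
    return sorted(term for term in LEXICON if term in text)
```

In this sketch a lexicon term that appears only in a signature or in marketing mail never raises an alert, which is the false-positive reduction the list describes.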

SteelEye has built an outstanding and award-winning data analysis platform that supports compliance and regulatory reporting superbly. The product is vastly scalable and extensible to support evolving regulations. The seasoned team at SteelEye envisioned the product from a true cloud-based perspective. In so doing, the platform is also extremely useful for general business analytic purposes that can aid commercial efforts of all kinds within an enterprise.

Data Processing & Analytics Economy and Business

Datanomix – “Production Intelligence”

In one of my recent posts, I defined what I called “the Intelligent Data Plane” and the SW components that exist within an intelligent data processing “stack” to make data produced within an enterprise “useful”. A company that has built a platform for collecting data from manufacturing facilities and presenting it to users in an intelligent way is Datanomix, Inc., a company located in Nashua, NH.

This platform is interesting because it fits into my thesis as a “Stage Two” platform where data is not only collected and presented to users intelligently, but where data is provided in a fashion that can aid other applications of interest to its users. In addition to performing its own analytic functions, this platform acquires, secures and generates data that can enable other analytic processes that may be of interest to manufacturers.

Please recall the diagram (below) for an overview of the intelligent data plane architecture.

Intelligent Data Plane (Manufacturing Data Analysis)

In this reference architecture for intelligent data analysis, Datanomix would be the Intelligent Data Plane component interfacing between the manufacturing factory floor and cloud services where vital production data is sent for analysis.

Datanomix Platform

Datanomix’ platform is an intelligent analytics platform that can access and collect production data from manufacturing machines (Computer Numerically Controlled “CNC” machines, “smart” lathes, 3-D printers, etc.) over standard APIs. Datanomix deploys easily configured local SW components that interact with the factory floor machines to collect data from production equipment. The Datanomix SW intelligently acquires and filters the data into what is relevant to a particular process within the manufacturing workflow. Subsequently the data is transformed and transported to cloud SW components where it is saved, indexed for retrieval and formatted into reports that are meaningful.

The Datanomix SW transforms the data it retrieves into standard JSON objects that can be displayed in reports or indexed in the cloud. The objects are transported over an encrypted channel to the cloud data service (Azure for example) where they are indexed and analyzed. All data at rest is encrypted in the cloud for security purposes.
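A transformation step of this kind might look roughly like the sketch below. The raw `key=value;key=value` reading format, the machine identifier and the output field names are all assumptions made for illustration; they do not describe the actual Datanomix wire format.

```python
import json
from datetime import datetime, timezone


def parse_reading(raw_line: str) -> dict:
    """Parse a hypothetical 'key=value;key=value' machine reading."""
    metrics = {}
    for pair in raw_line.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Keep numbers as numbers so they can be aggregated after indexing
        try:
            metrics[key] = int(value)
        except ValueError:
            try:
                metrics[key] = float(value)
            except ValueError:
                metrics[key] = value
    return metrics


def to_json_object(machine_id: str, raw_line: str, collected_at: datetime) -> str:
    """Wrap one reading in a self-describing JSON document."""
    document = {
        "machine_id": machine_id,
        "collected_at": collected_at.astimezone(timezone.utc).isoformat(),
        "metrics": parse_reading(raw_line),
    }
    return json.dumps(document, sort_keys=True)


print(to_json_object(
    "cnc-07",
    "spindle_rpm=8000;part_count=42;status=ACTIVE",
    datetime(2021, 1, 5, 12, 0, tzinfo=timezone.utc),
))
```

Once every reading carries its machine identifier and a UTC timestamp, downstream services can index and report on it without knowing anything about the machine that produced it.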

Production Monitoring Dashboard

The Datanomix SW also interfaces to standard ERP systems and can incorporate the company “GEMBA” board (a Japanese term meaning “real place”) representing the status of the manufacturing process. It has standard connectors to display the GEMBA board on a smart TV in a factory setting.


The Datanomix SW also has connectors for monitoring machine health and operating parameters. The SW can apply anomaly detection algorithms to establish automatic thresholds for normal operation and send alerts when a machine is operating outside of normal parameters.
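One simple way to establish such automatic thresholds is to learn a normal band from historical readings and flag anything outside it. The sketch below uses a mean plus or minus k standard deviations rule, which is a stand-in chosen for illustration; the source does not say which anomaly detection algorithms the product uses, and the temperature values are made up.

```python
from statistics import mean, stdev


def learn_thresholds(history, k: float = 3.0):
    """Derive a (low, high) band for normal operation from past readings."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 readings
    return mu - k * sigma, mu + k * sigma


def out_of_range(readings, low: float, high: float):
    """Return (index, value) pairs for readings outside the learned band."""
    return [(i, v) for i, v in enumerate(readings) if not low <= v <= high]


# Hypothetical spindle temperatures recorded during normal operation
history = [70.0, 71.0, 69.0, 70.5, 69.5, 70.0]
low, high = learn_thresholds(history)

for index, value in out_of_range([70.2, 75.0, 69.8], low, high):
    print(f"ALERT: reading {index} = {value} outside [{low:.2f}, {high:.2f}]")
```

Because the band is derived from each machine's own history, no one has to hand-tune limits per machine, which is the point of "automatic" thresholds.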

By sending the local data to a hosted cloud service where it can be stored, indexed and analyzed, Datanomix can add features to the platform easily. Through their hosted cloud service they provide a holistic view of an entire manufacturing process. Manufacturing managers get the data they need rapidly and conveniently to help them understand what parts of their operations may need refinement.

As data is stored securely and in a standard format understood by other applications, it can be shared easily. Datanomix’ customers may need to share process control data with their customers for quality assurance purposes. Due to these capabilities, Datanomix can provide true “Stage Two” capabilities by enabling data analysis, security and sharing without requiring a large commitment from their customers’ IT staff.

The platform is not limited to use by industrial manufacturing entities. The platform has relevant use cases with pharmaceutical manufacturers who collect large amounts of process control data and need to store and retrieve it conveniently. Use cases in the semiconductor manufacturing space also exist.

Datanomix “checks the boxes” for intelligent data acquisition, filtering, transformation and secure transport of data to cloud locations (as shown in the Intelligent Data Plane architecture). Data at rest is protected within the cloud service with strong encryption as well. They have excellent User Interface capabilities to highlight the reports they can generate for customers. Analytic capabilities exist for monitoring machine performance. The platform is easily extensible in the future as more intelligent analysis features are requested by customers. In these respects, the Datanomix platform is a strong performer within the “Intelligent Data Plane” architecture.

Data Processing & Analytics Economy and Business

“Intelligent Data Plane”

Executive Overview

The data analytics platform revolution is in Stage One of its development. Investors, entrepreneurs and enlightened customers have enabled the development of great platforms that perform data analysis, yield insights on data sets at great scale and deliver enormous value. The term “Artificial Intelligence” (AI) is used to describe the functions that many of these platforms deliver. The right way to think about this may be that these platforms will first deliver enhanced, human-supervised Machine Learning (ML) rather than true native intelligence. That argument is made here, but the main purpose of this article is to describe an “Intelligent Data Plane” architecture.

This architecture will enable these platforms not only to ingest data and perform a “best of breed” function based on AI/ML, but also to let customers hook new platforms together with existing ones in a “data pipeline” fashion where data can move between platforms and add value across all of them. Stage Two of this revolution is building service platforms in a way that allows these “best of breed” technologies to securely share and analyze data among one another and provide greater advantage than they could by working alone.

Background and Perspective

In April of 2012, famous investor Ann Winblad pronounced that “Data is the new oil” on a segment of CNBC. This was a long time ago, but it has proven to be true. In a more recent article (from March 23, 2020), Peter Wagner, founder of a Silicon Valley venture firm, outlined the importance of data and analytics and what he calls Artificial Intelligence to the modern enterprise.

Peter’s article is an intelligent overview of some companies that he identifies as playing a major role in providing the data intelligence that will give the modern enterprise competitive advantage. He makes some insightful comments about how enterprises will use Artificial Intelligence and Machine Learning to build this advantage.

In the article Peter illustrates how workers in the modern enterprise connect to and utilize data in many different applications or platforms. The nature of the distributed enterprise and the distributed nature of enterprise data is illustrated very well. The number of applications shown in the article also illustrates that data can be used in a variety of ways and across applications. The distributed nature of applications and how the data they use resides in multiple locations and multiple clouds is obvious to the reader. Peter did an excellent job showcasing the landscape of Artificial Intelligence technology, platforms and how they will be used commonly in the future.

Pondering this new “fact of life” it occurred to me that in speaking about analytics, artificial intelligence and machine learning, some of the aspects of how to make data useful get lost in the discussion. For example, making data available to a number of applications and securing it in transit as the data changes locations is now important. Ensuring that the data is in a format that can be consumed by multiple applications while maintaining security of the contents and ensuring only authorized access to the data must also be a consideration. In the past (and in many non-cloud environments) this was not the case. Now such considerations are common business requirements.

Stage One – A Good Start

In practice, though, I think that the evolution of the data analytics “industry” is in stage one of its development. There are great platforms and services that have been built, but the second stage of industry evolution will be when we can use platforms seamlessly across data sets, transforming the data as necessary to allow the “best of breed” applications to work together. Today, most enterprise applications rely on their own databases and access paradigm. Data is not universally useful and fungible across different platforms.

When a manager in Human Resource Management can access data selected (and made anonymous) from the communications sent within the company (without a manual step being necessary to make the data anonymous) to run analysis that will indicate workplace satisfaction trends, we will have begun to make use of data. We of course have to ensure the platform will maintain the security of the data and anonymity of the senders but this would be a great benefit to help management understand what to improve about the workplace environment.

Like oil, data has to be extracted from its source, refined, moved to where it is useful, and used in a productive way. It has to be protected while it is stored and there are regulations for how it can be stored as well. So in a lot of ways, data is like oil in that it is powerful and precious and requires special handling to be useful.

Given these facts, it occurred to me that there is an emerging model for how to think about data and data analysis technology. Just as the networking world defined the seven-layer “OSI model” of how network technology is divided into layers of functionality, there is a similar (but different) model emerging for evaluating data handling and analysis software. I have named this model the “Intelligent Data Plane” and compare it in some ways to the OSI seven-layer stack defined for networking.

The Intelligent Data Plane

A new “Intelligent Data Plane” is necessary to connect and identify all the data moving through an enterprise and help prioritize and manage it. Data from users, manufacturing locations, supply chain partners and cloud services must be accounted for in the Intelligent Data Plane. This intelligent layer of SW is illustrated in the following diagram:

At a high level of abstraction it is important to visualize that the modern enterprise is distributed and data comes from many different applications. Data is sent from different users and is often stored in diverse systems at multiple locations. Given what is in the data itself, who produced it and where it originated, the data will have certain value and will require certain types of handling. Data should reside and be processed in either the “cloud”, “hybrid edge”, “edge computing”, or pure on-premise locations, depending on the security and processing requirements of data items.

A set of components implementing the “Intelligent Data Plane” of a processing stack provides the necessary identification, indexing, labeling, encryption and authorization functions to protect and use data in a cross-platform, multi-user environment.

OSI Stack for Networking

The following diagram illustrates how the OSI stack for networking delineates the required functions to supply services from applications running on a TCP/IP “stack”.

Intelligent Data Plane (Functional View)

“Stack” Components

The Intelligent Data Plane, in a way somewhat analogous to the OSI model for networking, has defined functions. The functions work together as data passes through this “data analysis stack” to provide data that is valuable to users through applications. Such a stack can work together as a pipeline to provide data that can be useful and securely available to given sets of users and applications. The stack enables users to “interact” with the data and analysis algorithms to refine the classification capabilities of the software.

All of these components need to be present to capture, filter and safely store data that can be analyzed and used. As with the OSI layers of a networking stack, each layer of the Data Analysis stack has to perform its function to make the data relevant, meaningful and secure.

Stage Two – Extending Capability

Note that data can move through the components of such a data plane, or processing stack, in pipeline fashion, adding value to data items as they are processed by applications. Stage two of the data revolution will be achieved when applications can use components implementing these functions as they need them. This will allow an application to be built quickly, utilizing whatever components of the intelligent stack are required for processing, analyzing and then presenting data of interest to users.
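The idea of an application picking only the components it needs and chaining them as a pipeline can be sketched as below. The three stages (a date filter, a transformation into a document, and a crude PII labeler) and the item fields are hypothetical examples, not a defined interface of the Intelligent Data Plane.

```python
import re
from typing import Callable, Iterable, Iterator

Item = dict
Stage = Callable[[Iterable[Item]], Iterator[Item]]

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def keep_recent(min_year: int) -> Stage:
    """Filtering stage: pass through only items from `min_year` onward."""
    def stage(items):
        for item in items:
            if item["year"] >= min_year:
                yield item
    return stage


def to_document(items):
    """Transformation stage: normalize raw text into a document object."""
    for item in items:
        yield {"id": item["id"], "text": item["raw"].strip().lower(), "labels": []}


def label_pii(items):
    """Labeling stage: tag documents that appear to contain an email address."""
    for item in items:
        labels = list(item["labels"])
        if EMAIL_PATTERN.search(item["text"]):  # crude stand-in for a PII detector
            labels.append("pii")
        yield {**item, "labels": labels}


def run_pipeline(items, stages):
    """Feed the output of each stage into the next; each stage adds value."""
    for stage in stages:
        items = stage(items)
    return list(items)


source = [
    {"id": 1, "year": 2019, "raw": " Old note "},
    {"id": 2, "year": 2021, "raw": " Contact: Jane@Example.com "},
    {"id": 3, "year": 2021, "raw": "Lot 7 passed QA"},
]
for doc in run_pipeline(source, [keep_recent(2020), to_document, label_pii]):
    print(doc)
```

An application that has no need for PII labeling would simply leave `label_pii` out of the list, which is the "use components as they need them" property described above.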

The Intelligent Data Plane is software that provides the functions of:

Intelligent data acquisition (source-specific connection and object/attribute filtering of data). Examples would be reading files from a specific Amazon S3 bucket or receiving emails from a certain mailbox location where customer inquiries are stored. This is largely present in Stage One platforms and is mentioned here for completeness.

Filtering rules/initial analysis to collect only relevant data. Examples of this would be collecting files that have been stored within a specific timeframe, or emails that are sent to a specific mailbox address, or retrieving database records with a certain range of identifiers in a certain column of a table. These functions may not be as “automatic” as they need to be in today’s stage one environment.

Data Transformation into objects that support analytics. An example would be parsing a PDF file and determining if it has a table, the table within the PDF has data with certain values, and transforming the data into a JSON object (document) and storing those objects for further analysis by another component within the Data Intelligence stack. Stage two platforms will provide standard mechanisms to normalize and share data among applications/platforms.

Movement of data to third-party modeling and analysis software or modeling software within the intelligent data stack itself. Identifying data with sensitive personally identifiable information (PII) and labeling the data so that these data items can be protected. Movement of data and classification and labels (while maintaining user context of who defined the labels) is largely platform specific in stage one products. Stage two would make these a universal attribute of data analysis systems.

Data Protection (encryption, obfuscation). Examples would be encrypting objects or documents that have sensitive data. Stage two would make the encryption and key management transparent to users, and encryption/data protection a user-centric or community centric attribute of classified data sets.

Data Analysis/Classification – with “human in the loop” capability to aid and improve classification performance over time. Examples would be a user interface and data presentation layer that allow a user to reclassify data that the classifier has mis-labeled. Such data would be reintroduced to the training SW to allow the model to be updated and allow the classifier to work more accurately in subsequent classification runs. Stage two would evolve to include this as base functionality. Stage one products add this today platform by platform at the application level.

Data Storage and Control (storage plus indexing for eventual retrieval). Examples would be functionality to index data items, store data items and labels such that documents can be searched and retrieved by either keyword attributes of the documents, or classification labels associated with given data items. Most Stage one platforms have this today, or provide on-demand indexing based on metadata attributes for deeper content searching. Stage two platforms would enable association of user-defined labels with certain sets of data items and allow them to be searched along with strict data attributes.

Data Presentation and User Interface Layers for Viewing Data and the results of classifications.

The Data Presentation layer must be backed by APIs and technology that can store the labels for multiple classifications and present them in the proper context for a given user. All documents that are classified accurately and given a specific label should be able to be associated and presented to the user who has the authorization to view them.
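The last three functions above (human-in-the-loop classification, label-aware storage and an authorization-aware presentation layer) can be sketched together as a small in-memory store. The `LabelStore` class, its method names and the per-label grant model are all hypothetical; a real platform would back this with a search index and an access-control service.

```python
from collections import defaultdict


class LabelStore:
    """In-memory sketch: documents, their labels, corrections and view grants."""

    def __init__(self):
        self._docs = {}                  # doc_id -> text
        self._labels = {}                # doc_id -> (label, labeled_by)
        self._corrections = []           # (text, corrected_label) for retraining
        self._grants = defaultdict(set)  # user -> labels that user may view

    def add(self, doc_id, text, label, labeled_by="classifier"):
        self._docs[doc_id] = text
        self._labels[doc_id] = (label, labeled_by)

    def grant(self, user, label):
        self._grants[user].add(label)

    def reclassify(self, doc_id, new_label, user):
        """Human-in-the-loop correction, remembered with who made it."""
        self._labels[doc_id] = (new_label, user)
        self._corrections.append((self._docs[doc_id], new_label))

    def training_feedback(self):
        """Corrected examples to feed back into the next training run."""
        return list(self._corrections)

    def search(self, user, label=None, keyword=None):
        """Search by label and/or keyword, limited to what `user` may view."""
        hits = []
        for doc_id, text in self._docs.items():
            doc_label, _ = self._labels[doc_id]
            if doc_label not in self._grants[user]:
                continue
            if label is not None and doc_label != label:
                continue
            if keyword is not None and keyword.lower() not in text.lower():
                continue
            hits.append(doc_id)
        return sorted(hits)


store = LabelStore()
store.add("d1", "Invoice for lot 7", "invoice")
store.add("d2", "Complaint about late delivery", "invoice")  # mis-labeled
store.grant("ana", "invoice")
store.grant("ana", "complaint")
store.grant("bo", "invoice")
store.reclassify("d2", "complaint", user="ana")
print(store.search("ana", label="complaint"), store.search("bo"))
```

After the correction, the second user no longer sees the reclassified document, and the corrected example is queued so that the classifier can be retrained on it.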

The Intelligent Data Plane is a reference architecture defining layers of functional components that aid the analysis and management of data in service platforms. This article explained the evolution from stage one platforms, which provide great value but may not be built with all of these functions, and presented the different layers of the architecture so that the reader can understand what functions can be built to extend the platforms that exist today.

The end goal of such an architecture is to supply “pipelined” processing flows that allow multiple platforms to securely share data and combine their own best-of-breed functionality into a greater benefit for customers.

Data Processing & Analytics
Economy and Business

“Liberating Data, Accelerating Intelligent Business Decisions”

In one of my recent posts, I defined what I called “The Intelligent Data Plane” and the software components that exist within an intelligent data processing “stack” to make data produced within an enterprise “useful”.

Key components of the Intelligent Data Plane are automation of the processing of data items (email, documents), automation of analytic processes, and transformation of data into useful formats for inclusion in “downstream” processes. The Intelligent Data Plane architecture also defines an intuitive user interface so that users can interact with data to identify data attributes to be used by the classification engine. In addition, the product should be able to refine the performance of its classification and analysis algorithms through human review. Alkymi is an exciting company doing great things in all of these areas of the Intelligent Data Plane architecture.

Intelligent Data Plane Accelerating Business Decisions

Alkymi is a company located in New York City. It automates the processing of many types of data, including unstructured file data and semi-structured email data. The platform eliminates the manual overhead of evaluating emails, documents, forms and the content they contain. It also enables the integration of its automated data processing and classification engine with “downstream” processes.

Through an intuitive user interface, it allows business users to interact with data and the system to build data models without requiring them to be data scientists. The system then identifies and processes documents and emails, delivering them to “downstream” workflows with minimal human intervention. The system eliminates untold hours of tedious and error-prone manual labor and speeds the processing of data included in email and documents.

Deep Compliance and Financial Markets Expertise

The business team has extensive experience in the financial services and compliance market. They have recognized that automating manually intensive business processes will provide unprecedented efficiencies to previously tedious and error-prone operations.

Clear Technical Advantage

The technical team has deep expertise in document processing, machine learning and computer vision. Together, they have built an impressive platform that can evaluate email, documents within emails, and documents at rest to classify and normalize their business content. The system can then deliver the documents that are most relevant to pre-specified business criteria to the next step in a production workflow. The automated classification of relevant documents is extremely accurate and eliminates costly errors that could occur with manual review alone.

Using both machine vision and machine learning, Alkymi’s extremely capable team has built a system that identifies and classifies data of interest to a business process user and extracts the relevant data into a structured payload (transforming it into a standard format for further processing). Because the relevant data is extracted and normalized into a database, the content and the documents that contain the data can remain associated for later retrieval. Alkymi has named the latest version of its product “Patterns” after the concept of having a user define a pattern in the data that the platform can use to identify and classify documents.

Patterns Overview

Alkymi describes its process in an illustration from the Patterns Product Brief. The interface allows a user to select relevant data from given documents, and the system treats the selected items as attributes for identifying documents of that “class”. The documents are then analyzed against the patterns and labeled within the system. The combination of machine vision and text analysis is very powerful and differentiating from a technical standpoint.

Defining Patterns/Models

Applying Results to Workflows

Apply Pattern Results to a Workflow

With the system, data can be promoted to the next stage of the business process easily. This fits my thesis of a component within the Intelligent Data Plane architecture allowing data to be transformed and moved to another business process in a workflow. Data moves from an automated location or email box into the system where it is processed according to the pattern defined by the customer. The useful data is extracted and moved along to the next stage of the process.
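To make the flow concrete, here is a minimal sketch of an inbox-to-downstream pipeline. It is not how the product works internally: regular expressions stand in for the machine vision and machine learning described earlier, and the `PATTERN` definition, field names, and function names are all hypothetical.

```python
import re

# Hypothetical customer-defined pattern: a label plus the fields to extract.
PATTERN = {
    "label": "fact_sheet",
    "fields": {
        "fund_name": re.compile(r"Fund:\s*(.+)"),
        "nav": re.compile(r"NAV:\s*([\d.]+)"),
    },
}


def extract(document_text, pattern):
    # Return a structured payload if every field of the pattern is found,
    # otherwise None (the document does not match the pattern).
    payload = {"label": pattern["label"]}
    for name, regex in pattern["fields"].items():
        match = regex.search(document_text)
        if match is None:
            return None
        payload[name] = match.group(1).strip()
    return payload


def run_pipeline(inbox, pattern, downstream):
    # Move matching documents' extracted data to the next workflow stage.
    for text in inbox:
        payload = extract(text, pattern)
        if payload is not None:
            downstream.append(payload)
    return downstream


inbox = ["Fund: Growth Fund\nNAV: 12.50", "Lunch on Friday?"]
next_stage = run_pipeline(inbox, PATTERN, [])
```

Only the message that matches the pattern produces a payload; the unrelated email is skipped without any manual review.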

“Human in The Loop” – Refining Accuracy of Classification

One very important aspect of the system Alkymi’s team built is the ability for users to refine the results of the automated classifications. By allowing users to mark and refine the data attributes used in a classification, the system gets “more intuitive” over time.

Standard Document Automation Scenarios

Alkymi has built solutions that automate the identification of many types of standard documents in various industry sectors:

  • Portfolio summaries
  • Performance reports
  • Fact sheets
  • Allocation exposure & sector reports
  • ESG related compliance documents

Visit the Alkymi website for more industry concentration areas.

Summary

Alkymi is an exciting new company building a second-generation data analysis platform (a generation ahead of other vendors). It incorporates an intuitive interface to allow data to be collected and categorized against the criteria important to business users. This eliminates the tedious and error-prone process of human beings having to open and read thousands of emails to determine which contain relevant documents or data. The software also meets an important criterion defined in the Intelligent Data Plane architecture: automating the flow of data into other workflow processes. Additionally, the system allows a human to refine the results of the data classifiers by reviewing results and adjusting the attributes being used for classification. This is a very impressive company and platform.

Data Processing & Analytics

Building Data Pipelines for Machine Learning/Deep Learning with Apache Kafka

Interesting slide presentation on using Apache Kafka Streams and Machine Learning:

For those contemplating these technologies this may be very useful.

Data Processing & Analytics

Kafka & Spark for Data Pipelines

For large-scale applications that process data efficiently, integrating messaging and analytic software is a must. Understanding options and selecting the proper architecture for systems that will implement data pipelines is the most important step.

Here is a great article for those contemplating building and implementing an architecture for a data pipeline: