AI: Natural Language Processing and the Battle for Unstructured Data

September 5, 2019 | By: Ivy Schmerken

With digital transformation in full swing, trading desks are inundated with emails, voice calls and chat to process and analyze. Most of this data needs to be captured, tagged and stored for regulatory purposes.

But in capital markets, the torrent of research reports arriving by email, quote requests, and chat conversations with clients can be impossible to keep up with manually.

“Trading firms are overwhelmed with unstructured data as they have many forms of communication, such as the phone, emails and chat,” said Richard Johnson, VP of market structure and fintech at Greenwich Associates, who moderated the July 25 webinar “Artificial Intelligence on the Trading Desk.” “Then we have the competitive dynamics on the trading desk, which means there is falling head count and fewer traders handling the work.”

Increasingly, banks are turning to natural language processing (NLP) and machine learning to extract valuable information from voice, documents, and audio to boost productivity on trading desks.

It’s all part of a broader push to gain efficiencies by training machines and bots to analyze language, capture insights, replace manual tasks, and drive workflows further downstream.

While there are fears of machines encroaching upon the work of humans on trading desks, natural language processing (NLP) is emerging as a practical solution for helping firms translate the jargon of trading into structured data that computers can learn from.

So, what is natural language processing?

“At the big picture level, natural language processing is about mimicking the way that humans understand language,” wrote Jennifer Bi, a Salesforce executive, in “How Natural Language Processing Will Change the Way You Work” on Medium. For many years, Gmail has used NLP to split emails into different buckets, such as primary, social and promotions. It can tell the difference between retail sales promotions, media subscriptions and conversations with personal friends and co-workers.

NLP has been around for decades and helps bring structure to unstructured data in a form that computers can utilize, said Greenwich’s Johnson. There are two types: natural language understanding is akin to reading, while natural language generation (NLG), a more nascent field, is more like writing.

In finance, NLP has been deployed to extract insights from research. Every Wall Street firm employs teams of analysts who spend hours reading tedious financial documents and SEC corporate filings. The sell side is also using NLG to write some of its research reports based on earnings updates, while news stories on Yahoo Finance and other sites are generated by automation, said Bill Stephenson, founder of AIR Summit, an event that focuses on innovative technology for investment managers. “While the sell side is more interested in pushing information out, the buy side is more interested in taking information in and learning from it,” said Stephenson. But the buy side is stepping up its usage of NLP.

“On the analyst and portfolio manager side, they are using NLP for ingesting corporate calls and earnings calls,” said Stephenson, whose AIR Summit 5.0 event is taking place from September 18 to 19 in New York City. “NLP is used to understand the tone, the patterns, and to measure the sentiment.”

On June 5, Liquidnet said it acquired Prattle, an AI startup that uses machine learning and NLP to analyze communications from central banks and earnings calls, Financial News reported. Prattle is among a handful of AI startups that presented at a past AIR Summit, and its analysis can be used to generate alpha for investment managers, analysts and institutional traders.

Another startup, Truvalue Labs, applies AI, machine learning and NLP to environmental, social and governance (ESG) factors. It takes in data from different sources (news, company releases, and social media) and produces time-series ESG sentiment scores that can be tracked over time.

“Traditional long-only firms are not high-frequency alpha generative.  They see trends over time. That’s where looking at, say, different NLP-generated scores over time is valuable,” said Stephenson.
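To make the idea of tracking NLP-generated scores over time concrete, here is a minimal sketch in Python. It assumes that some upstream model has already assigned each news item or filing a sentiment score between -1 and 1; the function name `monthly_scores` and the input layout are hypothetical and are not taken from any vendor mentioned in this article.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_scores(items):
    """Average per-document sentiment scores into one score per month.

    `items` is an iterable of (date, score) pairs. The scores are assumed
    to come from an upstream NLP model; this function only aggregates them.
    """
    buckets = defaultdict(list)
    for day, score in items:
        buckets[(day.year, day.month)].append(score)
    # Sort by (year, month) so the result reads as a time series.
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Hypothetical scored documents for one company.
sample = [
    (date(2019, 6, 3), 0.25),
    (date(2019, 6, 20), 0.75),
    (date(2019, 7, 1), -0.5),
]
series = monthly_scores(sample)
```

A long-only manager would look at how such a series drifts over quarters rather than reacting to any single data point.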

Cutting Through the Noise

“Trading desks are inundated with many different data sources that are very unstructured. There is a lot of actionable intelligence inside of them, but it’s tough to extract it out,” said Tejas Shastry, chief data scientist at GreenKey Technologies, speaking on the webinar.

GreenKey develops AI applications that extract quotes and trade information from unstructured trade communications using NLP and speech recognition to understand Wall Street’s complex jargon, according to the firm’s web site.

“From an AI perspective, firms are not only dealing with the volume of data, but there is a nuanced jargon that is specific to each desk, such as words and phrases used by each firm or a particular group,” said Shastry.  “One of the reasons that firms look at GreenKey’s NLP is to extract the value from the noise, and to take the little nuggets of information that might help further drive workloads downstream.  But it’s very challenging to capture the value while ignoring the noise.”

In fact, about 90% of this data is noise, said Shastry.

For example, a junior trader has limited capacity to understand the information on a trading desk. But the advantage of natural language processing is that models can be trained so that machines can read thousands of documents at one time.

Among the key functions is “summarization,” which means NLP can listen to a large number of calls and then summarize exactly what was said in a user-friendly format, said Shastry.

Voice and Chat: Is it a Mess?

Capturing data from voice conversations is a priority for financial institutions, said Johnson.

According to a recent Greenwich study, 88% of trading executives view reliable voice communications as very critical or extremely critical to their trading workflow.

“You are doing deals and trades worth millions of euros or thousands of dollars. It’s essential you get the traders to communicate this, and what if you get it wrong? The consequences are bad,” warned Johnson.

“One of the most interesting use cases is that GreenKey has trained their NLP to recognize voice quotes, and translate them into a machine-readable format,” said Andy Mahoney, head of sales at FlexTrade UK.  “It’s capturing a voice quote for a negotiated trade – this would currently be input manually by the trader into an EMS,” said Mahoney, noting that this works with any asset class.

“This opens up the possibility for us to automatically capture voice quotes into the EMS and link them to a conversation recording for best execution evidencing purposes,” explained Mahoney. “As a result, this is giving the EMS a bigger world view, reduces operational risk, and allows the EMS to capture a more comprehensive view of risk. Basically – the unstructured world is starting to be structured, which is good for us because we can capture more trading flow,” said Mahoney.
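As a rough illustration of what turning a voice quote into a machine-readable record could look like, the Python sketch below parses an already-transcribed sentence into a structured record and attaches the ID of the call recording. It is a toy built on a regular expression, not GreenKey's NLP or FlexTrade's EMS interface; the phrase pattern, the `Quote` fields and the `recording_id` value are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    side: str          # "bid" or "offer"
    size_mm: float     # size in millions
    instrument: str    # free-text instrument description
    price: float
    recording_id: str  # link back to the call recording (hypothetical ID)


# Assumed phrasing: "<bid|offer> <size> million|mm <instrument> at <price>"
PATTERN = re.compile(
    r"(?P<side>bid|offer)\s+(?P<size>\d+(?:\.\d+)?)\s*(?:mm|million)\s+"
    r"(?P<instrument>.+?)\s+at\s+(?P<price>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_quote(transcript: str, recording_id: str) -> Optional[Quote]:
    """Return a structured Quote, or None if the text contains no quote."""
    m = PATTERN.search(transcript)
    if m is None:
        return None  # most conversation is noise, so no match is normal
    return Quote(
        side=m.group("side").lower(),
        size_mm=float(m.group("size")),
        instrument=m.group("instrument").strip(),
        price=float(m.group("price")),
        recording_id=recording_id,
    )


quote = parse_quote("ok I can offer 25 million ACME 2029 at 99.5", "rec-001")
```

A production system would need to handle desk-specific jargon that a fixed pattern like this cannot, which is the gap trained NLP models are meant to close.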

Chat is another source of unstructured data that can be integrated with downstream applications to help traders offload repetitive tasks and boost productivity.

On the webinar, Kim Prado, managing director of RBC Capital Markets, said that 70% of its conversations are in chat and are “falling on the ground” every day, that is, going uncaptured.

Chat is all but ubiquitous on trading floors, and firms can choose to use multiple systems, observed Greenwich’s Johnson. About 98% of trading firms use chat communications and 16% use more than one system, according to the Greenwich study.

“One of the central goals of Symphony is to eliminate the idea of using multiple chat tools,” said Goutam Nadella, EVP of client solutions for the chat collaboration platform, speaking on the webinar. Firms have existing chat tools for external and internal communications, as well as for the front office and back office, he said.

“We see that a number of clients are looking to bring together the entire community on one tool,” said Nadella. In terms of how people are using Symphony with machine learning, NLP and AI, Nadella pointed to the themes of automation, efficiency and context.

Sell-side desks are managing inbound and unstructured RFQs from multiple clients with limited automation and potential for errors.  On the buy side, traders receive inbound quotes and price discovery information through chat.  Symphony is building tools on top of its chat platform, leveraging NLP and machine learning to automate all of that, said Nadella. For example, it built SPARC, a workflow application, to standardize and automate RFQ negotiations for complex products like OTC derivatives.

Financial institutions have also built bots on the system; currently there are 1,500 to 2,000 active bots on Symphony. Essentially, bots are working on behalf of users or automating repetitive actions so that users can focus on higher value tasks, said Nadella. Users type a command into a text chat, and the bot responds.

In the past, users had to remember to type forward slash and remember the command in a rigid format, but front office people had no patience for that, said Nadella.

Today, if a buy-side trader wants a price from a dealer, they still need to remember a specific format for an interest-rate swap price. In one use case, a buy-side client could instead ask in natural language, and through an integrated bot, the dealer comes back with an automated price.
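The contrast between a rigid command and a natural-language request can be sketched in a few lines of Python. This is not Symphony's bot API: the `/price` command syntax, the field names and the short currency list are hypothetical, and the "natural language" side is approximated with simple keyword matching rather than a trained NLP model.

```python
import re
from typing import Dict, Optional


def parse_slash_command(text: str) -> Optional[Dict[str, str]]:
    """Rigid format (assumed): /price <CCY> <TENOR> <SIZE>, e.g. '/price USD 10Y 50MM'."""
    m = re.fullmatch(r"/price\s+([A-Z]{3})\s+(\d+Y)\s+(\d+MM)", text.strip())
    if m is None:
        return None  # any deviation from the exact format is rejected
    return {"currency": m.group(1), "tenor": m.group(2), "size": m.group(3)}


def parse_free_text(text: str) -> Optional[Dict[str, str]]:
    """Looser parse: find the same three fields anywhere, in any order."""
    ccy = re.search(r"\b(USD|EUR|GBP|JPY)\b", text, re.IGNORECASE)
    tenor = re.search(r"\b(\d+)\s*-?\s*(?:y|yr|year)s?\b", text, re.IGNORECASE)
    size = re.search(r"\b(\d+)\s*(?:mm|million)\b", text, re.IGNORECASE)
    if not (ccy and tenor and size):
        return None  # a real bot would ask the user for the missing field
    return {
        "currency": ccy.group(1).upper(),
        "tenor": f"{tenor.group(1)}Y",
        "size": f"{size.group(1)}MM",
    }


request = "can I get a price on 50mm of 10 year usd swaps?"
strict = parse_slash_command(request)   # None: not in the rigid format
relaxed = parse_free_text(request)      # same fields recovered from free text
```

Both parsers produce the same structured request, so whatever pricing workflow sits downstream does not need to know which way the trader asked.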

Clients are building trade surveillance and compliance applications on top of Symphony to reduce the number of false positives. As much as 70-80% of alerts in surveillance and compliance are false positives, said Nadella.

In terms of context, Symphony has historically been looking at how content is stored in silos and how much context exists in disconnected conversations, said Nadella. Firms can’t necessarily change where the data is located; if it sits in legacy systems, Nadella suggests leaving it where it is. Symphony is partnering with firms like GreenKey to build NLP and machine learning on top of its platform that will help clients bring all their content together.

Adoption of AI in Trading

In a live poll, 42% of attendees said that their firms are currently using AI technology in trading, 31% said they are not, and 27% said they are not but are in the process of exploring AI solutions.

However, one of the main challenges to AI adoption has been integration of AI into other systems. “There can be numerous integration points, especially for firms with a plethora of legacy systems that have existed for decades,” said GreenKey’s Shastry.

Forty-two percent of webinar attendees cited challenging integrations with other systems while 20% singled out cost and difficulty measuring ROI as the main barriers to AI adoption. Sixteen percent cited lack of internal experts, 13% said lack of management buy-in, while 9% thought the technology was too new or hard to understand.

In order to make progress with NLP and AI, firms need to organize their data, noted panelists. “Most firms at this point are figuring out how to get the data organized to create models for AI to work,” said Nadella.  Data management is an ongoing challenge, especially for firms with legacy systems in existence for decades.

Panelists agreed there were opportunities for the industry to come together on reference data. They stressed the importance of open source and standards, noting that they were working through FINOS, the Fintech Open Source Foundation, which was spun out of Symphony.

Though there can be a steep learning curve in adopting NLP, panelists said that a lack of internal experts should not deter firms since they can work with AI technology partners.

Nadella said he is seeing higher rates of adoption among clients and that AI tools are easier to implement than they were two years ago.

Ultimately, a machine can read information quickly, parse it and understand it, which can solve problems on the trading desk and boost productivity by allowing traders to understand what they have, said Johnson. “Ideally it helps generate new insights for the trading desk to make more money.”
