Northern Trust Quantitative Strategies

How Does Natural Language Processing Work?

The Key Questions to Ask

Over 75% of the world’s data is textual [1], spanning firm filings, financial news, social media, earnings calls and press releases. Natural language processing (NLP), or the analysis of a large volume of language using artificial intelligence, can capture this vast body of unstructured information that numbers alone often miss, opening compelling opportunities for investors. But the barriers to generating durable alpha from text are steeper than they appear, and investors should take care to perform due diligence on investment managers who say they effectively incorporate NLP into their process.

Why NLP Is Hard to Do Well

With NLP, the raw inputs — news feeds, transcripts and filings — are widely available. The models to process them are increasingly commoditized. Anyone with basic coding skills can set up an AI agent that runs basic NLP algorithms on publicly available sources of company information. The advantage does not come from access to textual data, but from the quality, sensibility and intuition behind the economic questions you ask and the rigor with which you test the answers.

Several structural challenges make NLP particularly treacherous for investors:

  • Overfitting is endemic. Text datasets have many dimensions and a relatively short history. Models can appear to work brilliantly in-sample but capture noise rather than signal.
  • Publication bias compounds the problem. Academic NLP findings tend to be reported only when they work. The strategies that do not survive publication scrutiny are invisible to most market participants.
  • Crowding happens fast. When a vendor offers a pre-packaged sentiment score to the market, its informational edge begins decaying from the moment of first sale.
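
The overfitting point above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a method from the paper: the function name `mine_noise`, the 200 candidate signals and the 60-period samples are all hypothetical choices. Every signal is pure noise by construction, yet the best one selected in-sample still appears to correlate with returns.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def mine_noise(n_signals=200, n_in=60, n_out=60, seed=0):
    """Pick the best of many pure-noise signals in-sample, then re-test it."""
    rng = random.Random(seed)
    n = n_in + n_out
    # Hypothetical data: returns and signals are independent random draws.
    returns = [rng.gauss(0, 1) for _ in range(n)]
    signals = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_signals)]
    # In-sample: choose the signal with the largest absolute correlation.
    in_corrs = [pearson(s[:n_in], returns[:n_in]) for s in signals]
    best = max(range(n_signals), key=lambda i: abs(in_corrs[i]))
    # Out-of-sample: the same signal on data it was not selected on.
    out_corr = pearson(signals[best][n_in:], returns[n_in:])
    return in_corrs[best], out_corr

in_corr, out_corr = mine_noise()
print(f"best in-sample corr: {in_corr:+.2f}, out-of-sample: {out_corr:+.2f}")
```

The in-sample figure is large only because it is the maximum over many tries. The out-of-sample figure is a fresh draw around zero, which is what a wide text feature set with a short history tends to produce when no economic hypothesis narrows the search.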

Our recent paper in the Journal of Portfolio Management, “Natural Language Processing for Asset Managers: Turning Text into Alpha” [2], analyzes these structural challenges and offers a framework of best practices that separates disciplined application of NLP from exploratory data mining that tends to disappoint out of sample.

What the Research Tells Us

Most durable NLP signals are built on economic intuition first and data second. When practitioners invert that sequence, mining text for patterns and then retrofitting a story, the results rarely survive live implementation.

To ensure that NLP applications provide a sustainable edge in investment decisions, best practice suggests a robust and repeatable framework that assesses sensibility, predictability, consistency and additivity. All four criteria should be met and understood prior to deployment (see the investor’s checklist below).

The Signal Landscape: What Has Held Up

Our research [2] surveys a broad range of text-based signals, from classic bag-of-words sentiment derived from earnings calls, to more sophisticated peer similarity measures using neural embeddings. A few categories stand out for their consistency:

  • Graph theory. This models text as a network to uncover dependencies and information flow beyond simple word sequences and sentiment.
  • Peer and industry structure. Text-based firm similarity measures derived from business descriptions identify competitive dynamics and revenue exposures missed by the Global Industry Classification Standard (GICS).
  • Forward-looking language in filings. The specificity and confidence of guidance-related language in 10-K and 10-Q filings correlate with subsequent earnings quality.
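
The peer and industry structure item above can be sketched with the simplest possible text similarity measure. This is an illustration under stated assumptions, not the paper's method: it uses plain word counts and cosine similarity rather than neural embeddings, and the firm names, descriptions and `STOPWORDS` set are hypothetical.

```python
from collections import Counter
from math import sqrt

STOPWORDS = {"and", "for", "the", "of", "to", "a", "in"}  # illustrative only

def bag_of_words(text):
    """Lower-case word counts, ignoring a few common stopwords."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical business descriptions, not taken from any filing.
firms = {
    "FirmA": "cloud software for payments and banking",
    "FirmB": "payments software for banks and merchants",
    "FirmC": "offshore drilling and oil exploration services",
}
vectors = {name: bag_of_words(text) for name, text in firms.items()}

def similarity(x, y):
    """Text-based similarity between two firms in the `firms` dictionary."""
    return cosine(vectors[x], vectors[y])

print(similarity("FirmA", "FirmB"))  # shares "software" and "payments"
print(similarity("FirmA", "FirmC"))  # no shared words
```

Note that word counts treat "banking" and "banks" as unrelated tokens. That limitation is one reason more sophisticated measures rely on embeddings, which can place related terms close together.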

What these approaches share is a clear economic rationale established in advance. Management teams that hedge more are signaling something. Further, investing decisions are always peer-relative. Firms’ descriptions of their own business provide a richer view than discrete industry classifications, which can aid in identifying mispriced securities. These are testable propositions, not data artifacts.
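
The idea that management hedging is measurable can be made concrete with a bag-of-words count. The sketch below is a hypothetical illustration: the `HEDGES` vocabulary and the two sample sentences are assumptions made for the example, and a real study would use a validated lexicon and actual filings or transcripts.

```python
import re

# Hypothetical hedging vocabulary; a real study would use a validated lexicon.
HEDGES = {"may", "might", "could", "possibly", "approximately", "uncertain"}

def hedging_ratio(text):
    """Share of words in `text` that belong to the hedging vocabulary."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in HEDGES for w in words) / len(words)

# Invented sentences standing in for guidance language.
confident = "We expect revenue to grow ten percent next year."
hedged = "Revenue may possibly grow, but demand could remain uncertain."

print(hedging_ratio(confident), hedging_ratio(hedged))
```

A ratio like this becomes a testable proposition only once it is tied to a hypothesis stated in advance, for example that a rising hedging share precedes weaker earnings quality, and then validated out of sample.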

Where the Evidence Is Weaker

Our research [2] is equally clear about where NLP signals tend to disappoint. High-frequency sentiment derived from news wires suffers from rapid crowding and implementation friction. Social media signals, despite their popularity, have shown limited robustness in institutional equity settings after accounting for transaction costs. Generic vendor sentiment scores, applied without adjustment for sector or firm characteristics, tend to degrade quickly as the data becomes widely distributed.

Used in this manner, NLP may still add value at the margin, but meaningfully less than research abstracts often suggest, especially after implementation costs. That is not a reason to ignore the space; it is a reason to approach it with discipline.

What This Means in Practice: An Investor’s Checklist

For institutional investors allocating to a strategy that incorporates textual (NLP) signals, thorough due diligence matters: the temptation to over-engineer is unusually high in this domain, and the track record of live implementation is unusually short.

| Due Diligence Questions | Best Practice | Pitfalls |
| --- | --- | --- |
| What is the economic hypothesis? | Ground every NLP signal in economic intuition first: understand why it should work before testing whether it does | Don't treat NLP as a black box; if you can't explain the signal, you can't trust it in a drawdown |
| How well do you understand the signal? | Build infrastructure for transparency: understand what the model is doing and why | Don't assume sentiment scores are interchangeable across vendors; methodology differences are material |
| How was the signal tested out-of-sample? | Validate out-of-sample rigorously; NLP models are particularly susceptible to in-sample overfit | Don't mistake recency for robustness; short backtests on text data are especially dangerous |
| What is the expected half-life of the advantage? | Monitor for alpha decay actively; text-based signals crowd faster than you might expect | Don't plug a vendor dataset directly into your model; if you can buy it, so can other investors |
| Is the signal being used to replace judgment or to inform it? | Treat NLP signals as complementary to traditional factors, not replacements | Don't confuse linguistic fluency with predictive power; Large Language Models (LLMs) can be confidently wrong |
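
The half-life question in the checklist can be given a rough quantitative form. The sketch below is an assumption-laden illustration, not a method from the paper: it supposes that signal strength is summarized by an information coefficient (IC, the correlation between the signal and subsequent returns) that decays exponentially with the horizon, and it fits that decay by least squares on the logarithm of the IC. The function name `half_life` and all data are hypothetical.

```python
from math import log

def half_life(horizons, ics):
    """Fit log(IC) = a + b * horizon by least squares; half-life = -ln(2) / b."""
    ys = [log(ic) for ic in ics]  # requires strictly positive ICs
    n = len(horizons)
    mx, my = sum(horizons) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(horizons, ys))
             / sum((x - mx) ** 2 for x in horizons))
    if slope >= 0:
        raise ValueError("signal strength is not decaying")
    return -log(2) / slope

# Synthetic example: an IC of 0.08 that halves every 6 months by construction.
months = [0, 3, 6, 9, 12]
ics = [0.08 * 0.5 ** (m / 6) for m in months]
print(round(half_life(months, ics), 2))
```

On this synthetic input the fit recovers the six-month half-life that was built in. Real IC estimates are noisy and can be negative, so in practice the exponential form is itself a hypothesis to be checked rather than assumed.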

Northern Trust's Perspective

At Northern Trust Asset Management, our approach to quantitative equity investing has always prioritized economic logic over pattern recognition. NLP is no exception. We see real potential in text-based signals, particularly in areas like peer structure and management language analysis. However, we apply the same standards of out-of-sample validation, risk management and cost discipline that we bring to any quantitative investment strategy.

 

[1] Tam Harbert, “Tapping the power of unstructured data,” MIT Sloan School of Management (Ideas Made to Matter), February 1, 2021, https://mitsloan.mit.edu/ideas-made-to-matter/tapping-power-unstructured-data

[2] Guido Baltussen, Gijsbert de Lange, Ashraf Mansur, Olivera Rakic, and Machiel Westerdijk, “Natural Language Processing for Asset Managers: Turning Text into Alpha,” The Journal of Portfolio Management 52, no. 2 (Quantitative Tools 2025): 184–211, https://www.pm-research.com/content/iijpormgmt/52/2/184

IMPORTANT INFORMATION

Northern Trust Asset Management (NTAM) is composed of Northern Trust Investments, Inc., Northern Trust Global Investments Limited, Northern Trust Fund Managers (Ireland) Limited, Northern Trust Global Investments Japan, K.K, NT Global Advisors, Inc., 50 South Capital Advisors, LLC, Northern Trust Asset Management Australia Pty Ltd, and investment personnel of The Northern Trust Company of Hong Kong Limited and The Northern Trust Company.

Issued in the United Kingdom by Northern Trust Global Investments Limited, issued in the European Economic Area (“EEA”) by Northern Trust Fund Managers (Ireland) Limited, issued in Australia by Northern Trust Asset Management (Australia) Limited (ACN 648 476 019) which holds an Australian Financial Services Licence (License Number: 529895) and is regulated by the Australian Securities and Investments Commission (ASIC), and issued in Hong Kong by The Northern Trust Company of Hong Kong Limited which is regulated by the Hong Kong Securities and Futures Commission.

This information is directed to institutional, professional and wholesale current or prospective clients or investors only and should not be relied upon by retail clients or investors. This document may not be edited, altered, revised, paraphrased, or otherwise modified without the prior written permission of NTAM. The information is not intended for distribution or use by any person in any jurisdiction where such distribution would be contrary to local law or regulation. NTAM may have positions in and may effect transactions in the markets, contracts and related investments different than described in this information. This information is obtained from sources believed to be reliable, its accuracy and completeness are not guaranteed, and is subject to change. Information does not constitute a recommendation of any investment strategy, is not intended as investment advice and does not take into account all the circumstances of each investor.

This report is provided for informational purposes only and is not intended to be, and should not be construed as, an offer, solicitation or recommendation with respect to any transaction and should not be treated as legal advice, investment advice or tax advice. Recipients should not rely upon this information as a substitute for obtaining specific legal or tax advice from their own professional legal or tax advisors. References to specific securities and their issuers are for illustrative purposes only and are not intended and should not be interpreted as recommendations to purchase or sell such securities. Indices and trademarks are the property of their respective owners. Information is subject to change based on market or other conditions.

All securities investing and trading activities risk the loss of capital. Each portfolio is subject to substantial risks including market risks, strategy risks, advisor risk, and risks with respect to its investment in other structures. There can be no assurance that any portfolio investment objectives will be achieved, or that any investment will achieve profits or avoid incurring substantial losses. No investment strategy or risk management technique can guarantee returns or eliminate risk in any market environment. Risk controls and models do not promise any level of performance or guarantee against loss of principal. Any discussion of risk management is intended to describe NTAM’s efforts to monitor and manage risk but does not imply low risk.

Past performance is not a guarantee of future results. Performance returns and the principal value of an investment will fluctuate. Performance returns contained herein are subject to revision by NTAM. Comparative indices shown are provided as an indication of the performance of a particular segment of the capital markets and/or alternative strategies in general. Index performance returns do not reflect any management fees, transaction costs or expenses. It is not possible to invest directly in any index. Net performance returns are reduced by investment management fees and other expenses relating to the management of the account. Gross performance returns contained herein include reinvestment of dividends and other earnings, transaction costs, and all fees and expenses other than investment management fees, unless indicated otherwise.

Forward-looking statements and assumptions are NTAM’s current estimates or expectations of future events or future results based upon proprietary research and should not be construed as an estimate or promise of results that a portfolio may achieve. Actual results could differ materially from the results indicated by this information.

Artificial Intelligence (AI): AI refers to computational systems designed to perform tasks that typically require human intelligence, such as pattern recognition, decision-making, and prediction. In investment management, AI may be used to support portfolio construction, risk assessment, and trading strategies.

Machine Learning (ML): ML is a subset of AI that enables systems to identify patterns based on data inputs without being explicitly programmed. ML models may be used in stock selection to identify investment opportunities based on historical and real-time data.

Natural Language Processing (NLP): NLP is a field of AI focused on the interpretation and generation of human language by machines. In financial contexts, NLP may be applied to analyze textual data such as earnings reports to inform investment decisions.

Large Language Models (LLMs): LLMs are advanced NLP systems trained on extensive datasets to understand and generate human-like text. In investment management, LLMs may assist in synthesizing qualitative information or generating insights, but do not independently make investment decisions.

© 2026 Northern Trust Corporation. Head Office: 50 South La Salle Street, Chicago, Illinois 60603 U.S.A.