What does machine learning mean for financial markets?
Machine learning brings many new tools. Having said that, I would argue that finance has implicitly used machine learning principles from the beginning. Chicago Booth’s Eugene F. Fama and Dartmouth’s Kenneth R. French selected three variables out of thousands of candidates as important for explaining variation in asset returns. They relied on economic acumen rather than algorithms, but the spirit was exactly the same: what they did was effectively a form of variable selection, which ML now streamlines. Machines are imitating what humans were already doing.
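To make the variable-selection idea concrete, here is a minimal sketch using the lasso, one common ML selection method. It is an illustration on simulated data, not the procedure Fama and French followed: the data, the penalty `lam`, and the function names are all assumptions made for the example. Three of 100 candidate predictors truly drive the simulated returns, and the penalty shrinks the coefficients of the rest to exactly zero.

```python
import numpy as np

def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within lam of zero become 0."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-column mean square
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual, adding back its own fit
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_beta_j = soft_threshold(rho, lam) / col_sq[j]
            residual += X[:, j] * (beta[j] - new_beta_j)
            beta[j] = new_beta_j
    return beta

# Simulated data (hypothetical): 100 candidate predictors, only the first 3 matter
rng = np.random.default_rng(0)
n, p = 1000, 100
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -0.8, 0.6]
y = X @ true_beta + rng.standard_normal(n)

beta_hat = lasso_coordinate_descent(X, y, lam=0.25)
selected = np.flatnonzero(beta_hat)        # indices with nonzero coefficients
print("selected predictors:", selected)
```

The penalty here is fixed by hand; in practice it would be chosen by cross-validation, and a larger penalty selects fewer variables.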
The industry has long borrowed insights from academic research, such as Harry Markowitz’s modern portfolio theory, the Black-Scholes formula used in options markets, and the Fama-French factor model. But when it comes to ML, six years ago industry (or at least parts of it) was ahead of academic researchers. The head of a major hedge fund once told me that he didn’t even read academic finance papers.
Yale’s Bryan T. Kelly and I decided to write a paper introducing ML to academic finance. How do you use hundreds of variables to make better predictions about returns? That’s where we started. From there, we established a rationale for adopting that approach. The paper has attracted attention from both academics and Wall Street, a sign of growing interest in the field. We recently coauthored a survey summarizing developments to date.
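The question of how to use hundreds of variables at once can be illustrated with a small simulation. The sketch below is not the method from our paper; it uses ridge regression as a simple stand-in for regularized ML, and the sample sizes, signal strength, and penalty value are assumptions chosen for illustration. With 200 weak predictors and only 300 training observations, ordinary least squares overfits, while a penalized fit is expected to predict better out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 300, 2000, 200

# Hypothetical data: many predictors, each with a small true effect
beta = rng.normal(0.0, 0.05, size=p)
X_train = rng.standard_normal((n_train, p))
X_test = rng.standard_normal((n_test, p))
y_train = X_train @ beta + rng.standard_normal(n_train)
y_test = X_test @ beta + rng.standard_normal(n_test)

def ridge_fit(X, y, penalty):
    """Solve (X'X + penalty * I) b = X'y; penalty = 0 gives ordinary least squares."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(k), X.T @ y)

def out_of_sample_mse(coef):
    """Mean squared prediction error on the held-out test set."""
    return float(np.mean((y_test - X_test @ coef) ** 2))

mse_ols = out_of_sample_mse(ridge_fit(X_train, y_train, 0.0))
mse_ridge = out_of_sample_mse(ridge_fit(X_train, y_train, 400.0))  # penalty set by hand
print(f"OLS out-of-sample MSE:   {mse_ols:.2f}")
print(f"Ridge out-of-sample MSE: {mse_ridge:.2f}")
```

In practice the penalty would be tuned on a validation sample rather than fixed in advance.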
In the first paper, we introduced ML techniques such as trees and neural nets to help predict stock returns. Since then, we have moved on to alternative data in another study, illustrating the use of image recognition and natural language processing tools adapted from artificial intelligence. Alternative data include news feeds, among the largest data sets in terms of text. Because language is a highly complex system for encoding information, ML is needed to uncover the information embedded in text, and you need a large language model to read between the lines. We do not use ChatGPT in our research, as I started working on this in 2019, before ChatGPT was born. But we do use the kind of model behind it.
Today, the wealth management industry increasingly boasts ML capabilities and is attracting big names from the data science and ML communities. And that is before mentioning China, where the quant industry has grown to an incredible scale since 2019. Quite a few quantitative funds there now have assets under management of 10 billion yuan (approximately 1.5 billion US dollars).
However, finance is different from computer science, so you should be careful about introducing tools that may not be applicable to the market. We’re trying to demystify ML. We want to understand what its weaknesses and limitations are and how it can be improved. There is a lack of theoretical guidance.
One concern is the black-box nature of ML models. If a fund makes a loss, its manager needs to be able to explain to investors what happened. When your trading strategy is based on 1,000 variables instead of three, it’s hard to figure out exactly what went wrong. However, if you want more accurate predictions, you may have to accept some drawbacks. There is a trade-off between performance and interpretability.
Finance is a conservative field, and there are questions about how much better ML really is than simple models. There are also areas where it may not work as well, such as long-term forecasting, where economists must rely more on their intuition.
But there’s a lot you can do with alternative data. First we analyzed the numbers, then we analyzed the words. Now we’re also looking at context. Traditional approaches cannot account for context, but large language models can. I’m glad to have been an early adopter of ML technology and hope it continues to thrive. In a data-rich environment, ML has the potential to do far more than the best minds in economics could do without it.
Dacheng Xiu is professor of econometrics and statistics at Chicago Booth.