UCSD Machine Learning Engineering and AI Bootcamp Student Uses NLP to "Take on Wall Street"

While enrolled in the University of California San Diego’s Machine Learning Engineering and AI Bootcamp, Aditya Bahl watched with interest as a group of amateur traders on Reddit coordinated a short squeeze on the stock of GameStop, a video game and consumer electronics retailer, in an effort to take on large hedge funds. By buying GameStop stock cheaply (it had dropped to $2-$4 per share in March 2020, the lowest in the company’s history) in a coordinated way and agreeing not to sell for a period of time, the amateur traders drove up the price. That caused short-selling investors, who had bet that the price would fall, to lose money as the shares soared.

As part of his bootcamp capstone project, Bahl decided to create a machine learning algorithm that would analyze market sentiment using headlines from financial news outlets. “Tracking market sentiment can be a powerful tool for investors because understanding the mood of where the market is going can allow one to capitalize on the changing direction,” he wrote in a blog post on Medium, where he explained the project in detail. 

“My UCSD boot camp experience was an interesting one, I learned so many new things,” said Bahl, who is involved in a Microsoft initiative called TEALS to equip teachers with the skills necessary to teach their students about computer science. “The course has definitely changed my perspective on everything as machine learning is all around us and will continue to disrupt almost every industry.”

A former consultant at Deloitte, Bahl appreciated the ability to learn at his own pace and have weekly calls with his mentor. “He provided me with great industry insights and shared his experiences with me of how he advises different clients,” Bahl said of his mentor, Zuraiz Uddin, a data scientist at Teradata. 

About the project

Bahl built his sentiment classifier using natural language processing (NLP), a subfield of artificial intelligence that combines computational linguistics with statistics, machine learning, and deep learning to enable computers to “understand” the meaning of text and audio. While NLP models used to be trained for highly specific tasks (e.g., handling rote customer service requests in a chatbot), today’s models tend toward general-purpose language representation: they are trained on an enormous dataset of unannotated text so that they can respond in a variety of situations.

Bahl trained his model on a dataset of 4,840 sentences from financial news headlines, each categorized by sentiment. The dataset had been annotated by 16 professionals with backgrounds in financial markets. The goal of the project was to build a sentiment classifier that would determine the polarity of news headlines. Bahl experimented with three approaches for this project: the rule-based VADER sentiment analyzer, Google’s BERT model, and XLNet, a model developed by researchers at Carnegie Mellon University and Google.
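To give a sense of what "determining the polarity" of a headline means in practice, here is a minimal sketch of a rule-based scorer in the general spirit of lexicon tools such as VADER. It is not VADER itself and not Bahl's code: the word lists, the `headline_polarity` function, and the three-way positive/negative/neutral labeling are all assumptions made for illustration.

```python
import re

# Hypothetical, hand-picked word lists for illustration only. A real lexicon
# such as VADER's is far larger and assigns graded intensities to each word.
POSITIVE = {"surge", "surges", "gain", "gains", "profit", "beats", "rises", "growth"}
NEGATIVE = {"loss", "losses", "falls", "drops", "plunge", "plunges", "misses", "lawsuit"}
NEGATIONS = {"not", "no", "never"}


def headline_polarity(headline: str) -> str:
    """Label a headline 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", headline.lower())
    score = 0
    for i, token in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        value = int(token in POSITIVE) - int(token in NEGATIVE)
        # Flip the sign when the word directly follows a simple negation.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in [
        "GameStop shares surge as retail traders pile in",
        "Retailer reports quarterly loss and falls short of forecasts",
        "Company schedules annual shareholder meeting",
    ]:
        print(headline_polarity(text), "-", text)
```

A scorer like this needs no training data, which is its main appeal. Models such as BERT and XLNet instead learn from labeled examples like the annotated headlines described above, and so can pick up context that a fixed word list misses.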

Media coverage—both positive and negative—has a substantial impact on a company’s stock price. Analyzing the sentiment of financial news headlines in the context of the GameStop debacle provides a proxy for how confident investors are feeling about the market under the circumstances. 

“Most traders get their information from the news, which makes it an influential factor in forecasting change in the stock market,” Bahl wrote. 

His model tracks sentiment for a stock and the volume of shares being traded, and it takes historical stock prices into account. Due to time constraints, Bahl could only take his project so far. In the future, he would like to use his model as the basis for training a bot that trades stocks based on time series data, sentiment data, and knowledge graphs, thereby enabling investors to capitalize on stocks with the greatest potential.
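The article does not describe how these three inputs are combined, so the following is only a sketch of one plausible arrangement: a per-day feature row that pairs an average headline sentiment score with that day's trading volume and the close-to-close price change. The `DailyFeatures` class, the `build_features` function, and all of the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DailyFeatures:
    date: str
    mean_sentiment: float  # average headline score for the day (0.0 if no headlines)
    volume: int            # shares traded that day
    daily_return: float    # change from the previous close to this close


def build_features(closes, volumes, headline_scores):
    """Combine price history, volume, and sentiment into one row per trading day.

    closes and volumes map ISO date strings to values; headline_scores maps
    ISO date strings to a list of per-headline sentiment scores.
    The first date is skipped because it has no previous close.
    """
    dates = sorted(closes)  # ISO date strings sort chronologically
    rows = []
    for prev, cur in zip(dates, dates[1:]):
        scores = headline_scores.get(cur, [])
        rows.append(
            DailyFeatures(
                date=cur,
                mean_sentiment=mean(scores) if scores else 0.0,
                volume=volumes[cur],
                daily_return=closes[cur] / closes[prev] - 1.0,
            )
        )
    return rows


if __name__ == "__main__":
    # Made-up prices, volumes, and scores; not real market data.
    closes = {"2021-01-04": 10.0, "2021-01-05": 15.0, "2021-01-06": 30.0}
    volumes = {"2021-01-04": 1000, "2021-01-05": 5000, "2021-01-06": 9000}
    headline_scores = {"2021-01-05": [1.0, 0.0]}
    for row in build_features(closes, volumes, headline_scores):
        print(row)
```

Rows like these could serve as the time series input to the kind of trading bot Bahl describes as future work, though that step is beyond what the project covered.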

To see a write-up of the sentiment analysis project, ‘Using NLP to Take on Wall Street,’ in Bahl’s own words, you can view his blog post on Medium.com.