
Chess, upside down helicopters and debt management

4 min read
January 9, 2025
September 23, 2021

Chess, inverted helicopter flight and debt management are seemingly unrelated topics. However, their common ground is that they are all great examples of applied reinforcement learning.

Reinforcement learning is a subset of artificial intelligence where a model learns to take actions from a particular state of a system to try to achieve an objective. For example, in chess this would be deciding the next move in order to win a match, or adjusting the controls and rotors to hover a helicopter.
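The core loop can be sketched with tabular Q-learning, one of the simplest reinforcement learning algorithms. The toy "chain" environment below is purely illustrative (it is not related to chess, helicopters or debt): the agent starts at one end of a five-state chain and is rewarded only for reaching the other.

```python
import random

# Toy Q-learning sketch: a 5-state "chain" where the agent starts at
# state 0 and is rewarded only on reaching state 4. Illustrative only.
N_STATES = 5
ACTIONS = ["left", "right"]

def step(state, action):
    """Apply an action; reward 1.0 only when the final state is reached."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, state):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally; otherwise exploit the current estimate.
            action = random.choice(ACTIONS) if random.random() < epsilon else greedy(q, state)
            nxt, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the reward
            # plus the discounted value of the best next action.
            target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = train()
# The learned policy: the best action in every non-terminal state is "right".
policy = {s: greedy(q, s) for s in range(N_STATES)}
```

AlphaZero and the helicopter controller follow the same spirit, learning a policy by trial and error, just with vastly richer states, actions and models.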

Reinforcement learning is a particularly powerful technique when it's infeasible to capture a system in an exact mathematical model. Helicopters have highly nonlinear dynamics, which makes autonomous flight difficult. However, human pilots don't learn to fly by solving equations, but by experiencing how tilting the joystick affects their position. The same process can be used to train a model to fly a helicopter and perform difficult maneuvers without it ever being taught the underlying mathematics. Andrew Ng's team at Stanford University demonstrated this by using reinforcement learning to teach a helicopter to hover upside down.

The largest-scale and most famous application of reinforcement learning is AlphaZero, by Google’s DeepMind. AlphaZero is an algorithm which learns to play board games from scratch by playing against itself millions of times. It is never shown examples of good strategy; instead it learns for itself by exploring different moves and evaluating how effective they are at winning the game.

Source: DeepMind

DeepMind’s application of reinforcement learning is so powerful that AlphaZero became the strongest chess player in the world. After learning chess for just nine hours, it was able to comfortably beat Stockfish, then the strongest chess engine in the world, and it is speculated that no human player will ever beat it.

Since AlphaZero learnt chess by itself, it developed a free-flowing, innovative style which differs greatly from that of other chess engines. Top players have even taken inspiration from AlphaZero’s style and begun incorporating some of its ideas into their own games.

Reinforcement Learning for debt management

Chess and debt management are of course completely different. However, the techniques which revolutionised chess can be used to greatly improve the debt management process for customers and organisations.

The role of a debt collection agency is to send outbound communications to a customer to encourage them to pay off their outstanding balance and become debt-free. Traditional debt collection agencies do this with rigid messaging strategies, where every customer receives the same communications at the same points in the collections process, alongside manual phone calls from agents.

Traditional agencies usually use an intense messaging strategy to improve recovery rates, which often results in customers being barraged by emails, letters and phone calls. If a customer is never going to be in a position to repay their debt, then what is the purpose of contacting them twice a week?

To improve this process we developed the Ophelos Decision Engine. Our aim is to help customers get out of debt whilst reducing the number of communications they receive.

The Ophelos Decision Engine

The debt management process closely resembles a reinforcement learning problem. We have a goal: to help customers become debt free, and we have actions we can take: sending a communication (SMS, Email, Letter, Phone Call) or not sending a communication.
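As a sketch, the framing above might look like this in code. The state fields, action names and reward shape are all illustrative assumptions, not Ophelos's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical framing of debt management as an RL problem. The state
# fields, actions and reward shape are illustrative assumptions only.
class Action(Enum):
    SMS = "sms"
    EMAIL = "email"
    LETTER = "letter"
    PHONE_CALL = "phone_call"
    DO_NOTHING = "do_nothing"

@dataclass(frozen=True)
class CustomerState:
    days_since_last_contact: int
    communications_sent: int
    balance_outstanding: float
    has_payment_plan: bool

def reward(action: Action, became_debt_free: bool) -> float:
    """Goal signal: +1 when the customer clears their balance, with a
    small cost per communication to discourage over-contacting."""
    r = 1.0 if became_debt_free else 0.0
    if action is not Action.DO_NOTHING:
        r -= 0.01
    return r
```

The key design choice in any framing like this is the reward: rewarding repayment alone would push the model towards more contact, so the per-communication cost encodes the goal of reaching debt freedom with fewer messages.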

One of the key differences between Ophelos solving the communications problem and AlphaZero solving a chess game is that we cannot run millions of simulations to train the model. There are no consequences to playing a game of chess against yourself to discover successful moves, but the consequences of poor debt management are huge. It would be incredibly irresponsible to allow a reinforcement learning model to learn from scratch by experimenting with sending communications to real customers.

To avoid this problem, the Ophelos Decision Engine was pre-trained on historical data. This gives the model a head start in learning the effect that communications have on a customer’s likelihood of paying off their balance, without it having to experiment from scratch.
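A heavily simplified sketch of what pre-training on logged data might look like: counting, per customer segment, how often repayment followed a communication versus no communication. The record format and `pretrain` helper are hypothetical; a production system would use a far richer model.

```python
from collections import defaultdict

# Hypothetical pre-training sketch: from logged records, estimate how
# often repayment followed a communication versus no communication,
# per customer segment. Record fields are illustrative assumptions.
def pretrain(history):
    """history: iterable of (segment, was_contacted, paid) tuples."""
    counts = defaultdict(lambda: [0, 0])  # key -> [times_paid, total]
    for segment, was_contacted, paid in history:
        bucket = counts[(segment, was_contacted)]
        bucket[0] += int(paid)
        bucket[1] += 1
    return {key: times_paid / total for key, (times_paid, total) in counts.items()}

# Tiny made-up log: two contacted customers (one paid), two uncontacted (none paid).
logged = [
    ("new", True, True),
    ("new", True, False),
    ("new", False, False),
    ("new", False, False),
]
repayment_rates = pretrain(logged)
```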

The Ophelos Decision Engine runs every day, analysing our past interactions with each customer. It then calculates whether or not sending a communication will improve the chances of that customer becoming debt-free in the long-term. If that chance improves then a communication is sent. If there is no benefit to sending a communication then nothing is sent.
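That daily decision rule can be sketched as a simple comparison of two model estimates. `estimate_debt_free_prob` and the toy model below are hypothetical stand-ins, not the actual engine:

```python
# Hypothetical sketch of the daily decision rule: send a communication
# only if the model estimates it improves the customer's long-term
# chance of becoming debt-free. `estimate_debt_free_prob` is a
# stand-in for a trained model, not the actual engine.
def should_send(customer, estimate_debt_free_prob):
    p_if_sent = estimate_debt_free_prob(customer, send=True)
    p_if_silent = estimate_debt_free_prob(customer, send=False)
    return p_if_sent > p_if_silent

# Toy stand-in model: contacting only helps customers who haven't
# heard from us in over a week; otherwise it is counter-productive.
def toy_model(customer, send):
    base = 0.3
    if not send:
        return base
    return base + (0.1 if customer["days_since_last_contact"] > 7 else -0.05)
```

Note that "do nothing" is the default: a communication has to earn its place by improving the estimated long-term outcome.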

The Outcome

Naively, we expected the Ophelos Decision Engine to want to send more communications in an attempt to prompt responses from customers. However, we discovered that the model actually wanted to send fewer. In the same way that chess players are learning from AlphaZero’s style, we have discovered that over-communicating is not only wasteful but even has a harmful effect on customers’ progress in clearing their debts.

The Ophelos Decision Engine achieved the desired result in a controlled live test. It was able to help more customers become debt-free while sending fewer communications than a traditional messaging strategy.

If you would like to discover more about how we're improving the customer experience, why not get a demo?