One of the biggest challenges for a modern business is learning to use all of the data available to it in a way that is both meaningful and actionable. However, the potential of the data generated by a website is often left unexplored, and as a result, the intentions and reactions of individual digital customers can be overlooked.
Focus is often placed on the broad strokes - key metrics such as the number of page views this month, or the number of unique visitors. While these figures have their place, relying on them alone means we lose the ability to shape an individual customer's journey, or to identify the customers who need engagement most. As a result, customers who may be on the verge of signing up for a trial, completing a checkout, or reaching any other desirable outcome can fall through the cracks. We know the outline of the picture, but we are missing the shades and complexities needed to fully understand our customers' online experience.
On the average website, there is an abundance of information to be collected about who interacts with your site and how. By leveraging all of this data, we can gain insights into customer behavior. Machine learning techniques can be used to determine which customers may be interested in achieving an outcome on your site.
For instance, if a customer is not en route to achieving a desirable outcome, a content offer or a chat offer could help to steer them in the right direction.
Predicting customer behavior can tell you which customers to reach out to on your site, in real time, to convert website visits into tangible outcomes.
So how do we do this? We can use machine learning techniques to create a model, using data collected about customer behavior to date. This model will then tell us how likely a customer is to achieve an outcome, based on what we know about that particular customer.
Creating this model can be broken down into a few key steps:
- Gather appropriate data
- Prepare and transform data
- Choose a machine learning algorithm
- Train, test and re-evaluate your model
Let’s look at these steps in a bit more detail.
1. Gather appropriate data
When trying to predict the likelihood of an event occurring, we look at what has happened so far. We begin by gathering data about every customer visit to the site. This includes demographic information such as location and device type, as well as behavioral data such as how many pages they have viewed and how long they were on the site. To data scientists, these are known as features. We also record whether or not a customer has achieved a particular outcome. These are known as labels.
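To make this concrete, a single visit could be recorded as a set of features plus a label. The field names and values below are purely illustrative, not from any particular analytics platform:

```python
# A hypothetical visit record. The feature names and values are
# illustrative only; a real system would collect many more.
visit = {
    "country": "IE",          # demographic feature
    "device_type": "mobile",  # demographic feature
    "pages_viewed": 7,        # behavioral feature
    "session_seconds": 342,   # behavioral feature
}

# The label records whether this customer achieved the outcome
# we care about (e.g. signed up for a trial): 1 if yes, 0 if no.
label = 1
```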
From here, the premise is simple enough: if we are aware of the features of customers who have previously achieved an outcome, future customers with similar combinations of features are the most likely to also achieve this outcome.
2. Prepare and transform data
This step, while often overlooked, is usually the most work-intensive. Now that we have collected relevant data, we must transform it into a form that can be used with a machine learning algorithm. Categorical data, such as location or device type, usually needs to be binary-encoded so that the algorithm can work with it. Numerical data often needs to be normalized, as many machine learning algorithms perform better when numbers are scaled between 0 and 1. For instance, the number of pages a customer has viewed would be normalized. We apply these techniques to both the features and the labels, with the labels requiring binary encoding.
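A minimal sketch of these two transformations, in plain Python for illustration; in practice a library such as scikit-learn or pandas would handle this:

```python
def one_hot(value, categories):
    """Binary-encode a categorical value as a 0/1 vector over known categories."""
    return [1 if value == c else 0 for c in categories]

def min_max_scale(x, lo, hi):
    """Scale a numeric value into the range [0, 1], given the observed min/max."""
    return (x - lo) / (hi - lo)

# Categorical feature: device type becomes three binary columns.
print(one_hot("mobile", ["desktop", "mobile", "tablet"]))  # [0, 1, 0]

# Numeric feature: pages viewed, scaled against an assumed range of 1 to 25.
print(min_max_scale(7, lo=1, hi=25))  # 0.25
```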
Sometimes a feature gives us very little information and can even be detrimental to overall performance; in such cases it is more advantageous to omit the feature from the model than to leave it in. This is where feature selection comes in: the process of deciding which features to use for the model. While some techniques may not require feature selection in all cases, it is a key step for most machine learning algorithms.
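One simple, filter-style approach is to score each feature and keep the top k. The sketch below scores features by the difference in their mean value between converting and non-converting visitors; this is an illustrative heuristic, and real pipelines typically use richer criteria (e.g. mutual information, or scikit-learn's SelectKBest):

```python
def select_top_k(rows, labels, feature_names, k):
    """Keep the k features whose mean value differs most between classes."""
    def mean(vals):
        return sum(vals) / len(vals) if vals else 0.0
    scores = {}
    for i, name in enumerate(feature_names):
        pos = [r[i] for r, y in zip(rows, labels) if y == 1]
        neg = [r[i] for r, y in zip(rows, labels) if y == 0]
        scores[name] = abs(mean(pos) - mean(neg))
    return sorted(feature_names, key=lambda n: scores[n], reverse=True)[:k]

# Toy data: pages_viewed separates converters well; bounce_score less so.
rows = [[3, 0.9], [8, 0.1], [2, 0.8], [9, 0.2]]
labels = [0, 1, 0, 1]
print(select_top_k(rows, labels, ["pages_viewed", "bounce_score"], k=1))
# ['pages_viewed']
```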
Although different algorithms may require slightly different steps to prepare the data, the above process is common for the majority of them.
When the data is prepared, we split it into three subsets:
- A training set, which we use to build our model. This is usually 60 to 80 percent of the dataset, but it can vary.
- A validation set, which we use to compare the performance of our model using different parameters for our algorithm of choice. We then select the parameters that maximize our accuracy. This is usually 10 to 20 percent of the dataset.
- A test set, which is not used in creating the model. This is usually 10 to 20 percent of the dataset. Its purpose is to evaluate the performance of the fully trained model on unseen data from the same distribution.
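A 60/20/20 split along these lines could be sketched as follows; the proportions and the random seed are arbitrary choices:

```python
import random

def split_dataset(examples, seed=42):
    """Shuffle and split examples into 60% train, 20% validation, 20% test."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    train_end = int(n * 0.6)
    val_end = int(n * 0.8)
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```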
3. Choose a machine learning algorithm
When calculating the probability of an event occurring, there are various machine learning techniques to choose from. In our case, we are specifically looking at supervised machine learning, where we are constructing a model from labelled training data. The model describes the relationship between the features and the labels and allows us to predict if a customer will get an individual label based on the set of features related to that customer.
Some supervised machine learning techniques include decision trees, regression, Bayesian methods and deep learning (neural networks). Many of these algorithms also have parameters which must be tuned to achieve the best accuracy. Some algorithms have very few parameters to set, while others, such as neural networks, have quite a few and can require some investigation. We are currently doing some work using neural networks to predict user behavior. While they can require a lot of tuning, neural networks are a very powerful tool for making predictions, and with recent advancements (such as GPU-accelerated TensorFlow) they can build models from data at unprecedented scale.
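As a tiny, self-contained illustration of supervised learning, here is a logistic regression classifier trained with gradient descent. This is a plain-Python sketch on made-up data, not the production approach described above; in practice a library such as scikit-learn or TensorFlow would be used:

```python
import math

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by per-example gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
            err = p - yi                      # gradient of log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(w, b, xi):
    """Probability that this customer achieves the outcome."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy, already-normalized data: one feature, higher value -> more likely to convert.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.85]) > 0.5)  # True
```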
4. Train, test and re-evaluate the model
Having chosen a machine learning technique and prepared our training data, it’s time to train a model. We pass each set of features along with its corresponding label through the algorithm.
This generates the initial model.
We then use our validation data to tune our model: we check how well the model does when trained with various different parameters, and pick those that maximize our chosen performance metric. Accuracy is one well-known performance measure, and given the right circumstances it can reach in excess of 95%. However, in some cases it is better practice to use a metric other than accuracy when creating your model. This can happen when the consequences of false positives outweigh those of false negatives, or vice versa.
In such cases, maximizing a metric such as the F1 score may prove to be better practice, as it takes the numbers of false positives and false negatives into account. This allows us to tune the model to minimize cases where it predicts a conversion for a user who did not actually convert, or a lack of conversion for a user who eventually did convert.
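The difference between the two metrics shows up clearly on imbalanced data. In the hypothetical example below, a model that never predicts a conversion scores 90% accuracy but an F1 of zero:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [0] * 9 + [1]   # only 1 in 10 visitors converts
y_pred = [0] * 10        # a model that never predicts a conversion
print(accuracy(y_true, y_pred))  # 0.9
print(f1_score(y_true, y_pred))  # 0.0
```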
Once a satisfactory level of performance has been reached on the validation set, we use the test set to assess the performance of the fully trained model on unseen data, to see how it generalizes to data it hasn’t encountered before. If performance remains satisfactory, we now have a model that can be used to predict whether or not a customer will achieve a particular outcome, based on the features of that customer.
It is important to remember that performance, while it matters, should not be the only concern when creating a model. In some cases, increasing the accuracy of a model that is already very accurate can be both costly and time-consuming, with very little return. Integrating a model into a product is a long and complex process that is often overlooked. For users to make full use of a model's predictions, they must be able to access them easily, through a clear interface, and in real time.
In this blog post we have discussed how machine learning can predict customer behavior. Techniques such as deep learning can be successfully applied to customer data to train high-accuracy models, which can then be used in real time to produce accurate, personalized predictions. These insights can prove hugely beneficial to a business looking to engage with the right customers at the right time.