DEEP MULTILAYER NEURAL NETWORK FOR PREDICTING THE WINNER OF FOOTBALL MATCHES

1) Department of Intelligent Information Technology, Brest State Technical University, 267 Moskovskaya st., Brest, 224017, Belarus, anfilets_sv@mail.ru, bescase@gmail.com, vladimir.golovko@gmail.com, val.tut@gmail.com, dzinushi.kun@gmail.com
2) Department for Information Computer Systems and Control, Ternopil National Economic University, 11 Lvivska st., Ternopil, 46020, Ukraine, as@tneu.edu.ua, mko@tneu.edu.ua, pb@tneu.edu.ua, oso@tneu.edu.ua
3) Allbebet OU, 26 Majaka st., Tallinn, 11411, Estonia, raman@allbebet.io


INTRODUCTION
The emergence and development of modern intelligent technologies have raised many branches of human activity to a qualitatively new level and made previously unthinkable results achievable. A vivid example of the active evolution and integration of up-to-date technologies is modern sport. Today, sport involves more and more people and attracts growing financial, material and intellectual resources. All this has made sport an important political and economic component of the modern world.
It is no secret that football (soccer), one of the most popular sports, is a multi-billion international market with a very extensive infrastructure. The transfer value of the best players is estimated at tens of millions of dollars. For example, the football club Paris Saint-Germain bought the player Neymar from Barcelona for $260.9 million. According to the Forbes rating, the total value of the ten most expensive football clubs in 2017 was $23.29 billion.
One of the most attractive sports-related activities is predicting the outcomes of sports games. This task forms the basis of the betting business, and it is extremely difficult due to the unpredictable nature of sporting events, especially football matches. Many factors determine whether a goal is scored, including the strength of attack and defense, home ground advantage and other circumstances arising during the match. In addition, unpredictable factors such as the sending-off of a player (red card), a penalty, the referee's strictness and many others can affect the final score. As a result, the task of forecasting the outcomes of sporting events, that can include such problems as big data analysis, data [...]
Artificial neural networks have proven themselves in such tasks as prediction, pattern recognition, classification, control, robotics, etc. In our opinion, sport event predictive systems based on artificial neural networks are the most promising. The advantages of such systems are their flexibility, versatility and accuracy of prediction [14][15][16][17][18]. Such systems can be considered universal approximators of nonlinear dependences. However, their training and operation require large volumes of all kinds of statistical data. Nowadays, the largest providers of sports data for football are Wyscout [19] and Opta Sports [20]. Opta Sports collects and distributes full, time-stamped, contextual data live, featuring complete x/y coordinates (as well as z coordinates where applicable, such as shots in football) and a granularity of event type unique amongst data providers, whereas Wyscout focuses on collecting sports video data.
In this paper, we present the results of our research on developing an oracle for forecasting the results of football matches. We tested the system on matches of the English Premier League. The proposed approach is based on artificial neural network methods and uses limited open-access information about football team statistics, such as team shots, shots on target, the number of yellow and red cards, etc. The remainder of this paper is organized as follows. Section 3 describes the theoretical background of the applied methods. The forecasting results and a comparative analysis are presented in Section 4. The conclusion is given in Section 5.

DATA PREPROCESSING
Many different indicators can be used to describe the strength of a football team. Selecting the indicators that form a team rating is an important task: the chosen parameters must be highly informative and important for describing the team. The most significant parameters are the position in the standings, the number of points over a chosen time interval, the number of goals scored over a selected time interval, the number of goals conceded, etc. Table 1 shows the parameters selected for our system. These parameters can be downloaded from Football-Data.co.uk [22], which provides information about all Premier League games since 2002 (Fig. 1). We do not limit ourselves to information about the last match only. We consider the last 35 matches and calculate aggregated statistical indicators. In our opinion, such indicators are more informative and can be used to estimate the current state of a team.
The input pattern is aggregated by averaging each parameter in Table 1 over 35, 15, 10, 5 and 5-10 matches. Fig. 2 shows an example of aggregating (averaging) the input pattern using the Team Shots indicator.
In addition, we calculated the standard deviation over 35 matches and added it to the input pattern. The standard deviation provides information about the stability of the team, which is very important for forecasting the result.
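The aggregation described above can be illustrated with a minimal sketch for a single indicator such as Team Shots. The function name `aggregate_indicator` is hypothetical, and two points are assumptions rather than statements from the paper: the "5-10" window is read here as the 6th to 10th most recent matches, and the population standard deviation is used.

```python
from statistics import mean, pstdev

# Window lengths taken from the text.
WINDOWS = (35, 15, 10, 5)


def aggregate_indicator(history):
    """Aggregate one indicator (e.g. Team Shots) over a team's recent matches.

    `history` holds the indicator values in chronological order, oldest first.
    Returns the window means followed by the 35-match standard deviation.
    """
    if len(history) < 35:
        raise ValueError("need at least 35 past matches")
    recent = history[-35:]
    features = [mean(recent[-n:]) for n in WINDOWS]
    # Assumed meaning of the "5-10" window: 6th to 10th most recent matches.
    features.append(mean(recent[-10:-5]))
    # Assumption: population standard deviation as the stability measure.
    features.append(pstdev(recent))
    return features
```

Under these assumptions each indicator contributes six values to the input pattern of a team.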

Figure 2 - Example of aggregating the input pattern
After the aggregation, we obtain an input vector consisting of 108 values, 54 for each team. In addition, the input data are normalized using two statistical indicators, the average value and the standard deviation, which yields a more stable predictive model. The normalization formula is

$$x' = \frac{x - \mu}{\sigma},$$

where x is the original input value, x' is the normalized value, μ is the expected value and σ is the standard deviation.
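As a minimal sketch of this normalization, each input feature can be shifted by its mean and scaled by its standard deviation. The helper names `fit_normalizer` and `normalize` are hypothetical. Estimating μ and σ from the training data only, and the guard against zero deviation, are assumptions and are not stated in the paper.

```python
from statistics import mean, pstdev


def fit_normalizer(column):
    """Estimate mu and sigma for one input feature (assumed: from training data)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        sigma = 1.0  # assumed guard: a constant feature is left unscaled
    return mu, sigma


def normalize(value, mu, sigma):
    """Standardize a single value: subtract mu, then divide by sigma."""
    return (value - mu) / sigma


# Hypothetical feature values, for illustration only.
mu, sigma = fit_normalizer([2, 4, 4, 4, 5, 5, 7, 9])
scaled = normalize(9, mu, sigma)
```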

DEEP MULTILAYER NETWORK
For forecasting sports outcomes, we propose a deep multilayer neural network with elastic net regularization. This network structure was selected because it is capable of producing good predictions from limited data.
Zou and Hastie [23] proposed the elastic net as a new method for regularization and variable selection. It produces a sparse model with good prediction accuracy, while encouraging a grouping effect. This method demonstrates good results, especially when the number of predictors p is much larger than the number of observations n.
In our study, we used a multilayer neural network (Fig. 4). The input layer contains 108 neurons, since the dimension of the input pattern determines the number of input neurons. The network has three hidden layers with 128, 64 and 32 neurons, respectively, and 3 neurons in the output layer. Because it contains more than two hidden layers, it is called a deep multilayer neural network.
A prepared pattern (see Fig. 3) is fed to the network input. Next, three hidden layers with LeakyReLU activation functions process the input pattern. The three output neurons give the result, interpreted in the form win - draw - loss: the first output neuron corresponds to a victory of the home team, the second to a draw, and the third to a victory of the away team. The output layer uses the softmax activation function.
A distinctive feature of the elastic net is that it combines L1 and L2 regularization. The L1 regularization (as in Lasso regression) is used for parameter selection, while the L2 regularization (as in Ridge regression) controls network overfitting during learning (overfitting here manifests itself as the growth of model coefficients).
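The architecture and the regularization term can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the weight initialization, the LeakyReLU slope of 0.01, the exclusion of biases from the penalty and all function names are hypothetical choices.

```python
import math
import random

LAYER_SIZES = [108, 128, 64, 32, 3]  # input, three hidden layers, output


def init_network(seed=0):
    """Create small random weights and zero biases (assumed initialization)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def leaky_relu(x, slope=0.01):  # the slope value is an assumption
    return x if x > 0 else slope * x


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, pattern):
    """Return [P(home win), P(draw), P(away win)] for one input pattern."""
    activations = pattern
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            activations = softmax(sums)
        else:
            activations = [leaky_relu(s) for s in sums]
    return activations


def elastic_net_penalty(layers, l1, l2):
    """l1 * sum(|w|) + l2 * sum(w^2) over all weights (biases excluded)."""
    flat = [w for weights, _ in layers for row in weights for w in row]
    return l1 * sum(abs(w) for w in flat) + l2 * sum(w * w for w in flat)
```

During training, the penalty would be added to the classification loss, so that the L1 part pushes uninformative weights toward zero while the L2 part limits the growth of the remaining ones.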

CASE STUDY
To test the system developed for forecasting the results of English Premier League football matches, a data set consisting of 5018 patterns was used (i.e., the set contains information about all games since 2002). For early stopping of the learning process, we used a validation set equal in size to 15% of the training set. The following parameters were selected for training the deep multilayer neural network:
- coefficient for L1 regularization: 0.002;
- coefficient for L2 regularization: 0.0005;
- learning algorithm: SGD (stochastic gradient descent) with a step of 0.01;
- minibatch size: 64.
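To make these settings concrete, the sketch below shows how one SGD update with an elastic net term could look for a flat list of weights, together with a minibatch splitter. The function names are hypothetical. The penalty is assumed to have the form l1 * |w| + l2 * w^2, so its (sub)gradient is l1 * sign(w) + 2 * l2 * w; the paper does not state the exact convention.

```python
LEARNING_RATE = 0.01  # SGD step from the text
L1_COEF = 0.002       # L1 regularization coefficient from the text
L2_COEF = 0.0005      # L2 regularization coefficient from the text
BATCH_SIZE = 64       # minibatch size from the text


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def sgd_step(weights, data_grads, lr=LEARNING_RATE, l1=L1_COEF, l2=L2_COEF):
    """One SGD update; data_grads is the minibatch-averaged loss gradient."""
    return [w - lr * (g + l1 * sign(w) + 2.0 * l2 * w)
            for w, g in zip(weights, data_grads)]


def minibatches(patterns, size=BATCH_SIZE):
    """Yield consecutive minibatches; the last one may be smaller."""
    for start in range(0, len(patterns), size):
        yield patterns[start:start + size]
```

With a zero data gradient, both penalty terms shrink every nonzero weight slightly toward zero at each step, which is the mechanism behind the parameter selection and overfitting control.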
The training process takes approximately 5 minutes on a PC with the following configuration: GPU NVIDIA 1070 Ti, CPU Xeon E5-2680 v2, RAM 32 GB.
The trained system was tested on the last 350 Premier League matches, which were not included in the training and validation sets. Table 2 shows an example of the predicted outputs of the trained system for the match Wolves vs Fulham (score 1:0). The proposed system achieved a prediction accuracy of 61.14% on the test data set. This accuracy is calculated by dividing the number of correct predictions by the total number of matches.
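The accuracy computation can be sketched as follows. The predicted class is assumed to be the output neuron with the largest value, and the sample outputs below are invented for illustration. As a consistency check, 61.14% of 350 matches corresponds to 214 correct predictions, since 214 / 350 ≈ 0.6114.

```python
def predicted_class(outputs):
    """Index of the largest output: 0 = home win, 1 = draw, 2 = away win."""
    return max(range(len(outputs)), key=lambda i: outputs[i])


def accuracy(all_outputs, true_classes):
    """Share of matches whose predicted class equals the actual result."""
    hits = sum(predicted_class(o) == t
               for o, t in zip(all_outputs, true_classes))
    return hits / len(true_classes)


# Hypothetical network outputs for four matches and their actual results.
sample_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],
]
sample_results = [0, 1, 2, 2]
sample_accuracy = accuracy(sample_outputs, sample_results)
```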
As can be seen from the above, the proposed approach offers higher accuracy and low computational cost compared with existing approaches. For instance, the deep sparse autoencoder for football match prediction in [24] has an accuracy of 51.4%. In [25], various extensions of Bradley-Terry models were compared and a hierarchical log-linear Poisson model for predicting the outcomes of soccer matches was built; the best-performing hierarchical log-linear Poisson model showed 54% accuracy.
We employed the last 250 matches of the English Premier League as the test data set. Using the predictions of the developed system, users can bet on one team or the other in an upcoming match. Fig. 5 shows the profitability of sports bets placed using the outputs of the developed system. The value 100 on the y-axis is the starting amount in the bettor's account (for instance, 100 EUR). As can be seen in Fig. 5, the profit reaches 30.47 points (at bet365 odds), which is a good result. Although the developed system is capable of making a profit, the chart contains some sections with relatively large drawdowns; for example, the trend from the 152nd to the 190th match is downward.
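A profitability curve of this kind can be reproduced with a simple simulation. The sketch below assumes flat stakes of one unit per match and decimal odds; the paper does not state the staking scheme, so both are assumptions, and the function names and sample bets are hypothetical.

```python
def simulate_bets(bets, start=100.0, stake=1.0):
    """Track the account balance over a sequence of flat-stake bets.

    `bets` is a list of (decimal_odds, won) pairs, one per match.
    Returns the balance after every bet, starting from `start`.
    """
    balance = start
    curve = []
    for odds, won in bets:
        # A winning bet pays stake * (odds - 1); a losing bet costs the stake.
        balance += stake * (odds - 1.0) if won else -stake
        curve.append(balance)
    return curve


def max_drawdown(curve, start=100.0):
    """Largest peak-to-trough fall of the balance curve."""
    peak, worst = start, 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


# Hypothetical odds and outcomes, for illustration only.
sample_bets = [(2.0, True), (1.5, False), (3.0, True), (2.0, False), (2.0, False)]
sample_curve = simulate_bets(sample_bets)
sample_drawdown = max_drawdown(sample_curve)
```

The drawdown measure makes downward stretches of the curve, such as the one noted between the 152nd and 190th matches, directly comparable across staking schemes.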

CONCLUSION
An approach to predicting the results of sports competitions is proposed. It is based on a deep multilayer neural network with elastic net regularization and can be trained on a limited open-access data set. Instead of using raw data, the authors selected the most valuable parameters and designed a data preprocessing procedure that builds a representative description of team condition and forms the input patterns. The developed system showed good prediction results and is able to make a profit, so it can be used as the basis of an oracle for predicting the results of sporting events. The main advantages of the proposed approach are its simplicity and low computational complexity.
The system can be improved by using more specialized data sets from paid sources and by developing a more complex neural network architecture. For instance, in future work we are going to employ a deep autoencoder for preliminary data processing and a deep multilayer perceptron for forecasting.
This research is the start of our ambitious project to develop and implement advanced artificial intelligence approaches for professional sport.

ACKNOWLEDGMENTS
We thank the Allbebet OU staff who provided insight and expertise that greatly assisted this research.