APPROACH TO THE SYNTHESIS OF NEURAL NETWORK STRUCTURE DURING CLASSIFICATION

Evaluating the number of hidden neurons necessary for solving pattern recognition and classification tasks is one of the key problems in artificial neural networks. The multilayer perceptron is the most widely used artificial neural network for estimating functional structure in classification. In this paper, we show that a feed-forward artificial neural network with two hidden layers, d inputs, d neurons in the first hidden layer, 2d + 2 neurons in the second hidden layer, k outputs and an infinitely differentiable sigmoidal activation function can solve classification and pattern recognition problems with arbitrary accuracy. This result can be applied to design pattern recognition and classification models whose structure is optimal in the number of hidden neurons and hidden layers. Experimental results over well-known benchmark datasets show that the convergence and the accuracy of the proposed artificial neural network model are acceptable. The findings in this paper are experimentally analyzed on four different datasets from a machine learning repository.


INTRODUCTION
The past two decades have seen enormous change in the field of artificial neural networks and their applications. In particular, there has been considerable progress in multi-layer feed-forward neural networks [1][2][3][4]. The application areas of artificial neural networks span wide fields in computer science, biology, decision science, medicine, finance, engineering, etc. When solving applied problems, one can use various kinds of neural network architectures with appropriately chosen activation functions. If a constructed artificial neural network model is not capable of solving a specific problem or of reaching acceptable precision, practitioners increase the number of hidden units (neurons) in the layers and/or add hidden layers to the network model. However, in applications it is necessary to know how many neurons and how many layers one should take in the neural network architecture. Of course, the larger the number of neurons and hidden layers, the higher the probability that the network gives more precise results. Unfortunately, practicality decreases as the number of neurons and hidden layers in the neural network model grows.
However, from a theoretical point of view, the problems of lower and upper bounds on the number of neurons in hidden layers, and on the number of hidden layers, for classification and pattern recognition architectures have not been completely studied yet. For instance, one of the existing findings is related to binary neural networks [18]. The authors defined a structure called the most isolated samples in the Boolean space and proved a lower bound on the number of neurons necessary in a binary neural network. However, these kinds of findings are restricted to specific situations and do not provide a fundamental theory for studying the computational capabilities of neural networks in general. Moreover, those existing results cannot be extended to feedforward neural networks, such as layered feedforward neural networks with general activation functions.
In addition, previous works have evaluated the number of hidden layer neurons in relation to the capabilities of feed-forward networks [19][20][21][22][23]. In [24,26], the authors proposed applying the singular value decomposition (SVD) in neural networks to estimate the number of hidden neurons. However, computing the singular value decomposition is a time-consuming process. Furthermore, in [27] a new approach was proposed to fix the number of hidden neurons of a multi-layer perceptron (MLP) architecture. It consists of the post-training application of the singular value decomposition and principal component analysis (PCA) to adjust the parameters of the network. The number of hidden neurons is then fixed to the number of significant singular values or eigenvalues obtained from these matrix operations.
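The post-training SVD idea of [24,26,27] can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the relative tolerance, and the toy low-rank activation matrix are all assumptions made for the example. The idea is to take the matrix of hidden-layer activations over the training set and count the singular values that are significant relative to the largest one; that count suggests how many hidden neurons are actually needed.

```python
import numpy as np

def estimate_hidden_neurons(activations, tol=1e-6):
    """Estimate an adequate hidden-layer size from a post-training
    activation matrix (samples x hidden units): count singular values
    that are significant relative to the largest one."""
    s = np.linalg.svd(activations, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s / s[0] > tol))

# Toy demonstration: a 200 x 10 activation matrix of effective rank 3
# should suggest keeping only 3 hidden neurons.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
print(estimate_hidden_neurons(low_rank))  # prints 3 for this rank-3 matrix
```

As the paper notes, the full SVD costs roughly O(n * h^2) for n samples and h hidden units, which is why this post-training criterion is considered expensive.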
In [28], the authors investigated bounds on the number of hidden neurons in a special type of network called multi-valued multithreshold neural networks. They studied the properties of neural networks with q-valued functions defined on some space. The obtained results can be applied to build learning algorithms with finite training sets.
In other words, neural networks are not always effective if the number of neurons in the hidden layers and the number of layers are prescribed without theoretical background. In the current paper, we show that there exists an optimal architecture for neural networks with two hidden layers. Namely, we show that a two hidden layer neural network with an infinitely differentiable sigmoidal activation function σ, d neurons in the first hidden layer, 2d + 2 neurons in the second hidden layer and k outputs can solve classification and pattern recognition problems within any given precision.
A single hidden layer neural network model with r neurons in its hidden layer has the form

N(x) = sum_{i=1..r} c_i * σ(w^i · x − θ_i),

where w^i are the input weight vectors, θ_i are the threshold values, c_i are real coefficients, and σ is a univariate function, which is called the activation function in the neural network literature.
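The single hidden layer model above can be sketched directly; this is a minimal illustration with randomly drawn weights (the function names, dimensions and the use of the logistic function for σ are assumptions for the example, not part of the paper's construction).

```python
import numpy as np

def sigma(t):
    # Logistic function: a standard infinitely differentiable
    # sigmoidal activation, used here only for illustration.
    return 1.0 / (1.0 + np.exp(-t))

def single_hidden_layer(x, W, theta, c):
    """N(x) = sum_i c_i * sigma(w^i . x - theta_i) with r hidden neurons.
    W: (r, d) weight matrix, theta: (r,) thresholds, c: (r,) coefficients."""
    return c @ sigma(W @ x - theta)

rng = np.random.default_rng(1)
d, r = 3, 5
x = rng.normal(size=d)
W, theta, c = rng.normal(size=(r, d)), rng.normal(size=r), rng.normal(size=r)
print(single_hidden_layer(x, W, theta, c))  # a single real-valued output
```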

PROBLEM STATEMENT
A two hidden layer neural network architecture is defined by iterating hidden layers. In general, an artificial neural network with two hidden layers, input x = (x_1, ..., x_d), r neurons in the first hidden layer, s neurons in the second hidden layer and output y = (y_1, ..., y_k) is as follows:

y_p = sum_{i=1..s} d_{pi} * σ( sum_{j=1..r} c_{ij} * σ(w^j · x − θ_j) − γ_i ),   p = 1, ..., k,

where w^j are weight vectors, θ_j and γ_i are thresholds, and c_{ij}, d_{pi} are real coefficients. The sigmoidal activation function is a special type of function which plays a significant role in the research of artificial neural networks, especially in classification and pattern recognition problems.
In the current paper, we use a sigmoidal activation function, that is, a function σ satisfying σ(t) → 0 as t → −∞ and σ(t) → 1 as t → +∞.
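The sigmoidal limit property can be checked numerically. A minimal sketch, using the logistic function as one concrete example of an infinitely differentiable sigmoidal function (the specially constructed activation of [31] is different; this stand-in is an assumption of the example):

```python
import math

def sigma(t):
    # Logistic sigmoid: infinitely differentiable, with
    # sigma(t) -> 0 as t -> -inf and sigma(t) -> 1 as t -> +inf.
    return 1.0 / (1.0 + math.exp(-t))

print(sigma(-50.0) < 1e-10)        # True: left limit approaches 0
print(1.0 - sigma(50.0) < 1e-10)   # True: right limit approaches 1
```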

THE MAIN RESULT
Our result is based on the universal approximation property of neural networks: a neural network can approximate any continuous function on any compact subset of R^d with any given precision. Kurkova's results [29,30] showed that an arbitrary continuous function can be approximated arbitrarily well by a two hidden layer neural network with a univariate sigmoidal activation function. However, according to these findings, the number of neurons (units) in the hidden layers needed to implement the approximation is exceedingly large. The problem of fixing the number of neurons in the hidden layers was first solved by Maiorov and Pinkus [32]. They showed that there exists a two hidden layer network with a sigmoidal activation function that suffices to approximate any continuous multivariate function arbitrarily well.
Later, the best result for two hidden layer feed-forward networks was obtained by Vugar Ismailov [31]. Namely, he showed that a two hidden layer neural network with d inputs, d neurons in the first hidden layer, 2d + 2 neurons in the second hidden layer and a specifically constructed, infinitely differentiable sigmoidal activation function can approximate any continuous multivariate function with any given (arbitrary) precision.
We extend the above-mentioned approximation properties to neural networks with multivariate outputs. Namely, we show that a two hidden layer neural network with d inputs, d neurons in the first hidden layer, 2d + 2 neurons in the second hidden layer, k outputs and an infinitely differentiable sigmoidal activation function can solve classification and pattern recognition problems with arbitrary accuracy.
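The architecture claimed above can be sketched as a forward pass. This is only a shape-level illustration with random weights: the logistic sigmoid stands in for the specially constructed activation of [31], and all names and dimensions beyond d, 2d + 2 and k are assumptions of the example.

```python
import numpy as np

def sigma(t):
    # Logistic sigmoid, standing in for the specially constructed
    # infinitely differentiable sigmoidal activation of [31].
    return 1.0 / (1.0 + np.exp(-t))

def two_hidden_layer_net(x, params):
    """Forward pass of a network with d inputs, d neurons in the first
    hidden layer, 2d + 2 neurons in the second, and k outputs."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = sigma(W1 @ x + b1)    # first hidden layer: d units
    h2 = sigma(W2 @ h1 + b2)   # second hidden layer: 2d + 2 units
    return W3 @ h2 + b3        # k linear outputs

def init_params(d, k, rng):
    return (rng.normal(size=(d, d)),       rng.normal(size=d),
            rng.normal(size=(2*d + 2, d)), rng.normal(size=2*d + 2),
            rng.normal(size=(k, 2*d + 2)), rng.normal(size=k))

rng = np.random.default_rng(0)
d, k = 4, 3   # e.g. Iris: 4 features, 3 classes
y = two_hidden_layer_net(rng.normal(size=d), init_params(d, k, rng))
print(y.shape)  # (3,)
```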
We continue this section with the definition of a λ-increasing (λ-decreasing) function, which is given in [31]. Let λ ≥ 0 be any real number. A function f is called λ-increasing (λ-decreasing) if there exists an increasing (decreasing) function u such that |f(t) − u(t)| ≤ λ for all t.

The proof is completed.

Remark 1. In some papers, a single hidden layer network is defined as a function without the outer thresholds; a two hidden layer network of type (2) then takes the corresponding iterated form. From the proof of Theorem 3.1, for networks of type (2) the theorem is valid if we take 2d + 1 neurons in the second layer instead of 2d + 2.

SIMULATION RESULTS AND DISCUSSION
We have investigated the performance of the training/testing process, using the criteria obtained from Theorem 3.1, on four classification datasets: (a) Iris, (b) Australian, (c) Wine and (d) Spambase, which are well-known benchmark datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php).
We constructed the neural network architecture as stated in the problem statement section: a two hidden layer feed-forward neural network with an infinitely differentiable sigmoidal activation function, with d neurons in the first hidden layer, 2d + 2 neurons in the second layer and k outputs. The neural network model for each dataset was trained and tested to guarantee that it was sufficiently trained. Standard back-propagation was performed using batch learning. Figure 1 shows the training process for the Australian dataset, while Table 1 shows the training and testing results together with the proposed numbers of hidden neurons and hidden layers obtained from the criteria given in Section 3. Although the numbers of neurons in the hidden layers could be increased beyond d and 2d + 2, respectively, the results show that there is no need for that.
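An experiment of this kind can be reproduced in a few lines. This is a hedged sketch, not the authors' code: scikit-learn's MLPClassifier with a logistic activation and the L-BFGS solver stands in for the paper's back-propagation with batch learning, and the train/test split and random seeds are assumptions of the example. The hidden layer sizes follow the theorem: d and 2d + 2 neurons (for Iris, d = 4, so layers of 4 and 10).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)   # d = 4 features, k = 3 classes
d = X.shape[1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

scaler = StandardScaler().fit(X_train)
# Two hidden layers sized per the theorem: d and 2d + 2 neurons.
clf = MLPClassifier(hidden_layer_sizes=(d, 2 * d + 2),
                    activation='logistic', solver='lbfgs',
                    max_iter=2000, random_state=0)
clf.fit(scaler.transform(X_train), y_train)
acc = clf.score(scaler.transform(X_test), y_test)
print(f"test accuracy: {acc:.3f}")
```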
The experimental results on well-known benchmark datasets show that the convergence and the accuracy of the proposed neural network model are acceptable and even better than we expected.

CONCLUSION
In this paper, a theoretical result has been obtained on constructing feed-forward neural networks with a prescribed number of hidden neurons and hidden layers. The proposed result can help design a proper architecture for feed-forward neural networks (with two hidden layers) and multivariate outputs. The theorem formulated in this paper provides a compact neural network architecture, based on the size of the feature space of the input data, that has a minimal degree of training complexity with a sufficient number of parameters. In a nutshell, this can produce a compact classification/recognition model of high generalization ability with a relatively fast training time.