SELF-ORGANIZING MAP WITH NGUYEN-WIDROW INITIALIZATION ALGORITHM FOR GROUNDWATER VULNERABILITY ASSESSMENT

Assessment of groundwater vulnerability to contamination plays a vital role in the utilization and protection of groundwater resources. In this study, a vulnerability map for Boracay Island, Philippines, was developed using a modified self-organizing map (SOM) algorithm to determine groundwater vulnerability in light of massive tourism developments on the island. A SOM using the Nguyen-Widrow initialization algorithm was used to cluster DRASTIC data that had been pre-processed using data cleaning and normalization schemes. The resulting vulnerability map showed that the island's groundwater resource is susceptible to contamination, as confirmed by groundwater quality analysis. The results demonstrate the effectiveness of the improved SOM algorithm as a tool for groundwater vulnerability assessment, with results comparable to those of the traditional DRASTIC method. The methodology groups datasets into clusters that represent the level of vulnerability of the groundwater to contamination. Further, this approach can be applied to other islands to help balance tourism development with the ecological integrity of their scarce groundwater resources.


INTRODUCTION
Groundwater quality is a primary factor in water resource management. It is critical because contamination may render groundwater unsafe to use, with environmental and health consequences. Although contaminants are partly filtered naturally as water passes through soil and rock formations, they still threaten the quality of the resource. Assessing groundwater vulnerability is therefore a priority, especially for small islands, whose capacity to sustain their groundwater supply is at risk.
Several approaches have been developed to evaluate groundwater vulnerability [1,2]. The most common is the DRASTIC method, an overlay and index method adopted in many studies. It employs seven parameters to estimate the vulnerability index: depth to water (D), net recharge (R), aquifer media (A), soil media (S), topography (T), vadose zone impact (I), and hydraulic conductivity (C). The vulnerability index is calculated by assigning relative ratings and weights to these seven parameters [3]-[5]. Its limitations lie in the subjectivity of assigning numerical values to descriptive entities and relative weights to different attributes [6]-[8].
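As a worked illustration of the index described above, the sketch below computes a DRASTIC index as the weighted sum of the seven parameter ratings. The weights are the generic DRASTIC weights of Aller et al. (1987) (D = 5, R = 4, A = 3, S = 2, T = 1, I = 5, C = 3); they are an assumption here because the text does not list the weights used, and the names `DRASTIC_WEIGHTS` and `drastic_index` are hypothetical.

```python
# Generic DRASTIC weights (Aller et al., 1987). The study's own weights are not
# given in the text, so these values are an assumption for illustration.
DRASTIC_WEIGHTS = {
    "D": 5,  # depth to water
    "R": 4,  # net recharge
    "A": 3,  # aquifer media
    "S": 2,  # soil media
    "T": 1,  # topography
    "I": 5,  # vadose zone impact
    "C": 3,  # hydraulic conductivity
}


def drastic_index(ratings: dict) -> float:
    """Return the DRASTIC index: sum of rating x weight over the seven parameters.

    Ratings are expected on the usual 1-10 scale.
    """
    missing = set(DRASTIC_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(ratings[p] * w for p, w in DRASTIC_WEIGHTS.items())


# With every parameter rated 10, the index reaches its maximum.
print(drastic_index({p: 10 for p in DRASTIC_WEIGHTS}))
```

Because the weights sum to 23 and ratings run from 1 to 10, the index under these weights ranges from 23 to 230; higher values indicate greater vulnerability.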
Clustering algorithms group a set of data into multiple clusters so that objects within a cluster are highly similar. This study adopted the self-organizing map (SOM) clustering algorithm to assess groundwater vulnerability. SOM, an artificial neural network model, is one of the algorithms used to cluster data and visualize results by reducing high-dimensional data [14]. Its visual outputs are essential in studies such as water quality assessment in hydrogeochemically complex areas, assessment of water chemical composition, hydrogeochemical characterization, and water pollution mapping [15,16]. In a spatial analysis of groundwater quality, SOM defined a classification based on the hydrochemical characteristics of groundwater and successfully classified and characterized the groundwater in terms of quality [17]. When combined with DRASTIC in a hybrid SOM-DRASTIC model, SOM classified groundwater vulnerability using the weighted ratings of the DRASTIC parameters as input and vulnerability classes as output, with GIS software used to develop the vulnerability maps. The model proved useful for managers and land use planners, providing a robust alternative tool for vulnerability-based classification and land use planning strategies [18].

International Journal of Computing. Print ISSN 1727-6209, On-line ISSN 2312-5381. computing@computingonline.net, www.computingonline.net
The accuracy of SOM clustering depends on weight initialization [19,20]; proper initialization of the cluster centers is therefore critical. Because SOM weights are usually initialized randomly, both the quality of the results and the learning speed are strongly affected [21]. To address this issue, the Nguyen-Widrow initialization algorithm was implemented in this study. The Nguyen-Widrow algorithm initializes the weights of a neural network so as to reduce training time; in its original setting, small random values are assigned to the weights before backpropagation networks are trained [22]. This study therefore introduces the Nguyen-Widrow initialization algorithm into the SOM algorithm to assess groundwater vulnerability to contamination in Boracay Island, Philippines, using it to initialize the weights for SOM clustering. The dataset of DRASTIC ratings is normalized and used to train the SOM prior to clustering, and a groundwater vulnerability map is developed from the extracted clusters.
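A minimal sketch of Nguyen-Widrow initialization for a SOM of this size follows, assuming the textbook form of the rule: weights are drawn uniformly from [-0.5, 0.5] and each unit's weight vector is rescaled to length beta = 0.7 * p^(1/n), where n is the number of inputs and p the number of units. The text does not specify the exact variant used, so dropping the bias term (a SOM has none) and the function name `nguyen_widrow_init` are assumptions.

```python
import numpy as np


def nguyen_widrow_init(n_inputs: int, n_units: int, rng: np.random.Generator) -> np.ndarray:
    """Nguyen-Widrow-style initialization of an (n_units, n_inputs) weight matrix.

    Draws weights uniformly from [-0.5, 0.5], then rescales each unit's weight
    vector to length beta = 0.7 * n_units ** (1 / n_inputs). No bias term is
    initialized, since SOM output neurons have none (an assumed adaptation).
    """
    beta = 0.7 * n_units ** (1.0 / n_inputs)
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    norms = np.linalg.norm(w, axis=1, keepdims=True)  # length of each unit's vector
    return beta * w / norms


rng = np.random.default_rng(seed=0)
W = nguyen_widrow_init(n_inputs=7, n_units=4, rng=rng)  # 4 output neurons x 7 DRASTIC inputs
print(W.shape)
print(np.linalg.norm(W, axis=1))  # all rows share the same length, beta
```

With 7 inputs and 4 output neurons, beta = 0.7 x 4^(1/7), or about 0.85, so every initial weight vector has the same length while its direction remains random.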

STUDY AREA
Boracay is a small island with a total land area of 1,006.64 hectares, located between 11°57'-12°00' latitude and 121°56'-121°57' longitude off the northwestern tip of Panay Island in Western Visayas, Philippines. The island is 6.8 kilometers long and 3.3 kilometers across at its widest point, rising to elevations of 50 to 105 meters above mean sea level. The depth to the water table, as measured from existing wells in the island's three administrative units (barangays), varies from 0.58 to 4.096 m, based on the report of the Mines and Geosciences Bureau (MGB) Regional Office VI [23]. Boracay's position as the Philippines' top tourist destination has placed immense pressure on its groundwater resources through over-withdrawal and contamination from human activities. Over the years, tourist arrivals have increased dramatically, and migrants have settled on the island, drawn by economic opportunities created by tourism, such as the opening of hotels, restaurants, and other establishments [24]. The growing population and the proliferation of residential and commercial establishments are degrading the quality of the island's groundwater, hence the need to evaluate its vulnerability to contamination.

METHODOLOGY
The SOM algorithm is an unsupervised learning algorithm widely used for clustering large datasets. A SOM consists of two layers. The input layer is one-dimensional, with each data item associated with an n-length vector of elements, while the output layer consists of radial units typically organized in one or two dimensions [14]. In this study, the SOM consists of 7 input nodes, one for each DRASTIC parameter, and 4 output neurons, with each input node connected to every output neuron. The SOM architecture used in the study is presented in Fig. 1.
The process flow outlined in Fig. 2 used data from available maps of Boracay Island and from thematic maps of the DRASTIC parameters developed in GIS. The dataset includes depth to water (D), net recharge (R), aquifer media (A), soil media (S), topography (T), vadose zone impact (I), and hydraulic conductivity (C), expressed as ratings as defined in the DRASTIC method. The dataset was divided into two parts: 70% of the data were used for training and 30% for clustering.
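The 70/30 division can be sketched as a shuffled split of the rating table, with one row per mapped location and one column per DRASTIC parameter. Shuffling before splitting and the helper name `split_70_30` are assumptions; the text does not say how rows were assigned to each part.

```python
import numpy as np


def split_70_30(data: np.ndarray, rng: np.random.Generator):
    """Shuffle the rows of `data` and return (training part, clustering part) in a 70/30 ratio."""
    idx = rng.permutation(len(data))        # random row order
    n_train = int(round(0.7 * len(data)))   # size of the training part
    return data[idx[:n_train]], data[idx[n_train:]]


rng = np.random.default_rng(seed=0)
ratings = rng.integers(1, 11, size=(100, 7))  # placeholder 1-10 ratings, not study data
train, cluster_set = split_70_30(ratings, rng)
print(train.shape, cluster_set.shape)
```

Fixing the generator seed makes the split reproducible, which matters when comparing clustering runs.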
The datasets were pre-processed using data cleaning and normalization schemes. Normalization yields better clustering results and reduces the negative effects of noise and outliers. The data were normalized using equation (1): where is the raw data,

Clustering of data using the modified SOM algorithm is achieved in two main steps: (1) training with the initialized parameters and (2) clustering of the data. At the start of the learning process, the weights were initialized to small values using the Nguyen-Widrow initialization algorithm.
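The exact form of equation (1) is not given in this text, so the sketch below uses min-max scaling to [0, 1] purely as an illustrative normalization scheme, preceded by a simple cleaning step that drops rows with missing values. Both choices, and the helper names `drop_incomplete_rows` and `min_max_normalize`, are assumptions and should not be read as the paper's equation (1).

```python
import numpy as np


def drop_incomplete_rows(x: np.ndarray) -> np.ndarray:
    """Cleaning step (assumed): remove rows that contain any missing (NaN) value."""
    x = np.asarray(x, dtype=float)
    return x[~np.isnan(x).any(axis=1)]


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1]; columns with a constant value map to 0."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Avoid division by zero for parameters whose rating never varies,
    # e.g. a depth-to-water rating that is the same everywhere.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (x - col_min) / safe_range


raw = np.array([[1.0, 10.0], [5.0, 10.0], [9.0, 10.0], [np.nan, 10.0]])
print(min_max_normalize(drop_incomplete_rows(raw)))
```

Scaling every parameter to the same range keeps one DRASTIC parameter from dominating the Euclidean distances used later to find the winning neuron.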
Input vectors are then randomly selected from the dataset. The winning neuron, also called the best matching unit (BMU), is determined using the Euclidean distance, and the weights are adjusted to increase their similarity to the input vector. This process is repeated until all the training vectors have been presented. The final weight values are then used for clustering.
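The training and clustering steps above can be sketched as follows. This is a winner-take-all simplification in which only the BMU is updated, with a linearly decaying learning rate; the neighborhood function, learning-rate schedule, epoch count, and the names `train_som` and `assign_clusters` are assumptions, since the text does not specify them.

```python
import numpy as np


def train_som(data, weights, n_epochs=100, lr0=0.5, rng=None):
    """Winner-take-all SOM training sketch.

    For each randomly ordered input vector, the BMU is the output neuron with
    the smallest Euclidean distance, and only its weights move toward the input.
    """
    if rng is None:
        rng = np.random.default_rng()
    data = np.asarray(data, dtype=float)
    w = np.array(weights, dtype=float)  # copy so the initial weights are preserved
    for epoch in range(n_epochs):
        lr = lr0 * (1.0 - epoch / n_epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(data)):  # random presentation order
            x = data[i]
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))  # best matching unit
            w[bmu] += lr * (x - w[bmu])  # move the BMU toward the input
    return w


def assign_clusters(data, weights):
    """Label each vector with the index of its BMU under the final weights."""
    data = np.asarray(data, dtype=float)
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)


# Toy example: two well-separated groups and two output neurons.
toy = np.vstack([np.zeros((5, 7)), np.ones((5, 7))])
w0 = np.vstack([np.full(7, 0.1), np.full(7, 0.9)])
w_final = train_som(toy, w0, n_epochs=10, rng=np.random.default_rng(0))
print(assign_clusters(toy, w_final))
```

In the study's setting, the initial `weights` would come from a Nguyen-Widrow-style initializer and `assign_clusters` would be applied to the clustering subset; each output neuron that wins at least one vector defines a cluster.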
Clusters were extracted using the SOM algorithm with Nguyen-Widrow-initialized weights. The pseudocode for the modified SOM algorithm is

RESULTS AND DISCUSSION
Initial weights were generated using the Nguyen-Widrow algorithm, and the final weights obtained from training were used for SOM clustering. Three groundwater clusters were determined, as shown in Table 1. All DRASTIC parameters play a role in determining vulnerability indices; however, four parameters (depth to water, net recharge, topography, and hydraulic conductivity) are significantly more influential than the others in assessing groundwater vulnerability [7,25]. These parameters are therefore used to discuss the clusters obtained with the modified SOM algorithm.
Clusters 1, 2, and 3 are assigned to areas of very high, high, and moderate vulnerability, respectively. Cluster 1 consists of areas with high net recharge, topography, and hydraulic conductivity ratings, all of which contribute to very high vulnerability. A higher topography rating corresponds to a gentler slope: these low-lying, nearly flat areas retain water longer, allowing greater infiltration of recharge and a greater potential for contaminant migration. Geological data show that areas of very high vulnerability are dominated by unconsolidated calcareous sand and silty clay deposits, whose large pores permit more rapid infiltration of recharge and faster contaminant transport.
Cluster 2 includes areas with high net recharge and topography ratings but moderate hydraulic conductivity. This high-vulnerability cluster is observed in areas of coralline limestone, calcareous sandstone, siltstone, shale, and basal conglomerate, which form medium-textured soils with an intermediate rate of permeability.
Clusters 1 and 2 were observed to be lowland areas with slopes of 0-6%. Cluster 3, on the other hand, had low net recharge and hydraulic conductivity ratings but a moderate topography rating; it is located in the hilly areas of the island, with slopes ranging from 6-18%.
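To make the link between slope and topography rating concrete, the sketch below maps a percent slope to a rating using the generic DRASTIC topography bands of Aller et al. (1987): 0-2% rates 10, 2-6% rates 9, 6-12% rates 5, 12-18% rates 3, and steeper slopes rate 1. The text does not list these bands, so they are an assumption here, and `topography_rating` is a hypothetical name. Under these bands, the 0-6% slopes of Clusters 1 and 2 fall in the 9-10 range, while the 6-18% slopes of Cluster 3 fall in the 3-5 range, consistent with its moderate topography rating.

```python
def topography_rating(slope_percent: float) -> int:
    """Generic DRASTIC topography rating (assumed bands) for a percent slope."""
    if slope_percent < 0:
        raise ValueError("slope must be non-negative")
    # (upper bound of the slope band in %, rating); flatter ground rates higher
    bands = [(2, 10), (6, 9), (12, 5), (18, 3)]
    for upper, rating in bands:
        if slope_percent <= upper:
            return rating
    return 1  # slopes steeper than 18%


for slope in (1.5, 4, 10, 15, 25):
    print(slope, topography_rating(slope))
```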
Across all three clusters, depth to water had a rating of 10.0, indicating that the island is highly susceptible to contamination from surface pollution. This is consistent with the groundwater data, which show a water table 0-5 meters deep [23].
The clusters generated using the modified SOM were used to develop the vulnerability map of Boracay Island shown in Fig. 3. The clustering results were corroborated by the water quality analysis of Boracay Island presented in Table 2. Chloride analysis revealed that groundwater in all barangays exceeded the maximum standard of 250 mg/L: groundwater in barangays Manoc-manoc, Yapak, and Balabag contained 420 mg/L, 1,999 mg/L, and 3,129 mg/L of chloride, respectively. Bacteriological analysis also showed that the water is unsafe for drinking, with bacterial counts ranging from 2.20 to 16.00 MPN/100 mL against a standard of 0.00 MPN/100 mL. Coliform tests showed fecal and total coliform values of 16 MPN/100 mL for groundwater from Brgy. Balabag, while values for Brgy. Yapak and Brgy. Manoc-manoc exceeded 16 MPN/100 mL. All groundwater samples exceed the standard of 0 MPN/100 mL, indicating that the water is unsafe for drinking [23]. These water quality results are consistent with the vulnerability map developed through clustering, indicating that the clustering method is effective for assessing groundwater vulnerability to contamination.
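The chloride figures above can be checked against the 250 mg/L standard with a few lines of arithmetic. The concentrations are those reported in the text; the variable names are illustrative.

```python
CHLORIDE_LIMIT_MG_L = 250.0  # maximum chloride concentration in the cited standard

# Chloride concentrations reported for each barangay (mg/L)
chloride_mg_l = {"Manoc-manoc": 420.0, "Yapak": 1999.0, "Balabag": 3129.0}

# Ratio of each measured concentration to the limit; a ratio above 1 exceeds the standard
exceedance_ratios = {b: c / CHLORIDE_LIMIT_MG_L for b, c in chloride_mg_l.items()}

for barangay, ratio in exceedance_ratios.items():
    status = "exceeds" if ratio > 1 else "within"
    print(f"{barangay}: {ratio:.2f} x limit ({status})")
```

Manoc-manoc, Yapak, and Balabag exceed the limit by factors of about 1.7, 8.0, and 12.5, respectively.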
To verify the validity of the modified SOM method for groundwater vulnerability assessment, a t-test was used to determine whether the cluster means and the DRASTIC means differ significantly. The p-values calculated for each cluster indicate that the differences are not statistically significant, implying that groundwater vulnerability assessment using the modified SOM is comparable with the traditional DRASTIC method. Table 3 shows the results of the t-test.
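The text does not state which t-test variant was used or exactly which samples were compared. As an assumption, the sketch below computes Welch's unequal-variance two-sample t statistic and its Welch-Satterthwaite degrees of freedom using only the standard library; `welch_t` is a hypothetical helper name.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values, and the samples must not both have
    zero variance (the standard error would then be zero).
    """
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    sa, sb = va / len(a), vb / len(b)  # squared standard errors of each mean
    t = (ma - mb) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([2, 4, 6], [1, 2, 3])
print(t, df)
```

With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a two-sided p-value; a p-value above the chosen significance level (commonly 0.05) means the two sets of means are not significantly different.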

CONCLUSION
Modified SOM clustering is an applicable method for assessing groundwater vulnerability to contamination. The modified SOM methodology groups datasets into clusters that represent the level of vulnerability of the groundwater to contamination. The vulnerability map developed from the clustered data clearly showed that the groundwater resources of Boracay Island are vulnerable to contamination, as confirmed by the water quality analysis. Finally, the output of the clustering method is comparable with that of the traditional DRASTIC method, making it an effective approach for assessing groundwater vulnerability to contamination.