Ana Elizabeth Marín Celestino, Diego Armando Martínez Cruz, Elena María Otazo Sánchez, Francisco Gavi Reyes, David Vásquez Soto. Groundwater Quality Assessment: An Improved Approach to K-Means Clustering, Principal Component Analysis and Spatial Analysis: A Case Study. Water 2018, 10(4), 437
K-means clustering and principal component analysis (PCA) are widely used in water quality analysis and management. Nevertheless, numerous studies have pointed out that K-means with the squared Euclidean distance is not suitable for high-dimensional datasets. We evaluate a methodology, K-means based on PCA, for water quality evaluation: PCA first reduces the dataset from high to low dimensionality, and K-means clustering is then applied to the reduced data. To test it, a large dataset of 28 hydrogeochemical variables from 582 wells in a coastal aquifer was classified with both high-dimensional K-means clustering and K-means clustering based on PCA. The proposed method improved cluster cohesion according to the average Silhouette index, which rose from 0.13 for high-dimensional K-means clustering to 5.94 for K-means based on PCA. A practical spatial evaluation of the clusters with geographic information systems (GIS) likewise indicated better-quality results for K-means clustering based on PCA. K-means based on PCA identified three hydrogeochemical classes and their sources: high salinity was attributed to seawater intrusion and the mineralization process, high levels of heavy metals were related to domestic-industrial wastewater discharge, and low heavy metal concentrations were associated with point-source discharges of industrial wastewater. This approach allowed the demarcation of natural and anthropogenic sources of variation in the aquifer and provided greater certainty and accuracy in the data classification.
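As an illustration of the average Silhouette index used above to compare the two clusterings, the following is a minimal sketch in plain Python. It follows the standard definition, under which each point's coefficient is (b - a) / max(a, b) and therefore lies between -1 and 1. The function names (`silhouette_scores`, `average_silhouette`) and the toy points are hypothetical and are not taken from the study, whose own implementation is not described here.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def silhouette_scores(points, labels):
    """Return the silhouette coefficient s(i) of every point.

    s(i) = (b - a) / max(a, b), where a is the mean distance from point i
    to the other members of its own cluster and b is the smallest mean
    distance from point i to the members of any other cluster.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        # Guard against coincident points, where both a and b are zero.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def average_silhouette(points, labels):
    """Mean of the per-point silhouette coefficients."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Hypothetical toy data (not from the study): two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # clusters match the visible groups
poor_labels = [0, 1, 0, 1]  # each cluster straddles both groups

print("matching clusters: ", round(average_silhouette(toy_points, good_labels), 3))
print("straddling clusters:", round(average_silhouette(toy_points, poor_labels), 3))
```

On this toy data the labelling that matches the two groups scores close to 1, while the labelling that splits each group across clusters scores below 0, which is the sense in which a higher average value indicates more cohesive, better-separated clusters. For real datasets, scikit-learn's `silhouette_score` computes the same quantity.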