Predicting and Classifying Contaminants in Mexican Water Bodies: A Machine Learning Approach (Chapter)

abstract

  • Water contamination presents a significant issue in Mexico, with a variety of pollutants, including agricultural runoff, affecting both groundwater and surface water. This study applied machine learning (ML) techniques to predict and classify contaminant presence in Mexican water bodies using publicly available datasets. The objectives were to collect and analyze water quality data, develop predictive models, implement ML algorithms for classification and clustering, and evaluate model performance. The methodology involved data cleaning, integration, and formatting of extensive Mexican water quality datasets, followed by clustering and classification analyses. Using a binary matrix of water quality features, clustering revealed distinct contaminant patterns with a moderate silhouette score, indicating reasonable cluster cohesion. Dendrograms illustrated relationships between contaminants, identifying related groups to guide monitoring efforts. A decision tree classifier, which achieved an accuracy of 99.99%, assessed the significance of features in predicting water quality and found "Fecal coliforms" to be the most crucial. Additionally, Random Forest, Support Vector Machine, and AdaBoost models predicted contaminant presence based on demographic and environmental variables. Random Forest performed best but with moderate overall accuracy, highlighting the task's complexity. While machine learning models provided valuable insights into water quality and contaminant patterns, the findings suggest incorporating additional variables could enhance predictive accuracy. This research contributes to water quality analysis knowledge in Mexico, laying the foundation for future studies and policymaking to improve water quality and safeguard aquatic ecosystems. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
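
The abstract reports a silhouette score for clusters found in a binary matrix of water quality features. As a rough illustration of what that score measures, the following sketch computes it from scratch on a toy matrix. Everything here is an assumption for illustration: the abstract does not state the distance metric or clustering algorithm, so Jaccard distance is swapped in as a common choice for 0/1 data, and the rows, column meanings, and cluster labels are hypothetical rather than taken from the study.

```python
from statistics import mean


def jaccard_distance(a, b):
    """Return 1 - |intersection| / |union| for two 0/1 vectors."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 0.0  # two all-zero rows are treated as identical
    inter = sum(1 for x, y in zip(a, b) if x and y)
    return 1.0 - inter / union


def silhouette_score(rows, labels, dist=jaccard_distance):
    """Mean silhouette coefficient over all samples.

    For each sample, a is its mean distance to the other members of its
    own cluster and b is its mean distance to the nearest other cluster;
    the per-sample coefficient is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean(dist(rows[i], rows[j]) for j in own)
        b = min(
            mean(dist(rows[i], rows[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical data: each row is a monitoring site, each column flags
# whether one contaminant exceeded its limit (1) or not (0).
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
labels = [0, 0, 0, 1, 1, 1]  # assumed cluster assignment, not a fitted result

score = silhouette_score(rows, labels)
print(f"silhouette score: {score:.3f}")
```

Scores close to 1 indicate tight, well-separated clusters, scores near 0 indicate overlapping clusters, and negative scores suggest samples assigned to the wrong cluster, which is the scale against which a "moderate" score is read.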

publication date

  • January 1, 2025