Machine Learning Approach to Predicting PFAS Contamination in Community Water Systems (CWSs)
Per- and polyfluoroalkyl substances (PFAS) are an emerging class of contaminants harmful to human health. PFAS testing is currently conducted on a limited, state-by-state basis. While millions of Americans are potentially exposed to PFAS through their drinking water, only 17 states have enforceable standards or guidance levels, or require notification when PFAS are present in drinking water. To predict PFAS presence in community water systems (CWSs), we trained a series of machine learning models on observations from existing testing performed by the Kentucky Department for Environmental Protection. These testing data cover eight unique PFAS compounds, with an effective minimum detection limit (MDL) of 1.32 ppt. We linked the testing data to environmental properties that could affect PFAS transport in the environment, and used the resulting dataset to train models that predict the presence or absence of PFAS in each CWS. To the best of our knowledge, this is the first application of machine learning to predict the presence of PFAS in CWSs. The best-performing model was extreme gradient boosting, with accuracy ranging from 0.74 to 0.81 and AUROC ranging from 0.74 to 0.94. The most important predictors of contamination were soil pH, precipitation, proximity to potential industrial polluters, water source, and temperature. Applying the model to CWSs across Kentucky and Tennessee indicated that 2.4 million and 3.5 million residents, respectively, are potentially exposed to PFAS at the 1.32 ppt threshold.
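As a rough illustration of the evaluation described above, the sketch below turns measured concentrations into presence/absence labels at the 1.32 ppt MDL and scores a set of predicted probabilities by accuracy and AUROC. It is not the project's code: the concentrations and scores are made up, the function names are hypothetical, and treating a value equal to the MDL as "present" and using a 0.5 probability cutoff for accuracy are both assumptions.

```python
from typing import List, Sequence

MDL_PPT = 1.32  # effective minimum detection limit quoted in the abstract


def label_presence(conc_ppt: Sequence[float], threshold: float = MDL_PPT) -> List[int]:
    """Return 1 where the concentration is at or above the threshold, else 0.

    Counting a value equal to the threshold as 'present' is an assumption.
    """
    return [1 if c >= threshold else 0 for c in conc_ppt]


def accuracy(y_true: Sequence[int], y_score: Sequence[float], cutoff: float = 0.5) -> float:
    """Fraction of systems whose thresholded score matches the label."""
    preds = [1 if s >= cutoff else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up concentrations (ppt) and made-up model scores, for illustration only.
concentrations = [0.0, 0.8, 1.32, 4.1, 0.5, 12.0]
scores = [0.10, 0.60, 0.70, 0.90, 0.20, 0.40]

labels = label_presence(concentrations)
acc = accuracy(labels, scores)
auc = auroc(labels, scores)
print(labels, round(acc, 3), round(auc, 3))
```

The pairwise AUROC here is quadratic in the number of systems, which is fine for a toy example; a library routine would be the usual choice at scale.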
Research Assistants: Aakash Manapat
- Manapat, Aakash. Using a Deep Learning Model to Predict PFAS Contamination in Kentucky Community Water Systems (poster: presenter). Vanderbilt University Undergraduate Research Fair, Fall 2021.