Agricultural nonpoint source pollution has been identified as one of the leading causes of surface water quality impairment in the United States. The impact is especially significant in predominantly agricultural areas, where fertilizer application often results in excessive nitrate levels in streams and rivers. When the nitrate concentration in a public water supply reaches or exceeds drinking water standards, costly measures such as well closure or water treatment must be considered. Accurate nitrate-N predictions are therefore critical for making correct and timely management decisions. This study applied a set of data mining tools to predict weekly nitrate-N concentrations at a gauging station on the Sangamon River near Decatur, Illinois. The tools included artificial neural networks, evolutionary polynomial regression and the naive Bayes model, and their results were compared using seven forecast measures. In general, all models performed reasonably well, but no single model achieved the best score on every measure, suggesting that a multi-tool approach is needed. In addition to improving forecast accuracy compared with previous studies, the tools described in this study demonstrated potential for application in error analysis, input selection and ranking of explanatory variables, thereby supporting the design of cost-effective monitoring networks.
Keywords: artificial neural networks, drinking water, forecasting, genetic algorithms, naive Bayes model, nitrate-N