Data-driven modelling using machine learning techniques is a promising new approach to eco-environmental modelling, such as algal growth prediction; the artificial neural network (ANN) is a typical method. Another method growing in popularity is the M5 model tree (MT) algorithm, which fits piecewise linear regression models at the leaf nodes of the tree. The M5 MTs using partial least-squares regression (PLSR) proposed in this paper were tested on a particular dataset and compared with conventional M5 MTs, MLF- and RBF-ANNs and k nearest neighbours (kNN). With the dataset partitioned into periods of algal growth and no growth, M5 MTs using PLSR predicted algal growth in the reservoir better than when trained on the annual dataset, and better than the other algorithms. This suggests that M5-PLSR MTs, in spite of the lack of data, seek latent vectors more effectively within a closely correlated multivariate dataset that has been partitioned using clustering techniques. M5-PLSR MTs are a promising approach when the data required to build a more transparent model of the learning process are in short supply, and combining them with clustering is recommended.
Keywords: ANN, data-driven modelling (DDM), kNN, LSER, M5 MTs, PLSR
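
The idea summarised above, a tree that partitions the data and fits a PLSR model in each leaf, can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: it assumes a single M5-style split chosen by standard deviation reduction (no deeper tree, pruning or smoothing), a single-response PLSR fitted with the NIPALS algorithm, and synthetic two-regime data standing in for growth and no-growth periods. All function names (`fit_pls1`, `best_split`, `fit_tree_pls`, `make_demo_data`) are hypothetical.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """Single-response PLS regression (NIPALS). Returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # centred copies, deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr  # weight vector: direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left for further components to explain
            break
        w = w / norm
        t = Xr @ w  # latent scores
        tt = t @ t
        p = Xr.T @ t / tt  # X loadings
        c = yr @ t / tt  # y loading
        Xr = Xr - np.outer(t, p)  # deflate X and y before the next component
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # coefficients in original X space
    return x_mean, y_mean, coef


def predict_pls1(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def best_split(X, y, min_leaf):
    """Pick (feature, threshold) maximising standard deviation reduction."""
    n, best_sdr, best = len(y), 0.0, None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for thr in (values[:-1] + values[1:]) / 2:  # midpoints between neighbours
            left = X[:, j] <= thr
            n_left = int(left.sum())
            if min(n_left, n - n_left) < min_leaf:
                continue
            sdr = y.std() - (n_left * y[left].std() + (n - n_left) * y[~left].std()) / n
            if sdr > best_sdr:
                best_sdr, best = sdr, (j, thr)
    return best


def fit_tree_pls(X, y, n_components=2, min_leaf=20):
    """Depth-one model tree with a PLS model in each leaf (simplifying assumption)."""
    split = best_split(X, y, min_leaf)
    if split is None:
        return {"split": None, "leaf": fit_pls1(X, y, n_components)}
    j, thr = split
    left = X[:, j] <= thr
    return {"split": (j, thr),
            "left": fit_pls1(X[left], y[left], n_components),
            "right": fit_pls1(X[~left], y[~left], n_components)}


def predict_tree_pls(tree, X):
    if tree["split"] is None:
        return predict_pls1(tree["leaf"], X)
    j, thr = tree["split"]
    left = X[:, j] <= thr
    out = np.empty(len(X))
    out[left] = predict_pls1(tree["left"], X[left])
    out[~left] = predict_pls1(tree["right"], X[~left])
    return out


def make_demo_data(rng, n):
    """Synthetic stand-in: column 0 switches between two linear regimes."""
    X = rng.uniform(0.0, 1.0, size=(n, 3))
    growth = X[:, 0] > 0.5
    y = np.where(growth, 10 - 3 * X[:, 1] + 4 * X[:, 2], 1 + 2 * X[:, 1] - X[:, 2])
    return X, y + rng.normal(0.0, 0.1, size=n)


rng = np.random.default_rng(0)
X_train, y_train = make_demo_data(rng, 300)
X_test, y_test = make_demo_data(rng, 300)
tree = fit_tree_pls(X_train, y_train)
rmse = lambda pred: float(np.sqrt(np.mean((pred - y_test) ** 2)))
print("tree + PLS leaves:", rmse(predict_tree_pls(tree, X_test)))
print("single global PLS:", rmse(predict_pls1(fit_pls1(X_train, y_train, 2), X_test)))
```

On data with two distinct regimes, the partitioned model would be expected to give a lower error than a single PLS model fitted to the whole dataset, which mirrors the comparison between partitioned and annual datasets described above. The synthetic data say nothing about the reservoir results themselves.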