John Wiley & Sons, Ltd.

PLS and KNN algorithms for improved 3D‐QSDAR consensus modeling of acute toxicity

A diverse set of 154 chemicals, including FDA‐regulated compounds, tested for aquatic toxicity in Daphnia magna was modeled using a three‐dimensional quantitative spectral data‐activity relationship (3D‐QSDAR). Two distinct algorithms, partial least squares (PLS) and Tanimoto similarity‐based k‐nearest neighbors (KNN), were used to process the bin occupancy descriptor matrices obtained after tessellating the 3D‐QSDAR space into regularly sized bins. The performance of models using bins ranging in size from 2 ppm × 2 ppm × 0.5 Å to 20 ppm × 20 ppm × 2.5 Å was explored. Rigorous quality control (QC) criteria were imposed: (i) one hundred randomized 20% hold‐out test sets were generated, and the average R²test of the respective models was used as the measure of their performance; and (ii) a Y‐scrambling procedure was used to identify chance correlations. A consensus between the best‐performing composite PLS model, using 14 ppm × 14 ppm × 0.5 Å bins and 10 latent variables (average R²test = 0.770), and the best composite KNN model, using 8 ppm × 8 ppm × 0.5 Å bins and 2 neighbors (average R²test = 0.801), offered an improvement of about 7.5% (consensus R²test = 0.845). Projection of the most frequently occurring bins onto the standard coordinate space indicated that aromatic systems substituted with primary or secondary amino groups would increase the toxic effect in Daphnia. A second aromatic ring bearing highly electronegative substituents, located 5 to 7 Å from the first ring, would further increase toxicity.
Environ Toxicol Chem © 2014 SETAC
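
The descriptor and KNN steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each compound is represented as a list of (chemical shift i in ppm, chemical shift j in ppm, interatomic distance in Å) triplets, that a bin counts as occupied when at least one triplet falls into it, and that the KNN prediction is the unweighted mean of the k nearest neighbors' activities. All function names are hypothetical.

```python
import math
from typing import Iterable, Sequence

# Assumption: a compound is a list of (shift_i_ppm, shift_j_ppm, distance_A)
# triplets; the exact 3D-QSDAR descriptor construction may differ.
Triplet = tuple[float, float, float]


def occupied_bins(triplets: Iterable[Triplet],
                  bin_ppm: float, bin_dist: float) -> frozenset[tuple[int, int, int]]:
    """Return the indices of all bins that contain at least one triplet.

    The space is tessellated into bins of bin_ppm x bin_ppm x bin_dist;
    a set of occupied indices is a sparse binary occupancy vector.
    """
    return frozenset(
        (math.floor(si / bin_ppm), math.floor(sj / bin_ppm), math.floor(d / bin_dist))
        for si, sj, d in triplets
    )


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two binary vectors: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query: frozenset, train: Sequence[frozenset],
                activities: Sequence[float], k: int = 2) -> float:
    """Predict activity as the unweighted mean over the k most similar compounds.

    Assumption: no similarity weighting; ties keep training-set order.
    """
    ranked = sorted(range(len(train)),
                    key=lambda i: tanimoto(query, train[i]), reverse=True)
    top = ranked[:k]
    return sum(activities[i] for i in top) / len(top)
```

Storing each compound's occupancy as a set of bin indices rather than a dense vector reduces the Tanimoto computation to set intersection and union, which is convenient when most of the tessellated space is empty.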

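The validation protocol (repeated 20% hold-out test sets scored by average R²test, plus Y-scrambling) and a two-model consensus can be sketched in the same way. This assumes a hypothetical `fit_predict(train_X, train_y, test_X)` callable standing in for either the PLS or the KNN model, defines R²test as 1 − SS_res/SS_tot computed about the test-set mean, and treats the consensus as a plain average of the two models' predictions; the abstract does not state which definitions the authors used.

```python
import random
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical interface: fit on the training rows, return predictions for test rows.
FitPredict = Callable[[list, list[float], list], list[float]]


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (about the test-set mean)."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def average_r2_test(X: list, y: list[float], fit_predict: FitPredict,
                    n_splits: int = 100, test_fraction: float = 0.2,
                    seed: int = 0) -> float:
    """Mean R2 over n_splits randomly drawn hold-out test sets."""
    rng = random.Random(seed)
    n_test = round(len(y) * test_fraction)
    scores = []
    for _ in range(n_splits):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(r2([y[i] for i in test], preds))
    return fmean(scores)


def y_scrambled_r2(X: list, y: list[float], fit_predict: FitPredict,
                   seed: int = 0, **kwargs) -> float:
    """Repeat the protocol after randomly permuting y (the input y is not modified)."""
    y_perm = list(y)
    random.Random(seed).shuffle(y_perm)
    return average_r2_test(X, y_perm, fit_predict, seed=seed, **kwargs)


def consensus(*predictions: Sequence[float]) -> list[float]:
    """Assumed consensus rule: unweighted mean of several models' predictions."""
    return [fmean(vals) for vals in zip(*predictions)]
```

A model whose Y-scrambled score approaches its unscrambled score is likely fitting chance correlations rather than a real structure-activity relationship.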