The discovery of carcinogenic nitrosamine impurities above safe limits in pharmaceuticals has created an urgent need for methods that extend structure–activity relationship (SAR) analyses from relatively limited datasets. The level of confidence required in such SAR means there is significant value in investigating the effect of individual substructural features in a statistically robust manner. This is challenging on a small dataset because, in practice, compounds contain a mixture of features, which may confound both expert SAR and statistical quantitative structure–activity relationship (QSAR) methods. Isolating the effect of a single structural feature is difficult both because of the confounding effects of other functionality and because statistical significance is hard to establish when a large number of potential variables are tested concurrently on a small dataset; a naïve QSAR model does not find any feature to be significant after correction for multiple testing. We propose a variation on Bayesian multiple linear regression that estimates the effect of each feature simultaneously yet independently, taking into account the combinations of features present in the dataset and reducing the impact of multiple testing, and we show that some features have a statistically significant impact. This method can be used to provide statistically robust validation of expert SAR approaches to the differences in potency between structural groupings of nitrosamines. Structural features that lead to the highest and lowest carcinogenic potency can be isolated using this method, and novel nitrosamine compounds can be assigned to potency categories with high accuracy.
What Makes a Potent Nitrosamine? Statistical Validation of Expert-Derived Structure–Activity Relationships
Thomas RF; Tennant RE; Oliveira A; Ponting DJ