- Publisher: Lhasa Limited
- Publication Type: Poster
Improving Chemical Space Coverage of an In Silico Prediction System by Targeted Inclusion of Fragments Absent from the Training Set
Sarah Nexus is a statistics-based in silico system that provides accurate mutagenicity predictions suitable for submissions under the ICH M7 guideline. To obtain a prediction, a query compound is fragmented and its fragments are compared against a network of fragment-based hypotheses generated from a curated training set of ~10,000 compounds with associated Ames mutagenicity data. A prediction is then made by matching the query's fragments to those in the network. If any fragment found in the query is not present in the training set, the query instead returns an "Outside domain" result.
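The domain rule described above ("Outside domain" if any query fragment is absent from the training set) can be illustrated with a minimal sketch. It assumes each compound has already been reduced to a set of fragment labels; the labels and the function names `missing_fragments` and `domain_status` are hypothetical, and the sketch does not reproduce Sarah Nexus's own fragmentation or its hypothesis network.

```python
from typing import Iterable, List, Set


def missing_fragments(query_fragments: Iterable[str], training_fragments: Set[str]) -> List[str]:
    """Return query fragments that never occur in the training-set fragments."""
    # Keep query order and report each missing fragment only once.
    seen: Set[str] = set()
    missing: List[str] = []
    for frag in query_fragments:
        if frag not in training_fragments and frag not in seen:
            seen.add(frag)
            missing.append(frag)
    return missing


def domain_status(query_fragments: Iterable[str], training_fragments: Set[str]) -> str:
    """'Outside domain' if any query fragment is unseen in training, else 'In domain'."""
    return "Outside domain" if missing_fragments(query_fragments, training_fragments) else "In domain"


if __name__ == "__main__":
    # Hypothetical fragment labels standing in for real structural fragments.
    training = {"frag_A", "frag_B", "frag_C"}
    print(domain_status(["frag_A", "frag_B"], training))  # In domain
    print(domain_status(["frag_A", "frag_X"], training))  # Outside domain
```

A single unseen fragment is enough to trigger the "Outside domain" result, which is why adding compounds that carry the missing fragments directly shrinks the number of such results.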
To improve the coverage of the training set in a targeted way, a proprietary dataset from the Vitic Intermediates group was fragmented and its fragments were compared with those generated from the training set. This yielded a set of fragments that were present in the proprietary compounds but absent from the training set. The compounds containing these fragments (16 of 1,068), together with their mutagenicity data, were then added to the training set and the model was rebuilt. This targeted data donation improved the predictive performance of Sarah Nexus against the Vitic Intermediates dataset and eliminated a number of "Outside domain" predictions. It was also shown to reduce the number of "Outside domain" predictions for other proprietary datasets.
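The selection step can be sketched as a set difference between each proprietary compound's fragments and the training-set fragments. This is a minimal illustration under stated assumptions: compounds are given as precomputed fragment sets keyed by a hypothetical compound ID, `select_compounds_for_donation` is a hypothetical name, and every compound carrying at least one absent fragment is selected. It tracks fragment coverage only and does not model the rebuild of the hypothesis network.

```python
from typing import Dict, List, Set, Tuple


def select_compounds_for_donation(
    proprietary: Dict[str, Set[str]],
    training_fragments: Set[str],
) -> Tuple[Set[str], List[str]]:
    """Find fragments absent from training and the compounds that contain them.

    proprietary maps a (hypothetical) compound ID to its set of fragments.
    Returns (absent_fragments, selected_compound_ids).
    """
    absent: Set[str] = set()
    selected: List[str] = []
    for compound_id, fragments in proprietary.items():
        new_frags = fragments - training_fragments  # fragments the model has never seen
        if new_frags:
            absent |= new_frags
            selected.append(compound_id)
    return absent, selected


if __name__ == "__main__":
    # Hypothetical data: four proprietary compounds, two with unseen fragments.
    training_fragments = {"frag_A", "frag_B", "frag_C"}
    proprietary = {
        "cmpd_1": {"frag_A", "frag_B"},
        "cmpd_2": {"frag_A", "frag_D"},
        "cmpd_3": {"frag_C"},
        "cmpd_4": {"frag_D", "frag_E"},
    }
    absent, selected = select_compounds_for_donation(proprietary, training_fragments)
    print(sorted(absent))  # ['frag_D', 'frag_E']
    print(selected)        # ['cmpd_2', 'cmpd_4']
    # After donating only the selected compounds, every proprietary fragment is covered.
    expanded = training_fragments.union(*(proprietary[c] for c in selected))
    print(all(frags <= expanded for frags in proprietary.values()))  # True
```

Only a small subset of the proprietary compounds needs to be donated to close the fragment gap, which mirrors the poster's observation that 16 of 1,068 compounds were sufficient for the targeted inclusion.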
The method presented identifies the compounds within proprietary datasets that would be most beneficial for supplementing the Sarah Nexus training set. Specifically targeting absent fragments improves both the chemical space coverage of the model and the accuracy of its predictions across multiple datasets.
This poster was presented by Dr Rob Foster at the 2017 EUROTOX Meeting.