Lhasa Limited shared knowledge shared progress

Improving Chemical Space Coverage of an In Silico Prediction System by Targeted Inclusion of Fragments Absent from the Training Set

Foster R; Cayley A; Kocks G


Sarah Nexus is a statistics-based in silico system that provides accurate mutagenicity predictions suitable for use in submissions under the ICH M7 guideline. To obtain a prediction, a query compound is fragmented, and its fragments are matched against a network of fragment-based hypotheses generated from a curated training set of ~10,000 compounds with associated Ames mutagenicity data. If any fragment found in the query is not present in the training set, the query instead returns an "Outside domain" result.
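The "Outside domain" rule described above can be illustrated with a minimal sketch. It assumes fragments are already available as plain strings; the function name `check_domain`, the fragment labels, and the example compounds are hypothetical, and neither the fragmentation step nor the hypothesis network of Sarah Nexus is reproduced here.

```python
from typing import Dict, Set

# Hypothetical training-set fragments. In practice these would come from a
# chemical fragmentation step, which this sketch does not implement.
TRAINING_FRAGMENTS: Set[str] = {"c1ccccc1", "N(=O)=O", "C(=O)O", "CN"}


def check_domain(query_fragments: Set[str],
                 training_fragments: Set[str]) -> Dict[str, object]:
    """Report whether every fragment of a query is present in the training set."""
    unknown = query_fragments - training_fragments
    return {
        # The query is in domain only if no fragment is unknown.
        "in_domain": not unknown,
        "unknown_fragments": sorted(unknown),
    }


# Two illustrative queries (assumed fragment sets, not real system output).
known_query = {"c1ccccc1", "N(=O)=O"}
novel_query = {"c1ccccc1", "B(O)O"}

result_known = check_domain(known_query, TRAINING_FRAGMENTS)
result_novel = check_domain(novel_query, TRAINING_FRAGMENTS)
```

Under these assumptions, the first query is in domain, while the second would be flagged as "Outside domain" because one of its fragments has no counterpart in the training set.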

To target improvements in the coverage of the training set, a proprietary dataset from the Vitic Intermediates group was fragmented, and its fragments were compared with those generated from the training set. This yielded a set of fragments that were present in the proprietary compounds but absent from the training set. The compounds containing these fragments (16/1068), together with their mutagenicity data, were then added to the training set and the model was rebuilt. This targeted data donation improved the predictive performance of Sarah Nexus against the Vitic Intermediates dataset and prevented a number of "Outside domain" predictions. It was also shown to reduce the number of "Outside domain" predictions for other proprietary datasets.
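The selection step can be sketched as a set difference followed by a filter. This is only an illustration under stated assumptions: the compound identifiers, fragment strings, and function names (`find_absent_fragments`, `select_compounds_to_donate`) are hypothetical, and the criteria actually used to choose the compounds may have been more specific than "contains at least one absent fragment".

```python
from typing import Dict, List, Set


def find_absent_fragments(proprietary: Dict[str, Set[str]],
                          training_fragments: Set[str]) -> Set[str]:
    """Fragments seen in the proprietary compounds but not in the training set."""
    all_proprietary: Set[str] = set()
    for fragments in proprietary.values():
        all_proprietary |= fragments
    return all_proprietary - training_fragments


def select_compounds_to_donate(proprietary: Dict[str, Set[str]],
                               training_fragments: Set[str]) -> List[str]:
    """IDs of compounds that carry at least one fragment absent from the training set."""
    return sorted(
        compound_id
        for compound_id, fragments in proprietary.items()
        if fragments - training_fragments
    )


# Hypothetical data: compound ID -> precomputed fragment set.
training = {"c1ccccc1", "N(=O)=O", "C(=O)O", "CN"}
proprietary = {
    "VI-001": {"c1ccccc1", "CN"},           # fully covered already
    "VI-002": {"c1ccccc1", "B(O)O"},        # one absent fragment
    "VI-003": {"C(=O)O", "S(=O)(=O)N"},     # one absent fragment
}

absent = find_absent_fragments(proprietary, training)
selected = select_compounds_to_donate(proprietary, training)

# Adding the selected compounds extends the fragment coverage of the training set.
updated_training = training.union(*(proprietary[c] for c in selected))
```

In this toy example only the compounds that add new fragments are selected, which mirrors the idea of donating a small, targeted subset rather than the whole proprietary dataset.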

The method presented identifies the compounds within proprietary datasets that would be most beneficial as supplements to the Sarah Nexus training set. Specifically targeting absent fragments improves the chemical space coverage of the model and the accuracy of its predictions across multiple datasets.


This poster was presented by Dr Rob Foster at the 2017 EUROTOX Meeting.





