Lhasa Limited: shared knowledge, shared progress

Building models of bacterial mutagenicity from biased training data

Barber CG; Hanser T; Kruhlak NL; Stavitskaya L; Vessey J; Werner S

Models for predicting bacterial mutagenicity are now widely used by pharmaceutical sponsors to assess the genotoxic potential of impurities in pharmaceutical products. Models built using machine learning (ML) techniques are commonly trained on balanced datasets, which in this case contain equal numbers of compounds that are positive and negative for mutagenicity. Building accurate ML models from biased training data, with unequal numbers of positive and negative compounds, can be a challenge. Sarah Nexus is a program for predicting bacterial mutagenicity that uses a self-organising hierarchical network (SOHN). Hitherto, SOHN models have been built using data that have little bias; if models are built using biased training data, however, there is a need to ensure that the model learns sufficiently well about the minority class. A dataset biased towards negative compounds, for example, would otherwise result in a mutagenicity model with depressed sensitivity.
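The effect of class bias on sensitivity can be illustrated with a minimal Python sketch. It is not the SOHN method used by Sarah Nexus: the toy labels, the function names `sensitivity_specificity` and `balanced_class_weights`, and the inverse-frequency weighting heuristic are all assumptions introduced here for illustration only.

```python
from collections import Counter


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = mutagenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    A common generic heuristic for up-weighting the minority class;
    assumed here for illustration, not taken from the poster.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy dataset biased towards negatives: 10 positives, 90 negatives.
y_true = [1] * 10 + [0] * 90

# A degenerate model that always predicts the majority (negative) class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred)
weights = balanced_class_weights(y_true)

print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"class weights: {weights}")
```

In this toy case the always-negative model looks accurate overall yet identifies no mutagenic compounds, which is why sensitivity and specificity are reported separately. The weights show one generic way a learner might be told to pay more attention to the minority class.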

Presented by Chris Barber at SOT, San Diego, USA; 22nd - 26th March 2015.



