Lhasa Limited: shared knowledge, shared progress

This FAQs section details frequently asked questions about the ICH M7 guidelines. For FAQs on Lhasa products, please refer to the relevant product area.
It was mentioned by an FDA representative that a training set may contain examples that have an additional non-relevant alert that makes them positive in the Ames test. Is it possible to remove these and run a second prediction within Sarah Nexus? Moreover, is it possible that Sarah Nexus removes the most obvious non-relevant alerts? I ask because if there are, for instance, five training set examples with non-relevant alerts (and thus still five relevant examples), I do not know what the impact on the prediction will be if I ignore those five non-relevant examples.

The potential for innocent fragments to be associated with a positive outcome is widely recognised as a risk for statistical approaches, and the FDA presenters identified this as an area where expert knowledge can be used to challenge and over-rule such a prediction.

Since this problem is well known, when designing Sarah we included a recursive partitioning step in the model building. In this process, the most strongly mutagenic fragment in the training dataset is identified, and positive compounds containing that fragment are not used to associate activity with other fragments. Starting with the most activating group, the supporting compounds are removed before the process is repeated until all the fragments have been considered. This approach significantly reduces the potential for a model to mistakenly 'blame' a fragment for activity. Ultimately, however, the training dataset determines how successful this approach can be. For example, if two fragments are always seen together and only in positive compounds, the model can never learn whether one or both of these fragments is the cause of activity. No statistical model building approach can resolve such a problem, although an expert system may be able to make a sound judgement by applying additional knowledge.
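The recursive partitioning idea described above can be sketched as a simple greedy loop. This is an illustrative toy, not Lhasa's actual implementation: the data structures (fragment name sets with an Ames result) and the `assign_activity` function are hypothetical, and a real model would use statistical association measures rather than raw counts.

```python
def assign_activity(compounds):
    """Greedily attribute activity to fragments, most activating first.

    compounds: list of (fragments: set[str], positive: bool)
    Returns the fragments in the order they were 'blamed'.
    """
    remaining = list(compounds)
    blamed = []
    while True:
        # Count how often each fragment appears in the still-unexplained
        # positive compounds (a stand-in for 'most strongly mutagenic').
        counts = {}
        for frags, positive in remaining:
            if positive:
                for f in frags:
                    counts[f] = counts.get(f, 0) + 1
        if not counts:
            break
        # Pick the fragment most strongly associated with positives...
        top = max(counts, key=counts.get)
        blamed.append(top)
        # ...and remove its supporting positives so they cannot also
        # lend apparent activity to other fragments they contain.
        remaining = [(frags, pos) for frags, pos in remaining
                     if not (pos and top in frags)]
    return blamed

# 'nitro' drives activity; 'phenyl' merely co-occurs with it.
data = [
    ({"nitro", "phenyl"}, True),
    ({"nitro"}, True),
    ({"phenyl"}, False),
]
print(assign_activity(data))  # ['nitro'] - 'phenyl' is never blamed
```

Note how the innocent co-occurring fragment is never assigned activity, because the compounds containing the true culprit are removed before any other fragment is considered.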

If a statistical model building approach does not include a recursive partitioning step then, as you suggest, one possible solution would be to manually remove examples from the training set, rebuild the model and regenerate the prediction. We recently published a paper describing a similar approach to support the interpretation of 'black box' statistical models.

Since Sarah significantly reduces the potential for false positive fragments to be learnt, shows which parts of the molecule are the cause of predicted activity and provides the training examples for manual inspection, we have not implemented functionality to remove training examples, rebuild the model and rerun it. Doing so would mean that a validated model was no longer being used and, depending upon which training compounds were removed, different users could get different answers. A much preferred approach (and the one recommended by the FDA presenters) is to present the model prediction and the supporting compounds, and then present the expert analysis of why some of those training compounds have misled the model and thus why the prediction should be over-ruled. One way to approach this is to run the training compounds from Sarah through Derek: if they are found positive for a different alert (reason) to that for which Sarah is ascribing activity, it would be reasonable to discount those examples and hence the prediction.
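The expert-review cross-check described above can be expressed as a simple filter. This is a hypothetical sketch only: neither Sarah nor Derek exposes this API, and the function name, compound IDs and alert names are invented for illustration.

```python
def discountable_examples(sarah_alert, training_examples, derek_alerts):
    """Find training examples whose positivity Derek explains by a
    different alert, making them candidates to discount in expert review.

    sarah_alert: the alert (reason) Sarah ascribes activity to
    training_examples: compound IDs supporting the Sarah prediction
    derek_alerts: mapping of compound ID -> set of Derek alert names fired
    """
    discount = []
    for compound in training_examples:
        fired = derek_alerts.get(compound, set())
        # Positive in Derek, but NOT for the reason Sarah gives: the
        # example may have misled the model and can be challenged.
        if fired and sarah_alert not in fired:
            discount.append(compound)
    return discount

supporting = ["cmpd-1", "cmpd-2", "cmpd-3"]
derek = {
    "cmpd-1": {"aromatic nitro"},   # different cause: candidate to discount
    "cmpd-2": {"aromatic amine"},   # matches Sarah's reason: keep
    "cmpd-3": set(),                # no Derek alert: keep for review
}
print(discountable_examples("aromatic amine", supporting, derek))
# ['cmpd-1']
```

The output is only a shortlist for the expert: the final decision to over-rule a prediction still rests on the documented expert analysis, as the FDA presenters recommended.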

© 2018 Lhasa Limited | Registered office: Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK Tel: +44 (0)113 394 6020
VAT number 396 8737 77 | Lhasa Limited is registered as a charity (290866) | Company Registration Number 01765239 (England and Wales).
