The potential for innocent fragments to be associated with a positive outcome is widely recognised as a risk for statistical approaches and the FDA presenters identified this as an area where expert knowledge can be used to challenge and over-rule such a prediction.
Since this problem is well known, we included a recursive partitioning step in the model building when designing Sarah. In this process, the most strongly mutagenic fragment in the training dataset is identified first, and the positive compounds containing that fragment are removed so that they cannot be used to associate activity with other fragments; the process is then repeated on the remaining compounds until all the fragments have been considered. This approach significantly reduces the potential for a model to mistakenly ‘blame’ a fragment for activity. Ultimately, however, the training dataset determines how successful the approach can be – for example, if two fragments are always seen together, and only in positive compounds, then the model can never learn whether one or both of them is the cause of activity. No statistical model building approach can resolve such a problem, although an expert system may be able to make a sound judgement by applying additional knowledge.
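The recursive partitioning step described above can be sketched in a few lines of code. This is a minimal illustration only, not Sarah's actual implementation: it assumes compounds are simple records carrying a set of fragment names and a mutagenicity label, and it scores a fragment's "strength" by a simple count of the remaining positive compounds that contain it, which is a stand-in for whatever measure Sarah really uses.

```python
def attribute_fragments(compounds):
    """Assign activity to fragments one at a time, strongest first.

    After a fragment is attributed, the positive compounds supporting it
    are removed, so they cannot also lend apparent activity to fragments
    that merely co-occur with the true culprit.
    """
    positives = [c for c in compounds if c["mutagenic"]]
    attributed = []
    while positives:
        # Count how many remaining positive compounds contain each fragment.
        counts = {}
        for c in positives:
            for frag in c["fragments"]:
                counts[frag] = counts.get(frag, 0) + 1
        # Pick the most strongly supported fragment and record it.
        top = max(counts, key=counts.get)
        attributed.append((top, counts[top]))
        # Remove its supporting compounds before considering other fragments.
        positives = [c for c in positives if top not in c["fragments"]]
    return attributed

# Hypothetical training set: the amine-containing positive also contains
# a nitro group, so the amine is never blamed for the activity.
compounds = [
    {"fragments": {"nitro", "amine"}, "mutagenic": True},
    {"fragments": {"nitro"}, "mutagenic": True},
    {"fragments": {"amine"}, "mutagenic": False},
]
print(attribute_fragments(compounds))  # [('nitro', 2)]
```

The example also shows the limit noted above: had the nitro group only ever appeared together with the amine in positive compounds, the choice of which fragment to blame would be arbitrary, and no statistical procedure could resolve it.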
You are correct that, if a statistical model building approach does not include a recursive partitioning step, one possible solution would be to manually remove examples from the training set, rebuild the model and regenerate the prediction. We recently published a paper describing a similar approach to support the interpretation of ‘black box’ statistical models.
Since Sarah significantly reduces the potential for false positive fragments to be learnt, shows which parts of the molecule are responsible for the predicted activity and provides the training examples for manual inspection, we have not implemented functionality to remove training examples, rebuild and then rerun the model. Doing so would mean that a validated model was no longer being used and, depending upon which training compounds were removed, different users could get different answers. A much preferred approach (and the one recommended by the FDA presenters) is to present the model prediction and the supporting compounds, and then to present the expert analysis of why some of those training compounds have misled the model and thus why the prediction should be over-ruled. One way to approach this is to run the training compounds from Sarah through Derek: if they are found positive for a different alert (reason) from that to which Sarah is ascribing activity, then it would be reasonable to discount those examples and hence the prediction.
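The Derek cross-check described above amounts to a simple filter over the supporting compounds. The sketch below is purely illustrative: `sarah_fragment_alert` and `derek_alerts` are hypothetical placeholders for whatever the two systems report, and neither tool exposes an API of this shape; the point is only the decision rule.

```python
def discountable_examples(training_compounds, sarah_fragment_alert, derek_alerts):
    """Return the supporting training compounds that an expert might discount.

    A compound is a candidate for discounting when Derek flags it as
    positive, but for a different alert (reason) from the fragment to
    which Sarah is ascribing the activity.
    """
    discount = []
    for compound in training_compounds:
        alerts = derek_alerts.get(compound, set())
        # Positive in Derek, but not for Sarah's fragment alert.
        if alerts and sarah_fragment_alert not in alerts:
            discount.append(compound)
    return discount

# Hypothetical alert results for two supporting compounds.
derek = {"cmpd1": {"aromatic nitro"}, "cmpd2": {"alkylating agent"}}
print(discountable_examples(["cmpd1", "cmpd2"], "aromatic nitro", derek))
# ['cmpd2'] -- its activity is explained by a different alert
```

Compounds returned by such a filter are only candidates: the final decision to discount them, and thereby over-rule the prediction, remains an expert judgement to be documented alongside the model output.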