Is expert review an essential step when assessing mutagenic potential of drug impurities?

When assessing the mutagenic potential of drug impurities in silico, the ICH M7 guideline states that two complementary (Q)SAR prediction methodologies should be applied. The guideline also makes provision for the application of ‘expert knowledge’. However, as (Q)SAR models continue to be updated and their predictive performance and structural coverage improve, the question arises of whether expert review is still necessary…

Expert review plays an important, if time-consuming, role in an ICH M7 prediction workflow. Expert knowledge is applied to increase prediction confidence and resolve conflicting calls by considering structural analogues and mechanisms of activity. It is particularly valuable where predictions are unclear (e.g., equivocal) or a structure is considered outside the model’s applicability domain (the area of chemical space in which a prediction can be considered reliable).¹ In such cases, expert knowledge can be effective in filling data gaps and providing evidence to support a clear positive or negative prediction.

Recently, an exciting paper on this topic was published in Regulatory Toxicology and Pharmacology, titled ‘Assessing the impact of expert knowledge on ICH M7 (Q)SAR predictions. Is expert review still needed?’. The paper investigates whether expert review of (Q)SAR predictions is still necessary by evaluating 1002 unique drug impurities. FDA/CDER generate predictions for bacterial mutagenicity by combining the results of three (Q)SAR models, one of which is Derek Nexus, our expert rule-based system for fast and accurate predictions of toxicity, including mutagenicity.

The application of expert knowledge was found to overturn 26% of default predictions. Most notably, 91% of all equivocal predictions were resolved and 24% of predictions were downgraded from positive, compared with 4.6% being upgraded from negative. This is consistent with the use of two (Q)SAR systems to maximise sensitivity at the expense of false positives.
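To make the idea of a sensitivity-first default prediction concrete, the sketch below combines two (Q)SAR calls into a default overall call and flags whether expert review is warranted. The combination rule, the `Call` labels and the `default_call` function are illustrative assumptions; they are not the logic used by FDA/CDER, by the paper, or by any particular (Q)SAR system.

```python
from enum import Enum


class Call(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    EQUIVOCAL = "equivocal"
    OUT_OF_DOMAIN = "out of domain"


def default_call(expert_rule: Call, statistical: Call) -> tuple[Call, bool]:
    """Return (overall call, needs_expert_review) under an assumed conservative rule."""
    calls = {expert_rule, statistical}
    # Any positive makes the default positive: this favours sensitivity.
    if Call.POSITIVE in calls:
        # Conflicting calls are flagged so an expert can check for a false positive.
        return Call.POSITIVE, len(calls) > 1
    # Two clear negatives give a negative default.
    if calls == {Call.NEGATIVE}:
        return Call.NEGATIVE, False
    # Anything else (equivocal or out of domain) is unresolved and goes to review.
    return Call.EQUIVOCAL, True
```

Under a rule of this kind, a single positive call is enough to make the default positive, which is why expert review tends to downgrade more predictions than it upgrades.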

The predictions were examined based on:

Percentage and type of overturns

Nine reasons were identified for overturning a default prediction.
The most common reasons were “alert in different chemical environment in supporting training set structures compared to the impurity” and “supporting training set structures contain other structural alerts not relevant to the impurity”.

Model confidence

Prediction confidence metrics provided by the model can reliably indicate to the expert how likely a prediction is to change following review.

Known classes of structural alerts

Literature-based structural alert classes have high false positive rates because they are broad and do not consider mitigating features.²

External validation

An external validation exercise was conducted against 36 compounds with known Ames data from the Lhasa Vitic Intermediates database.

In the paper, the FDA use Derek Nexus as the expert rule-based system alongside two competitor statistical models. Although the FDA use two statistical systems to increase the chance that at least one gives a valid prediction, ICH M7 only requires one expert rule-based system and one statistical-based system; for this purpose you may use Derek and Sarah. You may use tools from different suppliers, but the benefit of using Derek and Sarah for accurate predictions of Ames mutagenicity under the ICH M7 guideline lies in the interconnectivity between the two systems. As expert review is a beneficial but time-consuming practice, we have implemented features in Nexus that increase the efficiency of expert review and facilitate a semi-automated expert review process and ICH M7 classification.

Read the publication in full to find out how expert review impacts false positive predictions and (Q)SAR review efficiency…

Are you aware of the expert review requirement but do not have access to the functionality? Get in touch with us to find out more!

If you are interested in how Derek Nexus performed and how this relates to Lhasa’s ICH M7 prediction tool, keep reading…

How does this relate to Derek Nexus?

Model confidence

Derek predictions are presented with a reasoning level that indicates the likelihood of the prediction being correct and guides where to focus reviews. This paper demonstrates high concordance, with predictions at the “plausible” level being positive and those at the “inactive” level being negative. However, predictions at the “equivocal” level should be treated as positive predictions requiring review: expert review of the 24 such predictions in this paper resulted in 10 positive, 7 negative and 7 equivocal calls.
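The equivocal figures quoted above can be turned into proportions with a few lines of arithmetic. This is only a worked calculation on the three counts given in the text; the variable names are illustrative.

```python
# Outcomes of expert review for the 24 "equivocal" Derek predictions quoted above.
outcomes = {"positive": 10, "negative": 7, "equivocal": 7}

total = sum(outcomes.values())  # 24
# Share of each outcome as a percentage, rounded to one decimal place.
shares = {label: round(100 * count / total, 1) for label, count in outcomes.items()}
# Share of predictions that ended with a clear positive or negative call.
resolved = round(100 * (outcomes["positive"] + outcomes["negative"]) / total, 1)

for label, share in shares.items():
    print(f"{label}: {share}% of {total}")
print(f"resolved to a clear call: {resolved}%")
```

Positive is the single most frequent outcome (10 of 24), which supports treating an “equivocal” reasoning level conservatively until it has been reviewed.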
Derek provides negative predictions with additional information where the query contains a misclassified or unclassified feature. This notifies the user that the Ames reference test set may contain conflicting data or be missing supporting data, and that the prediction requires review. This analysis by the FDA agrees with our analysis in ‘it’s difficult, but important, to make negative predictions’ that all negative predictions can reliably be called negative.

Known classes of structural alerts

Attached to every alert are detailed comments discussing the understanding of the hazard presented by the chemical class. These comments provide information relevant to the review, such as mitigating features (e.g. the effect of electron-withdrawing substituents on the mutagenicity of aromatic amines) and the solvent-dependent activity of acid halides.

External validation

Validation comments are provided for each alert, showing the performance of the alert against several proprietary datasets.

ICH M7 predictions – Our single-click tool that simultaneously runs Derek and Sarah, providing an overall in silico prediction & ICH M7 classification

Percentage and type of overturns

ICH M7 predictions combine Derek and Sarah Nexus predictions, presenting them alongside automated expert review arguments and highlighting standardised arguments likely to be used to resolve predictions to a single conclusion.³
The automated expert review feature considers the relationship between Derek alerts and Sarah training set examples and presents information related to overturn scenarios. One such scenario is where the training set examples supporting a positive prediction in Sarah are active because of features not present in the query, so the Derek negative prediction should be used instead.
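The overturn scenario described above can be sketched as a simple set comparison. Everything here is hypothetical: the feature labels, the `suggest_overturn` function and the data layout are illustrative assumptions, and they do not reflect how Derek, Sarah or the automated expert review feature are implemented.

```python
def suggest_overturn(query_features, supporting_examples, expert_rule_call):
    """Suggest using a negative expert-rule call when the statistical positive
    is supported only by features that are absent from the query.

    query_features: set of feature labels found in the query (hypothetical).
    supporting_examples: list of sets, one per training example, holding the
        features thought to drive that example's activity (hypothetical).
    expert_rule_call: "positive" or "negative" from the expert rule-based system.
    """
    # Nothing to overturn towards unless the expert-rule call is negative,
    # and no argument can be made without supporting examples to inspect.
    if expert_rule_call != "negative" or not supporting_examples:
        return False
    # Suggest an overturn only if no supporting example shares a driving
    # feature with the query.
    return all(not (features & query_features) for features in supporting_examples)


query = {"aryl chloride"}
examples = [{"aromatic nitro"}, {"epoxide", "aromatic nitro"}]
print(suggest_overturn(query, examples, "negative"))
```

A check like this only surfaces a candidate argument; the decision to accept it still rests with the expert reviewing the prediction.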

Model confidence

ICH M7 predictions present the associated Derek and Sarah predictions with a graphical representation of the reasoning level and confidence, respectively.

Known classes of structural alerts

The automated expert review arguments include information on the hazard presented by a chemical class, either generally or specifically for the query. For example, one argument highlights the potential solvent-dependent activity of all acid halides, whereas another highlights when a compound is positive in Sarah but the toxicophore has been excluded from the corresponding Derek alert for reasons discussed within that alert.
A notification appears if the query belongs to a cohort of concern; this requires a compound-specific risk assessment as per the ICH M7 guideline.