Statistical-based software for the prediction of mutagenicity
- Sarah Nexus is a statistical software tool that gives you accurate mutagenicity predictions.
- Sarah Nexus can be used as part of an ICH M7 workflow.
- Early, accurate in silico toxicity testing using Sarah Nexus is the quick, inexpensive way to identify potentially toxic chemicals, aiding experts in rejecting unsuitable drug candidates.
The ICH M7 guideline [1] proposes that a computational toxicology assessment should be performed using two complementary (Q)SAR methodologies that predict the outcome of a bacterial mutagenicity assay. Specifically, one methodology should be expert rule-based and the second should be statistical-based.
(Q)SAR models utilising these prediction methodologies should also follow the validation principles set forth by the Organisation for Economic Co-operation and Development (OECD) [2].
Sarah Nexus and Derek Nexus (the Lhasa Limited expert toxicity prediction tool), used in combination, can provide you with the means to meet the computational toxicology assessment requirements of the ICH M7 guideline from one intuitive interface.
Using the M7 prediction feature, you can assess your potential genotoxic impurities quickly and easily and submit those results to regulators, reducing the need for time-consuming and expensive in vitro tests.
Both Derek Nexus and Sarah Nexus have been designed independently to meet the OECD validation principles, and both systems can be run from within the same Nexus interface to help simplify your workflow.
The models provide completely independent predictions, with the option to consolidate into a single report.
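As a rough illustration of how two independent predictions might be consolidated into one record, the sketch below keeps both results side by side and adds a simple summary. The `Prediction` class, the `consolidate` function and the summary wording are hypothetical names chosen for this example; they are not the Derek Nexus or Sarah Nexus API, and any real conclusion would still rest on expert review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One (Q)SAR result. Hypothetical structure, not a Nexus data type."""
    methodology: str  # "expert rule-based" or "statistical"
    outcome: str      # "positive", "negative" or "equivocal"


def consolidate(expert: Prediction, statistical: Prediction) -> dict:
    """Combine two independent predictions into a single report record.

    Neither prediction is altered; the record only summarises them.
    """
    outcomes = {expert.outcome, statistical.outcome}
    if outcomes == {"negative"}:
        summary = "both negative"
    elif "positive" in outcomes:
        summary = "at least one positive"
    else:
        summary = "inconclusive"
    return {
        "predictions": [expert, statistical],
        "summary": summary,
        # Disagreement between the two systems is worth flagging for the expert.
        "concordant": expert.outcome == statistical.outcome,
    }


report = consolidate(
    Prediction("expert rule-based", "negative"),
    Prediction("statistical", "positive"),
)
print(report["summary"], report["concordant"])
```

Keeping the two results separate inside the record mirrors the point made above: the predictions stay independent, and only the report is shared.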
- Sarah Nexus uses a unique, machine-learning methodology.
- The Sarah training set contains 9882* individual structures which have been standardised and fragmented in order to build the Sarah model.
- The query structure is fragmented and the fragments are refined and ranked depending on the similarity of the query to the training set of compounds.
- Sarah gives an overall transparent prediction supported by a level of confidence and relevant examples.
Sarah Nexus uses a unique, hierarchical, machine-learning methodology to build a model for Ames mutagenicity.
Query structures which are imported into Sarah are standardised and then fragmented. These fragments are reviewed for activity versus inactivity. Sarah further refines the fragments by considering the similarity of the query structure to a training set of compounds.
The structure standardisation in Nexus 2.2 uses a set of transform rules including, but not limited to, aromaticity perception, transforming tautomers and resonance forms, and removing salts. The aim of standardisation is to interpret structures more accurately, which helps to optimise predictions and minimise false signals.
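To make one of these transforms concrete, here is a minimal sketch of salt removal on a SMILES string. It assumes the simplest possible rule (keep the dot-separated component with the most heavy atoms) and uses a deliberately crude atom count restricted to the organic subset. It is an illustration only, not the transform rules used in Nexus, and `strip_salts` is a hypothetical name.

```python
import re

# Crude heavy-atom matcher for organic-subset SMILES (assumption: good enough
# to tell a parent structure from a small counter-ion, nothing more).
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_count(component: str) -> int:
    """Approximate number of heavy atoms in one SMILES component."""
    return len(_ATOM.findall(component))


def strip_salts(smiles: str) -> str:
    """Keep the largest dot-separated component of a SMILES string."""
    components = smiles.split(".")
    return max(components, key=heavy_atom_count)


print(strip_salts("CC(=O)[O-].[Na+]"))  # sodium acetate -> acetate component
print(strip_salts("c1ccccc1N.Cl"))      # aniline hydrochloride -> aniline
```

A production standardiser would also neutralise charges and handle tautomers and aromaticity, which this sketch leaves out.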
The fragments are arranged into a network of hypotheses (or nodes), and the fragments which are judged to be of greater value contribute to an overall prediction of toxicity. Fragments may be of various sizes and can even overlap, which helps to improve the accuracy of predictions. Figures 1-3 illustrate the fragmentation process step by step.
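The idea of fragments of various sizes that may overlap can be sketched by enumerating every connected group of atoms in a small molecular graph. The code below is a generic connected-subgraph enumeration under the assumption that a structure is given as an adjacency map of atom indices; it is not Sarah's fragmentation algorithm, and `connected_fragments` is a hypothetical name.

```python
def connected_fragments(
    adjacency: dict[int, set[int]], max_atoms: int
) -> set[frozenset[int]]:
    """Return every connected atom subset containing 1..max_atoms atoms."""
    if max_atoms < 1:
        return set()
    found: set[frozenset[int]] = set()
    frontier = {frozenset([atom]) for atom in adjacency}
    while frontier:
        found |= frontier
        next_frontier: set[frozenset[int]] = set()
        for fragment in frontier:
            if len(fragment) == max_atoms:
                continue  # do not grow beyond the size limit
            # Atoms bonded to the fragment but not yet part of it.
            neighbours = set().union(*(adjacency[a] for a in fragment)) - fragment
            for atom in neighbours:
                next_frontier.add(fragment | {atom})
        frontier = next_frontier - found
    return found


# A four-atom chain 0-1-2-3, e.g. the heavy atoms of butane.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
fragments = connected_fragments(chain, max_atoms=3)
for fragment in sorted(fragments, key=lambda f: (len(f), sorted(f))):
    print(sorted(fragment))
```

Fragments such as atoms {0, 1} and {1, 2} share atom 1, which is the kind of overlap described above.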
The overall prediction comprises the following items:
- A conclusion about the Ames mutagenicity of a structure, and a confidence rating in the prediction.
- The fragments on which the prediction is based and relevant examples from the training set, ordered by structural similarity to the query.
- Additional compounds which are similar to the query but were not used in the prediction.
- Strain information for prediction results and additional information compounds.
- Contribution information for training set example compounds and additional compounds. This includes references, if available.
This high level of transparent information facilitates the expert review process.
The advantages of this methodology include:
- The ability to generate fragments that are contained within the training set molecules, thereby avoiding the bias of models built using pre-determined fragments, which may not reflect the training data.
- The ability to build a hierarchy of models, some more global and some more local, giving users the best of both worlds:
  - A single global model, while having broad coverage, will not be adequately sensitive to local variations (activity cliffs).
  - Local models, whilst more accurate for fragments that fall inside their chemical space, will be narrower in their scope (applicability domain).
  - Sarah Nexus contains both and will select the most appropriate model for each fragment.
- Sarah Nexus looks at the information available for each fragment and uses scientifically valid rules to combine these. The relative importance of the contribution of each local model is provided, along with the data that underlies it, thereby providing a very transparent prediction. Furthermore, Sarah Nexus gives a measure of confidence for each prediction it makes. Lhasa believes that this uniquely gives the information that experts need to be able to understand and judge the prediction.
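The trade-off between global and local models can be sketched as a selection rule. The example below assumes, purely for illustration, that a larger fragment represents a more local model and that a model is only trusted when enough training compounds support it; the `Hypothesis` class, the `min_examples` threshold and the fragment labels are all hypothetical and do not describe Sarah's actual selection criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate model for one fragment (hypothetical structure)."""
    fragment: str    # label of the fragment the model is built on
    n_atoms: int     # assumption: larger fragment = more local model
    n_examples: int  # training compounds that contain the fragment


def select_hypothesis(candidates: list[Hypothesis], min_examples: int = 5) -> Hypothesis:
    """Prefer the most local model that still has enough supporting data."""
    if not candidates:
        raise ValueError("no candidate hypotheses")
    supported = [h for h in candidates if h.n_examples >= min_examples]
    if supported:
        return max(supported, key=lambda h: h.n_atoms)
    # Nothing is well supported: fall back to the broadest (most global) model.
    return max(candidates, key=lambda h: h.n_examples)


candidates = [
    Hypothesis("small fragment", n_atoms=7, n_examples=120),
    Hypothesis("medium fragment", n_atoms=12, n_examples=9),
    Hypothesis("large fragment", n_atoms=15, n_examples=2),
]
print(select_hypothesis(candidates).fragment)
```

Here the largest fragment is skipped because only two compounds support it, so the medium fragment is chosen as the most local model with adequate data.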
Sarah Nexus provides a confidence score for each prediction along with direct access to supporting data to aid expert analysis. The confidence score is based on each fragment’s contribution to the overall prediction and the weight placed upon each fragment. Lhasa’s analysis shows that confidence correlates strongly with accuracy (figure 4).
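One simple way to picture a confidence score built from fragment contributions and weights is a weighted average. The sketch below assumes each fragment carries a signal between -1 (inactive) and +1 (active) and a positive weight, and reports the magnitude of the weighted mean as confidence. This formula is an assumption made for illustration and is not the calculation used by Sarah Nexus.

```python
def overall_prediction(contributions: list[tuple[float, float]]) -> tuple[str, float]:
    """Combine (signal, weight) pairs into a call and a confidence in [0, 1].

    signal: -1.0 (inactive) to +1.0 (active); weight: importance of the fragment.
    """
    total_weight = sum(weight for _, weight in contributions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(signal * weight for signal, weight in contributions) / total_weight
    if score > 0:
        call = "positive"
    elif score < 0:
        call = "negative"
    else:
        call = "equivocal"
    return call, abs(score)


# One strongly weighted activating fragment against one weaker deactivating one.
print(overall_prediction([(1.0, 3.0), (-1.0, 1.0)]))
```

In this toy scheme, conflicting fragments pull the score towards zero, so disagreement in the evidence shows up directly as lower confidence.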
For more information on confidence in Sarah Nexus, please view Lhasa’s video on Model Building and Interpreting Confidence, presented by Account Manager Dr Dave Yeo.
When considering the interpretation of a prediction, expert review is very important. This is why Lhasa has worked hard to ensure Sarah Nexus facilitates this by providing sufficient information to support an expert analysis.
Sarah Nexus takes a different approach from ‘black box’ statistical models, where the user can understand the model’s accuracy against a test set but cannot assess an individual prediction.
The structural explanation for the prediction provided by Sarah Nexus is conveyed by highlighting those fragment(s) that Sarah considers meaningful. Derek Nexus also highlights fragments of the query compound in order to illustrate the matches to patterns used to hold knowledge within Derek.
The reason that both models highlight structural fragments is the same: to draw the user’s attention to parts of the query compound which influenced the prediction. However, Derek relies on patterns drawn by experts to find these fragments, whereas Sarah identifies those with statistical significance.
Sarah Nexus, like Derek Nexus, presents all the information an expert requires in order to come to an informed decision. Sarah incorporates additional data such as strain information, similar compounds which were not included in the model, and references. This additional information is available for single predictions, batch predictions, and batch validations.
There is detailed strain information available for the compounds in the Sarah Nexus training set. Including strain data in Sarah Nexus facilitates decision making and helps to reduce uncertainty.
The strain data is displayed alongside Sarah predictions, without changing the prediction, in order to provide context for the expert (figure 5). The strain information available for each example compound is displayed, showing whether the data is positive, negative or equivocal/conflicted. This data is then used to create a strain profile for the hypothesis, using a heat map to show which strains have the highest contribution. Strain data is also available for compounds in the additional information tab.
This strain data can be particularly important in the following cases:
- When a positive hypothesis is overturned based on nearest neighbour information
- When results contradict a positive result from Derek Nexus
Figure 5: An example of the strain information which is shown in Sarah Nexus. A strain profile for the hypothesis is shown above, based on the strain profiles of the example compounds.
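The step from per-compound strain results to a strain profile for a hypothesis can be sketched as a simple aggregation. The code below assumes each example compound is a mapping from strain name to a result, and uses the fraction of positive results per strain as a stand-in for heat-map intensity. The function name and the intensity measure are assumptions for this example; TA98 and TA100 are used only as familiar Ames strain names.

```python
from collections import Counter, defaultdict


def strain_profile(examples: list[dict[str, str]]) -> dict[str, float]:
    """Fraction of positive results per strain across the example compounds.

    Each example maps strain name -> "positive", "negative" or "equivocal".
    Strains with no data for a compound are simply absent from its mapping.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for compound in examples:
        for strain, result in compound.items():
            counts[strain][result] += 1
    return {
        strain: tally["positive"] / sum(tally.values())
        for strain, tally in counts.items()
    }


examples = [
    {"TA98": "positive", "TA100": "negative"},
    {"TA98": "positive"},
    {"TA98": "negative", "TA100": "negative"},
]
for strain, intensity in strain_profile(examples).items():
    print(f"{strain}: {intensity:.2f}")
```

The profile is computed alongside the prediction rather than feeding back into it, matching the description above that strain data provides context without changing the result.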
The additional information panel contains two types of compounds:
- Those that were rejected from the model, but have a similarity of 30% or greater to the query compound.
- Those that have been used to train the model but are not present in any of the returned hypotheses for this prediction, and have a similarity of 30% or greater to the query compound.
A compound may be rejected from a model for a variety of reasons, including containing data which is not reliable enough to be used in model building, or containing conflicting results for the same test protocol from different sources. Lhasa has included this additional information to provide the expert with as much transparent information as possible to facilitate the decision-making process.
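The two selection rules for the additional information panel can be sketched as a filter. The source does not say how similarity is measured, so the example assumes Tanimoto similarity over sets of structural features; the function names, the feature sets and the `status` labels are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def additional_information(query_features, compounds, used_in_hypotheses, threshold=0.30):
    """Select compounds for the additional information panel.

    compounds: name -> (feature set, status), status "rejected" or "trained".
    used_in_hypotheses: names of trained compounds already shown as examples.
    """
    selected = []
    for name, (features, status) in compounds.items():
        if status == "trained" and name in used_in_hypotheses:
            continue  # already reported within a returned hypothesis
        similarity = tanimoto(query_features, features)
        if similarity >= threshold:
            selected.append((name, status, similarity))
    # Most similar compounds first.
    return sorted(selected, key=lambda row: row[2], reverse=True)


query = {"a", "b", "c", "d"}
compounds = {
    "X": ({"a", "b", "c"}, "rejected"),           # similarity 0.75
    "Y": ({"a", "b", "c", "d"}, "trained"),       # shown in a hypothesis
    "Z": ({"a", "e", "f", "g", "h"}, "trained"),  # similarity 0.125
    "W": ({"a", "b", "e"}, "trained"),            # similarity 0.40
}
for row in additional_information(query, compounds, used_in_hypotheses={"Y"}):
    print(row)
```

Compound Y is excluded despite being identical to the query because it already appears in a hypothesis, and Z is excluded because it falls below the 30% threshold.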
The Example compound panel enables you to see contribution information for training set example compounds and additional information compounds.
The information that you can see for each compound includes:
- The original structure, so that you can compare it with the Sarah standardised structure.
- The source of the contribution, if available.
- The original experimental result for this structure.
- References, if available.
- If applicable, the reason this structure was rejected from the training set.