Are you aware of the 5 OECD principles for the validation of (Q)SAR models intended for use in the regulatory assessment of chemical safety?
The Joint Research Centre (JRC) has established a harmonised template, the QSAR Model Reporting Format (QMRF), for summarising key information on (Q)SAR models, including the results of any validation studies. At Lhasa Limited, we produce QMRFs for Derek Nexus, our expert knowledge-based toxicity prediction tool, and Sarah Nexus, our statistics-based mutagenicity prediction tool. QMRFs are structured around the 5 OECD principles and are used primarily within the life sciences and chemical industries, where they are supplied to regulators as part of hazard and risk assessments of products and impurities. They demonstrate that the (Q)SAR model is acceptable from both a scientific and a regulatory perspective.
Outlined below are the 5 OECD principles, with an explanation of how Derek and Sarah Nexus meet each one:
1. A defined endpoint
Derek Nexus makes qualitative predictions for and against toxicity and contains alerts for multiple endpoints, including mutagenicity, chromosome damage, carcinogenicity, hepatotoxicity, teratogenicity and skin irritation.
Sarah Nexus makes an overall statistical prediction for mutagenicity.
2. An unambiguous algorithm
Derek uses expert-derived structural alerts (2D SARs), physicochemical properties and associated reasoning. Following alert evaluation, Derek will, where possible, predict skin sensitisation potency for query compounds that activate an alert [1]. This prediction is based on the activity (EC3 values) of nearest neighbours drawn from a local lymph node assay (LLNA) data set. For the mutagenicity and skin sensitisation endpoints, Derek also evaluates whether non-alerting query compounds contain any features that are either (i) present in non-alerting positive compounds in the relevant reference set, i.e. non-alerting mutagens in a large Ames test reference set or non-alerting sensitisers in a large skin sensitisation reference set (misclassified features), or (ii) not present in the relevant reference set at all (unclassified features) [2].
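To illustrate the nearest-neighbour idea only, the sketch below estimates an EC3 value for a query compound from its most similar alerting neighbours. It is not Derek's implementation: the feature sets, the Tanimoto similarity measure, the choice of k and the use of a geometric mean (averaging log10 EC3) are all assumptions made for this example, and the function names are hypothetical.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_ec3(query_features, neighbours, k=3):
    """Hypothetical potency estimate from the k most similar neighbours.

    neighbours: list of (feature_set, ec3_percent) for LLNA compounds that
    activate the same alert as the query. Returns the geometric mean EC3 (%)
    of the k nearest neighbours, or None if there are no neighbours.
    """
    if not neighbours:
        return None
    ranked = sorted(neighbours,
                    key=lambda n: tanimoto(query_features, n[0]),
                    reverse=True)
    nearest = ranked[:k]
    # Average on a log scale because EC3 values span orders of magnitude.
    mean_log = sum(math.log10(ec3) for _, ec3 in nearest) / len(nearest)
    return 10 ** mean_log


# Toy data: integers stand in for structural features.
neighbours = [({1, 2, 3, 4}, 1.0), ({1, 2, 3}, 10.0),
              ({1, 2, 9}, 100.0), ({7, 8}, 0.1)]
print(predict_ec3({1, 2, 3, 4}, neighbours, k=3))  # 10.0
```

With this toy data the three most similar neighbours have EC3 values of 1, 10 and 100 %, so the geometric mean is 10 %; the least similar compound is ignored.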
Sarah uses a self-organising hypothesis network (SOHN) to generate structure fragment-based hypotheses, which are then used to make predictions. More detail on this approach can be found in Hanser et al. (2014) [3].
3. A defined domain of applicability
The scope of the structure-activity relationships described in Derek is defined by the developer to be the applicability domain of the model. Therefore, if a chemical activates an alert describing a structure-activity relationship for toxicity, it is within the applicability domain. For the mutagenicity and skin sensitisation endpoints, if a compound does not activate an alert or reasoning rule, Derek makes a negative prediction, which gives the user greater confidence in a result where no alert fires. For endpoints that do not support negative predictions, a ‘nothing to report’ message appears instead.
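The decision logic above, combined with the misclassified/unclassified feature checks described under principle 2, can be sketched as follows. This is a simplified, hypothetical model rather than Derek's actual code: compounds are represented as sets of fragment identifiers (the SMILES-like strings are placeholders), the function names and endpoint set are illustrative, and we assume for the sketch that a negative prediction is still returned when features are flagged, with those features listed for expert review.

```python
# Hypothetical: endpoints for which this sketch issues negative predictions.
NEGATIVE_PREDICTION_ENDPOINTS = {"mutagenicity", "skin sensitisation"}


def flag_features(query_fragments, reference_fragments,
                  non_alerting_positive_fragments):
    """Return (misclassified, unclassified) fragments for a non-alerting query.

    misclassified: query fragments also found in non-alerting positive compounds
    unclassified:  query fragments never seen in the reference set
    """
    misclassified = query_fragments & non_alerting_positive_fragments
    unclassified = query_fragments - reference_fragments
    return misclassified, unclassified


def predict(endpoint, alerts_fired, query_fragments,
            reference_fragments, non_alerting_positive_fragments):
    """Return (outcome, misclassified, unclassified) for one endpoint."""
    if alerts_fired:
        # An activated alert places the compound inside the applicability domain.
        return "alert", set(), set()
    if endpoint not in NEGATIVE_PREDICTION_ENDPOINTS:
        return "nothing to report", set(), set()
    misclassified, unclassified = flag_features(
        query_fragments, reference_fragments, non_alerting_positive_fragments)
    # In this sketch, flagged features do not change the call; they are
    # returned so that an expert can review them.
    return "negative", misclassified, unclassified


reference = {"c1ccccc1", "C=O", "CN"}
positives = {"C=O"}
print(predict("mutagenicity", [], {"c1ccccc1", "C=O", "N=N"},
              reference, positives))
# ('negative', {'C=O'}, {'N=N'})
```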
The applicability domain of Sarah is defined by comparing the structural fragments present in the training set with those present in the query compound. If every atom in the query compound is covered by structural fragments found in the Sarah training set, the query compound is considered to be within the applicability domain of the model. If one or more atoms in the query structure are not represented by a fragment in the training set, the query structure is outside the applicability domain. In this case, the query fragments associated with the out-of-domain atoms are highlighted on the query structure, and the possible in-domain fragments are also shown. This allows the user to carry out an expert assessment of the out-of-domain fragment using knowledge not contained in the model, which may in turn allow them to resolve the out-of-domain prediction into a positive or negative overall call.
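The atom-coverage rule described above can be expressed compactly. The sketch below assumes that the query has already been fragmented and that each fragment is identified by a hypothetical string key mapped to the set of query atom indices it covers; how Sarah actually generates and matches fragments is not reproduced here.

```python
def applicability_domain(n_atoms, query_fragments, training_fragment_keys):
    """Check whether every query atom is covered by a training-set fragment.

    n_atoms: number of atoms in the query structure (indexed 0..n_atoms-1)
    query_fragments: dict of fragment key -> set of query atom indices covered
    training_fragment_keys: set of fragment keys present in the training set
    Returns (in_domain, uncovered_atoms, unseen_fragments_on_uncovered_atoms).
    """
    covered = set()
    for key, atoms in query_fragments.items():
        if key in training_fragment_keys:
            covered |= atoms
    uncovered = set(range(n_atoms)) - covered
    # Fragments absent from training that touch an uncovered atom: these are
    # the ones a user would be shown for expert review.
    unseen = {key for key, atoms in query_fragments.items()
              if key not in training_fragment_keys and atoms & uncovered}
    return not uncovered, uncovered, unseen


# Hypothetical five-atom query; atom 4 is only covered by the unseen "NS".
fragments = {"CC": {0, 1}, "CO": {1, 2}, "CN": {2, 3}, "NS": {3, 4}}
print(applicability_domain(5, fragments, {"CC", "CO", "CN"}))
# (False, {4}, {'NS'})
```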
4. Appropriate measures of goodness-of-fit, robustness and predictivity
Internal and external validation are carried out for each Derek knowledge base release using proprietary and public data. For each alert, the software reports the number of positive and negative compounds in the validation data sets that activate the alert, and uses these counts to calculate the alert's positive predictivity. Non-proprietary elements of the training set are available within Derek through the references cited and are illustrated by the example compounds.
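Positive predictivity is the fraction of validation compounds activating an alert that are actually positive, PPV = TP / (TP + FP), where TP is the number of positive compounds and FP the number of negative compounds that activate the alert. A minimal sketch, with hypothetical counts and a hypothetical function name:

```python
from typing import Optional


def positive_predictivity(n_positive: int, n_negative: int) -> Optional[float]:
    """Positive predictivity of an alert: TP / (TP + FP).

    n_positive: positive validation compounds that activate the alert (TP)
    n_negative: negative validation compounds that activate the alert (FP)
    Returns None if no validation compounds activate the alert.
    """
    total = n_positive + n_negative
    if total == 0:
        return None
    return n_positive / total


# Hypothetical counts for a single alert.
print(positive_predictivity(45, 5))  # 0.9
```

Returning None rather than 0 for an alert with no validation hits keeps "no evidence" distinct from "always wrong".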
External validation of each released Sarah model is conducted using proprietary data sets.
When reviewing a Sarah prediction, the user can assess the relevance of the training set examples to the query compound. This relevance can be increased by adding supplementary in-house data to a Sarah model.
5. A mechanistic interpretation, if possible
In Derek, every alert describing a structure-activity relationship includes comments which, where available, describe both the mechanism of action and the biological target.
In Sarah, structural features that are directly reactive, or that give rise to reactive species capable of reacting with DNA, may lead to mutations and hence to positive results in bacterial reverse mutation assays. Such features may be captured by the structure fragment-based hypotheses that Sarah uses to make predictions and model toxicity.
If you work within the life sciences or chemical industry and are interested in the (Q)SAR models mentioned in this article, please get in touch with our Applied Sciences Team to arrange a free trial or demonstration.
Find current QMRFs here. QMRFs for historical versions of Derek and Sarah Nexus can be found within the Nexus help centre.
References
[1] Canipa SJ, Chilton ML, Hemingway R, Macmillan DS, Myden A, Plante JP, Tennant RE, Vessey JD, Steger-Hartmann T, Gould J, Hillegass J, Etter S, Smith BPC, White A, Sterchele P, De Smedt A, O’Brien D, Parakhia R (2017). A quantitative in silico model for predicting skin sensitization using a nearest neighbours’ approach within expert-derived structure-activity alert spaces. Journal of Applied Toxicology 37, 985-995.
[2] Chilton ML, Macmillan DS, Steger-Hartmann T, Hillegass J, Bellion P, Vuorinen A, Etter S, Smith BPC, White A, Sterchele P, De Smedt A, Glogovac M, Glowienke S, O’Brien D, Parakhia R (2018). Making reliable negative predictions of human skin sensitisation using an in-silico fragmentation approach. Regulatory Toxicology and Pharmacology 95, 227-235.
[3] Hanser T, Barber C, Rosser E, Vessey JD, Webb SJ, Werner S (2014). Self-organising hypothesis networks: a new approach for representing and structuring SAR knowledge. Journal of Cheminformatics 6, 21.