Application of conformal prediction in a more formal definition of applicability domain
Defining the applicability domain (AD) is the art of quantifying the uncertainty or confidence of predictions produced by classification and regression techniques in the field of machine learning. This quantification is of great importance when assessing the potential liability of a chemical to cause toxic or adverse effects, and when promoting the use of in silico predictions. Solutions are therefore needed that give the user of an in silico model confidence in its predictions and the ability to set acceptable thresholds appropriate to their use case.
The domain of applicability has many aspects that need to be considered. They include interpolation within the training set, the density and quality of the training set information around a query compound, and the distance of the query compound from the decision boundary of an in silico model. Attempting to merge these aspects into a single metric is highly complex and may result in a "black box" framework in which some aspects of AD are not considered. We present a conceptual framework in which the AD is broken down into three steps, the applicability, reliability and decidability domains, making the quantification of AD stepwise, intuitive and transparent.
A conformal predictor is an algorithmic framework that complements an underlying machine learning algorithm so that the resulting system produces predictions together with information on their confidence. In a classification problem this information includes one p-value per candidate label. These p-values delimit prediction regions, one of which contains a unique label. Intuitively, the larger this single-label region is, the more a user can trust the conclusion of the prediction made by the conformal predictor and its underlying in silico model. This approach was used for the decidability domain of our AD framework.
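The role of the p-values can be illustrated with a minimal sketch of an inductive conformal classifier. It is not the implementation used in this work: the nonconformity scores, the calibration values and the label names are hypothetical, and the single-label region is read here as the range of significance levels at which the prediction set contains exactly one label.

```python
from typing import Dict, List, Set, Tuple


def p_value(calibration_scores: List[float], test_score: float) -> float:
    """Conformal p-value: the share of calibration nonconformity scores that
    are at least as large as the test score, with the usual +1 correction."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(p_values: Dict[str, float], significance: float) -> Set[str]:
    """Labels that cannot be rejected at the given significance level."""
    return {label for label, p in p_values.items() if p > significance}


def single_label_interval(p_values: Dict[str, float]) -> Tuple[float, float]:
    """Significance levels [low, high) at which exactly one label is predicted.
    The bounds are the second-largest and the largest p-value."""
    ranked = sorted(p_values.values(), reverse=True)
    return ranked[1], ranked[0]


# Hypothetical calibration nonconformity scores (assumption: higher = stranger),
# e.g. 1 minus the probability the underlying model assigns to the label.
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Hypothetical nonconformity of one query compound under each candidate label.
p_values = {
    "inhibitor": p_value(calibration, 0.25),
    "non-inhibitor": p_value(calibration, 0.85),
}

low, high = single_label_interval(p_values)
print(p_values)
print("single-label region:", (low, high))
print("prediction set at 0.3:", prediction_set(p_values, 0.3))
```

In this sketch, significance levels below the lower bound give a prediction set containing both labels, and levels at or above the upper bound give an empty set. The width of the interval between the two bounds is one possible way to express how decidable a prediction is.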
This presentation first introduces the framework. It then describes how conformal prediction can be projected onto the Lhasa Limited AD framework at the decidability level, using a dataset of inhibitors of the bile salt export pump (BSEP).
This content was presented by Sebastien Guesne at the Joint UK-QSAR and Molecular Graphics and Modelling Society conference in April 2018.