Sarah Nexus Model Building
The Sarah Nexus model-building functionality lets users duplicate the Lhasa model and supplement it with their own data, or build an entirely new model. These custom models can then be exported and shared with other users.
For more information on model building in Sarah Nexus, please view Lhasa’s video on Model Building and Interpreting Confidence, presented by Principal Application Scientist, Dr. Dave Yeo.
There are five simple steps to take when building a model in Sarah Nexus:
- Add New Data - Import a dataset to base the model on, or import data to supplement an existing model.
- Conflict/Duplicate - The data is standardised and then processed to remove conflicting or duplicated entries. Compounds that appear more than once with conflicting results are removed, and compounds that appear more than once with the same result are reduced to a single entry. If you are creating a brand new model, the standardisation techniques can be changed from the defaults.
- Model Settings - The user can name the model and decide which reasoning type is used (weighted, single most confident or conservative). The Equivocal and Sensitivity settings can also be defined.
- Results & Cross-Validation - This section details the hypotheses used within the model, a summary of the constraints, the structures used in the dataset, and the cross-validation results.
- External Validation - Users can test their model against external datasets. The results are displayed as a colour-coded heat map, which shows the best Equivocal and Sensitivity settings.
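The heat map in the External Validation step can be pictured as a grid search over two settings. The sketch below is illustrative only and is not how Sarah Nexus is implemented: it assumes, hypothetically, that the Sensitivity setting behaves like a decision threshold on a confidence score and that the Equivocal setting is a band around that threshold inside which no call is made. The function names, the balanced-accuracy score and the data are all made up for the example.

```python
from itertools import product


def classify(score, threshold, equivocal_band):
    # Hypothetical rule: scores closer to the threshold than the band are equivocal.
    if abs(score - threshold) < equivocal_band:
        return "equivocal"
    return "positive" if score >= threshold else "negative"


def balanced_accuracy(scored, threshold, equivocal_band):
    # scored: list of (confidence score, known result as bool).
    tp = tn = fp = fn = 0
    for score, is_positive in scored:
        call = classify(score, threshold, equivocal_band)
        if call == "equivocal":
            continue  # equivocal calls are left out of the score
        if call == "positive":
            if is_positive:
                tp += 1
            else:
                fp += 1
        else:
            if is_positive:
                fn += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        return 0.0  # cannot score a cell with no positives or no negatives left
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def heat_map(scored, thresholds, bands):
    # One cell per (threshold, band) pair, as in a two-axis heat map.
    return {
        (t, b): balanced_accuracy(scored, t, b)
        for t, b in product(thresholds, bands)
    }


# Made-up external validation set: (confidence score, known result).
external = [(0.9, True), (0.8, True), (0.55, False),
            (0.45, True), (0.3, False), (0.1, False)]

grid = heat_map(external, thresholds=[0.3, 0.5, 0.7], bands=[0.0, 0.1])
best = max(grid, key=grid.get)
print(best, grid[best])
```

The best cell is simply the pair of settings with the highest score. A real tool may weigh other factors, such as how many compounds end up equivocal.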
The following charts are displayed for both the cross-validation and the external validation:
- Performance - A pie chart showing the distribution of the results into five categories: true positive, true negative, false positive, false negative, outside domain. (Figure 1)
- ROC (Receiver Operating Characteristic) - A plot of the true positive rate against the false positive rate, showing the trade-off between sensitivity and specificity. (Figure 2)
- Accuracy - A plot of accuracy, sensitivity and specificity against confidence. (Figure 3)
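The quantities behind these charts follow standard definitions, which the sketch below illustrates. It is a minimal example under stated assumptions, not Sarah Nexus code: the function names are hypothetical, predictions are assumed to be "positive", "negative" or None (outside domain), and the data is made up.

```python
def performance_counts(results):
    # results: iterable of (predicted, actual), where predicted is
    # "positive", "negative" or None (outside domain) and actual is a bool.
    counts = {"true positive": 0, "true negative": 0,
              "false positive": 0, "false negative": 0,
              "outside domain": 0}
    for predicted, actual in results:
        if predicted is None:
            counts["outside domain"] += 1
        elif predicted == "positive":
            counts["true positive" if actual else "false positive"] += 1
        else:
            counts["true negative" if actual else "false negative"] += 1
    return counts


def summary_metrics(counts):
    # Outside-domain compounds are excluded from all three metrics.
    tp, tn = counts["true positive"], counts["true negative"]
    fp, fn = counts["false positive"], counts["false negative"]
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def roc_points(scored):
    # scored: list of (confidence score, actual bool) with at least one
    # positive and one negative. Returns (false positive rate, true
    # positive rate) points, lowering the threshold one compound at a time.
    # Tied scores are not merged in this simple version.
    positives = sum(1 for _, actual in scored if actual)
    negatives = len(scored) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, actual in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if actual:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


# Made-up example data.
results = [("positive", True), ("positive", False), ("negative", False),
           ("negative", True), (None, True), ("positive", True)]
counts = performance_counts(results)
metrics = summary_metrics(counts)
curve = roc_points([(0.9, True), (0.7, False), (0.6, True), (0.2, False)])
```

The five counts correspond to the slices of the Performance pie chart, and the list of points traces a ROC curve from (0, 0) to (1, 1).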
The following screenshots show the Performance, ROC and Accuracy graphs from the cross-validation of Sarah Model 2.0.
Figure 1: Performance Pie Chart for the Cross-Validation of Sarah Model 2.0
Figure 2: ROC graph for the Cross-Validation of Sarah Model 2.0
Figure 3: Accuracy Graph for the Cross-Validation of Sarah Model 2.0
- Enhanced Predictivity for Proprietary Chemical Space - The Sarah model builder enables users to supplement Sarah’s known chemical space with their own proprietary data to enhance predictivity within the chemical space of the user's query compound.
- Reduced Bias - Duplicating the Sarah model and then adding your own data to create a new model can reduce the bias that would arise from building on a smaller dataset alone.
- Generate the Best Model for a Specific Dataset - External Validation produces a heat map from which the optimum Sensitivity and Equivocal parameters can be found. This means the model can be fine-tuned to find the best overall settings for a particular dataset to be tested within Sarah.
- Fewer False Strengths - New data added to a model, or used to create a brand new model, is standardised. Standardising the data leads to fewer duplicates in the training set. Duplicates can falsely strengthen signals, so removing them provides experts with a more accurate prediction.
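The conflict and duplicate handling described above can be sketched in a few lines. This is only an illustration of the stated rule (conflicting entries removed, identical entries reduced to one), not the Sarah Nexus implementation. It assumes each record already carries a standardised structure key, and the function name, keys and results are hypothetical.

```python
from collections import defaultdict


def resolve_duplicates(records):
    # records: iterable of (standardised structure key, result).
    # Returns the entries to keep and the keys dropped as conflicting.
    results_by_key = defaultdict(set)
    for key, result in records:
        results_by_key[key].add(result)

    # Same result reported more than once: keep a single entry.
    kept = {key: next(iter(results))
            for key, results in results_by_key.items()
            if len(results) == 1}
    # Different results for the same structure: remove the compound entirely.
    conflicts = sorted(key for key, results in results_by_key.items()
                       if len(results) > 1)
    return kept, conflicts


# Made-up records with placeholder structure keys.
records = [
    ("structure-A", "negative"),
    ("structure-A", "negative"),   # duplicate with the same result
    ("structure-B", "positive"),
    ("structure-B", "negative"),   # conflicting result
    ("structure-C", "positive"),
]

kept, conflicts = resolve_duplicates(records)
print(kept)
print(conflicts)
```

Grouping by a standardised key is what makes the check meaningful: two differently drawn representations of the same compound only count as duplicates once they have been reduced to the same key.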