Lhasa Limited shared knowledge shared progress

Sarah Nexus Model Building

The Sarah Model Building functionality lets users duplicate the Lhasa model and supplement it with their own data, or build an entirely new model. These custom models can then be exported and shared with other users.

For more information on model building in Sarah Nexus, please view Lhasa’s video on Model Building and Interpreting Confidence, presented by Principal Application Scientist, Dr. Dave Yeo. 

How to Build a Model

There are five simple steps to take when building a model in Sarah Nexus:

  1. Add New Data - Import a dataset to base the model on, or import data to supplement an existing model.
  2. Conflict/Duplicate - The data is standardised and then processed to remove conflicting or duplicated entries. Compounds that appear more than once in your dataset are detected: those with conflicting results are removed, and those with the same result are reduced to a single entry. If you are creating a brand new model, the standardisation techniques can be changed from the defaults.
  3. Model Settings - The user names the model and chooses the reasoning type (weighted, single most confident, or conservative). The Equivocal and Sensitivity settings can also be defined here.
  4. Results & Cross-Validation - This section details the hypotheses used within the model, a summary of the constraints, the structures used in the dataset, and the cross-validation results.
  5. External Validation - Users can test their model against external datasets. The results are displayed as a colour-coded heat map, which shows the best Equivocal and Sensitivity settings.
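
The Conflict/Duplicate step (step 2) can be illustrated with a minimal Python sketch. This is not Sarah Nexus code: the function name, the (structure key, result) record layout and the "positive"/"negative" labels are assumptions made for illustration, and the structure key stands in for whatever identifier standardisation produces for a compound.

```python
from collections import defaultdict


def resolve_duplicates(records):
    """Collapse agreeing duplicates and drop conflicting compounds.

    `records` is an iterable of (structure_key, result) pairs, where
    `structure_key` is a hypothetical identifier for the standardised
    structure. This layout is an assumption, not the Sarah Nexus format.
    """
    # Collect every distinct result reported for each standardised structure.
    results_by_key = defaultdict(set)
    for key, result in records:
        results_by_key[key].add(result)

    kept = {}      # one entry per compound whose results all agree
    removed = []   # compounds whose results conflict
    for key, results in results_by_key.items():
        if len(results) == 1:
            kept[key] = next(iter(results))
        else:
            removed.append(key)
    return kept, sorted(removed)


if __name__ == "__main__":
    data = [
        ("A", "positive"), ("A", "positive"),   # same result twice
        ("B", "positive"), ("B", "negative"),   # conflicting results
        ("C", "negative"),                      # single entry
    ]
    kept, removed = resolve_duplicates(data)
    print("kept:", kept)
    print("removed:", removed)
```

In this sketch compound A is reduced to a single entry, compound B is removed because its results conflict, and compound C is kept unchanged.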

The following charts are displayed for both the cross-validation and the external validation:

  • Performance - A pie chart showing the distribution of the results into five categories: true positive, true negative, false positive, false negative, outside domain. (Figure 1)
  • ROC (Receiver Operating Characteristic) - A plot of the true positive rate against the false positive rate, showing the trade-off between sensitivity and specificity. (Figure 2)
  • Accuracy - This plot shows the accuracy, sensitivity and specificity against the confidence. (Figure 3)
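
The quantities behind these charts follow the standard definitions, which can be sketched in Python as below. The function names and the use of None to mark an outside-domain prediction are assumptions for illustration, as is the choice to exclude outside-domain compounds from the accuracy calculation; Sarah Nexus may calculate these values differently.

```python
from collections import Counter


def performance_counts(actual, predicted):
    """Count the five Performance categories.

    `actual` holds True/False experimental results. `predicted` holds
    True/False calls, or None for an outside-domain compound (an
    assumed convention for this sketch).
    """
    counts = Counter()
    for truth, call in zip(actual, predicted):
        if call is None:
            counts["outside_domain"] += 1
        elif call and truth:
            counts["true_positive"] += 1
        elif call and not truth:
            counts["false_positive"] += 1
        elif truth:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


def _ratio(numerator, denominator):
    # Return None rather than dividing by zero when a class is empty.
    return numerator / denominator if denominator else None


def summarise(counts):
    """Derive sensitivity, specificity and accuracy from the counts."""
    tp, tn = counts["true_positive"], counts["true_negative"]
    fp, fn = counts["false_positive"], counts["false_negative"]
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        # Assumption: outside-domain compounds are left out of accuracy.
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
    }


if __name__ == "__main__":
    actual = [True, True, True, False, False, False, True]
    predicted = [True, True, False, False, True, None, None]
    counts = performance_counts(actual, predicted)
    print(dict(counts))
    print(summarise(counts))
```

A single point on the ROC plot is the pair (1 - specificity, sensitivity) at one setting; evaluating the model over a range of settings traces out the curve.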

The following screenshots show the Performance, ROC and Accuracy graphs from the cross-validation of Sarah Model 2.0.

Figure 1: Performance Pie Chart for the Cross-Validation of Sarah Model 2.0

Figure 2: ROC graph for the Cross-Validation of Sarah Model 2.0

Figure 3: Accuracy Graph for the Cross-Validation of Sarah Model 2.0

Features and Benefits 

  • Enhanced Predictivity for Proprietary Chemical Space - The Sarah model builder enables users to supplement Sarah’s known chemical space with their own proprietary data to enhance predictivity within the chemical space of the user's query compound.
  • Reduced Bias - Duplicating the Sarah model and then adding your own data to create a new model can reduce the bias that arises from a smaller dataset.
  • Generate the Best Model for a Specific Dataset - External Validation produces a heat map from which the optimum Sensitivity and Equivocal parameters can be identified. This means the model can be fine-tuned to find the best overall settings for a particular dataset to be tested within Sarah.
  • Fewer False Strengths - New data added to a model, or used to create a brand new model, is standardised. Standardisation of the data within a model leads to fewer duplicates in a training set. These duplicates can lead to false strengths in signals and so reducing them provides experts with a more accurate prediction.

© 2023 Lhasa Limited | Registered office: Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS, UK Tel: +44 (0)113 394 6020
VAT number 396 8737 77 | Lhasa Limited is registered as a charity (290866)| Company Registration Number 01765239 (England and Wales).