This is a really good question, but first let us take a step back and consider the concept of similarity carefully, since similarity should always be described in the context of how it is measured and how it is applied. Firstly, any measure of similarity must be driven by its intended use. For example, when Lipinski published his analysis of orally bioavailable, drug-like compounds, his measures of similarity were simple physicochemical properties (molecular weight, H-bond counts and clogP). For mutagenicity, those same measures have very little predictive value, whereas the presence or absence of structural fragments is much more significant. Once an appropriate measure of similarity has been demonstrated, the rationale for its application should also be made clear, including the decision as to where and why to apply any cut-off. For a statistical approach, it is perfectly valid to apply a cut-off that is optimized for performance against a test set, although such an approach should always be ‘validated’ against different datasets to ensure that the threshold is robust. I suspect that this is what you refer to in your question, but I can’t answer for how Leadscope defines and uses its similarity measures.
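To make the property-based view of similarity concrete, here is a minimal sketch of a Lipinski-style screen. The thresholds are Lipinski's published rule-of-five cut-offs; the function name and the input dictionary of pre-computed property values are hypothetical (in practice these properties would come from a cheminformatics toolkit), so treat this as an illustration rather than any tool's actual implementation:

```python
def rule_of_five_violations(mol_props):
    """Count rule-of-five violations given a dict of computed
    physicochemical properties (hypothetical input format)."""
    violations = 0
    if mol_props["mol_weight"] > 500:       # molecular weight cut-off
        violations += 1
    if mol_props["h_bond_donors"] > 5:      # H-bond donor count
        violations += 1
    if mol_props["h_bond_acceptors"] > 10:  # H-bond acceptor count
        violations += 1
    if mol_props["clogp"] > 5:              # calculated logP
        violations += 1
    return violations

# Hypothetical property values for a small, aspirin-like compound:
aspirin_like = {"mol_weight": 180.2, "h_bond_donors": 1,
                "h_bond_acceptors": 4, "clogp": 1.2}
print(rule_of_five_violations(aspirin_like))  # prints 0
```

The point of the sketch is that this notion of similarity is entirely property-driven, which is exactly why it transfers so poorly to an endpoint like mutagenicity, where structural fragments carry the signal.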
Sarah doesn’t work quite that way: while we have applied machine-learning (statistical) approaches to construct the model, we have been driven to make the output as supportive as possible for the subsequent expert assessment. However, Sarah does use similarity in the analysis. For each query compound, Sarah generates a fingerprint of fragments, which is compared to the training set to determine the most similar compounds. Within each hypothesis, the more similar compounds are displayed first, but we do not apply a threshold below which examples are hidden. We believe this is important, since the expert may have specific knowledge leading them to conclude that a particular example is more relevant than our ranking would suggest, and we do not want to hide such data. We also look at the signal from the most similar training compounds (using the same fragment-based fingerprint) and identify when those most similar compounds would give a different conclusion from the dataset as a whole. This allows us to use the principle that similar compounds are likely to exhibit similar activities, but to do so in a transparent way.
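The workflow described above can be sketched in a few lines. To be clear, none of this is Sarah's actual implementation: I am assuming a fingerprint can be modelled as a set of fragments, using the Tanimoto coefficient as the similarity measure, and inventing a small training set with binary activity labels purely for illustration:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fragment sets (assumed measure)."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def rank_training_set(query_fp, training):
    """Rank every training compound by similarity to the query.
    Note: no cut-off is applied, so even dissimilar examples
    remain visible to the expert."""
    return sorted(training,
                  key=lambda c: tanimoto(query_fp, c["fp"]),
                  reverse=True)

def local_vs_global_signal(query_fp, training, k=3):
    """Compare the positive rate among the k most similar training
    compounds with the rate over the whole training set, to flag
    when the local neighbourhood disagrees with the global signal."""
    ranked = rank_training_set(query_fp, training)
    local = sum(c["active"] for c in ranked[:k]) / k
    overall = sum(c["active"] for c in training) / len(training)
    return local, overall

# Hypothetical training data: fragment sets plus activity labels.
training = [
    {"fp": {"nitro", "aryl"},  "active": True},
    {"fp": {"aryl", "amine"},  "active": True},
    {"fp": {"ester", "alkyl"}, "active": False},
    {"fp": {"alkyl"},          "active": False},
]
query = {"nitro", "aryl", "amine"}
local, overall = local_vs_global_signal(query, training, k=2)
print(local, overall)  # prints: 1.0 0.5
```

Here the two nearest neighbours are both active while only half of the full training set is, which is exactly the kind of local-versus-global disagreement that is worth surfacing to the expert rather than burying inside an aggregate prediction.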