If the prediction hinged upon knowledge of 10 compounds, 5 positive and 5 negative, and there was a scientifically robust reason for rejecting 3 of those positives, then you are still left with 2 positive examples. This is indeed likely to (statistically) reduce the strength of evidence for activity, or even change the conclusion the model would arrive at. I do understand your question and you are right: it would be interesting to see what impact removing those examples from the training set would have on the overall prediction. This is a scenario that we have considered, essentially allowing the user to select the training compounds from which to make a prediction. However, it is not something we are currently planning to implement, since it would make it difficult for a regulator to accept such a prediction when the user is essentially manipulating the model's training data. That is not to say that it is immediately wrong; your scenario is a fair one which could, at times, be scientifically justifiable.
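To put rough numbers on the example above, here is a minimal sketch in Python. It only computes the naive share of supporting compounds that are positive, before and after rejecting 3 of the 5 positives. This is an illustration of the arithmetic, not how the model actually weighs its evidence, and the function name `positive_fraction` is hypothetical.

```python
from fractions import Fraction


def positive_fraction(n_positive: int, n_negative: int) -> Fraction:
    """Naive share of supporting compounds that are positive.

    Assumption: a simple proportion stands in for 'strength of evidence';
    the real model may weigh compounds quite differently.
    """
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("no supporting compounds left")
    return Fraction(n_positive, total)


# All 10 compounds: 5 positive, 5 negative.
before = positive_fraction(5, 5)

# After rejecting 3 positives: 2 positive, 5 negative.
after = positive_fraction(5 - 3, 5)

print(f"before: {float(before):.2f}, after: {float(after):.2f}")
# prints "before: 0.50, after: 0.29"
```

On this crude measure the support for activity drops from 5/10 to 2/7, so it is weaker but not zero.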
So where does that leave you today? Even if you have removed 3 compounds, you are still left with 2 presumably relevant positive compounds that you have no reason to ignore. To my mind that is a weaker argument for activity, but not an argument against activity. Irrespective of whether you could test the model's response to the removal of some of those training compounds, if I were a regulator I would ask some questions. Specifically, by removing them you are implicitly making the judgement that those 3 compounds would only have been active because of the feature for which you are now discounting them (and that there is no other possible reason for them to be active). That is potentially quite a big step. For this reason, unless you could discount all the positive compounds and still leave a reasonable number of negatives, I would suggest that it may be questionable to remove some compounds from the analysis and then ignore the other positives in order to arrive at an argument for inactivity. This may be defensible, but I guess it would depend upon the specific case: any mechanistic/stereoelectronic argument as to why the remaining compounds are 'outliers', the absence of any deactivating features that could account for some of the compounds being inactive, all the way back to challenging the original data source or the conditions used. I'm afraid I really cannot think of a universal approach to replicate the knowledge of an expert!