Data-driven federated learning in drug discovery with knowledge distillation

Thierry Hanser, Ernst Ahlberg, Alexander Amberg, Lennart T. Anger, Chris Barber, Richard J. Brennan, Alessandro Brigo, Annie Delaunois, Susanne Glowienke, Nigel Greene, Laura Johnston, Daniel Kuhn, Lara Kuhnke, Jean-François Marchaland, Wolfgang Muster, Jeffrey Plante, Friedrich Rippmann, Yogesh Sabnis, Friedemann Schmidt, Ruud van Deursen, Stéphane Werner, Angela White, Joerg Wichard & Tomoya Yukawa

A key challenge for artificial intelligence in scientific research is ensuring access to sufficient, high-quality data for the development of impactful models. Despite the abundance of public data, the most valuable knowledge often remains embedded within confidential corporate data silos. Although industries are increasingly open to sharing non-competitive insights, such collaboration is often constrained by the confidentiality of the underlying data. Federated learning makes it possible to share knowledge without compromising data privacy, but it has notable limitations. Here, we introduce FLuID (federated learning using information distillation), a data-centric application of federated distillation tailored to drug discovery and designed to preserve data privacy. We validate FLuID in two experiments: first on public data simulating a virtual consortium, and second in a real-world research collaboration between eight pharmaceutical companies. Although aligning the models with each partner-specific domain remains challenging, the data-driven nature of FLuID offers several avenues to mitigate domain shift. FLuID fosters knowledge sharing among pharmaceutical organizations, paving the way for a new generation of models with enhanced performance and an expanded applicability domain in biological activity predictions.
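To make the federated distillation idea concrete, the following is a minimal, hypothetical sketch (not the authors' FLuID implementation): a partner keeps its private training data and shares only a teacher model's soft predictions on a shared, non-confidential reference set, and a central student model is then trained to mimic those predictions. All data, model forms, and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared, non-confidential reference set (e.g. public compound
# descriptors) -- synthetic stand-in data for illustration.
X_ref = rng.normal(size=(500, 4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A partner's private teacher model; here simply fixed linear weights
# standing in for a model trained on confidential data.
w_teacher = np.array([1.5, -2.0, 0.5, 1.0])

# Only these soft predictions on the shared set leave the partner;
# the private training data never does.
soft_labels = sigmoid(X_ref @ w_teacher)

# Distillation: fit a student of the same form by gradient descent
# on the cross-entropy between its outputs and the soft labels.
w_student = np.zeros(4)
for _ in range(5000):
    pred = sigmoid(X_ref @ w_student)
    grad = X_ref.T @ (pred - soft_labels) / len(X_ref)
    w_student -= 1.0 * grad

# The student recovers the teacher's behaviour without seeing its data.
print(np.round(w_student, 2))
```

In a multi-partner setting, each partner would contribute soft labels on the shared reference set, and the student would be trained on the aggregated predictions, which is the privacy-preserving knowledge transfer the abstract describes.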