Quantifying Degree of Aromaticity from Structural Features
The term aromaticity describes a cyclic unsaturated structure that is more stable than, and reacts differently from, a similar
non-aromatic one. For a ring to be aromatic, the system should obey Hückel’s rule, i.e. contain 4n + 2 π electrons. However, counting
these electrons is harder than it might seem, and while many compounds exhibit aromatic character, not all are as perfectly aromatic as
benzene. To adopt a flexible and usable approach to aromaticity, we chose to introduce the concept of a “degree of aromaticity”, based
on the Harmonic Oscillator Model of Electron Delocalization (HOMED) approach.
A set of reference bond lengths was determined for each of 29 pairs of atom types. A dataset of 4 million compounds was then
downloaded from PubChemQC, and HOMED indices were calculated for all rings. The method is both rapid and scalable; however, it does
require either QM-optimised or crystal structures, since it depends on measuring the actual bond lengths in the ring. This method was
used to develop machine-learned and expert-rule-based categorisations of aromatic ring types: molecules were fragmented to separate the
different ring types from each other, and distributions of aromaticity were extracted. A variety of structural features were then used as
model input, removing the requirement for accurate conformations. Our aim is to distinguish between “strongly” aromatic (e.g. benzene,
thiophene), “weakly” aromatic (e.g. uracil), and non-aromatic structures. This categorisation should allow us to describe chemical
knowledge more accurately.
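
As a rough illustration of how a HOMED-style index can be computed from measured ring bond lengths, the sketch below uses the general harmonic-oscillator form, 1 - (1/n) Σ α (R_opt - R_i)², with α chosen so that a fully delocalised ring scores 1 and a ring of alternating reference single and double bonds scores 0. The reference lengths in REFERENCE_LENGTHS, the function names, and the per-pair treatment of α are assumptions made for this example only; they are not the 29 reference sets or the implementation used in this work, and the α normalisation shown applies to rings with an even number of bonds.

```python
from typing import Dict, List, Tuple

# (r_single, r_double, r_opt) in angstroms for each atom-type pair.
# PLACEHOLDER values for illustration only; not the reference set from this work.
REFERENCE_LENGTHS: Dict[Tuple[str, str], Tuple[float, float, float]] = {
    ("C", "C"): (1.530, 1.329, 1.394),
    ("C", "N"): (1.466, 1.267, 1.334),
}


def pair_key(a: str, b: str) -> Tuple[str, str]:
    """Order-independent lookup key for a pair of atom types."""
    return tuple(sorted((a, b)))


def alpha_for(r_single: float, r_double: float, r_opt: float) -> float:
    """Normalisation constant for rings with an even number of bonds.

    Chosen so that alternating reference single/double bonds give an index of 0.
    """
    return 2.0 / ((r_opt - r_single) ** 2 + (r_opt - r_double) ** 2)


def homed(bonds: List[Tuple[str, str, float]]) -> float:
    """HOMED-style index for one ring.

    `bonds` holds (atom_type_a, atom_type_b, measured_length) for each ring bond.
    """
    if not bonds:
        raise ValueError("a ring needs at least one bond")
    penalty = 0.0
    for a, b, length in bonds:
        r_single, r_double, r_opt = REFERENCE_LENGTHS[pair_key(a, b)]
        penalty += alpha_for(r_single, r_double, r_opt) * (r_opt - length) ** 2
    return 1.0 - penalty / len(bonds)


if __name__ == "__main__":
    delocalised = [("C", "C", 1.394)] * 6                      # all bonds at r_opt
    localised = [("C", "C", 1.530), ("C", "C", 1.329)] * 3     # alternating bonds
    print(homed(delocalised), homed(localised))
```

With these placeholder references, a six-membered carbon ring whose bonds all equal the optimum length scores 1, and one with strictly alternating reference single and double bonds scores approximately 0; real rings fall in between, which is what makes the index usable as a continuous degree of aromaticity.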