In this perspective, we explore the use of strategies from drug discovery, pattern recognition, and machine learning in the context of computational materials science. We focus our discussion on the development of donor materials for organic photovoltaics by means of a cheminformatics approach. These methods enable the development of models based on molecular descriptors that can be correlated with the important characteristics of the materials. In particular, we formulate empirical models, parametrized using a training set of donor polymers with available experimental data, for the important current–voltage and efficiency characteristics of candidate molecules. The descriptors are readily computed, which allows us to rapidly assess key quantities related to the performance of organic photovoltaics for many candidate molecules. As part of the Harvard Clean Energy Project, we use this approach to quickly obtain an initial ranking of the project's molecular library of 2.6 million candidate compounds. Our method reveals molecular motifs of particular interest, such as the benzothiadiazole and thienopyrrole moieties, which are present in the most promising set of molecules.
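
The workflow described above (fit an empirical model to a training set of descriptors and measured properties, then score and rank a candidate library) can be illustrated with a minimal sketch. The abstract does not state the model form, so ordinary least-squares regression is an assumption made here purely for illustration. The function names, the two-column descriptor layout, and all numbers are hypothetical placeholders, not data or code from the publication.

```python
import numpy as np


def fit_linear_model(descriptors, targets):
    """Least-squares fit of targets ~ descriptors @ w + b (assumed model form)."""
    # Append a column of ones so the last coefficient acts as the intercept.
    design = np.column_stack([descriptors, np.ones(len(descriptors))])
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coeffs


def predict(coeffs, descriptors):
    """Apply the fitted coefficients to new descriptor vectors."""
    design = np.column_stack([descriptors, np.ones(len(descriptors))])
    return design @ coeffs


def rank_candidates(coeffs, descriptors):
    """Return candidate indices sorted from highest to lowest predicted score."""
    scores = predict(coeffs, descriptors)
    order = np.argsort(-scores)
    return order, scores


# Synthetic placeholder training set: four "molecules", two descriptors each.
train_X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
# Synthetic target generated from a known linear rule, so the fit is exact.
train_y = 2.0 * train_X[:, 0] - 1.0 * train_X[:, 1] + 0.5

coeffs = fit_linear_model(train_X, train_y)

# Hypothetical candidate library: three unseen descriptor vectors.
candidates = np.array([[0.5, 0.5], [3.0, 0.0], [0.0, 2.0]])
order, scores = rank_candidates(coeffs, candidates)
print("ranking (best first):", order.tolist())
```

Because each descriptor vector is cheap to evaluate, the same `rank_candidates` call scales to a large library by passing a larger array; a real study would also need validation against held-out experimental data, which this sketch omits.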
