Haverford College Researchers Mine Failed Experiments to Predict New Synthetic Materials Reactions
A team of Haverford College researchers, including several students, published a cover story in Nature on using unpublished "dark reactions" to create a machine-learning algorithm that is able to predict reaction successes or failures with greater accuracy than human intuition.
Many chemical reactions are never reported because those considered failures don't get published in journals. Despite being unsuccessful, these "dark reactions" still offer valuable information about the bounds on reaction conditions needed for compound formation. Yet they have always languished in lab notebooks, helping only the scientists who record them. Until now.
A team of Haverford College researchers used those unpublished "dark reactions" to create a machine-learning algorithm that is able to predict reaction successes or failures with greater accuracy than human intuition. As detailed in a cover story in Nature, published May 4, the Haverford scientists have demonstrated both the value of disseminating unsuccessful syntheses more widely and the possibility of using machine learning to arrive at potential synthetic compounds faster than by traditional means.
"Leveraging unpublished data in an unbiased way by machine learning models can lead to invaluable predictions," says Harvard Professor of Chemistry and Chemical Biology Alán Aspuru-Guzik. "In particular, the authors show that non-trivial correlations and predictions can arise from laboratory notebook data that can accelerate new materials discovery."
The research team, which includes Assistant Professor of Computer Science Sorelle Friedler, Associate Professor of Chemistry Alexander Norquist, Associate Professor of Chemistry Joshua Schrier, and several student researchers, began with the notebooks from Norquist's lab, which synthesizes and studies organically templated metal oxide compounds. Though Norquist and his students have created a host of compounds over the past decade, that work rested on many more individual experiments that did not produce any unique compounds.
"There tends to be a lot more failures than successes," says Norquist. "If I write a paper with five different compounds, we’ll include the details for five different individual reactions for the wider community, but there could have been a hundred total reactions that went into the development or the refinement of the conditions in order to give those specific reactions. So I think about the failures as the bit of the iceberg that’s underwater—we only ever see the top."
In order to help the wider scientific community learn from these "failures," Norquist partnered with Friedler, an expert in machine-learning algorithms, and Schrier, a computational chemist who distilled the underlying properties of the chemical recipes in the lab notebooks to break down what makes a reaction successful or not.
"I think it's a true Haverford success story," says Schrier. "This work came out of the fact that Sorelle happened to have an office next to ours and we would have lunch together. By being at a small place where people from different departments have close interactions with one another, spontaneous things can arise like this project."
Several Haverford students are co-authors on the paper—computer science majors Casey Falk '16 and Paul Raccuglia '14 and chemistry majors Malia Wenny '17, Katherine Elbert '14, and Aurelio Mollo '17—as are the College's Postdoctoral Research Fellow in Cheminformatics Philip Adler and Purdue University's Matthias Zeller.
This work, which was funded by the National Science Foundation, is significant for several reasons. In addition to creating a vast repository of more than 4,000 individual chemical reactions that enriches the scientific community, the team built a machine-learning algorithm that outperforms expert humans at predicting whether future reactions will succeed.
"You can get a huge productivity boost in your exploratory chemistry by using our methods," says Schrier. "And it actually tells you something about your data that you didn’t know. It reveals these new chemical hypotheses that are testable and that have eluded humans who have been working on this project for many many years. Capturing these dark reactions is the prerequisite, but then it gives you a tool that is demonstrably better than humans and actually teaches you something. It’s not just about machine learning, but also human learning, in some sense."
The database of reactions—including the properties associated with each one—is now publicly accessible, and the authors hope that outside scientists will contribute their own dark reactions.
"Part of the motivation for this work was to learn as much from the reactions as we possibly can and help others learn as well," says Norquist. "There has to be an accessible quality to it."
"This work will be invaluable, because the authors have shared the precise schemes that they have used, as well as software tools," says Professor Ram Seshadri of the University of California, Santa Barbara's Materials Department and Department of Chemistry and Biochemistry. "This work is truly path-breaking."
Read the full Nature paper, "Machine-learning-assisted materials discovery using failed experiments."