Data Mining for Material Synthesis

February 19, 2018

While most people think that science operates in a totally logical, Mr. Spock style, there's actually a lot of intuition involved in successful theory-building. I've found that I get my best ideas when I'm in that dreamlike state experienced just before nodding off to sleep. Another time for idea generation is when I'm alone with my own thoughts without any distraction, as when I'm in a waiting room or sitting quietly before the start of a movie, concert, or church service.

I don't think I've ever solved a problem in an actual dream, something more often associated with lucid dreaming, in which the dreamer is aware of dreaming. There are a few examples of dreams leading to some famous ideas in science, technology, and mathematics, but these are so few in number that I wouldn't spend too much time sleeping instead of working.

Of course, the most famous example of a dream-induced scientific idea is the discovery of the ring structure of benzene by German chemist, August Kekulé (1829-1896). By his own account, the idea of the benzene ring structure came to Kekulé in a day-dream of the ouroboros, the image of a snake biting its own tail.

An 1898 engraving of August Kekulé (1829-1896) from vol. 54 of Popular Science Monthly.

(Via Wikimedia Commons)

An example from technology is Elias Howe's idea for an improved sewing machine in 1845. Howe had been trying to design an effective sewing machine when, according to family records, he had a dream that he was held captive by the king of a strange country who demanded that he build a sewing machine. Threatened by spear-carrying warriors, Howe noticed that their spears had holes near the pointed tip. A manual sewing needle is threaded at the end opposite the point, but Howe realized that a hole near the tip was the ideal place for threading a machine needle. Howe awoke, headed to his workshop, and had a prototype working in a few hours.

Fig. 2 of US Patent No. 4,750, "Improvement in sewing-machines," by Elias Howe, Jr., September 10, 1846.

As stated in the patent, "The needle used has the eye that is to receive the thread within a small distance - say, an eighth of an inch - of its inner or pointed end."

Via Google Patents.[1]

Mathematics has had its dreamers, too, the prime example being Srinivasa Ramanujan. Ramanujan said that a Hindu goddess, Namagiri of Namakkal, presented mathematical formulas in his dreams that he verified after waking. In this, Ramanujan follows in the tradition of the ancient Greek philosophers who believed that some dreams carried messages from the gods. Although Aristotle (384-322 BC) treated dreams as a scientific phenomenon in his short essay, On Dreams (Περι ενυπνιων, or De insomniis),[2-4] the dialogues of his mentor, Plato (c.427 BC - c.347 BC), consider dreams to be messages from the gods.

The intuitions that come to scientists in day-dreams probably arise from random connections between memorized facts, in a way analogous to how genetic algorithms function. The mind pieces together random ideas, subjects them to a "fitness test," and selects the ones that make the most sense. When not dreaming, I would perform a similar process in materials development, combining knowledge of known materials to generate new materials. Having handy recall of things such as ionic sizes, crystal structure, material properties, and some rules-of-thumb made the process proceed more quickly.
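For readers unfamiliar with genetic algorithms, here is a minimal Python sketch of the generate, test, and select loop described above. It is a toy under stated assumptions, not a model of how anyone actually thinks. The bit-string TARGET, the population size, and the mutation rate are all hypothetical choices made only for illustration.

```python
import random

# Hypothetical stand-in for an "ideal" combination of ideas (assumption).
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]


def fitness(candidate):
    """The 'fitness test': count positions that agree with TARGET."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def crossover(a, b, rng):
    """Combine two parents by splicing them at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(candidate, rng, rate=0.05):
    """Flip each bit with a small probability (random variation)."""
    return [1 - bit if rng.random() < rate else bit for bit in candidate]


def evolve(pop_size=30, generations=200, seed=0):
    """Run the loop and return the fittest candidate found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == len(TARGET):
            break
        # Selection: keep the better half unchanged, discard the rest.
        survivors = population[: pop_size // 2]
        # Refill the population with mutated offspring of the survivors.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next. That is a design choice of this sketch rather than a requirement of genetic algorithms in general.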

In this age of ubiquitous computing, it's easy to see how materials development can be automated. That's what a team of materials scientists and computer scientists from the Massachusetts Institute of Technology (MIT, Cambridge, Massachusetts), the University of Massachusetts Amherst (Amherst, Massachusetts), and the University of California Berkeley (Berkeley, California) thought when they examined the use of text extraction from the scientific literature and machine learning to generate schemes for materials synthesis. Their work is reported in a recent article in the journal, Chemistry of Materials.[5-6]

Kitchen chemistry.

Much chemistry involves production of known chemicals according to a recipe. That's the reason for the popular analogy between cooking and chemistry.

(MIT image by Chelsea Turner.)

Many computational efforts have generated novel materials for catalysis, thermoelectrics, and other applications, so the bottleneck has become the synthesis of these materials.[5] The development of synthesis processes has usually relied on the intuition and experience of the materials scientist, guided by a traditional literature search and review.[6] The present study uses natural language processing, an artificial intelligence technique, to data mine tens of thousands of research papers and automatically deduce recipes for producing such novel materials.[5]

Says Elsa Olivetti, a professor in MIT's Department of Materials Science and Engineering and co-author of this study,
"Computational materials scientists have made a lot of progress in the 'what' to make - what material to design based on desired properties... but because of that success, the bottleneck has shifted to, 'Okay, now how do I make it?'"[6]

The proposed end product of this research program is a database of materials recipes data mined from millions of journal articles. An underlying machine-learning system would use natural language processing to deduce materials recipes and synthesis parameters from these articles.[6] A user would enter the name of a target material, along with criteria such as proposed precursor materials and reaction conditions, and the system would return suggested recipes for its synthesis.[6]
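To make the idea of querying such a database concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the Recipe fields, the suggest_recipes function, and the three entries, whose values and "paper-A" style source labels are invented placeholders and are not taken from the study or from any real paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Recipe:
    """One mined synthesis recipe (hypothetical schema)."""
    target: str
    method: str
    precursors: Tuple[str, ...]
    temperature_c: float
    time_h: float
    source: str  # placeholder label, not a real citation


# Invented placeholder entries, for illustration only.
RECIPES = [
    Recipe("titania nanotubes", "hydrothermal", ("TiO2", "NaOH"), 130.0, 24.0, "paper-A"),
    Recipe("titania nanotubes", "hydrothermal", ("TiO2", "KOH"), 180.0, 48.0, "paper-B"),
    Recipe("zirconia", "sol-gel", ("zirconium propoxide",), 500.0, 2.0, "paper-C"),
]


def suggest_recipes(
    db: List[Recipe],
    target: str,
    precursor: Optional[str] = None,
    max_temperature_c: Optional[float] = None,
) -> List[Recipe]:
    """Return recipes for a target material that satisfy the optional criteria."""
    matches = []
    for recipe in db:
        if recipe.target != target.lower():
            continue
        if precursor is not None and precursor not in recipe.precursors:
            continue
        if max_temperature_c is not None and recipe.temperature_c > max_temperature_c:
            continue
        matches.append(recipe)
    # List the mildest reaction conditions first.
    return sorted(matches, key=lambda r: r.temperature_c)


for recipe in suggest_recipes(RECIPES, "titania nanotubes", precursor="NaOH"):
    print(recipe)
```

The hard part of the real system is filling the database from free text, which a simple filter like this does not address.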

As a demonstration of this approach, the research team examined the synthesis conditions for various metal oxides by data mining more than twelve thousand articles, and then predicted recipes for synthesis of titania nanotubes via hydrothermal synthesis. Both supervised and unsupervised machine-learning techniques were used. The supervised method involved annotation by humans, while the unsupervised method had the system learn on its own how to organize the data.[6] Using an algorithm called Word2vec, which was developed at Google, the researchers were able to train their system with about 640,000 papers.[6]
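The intuition behind unsupervised word vectors is that words used in similar contexts end up with similar vectors. The Python sketch below illustrates that intuition with simple co-occurrence counts and cosine similarity. It is not Word2vec, which learns dense vectors with a shallow neural network, and the four sentences are a made-up toy corpus, not text from the mined papers.

```python
from collections import Counter, defaultdict
from math import sqrt

# Made-up toy corpus (assumption); real training used hundreds of thousands of papers.
SENTENCES = [
    "heat the titania powder at high temperature",
    "heat the zirconia powder at high temperature",
    "stir the solution for two hours",
    "stir the mixture for two hours",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(SENTENCES)
print(cosine(vectors["titania"], vectors["zirconia"]))
print(cosine(vectors["titania"], vectors["hours"]))
```

In this toy corpus, "titania" and "zirconia" appear in identical contexts, so they score as far more similar to each other than "titania" does to "hours". No human labeling is involved, which is the sense in which the method is unsupervised.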

In an analysis of the system’s accuracy, they found that it was able to identify paragraphs that contained recipes with 99% accuracy and label the words within those paragraphs with 86% accuracy.[6] The further objectives of this research are to improve accuracy and to use deep learning techniques to automatically devise recipes for materials not included in the existing scientific literature.[6] This research was funded by the National Science Foundation, the Office of Naval Research, and the Department of Energy, among other sources.[6]


  1. Elias Howe, Jr., "Improvement in sewing-machines," US Patent No. 4,750, September 10, 1846.
  2. Aristotle, "On Dreams," J. I. Beare, Trans., The Internet Classics Archive by Daniel C. Stevenson.
  3. Aristotle, "On Dreams," J. I. Beare, Trans., The University of Adelaide Library.
  4. Aristotle, "On Dreams," Greek text via Wikisource.
  5. Edward Kim, Kevin Huang, Adam Saunders, Andrew McCallum, Gerbrand Ceder, and Elsa Olivetti, "Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning," Chem. Mater. (Article ASAP, October 19, 2017), DOI: 10.1021/acs.chemmater.7b03500.
  6. Larry Hardesty, "Artificial intelligence aids materials fabrication," MIT Press Release, November 5, 2017.
