At the start of his academic career, Jason Terry was primarily interested in astronomy. But while earning his master’s degree at Brown University, he veered into new territory: particle physics.
In 2018 Terry took an opportunity to analyze data for the CMS experiment at the Large Hadron Collider. Who wouldn't, he asks. The CMS experiment was instrumental in discovering the Higgs boson, one of the most important recent findings in particle physics. And building the 17-mile-long underground particle collider that houses CMS is "probably one of the coolest things that people have done, ever."
Drawing on his degree's focus on data science and his experience analyzing astronomical data, Terry worked to improve the energy reconstruction of particles passing through the CMS detector. He did this by feeding the detector data through a machine-learning model.
Not long afterward, while pursuing his PhD at the University of Georgia, he set out to demonstrate that the same machine-learning method could work in astronomy research as well. The effort turned out to be even more effective than he had hoped.
A chance encounter
In December 2021, back in astronomy after his stint on CMS, Terry attended a conference to present research on protoplanetary disks, the disks of gas and dust that rotate around newly formed stars. While there, he ran into CMS scientist Sergei Gleyzer, a professor at the University of Alabama and an expert on machine-learning models.
The two talked about the datasets Terry had worked with at CMS and those astronomers use to study protoplanetary disks. At CMS, scientists search for interesting particles using several types of data representations, including images of collisions. In astronomy, scientists search for exoplanets in images of faraway star systems.
Exoplanets are planets outside our solar system. According to Terry, we know two major things about them.
One, exoplanets are very common: Scientists have found close to 6,000 so far. While not every star has an exoplanet, many stars are orbited by several, so planets may outnumber stars.
Two, most of the exoplanets we've discovered are very different from the planets in our solar system. Many seem to have formed more quickly than ours, in ways that scientists' traditional theories can't explain. In other planetary systems, planets can have rings many times larger than Saturn's, which span more than 170,000 miles. Planetary systems can have big Jupiters, hot Jupiters, big Earths, hot Earths, far Jupiters, one planet, or nine planets.
Studying the formation of exoplanets can teach us about the origins of our own solar system and the possibility of life outside it. But to study exoplanets, scientists have to find them first.
Early ways of looking for young exoplanets involved searching for gaps in protoplanetary disks, the shadows that exoplanets leave behind. This method was behind the majority of such discoveries. The method Terry was studying, on the other hand, was so new that it had so far led to the discovery of only three.
The method involved looking at the velocity of the gas in protoplanetary disks. When an exoplanet is present in the disk, scientists see a small, localized deviation in the velocity at which the gas travels around the star. Studying the velocity of protoplanetary disks is a more effective way to find young, "baby" planets, aged millions rather than billions of years, Terry says. In the data, an exoplanet shows up as a characteristic wiggle.
Working with Gleyzer and colleagues Cassandra Hall and Sean Abreau, Terry analyzed data to show that machine learning could be used to search for exoplanets.
Hidden in the stars
The researchers borrowed heavily from the method Terry and Gleyzer used to analyze CMS data, called an end-to-end deep-learning approach. In this approach, raw data goes straight into a model, without first being distilled into hand-picked features, a step known as feature engineering.
Much of the work involved programming and running simulations. At CMS, Terry was able to use data straight from the detector. But because only a few dozen protoplanetary disks have been discovered, training a machine-learning algorithm to search for exoplanets required synthetic data as well. Preparing the simulations for the model took a year.
The group's goal was to demonstrate that the models could reproduce previous discoveries and that a model trained on synthetic data could be applied to real observations without producing false positives. But when Terry finally ran the model on one real disk, an unexpected signature appeared on his screen.
It wasn’t a false positive, though; it was a sign of a real exoplanet, one that six previous years of analysis had missed. Terry was elated. “When that popped up, that was the best little wiggle I’ve seen in my entire life,” he says.
The discovery shows the potential of using machine-learning models in the search for exoplanets, Terry says.
In the future, Terry hopes to bring in more advanced machine-learning techniques to improve current results and introduce new capabilities. He plans to make the work open source so that other researchers can use it too, possibly leading to more fortuitous discoveries.