Machine learning, nature benefits: How AI helps 2 USC environmental scientists unlock the natural world’s mysteries

How do you measure a cloud? How do you count a swarm of bees? Machine learning provides insights into complex natural phenomena.

July 20, 2022 Paul McQuiston

Machine learning is a specific form of artificial intelligence. Through algorithms designed to learn from experience, machine learning (also known as ML) adapts and grows in efficiency over time as more data is added. An ML-driven program “learns” from its mistakes, and in doing so can reduce the time it takes to analyze mountains of data from years to minutes.

Two recently hired USC faculty members, Melissa Guzman and Sam Silva, are already garnering attention for their use of machine learning to find insights into the seemingly unknowable: the patterns underlying the natural world. Guzman is looking for trends in the migratory patterns of bees, among our most important pollinators, as well as their community makeup. Silva is studying the chemical makeup of clouds. Recently named recipients of the USC Wrigley Institute for Environmental Studies’ Faculty Innovation Award, both are using their expertise to develop solutions to environmental challenges.

Melissa Guzman and Sam Silva are using machine learning to find insights into patterns underlying the natural world. (Photos/Courtesy of Melissa Guzman and Sam Silva)

“Dr. Guzman and Dr. Silva are using exciting new computational tools to address complex environmental questions,” says Jessica Dutton, associate director for research and engagement at the Wrigley Institute. “Their programs are not just poised to generate new scientific knowledge about climate and biodiversity, but also new insights for decision-makers about trends and possible solutions in a changing world.”

Climate change disrupts bees’ migratory patterns, community formation: How AI and science can help

California is home to the most diverse and largest population of bees in all of North America. Of the 4,000 species of bees found in the United States, 1,600 can be found in the state. They’re also among nature’s most active pollinators: everything from your backyard garden to major agricultural operations depends in some part on their role in the ecosystem.

However, as their numbers have dipped in the past decade, identifying and protecting safe and sustainable bee sanctuaries has taken on increased importance. But how do you find where they are most likely to flourish? It’s a bigger challenge than you might think, according to Guzman, Gabilan Assistant Professor of Biological Sciences at the USC Dornsife College of Letters, Arts and Sciences.

“One of the hardest things about figuring out what’s happening to insects is that we have very good data for a few species in a few places,” Guzman says. “Researchers are going to the same place and counting the total number of different insects, which gives you an idea of how the population fluctuates through time. But that data is very rare. What I try to do with my research is to fill the gaps through spatial science methodologies.”

Using museum records, community science apps and data from diversity surveys, Guzman identifies trends in distribution patterns and community makeup. Even with those resources, the data isn’t great, she says — oftentimes it is biased and geographically concentrated. This results in data clusters around cities and close to roads, but not in more remote locations.


Enter machine learning. Guzman uses these tools to speed up the data cleaning process. Databases often contain wrong or incomplete information, and incorrect species names, dates and locations can spoil a study. Experts analyze and correct the data, and the researchers then take that knowledge and apply it to the dataset, allowing the machine learning tools to isolate and fix erroneous data points.
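To give a rough sense of how expert-checked records can help flag errors in the rest of a dataset, here is a minimal sketch in Python. It is not Guzman's actual method: it swaps in a simple k-nearest-neighbor vote, and the coordinates, the record layout and the function names `predict_species` and `flag_suspect_records` are all assumptions made up for illustration. Distance is measured in raw degrees, which is crude but enough for a toy example.

```python
from collections import Counter
from math import hypot

# Hypothetical expert-verified records: (species, latitude, longitude).
# The coordinates are invented for illustration, not real survey data.
VERIFIED = [
    ("Bombus vosnesenskii", 34.05, -118.25),
    ("Bombus vosnesenskii", 34.10, -118.30),
    ("Bombus vosnesenskii", 34.00, -118.20),
    ("Bombus occidentalis", 41.00, -122.00),
    ("Bombus occidentalis", 41.10, -122.10),
    ("Bombus occidentalis", 40.90, -121.90),
]


def predict_species(lat, lon, verified=VERIFIED, k=3):
    """Return the majority species among the k verified records nearest to (lat, lon)."""
    nearest = sorted(verified, key=lambda r: hypot(r[1] - lat, r[2] - lon))[:k]
    votes = Counter(species for species, _, _ in nearest)
    return votes.most_common(1)[0][0]


def flag_suspect_records(records, verified=VERIFIED, k=3):
    """Return (record, suggested_species) pairs where the label and the vote disagree."""
    suspects = []
    for species, lat, lon in records:
        suggestion = predict_species(lat, lon, verified, k)
        if suggestion != species:
            suspects.append(((species, lat, lon), suggestion))
    return suspects


# Hypothetical unverified records pulled from a database.
unverified = [
    ("Bombus occidentalis", 34.02, -118.22),  # label disagrees with its neighbors
    ("Bombus vosnesenskii", 34.07, -118.27),  # label agrees with its neighbors
]
for record, suggestion in flag_suspect_records(unverified):
    print(record, "-> verified neighbors suggest", suggestion)
```

In this sketch a flagged record is only a candidate for review. A species turning up in an unexpected place may be a data-entry error, or it may be exactly the kind of range shift a researcher wants to find.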

“Bumblebees are a very different type of bee — they’re big, they’re fussy, they’re hairy — and they generally love more temperate areas. One of the things we’ve found is that changes of temperature in the last century seem to explain why some species are declining,” Guzman says. “We want to use life history traits to understand which of the species are benefitting the most from things like climate change, and which are being hindered the most. One of the things we’ve been finding in the case of the bumblebees is that not every species is declining.”

AI and science: Advanced computing paves way to more accurate, faster climate models

Los Angeles’ air is legendary, if for all the wrong reasons. For Silva, assistant professor of earth sciences at USC Dornsife College, it’s perfect for his research: the analysis of the atmosphere’s chemical composition.

“The chemical composition of clouds and Earth’s atmosphere matters in nearly every facet of air quality and climate change,” says Silva, also a member of the civil and environmental engineering department at the USC Viterbi School of Engineering. “With air quality, we’re looking at chemical compounds in the air that are bad for us to breathe. Meanwhile, climate change is partially caused by this imbalance between the amount of compounds entering the system versus the amount leaving — that’s what leads to warming.

“Our understanding of all these processes is imperfect for a lot of reasons: Either we don’t have enough data, we just simply don’t know or we might have a good idea, but when we enter that into the computer model it takes forever to run the code. We leverage machine learning to help us sift through the data that we have — which is sometimes an enormous amount of partially relevant data — and figure out what’s going on.”

Silva describes clouds as “some of the largest uncertainties in our understanding of the physical climate” due to their complex mixture of physics (wind velocity and direction) and chemistry (various molecules mixing in the atmosphere). Understanding their behavior is important because of the role they play in reflecting sunlight back into space and in global hydrological cycles. Correctly measuring their location, brightness and duration is essential to understanding and predicting that behavior.

Current climate models could provide highly detailed explanations for how clouds form, but an actual simulation “would take years to finish,” Silva said. To keep run times manageable, scientists instead rely on parameterization, a process that approximates the effects of these phenomena mathematically. However, what parameterization gains in efficiency, it loses in accuracy. Silva said using machine learning will keep the speed provided by parameterization without sacrificing accuracy.


“We think the limitations of parameterization might be one of the reasons why clouds and climate models are so uncertain,” he added. “What we’ll be doing in this project is using machine learning techniques to speed up that very slow process, giving us the great accuracy from the model without the associated computational cost. We hope to be able to make climate predictions better and faster, while also identifying interesting data to potentially motivate future study.”
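The general idea of standing in for a slow calculation with a fast learned approximation can be sketched in a few lines of Python. This is a toy illustration, not Silva's model: the "slow" function here is just a brute-force numerical integral, the emulator is a simple nearest-neighbor regression, and the names `slow_model`, `train_emulator` and `emulate` are assumptions invented for the example.

```python
import math


def slow_model(x, steps=20000):
    """Toy 'expensive' model: midpoint-rule integral of exp(-t^2) from 0 to x."""
    width = x / steps
    return sum(math.exp(-((i + 0.5) * width) ** 2) for i in range(steps)) * width


def train_emulator(inputs):
    """Run the slow model once per training input and store the (input, output) pairs."""
    return sorted((x, slow_model(x)) for x in inputs)


def emulate(x, training, k=2):
    """Predict with inverse-distance-weighted k-nearest-neighbor regression."""
    nearest = sorted(training, key=lambda pair: abs(pair[0] - x))[:k]
    for x_train, y_train in nearest:
        if x_train == x:  # exact match: reuse the stored answer
            return y_train
    weights = [1.0 / abs(x_train - x) for x_train, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Pay the cost of the slow model up front, at x = 0.0, 0.1, ..., 2.0.
TRAINING = train_emulator([i / 10 for i in range(21)])

# Afterward, new inputs are answered from the stored examples alone.
x = 1.234
print("slow model:", slow_model(x))
print("emulator:  ", emulate(x, TRAINING))
```

The trade-off in this sketch mirrors the one described above: the emulator answers almost instantly, and its accuracy depends on how well the training runs cover the inputs it is later asked about.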

And what he learns in L.A. will unfortunately take on greater relevance as the conditions of other cities begin to mimic those in Southern California.

“L.A. is similar to other cities in many ways. Most cities have high populations, a lot of cars and they’re not super walkable,” he says. “The chemistry that we learn about in Los Angeles is transferable to many other locations. What happens here is relevant to human health and air quality.

“This is not an issue that only affects people in places like China or India, which we typically think of as having very poor air quality; it’s a problem here too.”