Chris Mattmann, second from left, and students Eric Hachuel, Matheos Asfaw and Pablo Guidice, from left, conducted a data analysis of UFO sightings for an engineering course. (USC Photo/Caitlin Dawson)


Looking toward the sky, Trojans trace UFO sightings

Sci-fi film releases and weather patterns can affect the number of sightings, USC computer science students discover

July 13, 2018 Caitlin Dawson

Computer science master’s students sifted through thousands of documented UFO sightings to tackle one of the greatest unsolved mysteries: Are we alone in the universe?

Glowing objects near Mount Rainier, crashes in Roswell — for millennia, people have looked up to the skies and witnessed mysterious unidentified flying objects. But are these signs of alien civilizations or the result of earthly phenomena such as weather balloons, rocket launches or even the release of a popular sci-fi movie?

These are some of the questions that 60 USC computer science master’s students set out to answer last semester in “Content Detection and Analysis for Big Data,” an advanced content and data analysis class taught and developed by Adjunct Professor Chris Mattmann. A lifelong fan of all things space-related, Mattmann is a principal data scientist with NASA’s Jet Propulsion Laboratory in Pasadena and the director of USC’s Information Retrieval and Data Science group.

UFO sightings by students: any conclusive evidence of alien life?

Working in small teams, students in the class pored over thousands of documented sightings of UFOs to discover how factors such as movie releases and weather events influenced sighting patterns.

What did they discover? While they didn’t find any conclusive evidence of alien life visiting our planet, the students did unearth some interesting patterns hidden within the data.

“We found sci-fi movie releases correlated with an increase in sightings,” said Matheos Asfaw, who took the course in the spring. “We also found that events like thunderstorms caused an uptick in reports.”
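One way to check for the kind of relationship Asfaw describes is to compare two monthly series with a correlation coefficient. The sketch below is only an illustration, not the team's actual method: the function name `pearson` and the numbers in both lists are invented placeholders, not figures from the students' data.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented placeholder values for six months (NOT the students' data):
# number of sci-fi film releases and number of reported sightings per month.
monthly_releases = [0, 1, 0, 2, 1, 3]
monthly_sightings = [40, 55, 42, 70, 52, 90]

r = pearson(monthly_releases, monthly_sightings)
print(f"Pearson r = {r:.2f}")
```

A value of r near 1 would suggest the two series rise and fall together, while a value near 0 would suggest no linear relationship.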

Asfaw, who graduated in May and has since secured a full-time role with the Santa Monica software company Cornerstone OnDemand, was a member of Team 8, along with Eric Hachuel, Pablo Guidice, Bruno Mazetti and Teague Ashcraft. The team’s visualization and analysis of its research are available online.

Analyzing a public database of 60,000 sightings

Although the topic of UFOs may raise a smile, make no mistake — exploring a mystery of this magnitude requires rigorous analyses involving massive amounts of unstructured data, including dates, locations and descriptions of sightings.

In fact, each team analyzed a public database of more than 60,000 sightings, which they enriched with data from other sources during the semester. Finally, they used data visualization techniques to evaluate the data analysis tools and communicate their insights. Along the way, they learned important lessons about using data to paint a clearer picture of a complex problem.

“I came to USC for my master’s because I wanted to be a computer scientist,” said Hachuel, who earned his undergraduate degree in industrial engineering and is working at SpaceX this summer as a software flight reliability intern.

“I really enjoyed this class because it taught us the process a data scientist would follow in real life. It’s a dynamic process — it’s not straightforward and it can be frustrating. You have to really dig to find the answers, but that’s how you learn.”

Close encounters of the data kind

As a PhD student in computer science at USC, Mattmann, who graduated in 2007, relished the opportunity to work on projects with real-world implications. It was during this time that he co-developed the Apache Tika software used to extract data in the Panama Papers, exposing how wealthy individuals exploited offshore tax regimes.

Now he vows to bring the same practical experience to his own students by focusing on a different real-world topic with every course. In the past, his students have wrangled data about polar ice and the sale of illegal weapons online, both of which were real-life research projects Mattmann was working on at the time.

Most importantly, no matter their skill level on the first day, Mattmann hopes his students will leave the class prepared to tackle even the most unwieldy data.

“In this class, I’m hoping to expose students to real-world big data end-to-end: how to collect it, what to do with it and how to infer knowledge ethically and responsibly,” Mattmann said.

In addition to the UFO project, students participated in weekly class presentations on topics ranging from the rise and fall of bitcoin to the Cambridge Analytica data leak, including the ethical considerations of data collection.

“I want my students to do something new and relevant every time,” Mattmann said. “We have students from a broad range of backgrounds with a variety of different undergraduate degrees and many come in knowing very little about data analysis. But they leave well on their way to becoming masters of the tools.”

Data detectives

To extract as much meaningful information to analyze as possible, students scraped data from a UFO sighting website and converted PDF files of archival sightings into a searchable format that could be added to the database. They also used the machine-learning software co-created by Mattmann to analyze images of UFO sightings and automatically group similar images.

Through this type of project-based learning, students learn to solve problems the same way a computer scientist would in the real world, including facing the fundamental challenges of dealing with large amounts of raw data.

“We learned the importance of cleaning the data early on in the process,” said Guidice, a member of Team 8.

“For example, we scraped data from an online forum about UFO sightings, but people were also posting ads in the same forum. If we hadn’t cleaned the data first, it could have led us to the wrong conclusions.”
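The cleaning step Guidice describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the team's code: the keyword list in `AD_PATTERN`, the helper names `looks_like_ad` and `clean_posts`, and the sample posts are all hypothetical. It drops posts that look like advertisements and removes exact duplicates before any analysis happens.

```python
import re

# Hypothetical keyword heuristic for spotting ads; a real project would tune
# this list against the actual forum content.
AD_PATTERN = re.compile(
    r"\b(buy now|for sale|discount|free shipping|click here)\b",
    re.IGNORECASE,
)


def looks_like_ad(text):
    """Return True if the post contains one of the assumed ad phrases."""
    return bool(AD_PATTERN.search(text))


def clean_posts(posts):
    """Normalize whitespace, drop empty posts, ads and duplicate posts."""
    seen = set()
    cleaned = []
    for post in posts:
        normalized = " ".join(post.split())  # collapse runs of whitespace
        if not normalized or looks_like_ad(normalized):
            continue
        key = normalized.lower()
        if key in seen:  # skip posts already kept (case-insensitive match)
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


# Invented sample posts for illustration only.
sample_posts = [
    "Bright orange light hovering over the lake, then it vanished.",
    "BUY NOW: night-vision binoculars, free shipping!",
    "  Bright orange light hovering over the lake,   then it vanished. ",
    "Three silent triangles moving north around 10 pm.",
]

cleaned = clean_posts(sample_posts)
for post in cleaned:
    print(post)
```

Filtering before counting matters here because every ad left in the data would otherwise be tallied as a sighting report.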

The struggle was all part of the plan for Mattmann. By confronting this real-world data messiness, he aims to give students a taste of life as professional data scientists and prepare them to tackle data projects outside the classroom.

“In their future jobs, they may get asked to find similar images in a huge data set or combine data sets with different values; after this class they’re ready to do that,” said Mattmann, adding with a smile: “And who knows, maybe someday they’ll get asked to investigate UFOs.”