GeoSCIFramework: Scalable Real-time Streaming Analytics and Machine Learning for Geoscience and Hazards Research is developing a real-time processing system that ingests a large, heterogeneous mix of sensor observations and uses machine learning to detect natural hazard events automatically, as they occur. A four-organization collaboration (UNAVCO, University of Colorado, University of Oregon, and Rutgers University) is building a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research. This work will support rapid analysis and understanding of data associated with hazardous events such as earthquakes, volcanic eruptions, and tsunamis.
Science Challenges
Earthquakes, tsunamis, and volcanoes pose natural hazards on nearly unimaginable scales and compel geoscientists to find new ways to better understand the processes that cause them and to mitigate their effects on populations and the built environment.
The sheer volume and complexity of the data streams produced by the instruments that monitor these hazards, coupled with the need to model, analyze, and assess hazards within moments, make hazard early warning from geophysical data a Big Data problem. This project unites computer scientists and geoscientists to develop a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research.
Technical Approach
To meet this challenge, computer scientists and geoscientists are jointly building a general-purpose data framework for real-time streaming analytics and machine learning.
It focuses on aggregating and integrating a large number of data streams into a coherent system that supports analysis of those streams in real time. The framework will offer machine-learning-based tools designed to detect the signals of events, such as earthquakes and tsunamis, that may only be detectable when a broad selection of observational inputs is examined together.
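As a hedged illustration of why combining many streams can reveal events that a single sensor might miss, here is a minimal sketch of a classical multi-station coincidence trigger. It is not the project's method: it swaps in a fixed short-term/long-term average (STA/LTA) ratio threshold where the project proposes machine learning, and the function names, window lengths, and threshold are hypothetical choices for illustration only. Each station's onsets are found independently, and a network-level detection is declared only when enough distinct stations trigger within a short window.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def sta_lta(samples: Sequence[float], sta_len: int, lta_len: int) -> List[float]:
    """Trailing short-term/long-term average ratio of squared amplitude.

    Hypothetical helper. Returns 0.0 until the long-term window has filled.
    """
    energy = [x * x for x in samples]
    ratios: List[float] = []
    sta_sum = lta_sum = 0.0
    for i, e in enumerate(energy):
        sta_sum += e
        lta_sum += e
        if i >= sta_len:
            sta_sum -= energy[i - sta_len]  # drop the sample leaving the STA window
        if i >= lta_len:
            lta_sum -= energy[i - lta_len]  # drop the sample leaving the LTA window
        if i + 1 < lta_len:
            ratios.append(0.0)  # not enough history yet
            continue
        lta = lta_sum / lta_len
        ratios.append((sta_sum / sta_len) / lta if lta > 0 else 0.0)
    return ratios


def coincidence_trigger(
    streams: Dict[str, Sequence[float]],
    sta_len: int,
    lta_len: int,
    threshold: float,
    min_stations: int,
    window: int,
) -> List[int]:
    """Hypothetical network detector: sample indices at which at least
    `min_stations` distinct stations triggered within the preceding
    `window` samples (inclusive)."""
    onsets: List[Tuple[int, str]] = []
    for name, samples in streams.items():
        above = False
        for i, r in enumerate(sta_lta(samples, sta_len, lta_len)):
            if r >= threshold and not above:
                onsets.append((i, name))  # record only the rising edge
            above = r >= threshold
    onsets.sort()  # merge all stations' onsets into one time-ordered stream

    detections: List[int] = []
    recent: deque = deque()
    for i, name in onsets:
        recent.append((i, name))
        while recent[0][0] < i - window:
            recent.popleft()  # forget onsets that fell out of the window
        if len({n for _, n in recent}) >= min_stations:
            detections.append(i)
            recent.clear()  # avoid re-declaring the same event
    return detections
```

For example, with four synthetic stations of which three show a short burst a few samples apart, `coincidence_trigger(streams, 5, 50, 3.0, 3, 10)` reports one detection, at the sample where the third station's onset arrives; requiring four stations, or narrowing the window so the onsets no longer overlap, yields no detection. Requiring agreement across stations suppresses single-sensor glitches; a learned model, as the project proposes, would replace the fixed threshold.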
The architecture builds a fast data pipeline from a set of open-source components that make big data applications viable and easier to develop. Machine learning (ML) algorithms will be researched and applied to the tsunami and earthquake use cases. Integral to the project are development, documentation, and training through collaborative online resources such as GitLab and Jupyter Notebooks, along with the use of NSF XSEDE resources to make larger datasets and computational resources more widely available.
Science Drivers
Our project focuses on use cases in the Cascadia subduction zone and Yellowstone: these regions match the science team's expertise and are where EarthScope and OOI have their greatest concentrations of instruments. Data sources for the project draw primarily upon the 1500+ sensors of the EarthScope networks currently managed by UNAVCO and the Incorporated Research Institutions for Seismology (IRIS), as well as the Ocean Observatories Initiative (OOI) cabled array data managed by Rutgers University.
Use Case: Real-time Short-term Events. The fundamental open science question for earthquake and tsunami hazards is one of determinism: at what point within a rupture process lasting minutes does a very large earthquake become distinguishable from a merely large one? The answer matters for earthquake physics and rupture behavior, and it reflects the material properties, the state of stress, and the overall dynamics. For earthquake early warning (EEW) and tsunami early warning (TWW), it defines the earliest possible time at which an event and its resulting hazards can be characterized. The focus of this study is earthquake and tsunami warning in the Cascadia subduction zone.
Use Case: Long-term Events. Natural catastrophes occur at a variety of spatial and temporal scales. In particular, solid earth hazards such as large earthquakes and volcanic eruptions often have very long interevent times, which makes their behavior difficult to forecast. This part of the project pulls in multiple data sets to address long-, intermediate-, and short-term forecasting of these types of events. Test sites include the Yellowstone magmatic center and the Hawaiian island volcanoes.
Benefits to Researchers
GeoSCIFramework will give scientists and researchers the capability to recognize almost immediately that a tsunamigenic earthquake has occurred, or to identify longer-term, subtle motions of the earth's surface on previously unrealized scales. Because its machine learning algorithms and spatio-temporal analyses are trained in this multi-data environment and informed by physical models, the project's approach extends beyond the detection and characterization of earthquakes to the onset of other geophysical signals, such as slow-slip events or magmatic intrusion, expanding the potential for new scientific discoveries.
Participating Institutions
UNAVCO
Dr. Charles Meertens, Principal Investigator, Director of Geodetic Data Services
Dr. David Mencin, Co-Principal Investigator, Project Manager, Geodetic Data Services
Dr. Scott Baker, Co-Principal Investigator, Software Engineer III, Geodetic Data Services
Dr. Kathleen Hodgkinson, Data Engineer III, Geodetic Data Services
Shelley Olds, Science Education Specialist, Education and Community Engagement
Rutgers University
Dr. Ivan Rodero, Principal Investigator, Associate Director for Technical Operations, Associate Research Professor, Rutgers Discovery Informatics Institute
J.J. Villalobos, Co-Principal Investigator, Assistant Director of Research Computing and Cybersecurity, Rutgers Discovery Informatics Institute
Dr. Anthony Simonet, Post-Doctoral Research Associate, Rutgers Discovery Informatics Institute
University of Colorado at Boulder
Dr. Kristy Tiampo, Principal Investigator, Director of Earth Science and Observation Center (ESOC), Department of Geological Sciences and Cooperative Institute for Research in Environmental Sciences
Brie Corsa, Graduate Research Assistant, Department of Geological Sciences
University of Oregon, Eugene
Dr. Diego Melgar, Principal Investigator, Assistant Professor, Department of Earth Sciences
This 4-year award by the NSF Office of Advanced Cyberinfrastructure is jointly supported by the Cross-Cutting Program and Division of Earth Sciences within the NSF Directorate for Geosciences, the Big Data Science and Engineering Program within the Directorate for Computer and Information Science and Engineering, and the EarthCube Program jointly sponsored by the NSF Directorate for Geosciences and the Office of Advanced Cyberinfrastructure.