

GeoSCIFramework: Scalable Real-time Streaming Analytics and Machine Learning for Geoscience and Hazards Research develops a real-time processing system that handles a large mix of sensor observations and uses machine learning to detect natural hazard events automatically, as they occur. A four-organization collaboration (UNAVCO, University of Colorado, University of Oregon, and Rutgers University) is developing a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research. This work will support rapid analysis and understanding of data associated with hazardous events (earthquakes, volcanic eruptions, tsunamis).

Science Challenges

Earthquakes, tsunamis and volcanoes pose natural hazards on nearly unimaginable scales and compel geoscientists to find new ways to better understand the processes that cause them and to mitigate their effects on populations and the built environment.

The sheer volume and complexity of the data streamed from geophysical sensor networks, coupled with the need to model, analyze and assess hazards within moments, make hazard early warning a Big Data problem. This project will unite computer scientists and geoscientists to develop a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research.

GeoSCIFramework system and process architecture

The framework is composed of four layers through which data is acquired, analyzed and presented to users (from bottom to top):

  1) The Data Ingest/Stream Processing layer is the point of entry for time series streamed from different sensor networks.
  2) The Stream/Time Series Data Store layer stores raw time series and curates them into a format suitable for analytics.
  3) The Data Search/Analysis/Visualization layer manages computing resources and the analytics and machine learning framework.
  4) The GeoSCIFramework Portal is the user-facing layer.
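The bottom-to-top data flow through the four layers can be illustrated with a minimal, in-memory Python sketch. Everything in it is hypothetical: the `Sample` record, the `station,time,value` text format, the station names and the threshold-based `detect_jump` check (a stand-in for the project's machine learning framework) are illustrative assumptions, not the GeoSCIFramework implementation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Sample:
    station: str  # hypothetical station identifier
    time: float   # seconds since a reference epoch
    value: float  # e.g. a displacement or velocity reading


def ingest(raw_lines: Iterable[str]) -> Iterator[Sample]:
    """Layer 1, Data Ingest/Stream Processing: parse hypothetical 'station,time,value' records."""
    for line in raw_lines:
        station, t, v = line.strip().split(",")
        yield Sample(station, float(t), float(v))


class TimeSeriesStore:
    """Layer 2, Stream/Time Series Data Store: raw samples plus a per-station, time-sorted view."""

    def __init__(self) -> None:
        self.raw: list[Sample] = []
        self._by_station: dict[str, list[Sample]] = defaultdict(list)

    def append(self, sample: Sample) -> None:
        self.raw.append(sample)                          # raw copy, kept as received
        self._by_station[sample.station].append(sample)  # curated per-station index

    def series(self, station: str) -> list[Sample]:
        return sorted(self._by_station[station], key=lambda s: s.time)


def detect_jump(store: TimeSeriesStore, station: str, threshold: float) -> bool:
    """Layer 3, Search/Analysis: flag a station whose latest value departs from its earlier mean."""
    values = [s.value for s in store.series(station)]
    return len(values) > 1 and abs(values[-1] - mean(values[:-1])) > threshold


def portal_view(store: TimeSeriesStore, stations: Iterable[str], threshold: float) -> dict[str, bool]:
    """Layer 4, Portal: the per-station status a user-facing page might display."""
    return {st: detect_jump(store, st, threshold) for st in stations}


store = TimeSeriesStore()
for sample in ingest(["P001,0,1.0", "P001,1,1.1", "P001,2,5.0", "P002,0,2.0", "P002,1,2.1"]):
    store.append(sample)
print(portal_view(store, ["P001", "P002"], threshold=1.0))  # → {'P001': True, 'P002': False}
```

Because each layer is a separate function or class, one layer could be replaced (for example, a persistent time series database in place of `TimeSeriesStore`) without touching the others.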

Technical Approach

This project uses a collaboration between computer scientists and geoscientists to develop a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research.

It focuses on the aggregation and integration of a large number of data streams into a coherent system that supports analysis of the data streams in real-time. The framework will offer machine-learning-based tools designed to detect signals of events, such as earthquakes and tsunamis, that might only be detectable when looking at a broad selection of observational inputs.
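One simple, non-machine-learning way to see why a broad selection of inputs matters is a network coincidence trigger: a trigger at a single station may be noise, but near-simultaneous triggers at several stations suggest a real event. The sketch below first aggregates per-station streams into one time-ordered stream and then applies such a trigger. It is an illustrative stand-in for the project's ML-based detectors, assuming that per-station triggers (`Pick` records) already exist upstream; the names, the window length and the station threshold are hypothetical.

```python
from __future__ import annotations

import heapq
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Pick(NamedTuple):
    time: float   # trigger time, seconds since a reference epoch
    station: str  # hypothetical station identifier


def merge_streams(*streams: Iterable[Pick]) -> Iterator[Pick]:
    """Aggregate per-station streams (each already time-ordered) into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda p: p.time)


def coincidence_events(picks: Iterable[Pick], window: float, min_stations: int) -> list[float]:
    """Report an event when at least `min_stations` distinct stations trigger within `window` seconds."""
    recent: deque[Pick] = deque()
    events: list[float] = []
    last_event = float("-inf")
    for pick in picks:
        recent.append(pick)
        # Drop picks that fall outside the window ending at the current pick.
        while pick.time - recent[0].time > window:
            recent.popleft()
        n_stations = len({p.station for p in recent})
        # Report at most one event per window so a single rupture is not reported repeatedly.
        if n_stations >= min_stations and pick.time - last_event > window:
            events.append(pick.time)
            last_event = pick.time
    return events


station_a = [Pick(0.0, "A"), Pick(10.0, "A")]
station_b = [Pick(0.5, "B"), Pick(20.0, "B")]
station_c = [Pick(0.8, "C")]
merged = list(merge_streams(station_a, station_b, station_c))
print(coincidence_events(merged, window=2.0, min_stations=3))  # → [0.8]
```

The isolated triggers at 10.0 s and 20.0 s are ignored because no other station confirms them within the window.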

The architecture builds a fast data pipeline from a set of open source components that make big data applications viable and easier to develop. Machine learning (ML) algorithms will be researched and applied to the tsunami and earthquake use cases. Integral to the project are development, documentation and training through collaborative online resources such as GitLab and Jupyter Notebooks, along with the use of NSF XSEDE resources to make larger datasets and computational resources more widely available.

Science Drivers

Our project focuses on use cases in the Cascadia subduction zone and Yellowstone, locations that match the expertise of the science team and where EarthScope and the Ocean Observatories Initiative (OOI) have their greatest concentrations of instruments. Data sources for the project draw primarily on the 1500+ sensors of the EarthScope networks currently managed by UNAVCO and the Incorporated Research Institutions for Seismology (IRIS), as well as the OOI cabled array data managed by Rutgers University.

Use Case: Real-time Short-term Events. The fundamental open science question for earthquake and tsunami hazards is one of determinism: when, within a rupture process lasting minutes, does a very large earthquake become different from one that is only large? The answer matters for the physics of earthquakes and their rupture behavior, and is indicative of the material properties, the state of stress, and the overall dynamics. For earthquake early warning (EEW) and tsunami warning (TWW), it defines the earliest possible time at which an event and its resulting hazards can be characterized. The focus of this study is earthquake and tsunami warning in the Cascadia subduction zone.

Use Case: Long-term Events. Natural catastrophes occur at a variety of spatial and temporal scales. In particular, solid earth hazards, such as large earthquakes and volcanic eruptions, often have very long interevent times, which makes their behavior difficult to forecast. This part of the project pulls in multiple data sets to address the long-, intermediate- and short-term forecasting of these types of events. Test sites include the Yellowstone magmatic center and the Hawaiian island volcanoes.

Benefits to Researchers

GeoSCIFramework will provide scientists and researchers with the capability to recognize instantly that a tsunamigenic earthquake has occurred, or to identify subtle, longer-term motions of the Earth's surface on previously unrealized scales. Because it is trained in this multi-data environment and informed by physical models, machine learning algorithms and spatio-temporal analyses, the project's approach extends not only to the detection and characterization of earthquakes but also to the onset of other geophysical signals, such as slow-slip events or magmatic intrusion, expanding the potential for new scientific discoveries.

Participating Institutions

UNAVCO

  • Dr. Charles Meertens, Principal Investigator, Director of Geodetic Data Services
  • Dr. David Mencin, Co-Principal Investigator, Project Manager, Geodetic Data Services
  • Dr. Scott Baker, Co-Principal Investigator, Software Engineer III, Geodetic Data Services
  • Dr. Kathleen Hodgkinson, Data Engineer III, Geodetic Data Services, UNAVCO
  • Shelley Olds, Science Education Specialist, Education and Community Engagement, UNAVCO

Rutgers University

  • Dr. Ivan Rodero, Principal Investigator, Associate Director for Technical Operations, Associate Research Professor, Rutgers Discovery Informatics Institute
  • J.J. Villalobos, Co-Principal Investigator, Assistant Director of Research Computing and Cybersecurity, Rutgers Discovery Informatics Institute
  • Dr. Anthony Simonet, Post-Doctoral Research Associate, Rutgers Discovery Informatics Institute

University of Colorado at Boulder

  • Dr. Kristy Tiampo, Principal Investigator, Director of Earth Science and Observation Center (ESOC), Department of Geological Sciences and Cooperative Institute for Research in Environmental Sciences
  • Brie Corsa, Graduate Research Assistant, Department of Geological Sciences

University of Oregon, Eugene

  • Dr. Diego Melgar, Principal Investigator, Assistant Professor, Department of Earth Sciences

Award Information

This 4-year award by the NSF Office of Advanced Cyberinfrastructure is jointly supported by the Cross-Cutting Program and Division of Earth Sciences within the NSF Directorate for Geosciences, the Big Data Science and Engineering Program within the Directorate for Computer and Information Science and Engineering, and the EarthCube Program jointly sponsored by the NSF Directorate for Geosciences and the Office of Advanced Cyberinfrastructure.

Publications and Presentations

Corsa, B.D., Tiampo, K.F., Kelevitz, K., Baker, S., Meertens, C. (2019). Automated processing, streaming, and integration of InSAR time-series and GNSS data; as part of the collaborative GeoSciFramework research project. Poster presented at the American Geophysical Union 2019 Fall Meeting, Session G13C-0574, December 9, 2019, San Francisco, CA.


The main goal of the GeoSciFramework project is to improve intermediate- to short-term forecasts of catastrophic natural hazard events, allowing researchers to detect instantly when an event has occurred and to reveal subtler, long-term motions of Earth's surface at unprecedented scales. This will be accomplished by training machine learning algorithms to recognize patterns across various data signals during geophysical events. In particular, automated Interferometric Synthetic Aperture Radar (InSAR) processing can be combined with Global Navigation Satellite System (GNSS) data to obtain 3-D ground surface motions (Samsonov and Tiampo, 2006). This presentation concentrates on building an automated system to generate comprehensive InSAR time series over Hawaii and Yellowstone National Park, which will later be combined with additional monitoring data.
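Why the combination yields 3-D motion can be shown for a single ground point: each InSAR line-of-sight (LOS) measurement is the projection of the 3-D displacement onto the LOS unit vector, and GNSS observes the horizontal components directly, so stacking both gives an overdetermined linear system that least squares can solve. This is a simplified, unweighted sketch with synthetic numbers and hypothetical viewing geometries; it does not reproduce the Samsonov and Tiampo (2006) approach or the project's automated processing.

```python
import numpy as np


def combine_insar_gnss(los_vectors, los_obs, gnss_en):
    """Least-squares estimate of (east, north, up) displacement at a single point.

    los_vectors: (k, 3) unit vectors (east, north, up) of each InSAR line of sight
    los_obs:     (k,) InSAR line-of-sight displacements
    gnss_en:     (east, north) displacement from a co-located GNSS station, same units
    """
    G = np.vstack([
        np.asarray(los_vectors, dtype=float),
        [[1.0, 0.0, 0.0],   # GNSS east component observes u_east directly
         [0.0, 1.0, 0.0]],  # GNSS north component observes u_north directly
    ])
    d = np.concatenate([np.asarray(los_obs, dtype=float), np.asarray(gnss_en, dtype=float)])
    u, *_ = np.linalg.lstsq(G, d, rcond=None)
    return u


# Hypothetical ascending and descending viewing geometries (not real satellite parameters).
los = np.array([[-0.62, -0.11, 0.78],
                [0.62, -0.11, 0.78]])
true_u = np.array([0.010, -0.020, 0.050])  # meters, synthetic
insar = los @ true_u                       # noise-free synthetic LOS observations
gnss = true_u[:2]                          # synthetic GNSS horizontal components
print(combine_insar_gnss(los, insar, gnss))
```

With noise-free synthetic inputs the estimate reproduces `true_u`. A real combination would typically weight each observation by its uncertainty and interpolate sparse GNSS stations onto the InSAR grid; both steps are omitted here.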

Fauvel, K., Balouek-Thomert, D., Melgar, D., Silva, P., Simonet, A., Antoniu, G., Costan, A., Masson, V., Parashar, M., Rodero, I., Termier, A. (2020). A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01), 403–411. DOI: 10.1609/aaai.v34i01.5376 Retrieved from https://aaai.org/ojs/index.php/AAAI/article/view/5376 08 Jun. 2020.

Figure 1: the Distributed Multi-Sensor Earthquake Early Warning Algorithm (DMSEEW).

Recipient of the "Outstanding Paper Award: Special Track on AI for Social Impact."
Abstract: Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective at identifying medium earthquakes due to their propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems. In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given as input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.
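The abstract describes DMSEEW as a stacking ensemble: separate models look at each sensor type, and a second-level model learns how to combine their outputs. The paper's own method is more elaborate and is not reproduced here. The following is a generic, much-simplified Python illustration of stacking under stated assumptions: single-feature toy data, hypothetical class labels, nearest-centroid base learners and a lookup-table meta learner, all chosen for brevity. The meta learner is fit on samples the base learners did not see, a common way to keep it from learning the base learners' training-set fit.

```python
from collections import Counter, defaultdict

import numpy as np

NORMAL, MEDIUM, LARGE = 0, 1, 2  # hypothetical severity classes


class NearestCentroid:
    """Base learner: predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dist.argmin(axis=1)]  # ties go to the lower class label


class LookupMetaLearner:
    """Meta learner: maps each combination of base predictions to its most frequent true label."""

    def fit(self, base_preds, y):
        counts = defaultdict(Counter)
        for key, label in zip(map(tuple, base_preds), y):
            counts[key][label] += 1
        self.table_ = {k: c.most_common(1)[0][0] for k, c in counts.items()}
        self.default_ = Counter(y).most_common(1)[0][0]  # fallback for unseen combinations
        return self

    def predict(self, base_preds):
        return np.array([self.table_.get(tuple(k), self.default_) for k in base_preds])


def fit_stack(X_seis, X_gps, y, split):
    """Fit base learners on samples [:split] and the meta learner on held-out samples [split:]."""
    seis = NearestCentroid().fit(X_seis[:split], y[:split])
    gps = NearestCentroid().fit(X_gps[:split], y[:split])
    held_out = np.column_stack([seis.predict(X_seis[split:]), gps.predict(X_gps[split:])])
    meta = LookupMetaLearner().fit(held_out, y[split:])
    return seis, gps, meta


def predict_stack(models, X_seis, X_gps):
    seis, gps, meta = models
    return meta.predict(np.column_stack([seis.predict(X_seis), gps.predict(X_gps)]))


# Toy data: the seismometer feature saturates (medium and large look alike),
# while the GPS feature stays in the noise for medium events (normal and medium look alike).
y = np.array([NORMAL, MEDIUM, LARGE] * 4)
X_seis = np.array([[0.1], [1.0], [1.0]] * 4)
X_gps = np.array([[0.0], [0.0], [5.0]] * 4)

models = fit_stack(X_seis, X_gps, y, split=6)
print("seismometer only:", models[0].predict(X_seis))
print("stacked:         ", predict_stack(models, X_seis, X_gps))
```

On this toy data the seismometer-only model labels large events as medium, while the stacked model recovers all three classes, a miniature version of the sensor complementarity the abstract describes; real features, classifiers and validation would differ.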

Meertens, C., Mencin, D., Baker, S., Hodgkinson, K., Olds, S., Melgar, D., Rodero, I., Simonet, A., Villalobos, J.J., Tiampo, K., Corsa, B. (2019). When seconds matter – Big Data real-time streaming analytics and machine learning for geoscience and hazards research. Presentation at the October 2019 GeoSciFramework Annual Project Meeting, Alexandria, VA.



Last modified: 2020-06-17  16:52:19  America/Denver