Machine Learning Powered Earthquake Hazard Prediction

Based on a seminar by Dr. Daniel Trugman https://youtu.be/3MDzndzAzbQ

Background

In his seminar, Dr. Trugman explores two areas where the application of machine learning (ML) to earthquake seismology is being actively studied: ground motion prediction and earthquake early warning (EEW) systems. Traditionally, ground motion is predicted using a linear function of earthquake magnitude and distance. This approach assumes that an earthquake's size can be captured by a single static measure, its magnitude. In reality, earthquakes of the same magnitude can have significantly different source properties, which correspond to differences in the observed displacement pulse. Dr. Trugman seeks to quantify how dynamic stress drop, one of these source properties, correlates with peak ground acceleration (PGA). He asks whether the between-event variability that is missing from traditional ground motion prediction equations (GMPEs) can be modelled using ML. ML can also be applied to a major open question in seismology: is it possible to determine, solely by examining the early rupture process, whether an earthquake will be large? Dr. Trugman asks whether weak determinism, as observed by Goldberg et al. (2018), can be assumed in EEW algorithms.

Summary: Random Forest GMPE

Trugman and Shearer's analysis of the influence of stress drop on the PGA of earthquakes involves three components: stress drop estimates for the events studied, PGA measurements at the stations recording those events, and a reference GMPE to model the expected variability of PGA values. Dynamic stress drop values are estimated using spectral decomposition. P-wave spectra are used to minimize the effects of attenuation, and the vertical component is used to reduce noise within the waveform. Spectral decomposition works well on datasets in which every station records many earthquakes and every earthquake is recorded by many stations. The authors therefore work with a large dataset of moderate-magnitude earthquakes occurring in the San Francisco Bay Area between 2002 and 2016. Since this region has frequent earthquakes and high station density, the observational constraints on source, path, and site effects can be used to derive a stress drop value from the spectra.

PGA measurements are obtained for each earthquake by considering all stations within 100 km of its source. The horizontal-component records from those stations are band-pass filtered and passed through quality control. The PGA is then computed by taking the geometric mean of the remaining records.

Since existing GMPEs have been derived from records of earthquakes in regions or magnitude ranges outside of those in question, a novel GMPE must be developed. Using an ML technique called Random Forest (RF) allows the GMPE to account for interactions between source, site, and path effects and to avoid overfitting to observational noise in the input data, addressing two limitations of the linear regressions traditionally used. The base of the RF regression model accounts for the nonlinear effects of magnitude and distance. The model then learns the between-event residual terms, which represent the impact of the effects that are not accounted for in the base, namely the contributions of the individual events and stations. The model outputs ΔPGA measurements, each of which represents the deviation of a specific event's ground motion from the expected ground motion for a typical event with the same distance, magnitude, and station. Stress drop Δσ is strongly correlated with the ΔPGA residuals generated by the model (Figure 1). After correcting for magnitude and source depth, the Δσ-ΔPGA correlation is still strong, suggesting that the between-event variability in PGA has a physical basis in stress drop (Trugman & Shearer, 2018).
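
The two numerical steps described above, averaging PGA records with a geometric mean and expressing each event's ground motion as a deviation from the expected value, can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the expected values are assumed to come from some already-fitted reference model (an RF in the paper, but any model here), and averaging the residuals per event is an assumed simplification of how ΔPGA is obtained.

```python
import numpy as np


def geometric_mean_pga(pga_values):
    """Geometric mean of the positive PGA values that survive quality control."""
    pga = np.asarray(pga_values, dtype=float)
    if pga.size == 0 or np.any(pga <= 0):
        raise ValueError("PGA values must be non-empty and positive")
    return float(np.exp(np.mean(np.log(pga))))


def between_event_residuals(observed_log_pga, expected_log_pga, event_ids):
    """Mean of (observed - expected) log PGA over all records of each event.

    expected_log_pga is assumed to come from a previously fitted reference
    GMPE; this sketch does not fit one.
    """
    residuals = (np.asarray(observed_log_pga, dtype=float)
                 - np.asarray(expected_log_pga, dtype=float))
    event_ids = np.asarray(event_ids)
    return {
        ev.item(): float(residuals[event_ids == ev].mean())
        for ev in np.unique(event_ids)
    }
```

A positive value for an event means its recorded ground motion was stronger than the reference model expects for that magnitude and distance, which is the kind of quantity the study compares against stress drop.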

Summary: Rupture Determinism and EEW

Dr. Trugman focuses on the real-time magnitude estimation step of ShakeAlert, an EEW system developed for use in California. ShakeAlert uses a deterministic rupture model based on the assumption that the log of peak ground displacement (PGD) is linear with respect to magnitude and that this relation is time-independent. To quantify the accuracy of these deterministic magnitude estimates, the study analyses a dataset of over 140,000 vertical-component waveforms collected between 1997 and 2018 from earthquakes of magnitude greater than 4.5 around Japan. Starting at P-wave onset, the PGD is computed continuously. The waveforms are double integrated to convert the recorded acceleration into displacement. Because PGD depends to first order on the distance to the hypocenter, a distance correction is applied, normalizing the measurements to an epicentral distance of 10 km. While the rupture is still growing, these PGD values exhibit steep power law growth regardless of the magnitude of the earthquake from which they are measured (Figure 2). This implies that earthquakes of different magnitudes are likely indistinguishable by PGD during rupture (Trugman et al., 2019).
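
As a rough illustration of the continuous PGD measurement described above, the sketch below double integrates an acceleration trace and tracks the running peak of the absolute displacement. It is a simplified stand-in rather than the processing used in the study: the function name is hypothetical, the integration is a plain rectangular sum, and the distance correction and any filtering needed to control integration drift are omitted.

```python
import numpy as np


def running_pgd(acceleration, dt):
    """Running peak ground displacement from an acceleration trace.

    acceleration: samples starting at P-wave onset (assumed already trimmed).
    dt: sampling interval in seconds.
    """
    acc = np.asarray(acceleration, dtype=float)
    velocity = np.cumsum(acc) * dt            # first integration (rectangular rule)
    displacement = np.cumsum(velocity) * dt   # second integration
    # Largest absolute displacement seen up to each sample.
    return np.maximum.accumulate(np.abs(displacement))
```

Because the output never decreases, it can be sampled at any time after onset to ask how much displacement has been observed so far, which is the quantity whose growth the study compares across magnitudes.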

Although not detailed in the paper, Dr. Trugman presents research into alternative statistical features of waveforms that could be used to estimate magnitude. If distinguishing between large and small earthquakes during rupture is indeed possible, it would imply that some feature of the source properties determines the earthquake’s final magnitude, and thus that rupture determinism is valid for that feature (Trugman, 2018). In the case of PGD, however, the results in the paper suggest that rupture determinism cannot be assumed to be valid.

Critique

The strength of the RF GMPE framework lies in its extensibility to new datasets. In the future, it can be used to predict other measures of ground motion and to quantify the importance of rupture dynamics and finite fault effects (Kong et al., 2018). The RF approach also reduces the overfitting to input data observed in traditional GMPEs. This indicates potential for the model to learn about ground motion more generally, and thus to make accurate predictions for data that differ from the data on which it was trained. Such a general model would be useful for data-limited situations such as large magnitudes, close source-receiver distances, and less seismically active regions like Cascadia.

The conclusion that PGD cannot determine magnitude at short timescales has alarming implications for the accuracy of EEW systems. However, the paper makes no use of ML, even though its large, complex dataset is well suited to it. ML ensemble models could avoid the saturation described in the paper by combining different features. ML could also be used to extract features that may be able to estimate magnitude in place of PGD (Trugman, 2018). These approaches could allow EEW systems to estimate the final magnitude before the rupture process has completed (Cuéllar et al., 2018).

References

Cuéllar, A., Suárez, G., & Espinosa-Aranda, J. M. (2018). A Fast Earthquake Early Warning Algorithm Based on the First 3 s of the P-Wave Coda. Bulletin of the Seismological Society of America, 108(4), 2068-2079. doi:10.1785/0120180079

Goldberg, D. E., Melgar, D., Bock, Y., & Allen, R. M. (2018). Geodetic Observations of Weak Determinism in Rupture Evolution of Large Earthquakes. Journal of Geophysical Research: Solid Earth, 123(11), 9950-9962. doi:10.1029/2018jb015962

Kong, Q., Trugman, D. T., Ross, Z. E., Bianco, M. J., Meade, B. J., & Gerstoft, P. (2018). Machine Learning in Seismology: Turning Data into Insights. Seismological Research Letters, 90(1), 3-14. doi:10.1785/0220180259

Trugman, D. (2018). Characterizing Earthquake Hazards and Source Dynamics Using Machine Learning. Seminar presented at MIT Earth Research Laboratory in Cambridge, Massachusetts. Retrieved December 13, 2020, from https://youtu.be/3MDzndzAzbQ

Trugman, D. T., Page, M. T., Minson, S. E., & Cochran, E. S. (2019). Peak Ground Displacement Saturates Exactly When Expected: Implications for Earthquake Early Warning. Journal of Geophysical Research: Solid Earth, 124(5), 4642-4653. doi:10.1029/2018jb017093

Trugman, D. T., & Shearer, P. M. (2018). Strong Correlation between Stress Drop and Peak Ground Acceleration for Recent M 1–4 Earthquakes in the San Francisco Bay Area. Bulletin of the Seismological Society of America, 108(2), 929-945. doi:10.1785/0120170245