Optimal machine learning techniques for meteorological modeling of PM2.5 concentration in five major polluted cities of South-East Asia / Shafi, Sedra; Scafetta, Nicola. - In: NATURAL HAZARDS. - ISSN 0921-030X. - 121:6(2025), pp. 6981-7025. [10.1007/s11069-024-07077-z]

Optimal machine learning techniques for meteorological modeling of PM2.5 concentration in five major polluted cities of South-East Asia

Shafi, Sedra (first author; Formal Analysis); Scafetta, Nicola (last author; Conceptualization)
2025

Abstract

Air quality across Southeast and Western Pacific Asia is declining rapidly due to population growth and industrial development. The region's meteorological factors, including monsoon seasonality, exert a significant influence on air pollution levels, particularly PM2.5 concentrations. In this study, we employ a statistical modeling approach to derive daily PM2.5 levels from meteorological parameters in five major polluted cities: Lahore (Pakistan), Delhi (India), Dhaka (Bangladesh), Hanoi (Vietnam), and Shanghai (China). The meteorological parameters considered, covering 2020 to 2022, are wind speed, barometric pressure, temperature, and rainfall, all of which are known to affect air pollution levels. The statistical modeling was based on a comparative analysis of 35 machine learning (ML) regression techniques, with the aim of selecting the most efficient algorithms for reconstructing and predicting PM2.5 levels from meteorological variables alone. Specifically, each ML regression model was trained to reconstruct daily PM2.5 levels in 2020–2021 and was then used both to reconstruct missing daily PM2.5 levels in 2020–2021 and to forecast the whole of 2022 using only the 2022 meteorological records. The results indicated that most of the daily and seasonal variability in PM2.5 levels could be reconstructed from meteorological conditions. However, the performance of the ML models, as assessed by Root Mean Square Error (RMSE), varied considerably. Among the tested models, the Ensembles Boosted Tree ML method was the most efficient during the training period (the first 2 years, 2020 and 2021) and was also highly efficient in predicting the third year (2022) using only meteorological data. Additionally, the Trilayer Neural Network ML method was found to be the most effective at reconstructing the data after 3 years of training and may therefore be preferred for filling in short periods of missing PM2.5 data. In contrast, our comparative analyses showed that traditional multi-linear regression models under-performed in both reconstructing and predicting PM2.5 data. This study demonstrates the necessity and usefulness of assessing multiple ML regression methodologies to select those that perform best at reconstructing the data of interest (in our case PM2.5 records) from their hypothesized constructors (in our case meteorological parameters). In particular, it highlights the utility of ML regression techniques for forecasting air quality and reconstructing missing pollution data, which is crucial for policy-making across the South-East and Western-Pacific Asia regions, where only limited pollution monitoring infrastructure is available.
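The evaluation protocol described in the abstract (fit on two years of meteorological data, predict the third year from meteorology alone, rank candidates by RMSE) can be illustrated with a minimal sketch. This is not the authors' code and does not reproduce any of the 35 techniques they compared. The data are synthetic, the two stand-in models (`MeanBaseline`, `KNNRegressor`) and all names are hypothetical, and the features are assumed to be pre-scaled to [0, 1].

```python
import math
import random

# Assumed feature order; mirrors the four parameters named in the abstract.
FEATURES = ("wind_speed", "pressure", "temperature", "rainfall")


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def synthetic_year(rng, n_days=365):
    """Hypothetical data: scaled features and a made-up PM2.5 response."""
    rows = []
    for _ in range(n_days):
        x = [rng.random() for _ in FEATURES]
        # Invented linear relation plus noise; NOT taken from the paper.
        pm25 = 80 - 40 * x[0] + 10 * x[1] - 10 * x[2] - 20 * x[3] + rng.gauss(0, 2)
        rows.append((x, pm25))
    return rows


class MeanBaseline:
    """Predicts the training mean for every day."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class KNNRegressor:
    """Averages the targets of the k nearest training days."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            nearest = sorted(
                zip(self.X_, self.y_), key=lambda pair: math.dist(pair[0], x)
            )[: self.k]
            predictions.append(sum(target for _, target in nearest) / self.k)
        return predictions


rng = random.Random(0)
# Two synthetic years stand in for 2020-2021, one for 2022.
train = synthetic_year(rng) + synthetic_year(rng)
test = synthetic_year(rng)
X_train, y_train = [x for x, _ in train], [y for _, y in train]
X_test, y_test = [x for x, _ in test], [y for _, y in test]

candidates = {"mean_baseline": MeanBaseline(), "knn_k5": KNNRegressor(k=5)}
# Score each candidate on the held-out year only.
scores = {
    name: rmse(y_test, model.fit(X_train, y_train).predict(X_test))
    for name, model in candidates.items()
}
best = min(scores, key=scores.get)

for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: held-out RMSE = {value:.2f}")
print("selected:", best)
```

With real station records, `synthetic_year` would be replaced by the observed daily series, and any regressor exposing `fit` and `predict` could be added to `candidates` without changing the ranking step.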
Files in this item:
File: s11069-024-07077-z.pdf
Access: open access
License: publisher's copyright
Size: 5.8 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/990408