Air quality in South-East and Western-Pacific Asia is deteriorating rapidly because of population growth and industrial development. However, meteorological factors, and hence the monsoon seasonality that characterizes the region, also have a considerable influence on air pollution levels. In this study, we model daily PM2.5 levels from 2020 to 2022 using meteorological parameters known to affect air pollution levels, such as wind speed, surface pressure, temperature, and rainfall amount, in five major polluted cities: Lahore (Pakistan), Delhi (India), Dhaka (Bangladesh), Hanoi (Vietnam), and Shanghai (China). We compared 35 distinct machine learning (ML) regression models to identify the most effective algorithm for reconstructing and forecasting PM2.5 levels from meteorological variables alone. Each model was trained to reconstruct daily PM2.5 levels over the 2020–2021 period and was then used both to reconstruct missing points in 2020–2021 and to forecast the daily PM2.5 levels in 2022 using only the meteorological records of 2022. We found that most of the day-to-day and seasonal variability in daily PM2.5 levels could be reconstructed from meteorological conditions. However, performance, as evaluated by Root Mean Square Error (RMSE), varied greatly among the ML models. For example, the Ensemble Boosted Tree method performed best during the training period (the first two years, 2020 and 2021) and was also very effective in forecasting the third year (2022) using only meteorological data. The Trilayer Neural Network gave the best reconstruction of the data when trained on all three years and may therefore be preferred for filling short periods of missing PM2.5 data. Conversely, the traditional multi-linear regression model proved less effective in both the reconstruction and the prediction of PM2.5 data.
This study demonstrates the usefulness of assessing multiple machine learning regression methodologies to better understand the complicated relationship between meteorological conditions and air quality in socioeconomically constrained areas affected by extreme meteorological conditions. Our modeling has implications for forecasting pollution levels and reconstructing missing pollution data, and can support policymakers in designing pollution-reduction measures across South-East and Western-Pacific Asia, where a limited number of ground stations can make monitoring and direct measurement of air pollution difficult.
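The evaluation protocol described in the abstract (train on 2020–2021, forecast 2022 from meteorology alone, rank models by RMSE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the coefficients in `synthetic_records` are invented, and a mean baseline and a multi-linear regression stand in for the 35 models compared in the study. All function names are hypothetical, and the ranking this sketch produces says nothing about the study's results.

```python
import math
import random
from datetime import date, timedelta


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_linear(X, y):
    """Multi-linear regression via the normal equations; returns a predictor."""
    Xb = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return lambda row: beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def fit_mean(X, y):
    """Baseline that ignores meteorology and predicts the training mean."""
    mean = sum(y) / len(y)
    return lambda row: mean


def synthetic_records(seed=0):
    """Daily (date, [wind, pressure, temperature, rain], pm25) for 2020-2022.

    Purely synthetic: the seasonal cycle and all coefficients are made up.
    """
    rng = random.Random(seed)
    records = []
    day = date(2020, 1, 1)
    while day <= date(2022, 12, 31):
        season = math.cos(2 * math.pi * day.timetuple().tm_yday / 365.25)
        wind = max(0.1, rng.gauss(3.0, 1.0))
        pressure = 1010 + 8 * season + rng.gauss(0, 2)
        temp = 22 - 10 * season + rng.gauss(0, 2)
        rain = max(0.0, rng.gauss(2 - 3 * season, 3))
        pm25 = max(1.0, 90 + 40 * season - 6 * wind - 1.5 * rain + rng.gauss(0, 8))
        records.append((day, [wind, pressure, temp, rain], pm25))
        day += timedelta(days=1)
    return records


records = synthetic_records()
train_rows = [r for r in records if r[0].year <= 2021]  # training period
test_rows = [r for r in records if r[0].year == 2022]   # forecast period
X_train = [r[1] for r in train_rows]
y_train = [r[2] for r in train_rows]
X_test = [r[1] for r in test_rows]
y_test = [r[2] for r in test_rows]

models = {"mean baseline": fit_mean, "multi-linear regression": fit_linear}
scores = {}
for name, fit in models.items():
    predict = fit(X_train, y_train)
    # The 2022 forecast uses only the 2022 meteorological records.
    scores[name] = rmse(y_test, [predict(row) for row in X_test])

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.2f}")
```

Splitting chronologically rather than at random keeps the 2022 forecast free of information from 2022 PM2.5 observations, which is the condition the abstract describes. Further candidate models could be added to the `models` dictionary as long as each entry takes the training data and returns a predictor.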
Optimal machine learning techniques for meteorological modeling of PM2.5 concentration in five major polluted cities of South-East / Shafi, Sedra; Scafetta, Nicola. - (2024). (AGU24 annual meeting, Washington DC, USA, 9-13 December).
Optimal machine learning techniques for meteorological modeling of PM2.5 concentration in five major polluted cities of South-East
Sedra Shafi; Nicola Scafetta
2024


