Analyzing Air Quality Data by Machine Learning

Authors

  • Ardson dos S. Vianna Jr. USP
  • Fernando de Come Departamento de Engenharia Química, Escola Politécnica, Universidade de São Paulo, São Paulo, Brasil

DOI:

https://doi.org/10.63595/vetor.v35i1.18205

Keywords:

Air quality, Clustering, Classification

Abstract

Machine learning (ML) enables the continuous analysis of large volumes of data, including data on consumption, public health, and industrial processes. One example is the set of parameters produced by air quality monitoring. This study used ML tools to assess air quality at CETESB station 66 - Parisi in Cubatão, São Paulo, Brazil. One year of data, from January 1, 2022 to January 1, 2023, was examined for inhalable particulate matter (PM10), nitrogen oxides (NO, NO2, and NOx), and SO2. Feature engineering, clustering, and classification were carried out, yielding analyses that can help improve the control of atmospheric pollutants. The dendrogram indicated four clusters, a result confirmed by the k-means method. The k-nearest neighbors algorithm was the best-performing classifier, with a coefficient of 0.953138. Protecting the environment should be a collective responsibility; even small initiatives can contribute significantly to movements and public policies.
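
The clustering and classification workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use the CETESB data: the readings are synthetic, the two features and four group centers are made up, the farthest-point initialization is an assumption chosen to make the run deterministic, and the dendrogram step is omitted. It only shows the general idea of grouping readings with k-means and then training a k-nearest neighbors classifier on the resulting cluster labels, so it should not be expected to reproduce the reported coefficient.

```python
import math
import random


def make_synthetic_readings(n_per_group=50, seed=0):
    """Return made-up two-feature readings scattered around four centers."""
    rng = random.Random(seed)
    centers = [(10.0, 10.0), (10.0, 60.0), (60.0, 10.0), (60.0, 60.0)]
    points = []
    for _ in range(n_per_group):
        for cx, cy in centers:  # interleave groups so any slice mixes them
            points.append((cx + rng.uniform(-5, 5), cy + rng.uniform(-5, 5)))
    return points


def farthest_point_init(points, k):
    """Pick k starting centroids, each as far as possible from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query)
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


# Step 1: cluster the synthetic readings into four groups.
points = make_synthetic_readings()
labels, centroids = k_means(points, k=4)

# Step 2: hold out every fifth point and classify it from the remaining ones.
test_idx = set(range(0, len(points), 5))
train_x = [p for i, p in enumerate(points) if i not in test_idx]
train_y = [lab for i, lab in enumerate(labels) if i not in test_idx]
correct = sum(
    knn_predict(train_x, train_y, points[i]) == labels[i] for i in test_idx
)
accuracy = correct / len(test_idx)
print(f"clusters found: {len(set(labels))}, hold-out accuracy: {accuracy:.3f}")
```

The sketch uses only the Python standard library so that it runs anywhere; in practice a library such as scikit-learn offers equivalent tools (`KMeans`, `KNeighborsClassifier`), and real monitoring data would normally be scaled before distances are computed.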

Author Biography

Fernando de Come, Departamento de Engenharia Química, Escola Politécnica, Universidade de São Paulo, São Paulo, Brasil

Chemical engineer graduated from the Polytechnic School, University of São Paulo. He is currently a master's student at the same institution.

Published

2025-07-09

How to Cite

dos S. Vianna Jr., A., & de Come, F. (2025). Analyzing Air Quality Data by Machine Learning. VETOR - Journal of Exact Sciences and Engineering, 35(1), e18205. https://doi.org/10.63595/vetor.v35i1.18205

Section

Special Section XXVII ENMC/XV ECTM