The email threat landscape is constantly evolving and hence difficult to counteract even by carrier-grade spam filters. Dangerous spam emails may thus reach the users and then result in damaging attacks spreading through the corporate network. This paper describes a collaborative approach for early detection of malicious spam emails and its application in the context of large companies. By the joint effort of the employees and the security analysts during the last two years, a large dataset of potentially malicious spam emails has been collected with each email being labeled as critical or irrelevant spam. By analyzing the main distinguishing characteristics of dangerous emails, a set of both traditional and novel features was identified and then tested and optimized by applying common supervised machine learning classifiers. The obtained massive experimental results show that Support Vector Machine and Random Forest classifiers achieve the best performance, with the optimized feature set of only 36 features achieving 91.6% Recall and 95.2% Precision. These results, confirmed by a large empirical experiment conducted on 40,000+ company employees, led to the re-engineering of the email threat management process to ensure a high level of security in the company, as well as an increased security awareness of all company employees.

2 Years in the anti-phishing group of a large company / Gallo, L.; Maiello, A.; Botta, A.; Ventre, G.. - In: COMPUTERS & SECURITY. - ISSN 0167-4048. - 105:(2021), p. 102259. [10.1016/j.cose.2021.102259]

2 Years in the anti-phishing group of a large company

Gallo L.;Botta A.;Ventre G.
2021

Abstract

The email threat landscape is constantly evolving and hence difficult to counteract even by carrier-grade spam filters. Dangerous spam emails may thus reach the users and then result in damaging attacks spreading through the corporate network. This paper describes a collaborative approach for early detection of malicious spam emails and its application in the context of large companies. By the joint effort of the employees and the security analysts during the last two years, a large dataset of potentially malicious spam emails has been collected with each email being labeled as critical or irrelevant spam. By analyzing the main distinguishing characteristics of dangerous emails, a set of both traditional and novel features was identified and then tested and optimized by applying common supervised machine learning classifiers. The obtained massive experimental results show that Support Vector Machine and Random Forest classifiers achieve the best performance, with the optimized feature set of only 36 features achieving 91.6% Recall and 95.2% Precision. These results, confirmed by a large empirical experiment conducted on 40,000+ company employees, led to the re-engineering of the email threat management process to ensure a high level of security in the company, as well as an increased security awareness of all company employees.
2021
2 Years in the anti-phishing group of a large company / Gallo, L.; Maiello, A.; Botta, A.; Ventre, G.. - In: COMPUTERS & SECURITY. - ISSN 0167-4048. - 105:(2021), p. 102259. [10.1016/j.cose.2021.102259]
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11588/909869
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 12
  • ???jsp.display-item.citation.isi??? 7
social impact