The K-means algorithm is one of the most widely used methods in data mining and statistical data analysis for partitioning a set of objects into K distinct groups, called clusters, on the basis of their similarities. The main drawback of this algorithm is that it requires the number of clusters as input, but in real-world problems it is very difficult to fix this value in advance. In this work we propose a parallel modified K-means algorithm in which the number of clusters is increased at run time in an iterative procedure until a given cluster quality metric is satisfied. To improve the performance of the procedure, at each iteration two new clusters are created by splitting only the cluster with the worst value of the quality metric. Furthermore, experiments in a multi-core CPU based environment are presented.
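The iterative procedure described in the abstract can be illustrated with a minimal serial sketch. It is not the authors' implementation: the abstract does not name the quality metric or the splitting rule, so the sketch assumes the metric is the root-mean-square distance of a cluster's members from its centroid, and it seeds the two new clusters with two mutually distant members of the worst cluster. The names `adaptive_kmeans`, `max_rmse` and `max_k` are hypothetical, and the parallel part of the method is omitted.

```python
import math


def _dist2(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _mean(pts):
    # Component-wise mean of a non-empty list of points.
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def _assign(points, centroids):
    # Put every point in the cluster of its nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: _dist2(p, centroids[i]))
        clusters[j].append(p)
    return clusters


def _lloyd(points, centroids, iters=100):
    # Standard K-means refinement for a fixed number of clusters.
    for _ in range(iters):
        clusters = _assign(points, centroids)
        updated = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, _assign(points, centroids)


def _rmse(cluster, centroid):
    # ASSUMED quality metric: RMS distance of members from the centroid.
    if not cluster:
        return 0.0
    return math.sqrt(sum(_dist2(p, centroid) for p in cluster) / len(cluster))


def adaptive_kmeans(points, max_rmse, max_k=10):
    # Grow K one cluster at a time until every cluster meets the metric
    # (or the hypothetical safety cap max_k is reached).
    points = [tuple(p) for p in points]
    centroids = [_mean(points)]
    while True:
        centroids, clusters = _lloyd(points, centroids)
        scores = [_rmse(c, m) for c, m in zip(clusters, centroids)]
        worst = max(range(len(scores)), key=scores.__getitem__)
        if scores[worst] <= max_rmse or len(centroids) >= max_k:
            return centroids, clusters
        # Split only the worst cluster: drop its centroid and add two seeds
        # (ASSUMED rule: two mutually distant members of that cluster).
        members = clusters[worst]
        a = max(members, key=lambda p: _dist2(p, centroids[worst]))
        b = max(members, key=lambda p: _dist2(p, a))
        centroids = centroids[:worst] + centroids[worst + 1:] + [a, b]
```

Calling `adaptive_kmeans(points, max_rmse=1.0)` returns the final centroids and the corresponding clusters. Because only the worst cluster is split, K grows by one per outer iteration, which is the behaviour the abstract describes.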

A High Performance Modified K-Means Algorithm for Dynamic Data Clustering in Multi-core CPUs Based Environments / Laccetti, Giuliano; Lapegna, Marco; Mele, Valeria; Romano, Diego. - 11874:(2019), pp. 89-99. (Paper presented at the 12th International Conference, IDCS 2019, held in Naples) [10.1007/978-3-030-34914-1_9].

A High Performance Modified K-Means Algorithm for Dynamic Data Clustering in Multi-core CPUs Based Environments

Laccetti, Giuliano; Lapegna, Marco; Mele, Valeria; Romano, Diego
2019

978-3-030-34913-4
978-3-030-34914-1

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/798064