Clustering on graphs (networks) is becoming intractable due to increasing sizes. Nyström spectral graph clustering (NSC) is a popular method to circumvent the problem. However, NSC currently faces two issues: (1) how to efficiently obtain representative samples for large networks; (2) the NSC is irrational for the mapping of nodes. To address the issues, in this paper, we propose an improved Nyström spectral graph clustering based on k-core decomposition sampling for large networks. In the proposed method, we first employ k-core decomposition as a sampling method, then use the samples to conduct the NSC to acquire the clusters of the sample nodes and the nodes connected with the samples, and finally utilize the proposed label propagation algorithm to group the remaining nodes into previously found clusters. To evaluate the performance, we use spectral clustering and NSC as baselines and compare our algorithm with the baselines on 9 small networks. Moreover, the proposed method is applied to 5 large networks. The proposed method can achieve a Modularity of 0.355 and cost 1234.95 s for a large network with approximately 120 million edges, which demonstrates that the proposed method has higher accuracy than NSC and higher efficiency than spectral clustering.

An improved Nyström spectral graph clustering using k-core decomposition as a sampling strategy for large networks

Piccialli F.
2022

Abstract

Clustering on graphs (networks) is becoming intractable due to increasing sizes. Nyström spectral graph clustering (NSC) is a popular method to circumvent the problem. However, NSC currently faces two issues: (1) how to efficiently obtain representative samples for large networks; (2) the NSC is irrational for the mapping of nodes. To address the issues, in this paper, we propose an improved Nyström spectral graph clustering based on k-core decomposition sampling for large networks. In the proposed method, we first employ k-core decomposition as a sampling method, then use the samples to conduct the NSC to acquire the clusters of the sample nodes and the nodes connected with the samples, and finally utilize the proposed label propagation algorithm to group the remaining nodes into previously found clusters. To evaluate the performance, we use spectral clustering and NSC as baselines and compare our algorithm with the baselines on 9 small networks. Moreover, the proposed method is applied to 5 large networks. The proposed method can achieve a Modularity of 0.355 and cost 1234.95 s for a large network with approximately 120 million edges, which demonstrates that the proposed method has higher accuracy than NSC and higher efficiency than spectral clustering.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11588/893876
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact