Analysis of existing data clustering algorithms. Advantages and disadvantages

DOI: 10.31673/2412-9070.2020.061719

Authors

  • Є. С. Тихонов (Tykhonov Ye. S.), State University of Telecommunications, Kyiv
  • К. В. Тихонова (Tykhonova K. V.), State University of Telecommunications, Kyiv

Abstract

Cluster analysis is an increasingly popular practice that many organizations adopt to extract valuable information from large amounts of data. A great deal of research aims to organize the collected data into easily interpretable structures. Cluster analysis is in fact a set of different classification algorithms. Clustering techniques are used in various fields, such as psychology, biology, pedagogy, marketing, and information technology. Clustering is the division of data into groups of similar objects. It is performed to make sense of data whose volume is too large for a human to analyze directly. For this reason, clustering algorithms have become a meta-learning tool for analyzing research data. Each group, called a cluster, is defined as a set of objects that are more similar to each other than to objects outside the set. The choice of clustering algorithm depends on the application and on the data set used in that field. Numerical data sets are relatively simple to work with, since the data are real numbers and can be used directly in statistical applications. It is important to understand the difference between clustering (unsupervised classification) and discriminant analysis (supervised classification). Research in this area has taken three forms: some researchers refine existing data clustering algorithms, others implement new ones, and still others study and compare different data clustering algorithms. This article provides a classification and analysis of existing cluster analysis algorithms, together with their advantages and disadvantages.

Keywords: clustering; cluster analysis; hierarchical clustering; non-hierarchical clustering; Clustering Using REpresentatives (CURE) algorithm; Minimum Spanning Tree (MST) algorithm; Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm; k-means algorithm; Partition Around Medoids (PAM) algorithm; Clustering with sLOPE (CLOPE) algorithm.
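
The definition of a cluster given in the abstract (objects in a group are more similar to each other than to objects outside it) can be illustrated with k-means, one of the algorithms named in the keywords. The following is a minimal sketch and is not taken from the article: the function name `kmeans_1d`, the one-dimensional sample data, the absolute-difference distance, and the fixed random seed are all assumptions chosen for illustration.

```python
import random


def kmeans_1d(points, k, iterations=20, seed=0):
    """Minimal Lloyd-style k-means for 1-D numeric data (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical numeric data with two visibly separated groups.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans_1d(data, k=2)
for center, members in zip(centroids, clusters):
    print(f"center {center:.2f}: {sorted(members)}")
```

No labels are supplied to the function, which is what distinguishes clustering (unsupervised classification) from discriminant analysis (supervised classification): the groups emerge only from the distances between the values.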

Published

2020-06-25

Section

Articles