Improved Mean Shift Algorithm for Maximizing Clustering Accuracy

Chinmay Bepery; Shaneworn  Bhadra; Md. Mahbubur Rahman; Mihir Kanti Sarkar; Mohammad Jamal Hossain

doi:10.38032/jea.2021.01.001

Authors

Chinmay Bepery Department of Computer Science and Information Technology (CSIT), Patuakhali Science and Technology University (PSTU)
Shaneworn Bhadra Department of Computer Science and Information Technology (CSIT), Patuakhali Science and Technology University (PSTU)
Md. Mahbubur Rahman Department of Computer Science and Information Technology (CSIT), Patuakhali Science and Technology University (PSTU)
Mihir Kanti Sarkar Ministry of Housing and Public Works, Bangladesh
Mohammad Jamal Hossain Department of Computer Science and Information Technology (CSIT), Patuakhali Science and Technology University (PSTU)

DOI:

https://doi.org/10.38032/jea.2021.01.001

Keywords:

Clustering, Mean Shift, KD-tree, kNN, Mean Imputation

Abstract

Clustering is a machine learning method that can group similar data points. Mean Shift (MS) is a fixed window-based clustering algorithm, which calculates the number of clusters automatically but cannot guarantee the convergence of the algorithm. The main drawback of the Mean Shift Algorithm is that the algorithm requires to set a stopping criterion (threshold point) otherwise all clusters move towards one cluster and fixed bandwidth is used here. It cannot define the upper bound of iteration numbers and need to set the iteration numbers. This paper proposed a new Mean Shift Algorithm, called Improved Mean Shift (IMS) algorithm, which overcomes the all defined pitfalls of Mean Shift Algorithm. The IMS process KD-tree data structure was used to sort the dataset and all data points as initial cluster centroids without a random selection of initial centroids. In each iteration, it shifts the variable bandwidth sliding window to the actual data point nearest to the mean using k-nearest neighbours (kNN) algorithm and finds the number of clusters automatically. Also, this paper handles the missing values using Mean Imputation (MI). The IMS algorithm produces better results than the Mean Shift Algorithm on both synthetic and real datasets.

References

Jiawei, H. and Micheline, K., 2005. Data Mining: Concepts and Techniques [M] Beijing. Mechanical Industry Press.

Han, J. and Kamber, M., 2001."Data Mining Concepts and Techniques, Morgan Kaufmann Publishers," San Francisco, CA, pp. 335-391.

Bindra, K. and Mishra, A., 2017, September. A detailed study of clustering algorithms. In 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO) (pp. 371-376). IEEE. DOI: https://doi.org/10.1109/ICRITO.2017.8342454

Fukunaga, K. and Hostetler, L., 1975. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on information theory, 21(1), pp.32-40. DOI: https://doi.org/10.1109/TIT.1975.1055330

AbdAllah, L. and Shimshoni, I., A distance function for data with missing values and its applications on knn and kmeans algorithms. Submitted to Int. J. Advances in Data Analysis and Classification.

Zhang, S., Qin, Z., Ling, C.X. and Sheng, S., 2005. "Missing is useful": missing values in cost-sensitive decision trees. IEEE transactions on knowledge and data engineering, 17(12), pp.1689-1693. DOI: https://doi.org/10.1109/TKDE.2005.188

Xiao, C. and Liu, M., 2010. Efficient mean‐shift clustering using gaussian kd‐tree. In Computer Graphics Forum Vol. 29, No. 7, pp. 2065-2073. DOI: https://doi.org/10.1111/j.1467-8659.2010.01793.x

Georgescu, B., Shimshoni, I. and Meer, P., 2003, October. Mean shift based clustering in high dimensions: A texture classification example. In null (p. 456). IEEE. DOI: https://doi.org/10.1109/ICCV.2003.1238382

Chau, V.T.N., Loc, P.H. and Tran, V.T.N., 2015, November. A robust mean shift-based approach to effectively clustering incomplete educational data. In 2015 International Conference on Advanced Computing and Applications (ACOMP) (pp. 12-19). IEEE. DOI: https://doi.org/10.1109/ACOMP.2015.14

Comaniciu, D., Ramesh, V. and Meer, P., 2001, July. The variable bandwidth mean shift and data-driven scale selection. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001 (Vol. 1, pp. 438-445). IEEE.

AbdAllah, L. and Shimshoni, I., 2014, September. Mean shift clustering algorithm for data with missing values. In International Conference on Data Warehousing and Knowledge Discovery (pp. 426-438). Springer, Cham. DOI: https://doi.org/10.1007/978-3-319-10160-6_38

Masud, M.A., Rahman, M.M., Bhadra, S. and Saha, S., 2019, December. Improved k-means Algorithm using Density Estimation. In 2019 International Conference on Sustainable Technologies for Industry 4.0 (STI) (pp. 1-6). IEEE. DOI: https://doi.org/10.1109/STI47673.2019.9068033

Singh, G., 2017. "Introductory guide to Information Retrieval using kNN and KDTree," analytics vidhya.

Adams, A., Gelfand, N., Dolson, J. and Levoy, M., 2009. Gaussian kd-trees for fast high-dimensional filtering. In ACM SIGGRAPH 2009 papers (pp. 1-12). DOI: https://doi.org/10.1145/1531326.1531327

Masud, M.A., Huang, J.Z., Zhong, M., Fu, X. and Mahmud, M.S., 2018, November. Slice_OP: Selecting Initial Cluster Centers Using Observation Points. In International Conference on Advanced Data Mining and Applications (pp. 17-30). Springer, Cham. DOI: https://doi.org/10.1007/978-3-030-05090-0_2

Improved Mean Shift Algorithm for Maximizing Clustering Accuracy

Authors

DOI:

Keywords:

Abstract

References

Downloads

Published

Issue

Section

License

How to Cite

Most read articles by the same author(s)

Conference Series Advertisement

submit-block

info-block

journalfeature

indexing-abstracting