Improved Mean Shift Algorithm for Maximizing Clustering Accuracy
Clustering is a machine learning method that can group similar data points. Mean Shift (MS) is a fixed window-based clustering algorithm, which calculates the number of clusters automatically but cannot guarantee the convergence of the algorithm. The main drawback of the Mean Shift Algorithm is that the algorithm requires to set a stopping criterion (threshold point) otherwise all clusters move towards one cluster and fixed bandwidth is used here. It cannot define the upper bound of iteration numbers and need to set the iteration numbers. This paper proposed a new Mean Shift Algorithm, called Improved Mean Shift (IMS) algorithm, which overcomes the all defined pitfalls of Mean Shift Algorithm. The IMS process KD-tree data structure was used to sort the dataset and all data points as initial cluster centroids without a random selection of initial centroids. In each iteration, it shifts the variable bandwidth sliding window to the actual data point nearest to the mean using k-nearest neighbours (kNN) algorithm and finds the number of clusters automatically. Also, this paper handles the missing values using Mean Imputation (MI). The IMS algorithm produces better results than the Mean Shift Algorithm on both synthetic and real datasets.
Jiawei, H. and Micheline, K., 2005. Data Mining: Concepts and Techniques [M] Beijing. Mechanical Industry Press.
Han, J. and Kamber, M., 2001."Data Mining Concepts and Techniques, Morgan Kaufmann Publishers," San Francisco, CA, pp. 335-391.
Bindra, K. and Mishra, A., 2017, September. A detailed study of clustering algorithms. In 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO) (pp. 371-376). IEEE.
Fukunaga, K. and Hostetler, L., 1975. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on information theory, 21(1), pp.32-40.
AbdAllah, L. and Shimshoni, I., A distance function for data with missing values and its applications on knn and kmeans algorithms. Submitted to Int. J. Advances in Data Analysis and Classification.
Zhang, S., Qin, Z., Ling, C.X. and Sheng, S., 2005. "Missing is useful": missing values in cost-sensitive decision trees. IEEE transactions on knowledge and data engineering, 17(12), pp.1689-1693.
Xiao, C. and Liu, M., 2010. Efficient mean‐shift clustering using gaussian kd‐tree. In Computer Graphics Forum Vol. 29, No. 7, pp. 2065-2073.
Georgescu, B., Shimshoni, I. and Meer, P., 2003, October. Mean shift based clustering in high dimensions: A texture classification example. In null (p. 456). IEEE.
Chau, V.T.N., Loc, P.H. and Tran, V.T.N., 2015, November. A robust mean shift-based approach to effectively clustering incomplete educational data. In 2015 International Conference on Advanced Computing and Applications (ACOMP) (pp. 12-19). IEEE.
Comaniciu, D., Ramesh, V. and Meer, P., 2001, July. The variable bandwidth mean shift and data-driven scale selection. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001 (Vol. 1, pp. 438-445). IEEE.
AbdAllah, L. and Shimshoni, I., 2014, September. Mean shift clustering algorithm for data with missing values. In International Conference on Data Warehousing and Knowledge Discovery (pp. 426-438). Springer, Cham.
Masud, M.A., Rahman, M.M., Bhadra, S. and Saha, S., 2019, December. Improved k-means Algorithm using Density Estimation. In 2019 International Conference on Sustainable Technologies for Industry 4.0 (STI) (pp. 1-6). IEEE.
Singh, G., 2017. "Introductory guide to Information Retrieval using kNN and KDTree," analytics vidhya.
Adams, A., Gelfand, N., Dolson, J. and Levoy, M., 2009. Gaussian kd-trees for fast high-dimensional filtering. In ACM SIGGRAPH 2009 papers (pp. 1-12).
Masud, M.A., Huang, J.Z., Zhong, M., Fu, X. and Mahmud, M.S., 2018, November. Slice_OP: Selecting Initial Cluster Centers Using Observation Points. In International Conference on Advanced Data Mining and Applications (pp. 17-30). Springer, Cham.
- Abstract view
Copyright (c) 2021 SciEn Publishing Group
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
For open access publishing in this journal, authors will retain copyright alongside scholarly usage rights and SciEnPG will be granted publishing and distribution rights.