Clustering algorithms subjected to K-mean and gaussian mixture model on multidimensional data set

Saadaldeen Rashid Ahmed Ahmed, Israa Al Barazanchi, Zahraa A. Jaaz, Haider Rasheed Abdulshaheed

Abstract


This paper explores clustering methods. Two main categories of algorithms are used, namely k-means and Gaussian Mixture Model clustering. We look at the algorithms within these categories and the types of problems they solve, as well as the methods that can be used to determine the number of clusters. Finally, we test the algorithms on sparse multidimensional data acquired from video game sales around the world. We categorize the sales into three main levels (high, medium, and low sales), showing that a simple implementation can achieve nontrivial results. The results are presented as an evaluation of whether there is potential for online clustering of video game sales. We also discuss some task-specific improvements and which approach is most suitable.
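To make the two algorithm families concrete, the following is a minimal sketch in plain Python, not the implementation used in the paper. It runs Lloyd's k-means and then an expectation-maximization fit of a Gaussian mixture on a one-dimensional toy list of sales figures. The data values, the function names `kmeans_1d` and `gmm_1d`, the quantile-based seeding, and the reduction to one dimension are all assumptions made for illustration; the paper's data set is multidimensional and is not reproduced here.

```python
import math

def kmeans_1d(values, k, max_iters=100):
    """Lloyd's algorithm on 1-D data. Returns (centers, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic seeds taken at evenly spaced quantiles,
    # chosen for reproducibility (this is not k-means++ seeding).
    centers = [ordered[(n * (2 * j + 1)) // (2 * k)] for j in range(k)]
    labels = [0] * n
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: (v - centers[j]) ** 2)
                  for v in values]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers, labels

def gmm_1d(values, labels, k, iters=25, min_var=1e-6):
    """EM for a 1-D Gaussian mixture, initialized from a hard partition.

    Assumes every cluster in `labels` is non-empty. A robust version
    would compute densities in log space to avoid underflow.
    """
    n = len(values)
    weights, means, variances = [], [], []
    for j in range(k):
        members = [v for v, lab in zip(values, labels) if lab == j]
        mu = sum(members) / len(members)
        var = sum((v - mu) ** 2 for v in members) / len(members)
        weights.append(len(members) / n)
        means.append(mu)
        variances.append(max(var, min_var))
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for v in values:
            dens = [weights[j]
                    * math.exp(-((v - means[j]) ** 2) / (2 * variances[j]))
                    / math.sqrt(2 * math.pi * variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2
                      for r, v in zip(resp, values)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances, resp

# Hypothetical sales figures (arbitrary units), not taken from the paper.
sales = [0.1, 0.2, 0.3, 0.25, 2.0, 2.2, 1.9, 2.1, 9.5, 10.0, 10.5]
tiers = ["low", "medium", "high"]  # centers come out in ascending order

centers, labels = kmeans_1d(sales, k=3)
weights, means, variances, resp = gmm_1d(sales, labels, k=3)
# Hard GMM labels: the component with the largest responsibility.
gmm_labels = [max(range(3), key=lambda j: r[j]) for r in resp]

for value, lab in zip(sales, gmm_labels):
    print(value, tiers[lab])
```

On this well-separated toy data both methods should recover the same three groups. The difference is that the mixture model also returns per-point responsibilities and per-cluster variances, which matter more when clusters overlap. Seeding EM from a k-means partition is a common choice, though it is only one of several reasonable initializations.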





DOI: http://dx.doi.org/10.21533/pen.v7i2.484



Copyright (c) 2019 Saadaldeen Rashid Ahmed Ahmed, Israa Al Barazanchi, Zahraa A. Jaaz, Haider Rasheed Abdulshaheed

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2303-4521

Digital Object Identifier DOI: 10.21533/pen
