An efficient feature selection algorithm for spam email classification

Hadeel M. Saleh

Abstract


Existing spam email classification systems suffer from low accuracy caused by the high dimensionality of the feature space. As a global optimization process in machine learning, feature selection (FS) aims to reduce dataset redundancy and produce an acceptable, accurate subset of features. This study combines a chaotic Particle Swarm Optimization (PSO) algorithm with the Artificial Bee Colony (ABC) algorithm to reduce feature dimensionality and thereby improve spam email classification accuracy. Each particle's features are represented in binary form: the continuous particle positions are transformed into binary values using a sigmoid function. Features are selected according to a fitness function based on the classification accuracy obtained with a Support Vector Machine (SVM). The proposed system was evaluated on the Spambase dataset in terms of classifier performance and the dimension of the selected feature vectors fed to the classifier; the results show that the PSO-ABC approach performs well at FS even with a small set of selected features.
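The two core steps the abstract describes — mapping a particle's continuous position to a binary feature mask via a sigmoid, and scoring that mask by SVM accuracy — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the synthetic data, scikit-learn `SVC`, the 3-fold cross-validation, and all parameter values are assumptions for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binarize(position, rng):
    # Sigmoid transfer: each continuous dimension becomes the probability
    # that the corresponding feature is selected (1) or not (0).
    return (rng.random(position.shape) < sigmoid(position)).astype(int)

def fitness(mask, X, y):
    # Wrapper fitness: SVM classification accuracy on the selected subset.
    if mask.sum() == 0:
        return 0.0  # an empty subset cannot be evaluated
    clf = SVC(kernel="rbf")
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

# Toy stand-in for a spam dataset (Spambase itself has 57 features).
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

position = rng.normal(size=20)   # one particle's continuous position
mask = binarize(position, rng)   # binary feature-selection vector
score = fitness(mask, X, y)      # accuracy-based fitness in [0, 1]
```

In the full hybrid algorithm, PSO and ABC update the continuous positions, and this binarize-then-evaluate loop runs once per particle per iteration.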

Keywords


Feature Selection, Hybrid Algorithm, Swarm Intelligence, Machine Learning Classification, Spam Filtering





DOI: http://dx.doi.org/10.21533/pen.v9i3.2202



Copyright (c) 2021 Hadeel M. Saleh

This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN: 2303-4521

Digital Object Identifier DOI: 10.21533/pen
