Email Spam Detection Using Linear Discriminant Analysis Based on Clustering

Authors: Maryam Imani and Gholam Ali Montazer

The high volume of unwanted spam email annoys Internet users and causes spam activities and financial losses. Spam detection is therefore a serious task in providing a secure electronic environment. Email spam databases usually have multimodal distributions with high overlap, which makes it difficult to separate spam emails from normal emails. Moreover, the number of available labeled emails may be limited. This paper proposes a supervised feature extraction method, called cluster space linear discriminant analysis (CSLDA), to deal with these difficulties. CSLDA uses unlabeled testing samples in addition to labeled training samples to estimate the within-class and between-class scatter matrices. Because email spam databases are multimodal, CSLDA clusters the unlabeled testing data so that they can be used in the learning phase of feature extraction. CSLDA uses the testing samples without determining their labels; it only obtains the relationship between training and testing samples through clustering. The use of the Fisher criterion increases class discrimination. Moreover, the use of clustered unlabeled samples solves the small sample size problem and provides good performance for multimodal data. Experimental results on the Spambase dataset indicate the superiority of CSLDA over several popular and state-of-the-art feature extraction and spam detection methods, especially in small sample size situations.
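
As background for the scatter matrices, the Fisher criterion, and the small sample size problem mentioned above, the following is a minimal sketch of classical LDA only. It is not the CSLDA algorithm, whose clustering step the abstract does not specify. The data are synthetic Gaussian samples (an assumption for illustration, not Spambase itself), the dimensionality of 57 matches the Spambase feature count, and the helper names `scatter_matrices`, `fisher_directions`, and `make_data` are hypothetical.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return the classical LDA within-class (Sw) and between-class (Sb) scatter."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb


def fisher_directions(X, y, n_components=1):
    """Leading eigenvectors of pinv(Sw) @ Sb, i.e. directions maximizing the Fisher ratio."""
    Sw, Sb = scatter_matrices(X, y)
    # The pseudo-inverse keeps this usable even when Sw is singular.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_components]]


rng = np.random.default_rng(0)
d = 57  # number of features in Spambase; the samples below are synthetic


def make_data(n_per_class):
    """Two synthetic Gaussian classes standing in for normal and spam emails."""
    X0 = rng.normal(0.0, 1.0, size=(n_per_class, d))
    X1 = rng.normal(0.5, 1.0, size=(n_per_class, d))
    X = np.vstack([X0, X1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


# Small sample size: 20 labeled samples in 57 dimensions, so Sw cannot be full rank.
X_small, y_small = make_data(10)
Sw_small, _ = scatter_matrices(X_small, y_small)
rank_small = np.linalg.matrix_rank(Sw_small)

# With many more samples than features, Sw becomes full rank.
X_large, y_large = make_data(300)
Sw_large, _ = scatter_matrices(X_large, y_large)
rank_large = np.linalg.matrix_rank(Sw_large)

W = fisher_directions(X_large, y_large)
print(rank_small, rank_large, W.shape)
```

With n labeled samples from c classes, the within-class scatter matrix has rank at most n - c, so it is singular whenever n - c is smaller than the number of features. This is the small sample size problem that additional (here labeled, in CSLDA clustered unlabeled) samples are meant to relieve.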
