Email spam filtering by machine learning
DOI: https://doi.org/10.31673/2412-9070.2019.066199
Abstract
This article discusses email anti-spam filters, focusing on machine learning methods for detecting and filtering spam e-mails. It covers important concepts, trends, and evaluations of the effectiveness of spam filtering research by leading e-mail providers such as Gmail, Yahoo, and Outlook, and compares the strengths and weaknesses of existing approaches to spam filtering. The article traces the stages in the evolution of spam messages, with special attention to the processes involved in spam filtering. Three basic spam filtering algorithms are analyzed; the study shows that the random forest algorithm achieves the highest efficiency with respect to the SP, SR, and A parameters. The article raises problems in the development of anti-spam measures, as well as problems associated with various studies in this area. It also presents statistics on the number of spam messages for 2018–2019. According to these statistics, in 2019 the number of spam messages continued to increase, reaching 500 billion messages per month. Based on these data, it is safe to say that the spam problem is not losing relevance over the years but gaining it. The number of spam messages increases by 20–40% annually, and spam accounts for about 77% of all mail traffic. Spam prevents users from using e-mail tools effectively. Phishing, which lures users into leaving their contact, credit card, and passport information on malicious sites, is the most damaging to the user. The spam filtering algorithms discussed in the article continue to evolve daily: new input data are added to train the algorithms and to address increasingly acute and global problems in combating spam.
Keywords: machine learning; spam filtering; neural networks; computer security; algorithm analysis.
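The abstract compares algorithms by the SP, SR, and A parameters without defining them. The sketch below assumes they denote spam precision, spam recall, and accuracy as those terms are commonly used in the spam-filtering literature; the abstract itself does not confirm this reading. The function name `spam_metrics` and the sample labels are hypothetical and are not data from the article.

```python
def spam_metrics(y_true, y_pred):
    """Return (SP, SR, A) for binary labels where 1 = spam and 0 = legitimate.

    Assumed definitions (not confirmed by the article):
      SP = TP / (TP + FP)   spam precision
      SR = TP / (TP + FN)   spam recall
      A  = (TP + TN) / N    accuracy
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail passed

    # Guard against empty denominators (e.g. a filter that flags nothing).
    sp = tp / (tp + fp) if (tp + fp) else 0.0
    sr = tp / (tp + fn) if (tp + fn) else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    return sp, sr, acc


# Hypothetical labels for eight messages (illustration only).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(spam_metrics(actual, predicted))
```

Under these definitions, a low SP means legitimate mail is being lost to the spam folder, while a low SR means spam is reaching the inbox, so the two are usually reported together rather than accuracy alone.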