Naive Bayes Classifier as one way to filter spam mail

DOI: 10.31673/2412-9070.2019.065860

Authors

  • О. О. Підмогильний, (Pidmohylʹnyy O. O.) State University of Telecommunications, Kyiv
  • О. М. Ткаченко, (Tkachenko O. M.) State University of Telecommunications, Kyiv
  • О. І. Голубенко, (Holubenko O. I.) State University of Telecommunications, Kyiv
  • О. В. Дробик, (Drobyk O. V.) State University of Telecommunications, Kyiv

Abstract

In 2018-2019, spam accounted on average for 53.8% of e-mail traffic. The countries ranked highest as sources of spam were China (15.82%), the USA (12.64%), Germany (5.86%), Russia (6.98%) and Brazil (6.95%). From these statistics it can be estimated that the energy spent on spam is on average about 0.17 terawatt, that is, about $282 billion is spent on spam e-mails. The article describes the problem of classifying e-mails in order to determine whether a message is spam. Three methods of text classification are considered, together with their strengths and weaknesses. The analysis shows that the Naive Bayes Classifier is the most attractive of these methods: it is easy to implement and test, its learning process is quite efficient compared with more complex classifiers, and on small document collections the difference from much more sophisticated classification algorithms is often insignificant; sometimes the Naive Bayes Classifier may even be more accurate. The article also gives an example of how the Naive Bayes Classifier works and examines it in detail as the most efficient way to classify letters with high precision while remaining a cost-effective solution for detecting spam messages.

Keywords: Naive Bayes Classifier; performance; rule-based classification; weight-based classification; spam filtering; classifier.
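
The abstract refers to a worked example of the Naive Bayes Classifier, but the example itself is not reproduced on this page. The following is a minimal sketch of one common variant (multinomial Naive Bayes with add-one smoothing), not the authors' implementation. The function names, the whitespace tokenizer and the four toy messages are all assumptions made purely for illustration.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")  # "ham" = legitimate mail; label names are illustrative


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(messages):
    """Count words per class from a list of (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def log_score(text, label, model):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    total_words = sum(word_counts[label].values())
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return score


def classify(text, model):
    """Return the label with the highest log score."""
    return max(LABELS, key=lambda label: log_score(text, label, model))


# Toy training set, invented for this sketch only.
TOY_DATA = [
    ("win money now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(TOY_DATA)
print(classify("win a prize now", model))
print(classify("agenda for lunch", model))
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow on long messages. Training reduces to counting words, which is in line with the abstract's point that the method is easy to implement and cheap to train.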

References
1. Guzella T. S., Caminhas W. M. A review of machine learning approaches to Spam filtering // Expert Systems with Applications. 2009. Vol. 36, no. 7. P. 10206–10222. DOI:10.1016/j.eswa.2009.02.037.
2. Metsis V., Androutsopoulos I., Paliouras G. Spam Filtering with Naive Bayes — Which Naive Bayes? // CEAS 2006: Third Conference on Email and Anti-Spam (July 27-28, 2006). Mountain View, California, USA, 2006.
3. A Bayesian approach to filtering junk email / M. Sahami, S. Dumais, D. Heckerman, E. Horvitz // AAAI Workshop on Learning for Text Categorization. Technical Report. 1998.

Published

2020-01-31
