Using Bayesian methods for machine learning in an artificial-intelligence decision support system to combat spam
Abstract
This article proposes a way of using Bayesian methods to determine whether a sign of spam is present in a message, and of taking the result into account when training the artificial intelligence. A message test is the process in which a decision support system (hereafter DSS) receives a message, scans it and, if a given sign of spam is absent, decides that the test is positive and the message can be used. If the sign of spam is present, the DSS decides that the test is negative and places the message among the spam messages. The user looks through the list of spam messages, reviews some of them and, on deciding that a message is not spam, moves it to the list of usable messages. Likewise, the user reads the usable messages and, on finding that one of them is spam, moves it to the list of spam messages.
The purpose of the study is to describe a DSS model based on Bayesian methods that filters out messages showing signs of spam and, in the course of its operation, provides machine learning of the artificial intelligence on those signs.
In 1000 tests (one test per message), 360 tests rated the message as suitable and it did prove suitable in use. A further 500 tests rated the message as unsuitable and it did prove unsuitable in use. Another 60 tests rated the message as unsuitable although it proved suitable in use (type I error). The remaining 80 tests rated the message as suitable although it proved unsuitable in use (type II error).
From these results, where D1 and D2 denote a message that is actually suitable or unsuitable and T1 and T2 denote a positive or negative test result, it was found that:
- after a positive test result, the probability of receiving a suitable message is P(D1|T1) = P(D1, T1)/P(T1) = 0.8182;
- after a positive test result, the probability of receiving an unsuitable message (type II error) is P(D2|T1) = P(D2, T1)/P(T1) = 0.1818;
- after a negative test result, the probability of receiving an unsuitable message is P(D2|T2) = P(D2, T2)/P(T2) = 0.8929;
- after a negative test result, the probability of receiving a suitable message (type I error) is P(D1|T2) = P(D1, T2)/P(T2) = 0.1071.
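The four conditional probabilities can be reproduced from the joint test counts. The sketch below is illustrative only: the `counts` table and the `posterior` helper are hypothetical names, not taken from the article, and it assumes 500 correctly rejected messages, the figure that agrees with both the 1000-test total and the reported values of P(D2|T2) and P(D1|T2).

```python
from fractions import Fraction

# Joint counts over 1000 tests (one test per message).
# D1/D2: message actually suitable / unsuitable.
# T1/T2: test result positive (suitable) / negative (unsuitable).
counts = {
    ("D1", "T1"): 360,  # passed the test and was suitable
    ("D2", "T1"): 80,   # passed the test but was unsuitable (type II error)
    ("D1", "T2"): 60,   # failed the test but was suitable (type I error)
    ("D2", "T2"): 500,  # failed the test and was unsuitable (assumed count)
}


def posterior(counts, d, t):
    """Estimate P(d | t) = P(d, t) / P(t) from the joint counts."""
    joint = counts[(d, t)]
    marginal = sum(n for (_, test), n in counts.items() if test == t)
    return Fraction(joint, marginal)


for t in ("T1", "T2"):
    for d in ("D1", "D2"):
        p = posterior(counts, d, t)
        print(f"P({d}|{t}) = {p} = {float(p):.4f}")
```

Exact fractions are used so that the two probabilities for each test result sum to exactly 1, which gives a quick consistency check on the table.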
The research led to the following conclusions:
- accumulating test results and entering them into tables based on Bayesian methods improves the accuracy of the estimated probabilities;
- the probabilities characterize the sign on which spam testing is performed. Large type I and type II error values indicate that the quality of the test is low, so the sign can be excluded from the list of signs;
- changes in testing technology can affect the resulting probabilities. When the testing technology changes, the outdated statistical data must be discarded;
- accumulating new data and discarding outdated data, processing the data with Bayesian methods and evaluating the results together constitute the described model of a DSS with artificial intelligence that performs machine learning on the signs of spam.
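One way to picture the conclusions above is a per-sign record that accumulates outcomes, discards old ones, and flags a sign for exclusion when its error probabilities grow too large. This is a minimal sketch under stated assumptions, not the article's implementation: the class `SignStatistics`, its methods, the bounded window and the `max_error` threshold of 0.2 are all hypothetical choices made for illustration.

```python
from collections import deque


class SignStatistics:
    """Hypothetical record of (test_positive, actually_suitable) outcomes for one sign."""

    def __init__(self, window=1000):
        # Bounded window: once full, the oldest outcome is dropped on each append.
        self.outcomes = deque(maxlen=window)

    def record(self, test_positive, actually_suitable):
        """Store one confirmed outcome, e.g. after the user reviews a message."""
        self.outcomes.append((test_positive, actually_suitable))

    def reset(self):
        """Discard all statistics, e.g. when the testing technology changes."""
        self.outcomes.clear()

    def posterior(self, test_positive, actually_suitable):
        """Estimate P(actual | test) from the stored outcomes; None if no data."""
        actuals = [a for t, a in self.outcomes if t == test_positive]
        if not actuals:
            return None
        hits = sum(1 for a in actuals if a == actually_suitable)
        return hits / len(actuals)

    def is_reliable(self, max_error=0.2):
        """Return False when either error probability exceeds max_error."""
        type_i = self.posterior(False, True)   # flagged as spam, actually suitable
        type_ii = self.posterior(True, False)  # passed the test, actually spam
        if type_i is None or type_ii is None:
            return True  # too little data to justify excluding the sign
        return type_i <= max_error and type_ii <= max_error
```

In use, `record` would be called each time the user confirms or corrects a decision, `reset` when the testing technology changes, and a sign for which `is_reliable` returns `False` would become a candidate for removal from the list of signs.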
Keywords: Bayesian methods; machine learning; artificial intelligence; decision support systems; spam.