How does a Spam Classifier Work?

Bayes' Theorem
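
As a reference point, Bayes' theorem for a single word can be written as below. The symbol names are chosen here for illustration: Spam is the event that an email is spam, and Word is the event that the email contains a given word.

```latex
P(\text{Spam} \mid \text{Word}) = \frac{P(\text{Word} \mid \text{Spam}) \, P(\text{Spam})}{P(\text{Word})}
```

Read this way, the probability that an email is spam given that it contains the word depends on how often the word shows up in spam, how common spam is overall, and how common the word is in all email.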

Example of a Spam Classifier

What about words other than “Won”?

Implementation of Spam Classifier

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# Load the dataset: one column of email text, one column of labels
dataframe = pd.read_csv("spam.csv")
x = dataframe["EmailText"]
y = dataframe["Label"]
print(dataframe.describe())

# Use the first 4457 rows for training and the rest for testing
x_train, y_train = x[0:4457], y[0:4457]
x_test, y_test = x[4457:], y[4457:]

# Convert each email into a vector of word counts
cv = CountVectorizer()
features = cv.fit_transform(x_train)

# Search over SVM hyperparameters using cross-validation
tuned_parameters = {'kernel': ['rbf', 'linear'], 'gamma': [1e-3, 1e-4], 'C': [1, 10, 100, 1000]}
model = GridSearchCV(svm.SVC(), tuned_parameters)
model.fit(features, y_train)

# Report the best parameters and the accuracy on the held-out emails
print(model.best_params_)
print(model.score(cv.transform(x_test), y_test))

Use of Naïve Bayes in Spam Classifier

This sounds tough, but scikit-learn in Python makes it pretty easy to do.
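
As a rough illustration of how little code this takes, here is a minimal sketch that pairs CountVectorizer with MultinomialNB. The six training messages and the two test messages are hypothetical toy data made up for this example, not rows from spam.csv, and the variable names are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, invented for illustration only (not from spam.csv)
train_texts = [
    "You won a free prize, claim your cash now",
    "Free cash offer, you won the lottery",
    "Claim your free prize today",
    "Are we still meeting for lunch tomorrow",
    "Please review the project report before the meeting",
    "Lunch at noon, see you tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each message into a vector of word counts
vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(train_texts)

# Fit a multinomial Naive Bayes model on the word counts
classifier = MultinomialNB()
classifier.fit(train_counts, train_labels)

# Classify unseen messages, reusing the vocabulary learned above
new_texts = ["You won free cash", "See you at the meeting tomorrow"]
predictions = classifier.predict(vectorizer.transform(new_texts))
print(list(predictions))
```

On a real dataset, the same three steps (vectorize, fit, predict) would apply to the training and test splits instead of these hand-written messages.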
