## The Naive Bayes Classifier

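The Naive Bayes classifier[2] is an extremely simple classifier that relies on Bayesian probability and the assumption that the features are independent of one another given the class. For a class $c$ and a feature vector $(f_1, \ldots, f_n)$, Bayes' rule gives:

$$P(c \mid f_1, \ldots, f_n) = \frac{P(c)\, P(f_1, \ldots, f_n \mid c)}{P(f_1, \ldots, f_n)}$$

Simplifying the numerator with the chain rule gives:

$$P(c)\, P(f_1, \ldots, f_n \mid c) = P(c)\, P(f_1 \mid c)\, P(f_2 \mid c, f_1) \cdots P(f_n \mid c, f_1, \ldots, f_{n-1})$$

Then, assuming the features are conditionally independent given the class gives

$$P(f_i \mid c, f_1, \ldots, f_{i-1}) = P(f_i \mid c)$$

so

$$P(c \mid f_1, \ldots, f_n) = \frac{P(c) \prod_{i=1}^{n} P(f_i \mid c)}{P(f_1, \ldots, f_n)}$$

$P(f_i \mid c)$ is estimated through plus-one smoothing on a labeled training set, that is:

$$P(f_i \mid c) = \frac{n_{f_i, c} + 1}{\sum_{f \in V} n_{f, c} + |V|}$$

where $n_{f_i, c}$ is the number of times that $f_i$ appears over all training documents in class $c$, and $V$ is the set of distinct features seen in training.

The denominator $P(f_1, \ldots, f_n)$ is the same for every class, so the class a feature vector belongs to is given by

$$\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(f_i \mid c)$$

Taking the logarithm of the maximized quantity, which does not change the maximizing class, gives

$$\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{i=1}^{n} \log P(f_i \mid c) \right]$$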
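The estimation and decision steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in our tests: it assumes each document is a list of tokens (a bag-of-words model), it estimates the class prior as the fraction of training documents in each class, and it skips features never seen in training. The names `train`, `log_likelihood`, and `classify`, and the toy data, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """Collect the counts needed for plus-one smoothing from (tokens, label) pairs."""
    class_docs = Counter()                 # number of training documents per class
    feature_counts = defaultdict(Counter)  # feature_counts[c][f]: occurrences of f in class c
    class_totals = Counter()               # total feature occurrences per class
    vocabulary = set()                     # distinct features seen in training
    for tokens, label in documents:
        class_docs[label] += 1
        feature_counts[label].update(tokens)
        class_totals[label] += len(tokens)
        vocabulary.update(tokens)
    return {
        "class_docs": class_docs,
        "feature_counts": feature_counts,
        "class_totals": class_totals,
        "vocabulary": vocabulary,
    }


def log_likelihood(feature, label, model):
    """Return log P(feature | label), estimated with plus-one smoothing."""
    count = model["feature_counts"][label][feature]
    total = model["class_totals"][label]
    return math.log((count + 1) / (total + len(model["vocabulary"])))


def classify(tokens, model):
    """Return the class maximizing log P(c) + sum_i log P(f_i | c)."""
    n_docs = sum(model["class_docs"].values())
    best_label, best_score = None, -math.inf
    for label, n_label in model["class_docs"].items():
        score = math.log(n_label / n_docs)  # log prior (assumed: document frequency)
        for feature in tokens:
            if feature in model["vocabulary"]:  # assumption: ignore unseen features
                score += log_likelihood(feature, label, model)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented purely for illustration.
training = [
    (["cheap", "pills", "cheap"], "spam"),
    (["meeting", "agenda"], "ham"),
    (["cheap", "meeting"], "ham"),
]
model = train(training)
print(classify(["cheap", "pills"], model))
print(classify(["meeting", "agenda"], model))
```

Summing logarithms rather than multiplying probabilities avoids floating-point underflow when a document has many features, and both training and classification make a single pass over the tokens.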

While the Naive Bayes classifier seems very simple, it is observed to have high predictive power; in our tests, it performed competitively with the more sophisticated classifiers we used. The Naive Bayes classifier can also be implemented very efficiently. Because of its independence assumption, each feature's probability is estimated separately, so the classifier does not fall prey to the curse of dimensionality, and its running time is linear in the size of the input.

Pranjal Vachaspati 2012-02-05