Thursday, November 3, 2011

Naive Bayes and Laplace Smoothing

Don't panic if you don't understand my writings below. I encourage you to watch this video for more details. Alternatively subscribe to the following online course. So, we are going to do a bit of Machine Learning.

Let's have the following inputs (2 categories or training set)

MOVIESONG
A PERFECT WORLDA PERFECT DAY
MY PERFECT WORLD   ELECTRIC STORM
PRETTY WOMANANOTHER RAINY DAY

Vocabulary of this training set is:

A, PERFECT, WORLD, MY, WOMAN, PRETTY, DAY, ELECTRIC, STORM, ANOTHER, RAINY

Vocabulary size is 11 words (also counted as categories, more details below).

Laplace smoothing is K=1.

P(SONG) = P(MOVIE) = (3+1) ⁄ (6+1⋅2) = 1 ⁄ 2
3 sentences in category, 6 - sentences altogether, 1 - is K, 2 - number of categories (SONG and MOVIES).

P("PERFECT"|MOVIE) = (2+1) ⁄ (8+1⋅11) = 3 ⁄ 19
2 occurrences in the MOVIE category, 1 - is K, 8 words in the MOVIE category, 11 - vocabulary size.

P("PERFECT"|SONG) = (1+1) ⁄ (8+1⋅11) = 2 ⁄ 19
1 occurrence in the SONG category, 1 - is K, 8 words in the SONG category, 11 - vocabulary size.

P("STORM"|MOVIE) = (0+1) ⁄ (8+1⋅11) = 1 ⁄ 19
0 occurrences in the MOVIE category, 1 - is K, 8 words in the MOVIE category, 11 - vocabulary size.

P("STORM"|SONG) = (1+1) ⁄ (8+1⋅11) = 2 ⋅ 19
1 occurrence in the SONG category, 1 - is K, 8 words in the SONG category, 11 - vocabulary size.

Applying Bayes' Rules we can calculate P(MOVIE|M), where M = {"PERFECT STORM"} (or probability that "PERFECT STORM" is MOVIE).

P(MOVIE|M) = P(M|MOVIE)⋅P(MOVIE) ⁄ [P(M|MOVIE)⋅P(MOVIE)+P(M|SONG)⋅P(SONG)] = (3⁄19 ⋅ 1⁄19 ⋅ 1⁄2)/[3⁄19 ⋅ 1⁄19 ⋅ 1⁄2 + 2⁄19 ⋅ 2⁄19 ⋅ 1⁄2] = 3⁄(3+4) = 3⁄7