Understanding the Maths Behind Naive Bayes Model

Delving into the mathematical foundations of algorithms and their practical applications by building code implementations from scratch.

Posted by PR-Peri on July 18, 2023 · 10 mins read
Understanding Naive Bayes: A Step-by-Step Guide

In the captivating realm of machine learning, classification algorithms wield an enchanting power to predict the class label of data instances. Among the most charming of them is Naive Bayes, a spellbindingly simple yet surprisingly powerful algorithm. In this blog post, we embark on a magical journey into the mathematical underpinnings of Naive Bayes: we'll decipher the secrets of Bayes' theorem and the "naive" assumption, and weave our way through illustrative examples to unravel its charm.

Step 1: Introduce the Problem and Dataset

To begin our journey, let's introduce the classification problem at hand and the dataset we'll be using to illustrate the magic of Naive Bayes. Our dataset consists of two mystical features (x_1 and x_2) and a binary class label (C) indicating whether an animal is a "Dog" or a "Cat."

| Feature 1 (x_1) | Feature 2 (x_2) | Class (C) |
|-----------------|-----------------|-----------|
| 1               | 1               | Dog       |
| 1               | 0               | Dog       |
| 0               | 1               | Cat       |
| 0               | 0               | Cat       |

Step 2: Introduce the Naive Bayes Algorithm

Before diving into the enchanting world of mathematics, let's take a moment to understand the Naive Bayes algorithm itself. Naive Bayes is a probabilistic algorithm based on Bayes' theorem and the "naive" assumption of conditional independence of features given the class label. Despite its simplicity, Naive Bayes has proven to be remarkably powerful and finds applications in various fields, including natural language processing, text classification, and spam filtering.

Step 3: Present Bayes' Theorem - Unraveling the Enigma

Behold Bayes' theorem, a magical revelation in probability theory that forms the very essence of Naive Bayes. It reveals the posterior probability of an event A given the evidence B, conjured from the product of the prior probability of A and the likelihood of observing B given A, all divided by the probability of observing B:

P(A|B) = P(A) * P(B|A) / P(B)

Where:

  • P(A|B) is the posterior probability of event A occurring given that event B has occurred.
  • P(A) is the prior probability of event A, representing our initial belief about the probability of A before considering any evidence.
  • P(B|A) is the likelihood of observing evidence B given that event A has occurred.
  • P(B) is the probability of observing evidence B.
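
To make the theorem concrete, here is a minimal sketch in plain Python (variable names are ours, chosen only for illustration) that applies it to the dataset above, computing P(Dog | x_1 = 1) directly from counts:

```python
# A minimal sketch of Bayes' theorem on the toy dataset above.
# Each row is (x_1, x_2, class); the counts below come straight from the table.
data = [(1, 1, "Dog"), (1, 0, "Dog"), (0, 1, "Cat"), (0, 0, "Cat")]

p_dog = sum(1 for _, _, c in data if c == "Dog") / len(data)   # P(A)   = P(Dog)       = 0.5
p_x1_given_dog = (
    sum(1 for x1, _, c in data if c == "Dog" and x1 == 1)
    / sum(1 for _, _, c in data if c == "Dog")
)                                                               # P(B|A) = P(x_1=1|Dog) = 1.0
p_x1 = sum(1 for x1, _, _ in data if x1 == 1) / len(data)       # P(B)   = P(x_1=1)     = 0.5

p_dog_given_x1 = p_dog * p_x1_given_dog / p_x1                  # P(A|B) = 0.5 * 1.0 / 0.5
print(p_dog_given_x1)  # 1.0 -- both rows with x_1=1 are indeed "Dog"
```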

Step 4: Explain the "Naive" Assumption

Now, in the context of Naive Bayes, we introduce the "naive" assumption. This assumption states that the features (variables) in the dataset are conditionally independent of each other, given the class label. In simple terms, it means that the presence or absence of one feature does not influence the presence or absence of another feature, given the knowledge of the class label.

This assumption simplifies the calculations significantly and allows us to express the joint probability of observing all features given the class label C as the product of individual probabilities of observing each feature given the class label:

P(x_1, x_2, ..., x_n | C) = P(x_1 | C) * P(x_2 | C) * ... * P(x_n | C)
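
As a small illustration of what this factorization buys us computationally, the sketch below (with made-up conditional probabilities, purely for illustration) builds a joint likelihood as a simple product of per-feature conditionals:

```python
from math import prod

# Hypothetical per-feature conditional probabilities P(x_i | C) for some class C.
# Under the naive assumption, the joint likelihood is just their product.
conditionals = [0.8, 0.5, 0.9]           # P(x_1|C), P(x_2|C), P(x_3|C)
joint_likelihood = prod(conditionals)    # P(x_1, x_2, x_3 | C) = 0.8 * 0.5 * 0.9
print(joint_likelihood)                  # 0.36
```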


Step 5: Introduce the Naive Bayes Equations

Now that we've set the stage with Bayes' theorem and the "naive" assumption, let us conjure the magical equations of Naive Bayes. Imagine a dataset with an ensemble of features X = {x_1, x_2, ..., x_n} and a mysterious class label C. Substituting the factorized likelihood from Step 4 into Bayes' theorem (with A = C and B = X) gives the Naive Bayes equation:

P(C | X) = P(C) * P(x_1 | C) * P(x_2 | C) * ... * P(x_n | C) / P(X)

This is the derived Naive Bayes equation! It allows us to calculate the posterior probability of a class label given a set of features and use it to make predictions for new instances.


Step 6: Applying the Naive Bayes Equation

In practice, to make predictions, we calculate the posterior probability for each class label, and the class with the highest probability becomes our predicted class for the new instance. Note that the denominator P(X) is the same for every class, so for classification we only need to compare the numerators P(C) * P(x_1 | C) * ... * P(x_n | C), as the sketch below does.
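
Putting the pieces together, here is a from-scratch sketch of such a classifier for binary features like ours. The class name, structure, and variable names are our own choices, not a reference implementation: it estimates P(C) and P(x_i | C) by counting, then picks the class with the largest unnormalized posterior.

```python
from collections import Counter, defaultdict

class SimpleNaiveBayes:
    """A from-scratch sketch of Naive Bayes for binary (0/1) features."""

    def fit(self, X, y):
        n = len(y)
        self.classes = sorted(set(y))
        # Prior P(C): fraction of training instances in each class.
        counts = Counter(y)
        self.priors = {c: counts[c] / n for c in self.classes}
        # Conditional P(x_i = 1 | C): fraction of class-C instances with x_i = 1.
        # (No smoothing here, to match the hand calculation in this post.)
        self.cond = defaultdict(dict)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            for i in range(len(X[0])):
                self.cond[c][i] = sum(row[i] for row in rows) / len(rows)

    def predict(self, x):
        # Unnormalized posterior P(C) * prod_i P(x_i | C); P(X) is identical for
        # every class, so it can be ignored when choosing the most likely class.
        scores = {}
        for c in self.classes:
            score = self.priors[c]
            for i, value in enumerate(x):
                p = self.cond[c][i]
                score *= p if value == 1 else (1 - p)
            scores[c] = score
        return max(scores, key=scores.get)

# The toy dataset from this post.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["Dog", "Dog", "Cat", "Cat"]

model = SimpleNaiveBayes()
model.fit(X, y)
print(model.predict([1, 1]))  # "Dog", matching the worked example below
```

The prediction for {x_1=1, x_2=1} matches the hand calculation we work through next.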

Example Calculation

Let's illustrate Naive Bayes with a simple example, using the same training data introduced in Step 1: two features (x_1 and x_2) and a binary class label (C) indicating whether an animal is a "Dog" or a "Cat."

| Feature 1 (x_1) | Feature 2 (x_2) | Class (C) |
|-----------------|-----------------|-----------|
| 1               | 1               | Dog       |
| 1               | 0               | Dog       |
| 0               | 1               | Cat       |
| 0               | 0               | Cat       |

Now, let's calculate the probabilities required for Naive Bayes to predict the class of a new instance with features X_new = {x_1=1, x_2=1}.


Step 1: Calculate the prior probabilities P(Dog) and P(Cat)

P(Dog) = Number of instances of Dog / Total number of instances = 2 / 4 = 0.5

P(Cat) = Number of instances of Cat / Total number of instances = 2 / 4 = 0.5
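
In code, these priors are just class frequencies; a minimal sketch (variable names are our own):

```python
labels = ["Dog", "Dog", "Cat", "Cat"]  # class column of the training table

p_dog = labels.count("Dog") / len(labels)  # 2 / 4 = 0.5
p_cat = labels.count("Cat") / len(labels)  # 2 / 4 = 0.5
```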


Step 2: Calculate the conditional probabilities P(x_i|C)

P(x_1=1|Dog) = Number of instances of Dog with x_1=1 / Total number of instances of Dog = 2 / 2 = 1

P(x_2=1|Dog) = Number of instances of Dog with x_2=1 / Total number of instances of Dog = 1 / 2 = 0.5

P(x_1=1|Cat) = Number of instances of Cat with x_1=1 / Total number of instances of Cat = 0 / 2 = 0

P(x_2=1|Cat) = Number of instances of Cat with x_2=1 / Total number of instances of Cat = 1 / 2 = 0.5
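
The conditional probabilities are again simple counting; here is a small sketch that reproduces the four values above from the training table:

```python
# Each row is (x_1, x_2, class), exactly as in the training table.
data = [(1, 1, "Dog"), (1, 0, "Dog"), (0, 1, "Cat"), (0, 0, "Cat")]

dogs = [row for row in data if row[2] == "Dog"]
cats = [row for row in data if row[2] == "Cat"]

p_x1_given_dog = sum(1 for x1, _, _ in dogs if x1 == 1) / len(dogs)  # 2 / 2 = 1.0
p_x2_given_dog = sum(1 for _, x2, _ in dogs if x2 == 1) / len(dogs)  # 1 / 2 = 0.5
p_x1_given_cat = sum(1 for x1, _, _ in cats if x1 == 1) / len(cats)  # 0 / 2 = 0.0
p_x2_given_cat = sum(1 for _, x2, _ in cats if x2 == 1) / len(cats)  # 1 / 2 = 0.5
```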


Step 3: Calculate the likelihood of the new instance P(X_new|C)

P(X_new|Dog) = P(x_1=1|Dog) * P(x_2=1|Dog) = 1 * 0.5 = 0.5

P(X_new|Cat) = P(x_1=1|Cat) * P(x_2=1|Cat) = 0 * 0.5 = 0
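
With those conditionals in hand, the likelihood of X_new under each class is just a product (values hard-coded from the previous step so the snippet runs on its own):

```python
p_x1_given_dog, p_x2_given_dog = 1.0, 0.5
p_x1_given_cat, p_x2_given_cat = 0.0, 0.5

likelihood_dog = p_x1_given_dog * p_x2_given_dog  # 1.0 * 0.5 = 0.5
likelihood_cat = p_x1_given_cat * p_x2_given_cat  # 0.0 * 0.5 = 0.0
```

Notice how a single zero count (here P(x_1=1|Cat) = 0) wipes out the whole likelihood for that class; practical implementations usually guard against this with a smoothing technique such as Laplace smoothing, which we omit here to keep the hand calculation simple.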


Step 4: Calculate the posterior probabilities P(Dog|X_new) and P(Cat|X_new)

P(Dog|X_new) = P(Dog) * P(X_new|Dog) / P(X_new) = 0.5 * 0.5 / P(X_new) = 0.25 / P(X_new)

P(Cat|X_new) = P(Cat) * P(X_new|Cat) / P(X_new) = 0.5 * 0 / P(X_new) = 0


Step 5: Normalize the probabilities

Because the two posteriors must sum to 1, the denominator P(X_new) is simply the sum of the two numerators: P(X_new) = P(Dog) * P(X_new|Dog) + P(Cat) * P(X_new|Cat) = 0.25 + 0 = 0.25. Normalizing gives:

P(Dog|X_new) = 0.25 / (0.25 + 0) = 1

P(Cat|X_new) = 0 / (0.25 + 0) = 0

Since P(Dog|X_new) > P(Cat|X_new), Naive Bayes predicts the class "Dog" for the new instance X_new = {x_1=1, x_2=1}.
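
Steps 4 and 5 can be reproduced in a few lines; since the posteriors must sum to 1, P(X_new) is just the sum of the unnormalized numerators:

```python
p_dog, p_cat = 0.5, 0.5                    # priors from Step 1
likelihood_dog, likelihood_cat = 0.5, 0.0  # likelihoods from Step 3

# Unnormalized posteriors P(C) * P(X_new | C)
score_dog = p_dog * likelihood_dog  # 0.25
score_cat = p_cat * likelihood_cat  # 0.0

# Normalize: P(X_new) = score_dog + score_cat
evidence = score_dog + score_cat
print(score_dog / evidence)  # P(Dog|X_new) = 1.0
print(score_cat / evidence)  # P(Cat|X_new) = 0.0
```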


Conclusion

As our mystical journey comes to an end, we've unveiled the mathematical charm behind Naive Bayes, illuminated by the radiant glow of Bayes' theorem and the "naive" assumption. We've mastered the magical equations and summoned their power to predict the unseen. Naive Bayes, a beguiling combination of simplicity and potency, continues to weave its magic in the realm of machine learning, enthralling us with its applications in various fields. Embrace the magic of Naive Bayes and let it guide you on your own captivating adventures in the enchanted world of classification algorithms. May your path be bright, and your predictions ever-accurate!