Introduction to Data Mining


Lesson 6

Entropy

 

Entropy measures the homogeneity (purity) of a set of examples.

It gives the information content of the set in terms of the class labels of its examples: the less predictable the class of an example drawn from the set, the more information the set carries.

Consider a set of examples S with two classes, P and N. Let the set have p instances of class P and n instances of class N, so the total number of instances is t = p + n. The pair [p, n] can be seen as the class distribution of S.

The entropy for S is defined as

    Entropy(S) = - (p/t).log2(p/t) - (n/t).log2(n/t)

Example: Let a set of examples consist of 9 instances of the positive class and 5 instances of the negative class.

Answer: p = 9 and n = 5.

So Entropy(S) = - (9/14).log2(9/14) - (5/14).log2(5/14)

               = -(0.64286)(-0.63743) - (0.35714)(-1.48543)

               = 0.40977 + 0.53051

               = 0.94
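The definition above can be coded directly. Here is a minimal sketch in Python (the function name entropy is my own, not from the lesson), reproducing the worked example:

```python
import math

def entropy(p, n):
    """Two-class entropy of a set with p positive and n negative instances."""
    t = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:  # by convention, 0 * log2(0) is taken as 0
            frac = count / t
            result -= frac * math.log2(frac)
    return result

print(round(entropy(9, 5), 2))  # 0.94, matching the calculation above
```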

 

The entropy curve plots Entropy(S) against p/(p+n) for values between 0 and 1. For two classes, the entropy increases monotonically from 0 to 0.5 and decreases monotonically from 0.5 to 1.

When p/(p+n) = 0, the entropy is 0

          p/(p+n) = 0.5, the entropy is 1

          p/(p+n) = 1, the entropy is 0

The entropy of a completely pure set is 0, and it is 1 for a set with equal occurrences of both classes.
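The shape of the curve can be checked numerically. The sketch below (the function name entropy_frac is my own) evaluates the entropy as a function of the fraction q = p/(p+n), confirming that it rises up to q = 0.5 and falls afterwards:

```python
import math

def entropy_frac(q):
    """Two-class entropy as a function of q = p/(p+n), with 0*log2(0) taken as 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

qs = [i / 10 for i in range(11)]        # 0.0, 0.1, ..., 1.0
vals = [entropy_frac(q) for q in qs]

# Rises on [0, 0.5], peaks at 1 when q = 0.5, falls on [0.5, 1]
for q, v in zip(qs, vals):
    print(f"q = {q:.1f}  entropy = {v:.4f}")
```

The curve is also symmetric: entropy_frac(q) equals entropy_frac(1 - q), since swapping the class labels does not change the purity of the set.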

e.g. Entropy[14,0] = - (14/14).log2(14/14) - (0/14).log2(0/14)

                              = -1.log2(1) - 0.log2(0)     (by convention, 0.log2(0) is taken as 0)

                              = 0 - 0

                              = 0

e.g. Entropy[7,7] = - (7/14).log2(7/14) - (7/14).log2(7/14)

                            = - (0.5).log2(0.5) - (0.5).log2(0.5)

                            = - (0.5).(-1) - (0.5).(-1)

                            = 0.5 + 0.5

                            = 1
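Both boundary cases can be verified with the same definition. A minimal sketch (the function name entropy is my own), using the 0*log2(0) = 0 convention so the pure set does not raise a math domain error:

```python
import math

def entropy(p, n):
    """Two-class entropy; terms with a zero count contribute nothing."""
    t = p + n
    return sum(-c / t * math.log2(c / t) for c in (p, n) if c > 0)

print(entropy(14, 0))  # 0.0  (completely pure set)
print(entropy(7, 7))   # 1.0  (equal class occurrences)
```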

 

Exercises:

  1. Find the entropy of a set of instances containing 4 instances of the positive class and 7 instances of the negative class. Under what distribution of positive and negative instances will the entropy of the set equal 1?

      
