**Calculating Entropy for Data Miners**


### Independence

Beyond knowing the minimal number of questions needed to identify a signal
from our bivariate distribution of signals, is there another use for the joint
entropy score? Another use is to compare the joint entropy score `H(X,Y)` with
the sum of the marginal entropies `H(X) + H(Y)` in the probability
distribution table to determine the degree of independence between two random
variables; for example, `age` and `buys_computer`. If `H(X,Y)` is
approximately equal to `H(X) + H(Y)`, then you can conclude that the two
variables are independent of each other: knowing the value of one variable
does not enlighten you about the value of the other variable. Here's a script
to see whether `age` and `buys_computer` are independent.

```
<?php
require_once "../config.php";
require_once PHPMATH."/IT/Entropy.php";
require_once PHPMATH."/IT/JointEntropy.php";

// Marginal entropy of each column.
$e = new Entropy;
$e->setTable("Customers");
$e->setColumn("age");
$e->analyze();
$age = $e->bits;
$e->setColumn("buys_computer");
$e->analyze();
$buy = $e->bits;
echo "H(age) = $age<br />";
echo "H(buys_computer) = $buy<br />";
echo "H(age) + H(buys_computer) = ".($age + $buy)."<br />";

// Joint entropy of the two columns taken together.
$je = new JointEntropy;
$je->setTable('Customers');
$je->setColumns(array('age', 'buys_computer'));
$je->analyze();
echo "H(age, buys_computer) = ".$je->bits;
?>
```

The output of this script is:

```
H(age) = 1.57740628285
H(buys_computer) = 0.940285958671
H(age) + H(buys_computer)= 2.51769224152
H(age, buys_computer) = 2.27094242175
```

From these results, you conclude that `H(age, buys_computer) < H(age) + H(buys_computer)`,
which means that `age` and `buys_computer` are not totally independent
variables (although the dependence does not seem too strong either). In
general:

`H(X,Y) <= H(X) + H(Y)`

with equality only if X and Y are independent.

One of the most important reasons for being concerned about whether two variables are independent is that data reduction is a critical aspect of any data-mining analysis. One important way to reduce the data to a manageable size is to eliminate input variables that are independent of the output variable about which you want to reduce your uncertainty. If you find that your joint entropy score is nearly equal to the sum of your marginal entropies, you can conclude that your variables are independent and should consider eliminating the independent input variable from your analysis.
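The subadditivity check can also be reproduced outside the PHP classes. The following Python sketch (an illustration only, not part of the PHPMath library) recomputes the marginal and joint entropies directly from the cell counts of the 14-customer joint frequency table and confirms that `H(X,Y) <= H(X) + H(Y)`:

```python
import math

# Cell counts for the 14-customer table: rows = age (<=30, 31..40, >40),
# columns = buys_computer (no, yes).
counts = [[3, 2], [0, 4], [2, 3]]
n = sum(sum(row) for row in counts)

def entropy(probs):
    """Shannon entropy in bits; 0 * log(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

row_marginals = [sum(row) / n for row in counts]        # P(age)
col_marginals = [sum(col) / n for col in zip(*counts)]  # P(buys_computer)
joint = [c / n for row in counts for c in row]          # P(age, buys_computer)

h_age = entropy(row_marginals)
h_buy = entropy(col_marginals)
h_joint = entropy(joint)

print(f"H(age) = {h_age:.11f}")
print(f"H(buys_computer) = {h_buy:.11f}")
print(f"H(age) + H(buys_computer) = {h_age + h_buy:.11f}")
print(f"H(age, buys_computer) = {h_joint:.11f}")
assert h_joint <= h_age + h_buy  # subadditivity; equality only under independence
```

The numbers agree with the PHP script's output to the displayed precision.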

### Conditional Probability

The next formula to discuss and implement is the conditional entropy formula. Before discussing this formula, however, you must first understand how to compute a conditional probability from a joint probability table. The concrete formula for computing a conditional probability looks like this.

`P(Y=y | X=x) = P(Y=y, X=x) / P(X=x)`

An instantiated formula for computing the conditional probability that customers will buy a computer given that they are under 30 years old looks like this:

`P(buys_computer = yes | age = <30) = P(buys_computer = yes, age = <30) / P(age = <30)`

To compute `P(buys_computer = yes, age = <30)`, look up the *joint
probability* cell with these specific row and column settings (that is,
`0.14286`). To compute `P(age = <30)`, look up the *row marginal* where
`age = <30` (`0.35714`). In a nutshell, computing a conditional probability
involves dividing a joint probability by a marginal probability
(`0.14286/0.35714 = 0.4`).
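As a quick arithmetic check (a sketch, not PHPMath code), the same conditional probability can be computed from the raw cell counts, which sidesteps the rounding in the table:

```python
# Cell counts from the 14-customer table.
n = 14
joint_count = 2        # customers with buys_computer = yes AND age = <30
marginal_count = 5     # customers with age = <30

# P(Y = y | X = x) = P(Y = y, X = x) / P(X = x)
p_joint = joint_count / n          # 0.14286 (rounded in the table)
p_marginal = marginal_count / n    # 0.35714
p_conditional = p_joint / p_marginal

print(round(p_conditional, 5))  # 0.4
```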

### Conditional Entropy

The first step in computing the overall conditional entropy is to compute the specific conditional entropies using this formula:

`H(X | Y = y) = -Σ_{i=1...n} P(X = x_{i} | Y = y) * log(P(X = x_{i} | Y = y))`

Plug the specific conditional entropy `H(X | Y = y)` into the conditional
entropy formula below to compute the amount of uncertainty remaining about X
after Y has been observed (no leading minus sign is needed here, because each
specific conditional entropy is already nonnegative):

`H(X | Y) = Σ_{j=1...m} P(Y = y_{j}) * H(X | Y = y_{j})`

The specific conditional entropy formula computes the amount of uncertainty remaining after conditioning on one value of the signal distribution, whereas the conditional entropy is the amount of uncertainty remaining overall: the sum of the products of the signal probabilities and their specific conditional entropies.
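The two formulas can be implemented in a few lines. This Python sketch (illustrative, outside the PHPMath library) applies them to the customer joint counts: it computes each row's specific conditional entropy, then weights by the row probability and sums:

```python
import math

# Joint counts: rows = age values, columns = buys_computer (no, yes).
counts = {"<=30": [3, 2], "31..40": [0, 4], ">40": [2, 3]}
n = sum(sum(row) for row in counts.values())

def specific_conditional_entropy(row):
    """H(B | A = a): entropy of B's distribution within one row."""
    total = sum(row)
    probs = [c / total for c in row]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# H(B | A) = sum over rows of P(A = a) * H(B | A = a)
h_b_given_a = 0.0
for age, row in counts.items():
    p_a = sum(row) / n
    h_specific = specific_conditional_entropy(row)
    print(f"P(age={age}) = {p_a:.5f}, H(B | age={age}) = {h_specific:.5f}")
    h_b_given_a += p_a * h_specific

print(f"H(buys_computer | age) = {h_b_given_a:.12f}")
```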

### Calculating Conditional Entropy

Now on to some code for computing the conditional entropy. The code below
conditions buying on age (`$ce->setConditional('buys_computer|age')`) and
outputs a joint probability table and a conditional entropy table.

```
<?php
/**
* @package IT
*/
require_once "../config.php";
require_once PHPMATH."/IT/ConditionalEntropy.php";
/**
* Example of how to use the ConditionalEntropy class.
*/
$ce = new ConditionalEntropy;
$ce->setTable('Customers');
$ce->setConditional('buys_computer|age');
$ce->analyze();
?>
<i>Joint probability table.</i>
<?php
$ce->showJointProbability("%.5f");
?>
<br />
<i>Conditional entropy table.</i>
<?php
$ce->showConditionalEntropy();
?>
```

This code outputs the following tables:

| age \ buys_computer | no | yes | Σ_{i+} |
|---|---|---|---|
| <=30 | 0.21429 | 0.14286 | 0.35714 |
| 31..40 | 0.00000 | 0.28571 | 0.28571 |
| >40 | 0.14286 | 0.21429 | 0.35714 |
| Σ_{+j} | 0.35714 | 0.64286 | 1 |

*Joint probability table*

| A_{i} | P(A = a_{i}) | P(B = no \| A = a_{i}) | P(B = yes \| A = a_{i}) | H(B \| A = a_{i}) | P(A = a_{i}) * H(B \| A = a_{i}) |
|---|---|---|---|---|---|
| <=30 | 0.357142857143 | 0.6 | 0.4 | 0.970950594455 | 0.346768069448 |
| 31..40 | 0.285714285714 | 0 | 1 | 0 | 0 |
| >40 | 0.357142857143 | 0.4 | 0.6 | 0.970950594455 | 0.346768069448 |
| Σ_{i=1...3} P(A = a_{i}) * H(B \| A = a_{i}) | | | | | 0.693536138896 |

*Conditional entropy table*

The first table appears again so that you can see how the second, third, and
fourth columns in the second table were derived from it. The second column
simply reproduces the row marginal from the first table. The third and fourth
columns are conditional probabilities, calculated by dividing a joint
probability by a row marginal (for instance, `0.14286/0.35714 = 0.4`). The
fifth column calculates the specific conditional entropy for each age range
(the logs here, as throughout, are base 2, so entropy is measured in bits).
For example:

```
H(buys_computer | age = <30) = -1 * [ 0.6 * log2(0.6) + 0.4 * log2(0.4) ]
                             = 0.970950594455
```

The specific conditional entropy column can be useful to examine in some detail, because low values tell you that there is an uncertainty-reducing relationship between particular levels of your variables. In traditional statistical analysis, such minute relationships may not be theoretically interesting; in a data-mining context, however, you might find it interesting to know that 31- to 40-year-old customers tend to purchase computers at your store. Of course, there is not enough data in this data set to draw any firm conclusions.

The specific entropy value in the fifth column is then multiplied by the corresponding probability in the second column to obtain the values in the sixth column. The values in the sixth column are summed to give the overall conditional entropy score reported in the bottom-right cell.
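Tying the columns together, this short sketch (illustrative, not PHPMath code) reproduces the sixth column and its sum from the table's second and fifth columns, and checks that conditioning on `age` leaves less uncertainty about `buys_computer` than the marginal entropy reported by the earlier script:

```python
p_a = [5/14, 4/14, 5/14]                            # second column: P(A = a_i)
h_specific = [0.970950594455, 0.0, 0.970950594455]  # fifth column: H(B | A = a_i)

# Sixth column: P(A = a_i) * H(B | A = a_i)
products = [p * h for p, h in zip(p_a, h_specific)]
h_b_given_a = sum(products)                         # bottom-right cell

h_b = 0.940285958671  # H(buys_computer) from the earlier script's output
print(products)
print(h_b_given_a)
assert h_b_given_a < h_b  # observing age reduces uncertainty about buying
```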