Indexing Population Diversity

A diversity index is a quantitative measure that reflects how many different types (such as species, or people, or items) there are in a dataset, and simultaneously takes into account how evenly the basic entities (such as individuals) are distributed among those types. The value of a diversity index increases both when the number of types increases and when evenness increases. For a given number of types, the value of a diversity index is maximized when all types are equally abundant.

Shannon Index

The Shannon index has been a popular diversity index in the ecological literature, where it is also known as Shannon's diversity index or the Shannon–Wiener index.

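The measure was originally proposed by Claude Shannon to quantify the entropy in strings of text: the more different letters there are, and the more equal their proportional abundances in the string, the harder it is to predict which letter will come next. The Shannon entropy quantifies the uncertainty (entropy or degree of surprise) associated with this prediction. It is most often calculated as follows: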

 H' = -\sum_{i=1}^R p_i \ln p_i

where p_i is the proportion of characters belonging to the i-th type of letter in the string of interest, and R is the number of distinct types. In ecology, p_i is often the proportion of individuals belonging to the i-th species in the dataset of interest. The Shannon entropy then quantifies the uncertainty in predicting the species identity of an individual taken at random from the dataset.
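As an illustration of the formula above, here is a minimal Python sketch that computes H' from raw counts per type. The function name `shannon_index` and the sample data are assumptions made for this example, not part of any particular library.

```python
import math
from collections import Counter


def shannon_index(counts):
    """Return the Shannon index H' (natural log) for an iterable of counts per type."""
    counts = list(counts)
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Types with a zero count are skipped: p * ln(p) tends to 0 as p tends to 0.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical datasets: four types, evenly and unevenly distributed.
even = [25, 25, 25, 25]
skewed = [85, 5, 5, 5]

h_even = shannon_index(even)      # equals ln(4), the maximum for four types
h_skewed = shannon_index(skewed)  # lower, because one type dominates

# The same function applies to letters in a string.
h_text = shannon_index(Counter("mississippi").values())
```

With four equally abundant types the result is ln 4, the largest value attainable for R = 4, which matches the statement that the index is maximized when all types are equally abundant.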

Rényi Entropy

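The Rényi entropy generalizes the Shannon entropy to values of the order q other than 1; the Shannon entropy is recovered in the limit as q approaches 1. With p_i the proportion of entities belonging to the i-th of R types, it can be expressed as: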

{}^qH = \frac{1}{1-q} \; \ln\left ( \sum_{i=1}^R p_i^q \right )
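The following is a small Python sketch of this expression, assuming the proportions are already normalized to sum to 1. The function name `renyi_entropy` and the tolerance used to detect q = 1 are choices made for this illustration; at q = 1 the formula is undefined, so the sketch falls back to the Shannon limit.

```python
import math


def renyi_entropy(proportions, q):
    """Return the Rényi entropy of order q for proportions that sum to 1."""
    # Drop absent types so that q = 0 counts only the types actually present.
    p = [x for x in proportions if x > 0]
    if abs(q - 1.0) < 1e-12:
        # The formula divides by zero at q = 1; use the Shannon limit instead.
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** q for x in p)) / (1.0 - q)


# Hypothetical proportions for three types.
p = [0.5, 0.3, 0.2]

h0 = renyi_entropy(p, 0)  # ln(3): depends only on the number of types
h1 = renyi_entropy(p, 1)  # Shannon entropy
h2 = renyi_entropy(p, 2)  # -ln(sum of squared proportions)
```

For a fixed set of proportions the value does not increase as q grows, because larger q gives more weight to the most abundant types.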

Simpson Index

The Simpson index was introduced in 1949 by Edward H. Simpson to measure the degree of concentration when individuals are classified into types. The measure is usually known as the Simpson index in ecology, and as the Herfindahl–Hirschman index (HHI) in economics.

The measure equals the probability that two entities taken at random (with replacement) from the dataset of interest represent the same type. It equals:

 \lambda = \sum_{i=1}^R p_i^2
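A short Python sketch of this sum, computed from raw counts per type, is given below. The function name `simpson_index` and the sample data are assumptions made for the example.

```python
def simpson_index(counts):
    """Return Simpson's lambda: the sum of squared proportions over all types."""
    counts = list(counts)
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) ** 2 for c in counts)


# Hypothetical datasets: four types, evenly and unevenly distributed.
lam_even = simpson_index([25, 25, 25, 25])  # 4 * 0.25**2 = 0.25
lam_skewed = simpson_index([85, 5, 5, 5])   # 0.85**2 + 3 * 0.05**2 = 0.73
```

Because the measure reflects concentration, it moves in the opposite direction to diversity: the even dataset gives the smallest value possible for four types (1/4), while the dataset dominated by one type gives a value closer to 1.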