Women on a board of directors: let’s use Bayesian inference



June 30, 2011 Cathy O'Neil, mathbabe

I wanted to show how to perform a “women on the board of directors” analysis using Bayesian inference. What this means is that we need to form a “prior” on what we think the distribution of the answer could be, and then we update our prior with the data available. In this case we simplify the question we are trying to answer: given that we see a board with 3 women and 7 men (so 10 total), what is the fraction of women available for the board of directors in the general population? The reason we may want to answer this question is that then we can compare the answer to other available answers, derived other ways (say by looking at the makeup of upper level management) and see if there’s a bias.

In order to illustrate Bayesian techniques, I’ve simplified it further to be a discrete question. So I’ve pretended that there are only 11 answers you could possibly have, namely that the fraction of available women (in the population of people qualified to be put on the board of directors) is 0%, 10%, 20%, …, 90%, or 100%.

Moreover, I’ve put the least judgmental prior on the situation, namely that there is an equal chance for any of these 11 possibilities. Thus the prior distribution is uniform:

We have absolutely no idea what the fraction of qualified women is.

The next step is to update our prior with the available data. Here the data point is that there is a board with 3 women and 7 men. We are therefore sure that there are some women and some men available, so the updated probabilities of there being 0% women or 100% women should both be zero (and we will see that this is true). Moreover, we would expect the most likely fraction to be 30%, and we will see that too. What Bayesian inference gives us, though, is the relative probabilities of the other possibilities, based on how likely the data is under each of them. So for example, if we assume for the moment that 70% of the qualified people are women, what is the likelihood that the board ends up being 3 women and 7 men? We can compute that as (0.70)^3*(0.30)^7. (Strictly, this is the probability of one particular seating order; the number of orderings is the same for every candidate fraction, so it cancels once we rescale.) We multiply that by 1/11, the probability that 70% is the right answer (according to our prior), to get the “unscaled posterior distribution”, or the relative weight of each possibility. Here’s a graph of these numbers when I do it for all 11 possibilities:

We learn the relative likelihoods of the outcome "3 out of 10" given the various ratios of women
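The numbers behind that graph can be reproduced without any plotting. What follows is a minimal sketch in plain Python, assuming the same uniform prior and the same 3-women, 7-men board; the names `fractions`, `prior`, `likelihood` and `unscaled` are chosen here for illustration only.

```python
# Candidate fractions of women in the qualified pool: 0.0, 0.1, ..., 1.0
fractions = [k / 10 for k in range(11)]

# Uniform prior: each of the 11 candidates gets probability 1/11
prior = [1 / 11] * 11


def likelihood(p, women=3, men=7):
    """Probability of one particular ordering of `women` women and `men` men
    when each seat is filled by a woman with probability p."""
    return p**women * (1 - p)**men


# Unscaled posterior = prior * likelihood, one entry per candidate fraction
unscaled = [pr * likelihood(p) for p, pr in zip(fractions, prior)]

for p, u in zip(fractions, unscaled):
    print(f"{p:.1f}  {u:.8f}")
```

The entries for 0% and 100% come out as exactly zero, and the largest entry sits at 30%, matching the expectations stated above.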

In order to make this a probability distribution we need to make sure the total adds up to 1, so we scale to get the actual posterior distribution:

We scale these to add up to 1
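The scaling step is just a division by the total. Here is a small sketch, again in plain Python and under the same assumptions (uniform prior, one board with 3 women and 7 men); the variable names are illustrative.

```python
# Candidate fractions and uniform prior, as in the text
fractions = [k / 10 for k in range(11)]
prior = [1 / 11] * 11

# Prior times likelihood for a board of 3 women and 7 men
unscaled = [pr * p**3 * (1 - p)**7 for p, pr in zip(fractions, prior)]

# Divide by the total so the posterior sums to 1
total = sum(unscaled)
posterior = [u / total for u in unscaled]

# The most probable candidate fraction under the posterior
best = fractions[posterior.index(max(posterior))]

print("sum of posterior:", sum(posterior))
print("most probable fraction:", best)
```

Because the prior is uniform, the factor 1/11 cancels in the division, so scaling the raw likelihoods directly would give the same posterior. With a non-uniform prior that shortcut no longer holds.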

What we observe is, for example, that it’s about twice as likely that 50% of the qualified people are women as that 10% are, even though those answers are equally distant from the best guess of 30%. This kind of “confidence of error” is what Bayesian inference is good for. Also, keep in mind that if we had had a more informed prior the above graph would look different; for example we could use the above graph as a prior for the next time we come across a board of directors. In fact that’s exactly how this kind of inference is used: iteratively, as we travel forward through time collecting data. We typically want to start out with a prior that is pretty mild (like the uniform distribution above) so that we aren’t skewing the end results too much, and let the data speak for itself. In fact priors are typically of the form, “things should vary smoothly”; more on what that could possibly mean in a later post.
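To make the iterative use concrete, here is a hedged sketch in plain Python. The helper `update` and the second board (2 women, 8 men) are hypothetical, invented purely for illustration; only the first board comes from the discussion above.

```python
def update(prior, women, men):
    """Return the posterior over equally spaced candidate fractions
    after observing a board with `women` women and `men` men."""
    n = len(prior) - 1
    fractions = [k / n for k in range(n + 1)]
    unscaled = [pr * p**women * (1 - p)**men
                for p, pr in zip(fractions, prior)]
    total = sum(unscaled)
    return [u / total for u in unscaled]


uniform = [1 / 11] * 11

# First board: 3 women and 7 men (the example in the text)
first = update(uniform, 3, 7)

# Ratio of the posterior at 50% to the posterior at 10%
ratio = first[5] / first[1]
print("50% vs 10%:", ratio)

# Hypothetical second board: 2 women and 8 men, using `first` as the prior
second = update(first, 2, 8)

# Updating once with the pooled counts gives the same distribution
pooled = update(uniform, 5, 15)
print("max difference:", max(abs(a - b) for a, b in zip(second, pooled)))
```

Updating board by board lands in the same place as updating once with all the data pooled, which is why yesterday's posterior can be carried forward as today's prior.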

Here’s the Python code I wrote to make these graphs:

#!/usr/bin/env python

from matplotlib.pylab import *

from numpy import *

# plot prior distribution:

figure()

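# align = "edge" puts each bar's left edge at x, so the tick at x + 0.05 sits mid-bar

bar(arange(0,1.1,0.1), array([1.0/11]*11), width = 0.1, align = "edge", label = "prior probability distribution")

xticks(arange(0,1.1,0.1) + 0.05, ["%.1f" % x for x in arange(0,1.1,0.1)] )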

xlim(0, 1.1)

legend()

show()

# compute likelihoods for each of the 11 possible ratios of women:

likelihoods = []

for x in arange(0, 1.1, 0.1):
    likelihoods.append(x**3*(1-x)**7)

# plot unscaled posterior distribution:

figure()

bar(arange(0,1.1,0.1), array([1.0/11]*11)*array(likelihoods), width = 0.1, align = "edge", label = "unscaled posterior probability distribution")

xticks(arange(0,1.1,0.1) + 0.05, ["%.1f" % x for x in arange(0,1.1,0.1)] )

xlim(0, 1.1)

legend()

show()

# plot scaled posterior distribution:

figure()

bar(arange(0,1.1,0.1), array([1.0/11]*11)*array(likelihoods)/sum(array([1.0/11]*11)*array(likelihoods)), width = 0.1, align = "edge", label = "scaled posterior probability distribution")

xticks(arange(0,1.1,0.1) + 0.05, ["%.1f" % x for x in arange(0,1.1,0.1)] )

xlim(0, 1.1)

legend()

show()

Here’s the R code that Daniel Krasner wrote for these graphs:

barplot( rep((1/11), 11), width = .1, col="blue", main = "prior probability distribution")

likelihoods = c()

for (x in seq(0, 1.0, by = .1))

likelihoods = c(likelihoods, (x^3)*((1-x)^7));

barplot(rep((1/11), 11)*likelihoods, width = .1, col="blue", main = "unscaled posterior probability distribution")

barplot(rep((1/11), 11)*likelihoods/sum(rep((1/11), 11)*likelihoods), width = .1, col="blue", main = "scaled posterior probability distribution")

Comments (5)

FogOfWar

June 30, 2011 at 8:50 pm

Awesome! Can we update the site so people can do open-source data entry on the actual composition of the S&P 500 boards into a database (need to include ticker symbols to eliminate duplicates)?

I still owe you a write up about the FIFO/LIFO tax point in Obama’s budget plan…

FogOfWar

June 30, 2011 at 8:53 pm

Oh–and I’d quotation mark “qualified” (or use a different term). The analysis gives us the implied probability distribution of what board nominating committees think are qualified candidates, not the universe of candidates who are actually “qualified” (whatever that means in this context).

human mathematics

August 30, 2011 at 3:43 pm

par(col = "#333333", border="white") is more Tufte-compliant.

human mathematics

August 30, 2011 at 3:46 pm

This is _such_ a clever idea! I was hoping you would have some outside estimates at the end to compare to, though.

human mathematics

August 30, 2011 at 5:54 pm

A heckler’s response: shouldn’t our prior be that exactly the 3/10 of the population is qualified?

Heckler’s response #2: Our prior should be based on historical data, so our prior is that zero women are qualified.
