# Gender Inequality in STEM Programs

Recently I’ve been interested in investigating how gender equality - or equivalently, inequality - has evolved over time in Canada. Using the University of Waterloo’s public Cognos cubes, specifically those for undergraduate enrollment, I have found some pretty interesting results. Below is a brief summary of these findings.

To begin, let’s talk a bit more about our population of study and the data used in our analysis. The target population that I’m examining here is the set of all undergraduate university students, and the sample population is the set of all undergraduate students who have enrolled at the University of Waterloo. The sample chosen for the analysis is the sample population restricted to students who enrolled between 1996 and 2013, where a term is only included in the sample if it has at least 20 distinct programs with at least 1 student in them. We make no distinction between students enrolled in or not enrolled in a co-operative program.
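As a rough illustration of the term filter described above, here is a minimal Python sketch (the analysis itself was done in R, so this is not the original code). The record layout `(term, program, students)`, the term labels, and the function name `eligible_terms` are all assumptions made for the example, and the records are toy values rather than the Cognos data.

```python
from collections import defaultdict


def eligible_terms(records, min_programs=20):
    """Return the sorted terms that have at least `min_programs`
    distinct programs with at least 1 enrolled student."""
    programs_by_term = defaultdict(set)
    for term, program, students in records:
        if students >= 1:  # empty programs do not count toward the threshold
            programs_by_term[term].add(program)
    return sorted(
        term for term, programs in programs_by_term.items()
        if len(programs) >= min_programs
    )


# Toy records (hypothetical): one term with 25 non-empty programs,
# one term with only 5 non-empty programs plus an empty one.
records = [("1996F", f"PROG{i}", 10) for i in range(25)]
records += [("1996W", f"PROG{i}", 10) for i in range(5)]
records += [("1996W", "EMPTY", 0)]

print(eligible_terms(records))
```

With these toy records only the first term passes the 20-program threshold.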

All analysis and visualization is done in the free academic version of Revolution R, Version 7.

For each date, at the Program (e.g. Life Sciences) and Faculty (e.g. Science) level, a total is computed for each of the female and male genders and the percentage of females is then calculated as |Females|/(|Males| + |Females|). An ordered bar chart is then rendered for each date using the ggplot2 R package, with the Program on the x-axis and the percentage female on the y-axis. A GIF animation of these charts, produced to study the time evolution, can be accessed here.
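To make the percentage calculation and the ordering concrete, here is a small Python sketch for a single date (again, not the original R code). The dictionary layout, the program names and the head counts are made up for illustration.

```python
def percent_female(females, males):
    """Percentage of females: |Females| / (|Males| + |Females|), times 100."""
    total = females + males
    if total == 0:
        raise ValueError("program has no enrolled students")
    return 100.0 * females / total


def ordered_programs(counts):
    """counts maps program -> (faculty, females, males).
    Returns (program, faculty, percent female) rows sorted from the
    least to the greatest percentage of females."""
    rows = [
        (program, faculty, percent_female(females, males))
        for program, (faculty, females, males) in counts.items()
    ]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical head counts for one date.
counts = {
    "PROG_A": ("ENG", 20, 80),
    "PROG_B": ("SCI", 60, 40),
    "PROG_C": ("MATH", 30, 70),
}

for program, faculty, pct in ordered_programs(counts):
    print(f"{program:8s} {faculty:5s} {pct:5.1f}%")
```

The sorted rows are what each frame of the bar chart would display, from left to right.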

Bar colors depend on the Faculty of the Program, and the abbreviations for each Program are explained here. A quick scan over the image shows that there does not appear to be any noticeable change in the overall shape other than a -very- slight flattening of the center bars and a slight increase in the slopes near the extreme ends during later years.

Using this data, I use the following method as a crude estimate of Faculty-wide, time-dependent gender bias, which I define as how gender-biased a Faculty is relative to past or future enrollment states of the university. Suppose that for a fixed date we have \(n\) programs and \(P=\{P_{1,F_1}, P_{2,F_2}, \ldots, P_{n,F_n}\}\) is the ordered set of percentages of females in the \(n\) different programs, where the first index orders the programs from least to greatest percentage of females and the second index denotes the Faculty to which the program belongs. Let \(P_{F}=\{P_{k,F_k} \in P: F_k = F\}\) and \(n_{F} = | P_{F} |\). Then for each Faculty \(F\), we denote the (female-dominated) gender bias as

\[G(F)= \frac{P_{n_{F},F_{n_{F}}}+P_{n_{F}-1,F_{n_{F}-1}}+P_{n_{F}-2,F_{n_{F}-2}}}{3n}\]

We can think of this as a three-term average [1] of the quantiles of the three most female-dominated programs in the Faculty. A value close to 100% (less biased) is generally preferred.
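Here is one way to read this measure in code, as a Python sketch rather than the original R implementation. It rests on an assumption of mine: the "quantile" of a program is taken to be its rank in the university-wide ordering (1 for the lowest percentage of females, \(n\) for the highest) divided by \(n\). The function name `gender_bias` and the toy ordering are hypothetical.

```python
def gender_bias(ordered_faculties, faculty, top=3):
    """Average quantile of the `top` most female-dominated programs of
    `faculty`.

    `ordered_faculties` lists the Faculty of every program, ordered from
    the least to the greatest percentage of females.
    ASSUMPTION: quantile = rank in this ordering / total number of programs.
    """
    n = len(ordered_faculties)
    ranks = [
        rank for rank, fac in enumerate(ordered_faculties, start=1)
        if fac == faculty
    ]
    if len(ranks) < top:
        raise ValueError(f"{faculty} has fewer than {top} programs")
    # ranks is ascending, so the last `top` entries are the most female-dominated
    return sum(ranks[-top:]) / (top * n)


# Toy ordering of 10 programs (made-up, not the Waterloo data).
ordering = ["ENG", "ENG", "MATH", "ENG", "MATH",
            "SCI", "MATH", "SCI", "ENG", "SCI"]

for fac in ("ENG", "MATH", "SCI"):
    print(fac, gender_bias(ordering, fac))
```

In this toy ordering the SCI programs sit at ranks 6, 8 and 10, so the measure is (6 + 8 + 10) / 30 = 0.8, while ENG (ranks 2, 4, 9 for its top three) comes out at 0.5.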

Taking only the STEM Faculties into consideration (SCI, ENG, MATH), we plot out this measure over time using the lattice R package below:

A least squares regression with slope and intercept interaction factors is also done in R for computing long term trends and is shown below:

Here, Idx is just a normalized Date variable. From the results, we can see that the long-run growth rates in the MATH and SCI Faculties are not significantly different from one another, and we can expect long-term growth in female gender bias of approximately 0.23% every term in these Faculties in the near future, while for engineering this is closer to 0.03%.
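For readers unfamiliar with slope and intercept interaction terms, the coefficient estimates of such a model coincide with those of a separate least squares line fitted to each Faculty, with the interaction coefficients being the differences from a baseline Faculty. The Python sketch below shows this on made-up numbers. It is not the R model fitted above, it does not reproduce the 0.23% or 0.03% figures, and it says nothing about significance, which requires the standard errors of the pooled model.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up bias values per term index (Idx); NOT the Waterloo results.
idx = [0, 1, 2, 3]
bias = {
    "ENG": [0.30, 0.31, 0.32, 0.33],
    "MATH": [0.50, 0.52, 0.54, 0.56],
}

fits = {fac: fit_line(idx, values) for fac, values in bias.items()}

# Interaction coefficients relative to a baseline Faculty (here ENG).
base_slope, base_intercept = fits["ENG"]
for fac, (slope, intercept) in fits.items():
    print(fac,
          "slope diff:", round(slope - base_slope, 4),
          "intercept diff:", round(intercept - base_intercept, 4))
```

A slope difference near zero between two Faculties is what "not significantly different growth" would look like in the fitted coefficients, subject to the usual standard error check.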

With this in mind, it looks like we won’t be seeing fair gender equality for at least 2 decades for the sciences and several times that amount for the mathematics and engineering faculties.

To replicate these results, as well as see the charts above in higher resolution and examine the source data, you can check out the relevant Skydrive directory here.

If you have any comments or suggestions for future statistical projects, let me know in the comments section below.

[1] An average is used here to smooth out outliers, of which the data contains a few, particularly in the architecture program.