*The following post originally appeared on Blogging4Jobs.com as part of the #BigDataHR series.*

Civil rights defense lawyers have warned their clients for decades to keep company data sets small: smaller applicant flow counts, smaller promotion pools, and smaller pools of comparative employees, wherever you can legitimately shrink a pool to the smallest dataset you can tolerate from an administrative and human resources perspective. You may recall the discussion of “Data Management Techniques” as one tool management could use to shrink “Applicant” counts when searching Internet job boards, back when OFCCP launched its “Internet Applicant” regulation in the early 2000s and Monster.com had over 1 million resumes online.

Statisticians report that the LARGER the number of employment transactions (like hires or promotions), the smaller the range, measured as a percentage, within which mere CHANCE can explain away a selection disparity as benign (for example, in the percentage of Blacks rejected for hire, or the percentage of women rejected for promotion). Put another way, in a large enough pool even a small percentage disparity becomes statistically meaningful.

The classic example of this large-number phenomenon is a coin flip demonstration. Just as when we select Hispanics for hire out of a pool of applicants, for example, we expect a fair process when flipping coins. But CHANCE will intervene and change our view of what to expect in a fair toss of even a simple two-sided coin. You expect that 5 of 10 flips would be heads and 5 of 10 would be tails, right? But that happens less often in the real world than you might think. Try it. Take a coin and flip it 10 times. Record the number of “heads”/“tails” you get. While you could get 5/5, the greater reality is you will get 2/8, 7/3 (or vice versa), 4/6 or even 9/1. Chance intervenes in every human event to spoil the expectation of perfect fairness: 5 heads/5 tails in the case of coin flips.
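If you would rather not flip a real coin, the ten-flip experiment can be worked out exactly. The short Python sketch below assumes a fair coin and independent flips; the helper name `prob_heads` is just an illustrative choice, not something from the original discussion.

```python
from math import comb

def prob_heads(k, n=10):
    """Exact chance of getting k heads in n flips of a fair coin (binomial)."""
    return comb(n, k) / 2 ** n

# Print the chance of every possible heads/tails split in 10 flips.
for k in range(11):
    print(f"{k:2d} heads / {10 - k:2d} tails: {prob_heads(k):.1%}")
```

Under these assumptions an exact 5/5 split turns up only about one time in four; every other series lands somewhere off the “expected” result.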

Now, flip the coin in a second series of 10 flips. You will very likely see that the counts differ from the first series of 10 flips. BUT, if you performed 100 ten-flip series (i.e. 1000 flips), you would find that while the flip counts for each series were different, the overall split of heads/tails across all series would approach 50/50: half heads/half tails. What is happening? As the number set grows, the number of 7/3 series starts to balance out the number of 3/7 series, etc.
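The 100-series experiment is tedious by hand but easy to simulate. Here is a minimal Python sketch, assuming a fair coin simulated with the standard `random` module; the function name `flip_series` and the fixed seed are illustrative choices so that the run is repeatable.

```python
import random

def flip_series(n_series, flips_per_series=10, seed=0):
    """Return the number of heads in each of n_series simulated series."""
    rng = random.Random(seed)  # fixed seed so the same run can be repeated
    return [
        sum(rng.random() < 0.5 for _ in range(flips_per_series))
        for _ in range(n_series)
    ]

series = flip_series(100)      # 100 series of 10 flips = 1000 flips
total_heads = sum(series)
total_flips = 100 * 10

print("heads in the first five series:", series[:5])
print(f"overall: {total_heads} heads in {total_flips} flips "
      f"({total_heads / total_flips:.1%})")
```

Individual series should bounce around, while the overall percentage should sit close to 50%. Change the seed to see a different run.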

In lay terms, the law measures statistical differences in selection results (differences from the expected norm) by something called a “2 standard deviation analysis,” which gives you an allowable range of “miss” (“variance”) due to chance. You are not discriminating in flipping if you do not get 5/5. The two standard deviation analysis allows you some room to miss the expected number due to chance. On 10 flips, the range of acceptable “miss” (“variance”) is +/- 3 from the expected. That is a 30% “miss allowance” (3 out of the 10 flips). Since we expect 5, we could get as few as 2 heads and as many as 8 (+/- 3 on either side of the 5 we expect) and still call that a “fair” flip: i.e. what we would reasonably expect.
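For readers who want to see where the +/- 3 comes from, the figure is consistent with the standard binomial formula for a standard deviation. This is a sketch under the assumption of a fair coin ($p = 0.5$) and independent flips; the symbols $n$ (number of flips) and $p$ (chance of heads) are introduced here for illustration.

$$
\text{SD} = \sqrt{n\,p\,(1-p)}, \qquad
2 \times \text{SD} = 2\sqrt{n \times 0.5 \times 0.5} = \sqrt{n}
$$

$$
n = 10: \quad 2 \times \text{SD} = \sqrt{10} \approx 3.16 \approx 3
$$

So for a fair coin the two standard deviation allowance is roughly the square root of the number of flips.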

**Punch line.** Because of all that balancing I mentioned above…after hundreds or thousands of flips coming out “all over the map” and canceling each other out, the acceptable variance one would expect at 10,000 flips would be ONLY **+/- 1%**…not the 30% “miss allowance” we had with only 10 flips (or 10 selections for employment). So, you would expect 5,000 heads/5,000 tails, but you could “miss” that expected number by ONLY +/- 100 (+/- 1%). So, if you had fewer than 4,900 heads or more than 5,100 heads, the law would say that the statistics suggest something other than CHANCE was at work…perhaps unlawful discrimination.

When should you worry your numbers are TOO big? At 100 coin flips, the acceptable variance is +/- 10 (10%, one-third of your “miss allowance” at 10 flips). At 500 flips, the acceptable variance drops precipitously to only +/- 22 (4.4%). At 1,000 flips, it is only +/- 32 from the 500 you expect: only a 3.2% acceptable variance, and not the roughly 10 times greater acceptable variance (30%) you enjoyed with 10 flips.
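The figures above can be reproduced in a few lines. This Python sketch assumes the “miss allowance” is two binomial standard deviations for a fair coin; the function name `miss_allowance` and its parameters are illustrative, not a legal or regulatory formula.

```python
from math import sqrt

def miss_allowance(n_flips, p=0.5, n_sd=2):
    """Two-standard-deviation allowance, in flips, around the expected count."""
    sd = sqrt(n_flips * p * (1 - p))  # binomial standard deviation
    return n_sd * sd

# Compare the allowance at each pool size discussed in the post.
for n in (10, 100, 500, 1000, 10_000):
    allowance = miss_allowance(n)
    print(f"{n:>6} flips: expect {n / 2:>6.0f} heads, "
          f"+/- {allowance:5.1f} ({allowance / n:.1%} of flips)")
```

To test a different pool size, add it to the tuple in the loop. The allowance keeps growing in raw count while shrinking as a percentage.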

Now, go back and re-read paragraph 2, above, and (hopefully) it will now make greater sense to you. And, then go to work to shrink your Applicant pools to as small as you can make them while keeping them as large as you must to fuel the human factor needs of your business!

Good Luck!

-John