Sampling bias: how a machine-learning beauty contest awarded nearly all prizes to whites

If you've read Cathy O'Neil's Weapons of Math Destruction (you should, right NOW), then you know that machine learning can be a way to apply a deadly, nearly irrefutable veneer of objectivity to our worst, most biased practices.

Here's a telling example of exactly how that works in practice. Beauty.ai is a joint Microsoft/Nvidia/Youth Laboratories project that applied machine learning techniques to rank 600,000 user-submitted selfies for their beauty, and picked 44 finalists: six Asians, one dark-skinned person, and thirty-seven white people.

Are whites just that hot?

Nope. But the training data for "beauty" comes from profoundly biased institutions (the first Black Miss America was crowned in 1983). What's more, the European-based project leads ended up disproportionately sourcing their selfies from other white Europeans.

In other words, they used non-representative samples to make statistical inferences. This problem isn't unique to machine learning: sampling bias is one of the most common failures in all scientific work. But only in machine learning do we treat the statistical work as having some kind of mystical property that finds truths otherwise invisible to the unassisted human cognitive apparatus.
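To see how that failure mode works mechanically, here's a minimal, hypothetical Python sketch. None of this is Beauty.ai's actual code or data; the group proportions, label rates, and features are invented for illustration. A model trained on a skewed sample, with labels that are themselves historically biased, dutifully reproduces that bias when scoring a balanced population:

```python
# A minimal sketch of how sampling bias propagates into a trained model.
# All numbers and features are hypothetical, not from Beauty.ai.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_faces(n, frac_group_a):
    """Simulate entrants: one column marks group membership (1 = the
    over-represented group A); the other columns are unrelated features."""
    group = (rng.random(n) < frac_group_a).astype(float)
    other = rng.normal(size=(n, 3))
    return np.column_stack([group, other]), group

# Training pool mirrors a skewed entry pool (90% group A), and the
# historical "beauty" labels themselves favor group A.
X_train, group_train = make_faces(10_000, frac_group_a=0.9)
y_train = (rng.random(10_000) < np.where(group_train == 1, 0.6, 0.2)).astype(int)

model = LogisticRegression().fit(X_train, y_train)

# Score a balanced population: the model simply reproduces the bias.
X_test, group_test = make_faces(10_000, frac_group_a=0.5)
scores = model.predict_proba(X_test)[:, 1]
print("mean score, group A:", scores[group_test == 1].mean())
print("mean score, group B:", scores[group_test == 0].mean())
```

Nothing in the pipeline is told to discriminate; the model just learns that membership in the over-represented group predicts the biased labels, and its outputs look like neutral, objective scores.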

The combination of this new oracular authority and the dismal, familiar sampling bias is a truly dangerous thing. It's disturbing when applied to beauty, but the same problems crop up in automated sentencing and other systems that decide who goes free and who is denied freedom, work, and family.

The problem here is the lack of diversity, both of people and of opinions, in the databases used to train AI; those databases are, after all, created by humans.

“We had this problem with our database for wrinkle estimation, for example,” said Konstantin Kiselev, chief technology officer of Youth Laboratories, in an interview. “Our database had a lot more white people than, say, Indian people. Because of that, it’s possible that our algorithm was biased.”

“It happens to be that color does matter in machine vision,” Alex Zhavoronkov, chief science officer of Beauty.ai, wrote me in an email. “and for some population groups the data sets are lacking an adequate number of samples to be able to train the deep neural networks.”

The other problem for the Beauty.ai contest in particular, Kiselev said, is that the large majority (75 percent) of contest entrants were European and white. Seven percent were from India, and one percent were from the African continent. That's 40,000 Indians and 9,000 people from Africa that the algorithms decided didn't match the idea of beauty they'd been trained to recognize.

Why An AI-Judged Beauty Contest Picked Nearly All White Winners
[Jordan Pearson/Motherboard]