Study indicates neither algorithmic differences nor diverse data sets solve facial recognition bias

Businessman using face recognition outdoors
Image Credit: wonry // Getty Images

Facial recognition models fail to recognize Black, Middle Eastern, and Latino people more often than those with lighter skin. That’s according to a study by researchers at Wichita State University, who benchmarked popular algorithms trained on datasets containing tens of thousands of facial images.

While the study is limited in that it examined models that weren't fine-tuned for facial recognition specifically, it adds to a growing body of evidence that facial recognition is susceptible to bias. A paper last fall by University of Colorado, Boulder researchers demonstrated that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Independent benchmarks of major vendors' systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have likewise found racial and gender bias, suggesting that current facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.

The researchers focused on three models — VGG, ResNet, and InceptionNet — that were pretrained on 1.2 million images from the open source ImageNet dataset. They tailored each for gender classification using images from UTKFace and FairFace, two large facial recognition datasets. UTKFace contains over 20,000 images of white, Black, Indian, and Asian faces scraped from public databases around the web, while FairFace comprises 108,501 photos of white, Black, Indian, East Asian, Southeast Asian, Middle Eastern, and Latino faces sourced from Flickr and balanced for representativeness.
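This is a standard transfer-learning recipe: take an ImageNet-pretrained backbone, swap the classification head, and fine-tune on face crops. The study's actual code isn't reproduced here; the sketch below uses PyTorch, and the ResNet-50 depth, hyperparameters, and the `train_loader` yielding (image, gender label) batches from UTKFace or FairFace are all assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Start from a ResNet pretrained on ImageNet, as the paper describes.
# (ResNet-50 is an assumption; the depth isn't pinned down here.)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Swap the 1,000-class ImageNet head for a 2-class gender head.
model.fc = nn.Linear(model.fc.in_features, 2)

# Standard ImageNet preprocessing, applied to each face crop by the
# (hypothetical) dataset that feeds train_loader below.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def finetune(model, train_loader, epochs=5, lr=1e-4):
    """Fine-tune the whole network on (image, gender_label) batches."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```

The same pattern applies to VGG and InceptionNet, with the caveat that each architecture exposes its final layer under a different attribute name.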

In the first of several experiments, the researchers evaluated and compared the fairness of the three models on gender classification. They found that accuracy hovered around 91% for all three, with ResNet attaining higher rates than VGG and InceptionNet overall. But ResNet classified men more reliably than the other models did, while VGG obtained higher accuracy rates for women.
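Audits of this kind work by disaggregating a single headline accuracy into per-group accuracies. A minimal sketch of that bookkeeping follows; the array names and group tags are illustrative, not the paper's evaluation code.

```python
import numpy as np

def accuracy_by_group(y_true, y_pred, groups):
    """Disaggregate overall accuracy into per-group accuracies.

    y_true, y_pred: class labels (say 0 = male, 1 = female)
    groups: a gender-race tag per sample, e.g. "Black_female"
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    results = {"overall": float((y_true == y_pred).mean())}
    for g in np.unique(groups):
        mask = groups == g
        results[str(g)] = float((y_true[mask] == y_pred[mask]).mean())
    return results

# Toy example: overall accuracy looks fine while one subgroup lags.
print(accuracy_by_group(
    y_true=[0, 0, 0, 0, 1, 1],
    y_pred=[0, 0, 0, 0, 1, 0],
    groups=["ME_male"] * 4 + ["Black_female"] * 2,
))
# {'overall': 0.833..., 'Black_female': 0.5, 'ME_male': 1.0}
```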

Performance also varied depending on the race of the person. VGG obtained higher accuracy rates for women except Black women, and higher rates for men except Latino men. Averaged across the models, Middle Eastern men had the highest accuracy, followed by Indian and Latino men, but Southeast Asian men had high false negative rates, meaning they were more likely to be misclassified as women. And Black women were often misclassified as male.

All of these biases were exacerbated when the researchers trained the models on UTKFace alone, which isn't balanced to mitigate skew. (UTKFace contains no images labeled Middle Eastern or Latino.) After training only on UTKFace, Middle Eastern men obtained the highest accuracy rates, followed by Indian, Latino, and white men, while Latino women were identified more accurately than all other women, followed by East Asian and Middle Eastern women. Meanwhile, accuracy for Black and Southeast Asian women fell even further.

“Overall, [the models] with architectural differences varied in performance with consistency towards specific gender-race groups … Therefore, the bias of the gender classification system is not due to a particular algorithm,” the researchers wrote. “These results suggest that a skewed training dataset can further escalate the difference in the accuracy values across gender-race groups.”
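FairFace's advantage comes from being balanced by construction. When a balanced dataset isn't available, one common mitigation (not one the study evaluates) is to reweight sampling so that every gender-race group is drawn at roughly equal rates during training. A minimal PyTorch sketch, assuming a `group_labels` list that tags each training image with its gender-race group:

```python
from collections import Counter
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, group_labels, batch_size=64):
    """Oversample rare gender-race groups so each is drawn equally often."""
    counts = Counter(group_labels)
    # A sample's weight is inversely proportional to its group's frequency,
    # so expected draws per group come out roughly equal.
    weights = torch.tensor([1.0 / counts[g] for g in group_labels],
                           dtype=torch.double)
    sampler = WeightedRandomSampler(weights,
                                    num_samples=len(group_labels),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```

Resampling addresses frequency skew only; as the study's results suggest, groups entirely absent from a dataset's labels can't be recovered this way.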

In future work, the coauthors plan to study the impact of variables like pose, illumination, and makeup on classification accuracy. Previous research has found that photographic technology and techniques can favor lighter skin, including everything from sepia-tinged film to low-contrast digital cameras.
