Computer vision for recognizing household objects works better for people in high-income households, according to an analysis of six major object detection systems shared today by Facebook AI researchers. The study examined object classification systems made by Facebook, Google Cloud, Microsoft Azure, AWS, IBM Watson, and Clarifai.
Results show the six systems work 10% to 20% better for the wealthiest households than they do for the poorest households.
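The gap the researchers describe is a difference in recognition accuracy computed separately for each income bracket. A minimal sketch of that kind of per-group comparison might look like the following; the data, group names, and helper function here are illustrative assumptions, not the study's actual methodology or figures.

```python
# Hypothetical sketch: compare object-recognition accuracy across income
# brackets and report the gap between the best- and worst-served groups.
# All labels and predictions below are made up for illustration.
from collections import defaultdict

def accuracy_by_group(predictions):
    """predictions: iterable of (income_group, predicted_label, true_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in predictions:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Illustrative evaluation records: (income bracket, model output, ground truth)
preds = [
    ("high", "soap", "soap"), ("high", "chair", "chair"),
    ("high", "stove", "stove"), ("high", "soap", "toothpaste"),
    ("low", "soap", "soap"), ("low", "food", "stove"),
    ("low", "chair", "bed"), ("low", "bottle", "soap"),
]
acc = accuracy_by_group(preds)
gap = acc["high"] - acc["low"]
print(f"high: {acc['high']:.0%}, low: {acc['low']:.0%}, gap: {gap:.0%}")
```

On this toy data the sketch reports a 50-point gap; the study's reported gaps of 10% to 20% would come from the same kind of per-group accuracy comparison over real household photos.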
A company spokesperson declined to share specific figures on the performance of other individual companies, but Facebook’s system had an accuracy gap as high as 20% between households making $3,500 a month or more in the United States and households with an income of $50 a month or less in countries like Somalia and Burkina Faso.
The systems were generally more likely to identify items in homes in North America and Europe than in Asia and Africa. Results are detailed in a paper published by researchers on arXiv Thursday titled “Does Object Recognition Work for Everyone?”
Object recognition uses computer vision to distinguish between objects like a chair, toothpaste, or a dress. It is harnessed by a number of cloud service companies, as well as by consumer-facing services like Google Assistant’s computer vision service Lens and Amazon’s StyleSense. Facebook uses object detection for tasks like content moderation and identifying on-screen items for people with visual impairments.
“Our analysis showed that this issue is not specific to one particular object-recognition system, but rather broadly affects tools from a wide range of companies, including ours,” reads a blog post announcing the news today. “These results clearly show that we must do better both across the industry and here at Facebook.”
A map of Facebook’s object detection performance shows it performs worst in the Southern Hemisphere.
“By publishing these results and describing our methodology, AI researchers and engineers across the community can use this work to test and compare the performance of their own object-recognition systems, and then make them more capable of serving everyone effectively,” the post read.
The discrepancy in the major systems may stem from the fact that data sets like ImageNet, used to train many object detection systems, are compiled almost entirely from photos taken in Europe and North America, researchers said.
Photos obtained through English-language searches on public photo websites could also contribute to the disparity, since they overrepresent higher-income environments in the U.S. and Europe.
Facebook said it plans to address the shortcoming by training its convolutional networks with images that contain hashtags in languages other than English.
The work will be shared at the Computer Vision for Global Challenges workshop being held at the Computer Vision and Pattern Recognition (CVPR) conference, which takes place June 16-20 in Long Beach, California.
Today’s announcement follows Facebook’s presentation last month detailing how it tests its computer vision and AR systems to ensure they work on people with different skin tones.
In related news, Facebook computer vision researchers had to redefine the kinds of buildings that should be labeled a home when making a population density map of Africa earlier this year.