Chart of the Week: A Plateau in Computer Vision Models?
31 Aug 2022

This week we are looking at the Visual Commonsense Reasoning (VCR) challenge, one of many computer vision challenges. The VCR challenge, which was introduced in 2018, asks AI systems to answer questions about images and also explain their reasoning. This data and chart were sourced from

This chart shows the advancement of artificial intelligence models in the visual commonsense reasoning challenge.

The field of computer vision has been advancing so rapidly that it’s been hard to keep up with news of the latest accomplishments. The AI Index shows that computer vision systems are tremendously good at tasks involving static images, such as object classification and facial recognition, and they’re getting better at video tasks such as classifying activities.

But a relatively new benchmark shows the limits of what computer vision systems can do: They’re great at identifying things, not so great at reasoning about what they see. For example, given an image that shows people seated at a restaurant table and a server approaching with plates, the test asks why one of the seated people is pointing to the person across the table. The report notes that performance improvements have become increasingly marginal in recent years, “suggesting that new techniques may need to be invented to significantly improve performance.”

Do you know of any VCR models? If so, please share them with the community.

