

Many software companies today offer image recognition services that can help businesses and government agencies improve customer service and operate more efficiently, but these platforms are far from perfect.

For example, in a recent CapTech research project, we presented upside-down photographs of Hollywood actor Brad Pitt to six well-known image recognition platforms. The only one that recognized Mr. Pitt appears to be a “mechanical Turk”: it relies on people working behind the scenes to perform tasks the software can’t, such as identifying Brad Pitt.

Although image recognition services don’t always work as advertised, they have become a hot topic in diverse industries. These platforms can help a business tell camera-wielding customers where to find a specific product such as a suit or pair of shoes; filter out adult images on photo-sharing sites; gauge the moods of customers as they stroll the aisles; determine which shoppers have been in the store recently; and identify people who may present security risks.

But finding the right image recognition tool for the job can be far trickier than identifying a distorted image of Brad Pitt. 

CapTech evaluated six of the most widely used image recognition platforms, specifically offerings from Microsoft Azure, IBM Watson, Amazon Rekognition, Google Cloud Vision, Clarifai, and CloudSight.

We found that no single platform does it all and, because of that, the best platform for one organization may not be the best for another. Some organizations will want to use multiple platforms, or build their systems so that switching to a different vendor can happen quickly. Others may want to hold off on implementing a platform until the technology has matured.
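One way to keep switching costs low is to have the application talk to a thin, vendor-neutral interface rather than to any vendor’s SDK directly. The Python sketch below illustrates the idea under stated assumptions: `Label`, `ImageLabeler`, `FakeVendorA`, `FakeVendorB`, `VENDORS`, and `get_labeler` are hypothetical names, and the adapters return canned, made-up responses instead of calling a real service. The response shapes are invented for illustration and are not meant to reproduce any vendor’s actual format.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Label:
    """Vendor-neutral result: what was detected and how confident the service was (0.0-1.0)."""
    name: str
    confidence: float


class ImageLabeler(Protocol):
    """Hypothetical interface that the rest of the application codes against."""

    def label(self, image_bytes: bytes) -> list[Label]: ...


class FakeVendorA:
    """Stand-in adapter; a real one would call a vendor SDK and map its response."""

    def label(self, image_bytes: bytes) -> list[Label]:
        # Made-up response that reports confidence as a percentage (0-100).
        raw = [{"Name": "Dog", "Confidence": 96.0}]
        return [Label(r["Name"].lower(), r["Confidence"] / 100) for r in raw]


class FakeVendorB:
    """Second stand-in adapter with a different (also made-up) response shape."""

    def label(self, image_bytes: bytes) -> list[Label]:
        # Made-up response that already reports a 0.0-1.0 score.
        raw = [{"description": "dog", "score": 0.91}]
        return [Label(r["description"].lower(), r["score"]) for r in raw]


# Registry keyed by a configuration value, so changing vendors is a config change.
VENDORS: dict[str, ImageLabeler] = {"a": FakeVendorA(), "b": FakeVendorB()}


def get_labeler(name: str) -> ImageLabeler:
    return VENDORS[name]
```

Because every adapter normalizes labels and confidence to the same shape, adding a second vendor or replacing one touches only the adapter and the registry, not the code that consumes the results.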

About the Research 

As a growing number of our clients express interest in this technology, and as many vendors happily (and glowingly) rate their own software, we wanted to provide an unbiased look at how these solutions actually perform. We evaluated a range of recognition activities of potential relevance to clients, including:

  • General item recognition
  • Reading text
  • Identifying the presence of faces
  • Identifying whose face it is
  • Mood determination
  • Brand identification
  • Classifying an item
  • Identifying adult content
We submitted approximately 5,700 images to each platform, including images that had been subjected to various types of filtering and distortion. We did this because, in the real world, images are often blurred, upside-down, skewed, and otherwise distorted.
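To illustrate what “upside-down” and “blurred” mean at the pixel level, here is a minimal sketch that treats a grayscale image as a Python list of rows. It is not the tooling used in the study; a real test harness would more likely use an imaging library, and `upside_down` and `box_blur` are hypothetical helpers.

```python
# A grayscale image as rows of 0-255 pixel values (a stand-in for real image data).
Image = list[list[int]]


def upside_down(img: Image) -> Image:
    """Rotate 180 degrees by reversing the row order and each row."""
    return [row[::-1] for row in img[::-1]]


def box_blur(img: Image) -> Image:
    """3x3 mean blur; edge pixels average only the neighbors that exist."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(window) // len(window))  # integer mean of the window
        out.append(row)
    return out
```

For example, `upside_down([[0, 0, 255], [0, 0, 0]])` moves the bright pixel from the top-right corner to the bottom-left, returning `[[0, 0, 0], [255, 0, 0]]`.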

In assessing the results, we considered not only whether identification of an image was correct, but also how confident the platform was in its answer. An app that calls one of these services cannot check an answer independently; the only signal it receives about an answer’s reliability is the platform’s reported confidence. That makes confidence critical, but confidence is not the same as correctness: a platform may be highly confident in an answer that is wrong, and, conversely, it may report low confidence in an answer that is in fact correct.
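The distinction between confidence and correctness can be made concrete by sorting each answer into one of four buckets. This is a sketch, not the study’s scoring method: the `Answer` record, the `bucket` and `summarize` helpers, and the 0.8 threshold are all assumptions for illustration, and confidence is assumed to be normalized to a 0.0-1.0 scale.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Answer(NamedTuple):
    label: str         # what the platform said
    confidence: float  # platform-reported confidence, assumed 0.0-1.0
    truth: str         # what the image actually shows (known only in testing)


def bucket(answer: Answer, threshold: float = 0.8) -> str:
    confident = answer.confidence >= threshold
    correct = answer.label == answer.truth
    if confident and correct:
        return "confident_correct"
    if confident:
        return "confident_wrong"  # the dangerous case: an app would trust it
    if correct:
        return "unsure_correct"   # a right answer an app would likely discard
    return "unsure_wrong"


def summarize(answers: Iterable[Answer], threshold: float = 0.8) -> Counter:
    """Count how many answers land in each bucket."""
    return Counter(bucket(a, threshold) for a in answers)
```

In testing, the `confident_wrong` bucket deserves the most attention, because those are the answers an application would act on without question.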

We also gauged how much “noise” was in the responses. Noise is incorrect information included in an answer; for example, a service may correctly identify a dog in an image but also report a flower that isn’t there.
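One simple way to put a number on noise, assuming labels can be compared as plain strings, is the share of returned labels that do not appear in the image. The `noise_ratio` helper below is hypothetical, and the full study’s scoring may differ.

```python
from typing import Iterable


def noise_ratio(returned: Iterable[str], actual: Iterable[str]) -> float:
    """Fraction of returned labels that are not actually in the image (0.0 = no noise)."""
    returned_set = {r.lower() for r in returned}  # compare case-insensitively
    actual_set = {a.lower() for a in actual}
    if not returned_set:
        return 0.0  # an empty answer contains no incorrect information
    noise = returned_set - actual_set
    return len(noise) / len(returned_set)
```

With the dog-and-flower example, `noise_ratio(["Dog", "Flower"], ["dog"])` is 0.5: one of the two labels returned is noise.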

We strived to be unbiased in the tests and to present realistic use cases – as if our clients had implemented these systems and as if real people were using them.

Among Our Findings

The results led us to the conclusion that, for many functions, image recognition services are little more than an interesting novelty. 

As noted earlier, the only service that correctly identified the upside-down image of Brad Pitt, whom People magazine has twice named the “Sexiest Man Alive,” was CloudSight, which appears to be a mechanical Turk.

Figure 1 - Brad Pitt
Figure 2 - Somehow, this is not Brad Pitt, according to the systems tested

The various services also demonstrated lackluster capabilities in identifying popular brands. All platforms successfully identified a Coke can, but they were less successful with a Pepsi can or a Levi’s jeans label.

In addition, all of the platforms were poor at reading handwriting and other text; the best performer in this category was Google Cloud Vision.

For some functions, however, the utility and accuracy of some services are truly impressive. 

One of the best examples of utility involves detection of adult content. This matters because many organizations allow users to share photos: users should be prevented from submitting adult content, but they shouldn’t be prevented from submitting images of classical paintings and statues. In our evaluations, Clarifai correctly identified 100% of the adult content cases, and it also correctly handled 100% of the “fake” adult content (i.e., classical works of art).

Another area in which image recognition services demonstrate utility involves mood analysis. We found that Microsoft Azure was particularly good at analyzing moods – if the images were of high quality. As a rule, the four services that supported mood recognition struggled with moods other than angry, sad, and surprised.

An additional finding is that the image recognition services that vendors offer directly to consumers appear to be more capable than the services they offer to businesses. The product identification features of Amazon’s shopping app, for example, far outstrip those of Amazon Rekognition.

Key Takeaways

When vendors demonstrate these services, they typically use images that their systems are thoroughly trained to identify. Getting an unbiased assessment of a vendor’s actual capabilities is key. Other takeaways:

  • If you implement image recognition services, plan on using multiple services to gain the right mix of functionality for your specific needs.
  • Expect to change services as they mature, so avoid tightly binding yourself to any one vendor.
  • Be prepared for fuzzy and noisy responses. Handling them well will take nontrivial engineering, and you will need to decide what levels of confidence and noise you consider acceptable.
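The last two takeaways can be sketched as an acceptance policy that an application applies to every response before acting on it. In this Python sketch, the `Policy` fields, the per-function values in `POLICIES`, and the choice to return `None` so that low-confidence images go to a person for review are all assumptions; appropriate thresholds have to come from testing against your own images.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    min_confidence: float  # labels scoring below this (0.0-1.0) are discarded
    max_labels: int        # keep at most this many labels, to limit noise


# Hypothetical per-function settings; tune each one against your own test images.
POLICIES = {
    "adult_content": Policy(min_confidence=0.9, max_labels=1),
    "general_items": Policy(min_confidence=0.7, max_labels=5),
}


def accept(function: str, labels: list[tuple[str, float]]) -> Optional[list[str]]:
    """Return the labels worth acting on, or None to route the image to human review."""
    policy = POLICIES[function]
    # Drop low-confidence labels, then order the survivors most confident first.
    kept = sorted(
        (label for label in labels if label[1] >= policy.min_confidence),
        key=lambda label: label[1],
        reverse=True,
    )
    if not kept:
        return None
    return [name for name, _ in kept[: policy.max_labels]]
```

For example, `accept("general_items", [("dog", 0.95), ("flower", 0.3), ("grass", 0.8)])` keeps `["dog", "grass"]` and drops the low-confidence flower. Keeping thresholds per function reflects the finding that each service’s reliability varies by task.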
The full research study describes our processes and findings for this vendor analysis of image recognition service providers, including details on vendor performance across the range of functions we tested.

About the Author

Jack Cox
Jack Cox has over a decade of experience helping Fortune 500 clients build mobile strategy through technology, security and cryptography. He is a software developer, systems architect, and a Fellow at CapTech where he is responsible for the firm’s mobile software practice. Jack’s love of software development and all things mobile has driven a career developing software for businesses of all sizes including large-scale transaction processing systems, embedded software, and smartphone software. Jack co-authored the book ‘Professional iOS Network Programming’ (Wiley). He has been involved in several startups, holds multiple patents and frequently speaks nationally. Jack is based in CapTech’s Richmond, Virginia office and helps clients both locally and across the US.