CapTech recently evaluated six of the most widely used image recognition platforms: Microsoft Azure, IBM Watson, Amazon Rekognition, Google Cloud Vision, Clarifai, and CloudSight.
We presented each service with the same set of roughly 4,800 images, many of which we distorted to recreate real-world conditions: blurring them, overexposing or underexposing them, or positioning them at odd angles. We then evaluated each service on nine distinct functions (facial recognition, for example) and on its confidence in its answers.
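The evaluation looks at two things for each service: whether its answers are correct and how confident it claims to be. As a minimal sketch of how those could be scored together, the code below computes an error rate and the average confidence a service reports on its wrong answers, where a high value signals misplaced confidence. The `Prediction` record, the function names, and the sample data are hypothetical and purely illustrative; this is not CapTech's actual scoring method, and the numbers are not evaluation results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Prediction:
    """One service answer for one test image (hypothetical record shape)."""
    expected: str      # ground-truth label for the image
    predicted: str     # label the service returned
    confidence: float  # service-reported confidence, 0.0 to 1.0


def error_rate(preds: list[Prediction]) -> float:
    """Fraction of test cases the service got wrong."""
    wrong = sum(1 for p in preds if p.predicted != p.expected)
    return wrong / len(preds)


def mean_confidence_when_wrong(preds: list[Prediction]) -> float:
    """Average confidence on incorrect answers; high values signal misplaced confidence."""
    wrong = [p.confidence for p in preds if p.predicted != p.expected]
    return mean(wrong) if wrong else 0.0


# Illustrative data only, not results from the evaluation.
sample = [
    Prediction("dog", "dog", 0.97),
    Prediction("flower", "tree", 0.91),
    Prediction("bottle", "bottle", 0.64),
    Prediction("logo", "text", 0.88),
]
print(error_rate(sample))  # 0.5
print(mean_confidence_when_wrong(sample))
```

A service that is often wrong but reports low confidence on those answers is easier to work with than one that is equally wrong but highly confident, because low-confidence answers can be routed to a person for review.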
This post focuses on the top-performing vendors for each function tested. We found that no single service has a clear lead across all tested functions, so your choice of vendor, or vendors, should hinge on your specific needs. If adult content detection is important, for example, Clarifai might be your best bet. But if you're interested in facial recognition, you might want to use Amazon Rekognition or Azure. Some organizations will need to use multiple vendors.
Our latest white paper, “Image Recognition Services: Searching for Value Amid Hype,” provides full details.
Functions Tested, Vendors Evaluated
For each function tested, here is a brief definition of the function along with some observations about top performers. Not all vendors offer all functions.
Branded product identification: identifying the presence of branded items in an image. None of the services that offered this function performed well in our tests. All of them provided many incorrect answers, and all except CloudSight expressed misplaced confidence in those answers. CloudSight, the most accurate of the services, was still incorrect in 60% of test cases. CloudSight also suffers from slow response times; it appears to be a "mechanical turk," with people behind the curtain doing the actual recognition. Even so, it is comparatively good at branded product identification.
Custom item recognition: identifying the presence or absence of items unique to a company or other organization. Only two vendors, Clarifai and IBM Watson, provided this function. Both generally performed better when items were visually distinctive. IBM Watson did a better job of distinguishing items with similar packaging, such as clear bottles of alcohol with only minor label differences. Clarifai was better at identifying items that were dramatically different from one another.
Item categorization: identifying classes of items, such as dogs or flowers, in an image. All six vendors offered this function, and average performance on it was the strongest of all functions tested. Azure and Clarifai delivered the weakest results in this category; results for the other four vendors were similar to one another.
Logo recognition: identifying the presence of a brand logo in an image. Only two of the services provided this function: CloudSight and Google Cloud Vision. Given CloudSight’s costs and slow response times, Google Cloud Vision appears to be the more viable offering.
Face detection: reporting the position of faces in an image. Amazon Rekognition handily beat the other evaluated services. The one notable exception was IBM Watson, which detected a camouflaged face that no other service could find. However, Watson's response times were significantly slower when an image contained multiple faces.
Facial recognition: identifying people whose faces appear in an image. Only Amazon Rekognition and Azure provided true facial recognition. They performed equally well. With Watson and Clarifai, we had to use a combination of face detection and custom item recognition, which is time-consuming.
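For services without built-in facial recognition, the workaround described above chains two steps: detect the faces, then run each face region through a custom recognition model trained on the people you want to identify. The sketch below shows that chaining under stated assumptions: `detect_faces` and `classify_crop` are hypothetical stand-ins for vendor calls (they are not real Watson or Clarifai APIs), the `Box` shape and the 0.8 threshold are illustrative choices, and the stubbed vendor functions exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Box:
    """Face bounding box in pixel coordinates (hypothetical shape)."""
    left: int
    top: int
    width: int
    height: int


# Hypothetical signatures standing in for vendor SDK calls.
FaceDetector = Callable[[bytes], list[Box]]
CropClassifier = Callable[[bytes, Box], tuple[str, float]]


def recognize_people(
    image: bytes,
    detect_faces: FaceDetector,
    classify_crop: CropClassifier,
    min_confidence: float = 0.8,
) -> list[Optional[str]]:
    """Return one identity per detected face, or None when the custom model is unsure."""
    identities: list[Optional[str]] = []
    for box in detect_faces(image):  # step 1: face detection
        # Step 2: custom item recognition applied to the face region.
        name, score = classify_crop(image, box)
        identities.append(name if score >= min_confidence else None)
    return identities


# Stubbed vendor calls for illustration only.
def fake_detect(image: bytes) -> list[Box]:
    return [Box(0, 0, 40, 40), Box(60, 0, 40, 40)]


def fake_classify(image: bytes, box: Box) -> tuple[str, float]:
    return ("alice", 0.93) if box.left == 0 else ("bob", 0.41)


print(recognize_people(b"raw-image-bytes", fake_detect, fake_classify))  # ['alice', None]
```

Each image now costs at least two service calls (one detection call plus one classification call per face), and a custom model must be trained for every person of interest, which is part of why this approach is more time-consuming than a native facial recognition function.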
Mood analysis: determining the mood of a person in an image. Only Azure and Google supported this function directly, and of the two, Azure provided the stronger results. We also attempted to get Amazon and Clarifai to detect moods through their general classification features (Clarifai offers a domain for "general" recognition, and Amazon provides a similar feature). Clarifai returned reasonably good results.
Text recognition: reading text in an image, whether handwritten or typed, and converting it to machine-readable text. All four providers tested for this function delivered mediocre performance. Google Cloud Vision performed best on a short piece of typed text; however, the 75% to 80% accuracy we observed isn't reliable enough to use without a person who can validate the results.
Adult content detection: determining if an image contains sexually graphic content. Clarifai was 100% accurate in identifying objectionable material. Clarifai also was 100% accurate in categorizing classical works of art and other non-offensive content as non-objectionable.
Because no single vendor excels in all functions tested, we recommend that organizations looking to adopt image recognition be prepared to use multiple vendors. In addition, because this industry is changing rapidly, we recommend architecting the integration between existing systems and image recognition services for maximum flexibility. That flexibility lets organizations switch vendors as needed and keep pace with the image recognition landscape.
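One common way to get that flexibility is to put every vendor behind a shared interface and route each function to whichever vendor currently performs best for it, so that switching vendors becomes a configuration change rather than a rewrite. The sketch below illustrates the pattern; the `ImageRecognizer` protocol, the placeholder adapter classes, the `Router`, and the function names in the routing table are all hypothetical, and the adapters return canned labels where real ones would wrap each vendor's SDK.

```python
from typing import Protocol


class ImageRecognizer(Protocol):
    """Common interface every vendor adapter implements (hypothetical)."""

    def labels(self, image: bytes) -> list[str]: ...


class VendorAAdapter:
    """Placeholder adapter; a real one would call vendor A's SDK."""

    def labels(self, image: bytes) -> list[str]:
        return ["vendor-a-label"]


class VendorBAdapter:
    """Placeholder adapter; a real one would call vendor B's SDK."""

    def labels(self, image: bytes) -> list[str]:
        return ["vendor-b-label"]


class Router:
    """Dispatches each recognition function to the vendor configured for it."""

    def __init__(self, routes: dict[str, ImageRecognizer]) -> None:
        self.routes = routes

    def run(self, function: str, image: bytes) -> list[str]:
        try:
            recognizer = self.routes[function]
        except KeyError:
            raise ValueError(f"no vendor configured for {function!r}") from None
        return recognizer.labels(image)


# Switching the vendor for one function is a one-line configuration change.
router = Router({
    "adult_content_detection": VendorAAdapter(),
    "face_detection": VendorBAdapter(),
})
print(router.run("adult_content_detection", b"raw-image-bytes"))  # ['vendor-a-label']
```

Because calling code depends only on the shared interface, a vendor that falls behind on one function can be replaced for that function alone, and the remaining functions keep their current vendors.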
For more details about vendor strengths and weaknesses and about the functions tested, please request the white paper, “Image Recognition Services: Searching for Value Amid Hype.”