At Google I/O this year, Google announced a new Jetpack support library, named CameraX, which introduces a consistent, simplified API for handling common camera functionality across the majority of Android devices. The Camera and Camera2 (Lollipop and newer) APIs have long been a major pain point for Android developers, who often have to hack together device- or manufacturer-specific fixes to get correct camera behavior on all (or as many as feasible) devices.

CameraX works on all Lollipop and newer Android devices (using Camera2 behind the scenes) and comes with three predefined use cases aimed at covering the standard functionality an app interacting with the camera would expect. These are:

  • Preview: display a preview of what the camera sees
  • Image analysis: access the image buffer to run any analysis (e.g., text recognition with ML Kit)
  • Image capture: capture high-quality images

Each use case follows a similar setup: you define a config for the use case, create the use case from that config, and then bind the use case to an Activity or Fragment Lifecycle to make it lifecycle-aware (so the camera is closed automatically when appropriate). To demo each use case, we will build a sample app that takes user input to define a phrase to search for, displays the camera feed with the Preview use case, searches the feed for the phrase with the ImageAnalysis use case and ML Kit Text Recognition, and finally captures the image with the ImageCapture use case once the phrase has been detected. Code snippets in this article are trimmed for brevity, but the full code can be found on GitHub.
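One prerequisite before any of the use cases below will receive frames: the app must hold the CAMERA runtime permission. The sample's permission handling is not shown in this article; a minimal sketch, assuming the code lives in a Fragment and that startCamera() is a hypothetical helper that builds and binds the use cases:

```kotlin
// Hypothetical permission gate -- not part of the snippets shown below.
private val requiredPermissions = arrayOf(Manifest.permission.CAMERA)
private val requestCodePermissions = 10

private fun allPermissionsGranted() = requiredPermissions.all {
    ContextCompat.checkSelfPermission(requireContext(), it) ==
        PackageManager.PERMISSION_GRANTED
}

// in onViewCreated (or similar):
// if (allPermissionsGranted()) {
//     startCamera() // hypothetical helper that binds the use cases
// } else {
//     requestPermissions(requiredPermissions, requestCodePermissions)
// }
```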

Preview

We start with the Preview use case: displaying the camera feed to the user inside a TextureView. First, we define a PreviewConfig specifying which camera we want to use (front or back), our target resolution, and our target aspect ratio, and then use that config to create our Preview use case.

// finder: TextureView -- our TextureView for displaying the camera feed
// pull the display metrics from our TextureView
val metrics = DisplayMetrics().also { finder.display.getRealMetrics(it) }
// define the screen size and aspect ratio
val screenSize = Size(metrics.widthPixels, metrics.heightPixels)
val screenAspectRatio = Rational(metrics.widthPixels, metrics.heightPixels)

val previewConfig = PreviewConfig.Builder()
    .setLensFacing(CameraX.LensFacing.BACK) // the default is the back camera
    .setTargetAspectRatio(screenAspectRatio)
    .setTargetResolution(screenSize)
    .build()

val preview = Preview(previewConfig)

preview.setOnPreviewOutputUpdateListener { previewOutput: Preview.PreviewOutput ->
    // re-attach the TextureView so it picks up the new SurfaceTexture
    val parent = finder.parent as ViewGroup
    parent.removeView(finder)
    parent.addView(finder, 0)
    finder.surfaceTexture = previewOutput.surfaceTexture
}

// 'this' can be your Activity or your Fragment (any LifecycleOwner)
CameraX.bindToLifecycle(this, preview)
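Depending on how the device is held, the preview can appear rotated inside the TextureView. The official CameraX codelab compensates for display rotation with a Matrix transform; a sketch along those lines (assuming finder is our TextureView, and called whenever the preview output updates):

```kotlin
// Compensate for display rotation -- adapted from the pattern used in the
// official CameraX codelab; names here are illustrative.
private fun updateTransform() {
    val matrix = Matrix()

    // rotate around the center of the TextureView
    val centerX = finder.width / 2f
    val centerY = finder.height / 2f

    val rotationDegrees = when (finder.display.rotation) {
        Surface.ROTATION_0 -> 0
        Surface.ROTATION_90 -> 90
        Surface.ROTATION_180 -> 180
        Surface.ROTATION_270 -> 270
        else -> return
    }
    matrix.postRotate(-rotationDegrees.toFloat(), centerX, centerY)

    finder.setTransform(matrix)
}
```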

Image Analysis

Now that we have a working Preview of the camera input, we can analyze the image buffer to search for a word or phrase. To accomplish this, we will use the ImageAnalysis use case together with ML Kit's Text Recognition functionality. The setup is very similar to the Preview use case, and the Image we receive from ImageAnalysis converts easily to a FirebaseVisionImage, so we can run text recognition on device (no need for the cloud!).

// set up the config
val analyzerConfig = ImageAnalysisConfig.Builder().apply {
    setLensFacing(CameraX.LensFacing.BACK)
    // run the analysis on a background thread so we do not
    // interrupt the preview
    val analyzerThread = HandlerThread("OCR").apply { start() }
    setCallbackHandler(Handler(analyzerThread.looper))
    // we only care about the latest image in the buffer;
    // we do not need to analyze every frame
    setImageReaderMode(ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE)
    setTargetResolution(Size(1280, 720))
}.build()

val imageAnalysis = ImageAnalysis(analyzerConfig)
// set the analyzer
imageAnalysis.analyzer = ImageAnalysis.Analyzer { image: ImageProxy, rotationDegrees: Int ->
    // no image -- just return
    val mediaImage = image.image ?: return@Analyzer

    // convert our CameraX image into a FirebaseVisionImage;
    // getOrientationFromRotation converts degrees to
    // FirebaseVisionImageMetadata.ROTATION_ values
    val visionImage = FirebaseVisionImage.fromMediaImage(
        mediaImage,
        getOrientationFromRotation(rotationDegrees)
    )

    val detector = FirebaseVision.getInstance()
        .onDeviceTextRecognizer

    detector.processImage(visionImage)
        .addOnSuccessListener { result: FirebaseVisionText ->
            // success, check for our word or phrase
        }
}
// bind it to the lifecycle alongside our preview use case
CameraX.bindToLifecycle(this, preview, imageAnalysis)
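The getOrientationFromRotation helper referenced above is not shown in the snippet; a minimal sketch of what it could look like. The ROTATION_ values here are written out as plain constants (matching the values FirebaseVisionImageMetadata documents, 0 through 3) so the mapping is visible:

```kotlin
// Sketch of the getOrientationFromRotation helper referenced above.
// FirebaseVisionImageMetadata.ROTATION_0..ROTATION_270 carry the values
// 0..3; they are inlined here so the block is self-contained.
val ROTATION_0 = 0
val ROTATION_90 = 1
val ROTATION_180 = 2
val ROTATION_270 = 3

fun getOrientationFromRotation(rotationDegrees: Int): Int =
    when (rotationDegrees) {
        0 -> ROTATION_0
        90 -> ROTATION_90
        180 -> ROTATION_180
        270 -> ROTATION_270
        else -> throw IllegalArgumentException(
            "Unexpected rotation: $rotationDegrees"
        )
    }
```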

Image Capture

The final use case is ImageCapture, which we will use to capture the current image once our analysis has detected the word or phrase we are searching for. Once again, the setup is very similar to our other use cases: we create a config for our ImageCapture, as well as a listener to handle the successfully captured image. We also modify the OnSuccessListener of our image processing to trigger the capture when the phrase has been detected.

val captureConfig = ImageCaptureConfig.Builder()
    .setLensFacing(CameraX.LensFacing.BACK)
    .setCaptureMode(ImageCapture.CaptureMode.MIN_LATENCY)
    .setTargetRotation(finder.display.rotation)
    .setTargetAspectRatio(screenAspectRatio)
    .build()

val imageCapture = ImageCapture(captureConfig)

// listener to handle the saved image (or any errors)
val imageCaptureListener = object : ImageCapture.OnImageSavedListener {
    override fun onError(error: ImageCapture.UseCaseError, message: String, exc: Throwable?) {
        // handle any errors
    }

    override fun onImageSaved(photoFile: File) {
        // handle our saved image
    }
}


// update our image detector's onSuccess callback to trigger the capture
/* ... */
detector.processImage(visionImage)
    .addOnSuccessListener { result: FirebaseVisionText ->
        // success, check for our word or phrase
        if (result.text.contains(ourPhrase)) {
            // trigger image capture
            val outputDirectory: File = requireContext().filesDir
            val photoFile = File(outputDirectory, "${System.currentTimeMillis()}.jpg")
            imageCapture
                .takePicture(photoFile, imageCaptureListener, ImageCapture.Metadata())
        }
    }
/* ... */
// add imageCapture to the lifecycle
CameraX.bindToLifecycle(this, preview, imageAnalysis, imageCapture)
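One note on the detection check: result.text.contains(ourPhrase) is case-sensitive, and OCR output often differs from the typed phrase in case, punctuation, and spacing. A small, hypothetical helper (not in the sample) that normalizes both sides before comparing:

```kotlin
// Hypothetical phrase matcher -- normalizes case, drops punctuation, and
// collapses whitespace so "Hello,  World!" matches the phrase "hello world".
fun containsPhrase(recognizedText: String, phrase: String): Boolean {
    fun normalize(s: String) = s
        .toLowerCase()
        .replace(Regex("[^a-z0-9 ]"), " ") // drop punctuation
        .replace(Regex("\\s+"), " ")       // collapse whitespace
        .trim()
    return normalize(recognizedText).contains(normalize(phrase))
}
```

The success listener could then call containsPhrase(result.text, ourPhrase) in place of the plain contains check.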

Result

All of the use cases work seamlessly together, and we get all of this functionality in a fraction of the lines of code the old APIs would have required. Check out the GitHub code and try it yourself! Testing was done on a Pixel XL and a Samsung Galaxy S8.

Extensions

In addition to the three use cases, there are also vendor-specific extensions that allow for handling things like HDR or Portrait mode on supported devices. A list of devices that support extensions, and which extensions they support, can be seen here. Note: the extensions library is not yet publicly available through Google's Maven repository, but you can see the code here.

Conclusion

The CameraX Jetpack support library is an extremely welcome addition to what has been one of the messier and more painful parts of the Android framework to work with. While it is still in alpha and has a few kinks to iron out before it reaches a stable release, it is already clear that this library will be of major value to any app that uses the Camera or Camera2 APIs. The analysis use case in particular makes it easy to hook into a number of image analysis libraries and machine learning models, which is exciting!

Authors

Alex Townsend is a Manager at CapTech, based in Washington, DC. Alex has a passion for Android application development and architecture, focusing on bringing the latest and greatest technologies and patterns to his work.

Jack Hughes is an Android Developer in CapTech’s Systems Integration practice and is based out of their Richmond, VA office. He has a passion for enterprise-level Android architecture and integrating microservices with mobile.