Live Text is an AI-based feature that uses deep neural networks to convert images into machine-encoded text, recognizing any text in a photo, screenshot, or live camera preview. It relies on Apple’s Vision framework, whose machine learning models run on the device’s Neural Engine. This enables “on-device intelligence,” which provides an additional layer of privacy on top of Apple’s existing security measures. Instead of batch processing information in the cloud like most AI products, processing is done on the device in real time, protecting the user’s privacy and keeping their information secure.
Because the Neural Engine is required, Live Text is only available on Apple devices made in 2018 or later running iOS 15 or above (e.g., iPhone XS, iPhone XR, and newer). The following settings enable Live Text on any eligible device:
Settings -> Camera -> Show Detected Text
Settings -> General -> Language & Region -> Live Text
After toggling both of these settings on, a user will be able to capture text from the camera, screenshots, or images stored on the device.
A Practical Application of the Scan Text Feature
My introduction to the Live Text feature came while developing a mobile application that calculates and pools server staff tips for a local restaurant. As a former restaurant employee whose wages relied primarily on tips, the process of tip pooling (i.e., combining and equally dividing the staff’s collective tips) was one that required meticulous attention even after the most chaotic of shifts. Arithmetic mistakes were frequent, often resulting in incorrect paychecks and recalculations. To combat this, I partnered with my former colleagues to create a mobile app that would allow servers to calculate tips programmatically simply by scanning their end-of-night report with their phones, and letting Live Text do the rest.
Calculating server tip pool
To build a programmatic tip calculator, I started by identifying the key factors that determine tips. These factors serve as the basis of what will eventually become the key state variables stored in the application.
From there, I identified the formulas used to calculate end-of-night tips.
Hosts, hostesses, and food runners are referred to as support staff. Support staff are paid out a portion of the server tip pool, determined by a percentage of the servers’ combined net sales for each support staff member. The total tip-out looks something like: combined net sales × tip-out percentage × number of support staff.
After calculating what the support staff is owed, each server’s share of the tip pool can be calculated by subtracting the support staff’s portion from the total tips and dividing by the total number of servers: server share = (total tips − support staff tip-out) ÷ number of servers.
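The two formulas above can be sketched in Swift; the variable names and sample figures below are hypothetical, chosen only to illustrate the arithmetic:

```swift
// Sketch of the two tip formulas with illustrative sample numbers.
let totalNetSales = 4_000.00   // servers' combined net sales
let totalTips     = 600.00     // servers' combined tips
let tipOutRate    = 0.015      // 1.5% tip-out per support staff member
let numSupport    = 2.0
let numServers    = 4.0

// Total owed to support staff: a percentage of net sales per support staffer.
let supportTipOut = totalNetSales * tipOutRate * numSupport

// Each server's share: the remaining tips split evenly among the servers.
let serverShare = (totalTips - supportTipOut) / numServers
```

With these sample numbers, the support staff is owed $120 in total, and each of the four servers takes home $120 from the remaining pool.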
To use the identified variables in their respective formulas, I needed a way to collect the information from the user.
Counting Support Staff and Servers
I first created a view with two steppers: one for total servers and one for total support staff.
Both values are stored as state variables, so the application can track changes as the user increases and decreases the server and support staff count. The total server count is used to generate the appropriate amount of server forms.
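A minimal sketch of that view, assuming hypothetical names like StaffCountView, numberOfServers, and numberOfSupport:

```swift
import SwiftUI

// Sketch of the counting screen: two steppers backed by state variables.
struct StaffCountView: View {
    @State private var numberOfServers = 1   // drives how many server forms are generated
    @State private var numberOfSupport = 0

    var body: some View {
        Form {
            Stepper("Servers: \(numberOfServers)", value: $numberOfServers, in: 1...20)
            Stepper("Support staff: \(numberOfSupport)", value: $numberOfSupport, in: 0...20)
        }
    }
}
```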
For support staff tip percentage, I assumed the tip out was 1.5% of total net sales. This is standard for most restaurants and I did not give the user an option to change it.
Storing Server Information
To store server information, I created a form with fields for each relevant variable:
house owes server
server owes house
any additional cash tips
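The form might look like the following sketch; the field names, and the inclusion of a net sales field (which the later calculations need), are assumptions:

```swift
import SwiftUI

// Sketch of one server's entry form. Each field maps to a variable in the
// tip formulas; the currency strings are parsed later.
struct ServerFormView: View {
    @State private var netSales = ""
    @State private var houseOwes = ""    // house owes server
    @State private var serverOwes = ""   // server owes house
    @State private var cashTips = ""     // any additional cash tips

    var body: some View {
        Form {
            TextField("Net sales", text: $netSales)
            TextField("House owes server", text: $houseOwes)
            TextField("Server owes house", text: $serverOwes)
            TextField("Cash tips", text: $cashTips)
        }
        .keyboardType(.decimalPad)
    }
}
```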
Because Live Text is an out-of-the-box feature, no code is required to use it in a text field. A simple double tap on the text field reveals the option to scan text, as long as the simulator or device is running iOS 15 or later. However, this relies on the user already knowing the feature is available, and it must be manually configured to filter for custom content types.
Calculating Individual Tips
As the user enters information in each text field, their data is stored in an instance of a Server struct.
Once the user enters their information, the individual server’s total tips are calculated before being appended to a server array that is initialized in a separate Model class. Calculating each individual server’s contribution to the tip pool before calculating support staff tips helps reduce overall processing time.
After completing the serverTotal calculation, the server object is appended to the server array.
A check is then performed to see if the count of the server array is equal to numberOfServers collected from the previous screen, indicating whether or not the user has filled out information for all servers working that evening. If not, the current information is cleared and the user repeats the process.
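The model logic above can be sketched as follows. The names Server, serverTotal, and numberOfServers follow the article; the serverTotal formula and the Model details are assumptions:

```swift
import SwiftUI

// Sketch of the data model for one server's night.
struct Server {
    var netSales: Double
    var houseOwes: Double    // house owes server
    var serverOwes: Double   // server owes house
    var cashTips: Double

    // Assumed formula for the server's contribution to the tip pool.
    var serverTotal: Double { houseOwes - serverOwes + cashTips }
}

// Sketch of the separate Model class holding the server array.
final class Model: ObservableObject {
    @Published var servers: [Server] = []
    var numberOfServers = 4   // collected on the previous screen

    // Append a completed form; returns true once every server is entered,
    // so the caller knows whether to clear the form and repeat the process.
    func add(_ server: Server) -> Bool {
        servers.append(server)
        return servers.count == numberOfServers
    }
}
```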
Calculating Final Tip Pool
Once the application has received each server’s information, it loops through each server to grab their total net sales and total tips, adding and storing them in totalNetSales and totalTips respectively. From there, the total amount owed to support staff (sTips) is calculated by taking 1.5% of totalNetSales and multiplying by the number of support staff (numSupport); dividing sTips by numSupport then gives each support staff member’s individual share. To calculate each server’s tip amount, the value of sTips is subtracted from totalTips and the result is divided by the number of servers on the floor.
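As a standalone sketch of that calculation (ServerEntry is a hypothetical stand-in for the Server struct, carrying each server's net sales and precomputed serverTotal):

```swift
// Sketch of the final tip pool calculation; the 1.5% rate follows the article.
struct ServerEntry {
    var netSales: Double
    var serverTotal: Double   // this server's individual tips, computed earlier
}

func calculatePool(servers: [ServerEntry],
                   numSupport: Int,
                   tipOutRate: Double = 0.015) -> (sTips: Double, perServer: Double) {
    // Loop over the servers, accumulating net sales and tips.
    let totalNetSales = servers.reduce(0) { $0 + $1.netSales }
    let totalTips = servers.reduce(0) { $0 + $1.serverTotal }

    // Total owed to support staff: 1.5% of net sales per support staffer.
    let sTips = totalNetSales * tipOutRate * Double(numSupport)

    // Remaining tips split evenly among the servers on the floor.
    let perServer = (totalTips - sTips) / Double(servers.count)
    return (sTips, perServer)
}
```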
After server and support staff tip allocations are calculated, they are displayed on a sheet that visually represents the final information to the user.
Implementing the Live Text Feature
Once the basic tip pool logic was complete, I turned my focus to adding a more sophisticated Live Text implementation, allowing users to scan server information. As previously mentioned, the Live Text feature is available out-of-the-box for any user with a device that has a Neural Engine, iOS 15, and the proper system settings. I wanted to make it clear to the user that scanning text is the preferred method, so I implemented a “Scan Text” button.
Adding a Scan Text Button
The Live Text button is powered by the new UIAction.captureTextFromCamera(responder:identifier:) method. The responder passed to this UIAction must be a UIResponder that conforms to UIKeyInput, meaning a UIKit view is required. Hello UIKit!
I created a struct called ScanButton that conformed to UIViewRepresentable. To conform to UIViewRepresentable, it needed makeUIView(context:) and updateUIView(_:context:) functions.
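A skeleton of that struct might look like the following sketch (the binding property and the camera action are filled in over the next steps):

```swift
import SwiftUI
import UIKit

// Skeleton of the SwiftUI wrapper around a UIKit button.
struct ScanButton: UIViewRepresentable {
    @Binding var text: String   // captured text flows back to SwiftUI here

    func makeUIView(context: Context) -> UIButton {
        let button = UIButton()
        button.setImage(UIImage(systemName: "text.viewfinder"), for: .normal)
        return button
    }

    func updateUIView(_ uiView: UIButton, context: Context) {}
}
```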
I then created a Coordinator to pass data from a UIView to a SwiftUI View. To be a subclass of UIResponder and conform to UIKeyInput, the Coordinator class needed to contain:
hasText — a Boolean property required by UIKeyInput
insertText(_:) — the function that drives the input of the text within the text box
deleteBackward() — this will be left blank since the keyboard isn’t being used to insert text
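Under those requirements, the Coordinator can be sketched like this (assuming ScanButton exposes the text binding described next):

```swift
import UIKit

// Sketch of the Coordinator: a UIResponder subclass adopting UIKeyInput so
// it can act as the responder for the camera capture action.
final class Coordinator: UIResponder, UIKeyInput {
    let parent: ScanButton
    init(_ parent: ScanButton) { self.parent = parent }

    // Required by UIKeyInput: whether the text object currently has text.
    var hasText: Bool { !parent.text.isEmpty }

    // Captured camera text lands here; hand it back to the SwiftUI view.
    func insertText(_ text: String) {
        parent.text = text
    }

    // Left empty: the keyboard isn't being used to insert text.
    func deleteBackward() {}
}
```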
To facilitate the passing of data from the Coordinator back to the SwiftUI View that calls ScanButton, ScanButton needed a binding string property to store the text captured from the camera.
I then set the text passed into insertText equal to parent.text (the binding property).
Finally, I configured the button’s action to capture text from the user’s camera. I created a UIAction called textFromCamera and set it as the button’s primary action. This triggers the camera to be launched when the user taps the ScanButton.
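Putting it together, makeUIView(context:) might wire the action like this sketch (UIButton's primaryAction initializer triggers the camera on tap):

```swift
// Inside ScanButton, the capture action becomes the button's primary action.
func makeUIView(context: Context) -> UIButton {
    // The responder is the Coordinator, which conforms to UIKeyInput.
    let textFromCamera = UIAction.captureTextFromCamera(
        responder: context.coordinator,
        identifier: nil
    )
    let button = UIButton(primaryAction: textFromCamera)
    button.setImage(UIImage(systemName: "text.viewfinder"), for: .normal)
    return button
}

func makeCoordinator() -> Coordinator { Coordinator(self) }
```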
With this implementation, the user is able to tap the Scan Text button and point their camera at their desired text. The visual representation of the button can be adjusted in the UIView by placing it in a VStack or HStack and adjusting the padding. Here is a representation of the Scan Text button in an HStack with a 10x10 frame and trailing alignment.
Filtering for Custom Content Types Using Regular Expressions
The Scan Text button helps the user navigate to the Live Text feature living within their device; however, it still requires the user to know exactly what portion of text to capture and requires them to select, copy, and paste that text. For the user experience to be effortless, a developer may want to incorporate a content type filter. Apple provides the following content type filters out-of-the-box:
To implement any of these 7 types, simply add the .contentTypeFilter on the Text Field object.
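In SwiftUI, this kind of filtering is applied through the .textContentType(_:) modifier (my assumption for the modifier the filter refers to); for example, to have the camera highlight only phone numbers:

```swift
import SwiftUI

// Sketch: setting a content type on the field so scanned-text input is
// filtered to phone numbers. The view and state names are illustrative.
struct PhoneField: View {
    @State private var phone = ""

    var body: some View {
        TextField("Phone number", text: $phone)
            .textContentType(.telephoneNumber)
            .keyboardType(.phonePad)
    }
}
```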
But what if the application needs to filter for a content type that isn’t provided out-of-the-box? In the instance of my server tip calculator, I needed to filter for a USD currency value. This is where the beauty of regular expressions comes in!
Create a TextReader() Function
When a user scans text, it is captured in a string variable. To filter for specific text already displayed as a currency value, I created a textReader() function that takes in the desired string as a parameter, checks against the regular expression, and then returns a new string with the simplified text.
In the ScanButton struct, I added a patternID parameter. This eliminated redundancy by dynamically applying different regex patterns to the same textReader() function.
The patternID specifies which text box/content type is being targeted. The text pattern that exists immediately before the desired content type is identified through a regular expression. The desired value is then extracted into a group (indicated by the parenthesis).
After the pattern variable is set to the desired regex, a stringRange and a regex object are created with that specified pattern. The server tip sheet is printed in a uniform format, so regex.firstMatch is used because there will never be more than one match. The match is then stored in an array and converted using .description to transform the match object back into a string. That string, stored in the variable convertedText, is then returned from the textReader() function.
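A sketch of the complete textReader() under these assumptions; the report labels ("Total Net Sales", "Cash Tips") and their patterns are illustrative and would need to match the real report layout:

```swift
import Foundation

// Sketch of textReader(): pick a regex by patternID, find the first match
// in the scanned text, and return only the captured currency value.
func textReader(_ scanned: String, patternID: String) -> String {
    // Map each text box to the regex whose label precedes its value.
    // The parentheses capture the desired dollar amount into group 1.
    let patterns = [
        "netSales": #"Total Net Sales:?\s*\$?([0-9]+\.[0-9]{2})"#,
        "cashTips": #"Cash Tips:?\s*\$?([0-9]+\.[0-9]{2})"#
    ]
    guard let pattern = patterns[patternID],
          let regex = try? NSRegularExpression(pattern: pattern) else {
        return scanned   // unknown pattern: fall back to the raw scan
    }

    let stringRange = NSRange(scanned.startIndex..., in: scanned)
    // The report has a uniform format, so the first match is the only match.
    guard let match = regex.firstMatch(in: scanned, range: stringRange),
          let groupRange = Range(match.range(at: 1), in: scanned) else {
        return scanned
    }
    return String(scanned[groupRange])
}
```

Inside the Coordinator's insertText(_:), the scanned string can then be run through textReader() before being assigned to parent.text.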
To finish the text filtering implementation, I called textReader() within the insertText() function and set it equal to parent.text. The user is then able to point their camera at the desired text and insert it into the text box without any extra effort to identify the content type and then copy & paste!
Advantages and Disadvantages
For a relatively new feature, iOS Live Text works with minimal errors. Embedding Live Text within a mobile application requires a low level of effort, but can significantly elevate the user experience by eliminating the need for manual text input. It can help save the user time and frustration as it reduces the chance for input error by allowing a direct scan of content. On the other hand, it tends to fall short when it comes to capturing a large body of text. In an instance where there are two columns of text, the device defaults to selecting one column or the other, not both simultaneously. Additionally, Apple currently only provides 7 out-of-the-box content filter types, so the developer will most likely need to implement their own filter using regular expressions. I’d imagine that as the feature becomes more widely used, Apple will expand the content type filter capabilities, but for now it requires additional development work.
Live Text is an incredible way to level-up your mobile application. At its most basic level, it is easy to implement, requiring a low level of effort for a seemingly complex feature. I look forward to seeing Apple’s continuous improvements with the Live Text feature, hopefully improving the feature’s ability to recognize large bodies of text, as well as providing more out-of-the-box content types.