Usage
Photographs can be used to detect and identify objects in the physical world by performing a visual search (a search query that uses an image as input). Using machine learning models, visual search results can tell users more about an item – whether it’s a species of plant or an item to purchase.
ML Kit’s Object Detection & Tracking API’s “static” mode allows you to detect up to five objects in a provided image and display matching results using your own image classification model.
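As a configuration sketch, the “static” mode described above corresponds to ML Kit’s single-image detector mode. The fragment below shows one way to set it up on Android; it assumes an Android project with the ML Kit object-detection dependency, and the `detectObjects` function name and callback comments are illustrative rather than prescribed.

```kotlin
// Configuration sketch: ML Kit object detection in single-image ("static") mode.
// Assumes the com.google.mlkit:object-detection dependency in an Android project.
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

fun detectObjects(bitmap: Bitmap) {
    val options = ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE) // static image, not a live feed
        .enableMultipleObjects() // detect up to five objects in the image
        .build()
    val detector = ObjectDetection.getClient(options)
    val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)

    detector.process(image)
        .addOnSuccessListener { detectedObjects ->
            // Each DetectedObject carries a boundingBox; pass the cropped regions
            // to your own image classification model to find matching results.
        }
        .addOnFailureListener { e ->
            // Communicate the error state and how the user can improve their search.
        }
}
```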
Principles
Design for this feature is based on the following principles:
Keep images clear and legible
Align the camera UI components to the top and bottom edges of the screen, ensuring that text and icons remain legible when placed in front of an image.
Any non-actionable elements displayed in front of the live camera feed should be translucent to minimize obstructing the camera.
Provide feedback
Using an image to search for objects introduces unique usage requirements. Objects that overlap or are cropped can be hard to identify.
Error states should be communicated with multiple design cues (such as components and motion) and include explanations of how users can improve their search.
Components
The static image object detection feature uses existing Material Design components and new elements specific to interacting with an image. For code samples and demos of new elements (such as object markers), check out the source code for the ML Kit Material Design showcase app on Android.
Top app bar
The top app bar provides persistent access to the following actions:
Object markers
ML Kit’s Object Detection & Tracking API contains an option to detect a “prominent object.”
Object markers are circular, elevated indicators placed in front of the center of a detected object. Each marker is paired with a card at the bottom of the screen, which displays a preview of each object’s results. When the card is scrolled into view, the corresponding object marker increases in size.
Tapping an object marker (or its results card) opens a modal bottom sheet displaying an object’s full visual search results.
Tooltip
Tooltips display informative text to users. For example, they can express status (such as with a message that says “Searching…”) or prompt the user toward the next step (such as a message that says, “Tap on a dot or card for results”).
Cards
Cards provide a preview of an object’s visual search results. They are arranged in a horizontally scrolling carousel, organized based on the horizontal position of each object.
Each card is paired with an object marker. When the card is scrolled into view, its related object marker increases in size. Tapping a card (or its object marker) opens a modal bottom sheet, which displays an object’s full visual search results.
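The pairing between cards and markers described above can be modeled in a few lines. This is a minimal sketch in plain Kotlin; the `Marker` type, field names, and the 1.5x scale factor are hypothetical illustrations, not values from the showcase app.

```kotlin
// Sketch of pairing result cards with object markers (names and values hypothetical).
// Cards are ordered by the horizontal position of each detected object, and the
// marker whose card is scrolled into view is drawn larger than the others.
data class Marker(val id: Int, val centerX: Float, val centerY: Float)

// Carousel order: left-to-right by each object's horizontal center.
fun carouselOrder(markers: List<Marker>): List<Marker> =
    markers.sortedBy { it.centerX }

// Scale factor for a marker: enlarge the one paired with the card in view.
fun markerScale(marker: Marker, visibleCardId: Int): Float =
    if (marker.id == visibleCardId) 1.5f else 1.0f
```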
Modal bottom sheet
Modal bottom sheets provide access to visual search results. Their layout and content depend on your app’s use case, the number of results, and result confidence.
Experience
Visual object search from an image happens in three phases:
Input
Visual search begins when a user selects an image. To increase the chances of a successful search, advise users on the types of images most suitable to search.
Recognize
When one or more objects have been detected from an image, the app should:
- Communicate that the app is awaiting results
- Display search progress
Objects detected by the ML Kit Object Detection & Tracking API are then compared against the set of known images in your image classification model to find matching results.
Even if an object is detected in a photo, matching results aren’t guaranteed. Objects therefore shouldn’t be marked as detected until valid search results are returned.
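The gate described above can be sketched as a simple filter: a detection only earns a marker once its classification lookup returns results. The types and the `resultsFor` lookup below are hypothetical stand-ins for your detection and classification pipeline.

```kotlin
// Sketch: only mark an object as detected once its visual search returns
// valid results (types and the resultsFor lookup are hypothetical).
data class Detection(val boundingBoxId: Int)
data class SearchResult(val label: String, val confidence: Float)

fun markersToShow(
    detections: List<Detection>,
    resultsFor: (Detection) -> List<SearchResult>
): List<Detection> =
    // A detection without matching results gets no marker.
    detections.filter { resultsFor(it).isNotEmpty() }
```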
Guide adjustments
The following factors can affect whether or not objects are detected and identified (this list is not exhaustive):
- Poor image quality
- Small object size within the image
- Low contrast between an object and its background
- An unrecognizable viewing angle
- A lost network connection during the search
Communicate
Results for detected objects are expressed to users by:
- Placing object markers in front of each detected object
- Showing a preview of each object’s results on a card (as part of a carousel of cards)
Your app should set a confidence threshold for displaying visual search results. “Confidence” refers to an ML model’s evaluation of how accurate a prediction is. For visual search, the confidence level of each result indicates how similar the model believes it is to the provided image.
If one or more objects in the image have search results, the app should identify those detected objects using object markers and a carousel of cards previewing each object’s results. Tapping on a marker or card opens a modal bottom sheet that shows an object’s results.
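The confidence threshold described above might be sketched as follows. The `Result` type, the 0.6 default threshold, and the descending sort are assumptions for illustration; the right threshold depends on your model and use case.

```kotlin
// Sketch: filtering visual search results by a confidence threshold
// (type names and the 0.6 threshold are illustrative, not prescribed).
data class Result(val label: String, val confidence: Float)

fun displayableResults(
    results: List<Result>,
    threshold: Float = 0.6f
): List<Result> =
    results.filter { it.confidence >= threshold }
        .sortedByDescending { it.confidence }
```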
Evaluating search results
In some cases, visual search results may not meet user expectations, such as in the following scenarios:
No results found
A search can return without matches for several reasons, including:
- The object isn’t part of, or similar to, the known set of objects
- The object was detected from an angle the visual search model doesn’t recognize
- Poor image quality made key details of the object hard to recognize
If there are no results, display a banner explaining this, and guide users to a Help section with information on how to improve their search.
Poor results
If a search returns results with only low-confidence scores, you can ask the user to search again (with tips on improving their search).
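The two outcomes above, plus the normal case, amount to a three-way decision after each search. This sketch models it in plain Kotlin; the enum name and the 0.6 threshold are hypothetical illustrations of the guidance, not part of any ML Kit API.

```kotlin
// Sketch: choosing which state to communicate after a search returns
// (enum name and threshold are hypothetical illustrations of the guidance).
enum class SearchOutcome { SHOW_RESULTS, NO_RESULTS, POOR_RESULTS }

fun evaluateSearch(confidences: List<Float>, threshold: Float = 0.6f): SearchOutcome =
    when {
        // No matches at all: show a banner and link to Help.
        confidences.isEmpty() -> SearchOutcome.NO_RESULTS
        // Only low-confidence matches: prompt the user to search again, with tips.
        confidences.none { it >= threshold } -> SearchOutcome.POOR_RESULTS
        // At least one confident match: show markers, cards, and results.
        else -> SearchOutcome.SHOW_RESULTS
    }
```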
Theming
Shrine Material theme
The Shrine app purchase flow lets users perform a visual search for products using a photo.
Object markers
Shrine’s object markers use a diamond shape to reflect Shrine’s shape style (which uses angled cuts).
To help users match result cards with possible detected objects, object markers typically increase in size when their corresponding result card is selected in the carousel. Instead of changing the object marker’s size to emphasize it, Shrine applies custom color and border styles.
Cards
Shrine’s result cards use custom colors, typography, and shape styles.