TensorFlow Lite Now Faster with Mobile GPUs

Posted by the TensorFlow team

Running inference on compute-heavy machine learning models on mobile devices is resource-demanding due to the devices’ limited processing power and energy budget. While converting to a fixed-point model is one avenue to acceleration, our users have asked for GPU support as an option to speed up inference of the original floating-point models without the extra complexity and potential accuracy loss of quantization.

We listened, and we are excited to announce that you can now leverage mobile GPUs for select models (listed below) with the release of the developer preview of the GPU backend for TensorFlow Lite; parts of a model that are unsupported on the GPU fall back to CPU inference. In the coming months, we will continue to add ops and improve the overall GPU backend offering.

This new backend leverages:

  • OpenGL ES 3.1 Compute Shaders on Android devices
  • Metal Compute Shaders on iOS devices

Today, we are releasing a precompiled binary preview of the new GPU backend, allowing developers and machine learning researchers an early chance to try this exciting new technology. A full open-source release is planned for later in 2019, incorporating the feedback we collect from your experiences.

[Figure: GPU vs. CPU inference speedup, around 4x on Pixel 3 and Samsung S9.]

GPU vs CPU Performance

At Google, we have been using the new GPU backend for several months in our products, accelerating compute intensive networks that enable vital use cases for our users.

For Portrait mode on Pixel 3, TensorFlow Lite GPU inference accelerates the foreground-background segmentation model by over 4x and the new depth estimation model by over 10x vs. CPU inference with floating point precision. In YouTube Stories and Playground Stickers, our real-time video segmentation model is sped up by 5–10x across a variety of phones.

We found that in general the new GPU backend performs 2–7x faster than the floating point CPU implementation for a wide range of diverse deep neural network models. Below, we benchmarked 4 public and 2 internal models covering common use cases developers and researchers encounter across a set of Android and Apple devices:

  1. MobileNet v1 (224×224) image classification [download]
    (image classification model designed for mobile and embedded vision applications)
  2. PoseNet for pose estimation [download]
    (vision model that estimates the poses of one or more people in an image or video)
  3. DeepLab segmentation (257×257) [download]
    (image segmentation model that assigns semantic labels (e.g., dog, cat, car) to every pixel in the input image)
  4. MobileNet SSD object detection [download]
    (object detection model that detects multiple objects with bounding boxes)

Google proprietary use cases:

The GPU speedup is most significant on more complex neural network models that lend themselves better to GPU utilization, such as dense prediction/segmentation or classification tasks. On very small models the speedup may be smaller, and using the CPU instead can be preferable because it avoids the latency costs inherent in memory transfers.

How Can I Use It?

Tutorials

The easiest way to get started is to follow our tutorial on using the TensorFlow Lite demo apps with the GPU delegate. A brief summary of the usage is presented below as well. For even more information see our full documentation.

For a step-by-step tutorial, watch the GPU Delegate videos:

Using Java for Android

We have prepared a complete Android Archive (AAR) that includes TensorFlow Lite with the GPU backend. Edit your Gradle file to include this AAR instead of the current release, and add the snippet below to your Java initialization code.
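A minimal sketch of that initialization, assuming the preview AAR exposes the GPU delegate as org.tensorflow.lite.gpu.GpuDelegate (the model buffer and tensor variables below are illustrative):

    import org.tensorflow.lite.Interpreter;
    import org.tensorflow.lite.gpu.GpuDelegate;

    // Create the GPU delegate and attach it to the interpreter options.
    GpuDelegate gpuDelegate = new GpuDelegate();
    Interpreter.Options options = new Interpreter.Options().addDelegate(gpuDelegate);

    // Build the interpreter; ops unsupported on the GPU fall back to the CPU.
    Interpreter interpreter = new Interpreter(modelBuffer, options);

    // Run inference as usual.
    interpreter.run(input, output);

    // Release native resources when done.
    interpreter.close();
    gpuDelegate.close();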

Using C++ for iOS

Step 1. Download the binary release of TensorFlow Lite.

Step 2. Change your code so that it calls ModifyGraphWithDelegate() after creating your model.

What Is Accelerated Right Now?

The GPU backend currently supports select operations (see documentation). Your model will run fastest when containing only these operations; unsupported GPU operations will automatically fall back to CPU.

How Does It Work?

Deep neural nets run hundreds of operations in sequence, making them a great fit for GPUs, which are designed with throughput-oriented parallel workloads in mind.

The GPU delegate is initialized when Interpreter::ModifyGraphWithDelegate() is called in Objective-C++, or indirectly by calling the Interpreter's constructor with Interpreter.Options in Java. In this initialization phase, a canonical representation of the input neural network is built based on the execution plan received from the framework. With this new representation, a set of transformation rules is applied. These include, but are not limited to:

  • Culling of unneeded ops
  • Substitution of ops with other equivalent ops that have better performance
  • Merging of ops to reduce the final number of shader programs generated

Based on this optimized graph, compute shaders are generated and compiled; we currently use OpenGL ES 3.1 Compute Shaders on Android and Metal Compute Shaders on iOS. When creating these compute shaders, we also employ various architecture-specific optimizations such as:

  • Applying specializations of certain ops instead of their (slower) generic implementations
  • Relaxing register pressure
  • Picking optimal workgroup sizes
  • Safely trimming accuracy
  • Reordering explicit math operations

At the end of these optimizations, the shader programs are compiled; this compilation can take anywhere from a few milliseconds to half a second, much like in mobile games. Once the shader programs are compiled, the new GPU inference engine is ready for action.

Inference for each input then proceeds as follows:

  • Inputs are moved to the GPU if necessary: The input tensors, if not already stored as GPU memory, are made accessible to the GPU by the framework by creating GL buffers/textures or MTLBuffers while also potentially copying data. As GPUs are most efficient with 4-channel data structures, tensors with channel sizes not equal to 4 are reshaped to a more GPU-friendly layout.
  • Shader programs are executed: The aforementioned shader programs are inserted into the command buffer queue and the GPU carries these out. During this step, we also manage GPU memory for intermediate tensors to keep the memory footprint of our backend as small as possible.
  • Outputs are moved to the CPU if necessary: Once the deep neural network has finished processing, the framework copies the result from GPU memory to CPU memory, unless the output of the network can be rendered directly on screen and this transfer is not needed.

For the best experience, we recommend optimizing the input/output tensor copy and/or the network architecture. Details on such optimizations can be found at TensorFlow Lite GPU documentation. For performance best practices, please read this guide.

How Big Is It?

The GPU delegate adds about 270KB to an Android armeabi-v7a APK and about 212KB per included architecture to an iOS binary. However, the backend is optional, so if you are not using the GPU delegate, you need not include it.

Future Work

This is just the beginning of our GPU support efforts. Along with community feedback, we intend to add the following improvements:

  • Expand coverage of operations
  • Further optimize performance
  • Evolve and finalize the APIs

We encourage you to leave your thoughts and comments on our GitHub and StackOverflow pages.

Acknowledgements

Andrei Kulik, Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Raman Sarokin, Yury Pisarchyk, Matthias Grundmann, Andrew Selle, Yu-Cheng Ling, Jared Duke, Lawrence Chan, Tim Davis, Pete Warden, Sarah Sirajuddin

Android quickstart

To get started with TensorFlow Lite on Android, we recommend exploring the following example.

Read TensorFlow Lite Android image classification for an explanation of the source code.

This example app uses image classification to continuously classify whatever it sees from the device’s rear-facing camera. The application can run either on a device or in an emulator.

Inference is performed using the TensorFlow Lite Java API and the TensorFlow Lite Android Support Library. The demo app classifies frames in real time, displaying the most probable classifications. It allows the user to choose between a floating point or quantized model, select the thread count, and decide whether to run on the CPU, the GPU, or via NNAPI.
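Those same choices can be expressed through Interpreter.Options. A minimal sketch, assuming an already loaded model buffer (variable names here are illustrative):

    import org.tensorflow.lite.Interpreter;
    import org.tensorflow.lite.gpu.GpuDelegate;
    import org.tensorflow.lite.nnapi.NnApiDelegate;

    Interpreter.Options options = new Interpreter.Options();
    options.setNumThreads(4);                      // thread count for CPU execution

    // Pick at most one accelerator; otherwise the model runs on the CPU.
    // options.addDelegate(new GpuDelegate());     // run supported ops on the GPU
    // options.addDelegate(new NnApiDelegate());   // delegate to NNAPI (API 27+)

    Interpreter interpreter = new Interpreter(modelBuffer, options);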

Build in Android Studio

To build the example in Android Studio, follow the instructions in README.md.

Create your own Android app

To get started quickly writing your own Android code, we recommend using our Android image classification example as a starting point.

The following sections contain some useful information for working with TensorFlow Lite on Android.

Use Android Studio ML Model Binding

To import a TensorFlow Lite (TFLite) model:

Right-click on the module in which you would like to use the TFLite model, or click File, then New > Other > TensorFlow Lite Model.

Select the location of your TFLite file. Note that the tooling configures the module’s dependencies on your behalf: ML Model Binding and all required dependencies are automatically inserted into your Android module’s build.gradle file.

Optional: Select the second checkbox for importing TensorFlow GPU if you want to use GPU acceleration.

The following screen will appear after the import is successful. To start using the model, select Kotlin or Java, then copy and paste the code under the Sample Code section. You can get back to this screen by double-clicking the TFLite model under the ml directory in Android Studio.
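As a rough illustration of what the generated sample code looks like, assuming a model file named mobilenet_v1.tflite (the generated class name MobilenetV1 and its output accessors are illustrative; the actual names depend on your model and its metadata):

    import android.graphics.Bitmap;
    import org.tensorflow.lite.support.image.TensorImage;

    // Class generated by ML Model Binding from mobilenet_v1.tflite (name is illustrative).
    MobilenetV1 model = MobilenetV1.newInstance(context);

    // Wrap a Bitmap and run the model.
    TensorImage image = TensorImage.fromBitmap(bitmap);
    MobilenetV1.Outputs outputs = model.process(image);

    // Release resources when done.
    model.close();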

Use the TensorFlow Lite Task Library

TensorFlow Lite Task Library contains a set of powerful and easy-to-use task-specific libraries for app developers to create ML experiences with TFLite. It provides optimized out-of-the-box model interfaces for popular machine learning tasks, such as image classification and question answering. The model interfaces are specifically designed for each task to achieve the best performance and usability. The Task Library works cross-platform and is supported on Java, C++, and Swift (coming soon).

To use the Task Library in your Android app, we recommend using the AARs hosted at MavenCentral for the Task Vision library and the Task Text library, respectively.

You can specify this in your build.gradle dependencies as follows:
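A representative snippet, with version numbers shown only as placeholders (check MavenCentral for the current releases):

    dependencies {
        // Task Vision library (image classification, object detection, segmentation)
        implementation 'org.tensorflow:tensorflow-lite-task-vision:0.3.0'
        // Task Text library (NLClassifier, BertQuestionAnswerer, etc.)
        implementation 'org.tensorflow:tensorflow-lite-task-text:0.3.0'
    }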

To use nightly snapshots, make sure that you have added the Sonatype snapshot repository.

See the introduction in the TensorFlow Lite Task Library overview for more details.
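As a small usage sketch of the Task Vision library (the model file name here is an assumption; any TFLite image classification model with metadata works):

    import android.graphics.Bitmap;
    import org.tensorflow.lite.support.image.TensorImage;
    import org.tensorflow.lite.task.vision.classifier.Classifications;
    import org.tensorflow.lite.task.vision.classifier.ImageClassifier;
    import java.util.List;

    // Load a classification model bundled in the app's assets.
    ImageClassifier classifier =
        ImageClassifier.createFromFile(context, "mobilenet_v1.tflite");

    // Run classification on a Bitmap and read the ranked results.
    List<Classifications> results = classifier.classify(TensorImage.fromBitmap(bitmap));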

Use the TensorFlow Lite Android Support Library

The TensorFlow Lite Android Support Library makes it easier to integrate models into your application. It provides high-level APIs that help transform raw input data into the form required by the model, and interpret the model’s output, reducing the amount of boilerplate code required.

It supports common data formats for inputs and outputs, including images and arrays. It also provides pre- and post-processing units that perform tasks such as image resizing and cropping.
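A minimal sketch of those pre-processing units (the 224×224 target size is just an example):

    import android.graphics.Bitmap;
    import org.tensorflow.lite.DataType;
    import org.tensorflow.lite.support.image.ImageProcessor;
    import org.tensorflow.lite.support.image.TensorImage;
    import org.tensorflow.lite.support.image.ops.ResizeOp;

    // Build a processor that resizes incoming images to the model's input size.
    ImageProcessor imageProcessor =
        new ImageProcessor.Builder()
            .add(new ResizeOp(224, 224, ResizeOp.ResizeMethod.BILINEAR))
            .build();

    // Load a Bitmap into a TensorImage and apply the processing pipeline.
    TensorImage tensorImage = new TensorImage(DataType.UINT8);
    tensorImage.load(bitmap);
    tensorImage = imageProcessor.process(tensorImage);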

To use the Support Library in your Android app, we recommend using the TensorFlow Lite Support Library AAR hosted at MavenCentral.

You can specify this in your build.gradle dependencies as follows:
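For example (version number shown as a placeholder; check MavenCentral for the current release):

    dependencies {
        implementation 'org.tensorflow:tensorflow-lite-support:0.3.0'
    }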

To use nightly snapshots, make sure that you have added the Sonatype snapshot repository.

To get started, follow the instructions in the TensorFlow Lite Android Support Library.

Use the TensorFlow Lite AAR from MavenCentral

To use TensorFlow Lite in your Android app, we recommend using the TensorFlow Lite AAR hosted at MavenCentral.

You can specify this in your build.gradle dependencies as follows:
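For example (version numbers shown as placeholders; check MavenCentral for the current release):

    dependencies {
        implementation 'org.tensorflow:tensorflow-lite:2.9.0'
        // Optional: GPU delegate support
        implementation 'org.tensorflow:tensorflow-lite-gpu:2.9.0'
    }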

To use nightly snapshots, make sure that you have added the Sonatype snapshot repository.

This AAR includes binaries for all of the Android ABIs. You can reduce the size of your application’s binary by only including the ABIs you need to support.

We recommend most developers omit the x86, x86_64, and arm32 ABIs. This can be achieved with the following Gradle configuration, which specifically includes only armeabi-v7a and arm64-v8a and should cover most modern Android devices.
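A sketch of that configuration in the app module's build.gradle:

    android {
        defaultConfig {
            ndk {
                // Package only the ABIs most modern devices need.
                abiFilters 'armeabi-v7a', 'arm64-v8a'
            }
        }
    }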

To learn more about abiFilters, see NdkOptions in the Android Gradle documentation.

Build Android app using C++

There are two ways to use TFLite through C++ if you build your app with the NDK:

Use TFLite C API

This is the recommended approach. Download the TensorFlow Lite AAR hosted at MavenCentral, rename it to tensorflow-lite-*.zip, and unzip it. You must include the four header files in the headers/tensorflow/lite/ and headers/tensorflow/lite/c/ folders and the relevant libtensorflowlite_jni.so dynamic library from the jni/ folder in your NDK project.

The c_api.h header file contains basic documentation about using the TFLite C API.

Use TFLite C++ API

If you want to use TFLite through the C++ API, you can build the C++ shared libraries:
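For example, with Bazel from a TensorFlow source checkout (adjust the --config flag to your target ABI):

    # 64-bit ARM Android build of the TensorFlow Lite C++ shared library
    bazel build -c opt --config=android_arm64 //tensorflow/lite:libtensorflowlite.so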

Currently, there is no straightforward way to extract all header files needed, so you must include all header files in tensorflow/lite/ from the TensorFlow repository. Additionally, you will need header files from FlatBuffers and Abseil.

Min SDK version of TFLite

Library                       minSdkVersion   Device Requirements
tensorflow-lite               19              NNAPI usage requires API 27+
tensorflow-lite-gpu           19              GLES 3.1 or OpenCL (typically available only on API 21+)
tensorflow-lite-hexagon       19              -
tensorflow-lite-support       19              -
tensorflow-lite-task-vision   21              android.graphics.Color related API requires API 26+
tensorflow-lite-task-text     21              -
tensorflow-lite-task-audio    23              -
tensorflow-lite-metadata      19              -
