Barracuda PoseNet Tutorial Pt. 2 (Outdated)

Tags: unity, tutorial

This post covers how to implement the preprocessing steps for the PoseNet model.

Author: Christian Mills
Published: November 4, 2020
Last Updated: November 25, 2020

Version 2: Part 1

Previous: Part 1

Introduction

The PoseNet model we’ll be using has a ResNet-50 architecture and was created using TensorFlow. It takes a single RGB image as input. We need to perform some preprocessing operations on the RGB channel values before feeding an image to the model. We’ll first scale the values so that they are in the same range that the model was trained on. We then subtract the mean RGB values for the ImageNet dataset.

Create a Compute Shader

We can perform the preprocessing steps more quickly on the GPU. In Unity, we accomplish this with compute shaders. Compute shaders are pieces of code that can run parallel tasks on the graphics card. This is beneficial since we need to perform the same operations on every pixel in an image. It also frees up the CPU.

Create the Asset File

Create a new folder in the Assets window and name it Shaders. Open the Shaders folder and right-click an empty space. Select Shader in the Create submenu and click Compute Shader. We’ll name it PoseNetShader.

Remove the Default Code

Open the PoseNetShader in your code editor. By default, the ComputeShader will contain the following.
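The listing below reproduces Unity's default compute shader template; the exact comments may differ slightly between Unity versions.

```hlsl
// Each #kernel tells which function to compile; you can have many kernels
#pragma kernel CSMain

// Create a RenderTexture with enableRandomWrite flag and set it
// with cs.SetTexture
RWTexture2D<float4> Result;

[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    // TODO: insert actual code here!

    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0);
}
```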

Delete the CSMain function along with the #pragma kernel CSMain line. Next, we need to add a Texture2D variable to store the input image. Name it InputImage and give it a data type of half4 (i.e. Texture2D<half4>). Use the same data type for the Result variable as well.
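After these changes, the shader should contain just the two texture variables:

```hlsl
// The input image to preprocess
Texture2D<half4> InputImage;

// The preprocessed output image
RWTexture2D<half4> Result;
```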

Create PreprocessResNet Function

We need to make a new function to apply the ResNet preprocessing. Name the new function PreprocessResNet() and add a matching #pragma kernel PreprocessResNet line at the top of the file so Unity compiles the new kernel. We’ll use the default [numthreads(8,8,1)].

The PreprocessResNet function scales the RGB channel values of every pixel in the InputImage by 255. By default, color values in Unity are in the range [0,1]. The function then subtracts the ImageNet mean for each channel, listed in the table below, and stores the processed image in the Result variable. A sketch of the complete shader follows the table.

| Channel | ImageNet Mean |
| ------- | ------------- |
| Red     | 123.15        |
| Green   | 115.90        |
| Blue    | 103.06        |
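Here is a minimal sketch of the complete shader, using the mean values from the table above:

```hlsl
// Declare the kernel so we can find it from C# with FindKernel
#pragma kernel PreprocessResNet

// The input image to preprocess
Texture2D<half4> InputImage;

// The preprocessed output image
RWTexture2D<half4> Result;

[numthreads(8, 8, 1)]
void PreprocessResNet(uint3 id : SV_DispatchThreadID)
{
    // Scale each color channel from [0,1] to [0,255],
    // then subtract the channel's ImageNet mean
    Result[id.xy] = half4(
        (InputImage[id.xy].r * 255.0h) - 123.15h,
        (InputImage[id.xy].g * 255.0h) - 115.90h,
        (InputImage[id.xy].b * 255.0h) - 103.06h,
        InputImage[id.xy].a);
}
```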

Now that we’ve created our ComputeShader, we need to execute it using a C# script.

Create the PoseNet Script

We need to make a new C# script to perform inference with the PoseNet model. When finished, this script will load the model, prepare the input, run the model, and process the output. For this post, we’ll implement the preprocessing functionality.

Create the Asset File

Create a new folder in the Assets window and name it Scripts. In the Scripts folder, right-click an empty space and select C# Script in the Create submenu.

Name the script PoseNet.

Open the script in your code editor.

Create videoTexture Variable

Above the Start() method, create a new public RenderTexture named videoTexture. This is the variable to which we’ll assign the video_texture that we made in Part 1.
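The declaration is a single public field:

```csharp
// Stores the current video frame (assigned the video_texture asset from Part 1)
public RenderTexture videoTexture;
```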

Create posenetShader Variable

We’ll also create a new public ComputeShader variable and name it posenetShader. We’ll assign the PoseNetShader to this variable in the Unity Editor.
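Again, a single public field:

```csharp
// The compute shader that applies the model-specific preprocessing
public ComputeShader posenetShader;
```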

Create PreprocessImage() Method

Next, we need to make a new method to handle the preprocessing steps for the videoTexture. We’ll name this method PreprocessImage and define it below the Update method. The method will return a Texture2D that contains the preprocessed image.
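A skeleton for the method, to be filled in over the next few steps (the return value is a placeholder for now):

```csharp
// Prepare the current video frame for the PoseNet model
private Texture2D PreprocessImage()
{
    // Preprocessing steps will be added in the following sections
    return null; // placeholder
}
```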

Create a New Texture2D

We don’t want to alter the videoTexture directly, so we’ll make a copy of the current frame. Create a new Texture2D called imageTexture and give it the same dimensions as the videoTexture. We can use the Graphics.CopyTexture() method to copy the data from the RenderTexture directly on the GPU.
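A sketch of the copy step:

```csharp
// Create a new Texture2D with the same dimensions as videoTexture
Texture2D imageTexture = new Texture2D(videoTexture.width, videoTexture.height,
                                       TextureFormat.RGBA32, false);

// Copy the current frame directly on the GPU
Graphics.CopyTexture(videoTexture, imageTexture);
```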

Resize the Image

Now that we have our imageTexture, we need to resize it to a more practical resolution. Lowering the resolution does decrease the model’s accuracy, but using a higher resolution can significantly slow down inference. We’ll examine this trade-off in a later post.

For now, we’ll use a resolution of 360 x 360. Create two new public int variables for the image height and width respectively. This will make it easier to experiment with different resolutions.
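For example:

```csharp
// The height and width the input image will be resized to
public int imageHeight = 360;
public int imageWidth = 360;
```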

We’ll make a new method to handle the resizing process. The method will take in a Texture2D as well as the new height and width. It will return a Texture2D with the new resolution.

The Graphics.CopyTexture() method requires that the source and destination textures be the same size. That means we need to destroy the current imageTexture and make a temporary one with the smaller dimensions.
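A minimal sketch of the Resize method, using a temporary RenderTexture and Graphics.Blit to perform the scaling:

```csharp
// Resize the given image to the specified dimensions
private Texture2D Resize(Texture2D image, int newWidth, int newHeight)
{
    // Get a temporary RenderTexture with the target dimensions
    RenderTexture rTex = RenderTexture.GetTemporary(newWidth, newHeight, 24);
    RenderTexture.active = rTex;

    // Blit scales the source image to fit the destination texture
    Graphics.Blit(image, rTex);

    // Copy the resized data into a new Texture2D of matching size
    Texture2D nTex = new Texture2D(newWidth, newHeight, TextureFormat.RGBA32, false);
    Graphics.CopyTexture(rTex, nTex);

    // Release the temporary RenderTexture
    RenderTexture.active = null;
    RenderTexture.ReleaseTemporary(rTex);
    return nTex;
}
```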

Note: Resizing the image to 360 x 360 will squish our input image from a 16:9 aspect ratio to a square aspect ratio. We’ll need to account for this when we get to the postprocessing section.

Apply Model-Specific Preprocessing

This is where we’ll make use of the PoseNetShader we made earlier. We’ll create a new method to handle the execution process. Name the new method PreprocessResNet to match the function in the PoseNetShader. The two don’t need to share a name; it’s just personal preference.

For this method, we need to use HDR texture formats for the RenderTexture and Texture2D. HDR formats can store color values outside the standard [0,1] range. The Barracuda library remaps non-HDR color values to [0,1], which would undo our preprocessing given that we’re scaling the values by 255.

You can view the full PreprocessResNet method below.
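The version below is a sketch that follows the steps described above: it binds the textures, dispatches one thread group per 8 x 8 block of pixels to match [numthreads(8,8,1)], and reads the result back into an HDR Texture2D.

```csharp
// Apply the ResNet-specific preprocessing with the compute shader
private Texture2D PreprocessResNet(Texture2D inputImage)
{
    // Describe an HDR RenderTexture the compute shader can write to
    RenderTextureDescriptor descriptor = new RenderTextureDescriptor(
        inputImage.width, inputImage.height, RenderTextureFormat.ARGBHalf, 0);
    descriptor.enableRandomWrite = true;
    RenderTexture result = RenderTexture.GetTemporary(descriptor);

    // Locate the PreprocessResNet kernel in the compute shader
    int kernelHandle = posenetShader.FindKernel("PreprocessResNet");

    // Bind the input and output textures
    posenetShader.SetTexture(kernelHandle, "InputImage", inputImage);
    posenetShader.SetTexture(kernelHandle, "Result", result);

    // Dispatch one thread group per 8x8 block of pixels
    int numthreads = 8;
    posenetShader.Dispatch(kernelHandle, inputImage.width / numthreads,
                           inputImage.height / numthreads, 1);

    // Read the processed image back into an HDR Texture2D
    RenderTexture.active = result;
    Texture2D nTex = new Texture2D(inputImage.width, inputImage.height,
                                   TextureFormat.RGBAHalf, false);
    nTex.ReadPixels(new Rect(0, 0, inputImage.width, inputImage.height), 0, 0);
    nTex.Apply();

    // Clean up
    RenderTexture.active = null;
    RenderTexture.ReleaseTemporary(result);
    return nTex;
}
```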

The PreprocessResNet method returns a Texture2D with an HDR texture format. The switch to HDR formats means the tempTex variable holding the resized image is no longer compatible. Fortunately, we can reuse the imageTexture variable that we emptied.

The finished PreprocessImage method looks like this.
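A sketch assembled from the steps above:

```csharp
// Prepare the current video frame for the PoseNet model
private Texture2D PreprocessImage()
{
    // Copy the current video frame without altering videoTexture
    Texture2D imageTexture = new Texture2D(videoTexture.width, videoTexture.height,
                                           TextureFormat.RGBA32, false);
    Graphics.CopyTexture(videoTexture, imageTexture);

    // Resize the copy to the target input resolution
    Texture2D tempTex = Resize(imageTexture, imageWidth, imageHeight);
    Destroy(imageTexture);

    // Reuse imageTexture for the HDR result of the preprocessing
    imageTexture = PreprocessResNet(tempTex);
    Destroy(tempTex);

    return imageTexture;
}
```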

Call the Method

We’ll call PreprocessImage() in the Update() method so that it runs every frame.
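A minimal sketch, using a hypothetical processedImage local variable; we destroy the texture right away since we won’t use the result until Part 3:

```csharp
void Update()
{
    // Preprocess the current video frame
    Texture2D processedImage = PreprocessImage();

    // Release the texture for now; Part 3 will feed it to the model
    Destroy(processedImage);
}
```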

Create the Pose Estimator

To run the PoseNet script, we need to attach it to a GameObject in the Unity Editor.

Create an Empty GameObject

In the Hierarchy tab, right-click an empty space and select Create Empty from the menu. Name the empty GameObject PoseEstimator.

Attach the PoseNet Script

With the PoseEstimator object selected, drag and drop the PoseNet script into the Inspector tab.

Assign the video_texture

Next, we need to assign the video_texture asset to the Video Texture parameter. With the PoseEstimator object selected, drag and drop the video_texture asset into the Video Texture spot in the Inspector tab.

Assign the PoseNetShader

We also need to drag and drop the PoseNetShader asset into the Posenet Shader spot in the Inspector tab.

Summary

We’re now ready to feed video frames to our PoseNet model. In part 3, we’ll cover how to install the Barracuda library and perform inference with our model.

GitHub Repository - Version 1

Next: Part 2.5 (Optional) | Part 3