# Barracuda PoseNet Tutorial Pt. 4 (Outdated)

**Version 2:** Part 1

**Last Updated:** Nov 30, 2020

### Previous: Part 3

## Introduction

The post processing phase consists of a few main steps. We need to first determine the region of the image that the model estimates is most likely to contain a given key point. We’ll then refine this estimate using the output from the `offsetsLayer`

. Lastly, we’ll account for any changes in aspect ratio and scale the key point locations up to the source resolution.

So far, major operations have been performed on the GPU. We’ll be performing the post processing steps on the CPU. `Tensor`

elements need to be accessed on the main thread. Just reading the values from the model’s output layers forces the rest of the program to wait until the operation completes. Even if we perform the post processing on the GPU, we would still need to access the result on the CPU. I’m working on a way to avoid reading the values on the CPU. Unfortunately, it’s still too messy to include in this tutorial.

## Create `ProcessOutput()`

Method

The post processing steps will be handled in a new method called `ProcessOutput()`

. The method will take in the output `Tensors`

from the `predictionLayer`

and the `offsetsLayer`

.

Before filling out the function, we need to create a new constant and a new variable.

### Create `numKeypoints`

Constant

The PoseNet model estimates the 2D locations of `17`

key points on a human body.

Index | Name |
---|---|

0 | Nose |

1 | Left Eye |

2 | Right Eye |

3 | Left Ear |

4 | Right Ear |

5 | Left Shoulder |

6 | Right Shoulder |

7 | Left Elbow |

8 | Right Elbow |

9 | Left Wrist |

10 | Right Wrist |

11 | Left Hip |

12 | Right Hip |

13 | Left Knee |

14 | Right Knee |

15 | Left Ankle |

16 | Right Ankle |

Since the number of key points never changes, we’ll store it in an `int`

constant. Name the constant `numKeypoints`

and set the value to `17`

.

### Create `keypointLocations`

Variable

The processed output from the model will be stored in a new variable called `keypointLocations`

. This variable will contain the `(x,y)`

coordinates for each key point. For this tutorial, the coordinates will be scaled to the original resolution of `1920x1080`

for `videoTexture`

.

This variable will also store the confidence values associated with the coordinates. The model predicts key point locations even when there isn’t a human in the input image. In such situations, the confidence values will likely be quite low. We can decide how to handle the latest coordinates based on a confidence threshold that we pick.

There are many ways we can store this information. For simplicity, we’ll stick with an array of arrays. The array will have `17`

elements. Each element will contain the location information for the key point that matches their index.

### Retrieve Output Tenors

Call `ProcessOutput()`

after `engine.Execute(input)`

in the `Update()`

method. We’ll use the `engine.PeekOutput()`

method to get a reference to the output `Tensors`

from the model. Since they are just references, we don’t need to manually dispose of them.

Now we can start filling out the `ProcessOutput()`

method.

## Calculate Scaling Values

The heatmaps generated by the model are much smaller than the input image fed into it. We’ll need to make some calculations to accurately scale the key point locations back up to the source resolution.

### Calculate Model Stride

The heatmap dimensions are dependent on both the size of the input image and a fixed integer value called the stride. The stride determines how much smaller the heatmaps will be than the input image. The model used in this tutorial has a stride of `32`

. The heatmap dimensions are equal to the ceiling of `resolution/stride`

. With our default input resolution of `360 x 360`

, the size of the heatmaps are `12 x 12`

.

Since we know the stride for this model, we could make it a constant value. However, calculating it is an easy way to make sure. This also makes it less of a hassle when switching between models with different stride values.

#### Model with a Different Stride Value

- ResNet50 Stride 16: (download)

To get the stride value, we’ll select a dimension of `inputImage`

and subtract `1`

. We then divide that value by the same dimension of the heatmap with `1`

subtracted as well. If we don’t subtract `1`

, we’ll undershoot the stride value.

For most input resolutions this will yield a value that is slightly above the actual stride. If we left it there, the key point locations would be offset from the `videoTexture`

. To compensate, we’ll subtract the remainder of the calculated stride divided by `8`

. The stride for the PoseNet models provided in this tutorial series are all multiples of `8`

.

### Calculate Image Scale

After scaling the output back to the `inputImage`

resolution, we’ll need to scale the output up to the source resolution. We can use the dimensions of `videoTexture`

to calculate this scale.

### Calculate Aspect Ratio Scale

As I noted in Part 2, we need to compensate for the change in aspect ratio that results from resizing the image. We can use the dimensions of the `videoTexture`

to stretch the output to the original aspect ratio.

## Iterate Through Heatmaps

Now we can iterate through each of the heatmaps and determine the location of the associated key points.

### Locate Key Point Indices

For each heatmap, we’ll first need to locate the index with the highest confidence value. This indicates what region of the image the model thinks is most likely to contain that key point. We’ll create a separate method to handle this.

The new method will be called `LocateKeyPointIndex()`

and take in the `heatmaps`

and `offsets`

tensors along with the current `keypointIndex`

. It will return a `Tuple`

containing the `(x,y)`

coordinates from the heatmap index, the associated offset vector, and the confidence value at the heatmap index.

#### Call the Method

We’ll call `LocateKeyPointIndex()`

at the start of each iteration through the for loop in `ProcessOutput()`

.

### Calculate Key Point Positions

Now we can calculate the estimated key point locations relative to the source `videoTexture`

. We’ll first extract the output from the `Tuple`

returned by `LocateKeyPointIndex()`

. The offset vectors are based on the `inputImage`

resolution so we’ll scale the `(x,y)`

coordinates by the `stride`

before adding them. We’ll then scale the coordinates up to the source `videoTexture`

.

Only the x-axis position is scaled by the `unsqueezeValue`

. This is specific to our current `videoTexture`

aspect ratio. I will cover a more dynamic approach in a later post.

#### Store Key Point Positions

Finally, we’ll store the location data for the current key point at the corresponding index in the `keypointLocations`

array.

## Summary

We finally have the estimated key point locations relative to the source video. However, we still don’t have an easy means to gauge the model’s accuracy. In the next post, we’ll map each key point location to a `GameObject`

. This will provide a quick way to determine if the model is outputting nonsense as well as what scenarios the model struggles with.