Barracuda PoseNet Tutorial 2nd Edition Pt. 5
Overview
In this post, we will cover how to implement the post-processing steps for single pose estimation. This method is much simpler than what is required for multi-pose estimation. However, it should only be used when there is a single person in the input image.
Update Utils Script
We will implement the methods for processing the model output in the Utils script.
Add Required Namespace
First, we need to add the Unity.Barracuda namespace since we will be working with Tensors.
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.Barracuda;
Add Public Variables
Each key point predicted by the model has a confidence score, position, and id number associated with it. For example, the nose has the id number 0. We will define a new struct to keep track of these values for each key point.
/// <summary>
/// Stores the heatmap score, position, and partName index for a single keypoint
/// </summary>
public struct Keypoint
{
    public float score;
    public Vector2 position;
    public int id;

    public Keypoint(float score, Vector2 position, int id)
    {
        this.score = score;
        this.position = position;
        this.id = id;
    }
}
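For reference, the id numbers follow the standard PoseNet part ordering, where the array index matches the keypoint id. The sketch below is a hypothetical lookup array for illustration; the tutorial code does not require it.

// Standard PoseNet part ordering: the array index matches the keypoint id
public static readonly string[] partNames = new string[]
{
    "nose", "leftEye", "rightEye", "leftEar", "rightEar",
    "leftShoulder", "rightShoulder", "leftElbow", "rightElbow",
    "leftWrist", "rightWrist", "leftHip", "rightHip",
    "leftKnee", "rightKnee", "leftAnkle", "rightAnkle"
};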
Create GetOffsetVector Method
Next, we will create a new method to obtain the offset values associated with a given heatmap coordinate. The method will take in the X and Y values for a heatmap coordinate, the current key point id number, and the offset values from the model output.
/// <summary>
/// Get the offset values for the provided heatmap indices
/// </summary>
/// <param name="y">Heatmap column index</param>
/// <param name="x">Heatmap row index</param>
/// <param name="keypoint">Heatmap channel index</param>
/// <param name="offsets">Offsets output tensor</param>
/// <returns></returns>
public static Vector2 GetOffsetVector(int y, int x, int keypoint, Tensor offsets)
{
    // The offsets tensor stores 34 channels: the first 17 hold the y offsets
    // and the last 17 hold the x offsets for each of the 17 keypoints
    return new Vector2(offsets[0, y, x, keypoint + 17], offsets[0, y, x, keypoint]);
}
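A hypothetical usage example (the coordinate values are made up, and offsets would come from the model output as shown later in ProcessOutput):

// Offset vector for the nose (id 0) at heatmap cell (y: 3, x: 5)
Vector2 noseOffset = Utils.GetOffsetVector(3, 5, 0, offsets);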
Create GetImageCoords Method
We can calculate the estimated location of a key point in the input image by multiplying the heatmap coordinate by the stride value for the model and then adding the associated offset values. We will calculate the stride value for the current model in the PoseEstimator script.
/// <summary>
/// Calculate the position of the provided key point in the input image
/// </summary>
/// <param name="part"></param>
/// <param name="stride"></param>
/// <param name="offsets"></param>
/// <returns></returns>
public static Vector2 GetImageCoords(Keypoint part, int stride, Tensor offsets)
{
    // The accompanying offset vector for the current coords
    Vector2 offsetVector = GetOffsetVector((int)part.position.y, (int)part.position.x,
                                           part.id, offsets);

    // Scale the coordinates up to the input image resolution
    // Add the offset vectors to refine the key point location
    return (part.position * stride) + offsetVector;
}
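As a quick sanity check, here is a hypothetical worked example (all values are made up for illustration, and offsets would come from the model output):

// A nose (id 0) found at heatmap cell (x: 5, y: 3) with a confidence of 0.95
Utils.Keypoint nose = new Utils.Keypoint(0.95f, new Vector2(5, 3), 0);

// With a stride of 16 and an offset vector of (4.2, -1.5), GetImageCoords
// returns (5 * 16 + 4.2, 3 * 16 - 1.5) = (84.2, 46.5) in the input image
Vector2 imagePos = Utils.GetImageCoords(nose, 16, offsets);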
Create DecodeSinglePose Method
This is the method that will be called from the PoseEstimator script after executing the model. It will take in the heatmaps and offsets from the model output along with the stride value for the model as input.
For single pose estimation, we will iterate through the heatmaps from the model output and keep track of the indices with the highest confidence value for each key point. Once we have the heatmap location with the highest confidence value, we can call the GetImageCoords method to calculate the position of the key point in the input image. We will store each key point in a Keypoint array.
Note: This approach should only be used when there is a single person in the input image. It is unlikely that the key points with the highest confidence scores will belong to the same body when multiple people are visible.
/// <summary>
/// Determine the estimated key point locations using the heatmaps and offsets tensors
/// </summary>
/// <param name="heatmaps">The heatmaps that indicate the confidence levels for key point locations</param>
/// <param name="offsets">The offsets that refine the key point locations determined with the heatmaps</param>
/// <returns>An array of keypoints for a single pose</returns>
public static Keypoint[] DecodeSinglePose(Tensor heatmaps, Tensor offsets, int stride)
{
    Keypoint[] keypoints = new Keypoint[heatmaps.channels];

    // Iterate through heatmaps
    for (int c = 0; c < heatmaps.channels; c++)
    {
        Keypoint part = new Keypoint();
        part.id = c;

        // Iterate through heatmap rows
        for (int y = 0; y < heatmaps.height; y++)
        {
            // Iterate through the columns of the current row
            for (int x = 0; x < heatmaps.width; x++)
            {
                if (heatmaps[0, y, x, c] > part.score)
                {
                    // Update the highest confidence for the current key point
                    part.score = heatmaps[0, y, x, c];

                    // Update the estimated key point coordinates
                    part.position.x = x;
                    part.position.y = y;
                }
            }
        }

        // Calculate the position in the input image for the current (x, y) coordinates
        part.position = GetImageCoords(part, stride, offsets);

        // Add the current keypoint to the list
        keypoints[c] = part;
    }

    return keypoints;
}
Update PoseEstimator Script
In the PoseEstimator script, we need to add some new variables before we can call the DecodeSinglePose method.
Add Public Variables
First, we will define a new public enum so that we can choose whether to perform single or multi-pose estimation from the Inspector tab.
public enum EstimationType
{
    MultiPose,
    SinglePose
}
[Tooltip("The type of pose estimation to be performed")]
public EstimationType estimationType = EstimationType.SinglePose;
Add Private Variables
We will store the Keypoint arrays returned by the post-processing methods in an array of Keypoint arrays. There will only be one array stored for single pose estimation, but there will be several for multi-pose estimation.
// Stores the current estimated 2D keypoint locations in videoTexture
private Utils.Keypoint[][] poses;
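As a rough illustration (not part of the tutorial code), a consumer of this nested array might loop over each pose and filter its keypoints by confidence; the 0.5f threshold below is an arbitrary assumption:

// Sketch: iterate through every detected pose and its keypoints
foreach (Utils.Keypoint[] pose in poses)
{
    foreach (Utils.Keypoint keypoint in pose)
    {
        // Skip low-confidence keypoints (threshold chosen arbitrarily)
        if (keypoint.score < 0.5f) continue;

        Debug.Log($"Keypoint {keypoint.id}: {keypoint.position}");
    }
}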
Create ProcessOutput Method
We will call the post-processing methods inside a new method called ProcessOutput. This method will take in the IWorker from engine.
Method Steps:

1. Get the four model outputs
2. Calculate the stride for the current model
3. Call the appropriate post-processing method for the selected estimation type
   Note: We will fill in the else statement when we implement the post-processing steps for multi-pose estimation.
4. Release the resources allocated for the output Tensors
/// <summary>
/// Obtains the model output and decodes either a single pose or multiple poses
/// </summary>
/// <param name="engine">The worker used to execute the model</param>
private void ProcessOutput(IWorker engine)
{
    // Get the model output
    Tensor heatmaps = engine.PeekOutput(predictionLayer);
    Tensor offsets = engine.PeekOutput(offsetsLayer);
    Tensor displacementFWD = engine.PeekOutput(displacementFWDLayer);
    Tensor displacementBWD = engine.PeekOutput(displacementBWDLayer);

    // Calculate the stride used to scale down the inputImage
    int stride = (imageDims.y - 1) / (heatmaps.shape.height - 1);
    stride -= (stride % 8);

    if (estimationType == EstimationType.SinglePose)
    {
        // Initialize the array of Keypoint arrays
        poses = new Utils.Keypoint[1][];

        // Determine the key point locations
        poses[0] = Utils.DecodeSinglePose(heatmaps, offsets, stride);
    }
    else
    {

    }

    // Release the resources allocated for the output Tensors
    heatmaps.Dispose();
    offsets.Dispose();
    displacementFWD.Dispose();
    displacementBWD.Dispose();
}
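To make the stride calculation concrete, here is a hypothetical example (the dimensions are made up for illustration):

// Example: an input image height of 360 and a heatmaps height of 23
int stride = (360 - 1) / (23 - 1); // 359 / 22 = 16 with integer division
stride -= (stride % 8);            // 16 % 8 == 0, so the stride remains 16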
Modify Update Method
We will call the ProcessOutput method at the end of the Update method.
// Decode the keypoint coordinates from the model output
ProcessOutput(engine.worker);
Full Code
void Update()
{
    // Copy webcamTexture to videoTexture if using webcam
    if (useWebcam) Graphics.Blit(webcamTexture, videoTexture);

    // Prevent the input dimensions from going too low for the model
    imageDims.x = Mathf.Max(imageDims.x, 64);
    imageDims.y = Mathf.Max(imageDims.y, 64);

    // Update the input dimensions while maintaining the source aspect ratio
    if (imageDims.x != targetDims.x)
    {
        aspectRatioScale = (float)videoTexture.height / videoTexture.width;
        targetDims.y = (int)(imageDims.x * aspectRatioScale);
        imageDims.y = targetDims.y;
        targetDims.x = imageDims.x;
    }

    if (imageDims.y != targetDims.y)
    {
        aspectRatioScale = (float)videoTexture.width / videoTexture.height;
        targetDims.x = (int)(imageDims.y * aspectRatioScale);
        imageDims.x = targetDims.x;
        targetDims.y = imageDims.y;
    }

    // Update the rTex dimensions to the new input dimensions
    if (imageDims.x != rTex.width || imageDims.y != rTex.height)
    {
        RenderTexture.ReleaseTemporary(rTex);

        // Assign a temporary RenderTexture with the new dimensions
        rTex = RenderTexture.GetTemporary(imageDims.x, imageDims.y, 24, rTex.format);
    }

    // Copy the src RenderTexture to the new rTex RenderTexture
    Graphics.Blit(videoTexture, rTex);

    // Prepare the input image to be fed to the selected model
    ProcessImage(rTex);

    // Reinitialize Barracuda with the selected model and backend
    if (engine.modelType != modelType || engine.workerType != workerType)
    {
        engine.worker.Dispose();
        InitializeBarracuda();
    }

    // Execute neural network with the provided input
    engine.worker.Execute(input);

    // Release GPU resources allocated for the Tensor
    input.Dispose();

    // Decode the keypoint coordinates from the model output
    ProcessOutput(engine.worker);
}
Summary
That is all we need to perform pose estimation when there is a single person in the input image. In the next post, we will implement the post-processing steps for multi-pose estimation.
Previous: Part 4
Next: Part 6
Project Resources: GitHub Repository
I’m Christian Mills, a deep learning consultant specializing in practical AI implementations. I help clients leverage cutting-edge AI technologies to solve real-world problems.
Interested in working together? Fill out my Quick AI Project Assessment form or learn more about me.