Real-Time Object Detection in Unity With ONNX and DirectML Pt. 1

onnx
directml
yolox
object-detection
unity
tutorial
Create a dynamic link library (DLL) file in Visual Studio to perform object detection with a YOLOX model using ONNX Runtime and DirectML.
Author

Christian Mills

Published

August 18, 2022

Introduction

In this two-part tutorial series, I will show you how to implement real-time object detection in Unity using ONNX Runtime and DirectML. ONNX Runtime is a cross-platform model accelerator that works with several hardware acceleration libraries. DirectML is a hardware-accelerated DirectX 12 library for machine learning on Windows.

In Part 1, we will create a dynamic link library (DLL) file in Visual Studio to perform object detection with ONNX Runtime and DirectML. In Part 2, we will integrate this DLL file into a Unity project and perform real-time object detection. By the end of this tutorial series, you will have the skills and knowledge to leverage ONNX Runtime and DirectML in your Unity projects.

This tutorial uses an ONNX model exported in a previous tutorial series, so no modifications to the training code are required.

Unity Demo

Overview

In Part 1 of this tutorial series, we will create a dynamic link library (DLL) file in Visual Studio to perform object detection using ONNX Runtime and DirectML. We will first download OpenCV, then create and configure a DLL project. Next, we will add the include directories, link the required libraries, and set up post-build events. After that, we will install ONNX Runtime and update the precompiled header and dllmain files. Finally, we will build the project to generate the DLL file. By the end of this post, you will have a DLL file ready to integrate into a Unity project.

Important: This post assumes Visual Studio is present on your system.

Download OpenCV

We’ll use the OpenCV library to process image data from Unity. The tutorial uses OpenCV 4.6.0, which you can download from the releases page of the official OpenCV GitHub repository.

Select the opencv-4.6.0-vc14_vc15.exe option from the Assets list.

Run the executable once it finishes downloading. Windows might warn that the executable is an unrecognized app. We can bypass this warning by clicking the More info text, then the Run anyway button.

The executable will prompt us to select a location to extract the opencv folder. We’ll need to give Visual Studio this location to access the library’s functionality. I tend to place my C++ dependencies in a dedicated folder for consistency.

If we open the opencv folder, we can see a build folder and a sources folder. Everything we need is in the build folder.

Create DLL Project

Open Visual Studio and select the Create a new project option.

Type DLL into the text box and select the Dynamic-Link Library (DLL) option. This option automatically configures a few parameters for us compared to starting with a standard console application.

Choose a name and location for the project and click the Create button. By default, the DLL file will use the project name.

Configure the Project

At the top of the window, open the Solution Configurations dropdown menu, and select Release.

Then, open the Solution Platform dropdown menu and select x64.

Add Include Directories

We need to tell Visual Studio where OpenCV is so we can access its APIs. Right-click the project name in the Solution Explorer panel.

Select the Properties option in the popup menu.

In the Properties window, open the C/C++ dropdown. Select the Additional Include Directories section and click on <Edit..> in the dropdown.

Add the path for the following folder, replacing <parent-folder-path> with the full path to the parent folder for the OpenCV library, and click OK.

  • <parent-folder-path>\opencv\build\include
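
Link Libraries

Next, open the Linker dropdown in the Properties window and select General. Select the Additional Library Directories section and click on <Edit..> in the dropdown.

Add the path for the following folder, replacing <parent-folder-path> with the full path to the parent folder for the OpenCV library, and click OK.

  • <parent-folder-path>\opencv\build\x64\vc15\lib

Then, open the Input section under the Linker dropdown, select Additional Dependencies, and click on <Edit..>. Add the following file name and click OK.

  • opencv_world460.lib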

Post Build Events

Our DLL file will depend on the opencv_world460.dll file included with OpenCV.

We can add a post-build event in Visual Studio to automatically copy this DLL file to the build folder for the project at compile time. Open the Build Events dropdown in the Properties window and select Post-Build Event. Select Command Line and click <Edit..>.

Add the following commands, replacing <parent-folder-path> with the full path to the parent folder for the OpenCV library, and click OK.

  • xcopy <parent-folder-path>\opencv\build\x64\vc15\bin\opencv_world460.dll $(SolutionDir)$(Platform)\$(Configuration)\ /c /y

Finally, click the Apply button and close the Properties window.

Install ONNX Runtime

We can install ONNX Runtime with DirectML directly into our project via the NuGet package manager integrated with Visual Studio. Open the Project menu and select Manage NuGet Packages... from the dropdown menu.

Select the Browse option and enter Microsoft.ML.OnnxRuntime.DirectML into the search box.

With the package selected, click the Install button. The project uses version 1.12.1.

Click OK in the popup window.
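
If you prefer the command line, the NuGet Package Manager Console (Tools → NuGet Package Manager → Package Manager Console) accepts the equivalent command below.

  • Install-Package Microsoft.ML.OnnxRuntime.DirectML -Version 1.12.1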

With the dependencies taken care of, we can start modifying the code.

Update Precompiled Header File

We’ll first update the pch.h Precompiled Header file with the required header files. We can open the pch.h file by selecting it in the Solution Explorer window.

Comment or remove the #include line for the framework.h header file.

// pch.h: This is a precompiled header file.
// Files listed below are compiled only once, improving build performance for future builds.
// This also affects IntelliSense performance, including code completion and many code browsing features.
// However, files listed here are ALL re-compiled if any one of them is updated between builds.
// Do not add files here that you will be updating frequently as this negates the performance advantage.

#ifndef PCH_H
#define PCH_H

// add headers that you want to pre-compile here
//#include "framework.h"

#endif //PCH_H

Add required header files

Next, we’ll add the required header files for ONNX Runtime and OpenCV below the //#include "framework.h" line.

// pch.h: This is a precompiled header file.
// Files listed below are compiled only once, improving build performance for future builds.
// This also affects IntelliSense performance, including code completion and many code browsing features.
// However, files listed here are ALL re-compiled if any one of them is updated between builds.
// Do not add files here that you will be updating frequently as this negates the performance advantage.

#ifndef PCH_H
#define PCH_H

// add headers that you want to pre-compile here
//#include "framework.h"
#include <onnxruntime_cxx_api.h>
#include "dml_provider_factory.h"
#include <opencv2/opencv.hpp>

#endif //PCH_H

Update dllmain File

By default, the dllmain.cpp file contains the following code.

// dllmain.cpp : Defines the entry point for the DLL application.
#include "pch.h"

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

We can delete everything below the #include "pch.h" line.

Create a macro to mark functions we want to make accessible in Unity

// dllmain.cpp : Defines the entry point for the DLL application.
#include "pch.h"


// Create a macro to quickly mark a function for export
#define DLLExport __declspec (dllexport)

Wrap the code in extern “C” to prevent name-mangling issues with the compiler

The rest of our code will go inside here.

// Wrap code to prevent name-mangling issues
extern "C" {

}

Define variables

Inside the wrapper, we will declare the persistent variables needed for the DLL.

  • OrtApi: ONNX Runtime API interface
  • OrtEnv: Holds the logging state for the ONNX Runtime objects
  • OrtSessionOptions: Holds the options used when creating a new ONNX Runtime session
  • OrtSession: The ONNX Runtime session

// The current source image width
int img_w;
// The current source image height
int img_h;
// The current model input width
int input_w;
// The current model input height
int input_h;
// The total number of pixels in the input image
int n_pixels;
// The number of color channels 
int n_channels = 3;

// Stores information about a single object prediction
struct Object
{
    float x0;
    float y0;
    float width;
    float height;
    int label;
    float prob;
};

// Store grid offset and stride values to decode a section of the model output
struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};

// The scale values used to adjust the model output to the source image resolution
float scale_x;
float scale_y;

// The minimum confidence score to consider an object proposal
float bbox_conf_thresh = 0.3f;
// The maximum intersection over union value before an object proposal will be ignored
float nms_thresh = 0.45f;

// Stores the grid and stride values to navigate the raw model output
std::vector<GridAndStride> grid_strides;
// Stores the object proposals with confidence scores above bbox_conf_thresh
std::vector<Object> proposals;
// Stores the indices for the object proposals selected using non-maximum suppression
std::vector<int> proposal_indices;

// The stride values used to generate the grid_strides vector
std::vector<int> strides = { 8, 16, 32 };

// The mean of the ImageNet dataset used to train the model
const float mean[] = { 0.485f, 0.456f, 0.406f };
// The standard deviation of the ImageNet dataset used to train the model
const float std_dev[] = { 0.229f, 0.224f, 0.225f };

// ONNX Runtime API interface
const OrtApi* ort = NULL;

// List of available execution providers
char** provider_names;
int provider_count;

// Holds the logging state for the ONNX Runtime objects
OrtEnv* env;
// Holds the options used when creating a new ONNX Runtime session
OrtSessionOptions* session_options;
// The ONNX Runtime session
OrtSession* session;

// The name of the model input
char* input_name;
// The name of the model output
char* output_name;

// A pointer to the raw input data
float* input_data;
// The memory size of the raw input data
int input_size;

Define a function to convert char data to wchar_t

ONNX Runtime expects paths to ONNX model files in wchar_t format. We receive the file paths in char format, so we’ll make a function to convert them. Note that the function allocates a new buffer with new[], so the caller is responsible for freeing it with delete[].

/// <summary>
/// Convert char data to wchar_t
/// </summary>
/// <param name="text"></param>
/// <returns></returns>
static wchar_t* charToWChar(const char* text)
{
    const size_t size = strlen(text) + 1;
    wchar_t* wText = new wchar_t[size];
    size_t converted_chars;
    mbstowcs_s(&converted_chars, wText, size, text, _TRUNCATE);
    return wText;
}

Define a function to initialize the interface to the ONNX Runtime API


/// <summary>
/// Initialize the ONNX Runtime API interface and get the available execution providers
/// </summary>
/// <returns></returns>
DLLExport void InitOrtAPI() {

    ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

    ort->GetAvailableProviders(&provider_names, &provider_count);
}

Define a function to get the number of execution providers

The next function we’ll define returns the number of execution providers accessible through ONNX Runtime. We’ll use this information in the Unity application to select which device performs inference.

/// <summary>
/// Get the number of available execution providers
/// </summary>
/// <returns>The number of available devices</returns>
DLLExport int GetProviderCount()
{
    // Return the number of available execution providers
    return provider_count;
}

Define a function to get the name of an execution provider

Next, we’ll define a function to return the name at a specified index in the list of available execution providers.

/// <summary>
/// Get the name of the execution provider at the specified index
/// </summary>
/// <param name="index"></param>
/// <returns>The name of the execution provider at the specified index</returns>
DLLExport char* GetProviderName(int index) {
    return provider_names[index];
}
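
Below is a minimal sketch (not part of the plugin) of how a host application could use these two exported functions to list the available devices, assuming InitOrtAPI has already run and <iostream> is available:

// List the available execution providers (illustrative only)
for (int i = 0; i < GetProviderCount(); i++)
{
    std::cout << "Provider " << i << ": " << GetProviderName(i) << std::endl;
}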

Define method to generate stride values to navigate the raw model output

The method for generating the offset values used to traverse the output array is almost identical to the Python implementation from the training tutorial.

/// <summary>
/// Generate offset values to navigate the raw model output
/// </summary>
/// <param name="height">The model input height</param>
/// <param name="width">The model input width</param>
void GenerateGridsAndStride(int height, int width)
{
    // Remove the values for the previous input resolution
    grid_strides.clear();

    // Iterate through each stride value
    for (auto stride : strides)
    {
        // Calculate the grid dimensions
        int grid_height = height / stride;
        int grid_width = width / stride;

        // Store each combination of grid coordinates
        for (int g1 = 0; g1 < grid_height; g1++)
        {
            for (int g0 = 0; g0 < grid_width; g0++)
            {
                grid_strides.push_back(GridAndStride{ g0, g1, stride });
            }
        }
    }
}
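
As a quick sanity check on the arithmetic (a sketch, assuming a 640×640 model input): strides of 8, 16, and 32 produce 80×80, 40×40, and 20×20 grids, so the grid_strides vector should hold 6400 + 1600 + 400 = 8400 entries, one per anchor location.

// Illustrative check (requires <cassert>)
GenerateGridsAndStride(640, 640);
assert(grid_strides.size() == 8400);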

Define a function to set the minimum confidence score from Unity

We might want to try different confidence thresholds for keeping object proposals from the Unity application, so we’ll add a function to enable this.

/// <summary>
/// Set minimum confidence score for keeping bounding box proposals
/// </summary>
/// <param name="min_confidence">The minimum confidence score for keeping bounding box proposals</param>
DLLExport void SetConfidenceThreshold(float min_confidence)
{
    bbox_conf_thresh = min_confidence;
}

Define function to refresh memory when switching models or execution providers


/// <summary>
/// Refresh memory when switching models or execution providers
/// </summary>
DLLExport void RefreshMemory() {
    // Reset each pointer to NULL after freeing to avoid double-free errors on repeated calls
    if (input_data) { free(input_data); input_data = NULL; }
    if (session) { ort->ReleaseSession(session); session = NULL; }
    if (env) { ort->ReleaseEnv(env); env = NULL; }
}

Define a function to load an ONNX model


/// <summary>
/// Load a model from the specified file path
/// </summary>
/// <param name="model_path">The full model path to the ONNX model</param>
/// <param name="execution_provider">The name for the desired execution_provider</param>
/// <param name="image_dims">The source image dimensions</param>
/// <returns>A status value indicating success or failure to load and reshape the model</returns>
DLLExport int LoadModel(char* model_path, char* execution_provider, int image_dims[2])
{
    int return_val = 0;

    // Initialize the ONNX runtime environment
    std::string instance_name = "yolox-inference";
    ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, instance_name.c_str(), &env);

    // Disable telemetry
    ort->DisableTelemetryEvents(env);

    // Add the selected execution provider
    ort->CreateSessionOptions(&session_options);
    std::string provider_name = execution_provider;
    
    // Add the specified execution provider
    if (provider_name.find("CPU") != std::string::npos) {
        return_val = 1;
    }
    else if (provider_name.find("Dml") != std::string::npos) {
        ort->DisableMemPattern(session_options);
        ort->SetSessionExecutionMode(session_options, ExecutionMode::ORT_SEQUENTIAL);
        OrtSessionOptionsAppendExecutionProvider_DML(session_options, 0);
    }
    else return_val = 1;

    // Create a new inference session
    wchar_t* w_model_path = charToWChar(model_path);
    ort->CreateSession(env, w_model_path, session_options, &session);
    // Free the converted path and the session options once the session exists
    delete[] w_model_path;
    ort->ReleaseSessionOptions(session_options);

    Ort::AllocatorWithDefaultOptions allocator;

    // Get input and output names
    ort->SessionGetInputName(session, 0, allocator, &input_name);
    ort->SessionGetOutputName(session, 0, allocator, &output_name);

    // The dimensions of the source input image
    img_w = image_dims[0];
    img_h = image_dims[1];
    // Calculate new input dimensions based on the max stride value
    input_w = (int)(strides.back() * std::roundf(img_w / (float)strides.back()));
    input_h = (int)(strides.back() * std::roundf(img_h / (float)strides.back()));
    n_pixels = input_w * input_h;

    // Calculate the values used to adjust the model output to the source image resolution
    scale_x = input_w / (img_w * 1.0f);
    scale_y = input_h / (img_h * 1.0f);

    // Generate the grid and stride values based on input resolution
    GenerateGridsAndStride(input_h, input_w);

    // Replace the initial input dims with the updated values
    image_dims[0] = input_w;
    image_dims[1] = input_h;

    // Allocate memory for the raw input data (input_size is already in bytes)
    input_size = n_pixels * n_channels * (int)sizeof(float);
    input_data = (float*)malloc((size_t)input_size);
    if (input_data != NULL) memset(input_data, 0, input_size);

    // Return a value of 0 if the model loads successfully
    return return_val;
}
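
To make the calling convention concrete, here is a hypothetical call sequence; the model path, provider name, and source dimensions are illustrative, not values from the tutorial project.

// Illustrative call order (hypothetical values)
char model_path[] = "G:/models/yolox.onnx";
char provider[] = "Dml";
int dims[2] = { 1920, 1080 }; // source image resolution
InitOrtAPI();
int result = LoadModel(model_path, provider, dims);
// On success, result is 0 and dims now holds the adjusted model input
// resolution (e.g., 1920x1088 for a 1920x1080 source)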

Define method to generate object detection proposals from the raw model output

The method to generate object proposals is nearly identical to the Python implementation from the training tutorial.

/// <summary>
/// Generate object detection proposals from the raw model output
/// </summary>
/// <param name="out_ptr">A pointer to the output tensor data</param>
void GenerateYoloxProposals(float* out_ptr, int proposal_length)
{
    // Remove the proposals for the previous model output
    proposals.clear();

    // Obtain the number of classes the model was trained to detect
    int num_classes = proposal_length - 5;

    for (int anchor_idx = 0; anchor_idx < grid_strides.size(); anchor_idx++)
    {
        // Get the current grid and stride values
        int grid0 = grid_strides[anchor_idx].grid0;
        int grid1 = grid_strides[anchor_idx].grid1;
        int stride = grid_strides[anchor_idx].stride;

        // Get the starting index for the current proposal
        int start_idx = anchor_idx * proposal_length;

        // Get the coordinates for the center of the predicted bounding box
        float x_center = (out_ptr[start_idx + 0] + grid0) * stride;
        float y_center = (out_ptr[start_idx + 1] + grid1) * stride;

        // Get the dimensions for the predicted bounding box
        float w = exp(out_ptr[start_idx + 2]) * stride;
        float h = exp(out_ptr[start_idx + 3]) * stride;

        // Calculate the coordinates for the upper left corner of the bounding box
        float x0 = x_center - w * 0.5f;
        float y0 = y_center - h * 0.5f;

        // Get the confidence score that an object is present
        float box_objectness = out_ptr[start_idx + 4];

        // Initialize object struct with bounding box information
        Object obj = { x0, y0, w, h, 0, 0 };

        // Find the object class with the highest confidence score
        for (int class_idx = 0; class_idx < num_classes; class_idx++)
        {
            // Get the confidence score for the current object class
            float box_cls_score = out_ptr[start_idx + 5 + class_idx];
            // Calculate the final confidence score for the object proposal
            float box_prob = box_objectness * box_cls_score;

            // Check for the highest confidence score
            if (box_prob > obj.prob)
            {
                obj.label = class_idx;
                obj.prob = box_prob;
            }
        }

        // Only add object proposals with high enough confidence scores
        if (obj.prob > bbox_conf_thresh) proposals.push_back(obj);
    }

    // Sort the proposals based on the confidence score in descending order
    std::sort(proposals.begin(), proposals.end(), [](Object& a, Object& b) -> bool
              { return a.prob > b.prob; });
}

Define function to sort bounding box proposals using Non-Maximum Suppression

The C++ API for OpenCV has built-in functionality for comparison operations between rectangles, so we don’t need to define helper functions to calculate the intersection and union areas of two bounding boxes. Otherwise, the method to sort bounding box proposals using Non-Maximum Suppression is almost identical to the Python implementation from part 1 of the training tutorial series.
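
For illustration, here is a minimal example (not part of the plugin) of the OpenCV rectangle operations the method relies on:

// Two overlapping 10x10 boxes, offset by 5 pixels in each direction
cv::Rect_<float> rect_a(0.0f, 0.0f, 10.0f, 10.0f);
cv::Rect_<float> rect_b(5.0f, 5.0f, 10.0f, 10.0f);
// The & operator returns the intersection rectangle
float inter_area = (rect_a & rect_b).area();                   // 25.0
float union_area = rect_a.area() + rect_b.area() - inter_area; // 175.0
float iou = inter_area / union_area;                           // ~0.14, below the 0.45 nms_thresh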

/// <summary>
/// Filter through a sorted list of object proposals using Non-maximum suppression
/// </summary>
void NmsSortedBboxes()
{
    // Remove the picked proposals for the previous model output
    proposal_indices.clear();

    // Iterate through the object proposals
    for (int i = 0; i < proposals.size(); i++)
    {
        Object& a = proposals[i];

        // Create OpenCV rectangle for the Object bounding box
        cv::Rect_<float> rect_a = cv::Rect_<float>(a.x0, a.y0, a.width, a.height);

        bool keep = true;

        // Check if the current object proposal overlaps any selected objects too much
        for (int j : proposal_indices)
        {
            Object& b = proposals[j];

            // Create OpenCV rectangle for the Object bounding box
            cv::Rect_<float> rect_b = cv::Rect_<float>(b.x0, b.y0, b.width, b.height);

            // Calculate the area where the two object bounding boxes overlap
            float inter_area = (rect_a & rect_b).area();
            // Calculate the union area of both bounding boxes
            float union_area = rect_a.area() + rect_b.area() - inter_area;
            // Ignore object proposals that overlap selected objects too much
            if (inter_area / union_area > nms_thresh)
                keep = false;
        }

        // Keep object proposals that do not overlap selected objects too much
        if (keep) proposal_indices.push_back(i);
    }
}

Define a function to perform inference

We will access the pixel data for the input image from Unity with a pointer to a uchar (unsigned 1-byte integer) array and wrap the data in a cv::Mat variable for processing.

After processing the model output, we’ll return the final number of detected objects to Unity so we can initialize an array of objects.


/// <summary>
/// Perform inference with the provided texture data
/// </summary>
/// <param name="image_data">The source image data from Unity</param>
/// <returns>The final number of detected objects</returns>
DLLExport int PerformInference(uchar* image_data)
{
    // Store the pixel data for the source input image in an OpenCV Mat
    cv::Mat input_image = cv::Mat(img_h, img_w, CV_8UC4, image_data);
    // Remove the alpha channel
    cv::cvtColor(input_image, input_image, cv::COLOR_RGBA2RGB);
    // Resize the image to the model input dimensions
    cv::resize(input_image, input_image, cv::Size(input_w, input_h));

    // Iterate over each pixel in image
    for (int p = 0; p < n_pixels; p++)
    {
        for (int ch = 0; ch < n_channels; ch++) {
            // Scale and normalize each value
            input_data[ch * n_pixels + p] = ((input_image.data[p * n_channels + ch] / 255.0f) - mean[ch]) / std_dev[ch];
        }
    }

    // Initialize list of input and output names
    const char* input_names[] = { input_name };
    const char* output_names[] = { output_name };
    // Initialize the list of model input dimensions
    int64_t input_shape[] = { 1, 3, input_h, input_w };
    int input_shape_len = sizeof(input_shape) / sizeof(input_shape[0]);

    // Initialize an input tensor object with the input_data
    OrtMemoryInfo* memory_info;
    ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &memory_info);

    OrtValue* input_tensor = NULL;
    ort->CreateTensorWithDataAsOrtValue(memory_info, input_data, input_size, input_shape,
                                        input_shape_len, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT,
                                        &input_tensor);

    ort->ReleaseMemoryInfo(memory_info);


    OrtValue* output_tensor = NULL;
    // Perform inference
    ort->Run(session, NULL, input_names, (const OrtValue* const*)&input_tensor, 1, output_names, 1,
             &output_tensor);

    // Make sure the output tensor is not NULL to avoid potential crashes
    if (output_tensor == NULL) {
        ort->ReleaseValue(input_tensor);
        return -1;
    }

    // Get the output tensor dimensions (the last dimension is the length of a
    // single object proposal, i.e., the number of object classes + 5)
    OrtTensorTypeAndShapeInfo* output_tensor_info;
    ort->GetTensorTypeAndShape(output_tensor, &output_tensor_info);
    size_t output_dim_count = 0;
    ort->GetDimensionsCount(output_tensor_info, &output_dim_count);
    int64_t output_dims[3] = {};
    ort->GetDimensions(output_tensor_info, output_dims, output_dim_count);
    ort->ReleaseTensorTypeAndShapeInfo(output_tensor_info);

    // Access model output
    float* out_data;
    ort->GetTensorMutableData(output_tensor, (void**)&out_data);

    // Generate new proposals for the current model output
    GenerateYoloxProposals(out_data, output_dims[2]);

    // Pick detected objects to keep using Non-maximum Suppression
    NmsSortedBboxes();

    // Free memory for input and output tensors
    ort->ReleaseValue(input_tensor);
    ort->ReleaseValue(output_tensor);

    // return the final number of detected objects
    return (int)proposal_indices.size();
}

Define a function to populate an array of objects from Unity

Next, we’ll define a function to populate an array of objects from Unity. We call this function after initializing the array based on the current number of detected objects. We’ll also scale the bounding box information from the input dimensions to the source image resolution.

/// <summary>
/// Fill the provided array with the detected objects
/// </summary>
/// <param name="objects">A pointer to a list of objects from Unity</param>
DLLExport void PopulateObjectsArray(Object* objects) 
{

    for (int i = 0; i < proposal_indices.size(); i++)
    {
        Object obj = proposals[proposal_indices[i]];

        // Adjust offset to source image resolution and clamp the bounding box
        objects[i].x0 = std::min(obj.x0 / scale_x, (float)img_w);
        objects[i].y0 = std::min(obj.y0 / scale_y, (float)img_h);
        objects[i].width = std::min(obj.width / scale_x, (float)img_w);
        objects[i].height = std::min(obj.height / scale_y, (float)img_h);

        objects[i].label = obj.label;
        objects[i].prob = obj.prob;
    }
}

Define a function to clear memory when the Unity application exits

This last function will free the memory allocated by the plugin. We’ll call it when the Unity application shuts down.

/// <summary>
/// Free memory
/// </summary>
DLLExport void FreeResources()
{
    grid_strides.clear();
    proposals.clear();
    proposal_indices.clear();

    free(input_data);
    ort->ReleaseSession(session);
    ort->ReleaseEnv(env);
}

That is all the code needed for the plugin. We can now build the solution to generate the DLL file.

Build Solution

Open the Build menu at the top of the Visual Studio window and click Build Solution. Visual Studio will generate a new x64 folder in the project directory containing the DLL file and its dependencies.
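
If you prefer building from the command line, the equivalent MSBuild invocation from a Developer Command Prompt should look something like the following, replacing <project-name> with the name of your solution.

  • msbuild <project-name>.sln /p:Configuration=Release /p:Platform=x64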

Gather Dependencies

Right-click the project name in the Solution Explorer panel and select Open Folder in File Explorer from the popup menu.

In the new File Explorer window, go to the parent folder.

Open the x64 → Release subfolder.

We’ll need to copy all the DLL files in this folder to the Unity project.

Summary

In this post, we created a dynamic link library (DLL) file in Visual Studio to perform object detection using ONNX Runtime and DirectML. We started by downloading OpenCV and creating a DLL project. Then, we configured the project by adding include directories, linking libraries, and setting up post-build events. After that, we installed ONNX Runtime and updated the precompiled header and dllmain files. Finally, we built the project to generate the DLL file. In Part 2, we will cover how to integrate this DLL file into a Unity project and perform real-time object detection.

Training Tutorial: End-to-End Object Detection for Unity With IceVision and OpenVINO Pt. 1

Next: Object Detection for Unity With ONNX Runtime and DirectML Pt. 2

Project Resources: GitHub Repository