How to Create an OpenVINO Plugin for Unity on Windows Pt. 1

fastai
openvino
unity
tutorial
Modify the training code from the fastai-to-unity tutorial to export the model to OpenVINO.
Author

Christian Mills

Published

July 17, 2022

Introduction

This tutorial is a follow-up to the fastai-to-unity tutorial series and covers using OpenVINO, an open-source toolkit for optimizing model inference, instead of Unity’s Barracuda library. OpenVINO enables significantly faster CPU inference than Barracuda and supports more model types. It also supports GPU inference for integrated and discrete Intel GPUs and will be able to leverage the AI hardware acceleration available in Intel’s upcoming ARC GPUs.

We’ll modify the original tutorial code and create a dynamic link library (DLL) file to access the OpenVINO functionality in Unity.

Overview

This post covers the required modifications to the original training code. We’ll finetune models from the Timm library on the same ASL dataset as the original tutorial, just like in this previous follow-up. Below is a link to the complete modified training code, along with links for running the notebook on Google Colab and Kaggle.

GitHub Repository: Jupyter Notebook
Colab: Open in Colab
Kaggle: Open in Kaggle

Install Dependencies

The pip package for the Timm library is generally more stable than the GitHub repository but may have fewer model types and pretrained weights. However, the latest pip version had some issues running the MobileNetV3 models at the time of writing. Downgrade to version 0.5.4 to use those models.

Recent updates to the fastai library resolve some performance issues with PyTorch so let’s update that too.

We need to install the openvino-dev pip package to convert trained models to OpenVINO’s Intermediate Representation (IR) format.

Uncomment the cell below if running on Google Colab or Kaggle

# %%capture
# !pip3 install -U torch torchvision torchaudio
# !pip3 install -U fastai==2.7.6
# !pip3 install -U kaggle==1.5.12
# !pip3 install -U Pillow==9.1.0
# !pip3 install -U timm==0.6.5 # more stable, fewer models
# # !pip3 install -U git+https://github.com/rwightman/pytorch-image-models.git # more models, less stable
# !pip3 install openvino-dev==2022.1.0 

Note for Colab: You must restart the runtime in order to use the newly installed version of Pillow.
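If you would rather restart the runtime programmatically, a common Colab workaround is to terminate the current process, which prompts the runtime to restart on its own. This is a convenience sketch rather than part of the original notebook:

# Force the Colab runtime to restart by terminating the current process
# (run only after the installs above have completed)
import os
os.kill(os.getpid(), 9)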

Import all fastai computer vision functionality

from fastai.vision.all import *
import fastai
fastai.__version__
'2.7.6'

Disable max rows and columns for pandas

import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

Select a Model

Let’s start by selecting a model from the Timm library to finetune. The available pretrained models depend on the version of the Timm library installed.

Import the Timm library

import timm
timm.__version__
'0.6.5'

Check available pretrained model types

We can check which model types have pretrained weights using the timm.list_models() function.

model_types = list(set([model.split('_')[0] for model in timm.list_models(pretrained=True)]))
model_types.sort()
pd.DataFrame(model_types)
0
0 adv
1 bat
2 beit
3 botnet26t
4 cait
5 coat
6 convit
7 convmixer
8 convnext
9 crossvit
10 cs3darknet
11 cspdarknet53
12 cspresnet50
13 cspresnext50
14 darknet53
15 deit
16 deit3
17 densenet121
18 densenet161
19 densenet169
20 densenet201
21 densenetblur121d
22 dla102
23 dla102x
24 dla102x2
25 dla169
26 dla34
27 dla46
28 dla46x
29 dla60
30 dla60x
31 dm
32 dpn107
33 dpn131
34 dpn68
35 dpn68b
36 dpn92
37 dpn98
38 eca
39 ecaresnet101d
40 ecaresnet269d
41 ecaresnet26t
42 ecaresnet50d
43 ecaresnet50t
44 ecaresnetlight
45 edgenext
46 efficientnet
47 efficientnetv2
48 ens
49 ese
50 fbnetc
51 fbnetv3
52 gc
53 gcresnet33ts
54 gcresnet50t
55 gcresnext26ts
56 gcresnext50ts
57 gernet
58 ghostnet
59 gluon
60 gmixer
61 gmlp
62 halo2botnet50ts
63 halonet26t
64 halonet50ts
65 haloregnetz
66 hardcorenas
67 hrnet
68 ig
69 inception
70 jx
71 lambda
72 lamhalobotnet50ts
73 lcnet
74 legacy
75 levit
76 mixer
77 mixnet
78 mnasnet
79 mobilenetv2
80 mobilenetv3
81 mobilevit
82 mobilevitv2
83 nasnetalarge
84 nf
85 nfnet
86 pit
87 pnasnet5large
88 poolformer
89 regnetv
90 regnetx
91 regnety
92 regnetz
93 repvgg
94 res2net101
95 res2net50
96 res2next50
97 resmlp
98 resnest101e
99 resnest14d
100 resnest200e
101 resnest269e
102 resnest26d
103 resnest50d
104 resnet101
105 resnet101d
106 resnet10t
107 resnet14t
108 resnet152
109 resnet152d
110 resnet18
111 resnet18d
112 resnet200d
113 resnet26
114 resnet26d
115 resnet26t
116 resnet32ts
117 resnet33ts
118 resnet34
119 resnet34d
120 resnet50
121 resnet50d
122 resnet51q
123 resnet61q
124 resnetaa50
125 resnetblur50
126 resnetrs101
127 resnetrs152
128 resnetrs200
129 resnetrs270
130 resnetrs350
131 resnetrs420
132 resnetrs50
133 resnetv2
134 resnext101
135 resnext26ts
136 resnext50
137 resnext50d
138 rexnet
139 sebotnet33ts
140 sehalonet33ts
141 selecsls42b
142 selecsls60
143 selecsls60b
144 semnasnet
145 sequencer2d
146 seresnet152d
147 seresnet33ts
148 seresnet50
149 seresnext101
150 seresnext101d
151 seresnext26d
152 seresnext26t
153 seresnext26ts
154 seresnext50
155 seresnextaa101d
156 skresnet18
157 skresnet34
158 skresnext50
159 spnasnet
160 ssl
161 swin
162 swinv2
163 swsl
164 tf
165 tinynet
166 tnt
167 tresnet
168 tv
169 twins
170 vgg11
171 vgg13
172 vgg16
173 vgg19
174 visformer
175 vit
176 volo
177 wide
178 xception
179 xception41
180 xception41p
181 xception65
182 xception65p
183 xception71
184 xcit

Timm provides many pretrained models, but not all of them are fast enough for real-time applications. We can filter the results by providing a full or partial model name.

Check available pretrained ConvNeXt models

pd.DataFrame(timm.list_models('convnext*', pretrained=True))
0
0 convnext_base
1 convnext_base_384_in22ft1k
2 convnext_base_in22ft1k
3 convnext_base_in22k
4 convnext_large
5 convnext_large_384_in22ft1k
6 convnext_large_in22ft1k
7 convnext_large_in22k
8 convnext_small
9 convnext_small_384_in22ft1k
10 convnext_small_in22ft1k
11 convnext_small_in22k
12 convnext_tiny
13 convnext_tiny_384_in22ft1k
14 convnext_tiny_hnf
15 convnext_tiny_in22ft1k
16 convnext_tiny_in22k
17 convnext_xlarge_384_in22ft1k
18 convnext_xlarge_in22ft1k
19 convnext_xlarge_in22k

Let’s go with the convnext_tiny model since we want higher framerates. Each model comes with a set of default configuration parameters. We must keep track of the mean and std values used to normalize the model input.

Inspect the default configuration for the convnext_tiny model

from timm.models import convnext
convnext_model = 'convnext_tiny'
pd.DataFrame.from_dict(convnext.default_cfgs[convnext_model], orient='index')
0
url https://dl.fbaipublicfiles.com/convnext/convnext_tiny_1k_224_ema.pth
num_classes 1000
input_size (3, 224, 224)
pool_size (7, 7)
crop_pct 0.875
interpolation bicubic
mean (0.485, 0.456, 0.406)
std (0.229, 0.224, 0.225)
first_conv stem.0
classifier head.fc

Check available pretrained MobileNetV2 models

pd.DataFrame(timm.list_models('mobilenetv2*', pretrained=True))
0
0 mobilenetv2_050
1 mobilenetv2_100
2 mobilenetv2_110d
3 mobilenetv2_120d
4 mobilenetv2_140

Inspect the default configuration for the mobilenetv2_100 model

from timm.models import efficientnet
mobilenetv2_model = 'mobilenetv2_100'
pd.DataFrame.from_dict(efficientnet.default_cfgs[mobilenetv2_model], orient='index')
0
url https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/mobilenetv2_100_ra-b33bc2c4.pth
num_classes 1000
input_size (3, 224, 224)
pool_size (7, 7)
crop_pct 0.875
interpolation bicubic
mean (0.485, 0.456, 0.406)
std (0.229, 0.224, 0.225)
first_conv conv_stem
classifier classifier

Check available pretrained ResNet models

pd.DataFrame(timm.list_models('resnet*', pretrained=True))
0
0 resnet10t
1 resnet14t
2 resnet18
3 resnet18d
4 resnet26
5 resnet26d
6 resnet26t
7 resnet32ts
8 resnet33ts
9 resnet34
10 resnet34d
11 resnet50
12 resnet50_gn
13 resnet50d
14 resnet51q
15 resnet61q
16 resnet101
17 resnet101d
18 resnet152
19 resnet152d
20 resnet200d
21 resnetaa50
22 resnetblur50
23 resnetrs50
24 resnetrs101
25 resnetrs152
26 resnetrs200
27 resnetrs270
28 resnetrs350
29 resnetrs420
30 resnetv2_50
31 resnetv2_50d_evos
32 resnetv2_50d_gn
33 resnetv2_50x1_bit_distilled
34 resnetv2_50x1_bitm
35 resnetv2_50x1_bitm_in21k
36 resnetv2_50x3_bitm
37 resnetv2_50x3_bitm_in21k
38 resnetv2_101
39 resnetv2_101x1_bitm
40 resnetv2_101x1_bitm_in21k
41 resnetv2_101x3_bitm
42 resnetv2_101x3_bitm_in21k
43 resnetv2_152x2_bit_teacher
44 resnetv2_152x2_bit_teacher_384
45 resnetv2_152x2_bitm
46 resnetv2_152x2_bitm_in21k
47 resnetv2_152x4_bitm
48 resnetv2_152x4_bitm_in21k

Inspect the default configuration for the resnet10t model

from timm.models import resnet
resnet_model = 'resnet10t'
pd.DataFrame.from_dict(resnet.default_cfgs[resnet_model], orient='index')
0
url https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet10t_176_c3-f3215ab1.pth
num_classes 1000
input_size (3, 176, 176)
pool_size (6, 6)
crop_pct 0.875
interpolation bilinear
mean (0.485, 0.456, 0.406)
std (0.229, 0.224, 0.225)
first_conv conv1.0
classifier fc
test_crop_pct 0.95
test_input_size (3, 224, 224)

Select a model

# model_type = convnext
# model_name = convnext_model
# model_type = efficientnet
# model_name = mobilenetv2_model
model_type = resnet
model_name = resnet_model

Store normalization stats

mean = model_type.default_cfgs[model_name]['mean']
std = model_type.default_cfgs[model_name]['std']
mean, std
((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))

Modify Transforms

We can apply the normalization stats at the end of the batch transforms.

item_tfms = [FlipItem(p=1.0), Resize(input_dims, method=ResizeMethod.Pad, pad_mode=PadMode.Border)]

batch_tfms = [
    Contrast(max_lighting=0.25),
    Saturation(max_lighting=0.25),
    Hue(max_hue=0.05),
    *aug_transforms(
        size=input_dims, 
        mult=1.0,
        do_flip=False,
        flip_vert=False,
        max_rotate=0.0,
        min_zoom=0.5,
        max_zoom=1.5,
        max_lighting=0.5,
        max_warp=0.2, 
        p_affine=0.0,
        pad_mode=PadMode.Border),
    Normalize.from_stats(mean=mean, std=std)
]

Define Learner

The training process is identical to the original tutorial; we only need to pass the name of the Timm model to the vision_learner function.

learn = vision_learner(dls, model_name, metrics=metrics).to_fp16()
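For reference, below is a minimal sketch of the fine-tuning step. The epoch count is a placeholder, and the dls DataLoaders and metrics are assumed to be defined as in the original tutorial; follow that tutorial's training schedule for real runs.

# Suggest a learning rate, then fine-tune the pretrained Timm model
# (placeholder epoch count; use the original tutorial's settings)
suggested_lr = learn.lr_find().valley
learn.fine_tune(4, base_lr=suggested_lr)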

Export the Model

The OpenVINO model conversion script does not support PyTorch models, so we need to export the trained model to ONNX. We can then convert the ONNX model to OpenVINO’s IR format.

Define ONNX file name

onnx_file_name = f"{dataset_path.name}-{learn.arch}.onnx"
onnx_file_name
'asl-and-some-words-resnet10t.onnx'

Export trained model to ONNX

torch.onnx.export(learn.model.cpu(),
                  batched_tensor,
                  onnx_file_name,
                  export_params=True,
                  opset_version=11,
                  do_constant_folding=False,
                  input_names = ['input'],
                  output_names = ['output'],
                  dynamic_axes={'input': {2 : 'height', 3 : 'width'}}
                 )
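The batched_tensor above comes from the image preparation steps in the original tutorial. If it is not in scope, a dummy tensor with the training resolution works just as well for tracing; a quick sketch, assuming input_dims holds the (height, width) pair used during training:

# Only the shape of the sample input matters for tracing the model
dummy_input = torch.randn(1, 3, *input_dims)
torch.onnx.export(learn.model.cpu(), dummy_input, onnx_file_name,
                  export_params=True, opset_version=11,
                  input_names=['input'], output_names=['output'],
                  dynamic_axes={'input': {2: 'height', 3: 'width'}})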

Now we can define the arguments for OpenVINO's model conversion script.

Import OpenVINO Dependencies

from IPython.display import Markdown, display
from openvino.runtime import Core

Define export directory

output_dir = Path('./')
output_dir
Path('.')

Define path for OpenVINO IR xml model file

The conversion script generates an XML file describing the model architecture and a BIN file storing the trained weights. We need both files to perform inference. OpenVINO gives the BIN file the same base name as the XML file.

ir_path = Path(f"{onnx_file_name.split('.')[0]}.xml")
ir_path
Path('asl-and-some-words-resnet10t.xml')

Define arguments for model conversion script

OpenVINO provides the option to include the normalization stats in the IR model. That way, we don’t need to account for different normalization stats when performing inference with multiple models. We can also convert the model to FP16 precision to reduce file size and improve inference speed.

# Construct the command for Model Optimizer
mo_command = f"""mo
                 --input_model "{onnx_file_name}"
                 --input_shape "[1,3, {input_dims[0]}, {input_dims[1]}]"
                 --mean_values="{mean}"
                 --scale_values="{std}"
                 --data_type FP16
                 --output_dir "{output_dir}"
                 """
mo_command = " ".join(mo_command.split())
print("Model Optimizer command to convert the ONNX model to OpenVINO:")
display(Markdown(f"`{mo_command}`"))
Model Optimizer command to convert the ONNX model to OpenVINO:
mo --input_model "asl-and-some-words-resnet10t.onnx" --input_shape "[1,3, 216, 384]" --mean_values="(0.485, 0.456, 0.406)" --scale_values="(0.229, 0.224, 0.225)" --data_type FP16 --output_dir "."

Convert ONNX model to OpenVINO IR

if not ir_path.exists():
    print("Exporting ONNX model to IR... This may take a few minutes.")
    mo_result = %sx $mo_command
    print("\n".join(mo_result))
else:
    print(f"IR model {ir_path} already exists.")
Exporting ONNX model to IR... This may take a few minutes.
Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:  /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.onnx
    - Path for generated IR:    /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/.
    - IR output name:   asl-and-some-words-resnet10t
    - Log level:    ERROR
    - Batch:    Not specified, inherited from the model
    - Input layers:     Not specified, inherited from the model
    - Output layers:    Not specified, inherited from the model
    - Input shapes:     [1,3, 216, 384]
    - Source layout:    Not specified
    - Target layout:    Not specified
    - Layout:   Not specified
    - Mean values:  (0.485, 0.456, 0.406)
    - Scale values:     (0.229, 0.224, 0.225)
    - Scale factor:     Not specified
    - Precision of IR:  FP16
    - Enable fusing:    True
    - User transformations:     Not specified
    - Reverse input channels:   False
    - Enable IR generation for fixed input shape:   False
    - Use the transformations config file:  None
Advanced parameters:
    - Force the usage of legacy Frontend of Model Optimizer for model conversion into IR:   False
    - Force the usage of new Frontend of Model Optimizer for model conversion into IR:  False
OpenVINO runtime found in:  /home/innom-dt/mambaforge/envs/fastai-openvino/lib/python3.9/site-packages/openvino
OpenVINO runtime version:   2022.1.0-7019-cdb9bec7210-releases/2022/1
Model Optimizer version:    2022.1.0-7019-cdb9bec7210-releases/2022/1
[ WARNING ]
Detected not satisfied dependencies:
    numpy: installed: 1.23.0, required: < 1.20

Please install required versions of components or run pip installation
pip install openvino-dev
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.xml
[ SUCCESS ] BIN file: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.bin
[ SUCCESS ] Total execution time: 0.43 seconds.
[ SUCCESS ] Memory consumed: 123 MB.
It's been a while, check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html?cid=other&source=prod&campid=ww_2022_bu_IOTG_OpenVINO-2022-1&content=upg_all&medium=organic or on the GitHub*
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai

Benchmark OpenVINO Inference

Now we can compare inference speed between OpenVINO and PyTorch. OpenVINO supports inference with ONNX models in addition to its IR format.

Get available OpenVINO compute devices

OpenVINO does not support GPU inference with non-Intel GPUs.

# Initialize the OpenVINO Runtime Core to query the available devices
ie = Core()

devices = ie.available_devices
for device in devices:
    device_name = ie.get_property(device_name=device, name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")
CPU: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
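If an Intel GPU appeared in this list, targeting it would only require changing the device name when compiling the model. Below is a minimal sketch, assuming the Intel GPU drivers and OpenVINO's GPU plugin are installed:

# Compile the IR model for an integrated or discrete Intel GPU, if one is available
if "GPU" in ie.available_devices:
    compiled_model_gpu = ie.compile_model(model=ie.read_model(model=ir_path), device_name="GPU")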

Create normalized input for ONNX model

normalized_input_image = batched_tensor.cpu().detach().numpy()
normalized_input_image.shape
(1, 3, 224, 224)

Test ONNX model using OpenVINO

# Load network to Inference Engine
ie = Core()
model_onnx = ie.read_model(model=onnx_file_name)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")

input_layer_onnx = next(iter(compiled_model_onnx.inputs))
output_layer_onnx = next(iter(compiled_model_onnx.outputs))

# Run inference on the input image
res_onnx = compiled_model_onnx(inputs=[normalized_input_image])[output_layer_onnx]
learn.dls.vocab[np.argmax(res_onnx)]
'J'

Benchmark ONNX model CPU inference speed

%%timeit
compiled_model_onnx(inputs=[normalized_input_image])[output_layer_onnx]
3.62 ms ± 61.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Prepare input image for OpenVINO IR model

input_image = scaled_tensor.unsqueeze(dim=0)
input_image.shape
torch.Size([1, 3, 224, 224])

Test OpenVINO IR model

# Load the network in Inference Engine
ie = Core()
model_ir = ie.read_model(model=ir_path)
model_ir.reshape(input_image.shape)
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get input and output layers
input_layer_ir = next(iter(compiled_model_ir.inputs))
output_layer_ir = next(iter(compiled_model_ir.outputs))

# Run inference on the input image
res_ir = compiled_model_ir([input_image])[output_layer_ir]
learn.dls.vocab[np.argmax(res_ir)]
'J'

Benchmark OpenVINO IR model CPU inference speed

%%timeit
compiled_model_ir([input_image])[output_layer_ir]
3.39 ms ± 84.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note: The IR model is slightly faster than the ONNX model and half the file size.
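We can confirm the size difference directly from the notebook; a quick sketch, assuming onnx_file_name and ir_path are still in scope:

# Compare the on-disk size of the FP32 ONNX model with the FP16 IR model (XML + BIN)
onnx_size_mb = Path(onnx_file_name).stat().st_size / 1e6
ir_size_mb = (ir_path.stat().st_size + ir_path.with_suffix('.bin').stat().st_size) / 1e6
print(f"ONNX: {onnx_size_mb:.2f} MB  |  OpenVINO IR: {ir_size_mb:.2f} MB")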

Benchmark PyTorch model GPU inference speed

%%timeit
with torch.no_grad(): preds = learn.model.cuda()(batched_tensor.cuda())
1.81 ms ± 5.52 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

PyTorch inference with a Titan RTX is still faster than OpenVINO inference with an i7-11700K for a ResNet10 model. However, OpenVINO CPU inference is often faster when using models optimized for mobile devices, like MobileNet.
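To sanity-check that claim without retraining anything, you could export a pretrained MobileNetV2 from timm to ONNX and time it the same way as the models above. This is a rough sketch under those assumptions, not a benchmark from the original post:

# Export a pretrained MobileNetV2 to ONNX purely for a CPU speed comparison
import timm, torch
mbv2 = timm.create_model('mobilenetv2_100', pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(mbv2, dummy, 'mobilenetv2_100.onnx', opset_version=11)

# Compile the ONNX model with OpenVINO for CPU inference
compiled_mbv2 = ie.compile_model(model=ie.read_model(model='mobilenetv2_100.onnx'), device_name="CPU")
output_layer_mbv2 = next(iter(compiled_mbv2.outputs))

Then time it with the same %%timeit pattern used above:

%%timeit
compiled_mbv2([dummy.numpy()])[output_layer_mbv2]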

Benchmark PyTorch model CPU inference speed

%%timeit
with torch.no_grad(): preds = learn.model.cpu()(batched_tensor.cpu())
8.94 ms ± 52.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

OpenVINO CPU inference is well over twice as fast as PyTorch CPU inference for this model (3.39 ms vs. 8.94 ms).

Summary

This post covered how to modify the training code from the fastai-to-unity tutorial to finetune models from the Timm library and export them as OpenVINO IR models. Part 2 will cover creating a dynamic link library (DLL) file in Visual Studio to perform inference with these models using OpenVINO.

Previous: Fastai to Unity Tutorial Pt. 3

Next: How to Create an OpenVINO Plugin for Unity on Windows Pt. 2

Project Resources: GitHub Repository