How to Create an OpenVINO Plugin for Unity on Windows Pt. 1

fastai
openvino
unity
tutorial
Modify the training code from the fastai-to-unity tutorial to export the model to OpenVINO.
Author

Christian Mills

Published

July 17, 2022

Introduction

This tutorial is a follow-up to the fastai-to-unity tutorial series and covers using OpenVINO, an open-source toolkit for optimizing model inference, instead of Unity’s Barracuda library. OpenVINO enables significantly faster CPU inference than Barracuda and supports more model types. It also supports GPU inference for integrated and discrete Intel GPUs and will be able to leverage the AI hardware acceleration available in Intel’s upcoming ARC GPUs.

We’ll modify the original tutorial code and create a dynamic link library (DLL) file to access the OpenVINO functionality in Unity.

Overview

This post covers the required modifications to the original training code. We’ll finetune models from the Timm library on the same ASL dataset as the original tutorial, just like in this previous follow-up. Below is a link to the complete modified training code, along with links for running the notebook on Google Colab and Kaggle.

GitHub Repository: Jupyter Notebook
Colab: Open in Colab
Kaggle: Open in Kaggle

Install Dependencies

The pip package for the Timm library is generally more stable than the GitHub repository but may have fewer model types and pretrained weights. However, the latest pip version had some issues running the MobileNetV3 models at the time of writing. Downgrade to version 0.5.4 to use those models.

Recent updates to the fastai library resolve some performance issues with PyTorch so let’s update that too.

We need to install the openvino-dev pip package to convert trained models to OpenVINO’s Intermediate Representation (IR) format.

Uncomment the cell below if running on Google Colab or Kaggle

# %%capture
# !pip3 install -U torch torchvision torchaudio
# !pip3 install -U fastai==2.7.6
# !pip3 install -U kaggle==1.5.12
# !pip3 install -U Pillow==9.1.0
# !pip3 install -U timm==0.6.5 # more stable, fewer models
# # !pip3 install -U git+https://github.com/rwightman/pytorch-image-models.git # more models, less stable
# !pip3 install openvino-dev==2022.1.0 

Note for Colab: You must restart the runtime in order to use the newly installed version of Pillow.
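If you would rather restart the runtime programmatically, a common Colab workaround is to terminate the current process, which prompts the runtime to restart on its own. This is a convenience sketch rather than part of the original notebook:

# Force the Colab runtime to restart by terminating the current process
# (run only after the installs above have completed)
import os
os.kill(os.getpid(), 9)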

Import all fastai computer vision functionality

from fastai.vision.all import *
import fastai
fastai.__version__
'2.7.6'

Disable max rows and columns for pandas

import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

Select a Model

Let’s start by selecting a model from the Timm library to finetune. The available pretrained models depend on the version of the Timm library installed.

Import the Timm library

import timm
timm.__version__
'0.6.5'

Check available pretrained model types

We can check which model types have pretrained weights using the timm.list_models() function.

model_types = list(set([model.split('_')[0] for model in timm.list_models(pretrained=True)]))
model_types.sort()
pd.DataFrame(model_types)
0
0 adv
1 bat
2 beit
3 botnet26t
4 cait
5 coat
6 convit
7 convmixer
8 convnext
9 crossvit
10 cs3darknet
11 cspdarknet53
12 cspresnet50
13 cspresnext50
14 darknet53
15 deit
16 deit3
17 densenet121
18 densenet161
19 densenet169
20 densenet201
21 densenetblur121d
22 dla102
23 dla102x
24 dla102x2
25 dla169
26 dla34
27 dla46
28 dla46x
29 dla60
30 dla60x
31 dm
32 dpn107
33 dpn131
34 dpn68
35 dpn68b
36 dpn92
37 dpn98
38 eca
39 ecaresnet101d
40 ecaresnet269d
41 ecaresnet26t
42 ecaresnet50d
43 ecaresnet50t
44 ecaresnetlight
45 edgenext
46 efficientnet
47 efficientnetv2
48 ens
49 ese
50 fbnetc
51 fbnetv3
52 gc
53 gcresnet33ts
54 gcresnet50t
55 gcresnext26ts
56 gcresnext50ts
57 gernet
58 ghostnet
59 gluon
60 gmixer
61 gmlp
62 halo2botnet50ts
63 halonet26t
64 halonet50ts
65 haloregnetz
66 hardcorenas
67 hrnet
68 ig
69 inception
70 jx
71 lambda
72 lamhalobotnet50ts
73 lcnet
74 legacy
75 levit
76 mixer
77 mixnet
78 mnasnet
79 mobilenetv2
80 mobilenetv3
81 mobilevit
82 mobilevitv2
83 nasnetalarge
84 nf
85 nfnet
86 pit
87 pnasnet5large
88 poolformer
89 regnetv
90 regnetx
91 regnety
92 regnetz
93 repvgg
94 res2net101
95 res2net50
96 res2next50
97 resmlp
98 resnest101e
99 resnest14d
100 resnest200e
101 resnest269e
102 resnest26d
103 resnest50d
104 resnet101
105 resnet101d
106 resnet10t
107 resnet14t
108 resnet152
109 resnet152d
110 resnet18
111 resnet18d
112 resnet200d
113 resnet26
114 resnet26d
115 resnet26t
116 resnet32ts
117 resnet33ts
118 resnet34
119 resnet34d
120 resnet50
121 resnet50d
122 resnet51q
123 resnet61q
124 resnetaa50
125 resnetblur50
126 resnetrs101
127 resnetrs152
128 resnetrs200
129 resnetrs270
130 resnetrs350
131 resnetrs420
132 resnetrs50
133 resnetv2
134 resnext101
135 resnext26ts
136 resnext50
137 resnext50d
138 rexnet
139 sebotnet33ts
140 sehalonet33ts
141 selecsls42b
142 selecsls60
143 selecsls60b
144 semnasnet
145 sequencer2d
146 seresnet152d
147 seresnet33ts
148 seresnet50
149 seresnext101
150 seresnext101d
151 seresnext26d
152 seresnext26t
153 seresnext26ts
154 seresnext50
155 seresnextaa101d
156 skresnet18
157 skresnet34
158 skresnext50
159 spnasnet
160 ssl
161 swin
162 swinv2
163 swsl
164 tf
165 tinynet
166 tnt
167 tresnet
168 tv
169 twins
170 vgg11
171 vgg13
172 vgg16
173 vgg19
174 visformer
175 vit
176 volo
177 wide
178 xception
179 xception41
180 xception41p
181 xception65
182 xception65p
183 xception71
184 xcit

Timm provides many pretrained models, but not all of them are fast enough for real-time applications. We can filter the results by providing a full or partial model name.

Check available pretrained ConvNeXt models

pd.DataFrame(timm.list_models('convnext*', pretrained=True))
0
0 convnext_base
1 convnext_base_384_in22ft1k
2 convnext_base_in22ft1k
3 convnext_base_in22k
4 convnext_large
5 convnext_large_384_in22ft1k
6 convnext_large_in22ft1k
7 convnext_large_in22k
8 convnext_small
9 convnext_small_384_in22ft1k
10 convnext_small_in22ft1k
11 convnext_small_in22k
12 convnext_tiny
13 convnext_tiny_384_in22ft1k
14 convnext_tiny_hnf
15 convnext_tiny_in22ft1k
16 convnext_tiny_in22k
17 convnext_xlarge_384_in22ft1k
18 convnext_xlarge_in22ft1k
19 convnext_xlarge_in22k

Let’s go with the convnext_tiny model since we want higher framerates. Each model comes with a set of default configuration parameters. We must keep track of the mean and std values used to normalize the model input.

Inspect the default configuration for the convnext_tiny model

from timm.models import convnext
convnext_model = 'convnext_tiny'
pd.DataFrame.from_dict(convnext.default_cfgs[convnext_model], orient='index')
0
url https://dl.fbaipublicfiles.com/convnext/convnext_tiny_1k_224_ema.pth
num_classes 1000
input_size (3, 224, 224)
pool_size (7, 7)
crop_pct 0.875
interpolation bicubic
mean (0.485, 0.456, 0.406)
std (0.229, 0.224, 0.225)
first_conv stem.0
classifier head.fc

Check available pretrained MobileNetV2 models

pd.DataFrame(timm.list_models('mobilenetv2*', pretrained=True))
0
0 mobilenetv2_050
1 mobilenetv2_100
2 mobilenetv2_110d
3 mobilenetv2_120d
4 mobilenetv2_140

Inspect the default configuration for the mobilenetv2_100 model

from timm.models import efficientnet
mobilenetv2_model = 'mobilenetv2_100'
pd.DataFrame.from_dict(efficientnet.default_cfgs[mobilenetv2_model], orient='index')
0
url https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/mobilenetv2_100_ra-b33bc2c4.pth
num_classes 1000
input_size (3, 224, 224)
pool_size (7, 7)
crop_pct 0.875
interpolation bicubic
mean (0.485, 0.456, 0.406)
std (0.229, 0.224, 0.225)
first_conv conv_stem
classifier classifier

Check available pretrained ResNet models

pd.DataFrame(timm.list_models('resnet*', pretrained=True))
0
0 resnet10t
1 resnet14t
2 resnet18
3 resnet18d
4 resnet26
5 resnet26d
6 resnet26t
7 resnet32ts
8 resnet33ts
9 resnet34
10 resnet34d
11 resnet50
12 resnet50_gn
13 resnet50d
14 resnet51q
15 resnet61q
16 resnet101
17 resnet101d
18 resnet152
19 resnet152d
20 resnet200d
21 resnetaa50
22 resnetblur50
23 resnetrs50
24 resnetrs101
25 resnetrs152
26 resnetrs200
27 resnetrs270
28 resnetrs350
29 resnetrs420
30 resnetv2_50
31 resnetv2_50d_evos
32 resnetv2_50d_gn
33 resnetv2_50x1_bit_distilled
34 resnetv2_50x1_bitm
35 resnetv2_50x1_bitm_in21k
36 resnetv2_50x3_bitm
37 resnetv2_50x3_bitm_in21k
38 resnetv2_101
39 resnetv2_101x1_bitm
40 resnetv2_101x1_bitm_in21k
41 resnetv2_101x3_bitm
42 resnetv2_101x3_bitm_in21k
43 resnetv2_152x2_bit_teacher
44 resnetv2_152x2_bit_teacher_384
45 resnetv2_152x2_bitm
46 resnetv2_152x2_bitm_in21k
47 resnetv2_152x4_bitm
48 resnetv2_152x4_bitm_in21k

Inspect the default configuration for the resnet10t model

from timm.models import resnet
resnet_model = 'resnet10t'
pd.DataFrame.from_dict(resnet.default_cfgs[resnet_model], orient='index')
0
url https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet10t_176_c3-f3215ab1.pth
num_classes 1000
input_size (3, 176, 176)
pool_size (6, 6)
crop_pct 0.875
interpolation bilinear
mean (0.485, 0.456, 0.406)
std (0.229, 0.224, 0.225)
first_conv conv1.0
classifier fc
test_crop_pct 0.95
test_input_size (3, 224, 224)

Select a model

# model_type = convnext
# model_name = convnext_model
# model_type = efficientnet
# model_name = mobilenetv2_model
model_type = resnet
model_name = resnet_model

Store normalization stats

mean = model_type.default_cfgs[model_name]['mean']
std = model_type.default_cfgs[model_name]['std']
mean, std
((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))

Modify Transforms

We can apply the normalization stats at the end of the batch transforms.

item_tfms = [FlipItem(p=1.0), Resize(input_dims, method=ResizeMethod.Pad, pad_mode=PadMode.Border)]

batch_tfms = [
    Contrast(max_lighting=0.25),
    Saturation(max_lighting=0.25),
    Hue(max_hue=0.05),
    *aug_transforms(
        size=input_dims, 
        mult=1.0,
        do_flip=False,
        flip_vert=False,
        max_rotate=0.0,
        min_zoom=0.5,
        max_zoom=1.5,
        max_lighting=0.5,
        max_warp=0.2, 
        p_affine=0.0,
        pad_mode=PadMode.Border),
    Normalize.from_stats(mean=mean, std=std)
]

Define Learner

The training process is identical to the original tutorial; we only need to pass the name of the Timm model to the vision_learner function.

learn = vision_learner(dls, model_name, metrics=metrics).to_fp16()
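For reference, below is a minimal sketch of the fine-tuning step. The epoch count is a placeholder, and the dls DataLoaders and metrics are assumed to be defined as in the original tutorial; follow that tutorial's training schedule for real runs.

# Suggest a learning rate, then fine-tune the pretrained Timm model
# (placeholder epoch count; use the original tutorial's settings)
suggested_lr = learn.lr_find().valley
learn.fine_tune(4, base_lr=suggested_lr)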

Export the Model

The OpenVINO model conversion script does not support PyTorch models, so we need to export the trained model to ONNX. We can then convert the ONNX model to OpenVINO’s IR format.

Define ONNX file name

onnx_file_name = f"{dataset_path.name}-{learn.arch}.onnx"
onnx_file_name
'asl-and-some-words-resnet10t.onnx'

Export trained model to ONNX

torch.onnx.export(learn.model.cpu(),
                  batched_tensor,
                  onnx_file_name,
                  export_params=True,
                  opset_version=11,
                  do_constant_folding=False,
                  input_names = ['input'],
                  output_names = ['output'],
                  dynamic_axes={'input': {2 : 'height', 3 : 'width'}}
                 )
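The batched_tensor above comes from the image preparation steps in the original tutorial. If it is not in scope, a dummy tensor with the training resolution works just as well for tracing; a quick sketch, assuming input_dims holds the (height, width) pair used during training:

# Only the shape of the sample input matters for tracing the model
dummy_input = torch.randn(1, 3, *input_dims)
torch.onnx.export(learn.model.cpu(), dummy_input, onnx_file_name,
                  export_params=True, opset_version=11,
                  input_names=['input'], output_names=['output'],
                  dynamic_axes={'input': {2: 'height', 3: 'width'}})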

Now we can define the arguments for OpenVINO's model conversion script.

Import OpenVINO Dependencies

from IPython.display import Markdown, display
from openvino.runtime import Core

Define export directory

output_dir = Path('./')
output_dir
Path('.')

Define path for OpenVINO IR xml model file

The conversion script generates an XML file describing the model architecture and a BIN file storing the trained weights. We need both files to perform inference. OpenVINO gives the BIN file the same base name as the XML file.

ir_path = Path(f"{onnx_file_name.split('.')[0]}.xml")
ir_path
Path('asl-and-some-words-resnet10t.xml')

Define arguments for model conversion script

OpenVINO provides the option to include the normalization stats in the IR model. That way, we don’t need to account for different normalization stats when performing inference with multiple models. We can also convert the model to FP16 precision to reduce file size and improve inference speed.

# Construct the command for Model Optimizer
mo_command = f"""mo
                 --input_model "{onnx_file_name}"
                 --input_shape "[1,3, {input_dims[0]}, {input_dims[1]}]"
                 --mean_values="{mean}"
                 --scale_values="{std}"
                 --data_type FP16
                 --output_dir "{output_dir}"
                 """
mo_command = " ".join(mo_command.split())
print("Model Optimizer command to convert the ONNX model to OpenVINO:")
display(Markdown(f"`{mo_command}`"))
Model Optimizer command to convert the ONNX model to OpenVINO:
mo --input_model "asl-and-some-words-resnet10t.onnx" --input_shape "[1,3, 216, 384]" --mean_values="(0.485, 0.456, 0.406)" --scale_values="(0.229, 0.224, 0.225)" --data_type FP16 --output_dir "."

Convert ONNX model to OpenVINO IR

if not ir_path.exists():
    print("Exporting ONNX model to IR... This may take a few minutes.")
    mo_result = %sx $mo_command
    print("\n".join(mo_result))
else:
    print(f"IR model {ir_path} already exists.")
Exporting ONNX model to IR... This may take a few minutes.
Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:  /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.onnx
    - Path for generated IR:    /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/.
    - IR output name:   asl-and-some-words-resnet10t
    - Log level:    ERROR
    - Batch:    Not specified, inherited from the model
    - Input layers:     Not specified, inherited from the model
    - Output layers:    Not specified, inherited from the model
    - Input shapes:     [1,3, 216, 384]
    - Source layout:    Not specified
    - Target layout:    Not specified
    - Layout:   Not specified
    - Mean values:  (0.485, 0.456, 0.406)
    - Scale values:     (0.229, 0.224, 0.225)
    - Scale factor:     Not specified
    - Precision of IR:  FP16
    - Enable fusing:    True
    - User transformations:     Not specified
    - Reverse input channels:   False
    - Enable IR generation for fixed input shape:   False
    - Use the transformations config file:  None
Advanced parameters:
    - Force the usage of legacy Frontend of Model Optimizer for model conversion into IR:   False
    - Force the usage of new Frontend of Model Optimizer for model conversion into IR:  False
OpenVINO runtime found in:  /home/innom-dt/mambaforge/envs/fastai-openvino/lib/python3.9/site-packages/openvino
OpenVINO runtime version:   2022.1.0-7019-cdb9bec7210-releases/2022/1
Model Optimizer version:    2022.1.0-7019-cdb9bec7210-releases/2022/1
[ WARNING ]
Detected not satisfied dependencies:
    numpy: installed: 1.23.0, required: < 1.20

Please install required versions of components or run pip installation
pip install openvino-dev
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.xml
[ SUCCESS ] BIN file: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.bin
[ SUCCESS ] Total execution time: 0.43 seconds.
[ SUCCESS ] Memory consumed: 123 MB.
It's been a while, check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html?cid=other&source=prod&campid=ww_2022_bu_IOTG_OpenVINO-2022-1&content=upg_all&medium=organic or on the GitHub*
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai

Benchmark OpenVINO Inference

Now we can compare inference speed between OpenVINO and PyTorch. OpenVINO supports inference with ONNX models in addition to its IR format.

Get available OpenVINO compute devices

OpenVINO does not support GPU inference with non-Intel GPUs.

# Initialize the OpenVINO Runtime Core to query the available devices
ie = Core()

devices = ie.available_devices
for device in devices:
    device_name = ie.get_property(device_name=device, name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")
CPU: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
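If an Intel GPU appeared in this list, targeting it would only require changing the device name when compiling the model. Below is a minimal sketch, assuming the Intel GPU drivers and OpenVINO's GPU plugin are installed:

# Compile the IR model for an integrated or discrete Intel GPU, if one is available
if "GPU" in ie.available_devices:
    compiled_model_gpu = ie.compile_model(model=ie.read_model(model=ir_path), device_name="GPU")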

Create normalized input for ONNX model

normalized_input_image = batched_tensor.cpu().detach().numpy()
normalized_input_image.shape
(1, 3, 224, 224)

Test ONNX model using OpenVINO

# Load network to Inference Engine
ie = Core()
model_onnx = ie.read_model(model=onnx_file_name)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")

input_layer_onnx = next(iter(compiled_model_onnx.inputs))
output_layer_onnx = next(iter(compiled_model_onnx.outputs))

# Run inference on the input image
res_onnx = compiled_model_onnx(inputs=[normalized_input_image])[output_layer_onnx]
learn.dls.vocab[np.argmax(res_onnx)]
'J'

Benchmark ONNX model CPU inference speed

%%timeit
compiled_model_onnx(inputs=[normalized_input_image])[output_layer_onnx]
3.62 ms ± 61.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Prepare input image for OpenVINO IR model

input_image = scaled_tensor.unsqueeze(dim=0)
input_image.shape
torch.Size([1, 3, 224, 224])

Test OpenVINO IR model

# Load the network in Inference Engine
ie = Core()
model_ir = ie.read_model(model=ir_path)
model_ir.reshape(input_image.shape)
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get input and output layers
input_layer_ir = next(iter(compiled_model_ir.inputs))
output_layer_ir = next(iter(compiled_model_ir.outputs))

# Run inference on the input image
res_ir = compiled_model_ir([input_image])[output_layer_ir]
learn.dls.vocab[np.argmax(res_ir)]
'J'

Benchmark OpenVINO IR model CPU inference speed

%%timeit
compiled_model_ir([input_image])[output_layer_ir]
3.39 ms ± 84.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note: The IR model is slightly faster than the ONNX model and half the file size.
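We can confirm the size difference directly from the notebook; a quick sketch, assuming onnx_file_name and ir_path are still in scope:

# Compare the on-disk size of the FP32 ONNX model with the FP16 IR model (XML + BIN)
onnx_size_mb = Path(onnx_file_name).stat().st_size / 1e6
ir_size_mb = (ir_path.stat().st_size + ir_path.with_suffix('.bin').stat().st_size) / 1e6
print(f"ONNX: {onnx_size_mb:.2f} MB  |  OpenVINO IR: {ir_size_mb:.2f} MB")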

Benchmark PyTorch model GPU inference speed

%%timeit
with torch.no_grad(): preds = learn.model.cuda()(batched_tensor.cuda())
1.81 ms ± 5.52 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

PyTorch inference with a Titan RTX is still faster than OpenVINO inference with an i7-11700K for a ResNet10 model. However, OpenVINO CPU inference is often faster when using models optimized for mobile devices, like MobileNet.
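To sanity-check that claim without retraining anything, you could export a pretrained MobileNetV2 from timm to ONNX and time it the same way as the models above. This is a rough sketch under those assumptions, not a benchmark from the original post:

# Export a pretrained MobileNetV2 to ONNX purely for a CPU speed comparison
import timm, torch
mbv2 = timm.create_model('mobilenetv2_100', pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(mbv2, dummy, 'mobilenetv2_100.onnx', opset_version=11)

# Compile the ONNX model with OpenVINO for CPU inference
compiled_mbv2 = ie.compile_model(model=ie.read_model(model='mobilenetv2_100.onnx'), device_name="CPU")
output_layer_mbv2 = next(iter(compiled_mbv2.outputs))

Then time it with the same %%timeit pattern used above:

%%timeit
compiled_mbv2([dummy.numpy()])[output_layer_mbv2]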

Benchmark PyTorch model CPU inference speed

%%timeit
with torch.no_grad(): preds = learn.model.cpu()(batched_tensor.cpu())
8.94 ms ± 52.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

OpenVINO CPU inference is well over twice as fast as PyTorch CPU inference for this model (3.39 ms vs. 8.94 ms).

Summary

This post covered how to modify the training code from the fastai-to-unity tutorial to finetune models from the Timm library and export them as OpenVINO IR models. Part 2 will cover creating a dynamic link library (DLL) file in Visual Studio to perform inference with these models using OpenVINO.

Previous: Fastai to Unity Tutorial Pt. 3

Next: How to Create an OpenVINO Plugin for Unity on Windows Pt. 2

Project Resources: GitHub Repository