How to Create an OpenVINO Plugin for Unity on Windows Pt. 1
- Introduction
- Overview
- Install Dependencies
- Select a Model
- Modify Transforms
- Define Learner
- Export the Model
- Benchmark OpenVINO Inference
- Summary
Introduction
This tutorial is a follow-up to the fastai-to-unity tutorial series and covers using OpenVINO, an open-source toolkit for optimizing model inference, instead of Unity’s Barracuda library. OpenVINO enables significantly faster CPU inference than Barracuda and supports more model types. It also supports GPU inference for integrated and discrete Intel GPUs and will be able to leverage the AI hardware acceleration available in Intel’s upcoming Arc GPUs.
We’ll modify the original tutorial code and create a dynamic link library (DLL) file to access the OpenVINO functionality in Unity.
Overview
This post covers the required modifications to the original training code. We’ll finetune models from the Timm library on the same ASL dataset as the original tutorial, just like in this previous follow-up. Below is a link to the complete modified training code, along with links for running the notebook on Google Colab and Kaggle.
GitHub Repository | Colab | Kaggle |
---|---|---|
Jupyter Notebook | Open in Colab | Open in Kaggle |
Install Dependencies
The pip package for the Timm library is generally more stable than the GitHub repository but may have fewer model types and pretrained weights. However, at the time of writing, the latest pip version had issues running the MobileNetV3 models. Downgrade to version 0.5.4 to use those models.
Recent updates to the fastai library resolve some performance issues with PyTorch, so let’s update that too.
We need to install the openvino-dev pip package to convert trained models to OpenVINO’s Intermediate Representation (IR) format.
Uncomment the cell below if running on Google Colab or Kaggle
# %%capture
# !pip3 install -U torch torchvision torchaudio
# !pip3 install -U fastai==2.7.6
# !pip3 install -U kaggle==1.5.12
# !pip3 install -U Pillow==9.1.0
# !pip3 install -U timm==0.6.5 # more stable fewer models
# # !pip3 install -U git+https://github.com/rwightman/pytorch-image-models.git # more models less stable
# !pip3 install openvino-dev==2022.1.0
Note for Colab: You must restart the runtime in order to use the newly installed version of Pillow.
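One way to trigger that restart programmatically (an optional convenience, not part of the original notebook) is to kill the runtime process, which Colab then restarts automatically:
# Uncomment to force a Colab runtime restart after installing dependencies
# import os
# os.kill(os.getpid(), 9)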
Import all fastai computer vision functionality
from fastai.vision.all import *
import fastai
fastai.__version__
'2.7.6'
Disable max rows and columns for pandas
import pandas as pd
pd.set_option('max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
Select a Model
Let’s start by selecting a model from the Timm library to finetune. The available pretrained models depend on the version of the Timm library installed.
Import the Timm library
import timm
timm.__version__
'0.6.5'
Check available pretrained model types
We can check which model types have pretrained weights using the timm.list_models() function.
model_types = list(set([model.split('_')[0] for model in timm.list_models(pretrained=True)]))
model_types.sort()
pd.DataFrame(model_types)
0 | |
---|---|
0 | adv |
1 | bat |
2 | beit |
3 | botnet26t |
4 | cait |
5 | coat |
6 | convit |
7 | convmixer |
8 | convnext |
9 | crossvit |
10 | cs3darknet |
11 | cspdarknet53 |
12 | cspresnet50 |
13 | cspresnext50 |
14 | darknet53 |
15 | deit |
16 | deit3 |
17 | densenet121 |
18 | densenet161 |
19 | densenet169 |
20 | densenet201 |
21 | densenetblur121d |
22 | dla102 |
23 | dla102x |
24 | dla102x2 |
25 | dla169 |
26 | dla34 |
27 | dla46 |
28 | dla46x |
29 | dla60 |
30 | dla60x |
31 | dm |
32 | dpn107 |
33 | dpn131 |
34 | dpn68 |
35 | dpn68b |
36 | dpn92 |
37 | dpn98 |
38 | eca |
39 | ecaresnet101d |
40 | ecaresnet269d |
41 | ecaresnet26t |
42 | ecaresnet50d |
43 | ecaresnet50t |
44 | ecaresnetlight |
45 | edgenext |
46 | efficientnet |
47 | efficientnetv2 |
48 | ens |
49 | ese |
50 | fbnetc |
51 | fbnetv3 |
52 | gc |
53 | gcresnet33ts |
54 | gcresnet50t |
55 | gcresnext26ts |
56 | gcresnext50ts |
57 | gernet |
58 | ghostnet |
59 | gluon |
60 | gmixer |
61 | gmlp |
62 | halo2botnet50ts |
63 | halonet26t |
64 | halonet50ts |
65 | haloregnetz |
66 | hardcorenas |
67 | hrnet |
68 | ig |
69 | inception |
70 | jx |
71 | lambda |
72 | lamhalobotnet50ts |
73 | lcnet |
74 | legacy |
75 | levit |
76 | mixer |
77 | mixnet |
78 | mnasnet |
79 | mobilenetv2 |
80 | mobilenetv3 |
81 | mobilevit |
82 | mobilevitv2 |
83 | nasnetalarge |
84 | nf |
85 | nfnet |
86 | pit |
87 | pnasnet5large |
88 | poolformer |
89 | regnetv |
90 | regnetx |
91 | regnety |
92 | regnetz |
93 | repvgg |
94 | res2net101 |
95 | res2net50 |
96 | res2next50 |
97 | resmlp |
98 | resnest101e |
99 | resnest14d |
100 | resnest200e |
101 | resnest269e |
102 | resnest26d |
103 | resnest50d |
104 | resnet101 |
105 | resnet101d |
106 | resnet10t |
107 | resnet14t |
108 | resnet152 |
109 | resnet152d |
110 | resnet18 |
111 | resnet18d |
112 | resnet200d |
113 | resnet26 |
114 | resnet26d |
115 | resnet26t |
116 | resnet32ts |
117 | resnet33ts |
118 | resnet34 |
119 | resnet34d |
120 | resnet50 |
121 | resnet50d |
122 | resnet51q |
123 | resnet61q |
124 | resnetaa50 |
125 | resnetblur50 |
126 | resnetrs101 |
127 | resnetrs152 |
128 | resnetrs200 |
129 | resnetrs270 |
130 | resnetrs350 |
131 | resnetrs420 |
132 | resnetrs50 |
133 | resnetv2 |
134 | resnext101 |
135 | resnext26ts |
136 | resnext50 |
137 | resnext50d |
138 | rexnet |
139 | sebotnet33ts |
140 | sehalonet33ts |
141 | selecsls42b |
142 | selecsls60 |
143 | selecsls60b |
144 | semnasnet |
145 | sequencer2d |
146 | seresnet152d |
147 | seresnet33ts |
148 | seresnet50 |
149 | seresnext101 |
150 | seresnext101d |
151 | seresnext26d |
152 | seresnext26t |
153 | seresnext26ts |
154 | seresnext50 |
155 | seresnextaa101d |
156 | skresnet18 |
157 | skresnet34 |
158 | skresnext50 |
159 | spnasnet |
160 | ssl |
161 | swin |
162 | swinv2 |
163 | swsl |
164 | tf |
165 | tinynet |
166 | tnt |
167 | tresnet |
168 | tv |
169 | twins |
170 | vgg11 |
171 | vgg13 |
172 | vgg16 |
173 | vgg19 |
174 | visformer |
175 | vit |
176 | volo |
177 | wide |
178 | xception |
179 | xception41 |
180 | xception41p |
181 | xception65 |
182 | xception65p |
183 | xception71 |
184 | xcit |
Timm provides many pretrained models, but not all of them are fast enough for real-time applications. We can filter the results by providing a full or partial model name.
Check available pretrained ConvNeXt models
pd.DataFrame(timm.list_models('convnext*', pretrained=True))
0 | |
---|---|
0 | convnext_base |
1 | convnext_base_384_in22ft1k |
2 | convnext_base_in22ft1k |
3 | convnext_base_in22k |
4 | convnext_large |
5 | convnext_large_384_in22ft1k |
6 | convnext_large_in22ft1k |
7 | convnext_large_in22k |
8 | convnext_small |
9 | convnext_small_384_in22ft1k |
10 | convnext_small_in22ft1k |
11 | convnext_small_in22k |
12 | convnext_tiny |
13 | convnext_tiny_384_in22ft1k |
14 | convnext_tiny_hnf |
15 | convnext_tiny_in22ft1k |
16 | convnext_tiny_in22k |
17 | convnext_xlarge_384_in22ft1k |
18 | convnext_xlarge_in22ft1k |
19 | convnext_xlarge_in22k |
Let’s go with the convnext_tiny model since we want higher framerates. Each model comes with a set of default configuration parameters. We must keep track of the mean and std values used to normalize the model input.
Inspect the default configuration for the convnext_tiny model
from timm.models import convnext
convnext_model = 'convnext_tiny'
pd.DataFrame.from_dict(convnext.default_cfgs[convnext_model], orient='index')
0 | |
---|---|
url | https://dl.fbaipublicfiles.com/convnext/convnext_tiny_1k_224_ema.pth |
num_classes | 1000 |
input_size | (3, 224, 224) |
pool_size | (7, 7) |
crop_pct | 0.875 |
interpolation | bicubic |
mean | (0.485, 0.456, 0.406) |
std | (0.229, 0.224, 0.225) |
first_conv | stem.0 |
classifier | head.fc |
Check available pretrained MobileNetV2 models
pd.DataFrame(timm.list_models('mobilenetv2*', pretrained=True))
0 | |
---|---|
0 | mobilenetv2_050 |
1 | mobilenetv2_100 |
2 | mobilenetv2_110d |
3 | mobilenetv2_120d |
4 | mobilenetv2_140 |
Inspect the default configuration for the mobilenetv2_100 model
from timm.models import efficientnet
mobilenetv2_model = 'mobilenetv2_100'
pd.DataFrame.from_dict(efficientnet.default_cfgs[mobilenetv2_model], orient='index')
0 | |
---|---|
url | https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/mobilenetv2_100_ra-b33bc2c4.pth |
num_classes | 1000 |
input_size | (3, 224, 224) |
pool_size | (7, 7) |
crop_pct | 0.875 |
interpolation | bicubic |
mean | (0.485, 0.456, 0.406) |
std | (0.229, 0.224, 0.225) |
first_conv | conv_stem |
classifier | classifier |
Check available pretrained ResNet models
pd.DataFrame(timm.list_models('resnet*', pretrained=True))
0 | |
---|---|
0 | resnet10t |
1 | resnet14t |
2 | resnet18 |
3 | resnet18d |
4 | resnet26 |
5 | resnet26d |
6 | resnet26t |
7 | resnet32ts |
8 | resnet33ts |
9 | resnet34 |
10 | resnet34d |
11 | resnet50 |
12 | resnet50_gn |
13 | resnet50d |
14 | resnet51q |
15 | resnet61q |
16 | resnet101 |
17 | resnet101d |
18 | resnet152 |
19 | resnet152d |
20 | resnet200d |
21 | resnetaa50 |
22 | resnetblur50 |
23 | resnetrs50 |
24 | resnetrs101 |
25 | resnetrs152 |
26 | resnetrs200 |
27 | resnetrs270 |
28 | resnetrs350 |
29 | resnetrs420 |
30 | resnetv2_50 |
31 | resnetv2_50d_evos |
32 | resnetv2_50d_gn |
33 | resnetv2_50x1_bit_distilled |
34 | resnetv2_50x1_bitm |
35 | resnetv2_50x1_bitm_in21k |
36 | resnetv2_50x3_bitm |
37 | resnetv2_50x3_bitm_in21k |
38 | resnetv2_101 |
39 | resnetv2_101x1_bitm |
40 | resnetv2_101x1_bitm_in21k |
41 | resnetv2_101x3_bitm |
42 | resnetv2_101x3_bitm_in21k |
43 | resnetv2_152x2_bit_teacher |
44 | resnetv2_152x2_bit_teacher_384 |
45 | resnetv2_152x2_bitm |
46 | resnetv2_152x2_bitm_in21k |
47 | resnetv2_152x4_bitm |
48 | resnetv2_152x4_bitm_in21k |
Inspect the default configuration for the resnet10t model
from timm.models import resnet
resnet_model = 'resnet10t'
pd.DataFrame.from_dict(resnet.default_cfgs[resnet_model], orient='index')
0 | |
---|---|
url | https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet10t_176_c3-f3215ab1.pth |
num_classes | 1000 |
input_size | (3, 176, 176) |
pool_size | (6, 6) |
crop_pct | 0.875 |
interpolation | bilinear |
mean | (0.485, 0.456, 0.406) |
std | (0.229, 0.224, 0.225) |
first_conv | conv1.0 |
classifier | fc |
test_crop_pct | 0.95 |
test_input_size | (3, 224, 224) |
Select a model
# model_type = convnext
# model_name = convnext_model
# model_type = efficientnet
# model_name = mobilenetv2_model
model_type = resnet
model_name = resnet_model
Store normalization stats
mean = model_type.default_cfgs[model_name]['mean']
std = model_type.default_cfgs[model_name]['std']
mean, std
((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
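As a quick illustration (not part of the original code), here is roughly what these stats do to a [0, 1]-scaled NCHW tensor. The Normalize.from_stats transform in the next section and the --mean_values/--scale_values flags passed to the Model Optimizer later apply this same subtraction and division:
# Hypothetical example: normalize a stand-in for a [0, 1]-scaled image batch
import torch
mean_t = torch.tensor(mean).view(1, 3, 1, 1)  # reshape for broadcasting over NCHW
std_t = torch.tensor(std).view(1, 3, 1, 1)
img_batch = torch.rand(1, 3, 224, 224)        # placeholder for a scaled image batch
normalized = (img_batch - mean_t) / std_t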
Modify Transforms
We can apply the normalization stats at the end of the batch transforms.
item_tfms = [FlipItem(p=1.0), Resize(input_dims, method=ResizeMethod.Pad, pad_mode=PadMode.Border)]

batch_tfms = [
    Contrast(max_lighting=0.25),
    Saturation(max_lighting=0.25),
    Hue(max_hue=0.05),
    *aug_transforms(
        size=input_dims,
        mult=1.0,
        do_flip=False,
        flip_vert=False,
        max_rotate=0.0,
        min_zoom=0.5,
        max_zoom=1.5,
        max_lighting=0.5,
        max_warp=0.2,
        p_affine=0.0,
        pad_mode=PadMode.Border),
    Normalize.from_stats(mean=mean, std=std)
]
Define Learner
The training process is identical to the original tutorial, and we only need to pass the name of the Timm model to the vision_learner object.
learn = vision_learner(dls, model_name, metrics=metrics).to_fp16()
Export the Model
The OpenVINO model conversion script does not support PyTorch models, so we need to export the trained model to ONNX. We can then convert the ONNX model to OpenVINO’s IR format.
Define ONNX file name
= f"{dataset_path.name}-{learn.arch}.onnx"
onnx_file_name onnx_file_name
'asl-and-some-words-resnet10t.onnx'
Export trained model to ONNX
torch.onnx.export(learn.model.cpu(),
                  batched_tensor,
                  onnx_file_name,
                  export_params=True,
                  opset_version=11,
                  do_constant_folding=False,
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input': {2: 'height', 3: 'width'}}
                  )
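Before converting, we can optionally sanity-check the exported file with the onnx package's structural validator. This step is not in the original notebook and assumes the onnx pip package is installed:
# Optional: verify the exported model is structurally valid ONNX
import onnx
onnx_model = onnx.load(onnx_file_name)
onnx.checker.check_model(onnx_model)
print(f"{onnx_file_name} passed the ONNX checker.")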
Now we can define the arguments for OpenVINO’s model conversion script.
Import OpenVINO Dependencies
from IPython.display import Markdown, display
from openvino.runtime import Core
Define export directory
output_dir = Path('./')
output_dir
Path('.')
Define path for OpenVINO IR xml model file
The conversion script generates an XML file describing the model architecture and a BIN file containing the trained weights. We need both files to perform inference. OpenVINO gives the BIN file the same name as the XML file.
= Path(f"{onnx_file_name.split('.')[0]}.xml")
ir_path ir_path
Path('asl-and-some-words-resnet10t.xml')
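Because the BIN file shares the XML file’s name, we can derive its path directly from ir_path; a small sketch (not in the original code) for locating the weights file once the conversion below finishes:
# The trained weights land in a .bin file with the same stem as the XML file
bin_path = ir_path.with_suffix('.bin')
bin_path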
Define arguments for model conversion script
OpenVINO provides the option to include the normalization stats in the IR model. That way, we don’t need to account for different normalization stats when performing inference with multiple models. We can also convert the model to FP16 precision to reduce file size and improve inference speed.
# Construct the command for Model Optimizer
= f"""mo
mo_command --input_model "{onnx_file_name}"
--input_shape "[1,3, {input_dims[0]}, {input_dims[1]}]"
--mean_values="{mean}"
--scale_values="{std}"
--data_type FP16
--output_dir "{output_dir}"
"""
= " ".join(mo_command.split())
mo_command print("Model Optimizer command to convert the ONNX model to OpenVINO:")
f"`{mo_command}`")) display(Markdown(
Model Optimizer command to convert the ONNX model to OpenVINO:
mo --input_model "asl-and-some-words-resnet10t.onnx" --input_shape "[1,3, 216, 384]" --mean_values="(0.485, 0.456, 0.406)" --scale_values="(0.229, 0.224, 0.225)" --data_type FP16 --output_dir "."
Convert ONNX model to OpenVINO IR
if not ir_path.exists():
    print("Exporting ONNX model to IR... This may take a few minutes.")
    mo_result = %sx $mo_command
    print("\n".join(mo_result))
else:
    print(f"IR model {ir_path} already exists.")
Exporting ONNX model to IR... This may take a few minutes.
Model Optimizer arguments:
Common parameters:
- Path to the Input Model: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.onnx
- Path for generated IR: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/.
- IR output name: asl-and-some-words-resnet10t
- Log level: ERROR
- Batch: Not specified, inherited from the model
- Input layers: Not specified, inherited from the model
- Output layers: Not specified, inherited from the model
- Input shapes: [1,3, 216, 384]
- Source layout: Not specified
- Target layout: Not specified
- Layout: Not specified
- Mean values: (0.485, 0.456, 0.406)
- Scale values: (0.229, 0.224, 0.225)
- Scale factor: Not specified
- Precision of IR: FP16
- Enable fusing: True
- User transformations: Not specified
- Reverse input channels: False
- Enable IR generation for fixed input shape: False
- Use the transformations config file: None
Advanced parameters:
- Force the usage of legacy Frontend of Model Optimizer for model conversion into IR: False
- Force the usage of new Frontend of Model Optimizer for model conversion into IR: False
OpenVINO runtime found in: /home/innom-dt/mambaforge/envs/fastai-openvino/lib/python3.9/site-packages/openvino
OpenVINO runtime version: 2022.1.0-7019-cdb9bec7210-releases/2022/1
Model Optimizer version: 2022.1.0-7019-cdb9bec7210-releases/2022/1
[ WARNING ]
Detected not satisfied dependencies:
numpy: installed: 1.23.0, required: < 1.20
Please install required versions of components or run pip installation
pip install openvino-dev
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.xml
[ SUCCESS ] BIN file: /media/innom-dt/Samsung_T3/My_Environments/jupyter-notebooks/openvino/asl-and-some-words-resnet10t.bin
[ SUCCESS ] Total execution time: 0.43 seconds.
[ SUCCESS ] Memory consumed: 123 MB.
It's been a while, check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html?cid=other&source=prod&campid=ww_2022_bu_IOTG_OpenVINO-2022-1&content=upg_all&medium=organic or on the GitHub*
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai
Benchmark OpenVINO Inference
Now we can compare inference speed between OpenVINO and PyTorch. OpenVINO supports inference with ONNX models in addition to its IR format.
Get available OpenVINO compute devices
OpenVINO does not support GPU inference with non-Intel GPUs.
ie = Core()  # create an OpenVINO runtime Core instance to query devices
devices = ie.available_devices
for device in devices:
    device_name = ie.get_property(device_name=device, name="FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")
CPU: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
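If an Intel integrated or discrete GPU appeared in that list, we could compile the model for it instead of the CPU, or pass "AUTO" and let OpenVINO pick a device. This is only a sketch of the option; it assumes a supported Intel GPU is present and reuses the ir_path defined earlier:
# Compile the IR model for an Intel GPU (or use device_name="AUTO")
gpu_core = Core()
gpu_model = gpu_core.read_model(model=ir_path)
compiled_model_gpu = gpu_core.compile_model(model=gpu_model, device_name="GPU")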
Create normalized input for ONNX model
normalized_input_image = batched_tensor.cpu().detach().numpy()
normalized_input_image.shape
(1, 3, 224, 224)
Test ONNX model using OpenVINO
# Load network to Inference Engine
ie = Core()
model_onnx = ie.read_model(model=onnx_file_name)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")

input_layer_onnx = next(iter(compiled_model_onnx.inputs))
output_layer_onnx = next(iter(compiled_model_onnx.outputs))

# Run inference on the input image
res_onnx = compiled_model_onnx(inputs=[normalized_input_image])[output_layer_onnx]
learn.dls.vocab[np.argmax(res_onnx)]
'J'
Benchmark ONNX model CPU inference speed
%%timeit
compiled_model_onnx(inputs=[normalized_input_image])[output_layer_onnx]
3.62 ms ± 61.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Prepare input image for OpenVINO IR model
Since we baked the normalization stats into the IR model, we feed it the tensor scaled to the range [0, 1] without applying the normalization manually.
input_image = scaled_tensor.unsqueeze(dim=0)
input_image.shape
torch.Size([1, 3, 224, 224])
Test OpenVINO IR model
# Load the network in Inference Engine
ie = Core()
model_ir = ie.read_model(model=ir_path)
model_ir.reshape(input_image.shape)
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get input and output layers
input_layer_ir = next(iter(compiled_model_ir.inputs))
output_layer_ir = next(iter(compiled_model_ir.outputs))

# Run inference on the input image
res_ir = compiled_model_ir([input_image])[output_layer_ir]
learn.dls.vocab[np.argmax(res_ir)]
'J'
Benchmark OpenVINO IR model CPU inference speed
%%timeit
compiled_model_ir([input_image])[output_layer_ir]
3.39 ms ± 84.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Note: The IR model is slightly faster than the ONNX model and half the file size.
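We can confirm the size difference on disk with a quick check (not in the original notebook) that compares the ONNX file to the combined XML and BIN files:
# Compare file sizes of the ONNX model and the FP16 IR model (XML + BIN)
import os
onnx_size = os.path.getsize(onnx_file_name)
ir_size = os.path.getsize(ir_path) + os.path.getsize(ir_path.with_suffix('.bin'))
print(f"ONNX: {onnx_size / 1e6:.2f} MB, IR (FP16): {ir_size / 1e6:.2f} MB")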
Benchmark PyTorch model GPU inference speed
%%timeit
with torch.no_grad(): preds = learn.model.cuda()(batched_tensor.cuda())
1.81 ms ± 5.52 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
PyTorch inference with a Titan RTX is still faster than OpenVINO inference with an i7-11700K for a ResNet10 model. However, OpenVINO CPU inference is often faster when using models optimized for mobile devices, like MobileNet.
Benchmark PyTorch model CPU inference speed
%%timeit
with torch.no_grad(): preds = learn.model.cpu()(batched_tensor.cpu())
8.94 ms ± 52.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
OpenVINO is easily faster than PyTorch for CPU inference.
Summary
This post covered how to modify the training code from the fastai-to-unity tutorial to finetune models from the Timm library and export them as OpenVINO IR models. Part 2 will cover creating a dynamic link library (DLL) file in Visual Studio to perform inference with these models using OpenVINO.
Previous: Fastai to Unity Tutorial Pt. 3
Next: How to Create an OpenVINO Plugin for Unity on Windows Pt. 2
Project Resources: GitHub Repository
I’m Christian Mills, a deep learning consultant specializing in practical AI implementations. I help clients leverage cutting-edge AI technologies to solve real-world problems.
Interested in working together? Fill out my Quick AI Project Assessment form or learn more about me.