Fastai to Unity Beginner Tutorial Pt. 1
- Introduction
- Overview
- Install Dependencies
- Configure Kaggle API
- Download Dataset
- Inspect Dataset
- Define Dataloaders
- Define Learner
- Inspect Trained Model
- Implement Processing Steps
- Export the Model
- Summary
Introduction
In this tutorial series, we will walk through training an image classifier using the fastai library and implementing it in a Unity game engine project using the Barracuda inference library. Check out this post for more information about Barracuda. We will then build the Unity project to run in a web browser and host it using GitHub Pages.
The tutorial uses this American Sign Language (ASL) dataset from Kaggle but feel free to follow along with a different dataset. The dataset contains sample images for digits 1-9, letters A-Z, and some common words. One could use a model trained on this dataset to map hand gestures to user input or make an ASL education game.
In-Browser Demo: ASL Classifier
Overview
Part 1 covers how to finetune a ResNet model for image classification using the fastai library and export it to ONNX format. The training code is available in the Jupyter notebook linked below, and links for running the notebook on Google Colab and Kaggle are below as well.
Jupyter Notebook | Colab | Kaggle |
---|---|---|
GitHub Repository | Open In Colab | Open in Kaggle |
Install Dependencies
The training code requires PyTorch for the fastai library, the fastai library itself for training, and the Kaggle API Python package for downloading the dataset. Google Colab uses an older version of Pillow, so update that package when training there.
Uncomment the cell below if running on Google Colab or Kaggle
# %%capture
# !pip3 install -U torch torchvision torchaudio
# !pip3 install -U fastai
# !pip3 install -U kaggle
# !pip3 install -U Pillow
Note for Colab: You must restart the runtime to use the newly installed version of Pillow.
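After installing (and restarting the runtime if needed), it can help to confirm which versions ended up in the environment. This quick check is an optional addition, not part of the original notebook:
# Optional: confirm the installed versions before training
import fastai, torch, PIL
print(f"fastai: {fastai.__version__}, torch: {torch.__version__}, Pillow: {PIL.__version__}")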
Import all fastai computer vision functionality
from fastai.vision.all import *
Configure Kaggle API
The Kaggle API tool requires an API Key for a Kaggle account. Sign in or create a Kaggle account using the link below, then click the Create New API Token button.
- Kaggle Account Settings: https://www.kaggle.com/me/account
Kaggle will generate and download a kaggle.json file containing your username and new API token. Paste the values for each in the code cell below.
Enter Kaggle username and API token
= '{"username":"","key":""}' creds
Save Kaggle credentials if none are present
- Source: https://github.com/fastai/fastbook/blob/master/09_tabular.ipynb
cred_path = Path('~/.kaggle/kaggle.json').expanduser()
# Save API key to a json file if it does not already exist
if not cred_path.exists():
    cred_path.parent.mkdir(exist_ok=True)
    cred_path.write_text(creds)
    cred_path.chmod(0o600)
Import Kaggle API
from kaggle import api
(Optional) Define method to display default function arguments
The code cell below defines a method to display the default arguments for a specified function. It’s not required, but I find it convenient for creating quick references in notebooks.
import inspect
import pandas as pd
pd.set_option('max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

def inspect_default_args(target, annotations: bool=False):
    # Get the argument names
    args = inspect.getfullargspec(target).args
    # Get the default values
    defaults = inspect.getfullargspec(target).defaults

    index = ["Default Value"]

    # Pad defaults
    defaults = [None]*(len(args)-len(defaults)) + list(defaults)

    if annotations:
        index.append("Annotation")
        annotations = inspect.getfullargspec(target).annotations.values()
        # Pad annotations
        annotations = [None]*(len(args)-len(annotations)) + list(annotations)
        default_args = {arg:[df, annot] for arg, df, annot in zip(args, defaults, annotations)}
    else:
        default_args = {arg:[default] for arg, default in zip(args, defaults)}

    return pd.DataFrame(default_args, index=index).T
Download Dataset
Now that we have our Kaggle credentials set, we need to define the dataset and where to store it.
Define path to dataset
We’ll use the default archive and data folders for the fastai library to store the compressed and uncompressed datasets.
kaggle_dataset = 'belalelwikel/asl-and-some-words'
archive_dir = URLs.path()
dataset_dir = archive_dir/'../data'
dataset_name = 'asl-and-some-words'
archive_path = Path(f'{archive_dir}/{dataset_name}.zip')
dataset_path = Path(f'{dataset_dir}/{dataset_name}')
Define method to extract the dataset from an archive file
def file_extract(fname, dest=None):
    "Extract `fname` to `dest` using `tarfile` or `zipfile`."
    if dest is None: dest = Path(fname).parent
    fname = str(fname)
    if fname.endswith('gz'): tarfile.open(fname, 'r:gz').extractall(dest)
    elif fname.endswith('zip'): zipfile.ZipFile(fname).extractall(dest)
    else: raise Exception(f'Unrecognized archive: {fname}')
Download the dataset if it is not present
The archive file is over 2GB, so we don’t want to download it more than necessary.
if not archive_path.exists():
    api.dataset_download_cli(kaggle_dataset, path=archive_dir)
    file_extract(fname=archive_path, dest=dataset_path)
Inspect Dataset
We can start inspecting the dataset once it finishes downloading.
Inspect the dataset path
The training data is in a subfolder named ASL, and there are over 200,000 samples.
dataset_path.ls()
(#1) [Path('/home/innom-dt/.fastai/archive/../data/asl-and-some-words/ASL')]
Get image file paths
files = get_image_files(dataset_path/"ASL")
len(files)
203000
Inspect files
The dataset indicates the object class in both the folder and file names.
files[0], files[-1]
(Path('/home/innom-dt/.fastai/archive/../data/asl-and-some-words/ASL/J/J1491.jpg'),
Path('/home/innom-dt/.fastai/archive/../data/asl-and-some-words/ASL/E/E1063.jpg'))
Inspect class folder names
There are 51 class folders, and the dataset does not predefine a training-validation split.
folder_names = [path.name for path in Path(dataset_path/'ASL').ls()]
folder_names.sort()
print(f"Num classes: {len(folder_names)}")
pd.DataFrame(folder_names)
Num classes: 51
0 | |
---|---|
0 | 1 |
1 | 3 |
2 | 4 |
3 | 5 |
4 | 7 |
5 | 8 |
6 | 9 |
7 | A |
8 | B |
9 | Baby |
10 | Brother |
11 | C |
12 | D |
13 | Dont_like |
14 | E |
15 | F |
16 | Friend |
17 | G |
18 | H |
19 | Help |
20 | House |
21 | I |
22 | J |
23 | K |
24 | L |
25 | Like |
26 | Love |
27 | M |
28 | Make |
29 | More |
30 | N |
31 | Name |
32 | No |
33 | O_OR_0 |
34 | P |
35 | Pay |
36 | Play |
37 | Q |
38 | R |
39 | S |
40 | Stop |
41 | T |
42 | U |
43 | V_OR_2 |
44 | W_OR_6 |
45 | With |
46 | X |
47 | Y |
48 | Yes |
49 | Z |
50 | nothing |
Inspect one of the training images
The sample images all have a resolution of 200x200.
import PIL
img = PIL.Image.open(files[0])
print(f"Image Dims: {img.shape}")
img
Image Dims: (200, 200)
Define Dataloaders
Next, we need to define the Transforms for the DataLoaders object.
Define target input dimensions
The Unity project will take input from a webcam, and most webcams don’t have a square aspect ratio like the training samples. We will need to account for this to get more accurate predictions.
We can train with a square aspect ratio and crop the webcam input in Unity, but that might make users feel cramped when using the application.
Alternatively, we can expand the training images to a more typical aspect ratio like 4:3 or 16:9. This approach will allow us to use the entire webcam input, so we’ll go with this one.
I have a separate tutorial for cropping images on the GPU in Unity for anyone that wants to try the other approach.
Below are some sample input dimensions in different aspect ratios.
# size_1_1 = (224, 224)
# size_3_2 = (224, 336)
# size_4_3 = (216, 288)
size_16_9 = (216, 384)
# size_16_9_l = (288, 512)
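As a quick sanity check (my own addition, not part of the original notebook), we can confirm the chosen (height, width) pair matches the 16:9 aspect ratio of a typical webcam frame:
# The tuples above are (height, width); 384/216 should match 16/9 (e.g., a 1280x720 webcam)
height, width = size_16_9
print(width / height)   # 1.777...
print(1280 / 720)       # 1.777...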
Define Transforms
Something else to consider is that the webcam input in Unity mirrors the actual image. Mirrored input would likely not be an issue for something like a pet classifier, but hand orientation matters for ASL. We either need to flip the input image each time in Unity, or we can train the model with pre-flipped images. It is easier to mirror the training images, so we’ll use the FlipItem transform with a probability of 1.0 to flip every training sample.
I have a separate tutorial covering how to flip images on the GPU in Unity for anyone that wants to try that approach.
Since we are resizing to a different aspect ratio, we need to choose a padding method. The default reflection padding might add more fingers, changing an image’s meaning. The zeros padding option might work, but most user backgrounds will not be pure black. Therefore, we’ll go with border padding.
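If you want to see the difference for yourself, the optional sketch below (my own addition) applies each padding mode to a single training image so you can compare them. It assumes the dataset download from earlier has already completed.
# Optional: compare the three pad modes on one sample image
sample_img = PILImage.create(files[0])
for pad_mode in [PadMode.Reflection, PadMode.Zeros, PadMode.Border]:
    resized = Resize(size_16_9, method=ResizeMethod.Pad, pad_mode=pad_mode)(sample_img, split_idx=0)
    resized.show(title=pad_mode)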
We can add some batch transforms like tweaking the contrast, saturation, hue, zoom, brightness, and warping to help crappify the images. However, we need to disable the do_flip and max_rotate options in aug_transforms.
inspect_default_args(aug_transforms)
Default Value | |
---|---|
mult | 1.0 |
do_flip | True |
flip_vert | False |
max_rotate | 10.0 |
min_zoom | 1.0 |
max_zoom | 1.1 |
max_lighting | 0.2 |
max_warp | 0.2 |
p_affine | 0.75 |
p_lighting | 0.75 |
xtra_tfms | None |
size | None |
mode | bilinear |
pad_mode | reflection |
align_corners | True |
batch | False |
min_scale | 1.0 |
item_tfms = [FlipItem(p=1.0), Resize(size_16_9, method=ResizeMethod.Pad, pad_mode=PadMode.Border)]

batch_tfms = [
    Contrast(max_lighting=0.25),
    Saturation(max_lighting=0.25),
    Hue(max_hue=0.05),
    *aug_transforms(
        size=size_16_9,
        mult=1.0,
        do_flip=False,
        flip_vert=False,
        max_rotate=0.0,
        min_zoom=0.5,
        max_zoom=1.5,
        max_lighting=0.5,
        max_warp=0.2,
        p_affine=0.0,
        pad_mode=PadMode.Border)
]
Define batch size
bs = 128
Define DataLoaders object
We can use the from_folder method to instantiate the DataLoaders object.
inspect_default_args(ImageDataLoaders.from_folder)
Default Value | |
---|---|
cls | None |
path | None |
train | train |
valid | valid |
valid_pct | None |
seed | None |
vocab | None |
item_tfms | None |
batch_tfms | None |
bs | 64 |
val_bs | None |
shuffle | True |
device | None |
dls = ImageDataLoaders.from_folder(
    path=dataset_path/'ASL',
    valid_pct=0.2,
    bs=bs,
    item_tfms=item_tfms,
    batch_tfms=batch_tfms
)
Verify DataLoaders object
Let’s verify the DataLoaders object works as expected before training a model.
dls.train.show_batch()
We can see that the DataLoaders object applies the transforms to the training split, including mirroring the image. However, it does not appear to mirror images from the validation split.
dls.valid.show_batch()
We can get around this by using a solution provided on the fastai forums to apply the training split transforms to the validation split. It is not strictly necessary to mirror the validation split, but the accuracy metrics would be confusing during training without it.
Apply training split transforms to validation split
with dls.valid.dataset.set_split_idx(0): dls[1].show_batch()
Define Learner
Now we need to define the Learner object for training the model.
Inspect Learner parameters
inspect_default_args(vision_learner)
Default Value | |
---|---|
dls | None |
arch | None |
normalize | True |
n_out | None |
pretrained | True |
loss_func | None |
opt_func | <function Adam at 0x7fa5e274a560> |
lr | 0.001 |
splitter | None |
cbs | None |
metrics | None |
path | None |
model_dir | models |
wd | None |
wd_bn_bias | False |
train_bn | True |
moms | (0.95, 0.85, 0.95) |
cut | None |
n_in | 3 |
init | <function kaiming_normal_ at 0x7fa60b397be0> |
custom_head | None |
concat_pool | True |
lin_ftrs | None |
ps | 0.5 |
pool | True |
first_bn | True |
bn_final | False |
lin_first | False |
y_range | None |
Define model
I recommend sticking with a ResNet18 or ResNet34 model, as the larger models can significantly lower frame rates.
model = resnet18
Define metrics
metrics = [error_rate, accuracy]
Define Learner object
learn = vision_learner(dls, model, metrics=metrics).to_fp16()
Find learning rate
inspect_default_args(learn.lr_find)
Default Value | |
---|---|
self | None |
start_lr | 0.0 |
end_lr | 10 |
num_it | 100 |
stop_div | True |
show_plot | True |
suggest_funcs | <function valley at 0x7fa5e24996c0> |
Define suggestion methods
suggest_funcs = [valley, minimum, steep]
with dls.valid.dataset.set_split_idx(0): learn.lr_find(suggest_funcs=suggest_funcs)
Define learning rate
lr = 2e-3
lr
0.002
Define number of epochs
epochs = 3
Fine tune model
After picking a learning rate, we can train the model for a few epochs. Training can take a while on Google Colab and Kaggle.
inspect_default_args(learn.fine_tune)
Default Value | |
---|---|
self | None |
epochs | None |
base_lr | 0.002 |
freeze_epochs | 1 |
lr_mult | 100 |
pct_start | 0.3 |
div | 5.0 |
lr_max | None |
div_final | 100000.0 |
wd | None |
moms | None |
cbs | None |
reset_opt | False |
with dls.valid.dataset.set_split_idx(0): learn.fine_tune(epochs, base_lr=lr)
epoch | train_loss | valid_loss | error_rate | accuracy | time |
---|---|---|---|---|---|
0 | 0.365705 | 0.175888 | 0.056305 | 0.943695 | 04:52 |
epoch | train_loss | valid_loss | error_rate | accuracy | time |
---|---|---|---|---|---|
0 | 0.038334 | 0.021014 | 0.008103 | 0.991897 | 04:56 |
1 | 0.012614 | 0.011383 | 0.004236 | 0.995764 | 04:59 |
2 | 0.006508 | 0.006591 | 0.003325 | 0.996675 | 04:55 |
Inspect Trained Model
Once the model finishes training, we can test it on a sample image and see where it struggles.
Select a test image
import PIL
test_file = files[0]
test_file.name
'J1491.jpg'
test_img = PIL.Image.open(test_file)
test_img
Make a prediction on a single image using a fastai.vision.core.PILImage
Remember that we need to flip the test image before feeding it to the model.
learn.predict(PILImage(test_img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)))
('J',
TensorBase(22),
TensorBase([9.6170e-14, 7.7060e-13, 2.5787e-13, 1.1222e-13, 1.5709e-10, 3.6805e-11,
1.7642e-11, 2.3571e-13, 3.5861e-15, 9.8273e-13, 4.1524e-14, 1.3218e-12,
7.3592e-14, 3.8404e-14, 4.9230e-12, 8.4399e-12, 2.0167e-11, 3.2757e-13,
4.0114e-10, 2.3624e-11, 8.3717e-14, 1.9143e-07, 1.0000e+00, 9.7685e-14,
9.4480e-15, 3.3952e-15, 9.4246e-12, 2.3079e-12, 1.6612e-15, 6.6745e-14,
3.9778e-14, 2.2675e-11, 1.7859e-14, 1.7659e-11, 5.1701e-11, 8.4209e-14,
4.6891e-11, 1.3487e-11, 1.0827e-11, 1.0881e-10, 2.6260e-09, 4.2682e-13,
3.1842e-13, 7.4326e-13, 4.8983e-13, 2.0801e-13, 9.1052e-14, 1.0467e-08,
2.3752e-14, 1.0124e-09, 6.7431e-11]))
Make predictions for a group of images
with dls.valid.dataset.set_split_idx(0): learn.show_results()
Define an Interpretation object
with dls.valid.dataset.set_split_idx(0): interp = Interpretation.from_learner(learn)
Plot top losses
with dls.valid.dataset.set_split_idx(0): interp.plot_top_losses(k=9, figsize=(15,10))
Implement Processing Steps
When we are satisfied with the model, we can start preparing for implementing it in Unity. We will need to apply the same preprocessing and post-processing in Unity that fastai applies automatically. We will verify we understand the processing steps by implementing them in Python first.
Inspect the after_item pipeline
We don’t need to worry about flipping or padding the image in Unity with the current training approach.
learn.dls.after_item
Pipeline: FlipItem -- {'p': 1.0} -> Resize -- {'size': (384, 216), 'method': 'pad', 'pad_mode': 'border', 'resamples': (<Resampling.BILINEAR: 2>, 0), 'p': 1.0} -> ToTensor
Inspect the after_batch pipeline
The after_batch pipeline first scales the image color channel values from \([0,255]\) to \([0,1]\). Unity already uses the range \([0,1]\), so we don’t need to implement this step. We also don’t need to implement any of the image augmentations. However, we do need to normalize the image using the ImageNet stats.
learn.dls.after_batch
Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Warp -- {'magnitude': 0.2, 'p': 1.0, 'draw_x': None, 'draw_y': None, 'size': (216, 384), 'mode': 'bilinear', 'pad_mode': 'border', 'batch': False, 'align_corners': True, 'mode_mask': 'nearest'} -> Contrast -- {'max_lighting': 0.25, 'p': 1.0, 'draw': None, 'batch': False} -> Saturation -- {'max_lighting': 0.25, 'p': 1.0, 'draw': None, 'batch': False} -> Hue -- {'p': 1.0} -> Brightness -- {'max_lighting': 0.5, 'p': 1.0, 'draw': None, 'batch': False} -> Normalize -- {'mean': tensor([[[[0.4850]],
[[0.4560]],
[[0.4060]]]], device='cuda:0'), 'std': tensor([[[[0.2290]],
[[0.2240]],
[[0.2250]]]], device='cuda:0'), 'axes': (0, 2, 3)}
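Concretely, the normalization step computes, for each color channel \(c\) with the input already scaled to \([0,1]\):
\[
\hat{x}_c = \frac{x_c - \mu_c}{\sigma_c}, \qquad \mu = (0.485,\ 0.456,\ 0.406), \qquad \sigma = (0.229,\ 0.224,\ 0.225)
\]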
Reset test image
test_img = PIL.Image.open(test_file)
test_img
test_img = test_img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
test_img
test_img.size
(200, 200)
min(test_img.size)
200
min_dim = test_img.size.index(min(test_img.size))
max_dim = 1 - min_dim
target_dim = 224
Set input dims
inp_dims = [0,0]
inp_dims[min_dim] = target_dim
inp_dims[max_dim] = int(test_img.size[max_dim] / (test_img.size[min_dim]/target_dim))
inp_dims
[224, 224]
resized_img = test_img.resize(inp_dims)
resized_img
Convert image to tensor
img_tensor = tensor(resized_img).permute(2, 0, 1)
img_tensor.shape, img_tensor
(torch.Size([3, 224, 224]),
tensor([[[ 0, 0, 0, ..., 1, 0, 0],
[ 0, 4, 2, ..., 9, 2, 0],
[ 5, 82, 99, ..., 74, 8, 0],
...,
[ 3, 127, 154, ..., 141, 0, 3],
[ 3, 102, 125, ..., 120, 0, 0],
[ 0, 0, 4, ..., 0, 1, 0]],
[[ 4, 1, 2, ..., 0, 2, 5],
[ 2, 1, 0, ..., 0, 0, 5],
[ 0, 75, 91, ..., 63, 1, 1],
...,
[ 3, 126, 150, ..., 151, 0, 0],
[ 7, 105, 122, ..., 127, 1, 0],
[ 8, 5, 3, ..., 4, 6, 2]],
[[253, 254, 255, ..., 253, 255, 254],
[244, 220, 199, ..., 209, 237, 255],
[212, 222, 180, ..., 188, 211, 251],
...,
[196, 225, 171, ..., 238, 204, 255],
[207, 247, 222, ..., 242, 218, 255],
[223, 203, 193, ..., 219, 247, 254]]], dtype=torch.uint8))
Scale tensor values
scaled_tensor = img_tensor.float().div_(255)
Prepare imagenet mean values
mean_tensor = tensor(imagenet_stats[0]).view(1,1,-1).permute(2, 0, 1)
mean_tensor.shape, mean_tensor
(torch.Size([3, 1, 1]),
tensor([[[0.4850]],
[[0.4560]],
[[0.4060]]]))
Prepare imagenet std values
std_tensor = tensor(imagenet_stats[1]).view(1,1,-1).permute(2, 0, 1)
std_tensor.shape, std_tensor
(torch.Size([3, 1, 1]),
tensor([[[0.2290]],
[[0.2240]],
[[0.2250]]]))
Normalize and batch image tensor
normalized_tensor = (scaled_tensor - mean_tensor) / std_tensor
batched_tensor = normalized_tensor.unsqueeze(dim=0)
batched_tensor.shape, batched_tensor
(torch.Size([1, 3, 224, 224]),
tensor([[[[-2.1179, -2.1179, -2.1179, ..., -2.1008, -2.1179, -2.1179],
[-2.1179, -2.0494, -2.0837, ..., -1.9638, -2.0837, -2.1179],
[-2.0323, -0.7137, -0.4226, ..., -0.8507, -1.9809, -2.1179],
...,
[-2.0665, 0.0569, 0.5193, ..., 0.2967, -2.1179, -2.0665],
[-2.0665, -0.3712, 0.0227, ..., -0.0629, -2.1179, -2.1179],
[-2.1179, -2.1179, -2.0494, ..., -2.1179, -2.1008, -2.1179]],
[[-1.9657, -2.0182, -2.0007, ..., -2.0357, -2.0007, -1.9482],
[-2.0007, -2.0182, -2.0357, ..., -2.0357, -2.0357, -1.9482],
[-2.0357, -0.7227, -0.4426, ..., -0.9328, -2.0182, -2.0182],
...,
[-1.9832, 0.1702, 0.5903, ..., 0.6078, -2.0357, -2.0357],
[-1.9132, -0.1975, 0.1001, ..., 0.1877, -2.0182, -2.0357],
[-1.8957, -1.9482, -1.9832, ..., -1.9657, -1.9307, -2.0007]],
[[ 2.6051, 2.6226, 2.6400, ..., 2.6051, 2.6400, 2.6226],
[ 2.4483, 2.0300, 1.6640, ..., 1.8383, 2.3263, 2.6400],
[ 1.8905, 2.0648, 1.3328, ..., 1.4722, 1.8731, 2.5703],
...,
[ 1.6117, 2.1171, 1.1759, ..., 2.3437, 1.7511, 2.6400],
[ 1.8034, 2.5006, 2.0648, ..., 2.4134, 1.9951, 2.6400],
[ 2.0823, 1.7337, 1.5594, ..., 2.0125, 2.5006, 2.6226]]]]))
Pass tensor to model
with torch.no_grad():
    preds = learn.model(batched_tensor.cuda())
preds
TensorBase([[-4.9931e+00, -1.9711e+00, -3.3677e+00, -3.0452e+00, 3.9567e+00,
3.9293e+00, 3.1657e+00, -5.3549e+00, -7.9026e+00, -1.5491e+00,
-2.4086e+00, -2.6251e+00, -4.0321e+00, -7.3666e+00, -1.0557e+00,
-3.2344e-01, 4.7887e+00, -4.8819e+00, 6.5188e+00, 1.1152e+00,
-5.9519e-01, 1.1730e+01, 3.0779e+01, -4.4505e+00, -1.0000e+01,
-9.1124e+00, -3.7176e-01, -4.2437e+00, -8.6924e+00, -1.5119e+00,
-8.4118e+00, 9.1559e-01, -7.6669e+00, 1.7187e+00, 2.0639e+00,
-4.0788e+00, 9.0079e+00, -2.8547e-02, 1.1223e+00, -3.2541e-02,
8.9209e+00, -4.2307e+00, -3.6343e+00, -9.8461e-01, -4.2557e+00,
-2.2238e+00, -5.9167e+00, 7.0386e+00, -7.7322e+00, 4.3321e+00,
-3.1247e-01]], device='cuda:0')
Process model output
torch.nn.functional.softmax(preds, dim=1)
TensorBase([[2.9133e-16, 5.9815e-15, 1.4800e-15, 2.0433e-15, 2.2450e-12, 2.1844e-12,
1.0179e-12, 2.0287e-16, 1.5878e-17, 9.1219e-15, 3.8617e-15, 3.1101e-15,
7.6160e-16, 2.7138e-17, 1.4940e-14, 3.1072e-14, 5.1585e-12, 3.2557e-16,
2.9103e-11, 1.3097e-13, 2.3678e-14, 5.3343e-09, 1.0000e+00, 5.0120e-16,
1.9486e-18, 4.7354e-18, 2.9607e-14, 6.1632e-16, 7.2077e-18, 9.4674e-15,
9.5424e-18, 1.0727e-13, 2.0099e-17, 2.3949e-13, 3.3822e-13, 7.2685e-16,
3.5069e-10, 4.1729e-14, 1.3190e-13, 4.1563e-14, 3.2148e-10, 6.2438e-16,
1.1337e-15, 1.6041e-14, 6.0902e-16, 4.6457e-15, 1.1568e-16, 4.8942e-11,
1.8828e-17, 3.2679e-12, 3.1415e-14]], device='cuda:0')
preds.argmax()
TensorBase(22, device='cuda:0')
torch.nn.functional.softmax(preds, dim=1)[0][preds.argmax()]
TensorBase(1., device='cuda:0')
Get the class labels
learn.dls.vocab
['1', '3', '4', '5', '7', '8', '9', 'A', 'B', 'Baby', 'Brother', 'C', 'D', 'Dont_like', 'E', 'F', 'Friend', 'G', 'H', 'Help', 'House', 'I', 'J', 'K', 'L', 'Like', 'Love', 'M', 'Make', 'More', 'N', 'Name', 'No', 'O_OR_0', 'P', 'Pay', 'Play', 'Q', 'R', 'S', 'Stop', 'T', 'U', 'V_OR_2', 'W_OR_6', 'With', 'X', 'Y', 'Yes', 'Z', 'nothing']
Get the predicted class label
learn.dls.vocab[torch.nn.functional.softmax(preds, dim=1).argmax()]
'J'
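To tie the processing steps together, here is a sketch (a hypothetical helper, not from the original notebook) that bundles the resize, scaling, ImageNet normalization, forward pass, and softmax into one function:
# Hypothetical helper collecting the manual steps above into a single function
def predict_image(img, model, vocab, target_dim=224):
    # Resize so the smaller dimension equals target_dim, preserving aspect ratio
    min_dim = img.size.index(min(img.size))
    max_dim = 1 - min_dim
    inp_dims = [0, 0]
    inp_dims[min_dim] = target_dim
    inp_dims[max_dim] = int(img.size[max_dim] / (img.size[min_dim] / target_dim))
    img = img.resize(inp_dims)

    # Scale to [0,1] and normalize with the ImageNet stats
    x = tensor(img).permute(2, 0, 1).float().div_(255)
    mean = tensor(imagenet_stats[0]).view(-1, 1, 1)
    std = tensor(imagenet_stats[1]).view(-1, 1, 1)
    x = ((x - mean) / std).unsqueeze(dim=0)

    # Forward pass and softmax
    with torch.no_grad():
        preds = model(x.to(next(model.parameters()).device))
    probs = torch.nn.functional.softmax(preds, dim=1)
    return vocab[probs.argmax()], probs.max().item()

# Usage (remember to flip the image first, as during training):
# predict_image(PIL.Image.open(test_file).transpose(Image.Transpose.FLIP_LEFT_RIGHT),
#               learn.model.eval(), learn.dls.vocab)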
Export the Model
The last step is to export the trained model to ONNX format.
Define ONNX file name
= f"{dataset_path.name}-{learn.arch.__name__}.onnx"
onnx_file_name onnx_file_name
'asl-and-some-words-resnet18.onnx'
Export trained model to ONNX
We’ll use an older opset_version to ensure the model is compatible with the Barracuda library. We will also unlock the input dimensions for the model to give ourselves more flexibility in Unity, although we’ll want to stay close to the training resolution for the best accuracy.
torch.onnx.export(learn.model.cpu(),
                  batched_tensor,
                  onnx_file_name,
                  export_params=True,
                  opset_version=9,
                  do_constant_folding=True,
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input': {2: 'height', 3: 'width'}})
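As an optional sanity check (my own addition), we can load the exported file with the onnxruntime package, assuming it is installed, and confirm it reproduces the PyTorch prediction:
# Optional: verify the exported model with onnxruntime (pip install onnxruntime)
import onnxruntime as ort

session = ort.InferenceSession(onnx_file_name, providers=['CPUExecutionProvider'])
onnx_preds = session.run(None, {'input': batched_tensor.cpu().numpy()})[0]
print(learn.dls.vocab[int(onnx_preds.argmax())])  # expected to match the earlier 'J' prediction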
Export class labels
We can export the list of class labels to a JSON file and import it into the Unity project. That way, we don’t have to hardcode them, and we can easily swap in models trained on different datasets.
import json
= {"classes": list(learn.dls.vocab)}
class_labels = f"{dataset_path.name}-classes.json"
class_labels_file_name
with open(class_labels_file_name, "w") as write_file:
json.dump(class_labels, write_file)
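As a quick, optional check (my own addition), we can read the file back to confirm the structure the Unity project will consume:
# Optional: confirm the JSON file round-trips correctly
with open(class_labels_file_name, "r") as read_file:
    print(json.load(read_file)["classes"][:5])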
Summary
In this post, we walked through how to finetune a ResNet model for image classification using the fastai library and export it to ONNX format. Part 2 will cover implementing the trained model in a Unity project using the Barracuda library.
Previous: Getting Started With Deep Learning in Unity
Next: Fastai to Unity Tutorial Pt. 2
Project Resources: GitHub Repository