# Notes on fastai Book Ch. 7

Chapter 7 covers data normalization, progressive resizing, test-time augmentation, mixup, and label smoothing.
Published: March 14, 2022

## Training a State-of-the-Art Model

- The dataset you are given is not necessarily the dataset you want.
- Aim for an iteration speed of no more than a couple of minutes per experiment.
- The more experiments you can run, the better.

## Imagenette

```python
from fastai.vision.all import *
```

```python
path = untar_data(URLs.IMAGENETTE)
path
```

```python
parent_label
```

```python
dblock = DataBlock(
    blocks=(
        # TransformBlock for images
        ImageBlock(),
        # TransformBlock for single-label categorical target
        CategoryBlock()),
    # recursively load image files from path
    get_items=get_image_files,
    # label images using the parent folder name
    get_y=parent_label,
    # presize images to 460px
    item_tfms=Resize(460),
    # Batch resize to 224 and perform data augmentations
    batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = dblock.dataloaders(path, bs=64, num_workers=8)
```

```python
xresnet50
```

```python
CrossEntropyLossFlat
```

```python
# Initialize the model without pretrained weights
model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
learn.fit_one_cycle(5, 3e-3)
```
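| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|-------|
| 0 | 1.672769 | 3.459394 | 0.301718 | 00:59 |
| 1 | 1.224001 | 1.404229 | 0.552651 | 01:00 |
| 2 | 0.968035 | 0.996460 | 0.660941 | 01:00 |
| 3 | 0.699550 | 0.709341 | 0.771471 | 01:00 |
| 4 | 0.578120 | 0.571692 | 0.820388 | 01:00 |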

```python
# Initialize the model without pretrained weights
model = xresnet50(n_out=dls.c)
# Use mixed precision
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy).to_fp16()
learn.fit_one_cycle(5, 3e-3)
```

## Normalization

- normalized data: has a mean value of 0 and a standard deviation of 1
- it is easier to train models with normalized data
- normalization is especially important when using pretrained models
    - make sure to use the same normalization stats the pretrained model was trained on

```python
x,y = dls.one_batch()
x.mean(dim=[0,2,3]),x.std(dim=[0,2,3])
```

```text
(TensorImage([0.4498, 0.4448, 0.4141], device='cuda:0'),
 TensorImage([0.2893, 0.2792, 0.3022], device='cuda:0'))
```
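
As a minimal sketch of what normalization does to a single channel, the snippet below uses plain Python rather than fastai's `Normalize` transform. The `normalize` helper and the sample pixel values are made up for illustration. It subtracts the channel mean and divides by the channel standard deviation, which leaves the values with a mean of 0 and a standard deviation of 1.

```python
from statistics import fmean, pstdev

def normalize(values, mean, std):
    """Shift and scale each value: (v - mean) / std."""
    return [(v - mean) / std for v in values]

# Hypothetical pixel intensities for one channel, already scaled to [0, 1]
channel = [0.1, 0.4, 0.5, 0.8]

# Statistics computed from the data itself (an assumption for this sketch)
mean, std = fmean(channel), pstdev(channel)
normed = normalize(channel, mean, std)

print(round(fmean(normed), 6), round(pstdev(normed), 6))
```

With pretrained weights, `mean` and `std` should instead be the statistics of the data the model was originally trained on, so a new dataset ends up only approximately zero-mean and unit-variance.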

#### Normalize

```python
Normalize
```

```text
fastai.data.transforms.Normalize
```

```python
Normalize.from_stats
```

```text
<bound method Normalize.from_stats of <class 'fastai.data.transforms.Normalize'>>
```
```python
def get_dls(bs, size):
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=parent_label,
                       item_tfms=Resize(460),
                       batch_tfms=[*aug_transforms(size=size, min_scale=0.75),
                                   Normalize.from_stats(*imagenet_stats)])
    return dblock.dataloaders(path, bs=bs)
```

```python
dls = get_dls(64, 224)
```

```python
x,y = dls.one_batch()
x.mean(dim=[0,2,3]),x.std(dim=[0,2,3])
```

```text
(TensorImage([-0.2055, -0.0843, 0.0192], device='cuda:0'),
 TensorImage([1.1835, 1.1913, 1.2377], device='cuda:0'))
```

```python
model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy).to_fp16()
learn.fit_one_cycle(5, 3e-3)
```

## Progressive Resizing

- start training with smaller images and end training with larger images
    - gradually use larger and larger images as you train
- used by a team of fast.ai students to win the DAWNBench competition in 2018
- smaller images help training complete much faster
- larger images help make accuracy much higher
- progressive resizing serves as another form of data augmentation
    - should result in better generalization
- progressive resizing might hurt performance when using transfer learning
    - most likely to happen if your pretrained model was very similar to your target task and the dataset it was trained on had similar-sized images

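
A rough way to see why the small-image phase is cheaper: the number of pixels per image, and with it much of the convolutional work, grows with the square of the side length. The sketch below is back-of-the-envelope arithmetic in plain Python, using the 128px and 224px sizes from these notes. `pixel_ratio` is a hypothetical helper, and real speedups also depend on batch size, data loading, and hardware.

```python
def pixel_ratio(small, large):
    """How many times more pixels a large x large image has than a small x small one."""
    return (large * large) / (small * small)

# Image sizes used for the two training phases in these notes
ratio = pixel_ratio(128, 224)
print(ratio)
```
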
```python
dls = get_dls(128, 128)
learn = Learner(dls, xresnet50(n_out=dls.c), loss_func=CrossEntropyLossFlat(),
                metrics=accuracy).to_fp16()
learn.fit_one_cycle(4, 3e-3)
```

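| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|-------|
| 0 | 1.627504 | 2.495554 | 0.393951 | 00:21 |
| 1 | 1.264693 | 1.233987 | 0.613518 | 00:21 |
| 2 | 0.970736 | 0.958903 | 0.707618 | 00:21 |
| 3 | 0.740324 | 0.659166 | 0.794996 | 00:21 |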

```python
learn.dls = get_dls(64, 224)
learn.fine_tune(5, 1e-3)
```

## Test Time Augmentation

- during inference or validation, create multiple versions of each image using augmentation, then take the average or maximum of the predictions across the augmented versions
- can result in dramatic improvements in accuracy, depending on the dataset
- does not change the time required to train
- will increase the amount of time required for validation or inference

#### Learner.tta

* https://docs.fast.ai/learner.html#Learner.tta
* returns predictions using Test Time Augmentation

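
The averaging idea behind test time augmentation can be sketched in plain Python. This is an illustration of the concept only, not a description of how `Learner.tta` works internally. The `tta_average` and `tta_max` helpers and the probabilities are hypothetical.

```python
def tta_average(preds_per_view):
    """Average class probabilities across augmented views of one image."""
    n_views = len(preds_per_view)
    n_classes = len(preds_per_view[0])
    return [sum(view[c] for view in preds_per_view) / n_views
            for c in range(n_classes)]

def tta_max(preds_per_view):
    """Take the per-class maximum across augmented views instead of the mean."""
    n_classes = len(preds_per_view[0])
    return [max(view[c] for view in preds_per_view) for c in range(n_classes)]

# Hypothetical probabilities for 3 classes from 4 augmented views of one image
views = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
]

avg = tta_average(views)
predicted_class = avg.index(max(avg))
print(avg, predicted_class)
```

A single view (the second one here) can favor the wrong class, while the combined prediction is less sensitive to any one crop or flip.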
```python
learn.tta
```

```text
<bound method Learner.tta of <fastai.learner.Learner object at 0x7f75b4be5f40>>
```

```python
preds,targs = learn.tta()
accuracy(preds, targs).item()
```

## Mixup

- a powerful data augmentation technique that can provide dramatically higher accuracy, especially when you don’t have much data and don’t have a pretrained model
- introduced in the 2017 paper mixup: Beyond Empirical Risk Minimization
    - “While data augmentation consistently leads to improved generalization, the procedure is dataset-dependent, and thus requires the use of expert knowledge.”
- Mixup steps
    1. Select another image from your dataset at random
    2. Pick a weight at random
    3. Take a weighted average of the selected image with your image, to serve as your independent variable
    4. Take a weighted average of this image’s labels with your image’s labels, to serve as your dependent variable
- target needs to be one-hot encoded
- $$\tilde{x} = \lambda x_{i} + (1 - \lambda) x_{j} \text{, where } x_{i} \text{ and } x_{j} \text{ are raw input vectors}$$
- $$\tilde{y} = \lambda y_{i} + (1 - \lambda) y_{j} \text{, where } y_{i} \text{ and } y_{j} \text{ are one-hot label encodings}$$
- more difficult to train
- less prone to overfitting
- requires far more epochs to train to get better accuracy
- can be applied to types of data other than photos
- can even be used on activations inside of a model
- resolves the issue where it is not typically possible to achieve a perfect loss score
    - our labels are 1s and 0s, but the outputs of softmax and sigmoid can never equal 1 or 0
    - with Mixup our labels will only be exactly 1 or 0 if two images from the same class are mixed
- Mixup is “accidentally” making the labels bigger than 0 or smaller than 1
    - the softening is a side effect of the augmentation, so it cannot be adjusted without also changing the amount of Mixup
    - can be resolved with Label Smoothing, which softens the labels directly

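
The Mixup steps and formulas above can be sketched in a few lines of plain Python. This is a minimal illustration on toy vectors, not fastai's `MixUp` callback. The `mixup` helper, the three-element "images", and the fixed weight of 0.3 are assumptions for the example. In practice the weight is drawn at random for each batch.

```python
def mixup(x_i, y_i, x_j, y_j, lam):
    """Blend two inputs and their one-hot labels with weight lam."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y

# Hypothetical flattened "images" and one-hot labels for a 3-class problem
x_church, y_church = [0.0, 0.5, 1.0], [1.0, 0.0, 0.0]
x_gas, y_gas = [1.0, 0.5, 0.0], [0.0, 1.0, 0.0]

x_mix, y_mix = mixup(x_church, y_church, x_gas, y_gas, lam=0.3)
print(x_mix, y_mix)
```

The mixed label still sums to 1, but no entry is exactly 1 unless both samples share a class.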
```python
# get_image_files_sorted is not exported by fastai.vision.all; the book takes it
# from its fastbook helpers. A minimal equivalent, defined here as an assumption:
def get_image_files_sorted(path):
    return get_image_files(path).sorted()

# Get two images from different classes
church = PILImage.create(get_image_files_sorted(path/'train'/'n03028079')[0])
gas = PILImage.create(get_image_files_sorted(path/'train'/'n03425413')[0])
# Resize images
church = church.resize((256,256))
gas = gas.resize((256,256))

# Scale pixel values to the range [0,1]
tchurch = tensor(church).float() / 255.
tgas = tensor(gas).float() / 255.

_,axs = plt.subplots(1, 3, figsize=(12,4))
# Show the first image
show_image(tchurch, ax=axs[0]);
# Show the second image
show_image(tgas, ax=axs[1]);
# Take the weighted average of the two images
show_image((0.3*tchurch + 0.7*tgas), ax=axs[2]);
```

```python
# n_out=dls.c sizes the output layer to the dataset's classes, as in the earlier runs
model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy,
                cbs=MixUp).to_fp16()
learn.fit_one_cycle(15, 3e-3)
```

```python
model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(),
                metrics=accuracy).to_fp16()
learn.fit_one_cycle(15, 3e-3)
```

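
`LabelSmoothingCrossEntropy` softens the one-hot targets directly instead of relying on Mixup to do it as a side effect. One common formulation, and the one described in the fastai book, replaces every 0 with ε/N and the 1 with 1 − ε + ε/N, where N is the number of classes. The sketch below builds such a target in plain Python. `smooth_labels` is a hypothetical helper, and ε = 0.1 is an assumed value chosen for illustration.

```python
def smooth_labels(target, n_classes, eps=0.1):
    """Return a smoothed label vector: eps/N everywhere, plus 1 - eps on the target class."""
    labels = [eps / n_classes] * n_classes
    labels[target] += 1 - eps
    return labels

# Imagenette has 10 classes; class index 2 is an arbitrary example target
smoothed = smooth_labels(target=2, n_classes=10)
print(smoothed)
```

The smoothed vector still sums to 1, so it remains a valid probability distribution for the loss.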
## Label Smoothing, Mixup and Progressive Resizing

```python
dls = get_dls(128, 128)
# n_out=dls.c sizes the output layer to the dataset's classes, as in the earlier runs
model = xresnet50(n_out=dls.c)
learn = Learner(dls, model, loss_func=LabelSmoothingCrossEntropy(), metrics=accuracy,
                cbs=MixUp).to_fp16()
learn.fit_one_cycle(15, 3e-3)
```

```python
learn.dls = get_dls(64, 224)
learn.fine_tune(10, 1e-3)
```

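| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|-------|
| 0 | 1.951753 | 1.672776 | 0.789395 | 00:36 |

| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|-------|
| 0 | 1.872399 | 1.384301 | 0.892457 | 00:36 |
| 1 | 1.860005 | 1.441491 | 0.864078 | 00:36 |
| 2 | 1.876859 | 1.425859 | 0.867438 | 00:36 |
| 3 | 1.851872 | 1.460640 | 0.863331 | 00:36 |
| 4 | 1.840423 | 1.413441 | 0.880508 | 00:36 |
| 5 | 1.808990 | 1.444332 | 0.863704 | 00:36 |
| 6 | 1.777755 | 1.321098 | 0.910754 | 00:36 |
| 7 | 1.761589 | 1.312523 | 0.912621 | 00:36 |
| 8 | 1.756679 | 1.302988 | 0.919716 | 00:36 |
| 9 | 1.745481 | 1.304583 | 0.918969 | 00:36 |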
