Notes on fastai Book Ch. 5

Chapter 5 covers creating a custom DataBlock for an image classifier, pre-sizing, cross-entropy loss, model interpretation, picking learning rates, transfer learning, and discriminative learning rates.

March 14, 2022

Image Classification

  • There are a lot of details you need to get right for your models to be accurate and reliable
  • You must be able to look inside your neural network as it trains and as it makes predictions, find possible problems and know how to fix them

From Dogs and Cats to Pet Breeds

  • In real-life
    1. start with a dataset that we know nothing about
    2. figure out how it is put together
    3. figure out how to extract the data we need from it
    4. figure out what the data looks like
  • Data is usually provided in one of two ways
    • Individual files representing items of data, possibly organized into folder or with filenames representing information about those items
      • text documents
      • images
    • A table of data in which each row is an item and may include filenames providing connections between the data in the table and data in other formats
      • CSV files
    • Exceptions:
      • Domains like Genomics
        • binary database formats
        • network streams

from import *
matplotlib.rc('image', cmap='Greys')

The Oxford-IIIT Pet Dataset

  • a 37 category pet dataset with roughly 200 images for each class
  • images have a large variations in scale, pose and lighting
  • all images have an associated ground truth annotation of breed, head ROI, and pixel level trimap segmentation
python path = untar_data(URLs.PETS) path text Path('/home/innom-dt/.fastai/data/oxford-iiit-pet')
python #hide Path.BASE_PATH = path path
(#2) [Path('images'),Path('annotations')]
python # associated ground truth annotation of breed, head ROI, and pixel level trimap segmentation # Not needed for classification (path/"annotations").ls() text (#7) [Path('annotations/trimaps'),Path('annotations/xmls'),Path('annotations/._trimaps'),Path('annotations/list.txt'),Path('annotations/test.txt'),Path('annotations/README'),Path('annotations/trainval.txt')]
python (path/"images").ls()
Pet breed and species is indicated in the file name for each image * Cat breeds have capitalized file names and dog breeds have lowercase file names
fname = (path/"images").ls()[0]

Regular Expressions

python # Matches all instances of any group of characters before a sequence of digits right before '.jpg' re.findall(r'(.+)_\d+.jpg$',
python get_image_files
pets = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")
python dls.c text 37
python import pandas as pd
#### fastai Categorize:
* * reversible transform of a category string to a vocab id * inherits from DisplayedTransform
#### fastai DisplayedTransform
* * A transform that shows its attributes
0 Abyssinian
1 Bengal
2 Birman
3 Bombay
4 British_Shorthair
5 Egyptian_Mau
6 Maine_Coon
7 Persian
8 Ragdoll
9 Russian_Blue
10 Siamese
11 Sphynx
12 american_bulldog
13 american_pit_bull_terrier
14 basset_hound
15 beagle
16 boxer
17 chihuahua
18 english_cocker_spaniel
19 english_setter
20 german_shorthaired
21 great_pyrenees
22 havanese
23 japanese_chin
24 keeshond
25 leonberger
26 miniature_pinscher
27 newfoundland
28 pomeranian
29 pug
30 saint_bernard
31 samoyed
32 scottish_terrier
33 shiba_inu
34 staffordshire_bull_terrier
35 wheaten_terrier
36 yorkshire_terrier


  • we need our images to have the same dimensions before we collate them into tensors
  • we should compose our augmentation transforms into fewer transformations and transform the images into uniform sizes

Steps for presizing

  1. Resize images to relatively “large” dimensions compared to the target training dimensions
    • Larger images have some spare margin for augmentations that might result in empty zones in the image
    • augmented images are then cropped and resized to a square
      • the crop area is chosen randomly on the training set
  2. Compose all the common augmentation operations into one, and perform a combined operation on the GPU
    • all potentially destructive operations are performed together with a single interpolation at the end
dblock1 = DataBlock(blocks=(ImageBlock(), CategoryBlock()),
python (Path.cwd()/'images'/'grizzly.jpg') text Path('/media/innom-dt/Samsung_T3/Projects/Current_Projects/fastbook/clean/images/grizzly.jpg')
python # Create a test DataLoaders object with 100 copies of the same image dls1 = dblock1.dataloaders([(Path.cwd()/'images'/'grizzly.jpg')]*100, bs=8) print(len(dls1.items)) print(dls1.categorize.vocab)
# Return elements from the iterable until it is exhausted.
dls1.train.get_idxs = lambda: Inf.ones


python type(Inf.ones)
#### fastai DataLoader.one_batch:
* * Return one batch from DataLoader
python x,y = dls1.one_batch() print(x.shape) print(y.shape) print(y)
<class 'fastai.torch_core.TensorImage'>

fastai TensorImage:





python print(TensorImage) print(TensorImage.affine_coord) print(TensorImage.rotate) print(TensorImage.zoom) print(TensorImage.warp)
#### fastcore Pipeline:
* * A pipeline of composed (for encode/decode) transforms, setup with types * a wrapper for compose_tfms

_,axs = subplots(1, 2)

x1 = TensorImage(x.clone())
x1 = x1.affine_coord(sz=224)
x1 = x1.rotate(draw=30, p=1.)
x1 = x1.zoom(draw=1.2, p=1.)
x1 = x1.warp(draw_x=-0.2, draw_y=0.2, p=1.)

# Go through transforms and combine together affine/coord or lighting transforms
tfms = setup_aug_tfms([Rotate(draw=30, p=1, size=224), Zoom(draw=1.2, p=1., size=224),
                       Warp(draw_x=-0.2, draw_y=0.2, p=1., size=224)])
x = Pipeline(tfms)(x)
#x.affine_coord(coord_tfm=coord_tfm, sz=size, mode=mode, pad_mode=pad_mode)


Checking and Debugging a DataBlock

  • always check your data when creating a new DataBlock
    • make sure the augmentations work as intended
  • once your data looks right, run it through a simple model
    • start getting feedback as soon as possible

dls.show_batch(nrows=1, ncols=3)


pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))


<function, source, bs=4, show_batch=False, **kwargs)>

Setting-up type transforms pipelines
Collecting items from /home/innom-dt/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}

Building one sample
  Pipeline: PILBase.create
    starting from
    applying PILBase.create gives
      PILImage mode=RGB size=400x500
  Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
    starting from
    applying partial gives
    applying Categorize -- {'vocab': None, 'sort': True, 'add_na': False} gives

Final sample: (PILImage mode=RGB size=400x500, TensorCategory(21))

Collecting items from /home/innom-dt/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToTensor
    starting from
      (PILImage mode=RGB size=400x500, TensorCategory(21))
    applying ToTensor gives
      (TensorImage of size 3x500x400, TensorCategory(21))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 500, 400]),torch.Size([3, 334, 500]),torch.Size([3, 375, 500]),torch.Size([3, 500, 375])
python learn = cnn_learner(dls, resnet34, metrics=error_rate) learn.fine_tune(2)
python y.shape
TensorCategory([31, 30,  5, 17,  6,  7,  4, 22,  4, 27,  2, 19, 12, 14, 11,  8,  5, 26, 14, 11, 28, 25, 35,  4, 22, 36, 31,  9, 27, 20, 23, 33,  2, 27,  0, 18, 12, 22, 17, 21, 25, 13, 16, 15, 33, 14, 20, 15,
         8, 18, 36, 32,  7, 26,  4, 20, 36, 14, 25, 32,  4, 14, 25, 17], device='cuda:0')


python Learner.get_preds v <function fastai.learner.Learner.get_preds(self, ds_idx=1, dl=None, with_input=False, with_decoded=False, with_loss=False, act=None, inner=False, reorder=True, cbs=None, save_preds=None, save_targs=None, concat_dim=0)>
python preds,_ = learn.get_preds(dl=[(x,y)]) preds[0]
(37, TensorBase(1.0000))


  • \(\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\)
  • takes in an n-dimensional input Tensor and rescales them so that the elements are all in the range [0,1] and sum to 1.
  • the multi-category equivalent of sigmoid
  • slightly bigger activation values are amplified, making the highest value closer to 1
  • the first part of cross-entropy loss
  • ideal for when we know each image has a definite correct label
    • during training
  • less ideal if we want our model to tell us it does not recognize what is in the image
    • during inference
    • might be better to use multiple binary output columns, each using a sigmoid activation

plot_function(torch.sigmoid, min=-4,max=4)


python # set random seed to get same results across sessions torch.random.manual_seed(42);
python # Create a random set of test activations for a binary classification problem acts = torch.randn((6,2))*2 acts
tensor([[0.6623, 0.5641],
        [0.6151, 0.6132],
        [0.0957, 0.4079],
        [0.9881, 0.2182],
        [0.7157, 0.6306],
        [0.7446, 0.8346]])
python (acts[:,0]-acts[:,1]).sigmoid()
python def softmax(x): return 2.718**x / (2.718**x).sum(dim=1, keepdim=True) softmax(acts)
tensor([0.6025, 0.3975])


python torch.softmax text <function _VariableFunctionsClass.softmax>
python sm_acts = torch.softmax(acts, dim=1) sm_acts
### Log Likelihood
\(y_{i} = \log_{e}{x_{i}}\)
\(\log{(a*b)} = \log{(a)} + \log{(b)}\)
- logarithms increase linearly when the underlying signal increases exponentially or multiplicatively - means multiplication can be replaced with addition, which is easier for working with very larger or very small numbers
The gradient of cross_entropy(a,b) is softmax(a)-b
- the gradient is proportional to the difference between the prediction and the target - the same as mean squared error in regression, since the gradient of (a-b)**2 is 2*(a-b) - since the gradient is linear, we will not see any sudden jumps or exponential increases in gradients
# Sample labels for testing
targ = tensor([0,1,0,1,1,0])
python sm_acts text tensor([[0.6025, 0.3975], [0.5021, 0.4979], [0.1332, 0.8668], [0.9966, 0.0034], [0.5959, 0.4041], [0.3661, 0.6339]])
python idx = range(6) sm_acts[idx, targ]
df = pd.DataFrame(sm_acts, columns=["3","7"])
df['targ'] = targ
df['idx'] = idx
df['loss'] = sm_acts[range(6), targ]
3 7 targ idx loss
0 0.602469 0.397531 0 0 0.602469
1 0.502065 0.497935 1 1 0.497935
2 0.133188 0.866811 0 2 0.133188
3 0.996640 0.003360 1 3 0.003360
4 0.595949 0.404051 1 4 0.404051
5 0.366118 0.633882 0 5 0.366118
python -sm_acts[idx, targ]
#### F.nll_loss
* * The negative log likelihood (nll) loss. * does not actually take the log, because it assumes you have already taken the log * designed to be used after log_softmax
<function torch.nn.functional.nll_loss(input: torch.Tensor, target: torch.Tensor, weight: Optional[torch.Tensor] = None, size_average: Optional[bool] = None, ignore_index: int = -100, reduce: Optional[bool] = None, reduction: str = 'mean') -> torch.Tensor>
python F.nll_loss(sm_acts, targ, reduction='none')
python torch.log
plot_function(torch.log, min=0,max=4)



python loss_func = nn.CrossEntropyLoss()
python loss_func(acts, targ)
F.cross_entropy(acts, targ)
python -torch.log(-F.nll_loss(sm_acts, targ, reduction='none')) text tensor([0.5067, 0.6973, 2.0160, 5.6958, 0.9062, 1.0048])
python # Do not take the mean nn.CrossEntropyLoss(reduction='none')(acts, targ)
## Model Interpretation - It is very hard to interpret loss functions directly as they are optimized for differentiation and optimization, not human consumption
#### ClassificationInterpretation * * Interpretation base class for exploring predictions from trained models
#### ClassificationInterpretation.from_learner * * Construct interpretation object from a learner
python interp = ClassificationInterpretation.from_learner(learn) interp.plot_confusion_matrix(figsize=(12,12), dpi=60) png
python interp.most_confused(min_val=4)
## Improving Our Model ### The Learning Rate Finder - picking the right learning rate is one of the most important things we can doe when training a model - a learning rate that is too small can take many, many epochs, increasing both training time and the risk of overfitting - a learning rate that is too big can prevent the model from improving at all - Naval researcher, Leslie Smith, came up with the idea of a learning rate finder in 2015 1. start with a very, very small learning rate 2. use the small learning rate for one mini-batch 3. find what the losses are after that one mini-batch 4. increase the learning rate by a certain percentage 5. repeat steps 2-4 until the loss gets worse 6. select a learning rate that is a bit lower than the highest useful learning rate - Either one order of magnitude less than where the minimum loss was achieved or the last point where the loss was clearly decreasing
# Test using a very high learning rate
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1, base_lr=0.1)
epoch train_loss valid_loss error_rate time
0 2.648785 4.358732 0.443843 00:19
epoch train_loss valid_loss error_rate time
0 4.398980 3.214994 0.839648 00:23

Using a very high learning rate resulted in an increasing error rate


learn = cnn_learner(dls, resnet34, metrics=error_rate)
lr_min, lr_steep = learn.lr_find(suggest_funcs=(minimum, steep))


Note: The plot has a logarithmic scale

python print(f"Minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")
python learn = cnn_learner(dls, resnet34, metrics=error_rate) learn.fine_tune(2, base_lr=lr_steep)
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2, base_lr=3e-3)
epoch train_loss valid_loss error_rate time
0 1.264121 0.360774 0.117727 00:19
epoch train_loss valid_loss error_rate time
0 0.544009 0.415368 0.131935 00:23
1 0.332703 0.216870 0.066306 00:23

Unfreezing and Transfer Learning

  • freezing: only updating the weights in newly added layers while leaving the rest of a pretrained model unchanged
  • Process
    1. add new layers to pretrained model
    2. freeze pretrained layers
    3. train for a few epochs where only the new layers get updated
    4. unfreeze the pretrained layers
    5. train for a more epochs


python Learner.fine_tune text <function fastai.callback.schedule.Learner.fine_tune(self: fastai.learner.Learner, epochs, base_lr=0.002, freeze_epochs=1, lr_mult=100, pct_start=0.3, div=5.0, lr_max=None, div_final=100000.0, wd=None, moms=None, cbs=None, reset_opt=False)>
python learn = cnn_learner(dls, resnet34, metrics=error_rate) # Train new layers for 3 epochs learn.fit_one_cycle(3, 3e-3)
#### Learner.unfreeze() * * Unfreeze the entire model
lr_min, lr_steep = learn.lr_find(suggest_funcs=(minimum, steep))


python lr_min
python learn.fit_one_cycle(6, lr_max=1e-5)
python learn = cnn_learner(dls, resnet34, metrics=error_rate) learn.fit_one_cycle(3, 3e-3) learn.unfreeze() # Set the learning rate for the earliest layer to 1e-6 # Set the learning rate for the last layer to 1e-4 # Scale the learning rate for the in-between layers to gradually increase from 1e-6 up to 1e-4 learn.fit_one_cycle(12, lr_max=slice(1e-6,1e-4))


Note: Accuracy may continue to improve, even when the validation loss starts to get worse * validation loss can get worse when your model gets overconfident, not just when it starts to memorize the training data

Selecting the Number of Epochs

  • you will often find that you are limited by time, rather than generalization and accuracy
  1. you should start with picking a number of epochs that will train in the amount of time that you are happy to wait for
  2. then look at the training and validation loss plots, and your metrics
  3. you will know that you have not trained for too long if they are still getting better even in your final epochs

Deeper Architectures

  • a model with more parameters can generally model your data more accurately
    • lots of caveats to this generalization
    • depends on the specifics of the architectures you are using
    • try a smaller model first
  • more likely to suffer from overfitting
  • requires more GPU memory
    • might need to lower the batch size
  • take longer to train


from fastai.callback.fp16 import *
learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16()
learn.fine_tune(6, freeze_epochs=3)
epoch train_loss valid_loss error_rate time
0 1.251300 0.289129 0.081867 00:18
1 0.567936 0.275442 0.083221 00:18
2 0.440096 0.237322 0.071719 00:18
epoch train_loss valid_loss error_rate time
0 0.271949 0.198495 0.065629 00:21
1 0.322747 0.302487 0.093369 00:21
2 0.256723 0.226659 0.071042 00:21
3 0.166247 0.190719 0.064953 00:21
4 0.092291 0.155199 0.050744 00:21
5 0.060924 0.141513 0.048038 00:21


Previous: Notes on fastai Book Ch. 4

Next: Notes on fastai Book Ch. 6