Notes on fastai Book Ch. 15

Tags: ai, fastai, notes, pytorch
Chapter 15 provides a deep dive into different application architectures in the fast.ai library.
Author: Christian Mills

Published: March 29, 2022

This post is part of my Notes on fastai Book series.

#hide
# !pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
#hide
from fastbook import *
import inspect

# Helper for displaying the source code of a function or class
def print_source(obj):
    for line in inspect.getsource(obj).split("\n"):
        print(line)

Application Architectures Deep Dive

Computer Vision

cnn_learner

Transfer Learning

  • the head (the final layers) of the pretrained model needs to be cut off and replaced
  • fastai stores where to cut the included pretrained models in the model_meta dictionary

Body

  • everything other than the head
  • includes the stem
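
For example, cutting off the head of a pretrained network leaves the body, whose first few layers form the stem. A quick sketch (assuming a resnet34 backbone and the cut value fastai stores for ResNets):

body = create_body(resnet34, cut=-2)
# The stem: the initial conv, batchnorm, ReLU, and max-pool layers that
# process the raw input before the main residual blocks
body[:4]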

pd.DataFrame(model_meta)

The DataFrame output, summarized:

| models | cut | split | stats |
| --- | --- | --- | --- |
| xresnet18/34/50/101/152 | -4 | _xresnet_split | ImageNet stats |
| resnet18/34/50/101/152 | -2 | _resnet_split | ImageNet stats |
| squeezenet1_0, squeezenet1_1 | -1 | _squeezenet_split | ImageNet stats |
| densenet121/169/201/161 | -1 | _densenet_split | ImageNet stats |
| vgg11_bn/13_bn/16_bn/19_bn | -2 | _vgg_split | ImageNet stats |
| alexnet | -2 | _alexnet_split | ImageNet stats |

where "ImageNet stats" is ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).

model_meta[resnet50]
{'cut': -2,
 'split': <function fastai.vision.learner._resnet_split(m)>,
 'stats': ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])}

print_source(model_meta[resnet50]['split'])
def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)

create_head(20,2)
Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Flatten(full=False)
  (2): BatchNorm1d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25, inplace=False)
  (4): Linear(in_features=40, out_features=512, bias=False)
  (5): ReLU(inplace=True)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5, inplace=False)
  (8): Linear(in_features=512, out_features=2, bias=False)
)

Note: fastai adds two linear layers by default for transfer learning. Using just one linear layer is unlikely to be enough when transferring a pretrained model to a very different domain.


create_head
<function fastai.vision.learner.create_head(nf, n_out, lin_ftrs=None, ps=0.5, concat_pool=True, first_bn=True, bn_final=False, lin_first=False, y_range=None)>

print_source(create_head)
def create_head(nf, n_out, lin_ftrs=None, ps=0.5, concat_pool=True, first_bn=True, bn_final=False,
                lin_first=False, y_range=None):
    "Model head that takes `nf` features, runs through `lin_ftrs`, and out `n_out` classes."
    if concat_pool: nf *= 2
    lin_ftrs = [nf, 512, n_out] if lin_ftrs is None else [nf] + lin_ftrs + [n_out]
    bns = [first_bn] + [True]*len(lin_ftrs[1:])
    ps = L(ps)
    if len(ps) == 1: ps = [ps[0]/2] * (len(lin_ftrs)-2) + ps
    actns = [nn.ReLU(inplace=True)] * (len(lin_ftrs)-2) + [None]
    pool = AdaptiveConcatPool2d() if concat_pool else nn.AdaptiveAvgPool2d(1)
    layers = [pool, Flatten()]
    if lin_first: layers.append(nn.Dropout(ps.pop(0)))
    for ni,no,bn,p,actn in zip(lin_ftrs[:-1], lin_ftrs[1:], bns, ps, actns):
        layers += LinBnDrop(ni, no, bn=bn, p=p, act=actn, lin_first=lin_first)
    if lin_first: layers.append(nn.Linear(lin_ftrs[-2], n_out))
    if bn_final: layers.append(nn.BatchNorm1d(lin_ftrs[-1], momentum=0.01))
    if y_range is not None: layers.append(SigmoidRange(*y_range))
    return nn.Sequential(*layers)

One Last Batchnorm

  • bn_final: setting this to True causes a batchnorm layer to be added as the final layer (quick example below)
  • can be useful in helping your model scale appropriately for your output activations
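
For example, per the create_head source above, passing bn_final=True appends nn.BatchNorm1d(n_out, momentum=0.01) after the final linear layer:

create_head(20, 2, bn_final=True)
# Same head as shown earlier, plus a final (9): BatchNorm1d(2, momentum=0.01)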

AdaptiveConcatPool2d
fastai.layers.AdaptiveConcatPool2d

print_source(AdaptiveConcatPool2d)
class AdaptiveConcatPool2d(Module):
    "Layer that concats `AdaptiveAvgPool2d` and `AdaptiveMaxPool2d`"
    def __init__(self, size=None):
        self.size = size or 1
        self.ap = nn.AdaptiveAvgPool2d(self.size)
        self.mp = nn.AdaptiveMaxPool2d(self.size)
    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
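
A quick sanity check with a dummy batch shows why create_head doubles nf when concat_pool=True: average pooling and max pooling each produce the full set of channels, and concatenating them doubles the channel dimension.

x = torch.randn(2, 512, 7, 7)    # dummy batch: 512 feature maps of size 7x7
AdaptiveConcatPool2d()(x).shape  # torch.Size([2, 1024, 1, 1])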

unet_learner

  • used for generative vision models
  • uses a custom head that progressively increases the dimensions back to those of the source image
    • can use nearest neighbor interpolation
    • can use a transposed convolution
      • zero padding is inserted between all the pixels in the input before performing a convolution
      • also known as a stride half convolution
      • fastai: ConvLayer(transpose=True) (see the sketch after this list)
  • U-Nets use skip connections to pass information from the encoding layers in the body to the decoding layers in the head
  • U-Net: Convolutional Networks for Biomedical Image Segmentation
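
A minimal sketch contrasting the two upsampling options above (hypothetical channel counts; ks=2 and stride=2 are chosen so the transposed convolution exactly doubles the spatial resolution):

x = torch.randn(1, 256, 8, 8)  # dummy decoder input
# Option 1: nearest-neighbor interpolation followed by a regular convolution
up_nn = nn.Sequential(nn.Upsample(scale_factor=2, mode='nearest'),
                      nn.Conv2d(256, 128, 3, padding=1))
# Option 2: a transposed ("stride half") convolution via fastai's ConvLayer
up_tc = ConvLayer(256, 128, ks=2, stride=2, transpose=True)
up_nn(x).shape, up_tc(x).shape  # both: torch.Size([1, 128, 16, 16])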

Tasks

  • segmentation
  • super resolution
  • colorization
  • style transfer

A Siamese Network

#hide
from fastai.vision.all import *

path = untar_data(URLs.PETS)
path
Path('/home/innom-dt/.fastai/data/oxford-iiit-pet')

files = get_image_files(path/"images")
files
(#7390) [Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Birman_121.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/shiba_inu_131.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Bombay_176.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Bengal_199.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/beagle_41.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/beagle_27.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/great_pyrenees_181.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Bengal_100.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/keeshond_124.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/havanese_115.jpg')...]

# Custom type to allow us to show siamese image pairs
# Tracks whether the two images belong to the same class
class SiameseImage(fastuple):
    def show(self, ctx=None, **kwargs): 
        img1,img2,same_breed = self
        if not isinstance(img1, Tensor):
            if img2.size != img1.size: img2 = img2.resize(img1.size)
            t1,t2 = tensor(img1),tensor(img2)
            t1,t2 = t1.permute(2,0,1),t2.permute(2,0,1)
        else: t1,t2 = img1,img2
        line = t1.new_zeros(t1.shape[0], t1.shape[1], 10)
        return show_image(torch.cat([t1,line,t2], dim=2), 
                          title=same_breed, ctx=ctx)

def label_func(fname):
    return re.match(r'^(.*)_\d+.jpg$', fname.name).groups()[0]

class SiameseTransform(Transform):
    def __init__(self, files, label_func, splits):
        # Generate list of unique labels
        self.labels = files.map(label_func).unique()
        # Create a dictionary to match labels to filenames
        self.lbl2files = {l: L(f for f in files if label_func(f) == l) 
                          for l in self.labels}
        self.label_func = label_func
        self.valid = {f: self._draw(f) for f in files[splits[1]]}
        
    def encodes(self, f):
        f2,t = self.valid.get(f, self._draw(f))
        img1,img2 = PILImage.create(f),PILImage.create(f2)
        # Create siamese image pair
        return SiameseImage(img1, img2, t)
    
    def _draw(self, f):
        # 50/50 chance of generating a pair of the same class
        same = random.random() < 0.5
        cls = self.label_func(f)
        if not same: 
            cls = random.choice(L(l for l in self.labels if l != cls)) 
        return random.choice(self.lbl2files[cls]),same

splits = RandomSplitter()(files)
tfm = SiameseTransform(files, label_func, splits)
tls = TfmdLists(files, tfm, splits=splits)
dls = tls.dataloaders(after_item=[Resize(224), ToTensor], 
    after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)])

class SiameseModel(Module):
    def __init__(self, encoder, head):
        self.encoder,self.head = encoder,head
    
    def forward(self, x1, x2):
        # Run both images through the shared encoder, then concatenate the two
        # feature sets along the channel dimension before passing them to the head
        ftrs = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(ftrs)

encoder = create_body(resnet34, cut=-2)

create_body
<function fastai.vision.learner.create_body(arch, n_in=3, pretrained=True, cut=None)>

print_source(create_body)
def create_body(arch, n_in=3, pretrained=True, cut=None):
    "Cut off the body of a typically pretrained `arch` as determined by `cut`"
    model = arch(pretrained=pretrained)
    _update_first_layer(model, n_in, pretrained)
    #cut = ifnone(cut, cnn_config(arch)['cut'])
    if cut is None:
        ll = list(enumerate(model.children()))
        cut = next(i for i,o in reversed(ll) if has_pool_type(o))
    if   isinstance(cut, int): return nn.Sequential(*list(model.children())[:cut])
    elif callable(cut): return cut(model)
    else: raise NameError("cut must be either integer or a function")

head = create_head(512*2, 2, ps=0.5)
head
Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Flatten(full=False)
  (2): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25, inplace=False)
  (4): Linear(in_features=2048, out_features=512, bias=False)
  (5): ReLU(inplace=True)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5, inplace=False)
  (8): Linear(in_features=512, out_features=2, bias=False)
)

model = SiameseModel(encoder, head)

def loss_func(out, targ):
    return nn.CrossEntropyLoss()(out, targ.long())

# Tell fastai how to split the model into parameter groups
def siamese_splitter(model):
    return [params(model.encoder), params(model.head)]

learn = Learner(dls, model, loss_func=loss_func, 
                splitter=siamese_splitter, metrics=accuracy)
learn.freeze()

Learner.freeze
<function fastai.learner.Learner.freeze(self: fastai.learner.Learner)>

print_source(Learner.freeze)
@patch
def freeze(self:Learner): self.freeze_to(-1)

print_source(Learner.freeze_to)
@patch
def freeze_to(self:Learner, n):
    if self.opt is None: self.create_opt()
    self.opt.freeze_to(n)
    self.opt.clear_state()

learn.fit_one_cycle(4, 3e-3)
| epoch | train_loss | valid_loss | accuracy | time |
| --- | --- | --- | --- | --- |
| 0 | 0.530529 | 0.281408 | 0.887686 | 00:29 |
| 1 | 0.377506 | 0.224826 | 0.912043 | 00:29 |
| 2 | 0.276916 | 0.195273 | 0.928958 | 00:29 |
| 3 | 0.242797 | 0.170715 | 0.933018 | 00:29 |

learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-6,1e-4))
| epoch | train_loss | valid_loss | accuracy | time |
| --- | --- | --- | --- | --- |
| 0 | 0.249987 | 0.160208 | 0.939784 | 00:38 |
| 1 | 0.236774 | 0.157880 | 0.941137 | 00:38 |
| 2 | 0.222469 | 0.151024 | 0.945196 | 00:38 |
| 3 | 0.218679 | 0.160581 | 0.939107 | 00:38 |

Natural Language Processing

  • We can convert an AWD-LSTM language model into a transfer learning classifier by using its stacked RNNs as the encoder
  • Universal Language Model Fine-tuning for Text Classification
    • divide the document into fixed-length batches of size b
    • the model is initialized at the beginning of each batch with the final state of the previous batch
    • keep track of the hidden states for mean and max-pooling
    • gradients are backpropagated to the batches whose hidden states contributed to the final prediction
      • use variable-length backpropagation through time (BPTT) sequences
  • the classifier contains a for-loop, which loops over each batch of a sequence
    • need to gather data in batches
    • each text needs to be treated separately as they have their own labels
    • it is likely the texts will not all be the same length
      • we won’t be able to put them all in the same array
      • need to use padding
        • when grabbing a bunch of texts, determine which one has the greatest length
        • fill the ones that are shorter with the special character xxpad.
        • make sure texts of similar sizes are put together to minimize excess padding (see the sketch after this list)
  • the state is maintained across batches
  • the activations of each batch are stored
  • at the end, we use the same average and max concatenated pooling trick used for computer vision models
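
A minimal sketch of the padding idea (hypothetical token ids; index 1 is xxpad in fastai's default vocabulary):

texts = [tensor([5, 8, 2]), tensor([7, 1, 3, 9, 4]), tensor([6, 2])]  # varying lengths
texts.sort(key=len, reverse=True)  # keep similarly sized texts together
pad_id, max_len = 1, len(texts[0])
# Pad each text at the end so every row matches the longest one
batch = torch.stack([F.pad(t, (0, max_len - len(t)), value=pad_id) for t in texts])
batch  # tensor([[7, 1, 3, 9, 4], [5, 8, 2, 1, 1], [6, 2, 1, 1, 1]])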

Tabular

from fastai.tabular.all import *

TabularModel
fastai.tabular.model.TabularModel

print_source(TabularModel)
class TabularModel(Module):
    "Basic model for tabular data."
    def __init__(self, emb_szs, n_cont, out_sz, layers, ps=None, embed_p=0.,
                 y_range=None, use_bn=True, bn_final=False, bn_cont=True, act_cls=nn.ReLU(inplace=True),
                 lin_first=True):
        ps = ifnone(ps, [0]*len(layers))
        if not is_listy(ps): ps = [ps]*len(layers)
        self.embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
        self.emb_drop = nn.Dropout(embed_p)
        self.bn_cont = nn.BatchNorm1d(n_cont) if bn_cont else None
        n_emb = sum(e.embedding_dim for e in self.embeds)
        self.n_emb,self.n_cont = n_emb,n_cont
        sizes = [n_emb + n_cont] + layers + [out_sz]
        actns = [act_cls for _ in range(len(sizes)-2)] + [None]
        _layers = [LinBnDrop(sizes[i], sizes[i+1], bn=use_bn and (i!=len(actns)-1 or bn_final), p=p, act=a, lin_first=lin_first)
                       for i,(p,a) in enumerate(zip(ps+[0.],actns))]
        if y_range is not None: _layers.append(SigmoidRange(*y_range))
        self.layers = nn.Sequential(*_layers)

    def forward(self, x_cat, x_cont=None):
        if self.n_emb != 0:
            x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        if self.n_cont != 0:
            if self.bn_cont is not None: x_cont = self.bn_cont(x_cont)
            x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
        return self.layers(x)

def forward(self, x_cat, x_cont=None):
    # Check if there are any embeddings to deal with
    if self.n_emb != 0:
        # Get the activations of each embedding matrix
        x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
        # Concatenate embeddings to a single tensor
        x = torch.cat(x, 1)
        # Apply dropout
        x = self.emb_drop(x)
    # Check if there are any continuous variables to deal with 
    if self.n_cont != 0:
        # Pass continuous variables through batch normalization layer
        if self.bn_cont is not None: x_cont = self.bn_cont(x_cont)
        # Concatenate continuous variables with embedding activations
        x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
    # Pass concatenated input through linear layers
    return self.layers(x)
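
A minimal usage sketch with hypothetical sizes: two categorical variables with embedding sizes (10, 6) and (7, 4), three continuous variables, and two output classes.

model = TabularModel(emb_szs=[(10,6), (7,4)], n_cont=3, out_sz=2, layers=[200,100])
x_cat = torch.randint(0, 7, (64, 2))  # batch of categorical codes
x_cont = torch.randn(64, 3)           # batch of continuous values
model(x_cat, x_cont).shape            # torch.Size([64, 2])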

Conclusion

  • deep learning can be challenging because your data, memory, and time are typically limited
  • train a smaller model when memory is limited
  • if you are not able to overfit your model to your data, you are not taking advantage of the capacity of your model
  • You should first get to a point where you can overfit
  • Steps to reduce overfitting, in order of priority:
    1. More data
       • add more labels to data you already have
       • find additional tasks your model could be asked to solve
       • create additional synthetic data by using more or different augmentation techniques
    2. Data augmentation
       • Mixup
    3. Generalizable architecture
       • Add batch normalization
    4. Regularization
       • Adding dropout to the last layer or two is often sufficient
       • Adding dropout of different types throughout your model can help even more
    5. Reduce architecture complexity
       • Should be the last thing you try

References

  • Howard, Jeremy, and Sylvain Gugger. Deep Learning for Coders with fastai & PyTorch. O'Reilly Media, 2020.
  • Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." 2015.
  • Howard, Jeremy, and Sebastian Ruder. "Universal Language Model Fine-tuning for Text Classification." 2018.

Previous: Notes on fastai Book Ch. 14

Next: Notes on fastai Book Ch. 16