Notes on fastai Book Ch. 15
Tags: ai, fastai, notes, pytorch
Chapter 15 provides a deep dive into the application architectures used in the fastai library.
This post covers the following topics:
- Application Architectures Deep Dive
- Computer Vision
- Natural Language Processing
- Tabular
- Conclusion
- References
#hide
# !pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

#hide
from fastbook import *
import inspect

# Helper to print the source code of a function or class
def print_source(obj):
    for line in inspect.getsource(obj).split("\n"):
        print(line)
Application Architectures Deep Dive
Computer Vision
cnn_learner
Transfer Learning
- the head (the final layers) of the pretrained model needs to be cut off and replaced
- fastai stores where to cut the included pretrained models in the model_meta dictionary
Head
- the part that is specialized for a particular task
- generally the part after the adaptive average pooling layer
Body
- everything other than the head
- includes the stem
pd.DataFrame(model_meta)

Every pretrained model uses the same ImageNet normalization stats; the cut points and split functions group by architecture family (output condensed from one column per model):

models | cut | split | stats |
---|---|---|---|
xresnet18/34/50/101/152 | -4 | _xresnet_split | ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) |
resnet18/34/50/101/152 | -2 | _resnet_split | ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) |
squeezenet1_0/1_1 | -1 | _squeezenet_split | ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) |
densenet121/169/201/161 | -1 | _densenet_split | ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) |
vgg11_bn/13_bn/16_bn/19_bn | -2 | _vgg_split | ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) |
alexnet | -2 | _alexnet_split | ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) |
model_meta[resnet50]
{'cut': -2,
'split': <function fastai.vision.learner._resnet_split(m)>,
'stats': ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])}
print_source(model_meta[resnet50]['split'])
def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)
create_head(20, 2)
Sequential(
(0): AdaptiveConcatPool2d(
(ap): AdaptiveAvgPool2d(output_size=1)
(mp): AdaptiveMaxPool2d(output_size=1)
)
(1): Flatten(full=False)
(2): BatchNorm1d(40, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): Dropout(p=0.25, inplace=False)
(4): Linear(in_features=40, out_features=512, bias=False)
(5): ReLU(inplace=True)
(6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): Dropout(p=0.5, inplace=False)
(8): Linear(in_features=512, out_features=2, bias=False)
)
Note: fastai adds two linear layers by default for transfer learning
- using just one linear layer is unlikely to be enough when transferring a pretrained model to a very different domain
create_head
<function fastai.vision.learner.create_head(nf, n_out, lin_ftrs=None, ps=0.5, concat_pool=True, first_bn=True, bn_final=False, lin_first=False, y_range=None)>
print_source(create_head)
def create_head(nf, n_out, lin_ftrs=None, ps=0.5, concat_pool=True, first_bn=True, bn_final=False,
                lin_first=False, y_range=None):
    "Model head that takes `nf` features, runs through `lin_ftrs`, and out `n_out` classes."
    if concat_pool: nf *= 2
    lin_ftrs = [nf, 512, n_out] if lin_ftrs is None else [nf] + lin_ftrs + [n_out]
    bns = [first_bn] + [True]*len(lin_ftrs[1:])
    ps = L(ps)
    if len(ps) == 1: ps = [ps[0]/2] * (len(lin_ftrs)-2) + ps
    actns = [nn.ReLU(inplace=True)] * (len(lin_ftrs)-2) + [None]
    pool = AdaptiveConcatPool2d() if concat_pool else nn.AdaptiveAvgPool2d(1)
    layers = [pool, Flatten()]
    if lin_first: layers.append(nn.Dropout(ps.pop(0)))
    for ni,no,bn,p,actn in zip(lin_ftrs[:-1], lin_ftrs[1:], bns, ps, actns):
        layers += LinBnDrop(ni, no, bn=bn, p=p, act=actn, lin_first=lin_first)
    if lin_first: layers.append(nn.Linear(lin_ftrs[-2], n_out))
    if bn_final: layers.append(nn.BatchNorm1d(lin_ftrs[-1], momentum=0.01))
    if y_range is not None: layers.append(SigmoidRange(*y_range))
    return nn.Sequential(*layers)
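The lin_ftrs and ps arguments let you customize the hidden layer sizes and dropout probabilities. A quick sketch using the signature above (the sizes here are arbitrary):

# One hidden layer of 256 units instead of the default 512, with custom
# dropout probabilities for the hidden and final linear layers
create_head(20, 2, lin_ftrs=[256], ps=[0.1, 0.2])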
One Last Batchnorm
- bn_final: setting this to True will cause a batchnorm layer to be added as the final layer
- can be useful in helping your model scale appropriately for your output activations
AdaptiveConcatPool2d
fastai.layers.AdaptiveConcatPool2d
print_source(AdaptiveConcatPool2d)
class AdaptiveConcatPool2d(Module):
    "Layer that concats `AdaptiveAvgPool2d` and `AdaptiveMaxPool2d`"
    def __init__(self, size=None):
        self.size = size or 1
        self.ap = nn.AdaptiveAvgPool2d(self.size)
        self.mp = nn.AdaptiveMaxPool2d(self.size)
    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
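Because it concatenates the average-pooled and max-pooled features, the layer doubles the channel count. A quick sketch (the input shape here is arbitrary):

pool = AdaptiveConcatPool2d()
x = torch.randn(4, 512, 7, 7)   # batch of 4 feature maps with 512 channels
pool(x).shape                   # -> torch.Size([4, 1024, 1, 1]): channels doubled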
unet_learner
- used for generative vision models
- uses a custom head that progressively increases the dimensions back to those of the source image
- can use nearest-neighbor interpolation
- can use a transposed convolution
- zero padding is inserted between all the pixels in the input before performing a convolution
- also known as a stride-half convolution
- fastai: ConvLayer(transpose=True) (see the sketch below)
- U-Nets use skip connections to pass information from the encoding layers in the body to the decoding layers in the head
- U-Net: Convolutional Networks for Biomedical Image Segmentation
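A minimal sketch of the transposed-convolution upsampling step (the channel and spatial sizes are arbitrary; ks=2 with stride=2 gives an exact doubling of the resolution):

import torch
from fastai.vision.all import ConvLayer

x = torch.randn(1, 64, 16, 16)
# Transposed (stride-half) convolution: doubles the spatial resolution
up = ConvLayer(64, 32, ks=2, stride=2, transpose=True)
up(x).shape   # -> torch.Size([1, 32, 32, 32])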
Tasks
- segmentation
- super resolution
- colorization
- style transfer
A Siamese Network
#hide
from fastai.vision.all import *
path = untar_data(URLs.PETS)
path
Path('/home/innom-dt/.fastai/data/oxford-iiit-pet')
files = get_image_files(path/"images")
files
(#7390) [Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Birman_121.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/shiba_inu_131.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Bombay_176.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Bengal_199.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/beagle_41.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/beagle_27.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/great_pyrenees_181.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/Bengal_100.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/keeshond_124.jpg'),Path('/home/innom-dt/.fastai/data/oxford-iiit-pet/images/havanese_115.jpg')...]
# Custom type to allow us to show siamese image pairs
# Tracks whether the two images belong to the same class
class SiameseImage(fastuple):
    def show(self, ctx=None, **kwargs):
        img1,img2,same_breed = self
        if not isinstance(img1, Tensor):
            if img2.size != img1.size: img2 = img2.resize(img1.size)
            t1,t2 = tensor(img1),tensor(img2)
            t1,t2 = t1.permute(2,0,1),t2.permute(2,0,1)
        else: t1,t2 = img1,img2
        line = t1.new_zeros(t1.shape[0], t1.shape[1], 10)
        return show_image(torch.cat([t1,line,t2], dim=2),
                          title=same_breed, ctx=ctx)
def label_func(fname):
    return re.match(r'^(.*)_\d+.jpg$', fname.name).groups()[0]
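For example, the label function extracts the breed from one of the filenames listed above:

label_func(Path('great_pyrenees_181.jpg'))   # -> 'great_pyrenees'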
class SiameseTransform(Transform):
    def __init__(self, files, label_func, splits):
        # Generate list of unique labels
        self.labels = files.map(label_func).unique()
        # Create a dictionary to match labels to filenames
        self.lbl2files = {l: L(f for f in files if label_func(f) == l)
                          for l in self.labels}
        self.label_func = label_func
        # Draw the validation pairs once so they stay fixed across epochs
        self.valid = {f: self._draw(f) for f in files[splits[1]]}
    def encodes(self, f):
        f2,t = self.valid.get(f, self._draw(f))
        img1,img2 = PILImage.create(f),PILImage.create(f2)
        # Create siamese image pair
        return SiameseImage(img1, img2, t)
    def _draw(self, f):
        # 50/50 chance of generating a pair of the same class
        same = random.random() < 0.5
        cls = self.label_func(f)
        if not same:
            cls = random.choice(L(l for l in self.labels if l != cls))
        return random.choice(self.lbl2files[cls]),same
splits = RandomSplitter()(files)
tfm = SiameseTransform(files, label_func, splits)
tls = TfmdLists(files, tfm, splits=splits)
dls = tls.dataloaders(after_item=[Resize(224), ToTensor],
                      after_batch=[IntToFloatTensor, Normalize.from_stats(*imagenet_stats)])
class SiameseModel(Module):
    def __init__(self, encoder, head):
        self.encoder,self.head = encoder,head
    def forward(self, x1, x2):
        # Run both images through the shared encoder, then concatenate the features
        ftrs = torch.cat([self.encoder(x1), self.encoder(x2)], dim=1)
        return self.head(ftrs)
encoder = create_body(resnet34, cut=-2)
create_body
<function fastai.vision.learner.create_body(arch, n_in=3, pretrained=True, cut=None)>
print_source(create_body)
def create_body(arch, n_in=3, pretrained=True, cut=None):
    "Cut off the body of a typically pretrained `arch` as determined by `cut`"
    model = arch(pretrained=pretrained)
    _update_first_layer(model, n_in, pretrained)
    #cut = ifnone(cut, cnn_config(arch)['cut'])
    if cut is None:
        ll = list(enumerate(model.children()))
        cut = next(i for i,o in reversed(ll) if has_pool_type(o))
    if isinstance(cut, int): return nn.Sequential(*list(model.children())[:cut])
    elif callable(cut): return cut(model)
    else: raise NameError("cut must be either integer or a function")
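Since cut can also be a callable, an equivalent to cut=-2 is passing a function that slices the model directly (a hypothetical sketch based on the source above, with the notebook's imports in scope):

encoder = create_body(resnet34, cut=lambda m: nn.Sequential(*list(m.children())[:-2]))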
head = create_head(512*2, 2, ps=0.5)
head
Sequential(
(0): AdaptiveConcatPool2d(
(ap): AdaptiveAvgPool2d(output_size=1)
(mp): AdaptiveMaxPool2d(output_size=1)
)
(1): Flatten(full=False)
(2): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): Dropout(p=0.25, inplace=False)
(4): Linear(in_features=2048, out_features=512, bias=False)
(5): ReLU(inplace=True)
(6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(7): Dropout(p=0.5, inplace=False)
(8): Linear(in_features=512, out_features=2, bias=False)
)
model = SiameseModel(encoder, head)
def loss_func(out, targ):
    # CrossEntropyLoss expects integer class targets, so cast the boolean labels
    return nn.CrossEntropyLoss()(out, targ.long())
# Tell fastai how to split the model into parameter groups
def siamese_splitter(model):
    return [params(model.encoder), params(model.head)]
learn = Learner(dls, model, loss_func=loss_func,
                splitter=siamese_splitter, metrics=accuracy)
learn.freeze()
Learner.freeze
<function fastai.learner.Learner.freeze(self: fastai.learner.Learner)>
print_source(Learner.freeze)
@patch
def freeze(self:Learner): self.freeze_to(-1)
print_source(Learner.freeze_to)
@patch
def freeze_to(self:Learner, n):
    if self.opt is None: self.create_opt()
    self.opt.freeze_to(n)
    self.opt.clear_state()
learn.fit_one_cycle(4, 3e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.530529 | 0.281408 | 0.887686 | 00:29 |
1 | 0.377506 | 0.224826 | 0.912043 | 00:29 |
2 | 0.276916 | 0.195273 | 0.928958 | 00:29 |
3 | 0.242797 | 0.170715 | 0.933018 | 00:29 |
learn.unfreeze()
learn.fit_one_cycle(4, slice(1e-6,1e-4))
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.249987 | 0.160208 | 0.939784 | 00:38 |
1 | 0.236774 | 0.157880 | 0.941137 | 00:38 |
2 | 0.222469 | 0.151024 | 0.945196 | 00:38 |
3 | 0.218679 | 0.160581 | 0.939107 | 00:38 |
Natural Language Processing
- We can convert an AWD-LSTM language model into a transfer learning classifier by using the stacked RNN as the encoder
- Universal Language Model Fine-tuning for Text Classification
- divide the document into fixed-length batches of size b
- the model is initialized at the beginning of each batch with the final state of the previous batch
- keep track of the hidden states for mean and max-pooling
- gradients are backpropagated to the batches whose hidden states contributed to the final prediction
- uses variable-length backpropagation sequences
- the classifier contains a for-loop, which loops over each batch of a sequence
- need to gather data in batches
- each text needs to be treated separately as they have their own labels
- it is likely the texts will not all be the same length
- we won’t be able to put them all in the same array
- need to use padding
- when grabbing a bunch of texts, determine which one has the greatest length
- fill the ones that are shorter with the special padding token xxpad
- make sure texts of similar sizes are put together to minimize excess padding (see the sketch after this list)
- the state is maintained across batches
- the activations of each batch are stored
- at the end, we use the same average and max concatenated pooling trick used for computer vision models
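A minimal sketch of this batching strategy in plain PyTorch (the token indices are made up, and index 1 stands in for xxpad; fastai's text DataLoaders handle padding and length-sorting automatically):

import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical numericalized texts of different lengths
texts = [torch.tensor([5, 8, 2]), torch.tensor([7, 3]), torch.tensor([9, 4, 6, 2])]
# Sort by length so similarly sized texts end up in the same batch
texts.sort(key=len, reverse=True)
batch = pad_sequence(texts, batch_first=True, padding_value=1)
batch.shape   # -> torch.Size([3, 4]); shorter rows are filled with the pad index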
Tabular
from fastai.tabular.all import *
TabularModel
fastai.tabular.model.TabularModel
print_source(TabularModel)
class TabularModel(Module):
    "Basic model for tabular data."
    def __init__(self, emb_szs, n_cont, out_sz, layers, ps=None, embed_p=0.,
                 y_range=None, use_bn=True, bn_final=False, bn_cont=True, act_cls=nn.ReLU(inplace=True),
                 lin_first=True):
        ps = ifnone(ps, [0]*len(layers))
        if not is_listy(ps): ps = [ps]*len(layers)
        self.embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
        self.emb_drop = nn.Dropout(embed_p)
        self.bn_cont = nn.BatchNorm1d(n_cont) if bn_cont else None
        n_emb = sum(e.embedding_dim for e in self.embeds)
        self.n_emb,self.n_cont = n_emb,n_cont
        sizes = [n_emb + n_cont] + layers + [out_sz]
        actns = [act_cls for _ in range(len(sizes)-2)] + [None]
        _layers = [LinBnDrop(sizes[i], sizes[i+1], bn=use_bn and (i!=len(actns)-1 or bn_final), p=p, act=a, lin_first=lin_first)
                   for i,(p,a) in enumerate(zip(ps+[0.],actns))]
        if y_range is not None: _layers.append(SigmoidRange(*y_range))
        self.layers = nn.Sequential(*_layers)

    def forward(self, x_cat, x_cont=None):
        if self.n_emb != 0:
            x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        if self.n_cont != 0:
            if self.bn_cont is not None: x_cont = self.bn_cont(x_cont)
            x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
        return self.layers(x)
The forward method again, annotated:

def forward(self, x_cat, x_cont=None):
    # Check if there are any embeddings to deal with
    if self.n_emb != 0:
        # Get the activations of each embedding matrix
        x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
        # Concatenate embeddings to a single tensor
        x = torch.cat(x, 1)
        # Apply dropout
        x = self.emb_drop(x)
    # Check if there are any continuous variables to deal with
    if self.n_cont != 0:
        # Pass continuous variables through batch normalization layer
        if self.bn_cont is not None: x_cont = self.bn_cont(x_cont)
        # Concatenate continuous variables with embedding activations
        x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
    # Pass concatenated input through linear layers
    return self.layers(x)
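A quick sketch of how the pieces fit together (all sizes here are made up): two categorical columns with embedding sizes (10, 5) and (8, 4), three continuous columns, two output classes, and hidden layers of 200 and 100 units.

model = TabularModel(emb_szs=[(10, 5), (8, 4)], n_cont=3, out_sz=2, layers=[200, 100])
x_cat = torch.randint(0, 8, (16, 2))   # batch of 16 rows, 2 categorical columns
x_cont = torch.randn(16, 3)            # 3 continuous columns
model(x_cat, x_cont).shape             # -> torch.Size([16, 2])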
Conclusion
- deep learning can be challenging because your data, memory, and time are typically limited
- train a smaller model when memory is limited
- if you are not able to overfit your model to your data, you are not taking advantage of the capacity of your model
- You should first get to a point where you can overfit
- Steps to reduce overfitting in order of priority
- More data
- add more labels to data you already have
- find additional tasks your model could be asked to solve
- create additional synthetic data by using more or different augmentation techniques
- Data augmentation
- Mixup (see the sketch after this list)
- Generalizable architecture
- Add batch normalization
- Regularization
- Adding dropout to the last layer or two is often sufficient
- Adding dropout of different types throughout your model can help even more
- Reduce architecture complexity
- Should be the last thing you try
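A minimal sketch of adding Mixup when building a vision learner (assuming a standard fastai setup with an existing dls):

# MixUp is added as a training callback; the learner is otherwise unchanged
learn = cnn_learner(dls, resnet34, metrics=accuracy, cbs=MixUp())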
References

- Deep Learning for Coders with fastai & PyTorch (Howard and Gugger), Chapter 15
- U-Net: Convolutional Networks for Biomedical Image Segmentation
- Universal Language Model Fine-tuning for Text Classification