Data

path = untar_data(URLs.IMAGEWOOF_320)

lbl_dict = dict(
  n02086240= 'Shih-Tzu',
  n02087394= 'Rhodesian ridgeback',
  n02088364= 'Beagle',
  n02089973= 'English foxhound',
  n02093754= 'Australian terrier',
  n02096294= 'Border terrier',
  n02099601= 'Golden retriever',
  n02105641= 'Old English sheepdog',
  n02111889= 'Samoyed',
  n02115641= 'Dingo'
)

dblock = DataBlock(blocks=(ImageBlock,CategoryBlock),
                   get_items=get_image_files,
                   splitter=GrandparentSplitter(valid_name='val'),
                   get_y=Pipeline([parent_label,lbl_dict.__getitem__]),
                   item_tfms=Resize(320),
                   batch_tfms=[*aug_transforms(size=224),Normalize.from_stats(*imagenet_stats)])
dls = dblock.dataloaders(path,bs=128)
dls.show_batch()

Learner

You can find the model weights here. It's a resnet34, trained for 10 epochs as shown below, reaching around 95% accuracy

exp_name='resnet34'
save_model = SaveModelCallback(monitor='error_rate',fname=exp_name)
learn = cnn_learner(dls,resnet34,metrics=error_rate,model_dir='/content/models',opt_func=ranger)
# learn.load(exp_name)
learn.fit_flat_cos(5,lr=1e-3,cbs=save_model)
learn.unfreeze()
learn.fit_flat_cos(5,lr=1e-4,cbs=save_model)
Frozen (fit_flat_cos, 5 epochs, lr=1e-3):

epoch  train_loss  valid_loss  error_rate  time
0      1.220760    0.242801    0.074828    01:26
1      0.489359    0.185332    0.057266    01:22
2      0.302766    0.175212    0.049885    01:22
3      0.231190    0.164808    0.049885    01:22
4      0.189565    0.165088    0.051413    01:22

After unfreezing (fit_flat_cos, 5 epochs, lr=1e-4):

epoch  train_loss  valid_loss  error_rate  time
0      0.198867    0.171080    0.052176    01:49
1      0.180592    0.181552    0.050904    01:48
2      0.155455    0.176730    0.050649    01:48
3      0.123675    0.188765    0.053703    01:48
4      0.100803    0.179967    0.054212    01:48

Grad-CAM

  1. Take the gradient of the score for class c, $y^c$, with respect to the feature map activations $A^k$, and global-average-pool it over the width and height dimensions

$$\alpha^c_k = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A^k_{ij}}$$

  2. Take a weighted combination of the forward activation maps, followed by a ReLU $$L^c_{\text{Grad-CAM}} = ReLU\left(\sum_k \alpha^c_k A^k\right)$$
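As a quick sanity check of these two formulas, here's a minimal standalone sketch, using dummy tensors in place of the real activations and gradients (which we will extract from the trained model with hooks below):

import torch
import torch.nn.functional as F

# dummy stand-ins for A^k and dy^c/dA^k of a ResNet's last conv block (shape [K, H, W])
acts  = torch.randn(512, 7, 7)
grads = torch.randn(512, 7, 7)

alpha = grads.mean(dim=(1, 2), keepdim=True)   # alpha^c_k: global-average-pool over i, j -> [512, 1, 1]
gcam  = F.relu((alpha * acts).sum(dim=0))      # L^c_Grad-CAM: weighted sum over k, then ReLU -> [7, 7]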
m = learn.model.eval()

We don't want to permanently change the shuffle behaviour of the original valid dataloader, so we turn shuffling on just to grab a random batch and then switch it back off

valid_dl = dls.valid
valid_dl.shuffle=True
xb,yb = valid_dl.one_batch()
dls.show_batch((xb,yb))
valid_dl.shuffle=False
idx=3
xb,yb = xb[idx][None],yb[idx][None]
x_dec,y_dec = valid_dl.decode_batch((xb,yb))[0]
show_image(x_dec,title=y_dec,figsize=(5,5));
hook,hook_g = hook_output(m[0]), hook_output(m[0],grad=True)
m.zero_grad()
preds = m(xb)
preds[0,preds.argmax().item()].backward(retain_graph=True)
dls.vocab[preds.argmax().item()]
'Samoyed'
hook.stored[0].shape, hook_g.stored[0].shape
(torch.Size([512, 7, 7]), torch.Size([1, 512, 7, 7]))
acts, grads = hook.stored[0], hook_g.stored[0][0]
alpha = grads.mean((1,2),keepdim=True); alpha.shape
torch.Size([512, 1, 1])
gcam = F.relu((alpha * acts).sum(0)); gcam.shape
torch.Size([7, 7])

Putting it together

def generate_gradcam(model,xb,yb=None,layer_idx:list=[0],with_pred=False):
  """Compute Grad-CAM for a given image
    `xb,yb`: input batch 
    `layer_idx`: list of indices to reach the target layer
  """
  m = model.eval()
  hook_layer = get_module(m,layer_idx)
  with hook_output(hook_layer,grad=True) as hook_g:
    with hook_output(hook_layer) as hook:      
      m.zero_grad()
      y_pred = m(xb)
      
      # backprop from the target class score (predicted class if no label is given)
      if yb is None:
        y = y_pred.argmax().item()
      else: y = yb.item()

      y_pred[0,y].backward(retain_graph=True)
      acts = hook.stored[0]              # activations A^k of the hooked layer
    grads = hook_g.stored[0]             # gradients of the class score w.r.t. A^k
    alpha = grads.mean((2,3))            # global-average-pool the gradients -> weights alpha^c_k
    gcam = F.relu(torch.einsum('ab,bcd->acd',alpha,acts))[0]  # weighted sum of maps + ReLU
    if with_pred: return gcam,y
    return gcam

show_heatmap[source]

show_heatmap(cam, sz, ax=None, alpha=0.6, interpolation='bilinear', cmap='magma')
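The implementation isn't reproduced in this export; a minimal sketch of what such a helper could look like (assuming cam is a small 2-D tensor and sz is the display size in pixels; the details are illustrative, not the actual source) is:

import matplotlib.pyplot as plt

def show_heatmap(cam, sz, ax=None, alpha=0.6, interpolation='bilinear', cmap='magma'):
  "Illustrative sketch: overlay a low-resolution CAM on an axis, stretched to sz x sz"
  if ax is None: _,ax = plt.subplots()
  # `extent` stretches the small map over the full image area; imshow interpolates it on the fly
  ax.imshow(cam.detach().cpu(), alpha=alpha, extent=(0,sz,sz,0),
            interpolation=interpolation, cmap=cmap)
  return ax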

@delegates(show_heatmap)
def show_gradcam(dl:DataLoader,xb,yb,gcam,sz=224,merge=True,**kwargs):
  x_dec,y_dec = dl.decode_batch((xb,yb))[0]
  imsize = 5 if merge else 7
  _,axs = subplots(1,1 if merge else 2,figsize=(imsize,imsize))
  show_image(x_dec,ax=axs[0],title=y_dec)
  alpha= 0.6 if merge else 1.
  show_heatmap(gcam,sz=sz,ax=axs[int(not merge)],alpha=alpha,**kwargs)
gcam = generate_gradcam(m,xb,yb)
show_gradcam(valid_dl,xb,yb,gcam)
show_gradcam(valid_dl,xb,yb,gcam,merge=False,interpolation='spline36')

Custom test image

fastai2-specific steps have been described in the CAM notebook, so we borrow the same code here

url = 'https://t2conline.com/wp-content/uploads/2020/01/shutterstock_1124417876.jpg'
fname = 'test-shih-tzu.jpg'
download_url(url,dest=fname)
img = PILImage.create(fname); img.show(figsize=(5,5));

create_batch[source]

create_batch(dls:DataLoaders, fname, lbl_idx, size=None, method=None)

Create a test batch from a filename and a label index. Refer to dls.vocab to find the label index. In case you want to resize the image, use the size and method parameters. The default crop method is set to 'squish' considering the use-case
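The body sits behind the [source] link; a rough sketch of an equivalent helper (assuming it builds on dls.test_dl so the usual item/batch transforms and normalization get applied, and glossing over the size/method override) could be:

import torch
# assumes `from fastai.vision.all import *` as in the rest of the notebook

def create_batch(dls, fname, lbl_idx, size=None, method=None):
  "Illustrative sketch: build a one-image batch and a label tensor for Grad-CAM"
  # test_dl re-uses the transforms of `dls` (Resize(320), aug size 224, Normalize);
  # in the real helper, `size`/`method` let you override the resize step
  test_dl = dls.test_dl([fname])
  xb = test_dl.one_batch()[0]
  yb = torch.tensor([lbl_idx]).to(xb.device)
  return xb, yb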

for idx,label in enumerate(dls.vocab):
  print(f'{idx:<2} : {label}')
0  : Australian terrier
1  : Beagle
2  : Border terrier
3  : Dingo
4  : English foxhound
5  : Golden retriever
6  : Old English sheepdog
7  : Rhodesian ridgeback
8  : Samoyed
9  : Shih-Tzu
xb,yb = create_batch(dls,fname,9)
gcam = generate_gradcam(learn.model,xb,yb)
show_gradcam(valid_dl,xb,yb,gcam)
show_gradcam(valid_dl,xb,yb,gcam,merge=False,interpolation='spline36')

The novelty of Grad-CAM is that we can look at the activations of any layer. Let's first take a brief look at the architecture and then decide which layer to visualize

arch_summary(learn.model,verbose=True)
[0 ] Sequential       : 8   layers
      Conv2d
      BatchNorm2d
      ReLU
      MaxPool2d
      Sequential
      Sequential
      Sequential
      Sequential
[1 ] Sequential       : 9   layers
      AdaptiveConcatPool2d
      Flatten
      BatchNorm1d
      Dropout
      Linear
      ReLU
      BatchNorm1d
      Dropout
      Linear

With m[0], we just looked at the activations of the first sequential block. Let's now visualize the second-to-last block of that first sequential block

gcam = generate_gradcam(learn.model,xb,yb,layer_idx=[0,-2])
show_gradcam(valid_dl,xb,yb,gcam)

Seems like the model was not yet sure at this layer, but in the very next layer it was pretty confident about where the subject is (as shown above)!

Let's try one more. This time, "Australian Terrier"

url2 = "http://www.pets4homes.co.uk/images/breeds/197/large/0cdc3b81526ed5c3fa2cdf13b2d1cc36.jpg"
fname2 = 'test-australian-terrier.jpg'
download_url(url2,dest=fname2,overwrite=True)
img_2 = PILImage.create(fname2); img_2.show(figsize=(5,5));
xb,yb = create_batch(dls,fname2,0)
gcam,y_pred = generate_gradcam(learn.model,xb,yb,with_pred=True)
print(f"Prediction: {dls.vocab[y_pred]}")
show_gradcam(valid_dl,xb,yb,gcam)
Prediction: Australian terrier

Interesting! The model was able to classify it correctly, but it considered features that we normally wouldn't have thought of. In many examples I've seen, those mane hairs are particularly important for the "Australian Terrier" class. I'm not an expert in identifying dog breeds, though; these are just my observations

show_gradcam(valid_dl,xb,yb,gcam,merge=False,interpolation='spline36')

GuidedBackprop

Reference implementations:

There are certain naming conventions followed for the hooks:

  1. Forward Hook →

    • input : current layer's input
    • output : current layer's output
  2. Backward Hook ←

    • grad_in : gradient of the loss wrt layer's input (passed back to the previous layer)
    • grad_out : gradient of the loss wrt layer's output (received from the next layer)
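To make these conventions concrete, here is a tiny standalone PyTorch example (unrelated to the model above) that registers both kinds of hooks on a single ReLU and prints the shapes it receives:

import torch
import torch.nn as nn

relu = nn.ReLU()

def fwd_hook(module, input, output):
  # forward hook: `input` is the layer's input (a tuple), `output` its output
  print('forward :', input[0].shape, output.shape)

def bwd_hook(module, grad_in, grad_out):
  # backward hook: `grad_out` arrives from the next layer, `grad_in` is passed back to the previous one
  print('backward:', grad_in[0].shape, grad_out[0].shape)

relu.register_forward_hook(fwd_hook)
relu.register_backward_hook(bwd_hook)   # `register_full_backward_hook` in newer PyTorch

x = torch.randn(1, 3, requires_grad=True)
relu(x).sum().backward()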
import gc

class GuidedBackprop[source]

GuidedBackprop(model, act_cls:Module='ReLU')

Produces gradients generated with guided back propagation from the given image
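The source isn't reproduced here; a rough sketch of how such a class is typically implemented (the usual hook-based approach: clamp negative gradients at every activation during the backward pass, with nn.ReLU assumed as the default act_cls) might look like:

import torch
import torch.nn as nn

class GuidedBackprop:
  "Illustrative sketch, not the library source"
  def __init__(self, model, act_cls=nn.ReLU):
    self.model,self.act_cls = model.eval(),act_cls

  def _clamp_grad(self, module, grad_in, grad_out):
    # guided backprop rule: only let positive gradients flow back through the activation
    return (torch.clamp(grad_in[0], min=0.),)

  def guided_backprop(self, xb, yb):
    # hook every activation layer; `register_full_backward_hook` in newer PyTorch
    handles = [m.register_backward_hook(self._clamp_grad)
               for m in self.model.modules() if isinstance(m, self.act_cls)]
    try:
      xb = xb.clone().requires_grad_(True)
      self.model.zero_grad()
      preds = self.model(xb)
      preds[0, yb.item()].backward()
      return xb.grad[0].detach().cpu()   # gradients of the class score w.r.t. the input image
    finally:
      for h in handles: h.remove()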

gbviz = GuidedBackprop(learn.model)
gbprop = gbviz.guided_backprop(xb,yb)
show_image(min_max_scale(gbprop),figsize=(5,5));

to_grayscale[source]

to_grayscale(im_tensor)
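The body isn't shown; a plausible sketch (assuming im_tensor is the CxHxW gradient map returned by guided_backprop) is to collapse the channels by magnitude and rescale to [0, 1]:

def to_grayscale(im_tensor):
  "Illustrative sketch: CxHxW gradient map -> 1xHxW grayscale saliency map"
  gray = im_tensor.abs().sum(dim=0, keepdim=True)                  # magnitude across channels
  return (gray - gray.min()) / (gray.max() - gray.min() + 1e-8)    # min-max scale to [0, 1]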

show_image(to_grayscale(gbprop),figsize=(5,5),cmap='gray')

Guided backprop is looking at the dog's face, but as we observed, there's something more important than the face in the case of "Australian Terrier", which Grad-CAM was able to discriminate. Let's fuse both of them

gcam_up = F.interpolate(gcam[None,None],size=(224,224),mode='bilinear',align_corners=True)[0]
guided_gcam = gcam_up * gbprop
show_image(min_max_scale(guided_gcam),figsize=(5,5));

There you go! We got the class-discriminative features. Of course, this was just experimental; I'll come up with fine-grained classification examples some other day. Stay tuned!