model specifications not coherent with the MLB paper #18

backpropper · 2017-10-27T00:03:25Z

The model configuration is not the same as described in the paper. There is a softmax layer missing at the end of the model. The paper concatenates the attention * vision features for all the glimpses and then pass it through a single linear layer. You use non-linearity both times before and after fusion.

Cadene · 2017-10-27T17:22:16Z

We use the softmax layer, it is included in our NLL loss.
You are right. It is a bit different from the original MLB implementation. Note that we used this pytorch implementation for the VQA2 challenge and used our private torch7 implementation for the MUTAN paper (which doesn't have this mistake).
Also I just noticed that I forgot to merge our fix (we have one public repo and one private).

# in MLBAtt
       if self.opt['attention']['glimpse_type'] == 'old':
            self.list_linear_v_fusion = nn.ModuleList([
                nn.Linear(self.opt['dim_v'],
                          self.opt['fusion']['dim_h'])
                for i in range(self.opt['attention']['nb_glimpses'])])
        
            self.linear_q_fusion = nn.Linear(self.opt['dim_q'],
                                             self.opt['fusion']['dim_h']
                                             * self.opt['attention']['nb_glimpses'])
            self.linear_classif = nn.Linear(self.opt['fusion']['dim_h']
                                            * self.opt['attention']['nb_glimpses'],
                                            self.num_classes)
        else:
            self.linear_v_fusion = nn.Linear(self.opt['dim_v'] * \
                                             self.opt['attention']['nb_glimpses'],
                                             self.opt['fusion']['dim_h'])            
            self.linear_q_fusion = nn.Linear(self.opt['dim_q'],
                                             self.opt['fusion']['dim_h'])
            self.linear_classif = nn.Linear(self.opt['fusion']['dim_h'],
                                            self.num_classes)

# in MutanAtt
        if self.opt['attention']['glimpse_type'] == 'old':
            self.list_linear_v_fusion = nn.ModuleList([
                nn.Linear(self.opt['dim_v'],
                          int(self.opt['fusion']['dim_hv'] /
                              self.opt['attention']['nb_glimpses']))
                for i in range(self.opt['attention']['nb_glimpses'])])
        else:
            self.linear_v_fusion = nn.Linear(self.opt['dim_v'] * \
                                             self.opt['attention']['nb_glimpses'],
                                             self.opt['fusion']['dim_hv'])

# in def _fusion_glimpses(self, list_v_att, x_q_vec):
        list_v = []
        if self.opt['attention']['glimpse_type'] == 'old':
            for glimpse_id, x_v_att in enumerate(list_v_att):
                x_v = F.dropout(x_v_att,
                                p=self.opt['fusion']['dropout_v'],
                                training=self.training)
                x_v = self.list_linear_v_fusion[glimpse_id](x_v)
                if 'activation_v' in self.opt['fusion']:
                    x_v = getattr(F, self.opt['fusion']['activation_v'])(x_v)
                list_v.append(x_v)
            x_v = torch.cat(list_v, 1)
        else:
            x_v = torch.cat(list_v_att, 1)
            x_v = F.dropout(x_v,
                            p=self.opt['fusion']['dropout_v'],
                            training=self.training)
            x_v = self.linear_v_fusion(x_v)
            if 'activation_v' in self.opt['fusion']:
                x_v = getattr(F, self.opt['fusion']['activation_v'])(x_v)

We can chose if we want to include some non-linearity https://github.com/Cadene/vqa.pytorch/blob/master/vqa/models/fusion.py#L36. Note that in practice the MLB authors use the non linearity.

backpropper · 2017-10-27T19:18:24Z

Oh I thought you used nn.NLLLoss but I see you use nn.CrossEntropy. Thats fine.
So will you merging that private repo?
Yeah I am saying that the MLB authors choose to use the non-linearity either before or after the fusion and not both times.

Cadene · 2017-11-02T10:24:48Z

I will try to do so in the coming month.

Cadene mentioned this issue Jan 5, 2018

Can you tell me what accuracy have you achieved? #26

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

model specifications not coherent with the MLB paper #18

model specifications not coherent with the MLB paper #18

backpropper commented Oct 27, 2017

Cadene commented Oct 27, 2017 •

edited

Loading

backpropper commented Oct 27, 2017

Cadene commented Nov 2, 2017

model specifications not coherent with the MLB paper #18

model specifications not coherent with the MLB paper #18

Comments

backpropper commented Oct 27, 2017

Cadene commented Oct 27, 2017 • edited Loading

backpropper commented Oct 27, 2017

Cadene commented Nov 2, 2017

Cadene commented Oct 27, 2017 •

edited

Loading