Sequence to sequence encoder-decoder attention not looking at encoder hidden states #2121

Closed

Description

@hotzjacobb

Hi,

I believe the code/notebook example of a sequence-to-sequence model that uses attention should be changed.

Attention (as described in Bahdanau et al., 2014) serves as a mechanism for looking at the encoder's hidden states (in Transformer terminology this is now often called cross-attention). Its theoretical justification is that it gives the decoder a representation of the input whose size varies with the (variable) input sequence length, so that all information does not have to pass through the encoder's final hidden state, which would otherwise be a fixed-size representation bottleneck.
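
For concreteness, Bahdanau-style attention at decoder step $t$ computes the following (my notation: $s_{t-1}$ is the decoder's previous hidden state, $h_1, \dots, h_n$ are the encoder hidden states):

$$
e_{t,i} = v^\top \tanh(W_s s_{t-1} + W_h h_i), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j} \exp(e_{t,j})}, \qquad
c_t = \sum_{i} \alpha_{t,i} h_i
$$

The context vector $c_t$ is then fed to the decoder alongside the embedding of the previous output token.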

The tutorial/code currently computes attention from the decoder's hidden state after the first input and the embedding of the decoder's first output; in other words, the attention never looks at the encoder's hidden states at all. I believe this is incorrect.

I think it should instead use the decoder's prior hidden state and the encoder's hidden states, so that the model learns what to focus on in the input sequence at the current timestep of the decoder's output, as sketched below.
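
For reference, here is a minimal sketch of the computation I have in mind (module and variable names are my own, and I'm assuming batch-first tensors; this is not the tutorial's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BahdanauAttention(nn.Module):
    """Additive (Bahdanau-style) attention over encoder hidden states."""

    def __init__(self, hidden_size):
        super().__init__()
        self.W_query = nn.Linear(hidden_size, hidden_size)  # projects the decoder state
        self.W_key = nn.Linear(hidden_size, hidden_size)    # projects the encoder states
        self.v = nn.Linear(hidden_size, 1)                  # scores each source position

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden:  (batch, 1, hidden)       -- decoder state from the previous step
        # encoder_outputs: (batch, src_len, hidden) -- one hidden state per input token
        scores = self.v(torch.tanh(
            self.W_query(decoder_hidden) + self.W_key(encoder_outputs)
        ))                                  # (batch, src_len, 1)
        weights = F.softmax(scores, dim=1)  # normalize over source positions
        # Weighted sum of encoder states -> context vector for this decoder step
        context = torch.bmm(weights.transpose(1, 2), encoder_outputs)  # (batch, 1, hidden)
        return context, weights
```

The key point is that `encoder_outputs` appears in both the score computation and the weighted sum; the decoder's own input embedding plays no role in the attention itself.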

I can submit a PR to change this in the tutorial if my reading is correct; thanks.

cc @pytorch/team-text-core @Nayef211
