Issues with seq2seq tutorial (batch training) #2840

Open
@gavril0

Description

Add Link

Link to the tutorial:

https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

Describe the bug

The tutorial was markedly changed in June 2023; see commit 6c03bb3, which aimed at fixing the implementation of attention, among other things (#2468). In doing so, several other things were changed:

  • a dataloader was added that returns a batch of zero-padded sequences to train the network
  • the forward() function of the decoder processes input one word at a time, in parallel for all sentences
    in the batch, until MAX_LENGTH is reached.

I am not a torch expert, but I think that the embedding layers in the encoder and decoder should have been modified to recognize padding (padding_idx=0 is missing). Using zero-padded sequences as input might also have other implications during learning, but I am not sure. Can you confirm that the implementation is correct?
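To illustrate the concern, here is a minimal sketch (not the tutorial's code) of what passing padding_idx would do. With padding_idx=0, the embedding row for index 0 is initialized to zeros and excluded from gradient updates, so padded positions stay at the zero vector throughout training:

```python
import torch
import torch.nn as nn

# Toy embedding layer: index 0 is reserved for padding.
# padding_idx=0 zeroes that row and freezes it (no gradient updates).
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)

batch = torch.tensor([[5, 3, 0, 0],   # two zero-padded sequences
                      [7, 0, 0, 0]])
out = embedding(batch)                # shape (2, 4, 4)
# out[0, 2:], out[1, 1:] are all-zero vectors for the padded positions
```

Without padding_idx, the rows for index 0 are ordinary trainable parameters, so the "padding" token behaves like a real word.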

As a result of these changes, the text no longer describes the code well. I think it would be nice to include a discussion of zero-padding and of the implications of using batches in the tutorial. I am also curious whether there is really a gain from using batches, since most sentences are short.
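One implication of zero-padding that such a discussion could cover is the loss computation. A toy sketch (my own numbers and setup, not the tutorial's code), assuming NLLLoss and a padding index of 0: with ignore_index=0, padded target positions contribute nothing to the averaged loss, whereas without it they pull the average toward predicting the padding token.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# 6 target positions over a toy 5-word vocabulary; index 0 is padding.
log_probs = torch.log_softmax(torch.randn(6, 5), dim=1)
targets = torch.tensor([3, 1, 4, 0, 0, 0])   # last three positions are padding

masked_loss = nn.NLLLoss(ignore_index=0)(log_probs, targets)
plain_loss = nn.NLLLoss()(log_probs, targets)  # padding included in the average
# masked_loss averages over only the first three (real) positions
```

The same effect can be obtained by building an explicit mask, but ignore_index is the built-in route.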

Finally, the text mentions a teacher_forcing_ratio, but no such variable appears in the code. Either the tutorial text or the code needs to be adjusted.
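For reference, the idea behind teacher_forcing_ratio is usually a per-step coin flip: feed the decoder the ground-truth token with some probability, otherwise feed it its own previous prediction. A hypothetical helper (the name and signature are my assumptions, not the tutorial's API):

```python
import random
import torch

def next_decoder_input(decoder_output, target_token, teacher_forcing_ratio=0.5):
    """Choose the decoder's next input: ground truth vs. its own prediction."""
    if random.random() < teacher_forcing_ratio:
        return target_token                           # teacher forcing
    return decoder_output.argmax(dim=-1).detach()     # model's own guess
```

With a ratio of 1.0 this always feeds the target (which is effectively what the current tutorial code does when target_tensor is given); with 0.0 the decoder always consumes its own predictions.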

If this is useful, I found another implementation of the same tutorial which seems to be a fork of a previous version (it was archived in 2021):

  • It does not use batches
  • It includes teacher_forcing_ratio to select the amount of teacher forcing
  • It implements both the Luong et al. and Bahdanau et al. attention models
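For readers comparing the two attention variants, here is a rough sketch of the score functions (shapes and layer names are my assumptions, not code from either implementation):

```python
import torch
import torch.nn as nn

hidden_size = 8
query = torch.randn(1, hidden_size)   # decoder hidden state
keys = torch.randn(5, hidden_size)    # encoder outputs for 5 source words

# Luong et al., "general" score: score(q, k) = q @ W @ k^T
W = nn.Linear(hidden_size, hidden_size, bias=False)
luong_scores = query @ W(keys).T                          # shape (1, 5)

# Bahdanau et al., additive score: score(q, k) = v^T tanh(Wq q + Wk k)
Wq = nn.Linear(hidden_size, hidden_size)
Wk = nn.Linear(hidden_size, hidden_size)
v = nn.Linear(hidden_size, 1)
bahdanau_scores = v(torch.tanh(Wq(query) + Wk(keys))).T   # shape (1, 5)

# Either way, softmax turns scores into attention weights over source words.
weights = torch.softmax(bahdanau_scores, dim=-1)
```

The current tutorial's BahdanauAttention class follows the additive form; the Luong variant differs only in how the scores are computed.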

Describe your environment

I appreciate this tutorial, as it provides a simple introduction to Seq2Seq models with a small dataset. I am actually trying to port this tutorial to R with the torch package.

cc @albanD

    Labels

    bug, core (Tutorials of any level of difficulty related to the core pytorch functionality)
