
seq2seq: Replace the embeddings with pre-trained word embeddings such as word2vec #1075


Description

@Liranbz

Hi,
Thank you for your tutorial! I tried to replace the embeddings with pre-trained word embeddings such as word2vec. Here is my code:

from gensim.models import KeyedVectors


class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS
        self._word2vec = None  # cache so the vectors file is only read once

    def get_word2vec(self):
        if self._word2vec is None:
            self._word2vec = KeyedVectors.load_word2vec_format('Models/Word2Vec/wiki.he.vec')
        return self._word2vec

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            # Store the pre-trained 300-d vector instead of an integer index.
            # Note: this raises KeyError for words missing from the vectors file.
            self.word2index[word] = self.get_word2vec()[word]
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
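
To sanity-check the class, this is roughly how I am using it (the Hebrew words are just examples, and I am assuming they exist in the wiki.he.vec vocabulary):

lang = Lang('he')
lang.addSentence('שלום עולם')
print(lang.word2index['שלום'].shape)  # (300,): a vector, not an integer index
print(lang.index2word[2])             # 'שלום' -- index2word still maps indices to words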

The word2vec vectors are 300-dimensional. Do I need to change anything else in my Encoder? My current guess is sketched below.
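
In case it clarifies what I mean, here is my guess at the Encoder change, adapted from the tutorial's EncoderRNN (build_embedding is a hypothetical helper of my own, and freeze=False is only my assumption):

import numpy as np
import torch
import torch.nn as nn

def build_embedding(lang, word_vectors, dim=300):
    # Hypothetical helper: one row per entry in lang.index2word, filled with the
    # pre-trained vector where available and small random values for SOS, EOS,
    # and out-of-vocabulary words.
    weights = np.random.normal(0, 0.1, (lang.n_words, dim)).astype('float32')
    for idx, word in lang.index2word.items():
        if word in word_vectors:
            weights[idx] = word_vectors[word]
    return nn.Embedding.from_pretrained(torch.from_numpy(weights), freeze=False)

class EncoderRNN(nn.Module):
    def __init__(self, hidden_size, lang, word_vectors):
        super().__init__()
        self.hidden_size = hidden_size
        # Replaces nn.Embedding(input_size, hidden_size) from the tutorial.
        self.embedding = build_embedding(lang, word_vectors)
        # The GRU input size must now match the 300-d embeddings
        # instead of hidden_size as in the tutorial.
        self.gru = nn.GRU(300, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size)

One thing I am unsure about: the tutorial builds input tensors from lang.word2index, so if word2index holds vectors rather than integer indices, the training loop would break. Is that right?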

Thank you!
