Ch11 Understanding TF-IDF normalization #230

Open
@intelligencethink

The explanation of TF-IDF shown on page 326 is as follows:

```python
import math

def tfidf(term, document, dataset):
    term_freq = document.count(term)
    doc_freq = math.log(sum(doc.count(term) for doc in dataset) + 1)
    return term_freq / doc_freq
```
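Written out as a formula (with $\operatorname{count}(t, d)$ the number of times term $t$ occurs in document $d$, $D$ the dataset, and $\ln$ the natural log used by `math.log`), the quoted function computes:

$$
\operatorname{tfidf}(t, d, D) = \frac{\operatorname{count}(t, d)}{\ln\left(1 + \sum_{d' \in D} \operatorname{count}(t, d')\right)}
$$

The denominator depends only on the total number of occurrences of $t$ across the dataset; the number of documents $|D|$ appears nowhere. Note also that a term absent from every document gives $\ln(1) = 0$ in the denominator, so the function raises a division-by-zero error in that case.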

Is this correct? According to the TF-IDF formula, the score should depend on the total number of documents in the dataset, but that number does not appear anywhere in `doc_freq`.
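For comparison, here is a minimal sketch of the common textbook definition, where the inverse document frequency is ln(N / df), with N the total number of documents and df the number of documents that contain the term. It assumes each document is a list of tokens. The function names `tfidf_as_quoted` and `tfidf_standard` are hypothetical, and returning 0.0 for a term that appears in no document is a convention chosen here, not something taken from the book.

```python
import math
from typing import List, Sequence


def tfidf_as_quoted(term: str, document: Sequence[str], dataset: List[Sequence[str]]) -> float:
    """The version quoted in this issue: the document count N is never used."""
    term_freq = document.count(term)
    doc_freq = math.log(sum(doc.count(term) for doc in dataset) + 1)
    return term_freq / doc_freq


def tfidf_standard(term: str, document: Sequence[str], dataset: List[Sequence[str]]) -> float:
    """Textbook TF-IDF (hypothetical helper): raw count times ln(N / df)."""
    term_freq = document.count(term)                      # occurrences of term in this document
    n_docs = len(dataset)                                 # N: total number of documents
    doc_freq = sum(1 for doc in dataset if term in doc)   # df: documents containing the term
    if doc_freq == 0:
        return 0.0  # assumption: score an unseen term as 0 instead of dividing by zero
    return term_freq * math.log(n_docs / doc_freq)


dataset = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

print(tfidf_standard("cat", dataset[0], dataset))   # 1 * ln(3/2), about 0.405
print(tfidf_standard("the", dataset[0], dataset))   # 1 * ln(3/3) = 0.0
print(tfidf_as_quoted("the", dataset[0], dataset))  # 1 / ln(3 + 1), about 0.72
```

On this toy dataset, "the" occurs in every document, so the textbook version scores it 0.0, while the quoted version still gives it a positive weight because it never compares the term's spread against the number of documents.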
