machine_learning.word_frequency_functions
Functions
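| document_frequency(term, corpus) | Calculate the number of documents in a corpus that contain a given term. |
| inverse_document_frequency(df, n[, smoothing]) | Return a float denoting the importance of a word. |
| term_frequency(term, document) | Return the number of times a term occurs within a given document. |
| tf_idf(tf, idf) | Combine the term frequency and inverse document frequency to calculate the originality of a term. |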
Module Contents
- machine_learning.word_frequency_functions.document_frequency(term: str, corpus: str) -> tuple[int, int]
Calculate the number of documents in a corpus that contain a given term.
@params: term, the term to search each document for, and corpus, a collection of documents. Each document should be separated by a newline.
@returns: the number of documents in the corpus that contain the term you are searching for, and the total number of documents in the corpus
@examples:
>>> document_frequency("first", "This is the first document in the corpus.\nThIs is the second document in the corpus.\nTHIS is the third document in the corpus.")
(1, 3)
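The counting rule can be illustrated with a minimal sketch. This is not the module's implementation: the helper name `document_frequency_sketch` is hypothetical, and it assumes case-insensitive, whole-word matching after punctuation is stripped. Each newline-separated document counts at most once, however often the term appears in it.

```python
import string


def document_frequency_sketch(term: str, corpus: str) -> tuple[int, int]:
    """Hypothetical helper: count newline-separated documents containing `term`.

    Assumption: matching is case-insensitive, whole-word, and ignores punctuation.
    """
    # Remove punctuation so "corpus." and "corpus" compare equal.
    table = str.maketrans("", "", string.punctuation)
    docs = corpus.lower().translate(table).split("\n")
    needle = term.lower()
    # A document contributes 1 no matter how many times the term occurs in it.
    matches = sum(1 for doc in docs if needle in doc.split())
    return matches, len(docs)


corpus = (
    "This is the first document in the corpus.\n"
    "ThIs is the second document in the corpus.\n"
    "THIS is the third document in the corpus."
)
print(document_frequency_sketch("first", corpus))  # (1, 3)
print(document_frequency_sketch("this", corpus))  # (3, 3)
```

Splitting each document into words avoids counting a term that only occurs inside a longer word; a plain substring test would behave differently.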
- machine_learning.word_frequency_functions.inverse_document_frequency(df: int, n: int, smoothing=False) -> float
Return a float denoting the importance of a word. This measure of importance is calculated as log10(n/df), where n is the number of documents and df is the Document Frequency. As the examples show, the result is rounded to three decimal places.
@params: df, the Document Frequency; n, the number of documents in the corpus; and smoothing, which, if True, returns the smoothed idf
@returns: log10(n/df), or 1 + log10(n/(1 + df)) when smoothing is True
@examples:
>>> inverse_document_frequency(3, 0)
Traceback (most recent call last):
...
ValueError: log10(0) is undefined.
>>> inverse_document_frequency(1, 3)
0.477
>>> inverse_document_frequency(0, 3)
Traceback (most recent call last):
...
ZeroDivisionError: df must be > 0
>>> inverse_document_frequency(0, 3, True)
1.477
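Both formulas can be shown side by side in a short sketch. The helper name `idf_sketch` is hypothetical and the order of its error checks is an assumption; it reproduces the example values above and shows why smoothing helps: adding 1 to df keeps the ratio finite when the term occurs in no document.

```python
from math import log10


def idf_sketch(df: int, n: int, smoothing: bool = False) -> float:
    """Hypothetical helper: inverse document frequency rounded to 3 decimals.

    plain:    log10(n / df)
    smoothed: 1 + log10(n / (1 + df))
    """
    if n == 0:
        # log10(0 / x) would be log10(0), which is undefined.
        raise ValueError("log10(0) is undefined.")
    if smoothing:
        # 1 + df is never zero, so no division-by-zero check is needed here.
        return round(1 + log10(n / (1 + df)), 3)
    if df == 0:
        raise ZeroDivisionError("df must be > 0")
    return round(log10(n / df), 3)


print(idf_sketch(1, 3))  # 0.477
print(idf_sketch(0, 3, True))  # 1.477
print(idf_sketch(3, 3))  # 0.0
```

The last call shows that a term found in every document gets an idf of 0, so it contributes nothing to a tf-idf score.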
- machine_learning.word_frequency_functions.term_frequency(term: str, document: str) -> int
Return the number of times a term occurs within a given document. As the example shows, matching is case-insensitive.
@params: term, the term to search a document for, and document, the document to search within
@returns: an integer representing the number of times the term is found within the document
@examples:
>>> term_frequency("to", "To be, or not to be")
2
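A raw term count can be sketched in a few lines. This is an illustration under stated assumptions, not the module's code: the helper name `term_frequency_sketch` is hypothetical, and it assumes punctuation is discarded and words are compared case-insensitively, which is consistent with "To" being counted in the example above.

```python
import string


def term_frequency_sketch(term: str, document: str) -> int:
    """Hypothetical helper: count case-insensitive whole-word occurrences of `term`."""
    # Drop punctuation so "be," and "be" become the same token.
    cleaned = document.translate(str.maketrans("", "", string.punctuation))
    needle = term.lower()
    # str.split() with no argument splits on any whitespace, including newlines.
    return sum(1 for word in cleaned.split() if word.lower() == needle)


print(term_frequency_sketch("to", "To be, or not to be"))  # 2
print(term_frequency_sketch("be", "To be, or not to be"))  # 2
```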
- machine_learning.word_frequency_functions.tf_idf(tf: int, idf: int) -> float
Combine the term frequency and inverse document frequency to calculate the originality of a term. This 'originality' is calculated by multiplying the term frequency by the inverse document frequency: tf-idf = TF * IDF.
@params: tf, the term frequency, and idf, the inverse document frequency (in practice a float such as 0.477, despite the int annotation)
@examples:
>>> tf_idf(2, 0.477)
0.954
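To show how the pieces fit together, here is a self-contained end-to-end sketch that scores one term against every document in a newline-separated corpus. The helper names `_tokens` and `tf_idf_scores` are hypothetical, and the tokenization rule (lowercase, strip punctuation, split on whitespace) is an assumption. It uses the plain idf, log10(n/df), and multiplies it by the raw term count in each document.

```python
import string
from math import log10


def _tokens(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, strip punctuation, split on whitespace.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def tf_idf_scores(term: str, corpus: str) -> list[float]:
    """Hypothetical helper: tf-idf of `term` in each newline-separated document."""
    docs = [_tokens(doc) for doc in corpus.split("\n")]
    needle = term.lower()
    df = sum(1 for doc in docs if needle in doc)  # document frequency
    if df == 0:
        # The term never occurs, so its tf is 0 in every document.
        return [0.0 for _ in docs]
    idf = log10(len(docs) / df)  # plain (unsmoothed) idf
    # tf-idf = TF * IDF, rounded to 3 decimals.
    return [round(doc.count(needle) * idf, 3) for doc in docs]


corpus = "the cat sat\nthe dog sat\nthe cat saw the cat"
print(tf_idf_scores("cat", corpus))  # [0.176, 0.0, 0.352]
print(tf_idf_scores("the", corpus))  # [0.0, 0.0, 0.0]
```

"the" occurs in every document, so its idf is log10(3/3) = 0 and it scores zero everywhere, while "cat" scores highest in the document where it appears twice.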