strings.top_k_frequent_words

Finds the top K most frequent words from the provided word list.

This implementation shows how to solve the problem using the Heap class already present in this repository. Computing order statistics is a typical use of heaps.

This is mostly shown for educational purposes, since the problem can be solved in a few lines using collections.Counter from the Python standard library:

    from collections import Counter

    def top_k_frequent_words(words, k_value):
        return [x[0] for x in Counter(words).most_common(k_value)]

Classes

WordCount

Functions

top_k_frequent_words(→ list[str])

Returns the k_value most frequently occurring words, in non-increasing order of occurrence.

Module Contents

class strings.top_k_frequent_words.WordCount(word: str, count: int)
__eq__(other: object) → bool
>>> WordCount('a', 1).__eq__(WordCount('b', 1))
True
>>> WordCount('a', 1).__eq__(WordCount('a', 1))
True
>>> WordCount('a', 1).__eq__(WordCount('a', 2))
False
>>> WordCount('a', 1).__eq__(WordCount('b', 2))
False
>>> WordCount('a', 1).__eq__(1)
NotImplemented
__lt__(other: object) → bool
>>> WordCount('a', 1).__lt__(WordCount('b', 1))
False
>>> WordCount('a', 1).__lt__(WordCount('a', 1))
False
>>> WordCount('a', 1).__lt__(WordCount('a', 2))
True
>>> WordCount('a', 1).__lt__(WordCount('b', 2))
True
>>> WordCount('a', 2).__lt__(WordCount('a', 1))
False
>>> WordCount('a', 2).__lt__(WordCount('b', 1))
False
>>> WordCount('a', 1).__lt__(1)
NotImplemented
count
word
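
The doctests above suggest that WordCount compares on count alone and ignores word, returning NotImplemented for any other type. A minimal sketch of a class with that behavior follows. The name WordCountSketch is hypothetical, and this is not necessarily how the repository implements WordCount.

```python
from __future__ import annotations


class WordCountSketch:
    """Hypothetical stand-in for WordCount: comparisons use count only."""

    def __init__(self, word: str, count: int) -> None:
        self.word = word
        self.count = count

    def __eq__(self, other: object) -> bool:
        # Defer to the other operand when it is not a WordCountSketch.
        if not isinstance(other, WordCountSketch):
            return NotImplemented
        return self.count == other.count

    def __lt__(self, other: object) -> bool:
        # Ordering by count lets a heap surface the most or least frequent word.
        if not isinstance(other, WordCountSketch):
            return NotImplemented
        return self.count < other.count
```

Comparing on count alone means two entries with different words but equal counts are treated as equal, which is all a heap needs to order them by frequency.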
strings.top_k_frequent_words.top_k_frequent_words(words: list[str], k_value: int) → list[str]

Returns the k_value most frequently occurring words, in non-increasing order of occurrence. In this context, a word is defined as an element in the provided list.

If k_value exceeds the number of distinct words, it is treated as equal to the number of distinct words, so every distinct word is returned.

>>> top_k_frequent_words(['a', 'b', 'c', 'a', 'c', 'c'], 3)
['c', 'a', 'b']
>>> top_k_frequent_words(['a', 'b', 'c', 'a', 'c', 'c'], 2)
['c', 'a']
>>> top_k_frequent_words(['a', 'b', 'c', 'a', 'c', 'c'], 1)
['c']
>>> top_k_frequent_words(['a', 'b', 'c', 'a', 'c', 'c'], 0)
[]
>>> top_k_frequent_words([], 1)
[]
>>> top_k_frequent_words(['a', 'a'], 2)
['a']
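
The heap-based approach can be sketched with the standard library's heapq module in place of the repository's Heap class. The function name top_k_with_heapq is hypothetical, and the tie-breaking rule (alphabetical order among words with equal counts) is an assumption of this sketch that may differ from the repository's behavior.

```python
import heapq
from collections import Counter


def top_k_with_heapq(words: list[str], k_value: int) -> list[str]:
    """Hypothetical sketch: top-k frequent words using a binary heap."""
    counts = Counter(words)
    # heapq implements a min-heap, so counts are negated to pop the
    # most frequent word first. Ties fall back to comparing the words.
    heap = [(-count, word) for word, count in counts.items()]
    heapq.heapify(heap)
    # Clamp k to the number of distinct words (and to zero from below).
    k = min(max(k_value, 0), len(heap))
    return [heapq.heappop(heap)[1] for _ in range(k)]


print(top_k_with_heapq(['a', 'b', 'c', 'a', 'c', 'c'], 2))  # ['c', 'a']
```

Building the heap is linear in the number of distinct words, and each of the k pops costs logarithmic time, which is why heaps are a natural fit when k is much smaller than the vocabulary.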