data_structures.hashing.bloom_filter¶

See https://en.wikipedia.org/wiki/Bloom_filter

The use of this data structure is to test membership in a set. Compared to Python’s built-in set() it is more space-efficient. In the following example, only 8 bits of memory will be used: >>> bloom = Bloom(size=8)

Initially, the filter contains all zeros: >>> bloom.bitstring ‘00000000’

When an element is added, two bits are set to 1 since there are 2 hash functions in this implementation: >>> “Titanic” in bloom False >>> bloom.add(“Titanic”) >>> bloom.bitstring ‘01100000’ >>> “Titanic” in bloom True

However, sometimes only one bit is added because both hash functions return the same value >>> bloom.add(“Avatar”) >>> “Avatar” in bloom True >>> bloom.format_hash(“Avatar”) ‘00000100’ >>> bloom.bitstring ‘01100100’

Not added elements should return False … >>> not_present_films = (“The Godfather”, “Interstellar”, “Parasite”, “Pulp Fiction”) >>> { … film: bloom.format_hash(film) for film in not_present_films … } # doctest: +NORMALIZE_WHITESPACE {‘The Godfather’: ‘00000101’,

‘Interstellar’: ‘00000011’, ‘Parasite’: ‘00010010’, ‘Pulp Fiction’: ‘10000100’}

>>> any(film in bloom for film in not_present_films)
False

but sometimes there are false positives: >>> “Ratatouille” in bloom True >>> bloom.format_hash(“Ratatouille”) ‘01100000’

The probability increases with the number of elements added. The probability decreases with the number of bits in the bitarray. >>> bloom.estimated_error_rate 0.140625 >>> bloom.add(“The Godfather”) >>> bloom.estimated_error_rate 0.25 >>> bloom.bitstring ‘01100101’

Attributes¶

HASH_FUNCTIONS

Classes¶

Bloom

Module Contents¶

class data_structures.hashing.bloom_filter.Bloom(size: int = 8)¶

__contains__(other: str) → bool¶

add(value: str) → None¶

exists(value: str) → bool¶

format_bin(bitarray: int) → str¶

format_hash(value: str) → str¶

hash_(value: str) → int¶

bitarray = 0¶

property bitstring: str¶

property estimated_error_rate: float¶

size = 8¶

data_structures.hashing.bloom_filter.HASH_FUNCTIONS¶

data_structures.hashing.bloom_filter¶

Attributes¶

Classes¶

Module Contents¶

thealgorithms-python

Navigation

Related Topics