compression.burrows_wheeler

https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform

The Burrows-Wheeler transform (BWT, also called block-sorting compression) rearranges a character string into runs of similar characters. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding. More importantly, the transformation is reversible, without needing to store any additional data except the position of the first original character. The BWT is thus a “free” method of improving the efficiency of text compression algorithms, costing only some extra computation.

Attributes

entry_msg

Classes

BWTTransformDict

dict() -> new empty dictionary

Functions

all_rotations(→ list[str])

bwt_transform(→ BWTTransformDict)

reverse_bwt(→ str)

Module Contents

class compression.burrows_wheeler.BWTTransformDict

Bases: TypedDict

dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object’s

(key, value) pairs

dict(iterable) -> new dictionary initialized as if via:

d = {} for k, v in iterable:

d[k] = v

dict(**kwargs) -> new dictionary initialized with the name=value pairs

in the keyword argument list. For example: dict(one=1, two=2)

bwt_string: str
idx_original_string: int
compression.burrows_wheeler.all_rotations(s: str) list[str]
Parameters:

s – The string that will be rotated len(s) times.

Returns:

A list with the rotations.

Raises:

TypeError – If s is not an instance of str.

Examples:

>>> all_rotations("^BANANA|") 
['^BANANA|', 'BANANA|^', 'ANANA|^B', 'NANA|^BA', 'ANA|^BAN', 'NA|^BANA',
'A|^BANAN', '|^BANANA']
>>> all_rotations("a_asa_da_casa") 
['a_asa_da_casa', '_asa_da_casaa', 'asa_da_casaa_', 'sa_da_casaa_a',
'a_da_casaa_as', '_da_casaa_asa', 'da_casaa_asa_', 'a_casaa_asa_d',
'_casaa_asa_da', 'casaa_asa_da_', 'asaa_asa_da_c', 'saa_asa_da_ca',
'aa_asa_da_cas']
>>> all_rotations("panamabanana") 
['panamabanana', 'anamabananap', 'namabananapa', 'amabananapan',
'mabananapana', 'abananapanam', 'bananapanama', 'ananapanamab',
'nanapanamaba', 'anapanamaban', 'napanamabana', 'apanamabanan']
>>> all_rotations(5)
Traceback (most recent call last):
    ...
TypeError: The parameter s type must be str.
compression.burrows_wheeler.bwt_transform(s: str) BWTTransformDict
Parameters:

s – The string that will be used at bwt algorithm

Returns:

the string composed of the last char of each row of the ordered

rotations and the index of the original string at ordered rotations list :raises TypeError: If the s parameter type is not str :raises ValueError: If the s parameter is empty Examples:

>>> bwt_transform("^BANANA")
{'bwt_string': 'BNN^AAA', 'idx_original_string': 6}
>>> bwt_transform("a_asa_da_casa")
{'bwt_string': 'aaaadss_c__aa', 'idx_original_string': 3}
>>> bwt_transform("panamabanana")
{'bwt_string': 'mnpbnnaaaaaa', 'idx_original_string': 11}
>>> bwt_transform(4)
Traceback (most recent call last):
    ...
TypeError: The parameter s type must be str.
>>> bwt_transform('')
Traceback (most recent call last):
    ...
ValueError: The parameter s must not be empty.
compression.burrows_wheeler.reverse_bwt(bwt_string: str, idx_original_string: int) str
Parameters:
  • bwt_string – The string returned from bwt algorithm execution

  • idx_original_string – A 0-based index of the string that was used to

generate bwt_string at ordered rotations list :return: The string used to generate bwt_string when bwt was executed :raises TypeError: If the bwt_string parameter type is not str :raises ValueError: If the bwt_string parameter is empty :raises TypeError: If the idx_original_string type is not int or if not possible to cast it to int :raises ValueError: If the idx_original_string value is lower than 0 or greater than len(bwt_string) - 1

>>> reverse_bwt("BNN^AAA", 6)
'^BANANA'
>>> reverse_bwt("aaaadss_c__aa", 3)
'a_asa_da_casa'
>>> reverse_bwt("mnpbnnaaaaaa", 11)
'panamabanana'
>>> reverse_bwt(4, 11)
Traceback (most recent call last):
    ...
TypeError: The parameter bwt_string type must be str.
>>> reverse_bwt("", 11)
Traceback (most recent call last):
    ...
ValueError: The parameter bwt_string must not be empty.
>>> reverse_bwt("mnpbnnaaaaaa", "asd") 
Traceback (most recent call last):
    ...
TypeError: The parameter idx_original_string type must be int or passive
of cast to int.
>>> reverse_bwt("mnpbnnaaaaaa", -1)
Traceback (most recent call last):
    ...
ValueError: The parameter idx_original_string must not be lower than 0.
>>> reverse_bwt("mnpbnnaaaaaa", 12) 
Traceback (most recent call last):
    ...
ValueError: The parameter idx_original_string must be lower than
len(bwt_string).
>>> reverse_bwt("mnpbnnaaaaaa", 11.0)
'panamabanana'
>>> reverse_bwt("mnpbnnaaaaaa", 11.4)
'panamabanana'
compression.burrows_wheeler.entry_msg = 'Provide a string that I will generate its BWT transform: '