dynamic_programming.smith_waterman

https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm

The Smith-Waterman algorithm is a dynamic programming algorithm for local sequence alignment. It finds the most similar regions between two sequences, such as DNA or protein sequences. This implementation penalizes gaps linearly: the score is reduced by a fixed amount for each gap position introduced in the alignment. The algorithm itself also supports other gap penalty schemes, such as affine penalties.
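The linear gap scheme described above corresponds to the standard Smith-Waterman recurrence, sketched here for reference. The symbols are introduced for illustration only and do not appear in the module: H is the score matrix, q and t are the query and subject, s is the match/mismatch score of a character pair, and g is the (negative) gap score, -2 by default.

```latex
H_{i,0} = H_{0,j} = 0, \qquad
H_{i,j} = \max\bigl(0,\; H_{i-1,j-1} + s(q_i, t_j),\; H_{i-1,j} + g,\; H_{i,j-1} + g\bigr)
```

The 0 term is what makes the alignment local: a cell never carries a negative score forward, so an alignment can restart at any position.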

Attributes

query

Functions

score_function(→ int)

Calculate the score for a character pair based on whether they match or mismatch.

smith_waterman(→ list[list[int]])

Perform the Smith-Waterman local sequence alignment algorithm.

traceback(→ str)

Perform traceback to find the optimal local alignment.

Module Contents

dynamic_programming.smith_waterman.score_function(source_char: str, target_char: str, match: int = 1, mismatch: int = -1, gap: int = -2) → int

Calculate the score for a character pair based on whether they match or mismatch. With the default parameters, returns 1 (match) if the characters match, -1 (mismatch) if they differ, and -2 (gap) if either character is the gap symbol '-'.

>>> score_function('A', 'A')
1
>>> score_function('A', 'C')
-1
>>> score_function('-', 'A')
-2
>>> score_function('A', '-')
-2
>>> score_function('-', '-')
-2
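A minimal sketch of one way to get the behavior documented above. The name `pair_score` is hypothetical and this is not necessarily how the module implements it; the only assumption is that a gap is represented by the character '-', as in the examples.

```python
def pair_score(
    source_char: str,
    target_char: str,
    match: int = 1,
    mismatch: int = -1,
    gap: int = -2,
) -> int:
    """Score one aligned character pair (illustrative sketch)."""
    # A gap on either side takes precedence over match/mismatch.
    if "-" in (source_char, target_char):
        return gap
    return match if source_char == target_char else mismatch


print(pair_score("A", "A"))  # match
print(pair_score("A", "C"))  # mismatch
print(pair_score("-", "A"))  # gap
```

Checking for the gap symbol first means that a pair of two gaps is scored as a gap rather than as a match, which agrees with the last documented example.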

dynamic_programming.smith_waterman.smith_waterman(query: str, subject: str, match: int = 1, mismatch: int = -1, gap: int = -2) → list[list[int]]

Perform the Smith-Waterman local sequence alignment algorithm. Returns a 2D list representing the score matrix, with one row per query character and one column per subject character, plus a leading row and column of zeros. Each value in the matrix is the score of the best local alignment ending at that cell. As the examples show, the case of the input does not affect the result.

>>> smith_waterman('ACAC', 'CA')
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]]
>>> smith_waterman('acac', 'ca')
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]]
>>> smith_waterman('ACAC', 'ca')
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]]
>>> smith_waterman('acac', 'CA')
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]]
>>> smith_waterman('ACAC', '')
[[0], [0], [0], [0], [0]]
>>> smith_waterman('', 'CA')
[[0, 0, 0]]
>>> smith_waterman('AGT', 'AGT')
[[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
>>> smith_waterman('AGT', 'GTA')
[[0, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 2, 0]]
>>> smith_waterman('AGT', 'GTC')
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0]]
>>> smith_waterman('AGT', 'G')
[[0, 0], [0, 0], [0, 1], [0, 0]]
>>> smith_waterman('G', 'AGT')
[[0, 0, 0, 0], [0, 0, 1, 0]]
>>> smith_waterman('AGT', 'AGTCT')
[[0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 2, 0, 0, 0], [0, 0, 0, 3, 1, 1]]
>>> smith_waterman('AGTCT', 'AGT')
[[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 1], [0, 0, 0, 1]]
>>> smith_waterman('AGTCT', 'GTC')
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 1, 1]]
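The matrices in the examples above can be reproduced with a short fill loop. The following is a sketch under stated assumptions, not the module's source: the name `fill_score_matrix` is hypothetical, inputs are upper-cased to match the case-insensitive examples, and the gap score is applied to both vertical and horizontal moves.

```python
def fill_score_matrix(
    query: str,
    subject: str,
    match: int = 1,
    mismatch: int = -1,
    gap: int = -2,
) -> list[list[int]]:
    """Build a Smith-Waterman score matrix with a linear gap penalty (sketch)."""
    query, subject = query.upper(), subject.upper()
    rows, cols = len(query) + 1, len(subject) + 1
    # Row 0 and column 0 stay at zero: an alignment may start anywhere.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(
                0,                            # restart the local alignment
                score[i - 1][j - 1] + pair,   # align the two characters
                score[i - 1][j] + gap,        # gap in the subject
                score[i][j - 1] + gap,        # gap in the query
            )
    return score


for row in fill_score_matrix("AGT", "AGTCT"):
    print(row)
```

The running time and memory are both proportional to len(query) * len(subject), since every cell is computed once from three neighbours.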

dynamic_programming.smith_waterman.traceback(score: list[list[int]], query: str, subject: str) → str

Perform traceback to find the optimal local alignment. Starts from the highest-scoring cell in the matrix and traces back until a 0 score is found. Returns the two aligned strings, query first, separated by a newline; returns an empty string if there is nothing to align.

>>> traceback([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]], 'ACAC', 'CA')
'CA\nCA'
>>> traceback([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]], 'acac', 'ca')
'CA\nCA'
>>> traceback([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]], 'ACAC', 'ca')
'CA\nCA'
>>> traceback([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]], 'acac', 'CA')
'CA\nCA'
>>> traceback([[0, 0, 0]], 'ACAC', '')
''
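The traceback described above can be sketched as follows. This is an illustration, not the module's source: the name `trace_alignment` and its scoring parameters are hypothetical, the default scores (1, -1, -2) are assumed to be the ones used to build the matrix, and ties for the highest score are resolved by taking the first such cell in row-major order.

```python
def trace_alignment(
    score: list[list[int]],
    query: str,
    subject: str,
    match: int = 1,
    mismatch: int = -1,
    gap: int = -2,
) -> str:
    """Recover one optimal local alignment from a score matrix (sketch)."""
    query, subject = query.upper(), subject.upper()
    # Find the first cell holding the highest positive score.
    best, i, j = 0, 0, 0
    for r, row in enumerate(score):
        for c, value in enumerate(row):
            if value > best:
                best, i, j = value, r, c
    top: list[str] = []
    bottom: list[str] = []
    # Walk back until a zero cell or the matrix border is reached.
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if query[i - 1] == subject[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            top.append(query[i - 1])
            bottom.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(query[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Only remaining source of a positive cell: a gap in the query.
            top.append("-")
            bottom.append(subject[j - 1])
            j -= 1
    if not top:
        return ""
    # Characters were collected back to front, so reverse both strands.
    return "".join(reversed(top)) + "\n" + "".join(reversed(bottom))


matrix = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2], [0, 1, 0]]
print(trace_alignment(matrix, "ACAC", "CA"))
```

Each step re-derives which neighbour produced the current cell, so no separate pointer matrix is needed; the trade-off is that the scoring parameters passed here must match those used when the matrix was filled.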

dynamic_programming.smith_waterman.query = 'HEAGAWGHEE'