polymerist.genutils.sequences.similarity.edits

For calculating the edit distance between sequences and inspecting the edits needed to go between them

Attributes

T

Classes

EditOperation

For annotating distinct kinds of sequence edits and their associated index offsets

EditInfo

For bundling together information about a sequence edit step

Functions

compute_wf_matrix(...)

Compute the (N+1)x(M+1) matrix of Levenshtein distances between all prefixes of a pair of sequences

traverse_wf_matrix(wf_matrix, begin_idxs, end_idxs)

Takes a Wagner-Fischer Levenshtein distance matrix and returns the indices of the minimal path through the matrix

describe_edits(→ Generator[str, None, None])

Describes step-by-step the insertion, deletion, or substitution operations needed to transform one sequence into another

levenshtein_distance(→ int)

Compute the Levenshtein (edit) distance between a pair of sequences with elements of compatible type

Module Contents

polymerist.genutils.sequences.similarity.edits.T
class polymerist.genutils.sequences.similarity.edits.EditOperation(*args, **kwds)

Bases: enum.Enum

For annotating distinct kinds of sequence edits and their associated index offsets

NULL = 0
INSERTION = 1
DELETION = 2
SUBSTITUTION = 3
property bits: tuple[int, int]

Convert the integer value of the Enum field into its binary bits

offsets
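As a rough illustration of how a two-bit reading of the member values could work, the sketch below defines a hypothetical stand-in enum (EditOpSketch is not part of polymerist) with the same member values. The bit order (high bit first) and the reading of each pair as a (row, column) step through a distance matrix are assumptions, not something the documentation above confirms.

```python
from enum import Enum


class EditOpSketch(Enum):
    """Hypothetical stand-in that copies the documented member values."""
    NULL = 0
    INSERTION = 1
    DELETION = 2
    SUBSTITUTION = 3

    @property
    def bits(self) -> tuple[int, int]:
        # Assumed ordering: (high bit, low bit) of the two-bit integer value
        return (self.value >> 1) & 1, self.value & 1


for op in EditOpSketch:
    print(op.name, op.bits)
# NULL (0, 0)
# INSERTION (0, 1)
# DELETION (1, 0)
# SUBSTITUTION (1, 1)
```

Under this reading, an insertion advances only the second index, a deletion only the first, and a substitution both.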
class polymerist.genutils.sequences.similarity.edits.EditInfo

For bundling together information about a sequence edit step

edit_op: EditOperation
indices: tuple[int, int]
distance: int
polymerist.genutils.sequences.similarity.edits.compute_wf_matrix(seq1: Sequence[T], seq2: Sequence[T], int_type: Type = int) → numpy.ndarray[polymerist.genutils.typetools.numpytypes.Shape[polymerist.genutils.typetools.numpytypes.N, polymerist.genutils.typetools.numpytypes.M], int]

Compute the (N+1)x(M+1) matrix of Levenshtein distances between all prefixes of a pair of sequences, where N and M are the lengths of the first and second sequence, respectively. Implements the Wagner-Fischer algorithm.
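A minimal pure-Python sketch of the Wagner-Fischer recurrence, for orientation only. The name wf_matrix_sketch is hypothetical, it returns nested lists rather than a numpy array, and it ignores the int_type parameter; the library's own implementation may differ.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def wf_matrix_sketch(seq1: Sequence[T], seq2: Sequence[T]) -> list[list[int]]:
    """Return the (N+1) x (M+1) table of prefix-to-prefix edit distances."""
    n, m = len(seq1), len(seq2)
    dist = [[0] * (m + 1) for _ in range(n + 1)]

    # Turning a prefix into the empty sequence takes one deletion per element
    for i in range(1, n + 1):
        dist[i][0] = i
    # Building a prefix from the empty sequence takes one insertion per element
    for j in range(1, m + 1):
        dist[0][j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist


matrix = wf_matrix_sketch("kitten", "sitting")
print(len(matrix), len(matrix[0]))  # 7 8
print(matrix[-1][-1])               # 3
```

The bottom-right entry is the edit distance between the two complete sequences.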

polymerist.genutils.sequences.similarity.edits.traverse_wf_matrix(wf_matrix: numpy.ndarray[polymerist.genutils.typetools.numpytypes.Shape[polymerist.genutils.typetools.numpytypes.N, polymerist.genutils.typetools.numpytypes.M], int], begin_idxs: tuple[int, int] = (0, 0), end_idxs: tuple[int, int] = (-1, -1)) → Generator[list[EditInfo], None, None]

Takes a Wagner-Fischer Levenshtein distance matrix and returns the indices of the minimal path through the matrix from begin_idxs (by default the origin, i.e. empty sequences) to end_idxs (by default the last cell, i.e. both complete sequences).
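One way to recover a minimal path from such a matrix is to walk backwards from the last cell, always stepping to the cheapest neighbouring predecessor. The sketch below illustrates that idea only; traceback_sketch and the string labels are hypothetical, and the (label, indices, distance) tuples only loosely mirror the fields of EditInfo. Rows are assumed to index the first sequence and columns the second.

```python
def traceback_sketch(dist: list[list[int]]) -> list[tuple[str, tuple[int, int], int]]:
    """Walk from the last cell back to (0, 0), then return the steps in forward order."""
    i, j = len(dist) - 1, len(dist[0]) - 1
    steps = []
    while i > 0 or j > 0:
        # Candidate predecessors as (distance, (row step, column step)).
        # The diagonal is listed first so that min() prefers it on ties.
        candidates = []
        if i > 0 and j > 0:
            candidates.append((dist[i - 1][j - 1], (1, 1)))
        if i > 0:
            candidates.append((dist[i - 1][j], (1, 0)))
        if j > 0:
            candidates.append((dist[i][j - 1], (0, 1)))
        prev_dist, (di, dj) = min(candidates, key=lambda c: c[0])

        if (di, dj) == (1, 1):
            # An unchanged distance along the diagonal means the elements matched
            label = "match" if prev_dist == dist[i][j] else "substitution"
        elif (di, dj) == (1, 0):
            label = "deletion"
        else:
            label = "insertion"

        steps.append((label, (i, j), dist[i][j]))
        i, j = i - di, j - dj
    steps.reverse()
    return steps


# Distance matrix for "cat" -> "cart", written out by hand
cat_to_cart = [
    [0, 1, 2, 3, 4],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 1, 1],
]
for step in traceback_sketch(cat_to_cart):
    print(step)
# ('match', (1, 1), 0)
# ('match', (2, 2), 0)
# ('insertion', (2, 3), 1)
# ('match', (3, 4), 1)
```

When several minimal paths exist, this sketch returns just one of them, determined by the tie-breaking order of the candidates.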

polymerist.genutils.sequences.similarity.edits.describe_edits(seq1: Sequence[T], seq2: Sequence[T], int_type: Type = int, indicator: str = ' -> ', delimiter: str = '\n') → Generator[str, None, None]

Describes step-by-step the insertion, deletion, or substitution operations needed to transform one sequence into another

polymerist.genutils.sequences.similarity.edits.levenshtein_distance(seq1: Sequence[T], seq2: Sequence[T], int_type: Type = int) → int

Compute the Levenshtein (edit) distance between a pair of sequences with elements of compatible type. This is the minimal number of insertion, deletion, or substitution operations needed to transform either sequence into the other.
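If only the distance is needed, the full matrix does not have to be kept. The following sketch (hypothetical name levenshtein_sketch, not the library's implementation) applies the same recurrence while retaining just two rows at a time.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def levenshtein_sketch(seq1: Sequence[T], seq2: Sequence[T]) -> int:
    """Edit distance using two rolling rows instead of a full matrix."""
    # Row 0: distance from the empty prefix of seq1 to each prefix of seq2
    previous = list(range(len(seq2) + 1))
    for i, item1 in enumerate(seq1, start=1):
        current = [i]  # distance from seq1[:i] to the empty sequence
        for j, item2 in enumerate(seq2, start=1):
            cost = 0 if item1 == item2 else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("kitten", "sitting"))  # 3
print(levenshtein_sketch([1, 2, 3], [1, 3]))    # 1
```

Because each of the three operations has an inverse of equal cost, the result is the same in either direction, which matches the "either sequence into the other" wording above.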