polymerist.genutils.sequences.discernment

Tools for solving the DISCERNMENT (Determination of Index Sequences from Complete Enumeration of Ransom Notes - Multiset Extension with Nonlexical Types) problem

DISCERNMENT problem definition:

Given a “word” (a sequence of N symbols of type T), and a mapped sequence of “bins” (an ordered collection of multisets of type T, each assigned a label of type L), enumerate all N-tuples of labels such that the symbols of the word could be drawn from the bins with those labels, in that order
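To make the definition concrete, the following minimal sketch checks whether one candidate tuple of labels is a valid solution. It is an illustration only: `is_valid_choice` and the example bins are hypothetical names, not part of polymerist, and bins are assumed to be given as a plain dict from label to an iterable of symbols.

```python
from collections import Counter

def is_valid_choice(word, labels, bins):
    """Check whether the i-th symbol of `word` can be drawn from the bin labelled `labels[i]`."""
    if len(labels) != len(word):
        return False
    # Treat every bin as a multiset; each draw uses up one copy of the symbol
    remaining = {label: Counter(symbols) for label, symbols in bins.items()}
    for symbol, label in zip(word, labels):
        if remaining[label][symbol] == 0:
            return False  # bin lacks the symbol, or its copies are exhausted
        remaining[label][symbol] -= 1
    return True

bins = {"a": "xy", "b": "x", "c": "yy"}
print(is_valid_choice("xy", ("a", "c"), bins))  # x from bin "a", y from bin "c"
print(is_valid_choice("xx", ("a", "a"), bins))  # bin "a" holds only one x
```

A full solution to the problem is the set of all label tuples for which this check passes.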

Submodules

Classes

DISCERNMENTSolver

Encapsulation class for solving generalized ransom-note index enumeration problems for arbitrary words, bins, and solving algorithms

DISCERNMENTStrategyStack

Stack-based implementation of generalized ransom note enumeration strategy

DISCERNMENTStrategyCartesian

Naive implementation of generalized ransom note enumeration strategy based on cartesian products

DISCERNMENTStrategyRecursive

Recursive implementation of generalized ransom note enumeration strategy via divide-and-conquer approach

SymbolInventory

Representation of a map from a set of symbols of type T and a set of bin labels of type L to integer counts

Package Contents

class polymerist.genutils.sequences.discernment.DISCERNMENTSolver(symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L] | Sequence[Iterable[polymerist.genutils.sequences.discernment.inventory.T]] | Mapping[polymerist.genutils.sequences.discernment.inventory.L, Sequence[Iterable[polymerist.genutils.sequences.discernment.inventory.T]]], strategy: polymerist.genutils.sequences.discernment.strategies.DISCERNMENTStrategy = DISCERNMENTStrategyStack())[source]

Encapsulation class for solving generalized ransom-note index enumeration problems for arbitrary words, bins, and solving algorithms

strategy
property symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L]

Returns the symbol inventory as a copy, so that partial traversals (e.g. peeking at the first item) do not mutate the original

enumerate_choices(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], ignore_multiplicities: bool = False, unique_bins: bool = False) Generator[polymerist.genutils.sequences.discernment.inventory.L, None, None][source]

Enumerate all possible choices using specified solution strategy

choice_solutions_exist(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], ignore_multiplicities: bool = False, unique_bins: bool = False) bool[source]

Precheck to see if a solution exists without attempting full enumeration

class polymerist.genutils.sequences.discernment.DISCERNMENTStrategyStack[source]

Bases: DISCERNMENTStrategy

Stack-based implementation of generalized ransom note enumeration strategy

Pushes all valid symbol-index pairs for the current word position onto a stack, then advances and does the same until reaching the final symbol, backtracking and restoring symbols afterwards

Fastest of all strategies in benchmarks, except that Cartesian is faster when ignore_multiplicities=True

enumerate_choice_labels(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L], ignore_multiplicities: bool = False, unique_bins: bool = False) Generator[tuple[polymerist.genutils.sequences.discernment.inventory.L, Ellipsis], None, None][source]

Takes a word (N-element sequence of type T) and a symbol inventory (map from symbols and bin labels to counts), and exhaustively enumerates all N-tuples of indices corresponding to orderings of bins from which the word could be drawn

Supports modifications to base behavior:

*** If ignore_multiplicities=True, will not respect the counts of elements in each bin when drawing. For a given symbol, this allows a bin containing the symbol to appear more times than that symbol is present in that bin

*** If unique_bins=True, will only allow each bin to be sampled from once, EVEN if that bin contains symbols appearing later in the word
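The stack-based approach and its two flags can be sketched as below. This is not the library's implementation: `stack_enumerate` is a hypothetical function, bins are assumed to be a plain dict of label to iterable, and instead of mutating and restoring an inventory the sketch recounts how many copies each partial solution has already drawn.

```python
from collections import Counter

def stack_enumerate(word, bins, ignore_multiplicities=False, unique_bins=False):
    """Yield label tuples for `word` using an explicit stack instead of recursion."""
    counts = {label: Counter(symbols) for label, symbols in bins.items()}
    stack = [()]  # each entry is a partial solution: the labels chosen so far
    while stack:
        chosen = stack.pop()
        position = len(chosen)
        if position == len(word):
            yield chosen  # every symbol has been assigned a bin
            continue
        symbol = word[position]
        # Push in reverse so that labels are explored in their original order
        for label in reversed(list(bins)):
            if unique_bins and label in chosen:
                continue  # this bin was already sampled once
            available = counts[label][symbol]
            if available == 0:
                continue  # bin does not contain the symbol at all
            drawn = sum(1 for lbl, sym in zip(chosen, word) if lbl == label and sym == symbol)
            if not ignore_multiplicities and drawn >= available:
                continue  # all copies of the symbol in this bin are used up
            stack.append(chosen + (label,))

bins = {"a": "xy", "b": "x", "c": "yy"}
print(list(stack_enumerate("xx", bins)))
print(list(stack_enumerate("xx", bins, ignore_multiplicities=True)))
print(list(stack_enumerate("xy", bins, unique_bins=True)))
```

Because the stack lives on the heap, this style is not limited by the Python call stack depth for long words.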

class polymerist.genutils.sequences.discernment.DISCERNMENTStrategyCartesian[source]

Bases: DISCERNMENTStrategy

Naive implementation of generalized ransom note enumeration strategy based on cartesian products

Generates all possible index sequences ignoring multiplicity, then checks them one-by-one to see if they’re valid

Roughly 1-2 orders of magnitude slower (in prefactor) than Recursive and Stack, unless ignoring multiplicities, in which case it is roughly 1 order of magnitude faster

enumerate_choice_labels(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L], ignore_multiplicities: bool = False, unique_bins: bool = False) Generator[tuple[polymerist.genutils.sequences.discernment.inventory.L, Ellipsis], None, None][source]

Takes a word (N-element sequence of type T) and a symbol inventory (map from symbols and bin labels to counts), and exhaustively enumerates all N-tuples of indices corresponding to orderings of bins from which the word could be drawn

Supports modifications to base behavior:

*** If ignore_multiplicities=True, will not respect the counts of elements in each bin when drawing. For a given symbol, this allows a bin containing the symbol to appear more times than that symbol is present in that bin

*** If unique_bins=True, will only allow each bin to be sampled from once, EVEN if that bin contains symbols appearing later in the word
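A generate-then-filter approach of this kind might look like the sketch below. `cartesian_enumerate` is a hypothetical name and the plain dict of bins is an assumption; the sketch only illustrates the idea of building every candidate sequence from a Cartesian product and discarding the invalid ones.

```python
from collections import Counter
from itertools import product

def cartesian_enumerate(word, bins, ignore_multiplicities=False, unique_bins=False):
    """Yield label tuples for `word` by filtering a Cartesian product of candidates."""
    counts = {label: Counter(symbols) for label, symbols in bins.items()}
    # Candidate labels per position: every bin containing that symbol at least once
    candidates = [[label for label in bins if counts[label][symbol] > 0] for symbol in word]
    for labels in product(*candidates):
        if unique_bins and len(set(labels)) < len(labels):
            continue  # some bin was sampled more than once
        if not ignore_multiplicities:
            # Count how many copies of each symbol are taken from each bin
            drawn = Counter(zip(labels, word))
            if any(n > counts[label][symbol] for (label, symbol), n in drawn.items()):
                continue  # some bin would be overdrawn
        yield labels

bins = {"a": "xy", "b": "x", "c": "yy"}
print(list(cartesian_enumerate("xx", bins)))
print(list(cartesian_enumerate("xx", bins, ignore_multiplicities=True)))
```

When multiplicities are ignored, the multiset check is skipped entirely and the product is emitted almost directly, which is consistent with this strategy being the quickest in that mode.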

class polymerist.genutils.sequences.discernment.DISCERNMENTStrategyRecursive[source]

Bases: DISCERNMENTStrategy

Recursive implementation of generalized ransom note enumeration strategy via divide-and-conquer approach

Breaks sequence down into first and all remaining symbols (“head” and “tail”, respectively). Yields solution as all indices where the head occurs + recursive solutions to the tail sequence enumeration

Easiest to analyze and reason about, but imposes cap on word length due to Python call stack size. Performs intermediately in benchmark (faster than Cartesian, but slower than Stack)

enumerate_choice_labels(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L], ignore_multiplicities: bool = False, unique_bins: bool = False, _buffer: tuple[int] = None) Generator[tuple[polymerist.genutils.sequences.discernment.inventory.L, Ellipsis], None, None][source]

Takes a word (N-element sequence of type T) and a symbol inventory (map from symbols and bin labels to counts), and exhaustively enumerates all N-tuples of indices corresponding to orderings of bins from which the word could be drawn

Supports modifications to base behavior:

*** If ignore_multiplicities=True, will not respect the counts of elements in each bin when drawing. For a given symbol, this allows a bin containing the symbol to appear more times than that symbol is present in that bin

*** If unique_bins=True, will only allow each bin to be sampled from once, EVEN if that bin contains symbols appearing later in the word
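The head/tail decomposition can be sketched as follows. `recursive_enumerate` and its `_used` argument are hypothetical and only loosely mirror the documented signature; the sketch assumes the bins have already been converted to a dict of per-bin `Counter` objects, which it mutates while drawing and restores on backtracking.

```python
from collections import Counter

def recursive_enumerate(word, counts, ignore_multiplicities=False, unique_bins=False, _used=()):
    """Yield label tuples for `word`: a choice for the head plus each solution of the tail."""
    if not word:
        yield ()  # the empty word has exactly one (empty) solution
        return
    head, tail = word[0], word[1:]
    for label, bin_counts in counts.items():
        if bin_counts[head] == 0 or (unique_bins and label in _used):
            continue
        if not ignore_multiplicities:
            bin_counts[head] -= 1  # draw one copy of the head symbol
        for rest in recursive_enumerate(tail, counts, ignore_multiplicities, unique_bins, _used + (label,)):
            yield (label,) + rest
        if not ignore_multiplicities:
            bin_counts[head] += 1  # put the copy back before trying the next bin

bins = {"a": "xy", "b": "x", "c": "yy"}
counts = {label: Counter(symbols) for label, symbols in bins.items()}
print(list(recursive_enumerate("xy", counts)))
```

The recursion depth equals the word length, which is where the cap imposed by the Python call stack comes from. Note also that stopping iteration early leaves `counts` partially drawn, so a caller that only peeks at the first solution should pass in a copy.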

class polymerist.genutils.sequences.discernment.SymbolInventory(*args, _number_of_symbols: int | None = None, _number_of_bins: int | None = None, _symbol_index_map: dict[T, int] | None = None, _bin_index_map: dict[L, int] | None = None, **kwargs)[source]

Bases: dict, Generic[T, L]

Representation of a map from a set of symbols of type T and a set of bin labels of type L to integer counts. Implemented as a dict, keyed by symbol, whose values count the number of occurrences of that symbol in all bins in which that symbol occurs

Data structure with specific methods which are useful in solving the generalized ransom note enumeration problem
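The shape of such a mapping can be illustrated with plain standard-library containers. This is a sketch of the idea only, not the class itself: `build_inventory` is a hypothetical helper and the nested `Counter` layout is an assumption based on the description above.

```python
from collections import Counter, defaultdict

def build_inventory(bins):
    """Map each symbol to a Counter of {bin label: occurrences of the symbol in that bin}."""
    inventory = defaultdict(Counter)
    for label, symbols in bins.items():
        for symbol in symbols:
            inventory[symbol][label] += 1
    return dict(inventory)

inventory = build_inventory({"a": "xy", "b": "x", "c": "yy"})
print(inventory["y"])  # bins containing "y", with the number of copies in each
```

Keying by symbol first means that the set of bins able to supply a given symbol is available with a single lookup, which is the query every enumeration strategy repeats at each word position.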

classmethod from_bin_sequence_mapping(choice_bin_map: Mapping[L, Sequence[Iterable[T]]], _symbol_index_map: dict[T, int] | None = None, _bin_index_map: dict[L, int] | None = None) SymbolInventory[T, L][source]

Initialize inventory from a mapping of labels to bins

classmethod from_bin_sequence(choice_bins: Sequence[Iterable[T]]) SymbolInventory[T, int][source]

Initialize inventory from an ordered sequence of bins. Special case of mapping instantiation for sequences of bins; simply uses the index of a bin as its label

classmethod from_bins(choice_bins: Sequence[Iterable[T]] | dict[L, Sequence[Iterable[T]]]) SymbolInventory[T, L][source]

“Smart” initialization method which dispatches to from_bin_sequence() or from_bin_sequence_mapping(), depending on the nature of the “choice_bins” container provided
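The dispatch between the two input shapes can be pictured with the sketch below. `label_bins` is a hypothetical helper, not a polymerist function; it only shows how an unlabelled sequence of bins can be reduced to the labelled case by using each bin's position as its label.

```python
from collections.abc import Mapping

def label_bins(choice_bins):
    """Return a dict of label -> bin, using positions as labels for plain sequences."""
    if isinstance(choice_bins, Mapping):
        return dict(choice_bins)  # labels were supplied explicitly
    return dict(enumerate(choice_bins))  # fall back to integer indices as labels

print(label_bins(["xy", "x", "yy"]))
print(label_bins({"a": "xy", "b": "x"}))
```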

property number_of_symbols: int

Number of unique symbols present in the SymbolInventory

property number_of_bins: int

Number of unique bins present in the SymbolInventory

property symbol_index_map: dict[T, int]

Arbitrary one-to-one mapping between all symbols present in the inventory and integer indices

property bin_index_map: dict[T, int]

Arbitrary one-to-one mapping between all bins present in the inventory and integer indices

contains_word(sequence: Sequence[T], ignore_multiplicities: bool = False) bool[source]

Check if a word (i.e. sequence of symbols of type T) could possibly be produced from a SymbolInventory

Returns bool. True indicates that the word can be made from the inventory; False indicates that the sequence contains symbols not present in the inventory OR contains more of one particular symbol than are present in the whole inventory
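A check of this kind can be sketched by pooling the symbol counts of all bins and comparing them against the word. `could_contain_word` is a hypothetical function working on a plain dict of bins, and the handling of `ignore_multiplicities` shown here (only requiring each symbol to be present somewhere) is an assumption about the intended behavior.

```python
from collections import Counter

def could_contain_word(word, bins, ignore_multiplicities=False):
    """Cheap necessary condition: are enough copies of each symbol present across all bins?"""
    totals = Counter()
    for symbols in bins.values():
        totals.update(symbols)  # pool the symbol counts of every bin
    needed = Counter(word)
    if ignore_multiplicities:
        # Only presence matters when counts are not respected
        return all(totals[symbol] > 0 for symbol in needed)
    return all(totals[symbol] >= n for symbol, n in needed.items())

bins = {"a": "xy", "b": "x", "c": "yy"}
print(could_contain_word("xyy", bins))
print(could_contain_word("xxx", bins))
```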

deepcopy() SymbolInventory[T, L][source]

Create a deep copy of the current SymbolInventory

property involution: SymbolInventory[L, T]

Returns a new SymbolInventory which switches the order of precedence of the symbols and bin labels. So-called because self.involution.involution returns a SymbolInventory identical to self (by construction)
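On a plain nested dict, swapping the two levels of keys looks like the sketch below. `involute` is a hypothetical stand-in for the property, and the nested-dict layout (symbol, then label, then count) is assumed for illustration.

```python
def involute(inventory):
    """Swap the outer and inner keys of a nested {outer: {inner: count}} mapping."""
    swapped = {}
    for outer, inner_counts in inventory.items():
        for inner, count in inner_counts.items():
            swapped.setdefault(inner, {})[outer] = count
    return swapped

by_symbol = {"x": {"a": 1, "b": 1}, "y": {"a": 1, "c": 2}}
by_bin = involute(by_symbol)
print(by_bin)                         # keyed by bin label, then by symbol
print(involute(by_bin) == by_symbol)  # applying the swap twice restores the original
```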

property occurence_matrix: list[list[int]] | numpy.ndarray[Any, int]

Creates an NxM matrix (N is the number of symbols, and M is the number of bins) which represents a SymbolInventory. Matrix element A_ij denotes the number of occurrences of the i-th symbol (according to an arbitrary numbering) in the j-th bin

Will attempt to return the matrix as a 2-D numpy array, but will fall back to a doubly-nested list if numpy is not installed
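The matrix construction and the optional numpy dependency can be sketched as follows. The function name, its arguments, and the explicit index maps are hypothetical; only the described behavior (rows for symbols, columns for bins, nested-list fallback) is taken from the text above.

```python
def occurrence_matrix(inventory, symbol_index, bin_index):
    """Build a matrix whose (i, j) entry counts the i-th symbol in the j-th bin."""
    matrix = [[0] * len(bin_index) for _ in symbol_index]
    for symbol, bin_counts in inventory.items():
        for label, count in bin_counts.items():
            matrix[symbol_index[symbol]][bin_index[label]] = count
    try:
        import numpy as np  # optional dependency
    except ImportError:
        return matrix  # fall back to the doubly-nested list
    return np.array(matrix)

inventory = {"x": {"a": 1, "b": 1}, "y": {"a": 1, "c": 2}}
result = occurrence_matrix(inventory, {"x": 0, "y": 1}, {"a": 0, "b": 1, "c": 2})
print(result)
```

Callers that must work with or without numpy can normalize the result with `result.tolist()` when that attribute exists.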