polymerist.genutils.sequences.discernment
Tools for solving the DISCERNMENT (Determination of Index Sequences from Complete Enumeration of Ransom Notes - Multiset Extension with Nonlexical Types) problem
- DISCERNMENT problem definition:
Given a “word” (a sequence of N symbols of type T), and a mapped sequence of “bins” (ordered collection of multisets of type T, each assigned a label of type L), enumerate all N-tuples of labels such that the symbols of the words could be drawn from the bins with those labels in that order
Submodules
Classes
Encapsulation class for solving generalized ransom-note index enumeration problems for arbitrary words, bins, and solving algorithms |
|
Stack-based implementation of generalized ransom note enumeration strategy |
|
Naive implementation of generalized ransom note enumeration strategy based on cartesian products |
|
Recursive implementation of generalized ransom note enumeration strategy via divide-and-conquer approach |
|
Representation of a map from a set of symbols of type T and a set of bin labels of type L to integer counts |
Package Contents
- class polymerist.genutils.sequences.discernment.DISCERNMENTSolver(symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L] | Sequence[Iterable[polymerist.genutils.sequences.discernment.inventory.T]] | Mapping[polymerist.genutils.sequences.discernment.inventory.L, Sequence[Iterable[polymerist.genutils.sequences.discernment.inventory.T]]], strategy: polymerist.genutils.sequences.discernment.strategies.DISCERNMENTStrategy = DISCERNMENTStrategyStack())[source]
Encapsulation class for solving generalized ransom-note index enumeration problems for arbitrary words, bins, and solving algorithms
- strategy
- property symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L]
Cast symbol inventory as copy to avoid mutation during partial traversals (i.e. peek at first item)
- enumerate_choices(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], ignore_multiplicities: bool = False, unique_bins: bool = False) Generator[polymerist.genutils.sequences.discernment.inventory.L, None, None][source]
Enumerate all possible choices using specified solution strategy
- class polymerist.genutils.sequences.discernment.DISCERNMENTStrategyStack[source]
Bases:
DISCERNMENTStrategyStack-based implementation of generalized ransom note enumeration strategy
Pushes all valid symbol-index pairs onto a stack for the current word positions, then advances and does the same until reaching the final symbol, backtracking and restoring symbols afterwards
Fastest of all strategies across benchmark (except Cartesian specifically when ignore_multiplicities=True)
- enumerate_choice_labels(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L], ignore_multiplicities: bool = False, unique_bins: bool = False) Generator[tuple[polymerist.genutils.sequences.discernment.inventory.L, Ellipsis], None, None][source]
Takes a word (N-element sequence of type T) and a symbol inventory (map from symbols and bin labels to counts), Exhaustively enumerates all N-tuples of indices corresponding to ordering of bins from which the word could be drawn
Support modifications to base behavior: *** If ignore_multiplicities=True, will not respect the counts of elements in each bin when drawing
For a given symbol, this allows a bin containing the symbol to appear more times than that symbol is present in that bin
- *** If unique_bins=True, will only allow each bin to be sampled from once,
EVEN if that bin contains symbols appearing later in the word
- class polymerist.genutils.sequences.discernment.DISCERNMENTStrategyCartesian[source]
Bases:
DISCERNMENTStrategyNaive implementation of generalized ransom note enumeration strategy based on cartesian products
Generates all possible index sequences ignoring multiplicity, then checks them one-by-one to see if they’re valid
Roughly 1-2 OOM slower (prefactor) than Recursive and Stack unless ignoring multiplicity in which case this is ~1 OOM faster
- enumerate_choice_labels(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L], ignore_multiplicities: bool = False, unique_bins: bool = False) Generator[tuple[polymerist.genutils.sequences.discernment.inventory.L, Ellipsis], None, None][source]
Takes a word (N-element sequence of type T) and a symbol inventory (map from symbols and bin labels to counts), Exhaustively enumerates all N-tuples of indices corresponding to ordering of bins from which the word could be drawn
Support modifications to base behavior: *** If ignore_multiplicities=True, will not respect the counts of elements in each bin when drawing
For a given symbol, this allows a bin containing the symbol to appear more times than that symbol is present in that bin
- *** If unique_bins=True, will only allow each bin to be sampled from once,
EVEN if that bin contains symbols appearing later in the word
- class polymerist.genutils.sequences.discernment.DISCERNMENTStrategyRecursive[source]
Bases:
DISCERNMENTStrategyRecursive implementation of generalized ransom note enumeration strategy via divide-and-conquer approach
Breaks sequence down into first and all remaining symbols (“head” and “tail”, respectively) Yields solution as all indices where the head occurs + recursive solutions to the tail sequence enumeration
Easiest to analyze and reason about, but imposes cap on word length due to Python call stack size Performs intermediately in benchmark (faster than Cartesian, but slower than Stack)
- enumerate_choice_labels(word: Sequence[polymerist.genutils.sequences.discernment.inventory.T], symbol_inventory: polymerist.genutils.sequences.discernment.inventory.SymbolInventory[polymerist.genutils.sequences.discernment.inventory.T, polymerist.genutils.sequences.discernment.inventory.L], ignore_multiplicities: bool = False, unique_bins: bool = False, _buffer: tuple[int] = None) Generator[tuple[polymerist.genutils.sequences.discernment.inventory.L, Ellipsis], None, None][source]
Takes a word (N-element sequence of type T) and a symbol inventory (map from symbols and bin labels to counts), Exhaustively enumerates all N-tuples of indices corresponding to ordering of bins from which the word could be drawn
Support modifications to base behavior: *** If ignore_multiplicities=True, will not respect the counts of elements in each bin when drawing
For a given symbol, this allows a bin containing the symbol to appear more times than that symbol is present in that bin
- *** If unique_bins=True, will only allow each bin to be sampled from once,
EVEN if that bin contains symbols appearing later in the word
- class polymerist.genutils.sequences.discernment.SymbolInventory(*args, _number_of_symbols: int | None = None, _number_of_bins: int | None = None, _symbol_index_map: dict[T, int] | None = None, _bin_index_map: dict[L, int] | None = None, **kwargs)[source]
Bases:
dict,Generic[T,L]Representation of a map from a set of symbols of type T and a set of bin labels of type L to integer counts Implemented as a dict, keyed by symbol, whose values count the number of occurrences of that symbol in all bins that symbol occurs
Data structure with specific methods which are useful in solving the generalized ransom note enumeration problem
- classmethod from_bin_sequence_mapping(choice_bin_map: Mapping[L, Sequence[Iterable[T]]], _symbol_index_map: dict[T, int] | None = None, _bin_index_map: dict[L, int] | None = None) SymbolInventory[T, L][source]
Initialize inventory from a mapping of labels to bins
- classmethod from_bin_sequence(choice_bins: Sequence[Iterable[T]]) SymbolInventory[T, int][source]
Initialize inventory from an ordered sequence of bins Special case of mapping instantiation for sequences of bins; simply uses the index of a bin as its label
- classmethod from_bins(choice_bins: Sequence[Iterable[T]] | dict[L, Sequence[Iterable[T]]]) SymbolInventory[T, L][source]
“Smart” initialization method which dispatches to from_bin_sequence() or from_bin_sequence_mapping(), depending on the nature of the “choice_bins” container provided
- property number_of_symbols: int
Number of unique symbols present in the SymbolInventory
- property number_of_bins: int
Number of unique symbols present in the SymbolInventory
- property symbol_index_map: dict[T, int]
Arbitrary one-to-one mapping between all symbols present in the inventory and integer indices
- property bin_index_map: dict[T, int]
Arbitrary one-to-one mapping between all bins present in the inventory and integer indices
- contains_word(sequence: Sequence[T], ignore_multiplicities: bool = False) bool[source]
Check if a word (i.e. sequence of symbols of type T) could possibly be produced from a SymbolInventory
Returns bool; True indicated the word can be made from the inventory False return indicates the sequence contains symbols not present in the inventory OR contains more of one particular symbol than are present in the whole inventory
- deepcopy() SymbolInventory[T, L][source]
Create a deep copy of the current SymbolInventory
- property involution: SymbolInventory[L, T]
Returns a new SymbolInventory which switches the order of precedence of the symbols and bin labels So-called because self.involution.involution returns a SymbolInventory identical to self (by construction)
- property occurence_matrix: list[list[int]] | numpy.ndarray[Any, int]
Creates an NxM matrix (N is the number of symbols, and M is the number of bins) which represents a SymbolInventory Matrix element A_ij denotes the number of occurences of the i-th symbol (according to an arbitrary numbering) in the j-th bin
Will attempt to return matrix as a 2-D numpy array, but will default to a doubly-nested list if numpy is not found to be installed