Fitness Functions¶
Chi Squared¶
Chi Squared Scoring function.
-
lantern.fitness.chisquared.
ChiSquared
(target_frequency)¶ Score a text by comparing its frequency distribution against another.
Note
This scorer can penalise a text without any warning. The English ngram frequency tables use capital letters, so any text scored against them must be all capitals to give correct results. I am aware of the issue and will work on a fix.
Todo
Maybe include a parameter for ngram size. I haven't had a use case for this yet; once there is evidence it is needed, I will add it.
Example
>>> fitness = ChiSquared(english.unigrams)
>>> fitness("ABC")
-32.2
Parameters: target_frequency (dict) – symbol to frequency mapping of the distribution to compare with
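The comparison described above can be sketched in plain Python. This is a minimal illustration, not lantern's implementation: the name make_chi_squared, the toy frequency table, and the choice to use the full text length as the sample size are all assumptions made for the example.

```python
from collections import Counter


def make_chi_squared(target_frequency):
    """Hypothetical helper: build a scorer where 0 is a perfect match
    and more negative values mean a worse fit."""
    total = sum(target_frequency.values())
    # Normalise the target counts into probabilities.
    target_prob = {sym: count / total for sym, count in target_frequency.items()}

    def score(text):
        observed = Counter(text)
        n = len(text)  # assumption: every character counts towards the sample size
        chi = 0.0
        for sym, prob in target_prob.items():
            expected = n * prob
            if expected > 0:
                chi += (observed[sym] - expected) ** 2 / expected
        # Negate so that a higher score means a better fit.
        return 0.0 - chi

    return score


# Toy upper-case table (made up for illustration, not real English data).
toy_frequency = {"A": 2, "B": 1, "C": 1}
fitness = make_chi_squared(toy_frequency)

upper_score = fitness("AABC")  # matches the table exactly
lower_score = fitness("aabc")  # same letters, wrong case
```

With this toy table, "AABC" scores 0 while "aabc" scores -4.0, because none of the lower-case letters match an upper-case key. That is the capitalisation pitfall the note above describes.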
Corpus¶
Score plaintext based on how many of its words are found in the corpus.
-
class
lantern.fitness.corpus.
Corpus
(corpus)¶ Scoring function based on the existence of words in a corpus.
Todo
This is fairly broken. I’m not happy with this implementation and will change it in the future when I revisit weighted mean scoring.
-
__call__
(text)¶ Score based on number of words not in the corpus.
Example
>>> fitness = Corpus(["example"])
>>> fitness("example")
0
>>> fitness("different")
-2.0
Parameters: text (str) – The text to score
Returns: Corpus score for text
-
__init__
(corpus)¶ Build function with set of words from a corpus.
Parameters: corpus (collection) – collection of words to use
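A corpus scorer of this shape can be sketched as follows. This is a guess at the behaviour, not lantern's code: the class name CorpusSketch, the penalty parameter (set to 2.0 only so the result lines up with the documented example), the lower-casing, and the punctuation stripping are all assumptions.

```python
import string


class CorpusSketch:
    """Hypothetical stand-in for a corpus-based scorer."""

    def __init__(self, corpus, penalty=2.0):
        # Store the words in a set for constant-time membership tests.
        self.words = {word.lower() for word in corpus}
        # Assumed cost per unknown word; not taken from the library.
        self.penalty = penalty

    def __call__(self, text):
        # Drop punctuation so that "example," still matches "example".
        stripped = text.translate(str.maketrans("", "", string.punctuation))
        missing = sum(
            1 for word in stripped.split() if word.lower() not in self.words
        )
        return 0.0 - self.penalty * missing


fitness = CorpusSketch(["example"])
known = fitness("example")
unknown = fitness("different")
mixed = fitness("An example, really!")
```

Under these assumptions a text made only of corpus words scores 0, and each unknown word lowers the score by the penalty.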
-
Ngram¶
Fitness scoring using ngram frequency.
-
lantern.fitness.ngram.
NgramScorer
(frequency_map)¶ Compute the score of a text by using the frequencies of ngrams.
Example
>>> fitness = NgramScorer(english.unigrams)
>>> fitness("ABC")
-4.3622319742618245
Parameters: frequency_map (dict) – ngram to frequency mapping
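One common way to score text from ngram frequencies is to sum the log probabilities of every ngram in a sliding window. The sketch below illustrates that idea only; the name make_ngram_scorer, the base-10 logarithm, the floor value for unseen ngrams, and the toy table are assumptions, and lantern's NgramScorer may differ in each of them.

```python
import math


def make_ngram_scorer(frequency_map):
    """Hypothetical helper: build a log-probability scorer from ngram counts."""
    total = sum(frequency_map.values())
    # Infer the ngram size from the first key (assumes all keys share a length).
    size = len(next(iter(frequency_map)))
    log_probs = {
        gram: math.log10(count / total) for gram, count in frequency_map.items()
    }
    # Assumed penalty for ngrams that are missing from the table.
    floor = math.log10(0.01 / total)

    def score(text):
        # Slide a window of `size` characters across the text.
        grams = (text[i:i + size] for i in range(len(text) - size + 1))
        return sum(log_probs.get(gram, floor) for gram in grams)

    return score


# Toy unigram counts, made up for illustration.
toy_unigrams = {"A": 10, "B": 90}
fitness = make_ngram_scorer(toy_unigrams)

common = fitness("BBB")
rare = fitness("AAA")
unseen = fitness("ZZZ")
```

Texts built from frequent ngrams score closer to zero, and ngrams absent from the table pull the score down the most. Because the keys are matched exactly, lower-case input would fall through to the floor when the table uses capitals.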
-
lantern.fitness.ngram.
english
= <lantern.structures.dynamicdict.DynamicDict object>¶ English ngram scorers.
Pattern Match¶
Fitness scoring using pattern matching.
-
lantern.fitness.patternmatch.
PatternMatch
(regex)¶ Compute the score of a text by determining whether a pattern matches.
Example
>>> fitness = PatternMatch("flag{.*}")
>>> fitness("flag{example}")
0
>>> fitness("junk")
-1
Parameters: regex (str) – regular expression string to use as a pattern
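A pattern-based scorer like this can be sketched with the standard re module. The name make_pattern_match is hypothetical, and using re.search (match anywhere in the text) rather than a full-string match is an assumption; the documented examples do not distinguish between the two.

```python
import re


def make_pattern_match(regex):
    """Hypothetical helper: score 0 when the pattern is found, -1 otherwise."""
    pattern = re.compile(regex)

    def score(text):
        # Assumption: a match anywhere in the text counts as success.
        return 0 if pattern.search(text) else -1

    return score


fitness = make_pattern_match("flag{.*}")
hit = fitness("flag{example}")
miss = fitness("junk")
```

A binary score like this is useful when the expected plaintext has a known shape, such as a flag format, but it gives a search no gradient to follow between candidates.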