Fitness Functions

Chi Squared

Chi Squared Scoring function.

lantern.fitness.chisquared.ChiSquared(target_frequency)

Score a text by comparing its frequency distribution against another.

Note

It is easy to be penalised without knowing it when using this scorer. The English frequency ngrams are keyed by capital letters, so any text you score against them must be all capitals to give correct results. I am aware of the issue and will work on a fix.
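Until that fix lands, one possible workaround is to upper-case text before it reaches the scorer. The sketch below is illustrative only: `uppercase_input` and `count_known` are hypothetical helpers, not part of lantern, and `count_known` merely stands in for any scorer backed by an upper-case table.

```python
def uppercase_input(scorer):
    """Wrap a scoring callable so its input is upper-cased first."""
    def wrapped(text):
        return scorer(text.upper())
    return wrapped


# Hypothetical stand-in for a scorer whose table only has upper-case keys.
UPPERCASE_TABLE = {"A": 1, "B": 1}


def count_known(text):
    """Count characters of text that appear in UPPERCASE_TABLE."""
    return sum(1 for ch in text if ch in UPPERCASE_TABLE)


safe_count = uppercase_input(count_known)
print(count_known("ab"))  # lower-case input matches nothing in the table
print(safe_count("ab"))   # same input, upper-cased before scoring
```

The same wrapper could be applied to any callable that takes a single text argument.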

Todo

Maybe include a parameter for ngram size. I haven't had a use case for this yet; once there is evidence it is needed, I will add it.

Example

>>> fitness = ChiSquared(english.unigrams)
>>> fitness("ABC")
-32.2

Parameters: target_frequency (dict) – symbol to frequency mapping of the distribution to compare with
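For readers unfamiliar with the technique, here is a minimal sketch of chi-squared scoring in general. It is not lantern's implementation and is not expected to reproduce the value shown in the example above. It assumes `target_frequency` maps each symbol to an expected relative frequency (summing to roughly 1) and negates the statistic so that higher scores mean a closer fit; `chi_squared_score` is a hypothetical name.

```python
from collections import Counter


def chi_squared_score(text, target_frequency):
    """Return the negated chi-squared statistic of text against target_frequency.

    Assumption: target_frequency maps symbol -> expected relative frequency.
    """
    # Only count symbols the target distribution knows about.
    observed = Counter(ch for ch in text if ch in target_frequency)
    total = sum(observed.values())
    if total == 0:
        # No scorable symbols: treat as the worst possible fit.
        return float("-inf")

    statistic = 0.0
    for symbol, probability in target_frequency.items():
        expected = probability * total
        if expected > 0:
            statistic += (observed[symbol] - expected) ** 2 / expected
    return -statistic


target = {"A": 0.5, "B": 0.5}
print(chi_squared_score("AB", target))  # observed counts equal expected counts
print(chi_squared_score("AA", target))  # skewed towards "A", so a lower score
```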

Corpus

Score plaintext based on the number of identified words that are in the corpus.

class lantern.fitness.corpus.Corpus(corpus)

Scoring function based on existence of words in a corpus.

Todo

This is fairly broken. I’m not happy with this implementation and will change it when I revisit weighted mean scoring.

__call__(text)

Score based on number of words not in the corpus.

Example

>>> fitness = Corpus(["example"])
>>> fitness("example")
0
>>> fitness("different")
-2.0

Parameters: text (str) – The text to score
Returns: Corpus score for text

__init__(corpus)

Build function with set of words from a corpus.

Parameters: corpus (collection) – collection of words to use
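As a rough illustration of the idea of scoring by words missing from a corpus, the sketch below subtracts one point per whitespace-separated token that is not found. It is an assumption-laden stand-in, not lantern's code: `corpus_score` is a hypothetical name, the tokenisation and case folding are my own choices, and it does not reproduce the -2.0 shown in the example above.

```python
def corpus_score(text, corpus):
    """Return minus the number of whitespace-separated tokens not in corpus."""
    # Build a set once so each membership test is O(1).
    words = {word.lower() for word in corpus}
    unknown = sum(1 for token in text.split() if token.lower() not in words)
    return -unknown


print(corpus_score("example", ["example"]))            # every token is known
print(corpus_score("different example", ["example"]))  # one unknown token
```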

Ngram

Fitness scoring using ngram frequency.

lantern.fitness.ngram.NgramScorer(frequency_map)

Compute the score of a text by using the frequencies of ngrams.

Example

>>> fitness = NgramScorer(english.unigrams)
>>> fitness("ABC")
-4.3622319742618245

Parameters: frequency_map (dict) – ngram to frequency mapping

lantern.fitness.ngram.english = <lantern.structures.dynamicdict.DynamicDict object>

English ngram scorers.
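To illustrate how ngram frequency scoring commonly works, here is a minimal sketch that sums the log probability of each ngram in the text. It is not NgramScorer itself: `ngram_score`, the explicit `n` parameter, and the `floor` applied to unseen ngrams are all assumptions, and the frequency map is assumed to hold raw counts.

```python
import math


def ngram_score(text, frequency_map, n=1, floor=1e-10):
    """Sum the log10 probability of every length-n ngram in text.

    Assumption: frequency_map maps ngram -> raw count. Unseen ngrams get
    the small probability `floor` so that log10 is always defined.
    """
    total = sum(frequency_map.values())
    score = 0.0
    for i in range(len(text) - n + 1):
        ngram = text[i:i + n]
        probability = frequency_map.get(ngram, 0) / total
        score += math.log10(max(probability, floor))
    return score


counts = {"A": 1, "B": 1}
print(ngram_score("AB", counts))  # two ngrams, each with probability 0.5
print(ngram_score("Z", counts))   # unseen ngram falls back to the floor
```

Working in log space keeps long texts from underflowing to zero, which is why this family of scorers typically returns negative numbers.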

Pattern Match

Fitness scoring using pattern matching.

lantern.fitness.patternmatch.PatternMatch(regex)

Compute the score of a text by determining whether a pattern matches.

Example

>>> fitness = PatternMatch("flag{.*}")
>>> fitness("flag{example}")
0
>>> fitness("junk")
-1

Parameters: regex (str) – regular expression string to use as a pattern
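The behaviour in the example above can be sketched with the standard `re` module. This is a guess at the shape of such a scorer rather than lantern's source: `pattern_match_scorer` is a hypothetical name, and using `search` (match anywhere in the text) rather than `match` or `fullmatch` is an assumption.

```python
import re


def pattern_match_scorer(regex):
    """Build a scorer returning 0 if regex is found in the text, else -1."""
    pattern = re.compile(regex)  # compile once, reuse for every call

    def score(text):
        return 0 if pattern.search(text) else -1

    return score


# Braces are escaped here so they are unambiguously literal characters.
fitness = pattern_match_scorer(r"flag\{.*\}")
print(fitness("flag{example}"))
print(fitness("junk"))
```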