Analyzer

This module contains two classes:
  • Analyzer: a generic analyzer. It can be fed from both text strings and files. You can also store a representation of the analyzer's state and retrieve it later with the from_file class method or the load method.
  • EnglishAnalyzer: a special analyzer for the English language.
class freqens.analyzer.Analyzer(content=None)

The class that performs the analysis. You can feed an analyzer from different sources (strings, files, ...) so that it extracts the target frequency distribution, and then ask it to score supplied content based on frequency similarity.

choose_best(strings, n=1)

Returns the n strings whose frequency distribution is most similar to the one fed to the analyzer.

Parameters:
  • strings – an iterator over the candidate strings among which the Analyzer will look for the best ones.
  • n – an integer specifying the number of strings which will be returned.
Returns:

an iterable containing the n best strings sorted by frequency similarity
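
The selection step can be sketched with the standard library. This is an illustration only, not the freqens implementation: choose_best_sketch and toy_score are hypothetical names, and the toy scoring function (distance of the share of "e" from an assumed target of 0.12) merely stands in for the analyzer's score method.

```python
import heapq


def choose_best_sketch(strings, score, n=1):
    """Return the n strings with the smallest score, best first.

    `strings` may be any iterable, including a one-shot iterator.
    `score` is a callable where a smaller value means "more similar".
    """
    return heapq.nsmallest(n, strings, key=score)


def toy_score(text):
    """Hypothetical stand-in score: how far the share of 'e' is from 0.12."""
    return abs(text.count("e") / len(text) - 0.12)


candidates = iter(["eeee", "the quick brown fox", "rhythm"])
best = choose_best_sketch(candidates, toy_score, n=2)
```

Using heapq.nsmallest avoids sorting the whole input when only a few strings are requested.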

discard(chars)

Removes the chars in chars from the counter

Parameters:chars – an iterable consisting of the chars whose frequency will be set to 0
feed(content)

Feeds the analyzer with a string

Parameters:content – the string to be fed to the analyzer
feed_from_raw_file(filename)

Feeds the analyzer with the content of a file. Every character will be taken into account, including newline chars.

Parameters:filename – the path of the file that will be fed to the analyzer
classmethod from_file(filename)

Reads a frequency distribution from a JSON file as stored by the store method

classmethod from_raw_file(filename)

Returns an analyzer whose frequency distribution is read from the file content

keys()

Returns the characters whose frequency is greater than 0

load(filename)

Loads a frequency distribution file and adds it to the current distribution

score(content)

Assigns a score to any string. The smaller the score, the more similar the frequency distribution of the string is to that of the analyzer. 0 means that the frequency distributions of both the content and the analyzer are equal.

Parameters:content – the string to be scored.
Returns:a float number
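
The behaviour described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's code: score_sketch is a hypothetical function, and it assumes that the score is the Euclidean distance between relative character frequencies and that both strings are non-empty.

```python
import math
from collections import Counter


def score_sketch(reference, content):
    """Euclidean distance between the relative character frequencies
    of two non-empty strings (assumed scoring rule)."""
    ref, cand = Counter(reference), Counter(content)
    ref_total, cand_total = sum(ref.values()), sum(cand.values())
    # Counter returns 0 for characters that are missing on one side.
    return math.sqrt(sum(
        (ref[ch] / ref_total - cand[ch] / cand_total) ** 2
        for ch in set(ref) | set(cand)
    ))


same = score_sketch("abab", "ba")     # identical proportions
close = score_sketch("abab", "aab")   # slightly different proportions
far = score_sketch("abab", "zzzz")    # no characters in common
```

Under these assumptions, strings of different lengths still score 0 against each other when their character proportions match.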
serialize()

Returns a JSON representation of the analyzer

Returns:a string containing a JSON representation of the absolute frequencies the analyzer has been fed with.
store(filename)

Stores the JSON representation of the analyzer to a file
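
The serialize, store and load cycle can be illustrated with a plain collections.Counter. This is a sketch only: the actual JSON layout written by freqens is not specified here, and the file name distribution.json is an arbitrary choice for the example.

```python
import json
import os
import tempfile
from collections import Counter

counts = Counter("hello world")   # absolute character frequencies

# Counter is a dict subclass, so it serializes as a JSON object.
serialized = json.dumps(counts)

# Store the JSON text in a file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "distribution.json")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(serialized)

# Loading adds the stored counts to an existing distribution.
current = Counter("lol")
with open(path, encoding="utf-8") as fh:
    current.update(json.load(fh))
```

Storing absolute rather than relative frequencies is what makes it possible to add a loaded distribution to the current one without losing the relative weight of each source.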

transform_keys(transformation)

Maps the keys to other new keys to get a new frequency distribution

The relative frequencies of keys that map to the same new key are summed to obtain the new frequency distribution.

Parameters:transformation – a callable object that maps chars to chars
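
The merging rule can be sketched with a Counter. transform_keys_sketch is a hypothetical helper, not the library's method. It sums absolute counts, which gives the same result as summing relative frequencies because the total number of characters does not change.

```python
from collections import Counter


def transform_keys_sketch(counts, transformation):
    """Map every key through `transformation`, summing the counts of
    keys that end up mapped to the same new key."""
    merged = Counter()
    for key, count in counts.items():
        merged[transformation(key)] += count
    return merged


counts = Counter("AaBbb")                         # A:1, a:1, B:1, b:2
folded = transform_keys_sketch(counts, str.lower)  # case-insensitive view
```

Passing str.lower, as here, is one way to obtain a case-insensitive distribution from a case-sensitive one.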
class freqens.analyzer.EnglishAnalyzer(blank_spaces=True, case_sensitive=True, just_alpha=False)

An analyzer for the English language

freqens.analyzer.counter_distance(counter1, counter2)

Euclidean distance on the frequency distribution space
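
For reference, the Euclidean distance between two frequency distributions can be written as below. Here p_c and q_c denote the frequency of character c in counter1 and counter2 respectively (0 when the character is absent). Whether the function normalizes the counters to relative frequencies before computing the distance is not stated here, so treat the normalization as an assumption.

```latex
d(p, q) = \sqrt{\sum_{c} \left( p_c - q_c \right)^{2}}
```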